Friday, April 27, 2012

Pass-by-value and C++11



One of the first things you learn in an introductory C++ class - at least,
you should - is that immutable, non-primitive parameters should be passed
by const reference instead of by value. This avoids unnecessary copies and
ensures better performance.

In C++11, however, the introduction of move semantics means this is no
longer always the case. Consider the following:

class Foo
{
    std::string m_str;
   
public:
    Foo(const std::string& str)
    : m_str(str)
    {
    }
};

Foo foo("hello");

How many string copies are performed here? Just one - within the Foo()
constructor, where the temporary string built from the literal is copied
into m_str. Prior to C++11 this was the best we could do, but with the
advent of rvalue references, we can do the following:

class Bar
{
   std::string m_str;
 
public:
   Bar(std::string str)
   : m_str(std::move(str))
   {
   }
};

Bar bar("hello");

This code requires zero string copies. First a temporary string is
constructed from the literal; because it is an rvalue, Bar's parameter is
initialized from it by std::string's move constructor (a move that compilers
are allowed to elide, and usually do). Finally, std::move casts the parameter
to an rvalue so that the move constructor, rather than the copy constructor,
initializes Bar's m_str member.

Assuming that the move operation is lightweight and the copy operation is
heavyweight, this means it's actually faster to pass by value in this
circumstance. In fact, if the assumptions about relative performance of
copy vs move hold true, then whenever a copy of the parameter will be made,
passing by value is never worse than passing by const reference by more than
the cost of one extra move, and it is often significantly better:

std::string myStr = "hello";
Foo foo1(myStr);             // one copy
Foo foo2("hello");           // one copy
Bar bar1(myStr);             // one copy, one move
Bar bar2("hello");           // zero copies
Bar bar3(std::move(myStr));  // zero copies
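
One way to check these counts is to swap std::string for a type that counts
its own copies and moves. The following is only a sketch: Tracked, RefHolder,
ValueHolder, Counts and measure are hypothetical names made up for this
illustration, with RefHolder shaped like Foo and ValueHolder shaped like Bar.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-in for std::string that counts its own copies and moves.
struct Tracked
{
    static int copies;
    static int moves;

    Tracked(const char*) {}                   // converting constructor, like std::string's
    Tracked(const Tracked&) { ++copies; }     // copy constructor
    Tracked(Tracked&&) noexcept { ++moves; }  // move constructor
};

int Tracked::copies = 0;
int Tracked::moves = 0;

// Same shape as Foo: take a const reference, copy into the member.
class RefHolder
{
    Tracked m_value;

public:
    RefHolder(const Tracked& value)
    : m_value(value)
    {
    }
};

// Same shape as Bar: take by value, move into the member.
class ValueHolder
{
    Tracked m_value;

public:
    ValueHolder(Tracked value)
    : m_value(std::move(value))
    {
    }
};

struct Counts
{
    int copies;
    int moves;
};

// Constructs a Holder from arg and reports the copies and moves that caused.
template <typename Holder, typename Arg>
Counts measure(Arg&& arg)
{
    Tracked::copies = 0;
    Tracked::moves = 0;
    Holder holder(std::forward<Arg>(arg));
    (void)holder;
    return Counts{Tracked::copies, Tracked::moves};
}
```

Calling measure<ValueHolder>("hello") should report zero copies, while
measure<ValueHolder>(t) for an lvalue t should report one copy plus one move.
The number of moves in the literal case depends on whether the compiler
elides the move into the parameter, so only the copy counts are fixed.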

Passing by const reference, however, is still faster in cases when the
parameter doesn't need to be stored, only read. For example:

void Foo(const std::string& str)
{
    std::cout << str;
}

void Bar(std::string str)
{
    std::cout << str;
}

std::string myStr = "hello";
Foo(myStr);             // zero copies
Foo("hello");           // zero copies
Bar(myStr);             // one copy
Bar("hello");           // zero copies
Bar(std::move(myStr));  // zero copies
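
The difference in the lvalue case comes down to whether the function sees the
caller's object or a fresh one. A small sketch, using two hypothetical helpers
that compare the parameter's address with the caller's:

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers: each reports whether the parameter is the caller's own object.
bool isCallersObjectByRef(const std::string& str, const std::string* original)
{
    return &str == original;   // the reference aliases the caller's string
}

bool isCallersObjectByValue(std::string str, const std::string* original)
{
    return &str == original;   // str is a separate object local to this call
}
```

Given std::string myStr = "hello", the first helper returns true for
(myStr, &myStr) and the second returns false, because the by-value parameter
had to be copy-constructed before the function body ran.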

Unfortunately, this characteristic makes it harder to design an API that is
both clean and efficient. The decision to pass by value or reference must be
made at the interface level, and yet the efficiency of each approach is
determined by implementation details.

There are a few approaches I've considered. The first is all or nothing: pick
a technique and use it everywhere. This naturally favors passing by reference,
as functions that only read their parameters are typically far more common
than functions that store them.

A second is to broadly classify functions by what their implementations are
likely to do. For example, constructors and setters are very likely to store
a copy of their parameters, so they should accept parameters by value, and
everything else can accept them by reference.
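
As a rough sketch of that convention, here is a hypothetical Label class
(the class, its members and countSpaces are invented for illustration): the
constructor and setter take their argument by value and move it into place,
while the read-only helper takes a const reference.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical class following the "constructors and setters by value" rule.
class Label
{
    std::string m_text;

public:
    // Stores its argument: take by value, then move into the member.
    explicit Label(std::string text)
    : m_text(std::move(text))
    {
    }

    // Stores its argument: take by value, then move-assign.
    void setText(std::string text)
    {
        m_text = std::move(text);
    }

    const std::string& text() const
    {
        return m_text;
    }
};

// Only reads its argument: take by const reference.
std::size_t countSpaces(const std::string& text)
{
    std::size_t count = 0;
    for (char c : text)
    {
        if (c == ' ')
            ++count;
    }
    return count;
}
```

A caller that still needs its string passes it as an lvalue and pays for one
copy; a caller that is done with it can pass std::move(s) or a temporary and
pay only for moves.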

A third is to always tailor the interface to the implementation. This will
result in the best performance, provided the interfaces are kept in sync with
their implementations. It has the downside of being inconsistent, and it
creates the potential for conflicts where two derived classes override the
same virtual function, one storing the parameter and the other only reading
it. (In practice, I would expect this to be extremely rare.)
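
To make that conflict concrete, here is a sketch with a hypothetical Sink
interface and two invented implementations. The base class has to commit to
one signature; by value is chosen here, which suits HistorySink but means
CountingSink pays for a copy it never needs whenever it is handed an lvalue.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface: the parameter-passing choice is fixed here.
class Sink
{
public:
    virtual ~Sink() = default;
    virtual void write(std::string text) = 0;
};

// Stores the text, so taking it by value and moving is a good fit.
class HistorySink : public Sink
{
    std::vector<std::string> m_lines;

public:
    void write(std::string text) override
    {
        m_lines.push_back(std::move(text));
    }

    const std::vector<std::string>& lines() const
    {
        return m_lines;
    }
};

// Only reads the text, so a const reference would have been cheaper.
class CountingSink : public Sink
{
    std::size_t m_chars = 0;

public:
    void write(std::string text) override
    {
        m_chars += text.size();
    }

    std::size_t chars() const
    {
        return m_chars;
    }
};
```

Declaring write with a const reference instead would reverse the trade-off:
CountingSink would be optimal and HistorySink would always copy.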


