Friday, April 27, 2012

Pass-by-value and C++11



One of the first things you learn in an introductory C++ class - at least,
you should - is that read-only, non-primitive parameters should be passed
by const reference instead of by value. This avoids unnecessary copies and
improves performance.

In C++11, however, the introduction of move semantics means this isn't
always the case. Consider the following:

class Foo
{
    std::string m_str;
   
public:
    Foo(const std::string& str)
    : m_str(str)
    {
    }
};

Foo foo("hello");

How many string copies are performed here? Just one - within the Foo()
constructor. Prior to C++11 this was the best we could do, but with the
advent of rvalue references, we can do the following:

class Bar
{
   std::string m_str;
 
public:
   Bar(std::string str)
   : m_str(std::move(str))
   {
   }
};

Bar bar("hello");

This code requires zero string copies. First a temporary string is
constructed from "hello"; because a temporary is an rvalue, Bar's parameter
is either move-constructed from it or the construction is elided entirely.
Finally, std::move explicitly moves the parameter into Bar's m_str member.

Assuming that the move operation is lightweight and the copy operation is
heavyweight, this means it's actually faster to pass by value in this
circumstance. In fact, if the assumptions about the relative performance of
copy vs. move hold true, passing by value is always as performant as, and
often significantly better than, passing by const reference when a copy
will be made of the parameter:

std::string myStr = "hello";
Foo foo(myStr);             // one copy
Foo foo("hello");           // one copy
Bar bar(myStr);             // one copy
Bar bar("hello");           // zero copies
Bar bar(std::move(myStr));  // zero copies

Passing by const reference, however, is still faster in cases when the
parameter doesn't need to be stored, only read. For example:

void Foo(const std::string& str)
{
    std::cout << str;
}

void Bar(std::string str)
{
    std::cout << str;
}

std::string myStr = "hello";
Foo(myStr);             // zero copies
Foo("hello");           // zero copies
Bar(myStr);             // one copy
Bar("hello");           // zero copies
Bar(std::move(myStr));  // zero copies

Unfortunately, this characteristic makes it harder to design an API that is
both clean and efficient. The decision to pass by value or reference must be
made at the interface level, and yet the efficiency of each approach is
determined by implementation details.

There are a few approaches I've considered. The first is all-or-nothing: pick
a technique and use it everywhere. This naturally favors passing by const
reference, as get operations are typically far more common than set operations.

A second is to broadly classify functions based on the likelihood of their
implementation details. For example, constructors and setters are very likely
to store a copy of their parameters, so they should accept parameters by
value, while everything else accepts them by const reference.

A third is to always tailor the interface to the implementation. This will
result in the best performance, if the interfaces are well maintained. It has
the downside of being inconsistent, and the potential for conflicts where
two derived classes implement different behavior. (In practice, I would expect
this to be extremely rare.)



Wednesday, April 25, 2012

Designing a data-driven engine, part II - Error handling

In my experience, one of the first things that needs to be decided when starting a new project is how to handle errors. The method chosen tends to have a large effect on code style and design decisions, and that effect is very difficult to reverse. In particular, it's very difficult to turn non-exception-safe code into exception-safe code, so if you are going to use exceptions in a project, it's best to start using them right away.

I'm a big fan of exceptions in general, but I would not simply use them for that reason. Most game engines don't use C++ exceptions, and for a good reason: the best behavior for most games when encountering a fatal error is to crash immediately. Exceptions also introduce some performance overhead, and if you're never going to recover from an error, there's no upside. Finally, exceptions can be problematic when debugging, as they can hide the true source of errors, often producing crash dumps which are far removed from the actual error site.

Nonetheless, for my 3d engine (working name 'spire'), I decided to use C++ exceptions. Here are the reasons why:

  1. I want to be able to recover from some errors. I want to be able to do things like edit resources and script code in one window and watch the immediate effects in another. If the engine crashes at the slightest error, that will be very difficult. And in general, I won't know whether or not an error is recoverable at the error site; that decision should be made in a place with more contextual knowledge of what the program state is.  (As an example, when reloading a resource dynamically I probably don't want the program to crash, but when loading certain critical resources at startup, I probably do. Both are likely to involve some of the same code paths.)
  2. For non-recoverable errors, I want to fail fast. Exceptions actually make this fairly simple; all you have to do is not catch an exception and the program will crash at the throw site. 
  3. I need to be able to test failures. I consider testing failure conditions as important as acceptance tests, and although there are ways to unit test crashes, they're pretty messy compared to ASSERT_THROW. Because this is a one man project, I don't have a QA team to catch regressions, so I need to rely heavily on automated testing. 
  4. My target platforms are PCs where I don't need to worry much about the code bloat incurred by enabling exceptions. In terms of actual performance degradation, it's negligible in most circumstances, and any particular bottlenecks can be designated as noexcept.
The next issue is what types of exceptions to throw. My preference is to have a unique exception type for each error condition that can occur. There are two main reasons for this: first, it gives me the ability to write unit tests expecting a very specific error to occur. I've caught bugs before which would have been missed if I were simply testing for failure instead of for a specific exception. Second, when writing recovery code, it gives me the flexibility to be as specific as I need to be about which kinds of errors should be caught and recovered from.

Defining all of the specialized exception classes isn't hard; they are all specializations of a template called Error. Here is an example implementation:

template <typename Tag, typename Base = std::runtime_error>
class Error : public Base
{
public:
    explicit Error(const std::string& msg)
    : Base(msg)
    {
    }
};

typedef Error<struct _ResourceError> ResourceError;
typedef Error<struct _XmlParsingError, ResourceError> XmlParsingError;

My actual implementation is slightly more complex, as it utilizes boost::exception and has constructors which take an error message. But the concept, and the ability to define new specializations with a simple typedef, remains the same.

One final thing I like about exceptions - as I mentioned earlier, turning non-exception-safe code into exception-safe code is very difficult, but the opposite is not hard at all. If I want a build profile where exceptions are disabled and I crash at every throw site, that's doable by using a throw macro, i.e.:

#ifdef USING_EXCEPTIONS
#define THROW(x) throw (x)
#else
#define THROW(x) *(static_cast<char*>(nullptr)) = 0
#endif

If I ever want to port to a platform where exceptions are too expensive - such as a mobile device where I need to reduce the executable size - I can use this technique to turn off error recovery and make every throw into an immediate crash.


Monday, April 23, 2012

New Nyx video

This includes a rudimentary water implementation and some further tweaks to the world generation algorithm. I also captured what it looks like during loading, so it's easier to see how node generation is prioritized.


Designing a data-driven engine, part I - Properties

I've started a new project which is fairly ambitious - I'm building a new 3d engine from scratch. My goal is to make something flexible enough so that I can easily build new demos and potentially integrate some of my existing projects under a common framework, so that I don't have to rewrite so much boilerplate.

One of my major goals for this project is to emphasize data driven design wherever possible. This is sort of an ambiguous buzzword at this point, so let me clarify my understanding of "data-driven": wherever possible, the program's behavior should be specified at run-time, rather than at compile-time. There are several advantages, and some disadvantages, to this approach:

  • You can have the same compiled codebase run multiple, potentially very different, applications. This reduces the need to fork projects and have multiple versions of the same code floating around. As I want to use this engine for many small projects, the last thing I want to do is to have to repeat myself.
  • Extremely rapid turnaround time is possible. In fact, I intend on implementing the ability to reload many kinds of resources without needing to restart the application; for example, I could edit shader code in one window and watch the effects of my changes in real-time in another.
  • Collaboration with non-programmers is greatly facilitated. The more that can be done without needing to edit C++ code, the more work I can delegate in a group project.
  • Validation is potentially easier. Tracking down bugs in C++ code is hard; tracking down certain types of bugs in an XML document can be done automatically by running it through an XML validator.
  • Performance is potentially decreased. You need to load and parse the data, instead of the compiler doing it for you in a design where everything is hard-coded. Loose coupling of components means fewer optimizations are available to the compiler, as more abstractions (such as virtual method calls) are required to glue the components together.
  • Huge amounts of glue code are required. Serialization code is tedious and bug-prone, and therefore requires a lot of test coverage.
The last point is particularly concerning for me: since this is a hobby project by a single programmer, I need to be constantly aware of how much time various features will take to implement. Many of the benefits of a data-driven engine come from reducing the time I spend implementing new features, but if that is outweighed by the time spent writing glue code, I gain nothing from it.

If I were to write a similar engine in C# or Java, I wouldn't have to worry about glue code at all. Those languages provide reflection mechanisms which make the problem trivial, at least in the general case. The problem is C++'s complete lack of reflection. I decided that rather than writing glue code for every class in my engine, it would be far simpler to create a basic reflection system, and then write glue once. This should, in theory, also decrease the number of bugs in my program, because there's much less code to test.

I don't need a full reflection system, nor would I really want one, given the complexity of C++. Something which is capable of reflecting a small subset of data types would be more than sufficient. I decided on the following list:
  • Booleans
  • Integers
  • Reals (i.e. floats or doubles)
  • Strings
  • Collections (supporting Enumerate, Filter, and Grow operations)
  • Dictionaries (with string keys only)
  • Objects
To clarify the last point, I'm using "object" in the sense that it might be used in a language such as JavaScript: essentially, a dictionary where the set of valid keys is fixed and always present.

I am not going to go into the implementation details in this post, but here are a few examples from the unit tests for the property system:

    bool b = false;
    Property p = b;
    p.AsBool().Set(true);
    ASSERT_EQ(b, true);
    b = false;
    ASSERT_EQ(p.AsBool().Get(), false);

This illustrates the basic operation for value types. A Property has reference semantics and can be bound directly to any supported value type. Integers, reals, and strings work in much the same way as bools - they provide get and set operations. Collections are more interesting:

    std::vector<int> v;
    v.push_back(0);
    v.push_back(1);
    v.push_back(2);
    Property p = v;
    ASSERT_EQ(p.AsCollection().GetSize(), 3);
    Property new_int = p.AsCollection().Grow();
    new_int.AsInt().Set(3);
    ASSERT_EQ(v.size(), 4);
    ASSERT_EQ(v.back(), 3);

As you can see, you can grow a collection, but you can't specify what value you're adding to it. That's because you don't know the actual type held by the collection, only that it can be encapsulated by a property. This implies that only collections of default-constructible types can be represented as properties. Here's how you can enumerate through a collection:

    std::vector<int> v;
    v.push_back(0);
    v.push_back(1);
    v.push_back(2);
    std::vector<int> o;
    Property p = v;
    p.AsCollection().Enumerate([&] (size_t, Property p)
    {
        o.push_back(static_cast<int>(p.AsInt().Get()));
        return true;
    });
    ASSERT_EQ(o.size(), 3);
    ASSERT_EQ(o[0], 0);
    ASSERT_EQ(o[1], 1);
    ASSERT_EQ(o[2], 2);
The return value for the enumerator allows you to short-circuit the enumeration. Filtering works in a similar manner - returning false removes the item from the collection.

Dictionaries are similar to collections, but for key-value stores. The keys must be STL strings.

    std::map<std::string, int> m;
    m["zero"] = 0;
    m["one"] = 1;
    m["two"] = 2;
    Property p = m;
    ASSERT_EQ(p.Type(), PropertyTypes::Dictionary);
    ASSERT_NO_THROW(p.AsDictionary());
    ASSERT_EQ(p.AsDictionary().Get("zero").AsInt().Get(), 0);
    ASSERT_EQ(p.AsDictionary().Get("one").AsInt().Get(), 1);
    ASSERT_EQ(p.AsDictionary().Get("two").AsInt().Get(), 2);
Dictionaries implement the same basic operations as collections - grow, enumerate, and filter.

Objects are a bit more complex. An object maps to a C++ composite type, but as C++ lacks reflection it's necessary to specify a field mapping. This is done using macros:

struct Foo
{
   int a;
   float b;
   bool c;
   std::string d;
};

BEGIN_PROPERTY_MAP(Foo)
    DEF_PROPERTY(a)
    DEF_PROPERTY(b)
    DEF_PROPERTY(c)
    DEF_PROPERTY(d)
END_PROPERTY_MAP()

Foo f;
Property p = f;
p.AsObject().Get("a").AsInt().Set(2);
p.AsObject().Get("b").AsReal().Set(12.0);
p.AsObject().Get("c").AsBool().Set(true);
p.AsObject().Get("d").AsString().Set("hello");
ASSERT_EQ(f.a, 2);
ASSERT_EQ(f.b, 12.0f);
ASSERT_EQ(f.c, true);
ASSERT_STREQ(f.d.c_str(), "hello");
It's also possible to specify a property mapping from within a type. Doing so allows it to be encapsulated as a property polymorphically, and to inherit the property mappings from parent classes. This does require that the type inherit from the PropertyHost interface.

Within an object, it's possible to specify a property mapping which maps to a pair of getter and setter functions instead of an actual member. This facilitates mapping complex data types which cannot be directly encapsulated by a Property, so long as they can be converted to another representation, such as a string. Here is an example:

class X
{
public:
    int value;

    int Get() const
    {
        return value;
    }

    void Set(int v)
    {
        value = v;
    }
};

X x;
x.value = 0;
auto pw = common::PropertyWrapper<int, X, &X::Get, &X::Set>(x);
Property p = pw;
ASSERT_EQ(p.AsInt().Get(), 0);
x.value = 1;
ASSERT_EQ(p.AsInt().Get(), 1);
p.AsInt().Set(2);
ASSERT_EQ(p.AsInt().Get(), 2);
ASSERT_EQ(x.value, 2);
This can also be done via a macro called DEF_PROPERTY_WRAPPER() within the BEGIN_PROPERTY_MAP()/END_PROPERTY_MAP() macros.

With the property library in place, I no longer need to write glue code for each serializable class. Instead, all I have to do is write glue code once for properties, and then make sure each class can be represented by a Property object. (In most cases this is simply a matter of providing a property mapping.) I wrote a class called XmlSerializer, based on rapidxml, which implements XML serialization of properties:
struct CompositeOuter
{
    struct CompositeInner
    {
        int a;
    };
    std::vector<CompositeInner> inners;
};

BEGIN_PROPERTY_MAP(CompositeOuter)
    DEF_PROPERTY(inners)
END_PROPERTY_MAP()

BEGIN_PROPERTY_MAP(CompositeOuter::CompositeInner)
    DEF_PROPERTY(a)
END_PROPERTY_MAP()

    std::string src =
        "<object>"
        "   <inners>"
        "       <item><a>1</a></item>"
        "       <item><a>2</a></item>"
        "   </inners>"
        "</object>";
    CompositeOuter obj;
    src >> XmlSerializer(obj);
    ASSERT_EQ(obj.inners.size(), 2);
    ASSERT_EQ(obj.inners[0].a, 1);
    ASSERT_EQ(obj.inners[1].a, 2);
As you can see, composition of data types is allowed so long as they can all be encapsulated by Property. This is handy because often a vector or dictionary of structs is needed to serialize a class.

I should say a few words on the performance of this code. As it is designed to be used in a high performance game engine, there are zero dynamic memory allocations used by the property library. (This doesn't count things such as growing a std::vector wrapped in a property, obviously.) The only major overhead is a virtual method call required by each invocation of a Property method. However, since serialization is something not likely to occur on a per-frame basis, I don't consider that a significant performance cost.

The source code is available on github.




Thursday, April 12, 2012

Rebuilding the world, one iteration at a time

I haven't been happy with the shape of the worlds produced by Nyx, my voxel terrain engine. Previously, I had attempted to focus on more realistic terrain at geological scales. This didn't really work; the terrain was neither realistic, nor could the engine handle true geological scale while retaining high local detail, even with a 5 level octree. At the same time, I wanted to produce environments which could plausibly pass for something in a real video game.

Then I saw this image floating around - it's one of World of Warcraft's continents drawn from a massive distance. At this scale, you can get a pretty good sense of how the world is shaped; wide-open zones are bounded by sheer, impassable mountains, arranged in something like an irregular grid. I decided to use this concept as the basis for my new algorithm.

My terrain engine is based on this article from GPU Gems 3. As in their implementation, the shape of the world is determined by a density function which is evaluated in a pixel shader. All of the following code snippets are from this shader.

Step one was to create land: a large, flat plain. I already had some noise-based texturing code from my original algorithm, which I left in place.

float height = 100.0f; // Slightly above "sea level", i.e. y=0
output.density = -(pos.y - height);

Next, to produce mountains, I added a sine function:

float mountain = (1.0f + sin((pos.x / 1000.0f) + (pos.z / 1000.0f))) / 2.0f;
float height = 100.0f + (mountain * 1000.0f);

The basic problem with a sine wave is there's no flat part to it. However, applying a power function to the result sharpens the mountains by flattening out the low parts:

float mountain = pow((1.0f + sin((pos.x / 4000.0f) + (pos.z / 1000.0f) +
                                 (simplex_noise(pos2D / 5000.0f) * 5.0f)))
                     / 2.0f, 5.0f);

This produces mountains moving along a single axis, but one of the observations I had about WoW is that its zones are roughly arranged in a grid. By adding a second mountain range based on sin(x-y) I get a grid effect:

float mountain1 = (1.0f + sin((pos.x / 2000.0f) + (pos.z / 2000.0f))) / 2.0f;
float mountain2 = (1.0f + sin((pos.x / 2000.0f) - (pos.z / 2000.0f))) / 2.0f;

mountain1 = pow(mountain1, 5.0f);
mountain2 = pow(mountain2, 5.0f);
float height = 100.0f + (mountain1 * 2000.0f) + (mountain2 * 2000.0f);

Now that I have the basic shape of the mountains specified, I can add a noise term to make them irregular:

float mountainTerm1 = (pos.x / 4000.0f) + (pos.z / 4000.0f) + simplex_noise(pos2D / 9000.0f) * 5.0f;
float mountain1 = pow((1.0f + sin(mountainTerm1)) / 2.0f, 5.0f);

float mountainTerm2 = (pos.x / 4000.0f) - (pos.z / 4000.0f) + simplex_noise(pos2D / 7000.0f) * 5.0f;
float mountain2 = pow((1.0f + sin(mountainTerm2)) / 2.0f, 5.0f);

float height = 100.0f + (mountain1 * 1000.0f) + (mountain2 * 1000.0f);

Much better. However, because the basic shape was a closed grid, even after applying noise the resulting zones are completely closed off from one another. To counteract this, I'll add another factor, again based on a sine function, which will have the effect of canceling out mountains:

float mountainTerm1 = (pos.x / 4000.0f) + (pos.z / 4000.0f) + simplex_noise(pos2D / 6000.0f) * 3.0f;
float mountain1 = pow((1.0f + sin(mountainTerm1)) / 2.0f, 5.0f);

float mountainTerm2 = (pos.x / 4000.0f) - (pos.z / 4000.0f) + simplex_noise(pos2D / 6000.0f) * 3.0f;
float mountain2 = pow((1.0f + sin(mountainTerm2)) / 2.0f, 5.0f);

float canyonTerm = (pos.x / 2000.0f) - (pos.z / 2000.0f) + simplex_noise(pos2D / 4000.0f) * 3.0f;
float canyon = 1.0f - pow((1.0f + sin(canyonTerm)) / 2.0f, 2.0f);

float height = 100.0f + (mountain1 * canyon * 1000.0f) + (mountain2 * canyon * 1000.0f);

The mountain ranges are starting to look pretty good. There are clearly identifiable zones bounded by sheer walls, with passes between them allowing travel between zones. It's time to add some more detail - one of the issues with working at this perspective is that without much low-level noise, it's impossible to perceive scale. By passing the density value through a sign filter, we can achieve what I like to call the "Minecraft effect":

output.density = sign(output.density);

With this filter in place, individual voxels can be visualized. This makes it much easier to perceive the finer effects of modifying the density function. The next step is to add some rivers - these will essentially be cuts into the ground, yet again based on a sine function to get a nice curving effect. I also added a "feature term", responsible for low-impact, medium-scale features within the open areas.

float riverTerm = (pos.x / 3000.0f) + (pos.z / 3000.0f) + simplex_noise(pos2D / 7000.0f) * 5.0f;
float river = pow((1.0f + sin(riverTerm)) / 2.0f, 9.0f);

float height = 100.0f + (mountain1 * canyon * 1000.0f) + (mountain2 * canyon * 1000.0f) - (river * 50.0f);

At this point I was pretty satisfied with the high level shape of the world, so my next task was to add some detail. I turned the sign filter back off. I also made a quick modification to the material function so that mountains would be textured differently from the grassy plains:

float dirt = saturate((mountain1 + mountain2) * 2.0f);
float grass = 1.0f - dirt;

(Note: after this point, there won't be any more code snippets from the shader, as I did a lot of minor tweaking in between iterations.) The next step was to add some low-level noise, multiplied against the mountain factors so that the grasslands would not be affected.

The result looked bad at first because there was only a single octave of noise; the human eye immediately perceives the frequency of the detail as unnatural. Adding a few more octaves of noise largely fixes this problem.

Although I haven't yet implemented water, by lowering the height arbitrarily selected as "sea level" I was able to cause low terrain to clip through the bottom of the domain of the marching cubes function; with the skybox being a greyish blue, this made for a decent interim water.

At this point I was very happy with the land, but I wanted continental features. This was simply a matter of sampling noise at an extremely low frequency; however, the result was a lot of mountains poking up through what should have been deep ocean. I got around this by using the ocean term as a multiplier with the mountain terms, so that mountains fall off in the ocean.
Also visible in the above picture are the sand textures. Because I have function terms corresponding to geographical features such as "mountain", "river", and "ocean", it's trivial to assign textures to match those regions. Oceans are sandy, while rivers are muddy.
Here's what the end result looks like. Although it's not nearly as regular as the human-designed continents of Warcraft, it's much closer than what I started with. I'm definitely happy with the results overall.

As a side note: when compiled, the pixel shader for the world generator uses 2706 instruction slots. I'm kind of amazed that my GPU doesn't choke on it, but I haven't noticed any slowdowns. The voxel generation pipeline is limited to processing a maximum of four nodes per frame in any case, which keeps the frame rate consistent at around 120 near sea level and 60 at maximum elevation, with a delta of roughly 20 FPS when nodes are being generated.