Tuesday, May 8, 2012

Using Gmock with move-only return types

With the introduction of move semantics in C++11, noncopyable types are far more common. While the STL has largely been updated to support move-only types, this is not yet the case for the multiverse of available C++ libraries. One such library is googlemock. Consider the following:
class Foo  
{  
public:  
    MOCK_METHOD0(Clone, std::unique_ptr<Foo>());  
};  
This will not compile, because the return type of mocked functions must be copyable. A workaround I have been using for simple cases is to use an adapter function:
class Foo  
{  
public:  
    MOCK_METHOD0(Clone_, Foo*());  
    std::unique_ptr<Foo> Clone()  
    {  
      return std::unique_ptr<Foo>(Clone_());  
    }  
};  

This isn't pretty but it works for simple cases where the return object can be easily constructed from some other type. That isn't always possible, as I discovered today when trying to mock a function that returns a unique_future. There's no way of getting a value into a unique_future without getting it from a promise, packaged task, or another future. (The constructor needed to do so is private, and only accessible by friend classes.)

The workaround I came up with is to define a container type which implements move-on-copy for the contained object:
 template <typename T>  
 class Mover  
 {  
 public:  
   Mover(T&& obj)  
   : m_obj(std::move(obj)), m_valid(true)  
   {  
   }  
   
   Mover(const Mover<T>& rhs)  
   : m_obj(const_cast<T&&>(rhs.m_obj)), m_valid(true)  
   {  
     assert(rhs.m_valid);  
     rhs.m_valid = false;  
   }  
   
   Mover& operator=(const Mover& rhs)  
   {  
     assert(rhs.m_valid);  
     m_obj = const_cast<T&&>(rhs.m_obj);  
     rhs.m_valid = false;  
     m_valid = true;  
     return *this;  
   }  
   
   bool Valid() const  
   {  
     return m_valid;  
   }  
   
   T& Get()  
   {  
     assert(m_valid);  
     return m_obj;  
   }  
   
   const T& Get() const  
   {  
     assert(m_valid);  
     return m_obj;  
   }  
   
 private:  
   T m_obj;  
   mutable bool m_valid;  
 };  
   
 template <typename T>  
 inline Mover<T> Movable(T&& obj)  
 {  
   return Mover<T>(std::move(obj));  
 }  

It goes without saying that this is dangerous territory (any time const_cast is involved, red flags should go up). The m_valid field verifies that once a Mover has been copied from, any attempt to access the contained object (which is now in a moved-from state) results in an assertion failure.
And now we can use this in our mock:
class Foo  
 {  
 public:  
   MOCK_METHOD0(Bar_, Mover<boost::unique_future<int>>());  
   boost::unique_future<int> Bar()  
   {  
     return std::move(Bar_().Get());  
   }  
 };  
   
 TEST(FooTest, BarReturnsTheRightFuture)  
 {  
   Foo foo;  
   boost::promise<int> p;  
   EXPECT_CALL(foo, Bar_()).Times(1).WillOnce(Return(Movable(p.get_future())));  
   boost::unique_future<int> f = foo.Bar();  
   p.set_value(42);  
   ASSERT_EQ(42, f.get());  
 }  
Conceivably this technique could be used in other situations where you need to pass a move-only type through a framework. The validity check offers some semblance of safety, but really, this is likely not suitable for production code. Good enough for unit testing, though.

Friday, May 4, 2012

Designing a data-driven engine, part 3: Blueprints

In part 1 I described the reflection system which allows C++ types to be serialized and deserialized from storage formats such as XML. Today I will talk about the system built on top of the properties library which forms the framework used by the rest of the engine to construct XML-defined objects at runtime.

My goals for this engine are influenced heavily by my experiences working with the Starcraft 2 engine, which is impressively data-driven. In SC2, nearly every aspect of the game is defined using XML files (with the remainder largely handled by a custom scripting language), which allows for an enormous amount of flexibility. Here's an example of what an XML node defining a unit might look like in SC2*:

* This is simplified pseudo-code, not an actual example.

    <CUnit name="marine">
        <weapon>machine_gun</weapon>
        <hitPoints>45</hitPoints>
        <armor>1</armor>
    </CUnit>
    <CUnit name="tough_marine" parent="marine">
        <hitPoints>100</hitPoints>
        <armor>2</armor>
    </CUnit>
    <CRangedWeapon name="machine_gun">
        <damage>4</damage>
        <range>3</range>
    </CRangedWeapon>
 
A couple of observations:

  • The node name appears to indicate the concrete C++ class to construct. For some entity types, there are actually multiple classes which implement the same type - for example, "CRangedWeapon" and "CMeleeWeapon" are both weapon definitions. How does the engine identify the entity type (i.e. "weapon")? In SC2, all entity definitions of a given type are located within a single file specific to that type, e.g. Weapon.xml.
  • The schema can be slightly different for each class. For example, CRangedWeapon can have a "range" property which CMeleeWeapon might not have. Both would have all of the properties of CWeapon.
  • Entities can inherit from other entities. This is a really useful feature to have.

The most basic method for constructing an object at runtime would be to define a property mapping for it, store the parsed XML tree in memory, and then deserialize it after construction:

Unit* marine = new Unit();
XmlDocument.GetNodeNamed("unit", "marine") >> XmlSerializer(*marine);

There are a few problems with this approach:

  • Naively, it does not work with multiple classes implementing a single entity type. What if "marine" should be implemented by InfantryUnit instead of Unit? This isn't too hard to overcome - you simply need to look at the XML first, check the node name, and then either use hardcoded known values or some sort of factory system to create the appropriate type.
  • It's not very fast. Even if we store the parsed XML tree we still need to do things like string to int conversions. And while the properties library is fairly optimized, it still has overhead which should be limited where possible.
  • It prevents data sharing. If the entity definition contains a string or collection, that would be duplicated in every object created. If we're creating thousands of entities, that could be a real source of inefficiency.
  • Validation occurs when the object is constructed, instead of when the entity definitions are loaded. It's much easier to find bugs when they manifest as early as possible.
I went with a slightly more complex approach. For each entity, I define a Blueprint class, which contains the properties which need to be serialized. The Blueprint also contains the code to create the appropriate object, via a Construct method. Essentially, blueprints are implementations of the factory pattern, coupled with a reflection system.

Blueprints are managed by a global BlueprintService. This service is responsible for reading in XML and maintaining a repository of blueprint objects. Constructing an entity is reduced to:

Unit* marine = blueprintService.Construct<Unit>("marine");

There are a few nifty things about blueprints. You can have multiple blueprint types associated with a single C++ type - i.e., in the above sample, "marine" could actually be constructed by an InfantryUnitBlueprint. Because the property system also supports inheritance, you can have a UnitBlueprint class which defines properties common to all units, then multiple subclasses such as InfantryUnitBlueprint, VehicleUnitBlueprint, etc. Blueprints can inherit from each other at the data level as well, i.e. "marine_captain" inherits all the properties from "marine", but increases the hitPoints property. 
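As a sketch of what data-level inheritance might look like (hypothetical node and property names, mirroring the SC2-style pseudo-XML above):

```xml
<InfantryUnitBlueprint name="marine">
    <hitPoints>45</hitPoints>
    <armor>1</armor>
</InfantryUnitBlueprint>
<InfantryUnitBlueprint name="marine_captain" parent="marine">
    <!-- armor is inherited from "marine"; hitPoints is overridden -->
    <hitPoints>100</hitPoints>
</InfantryUnitBlueprint>
```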

At some point in the future I might elaborate more on how blueprints are implemented, but for now here's some sample code from the unit tests to demonstrate how blueprints are defined and interact with other components:

 namespace  
 {  
   enum AnimalRole  
   {  
     AnimalRole_Wild,  
     AnimalRole_FoodSource,  
     AnimalRole_Pet,  
     AnimalRole_Intelligent  
   };  
   
   BEGIN_ENUM_LABELS(AnimalRole)  
     DEF_ENUM_LABEL("wild", AnimalRole_Wild)  
     DEF_ENUM_LABEL("foodSource", AnimalRole_FoodSource)  
     DEF_ENUM_LABEL("pet", AnimalRole_Pet)  
     DEF_ENUM_LABEL("intelligent", AnimalRole_Intelligent)  
   END_ENUM_LABELS()  
   
   class AnimalBlueprint : public BlueprintInterface<class Animal>  
   {  
   public:  
     std::string scientificName;  
     AnimalRole role;  
     BEGIN_HOST_PROPERTY_MAP()  
       DEF_PROPERTY(scientificName)  
       DEF_PROPERTY(role)  
     END_HOST_PROPERTY_MAP()  
   };  
   
   class Animal  
   {  
   public:  
     std::string scientificName;  
     AnimalRole role;  
   
     Animal(const class AnimalBlueprint& blueprint)  
     : scientificName(blueprint.scientificName),  
      role(blueprint.role)  
     {  
     }  
   
     virtual ~Animal() = 0;  
   };  
   inline Animal::~Animal() { }  
   
   class BirdBlueprint : public BasicBlueprint<BirdBlueprint, Animal, AnimalBlueprint>  
   {  
   public:  
     virtual std::unique_ptr<Animal> Construct() const;  
   
     bool canFly;  
     BEGIN_HOST_PROPERTY_MAP()  
       INHERIT_PROPERTIES(AnimalBlueprint)  
       DEF_PROPERTY(canFly)  
     END_HOST_PROPERTY_MAP()  
   };  
   
   class Bird : public Animal  
   {  
   public:  
     bool canFly;  
   
     Bird(const class BirdBlueprint& blueprint)  
     : Animal(blueprint), canFly(blueprint.canFly)  
     {  
     }  
   };  
   
   inline std::unique_ptr<Animal> BirdBlueprint::Construct() const  
   {  
     return std::unique_ptr<Animal>(new Bird(*this));  
   }  
   
   class MammalBlueprint : public BasicBlueprint<MammalBlueprint, Animal, AnimalBlueprint>  
   {  
   public:  
     virtual std::unique_ptr<Animal> Construct() const;  
   
     int legs;  
     BEGIN_HOST_PROPERTY_MAP()  
       INHERIT_PROPERTIES(AnimalBlueprint)  
       DEF_PROPERTY(legs)  
     END_HOST_PROPERTY_MAP()  
   };  
   
   class Mammal : public Animal  
   {  
   public:  
     int legs;  
   
     Mammal(const MammalBlueprint& blueprint)  
     : Animal(blueprint), legs(blueprint.legs)  
     {  
     }  
   };  
   
   inline std::unique_ptr<Animal> MammalBlueprint::Construct() const  
   {  
     return std::unique_ptr<Animal>(new Mammal(*this));  
   }  
 }  
   
 TEST(BlueprintTests, BlueprintsCanBeLoadedFromXml)  
 {  
   auto svc = GetFramework().Acquire<BlueprintServiceFactory>("DefaultBlueprintService").Construct();  
   svc->Register("mammal", std::unique_ptr<Blueprint>(new MammalBlueprint()));  
   svc->Register("bird", std::unique_ptr<Blueprint>(new BirdBlueprint()));  
     
   const char* xml =  
     "<animals>\n"  
     "  <mammal name=\"man\">\n"  
     "    <scientificName>Homo sapien</scientificName>\n"  
     "    <legs>2</legs>\n"  
     "    <role>intelligent</role>"  
     "  </mammal>\n"  
     "  <mammal name=\"dog\">\n"  
     "    <scientificName>Canis familiaris</scientificName>\n"  
     "    <legs>4</legs>\n"  
     "    <role>pet</role>"  
     "  </mammal>\n"  
     "  <bird name=\"raven\">\n"  
     "    <scientificName>Corvus corax</scientificName>\n"  
     "    <canFly>1</canFly>\n"  
     "    <role>wild</role>\n"  
     "  </bird>\n"  
     "  <bird name=\"chicken\">\n"  
     "    <scientificName>Gallus gallus domesticus</scientificName>\n"  
     "    <canFly>0</canFly>\n"  
     "    <role>foodSource</role>\n"  
     "  </bird>\n"  
     "</animals>\n";  
   std::vector<char> buf(xml, xml + strlen(xml) + 1);  
   svc->Parse(std::move(buf));  
   
   auto man = svc->Acquire<Animal>("man").Construct();  
   ASSERT_STREQ("Homo sapien", man->scientificName.c_str());  
   ASSERT_EQ(2, dynamic_cast<Mammal*>(man.get())->legs);  
   ASSERT_EQ(AnimalRole_Intelligent, man->role);  
   auto dog = svc->Acquire<Animal>("dog").Construct();  
   ASSERT_STREQ("Canis familiaris", dog->scientificName.c_str());  
   ASSERT_EQ(4, dynamic_cast<Mammal*>(dog.get())->legs);  
   ASSERT_EQ(AnimalRole_Pet, dog->role);  
   auto raven = svc->Acquire<Animal>("raven").Construct();  
   ASSERT_STREQ("Corvus corax", raven->scientificName.c_str());  
   ASSERT_TRUE(dynamic_cast<Bird*>(raven.get())->canFly);  
   ASSERT_EQ(AnimalRole_Wild, raven->role);  
   auto chicken = svc->Acquire<Animal>("chicken").Construct();  
   ASSERT_STREQ("Gallus gallus domesticus", chicken->scientificName.c_str());  
   ASSERT_FALSE(dynamic_cast<Bird*>(chicken.get())->canFly);  
   ASSERT_EQ(AnimalRole_FoodSource, chicken->role);  
 }  
 

Friday, April 27, 2012

Pass-by-value and C++11



One of the first things you learn in an introductory C++ class - at least,
you should - is that immutable, non-primitive parameters should be passed
by const reference instead of by value. This avoids unnecessary copies and
ensures better performance.

In C++11, however, the introduction of move constructs means this isn't
always necessarily the case. Consider the following:

class Foo
{
    std::string m_str;
   
public:
    Foo(const std::string& str)
    : m_str(str)
    {
    }
};

Foo foo("hello");

How many string copies are performed here? Just one - within the Foo()
constructor. Prior to C++11 this was the best we could do, but with the
advent of rvalue references, we can do the following:

class Bar
{
   std::string m_str;
 
public:
   Bar(std::string str)
   : m_str(std::move(str))
   {
   }
};

Bar bar("hello");

This code requires zero string copies. First the temporary string is
constructed; then, because the temporary is an rvalue, std::string's move
constructor initializes Bar's parameter (or the move is elided entirely).
Finally, std::move explicitly moves the parameter into Bar's m_str member.

Assuming that the move operation is lightweight and the copy operation is
heavyweight, this means it's actually faster to pass by value in this
circumstance. In fact, if those assumptions about the relative performance
of copy vs. move hold true, passing by value is always at least as performant
as passing by const reference, and often significantly faster, whenever a
copy will be made of the parameter:

std::string myStr = "hello";
Foo foo(myStr);             // one copy
Foo foo("hello");           // one copy
Bar bar(myStr);             // one copy
Bar bar("hello");           // zero copies
Bar bar(std::move(myStr));  // zero copies

Passing by const reference, however, is still faster in cases when the
parameter doesn't need to be stored, only read. For example:

void Foo(const std::string& str)
{
    std::cout << str;
}

void Bar(std::string str)
{
    std::cout << str;
}

std::string myStr = "hello";
Foo(myStr);             // zero copies
Foo("hello");           // zero copies
Bar(myStr);             // one copy
Bar("hello");           // zero copies
Bar(std::move(myStr));  // zero copies

Unfortunately, this characteristic makes it harder to design an API that is
both clean and efficient. The decision to pass by value or reference must be
made at the interface level, and yet the efficiency of each approach is
determined by implementation details.

There are a few approaches I've considered. The first is all or nothing; pick
a technique and use it everywhere. This naturally favors passing by reference, as
get operations are typically far more common than set operations.

A second is to broadly classify functions based on the likelihood of their
implementation details. For example, constructors and setters are very likely
to store a copy of their parameters, so they should accept parameters by
value, and everywhere else can accept them by reference.

A third is to always tailor the interface to the implementation. This will
result in the best performance, if the interfaces are well maintained. It has
the downside of being inconsistent, and the potential for conflicts where
two derived classes implement different behavior. (In practice, I would expect
this to be extremely rare.)



Wednesday, April 25, 2012

Designing a data-driven engine, part II - Error handling

In my experience one of the first things that needs to be decided when starting a new project is how to handle errors. The method chosen tends to have a large effect on code style and design decisions which is very difficult to reverse. In particular, it's very difficult to turn non-exception safe code into exception safe code, so if you are going to use exceptions in the project, it's best to start using them right away.

I'm a big fan of exceptions in general, but I would not simply use them for that reason. Most game engines don't use C++ exceptions, and for a good reason: the best behavior for most games when encountering a fatal error is to crash immediately. Exceptions also introduce some performance overhead, and if you're never going to recover from an error, there's no upside. Finally, exceptions can be problematic when debugging, as they can hide the true source of errors, often producing crash dumps which are far removed from the actual error site.

Nonetheless, for my 3d engine (working name 'spire'), I decided to use C++ exceptions. Here are the reasons why:

  1. I want to be able to recover from some errors. I want to be able to do things like edit resources and script code in one window and watch the immediate effects in another. If the engine crashes at the slightest error, that will be very difficult. And in general, I won't know whether or not an error is recoverable at the error site; that decision should be made in a place with more contextual knowledge of what the program state is.  (As an example, when reloading a resource dynamically I probably don't want the program to crash, but when loading certain critical resources at startup, I probably do. Both are likely to involve some of the same code paths.)
  2. For non-recoverable errors, I want to fail fast. Exceptions actually make this fairly simple; all you have to do is not catch an exception and the program will crash at the throw site. 
  3. I need to be able to test failures. I consider testing failure conditions as important as acceptance tests, and although there are ways to unit test crashes, they're pretty messy compared to ASSERT_THROW. Because this is a one man project, I don't have a QA team to catch regressions, so I need to rely heavily on automated testing. 
  4. My target platforms are PCs where I don't need to worry much about the code bloat incurred by enabling exceptions. In terms of actual performance degradation, it's negligible in most circumstances, and any particular bottlenecks can be designated as noexcept.
The next issue is what types of exceptions to throw. My preference is to have a unique exception type for each error condition that can occur. There are two main reasons for this: first, it gives me the ability to write unit tests expecting a very specific error to occur. I've caught bugs before which would have been missed if I were simply testing for failure, instead of for a specific exception. Second, when writing recovery code, it gives me the flexibility to be as specific as I need to be about which kinds of errors should be caught and recovered from.

Defining all of the specialized exception classes isn't hard; they are all specializations of a template called Error. Here is an example implementation:

template <typename Tag, typename Base = std::runtime_error>
class Error : public Base
{
public:
    explicit Error(const std::string& msg) : Base(msg) {}
};

typedef Error<struct _ResourceError> ResourceError;
typedef Error<struct _XmlParsingError, ResourceError> XmlParsingError;

My actual implementation is slightly more complex as it utilizes boost::exception and has constructors which take an error message. But the concept, and the ability to define new specialization with a simple typedefs, remains the same.

One final thing I like about exceptions - as I mentioned earlier, turning non-exception safe code into exception safe code is very difficult. But the opposite is not hard at all. If I want a build profile where exceptions are disabled and every throw site crashes immediately, that's doable with a throw macro, i.e.:

#ifdef USING_EXCEPTIONS
#define THROW(x) throw (x)
#else
#define THROW(x) *(static_cast<char*>(nullptr)) = 0
#endif

If I ever want to port to a platform where exceptions are too expensive - such as a mobile device where I need to reduce the executable size - I can use this technique to turn off error recovery and make every throw into an immediate crash.


Monday, April 23, 2012

New Nyx video

This includes a rudimentary water implementation and some further tweaks to the world generation algorithm. I also captured what it looks like during loading, so it's easier to see how node generation is prioritized.


Designing a data-driven engine, part I - Properties

I've started a new project which is fairly ambitious - I'm building a new 3d engine from scratch. My goal is to make something flexible enough so that I can easily build new demos and potentially integrate some of my existing projects under a common framework, so that I don't have to rewrite so much boilerplate.

One of my major goals for this project is to emphasize data driven design wherever possible. This is sort of an ambiguous buzzword at this point, so let me clarify my understanding of "data-driven": wherever possible, the program's behavior should be specified at run-time, rather than at compile-time. There are several advantages, and some disadvantages, to this approach:

  • You can have the same compiled codebase run multiple, potentially very different, applications. This reduces the need to fork projects and have multiple versions of the same code floating around. As I want to use this engine for many small projects, the last thing I want to do is to have to repeat myself.
  • Extremely rapid turnaround time is possible. In fact, I intend on implementing the ability to reload many kinds of resources without needing to restart the application; for example, I could edit shader code in one window and watch the effects of my changes in real-time in another.
  • Collaboration with non-programmers is greatly facilitated. The more that can be done without needing to edit C++ code, the more work I can delegate in a group project.
  • Validation is potentially easier. Tracking down bugs in C++ code is hard; tracking down certain types of bugs in an XML document can be done automatically by running it through an XML validator.
  • Performance is potentially decreased. You need to load and parse the data, instead of the compiler doing it for you in a design where everything is hard-coded. Loose coupling of components means fewer optimizations are available to the compiler, as more abstractions (such as virtual method calls) are required to glue the components together.
  • Huge amounts of glue code are required. Serialization code is tedious, bug prone, and therefore requires a lot of test coverage.
The last point is particularly concerning for me: since this is a hobby project by a single programmer, I need to be constantly aware of how much time various features will require to implement. Many of the benefits of a data-driven engine come from reducing the time I need to spend implementing new features, but if that is outweighed by the time spent writing glue code, I gain nothing from it.

If I were to write a similar engine in C# or Java, I wouldn't have to worry about glue code at all. Those languages provide reflection mechanisms which make the problem trivial, at least in the general case. The problem is C++'s complete lack of reflection. I decided that rather than writing glue code for every class in my engine, it would be far simpler to create a basic reflection system, and then write glue once. This should, in theory, also decrease the number of bugs in my program, because there's much less code to test.

I don't need a full reflection system, nor would I really want one, given the complexity of C++. Something which is capable of reflecting a small subset of data types would be more than sufficient. I decided on the following list:
  • Booleans
  • Integers
  • Reals (i.e. floats or doubles)
  • Strings
  • Collections (supporting Enumerate, Filter, and Grow operations)
  • Dictionaries (with string keys only)
  • Objects
To clarify the last point, I'm using "object" in the sense that it might be used in a language such as Javascript: essentially, a dictionary where the set of valid keys are fixed and always present.

I am not going to go into the implementation details in this post, but here's a few examples from the unit tests for the property system:

    bool b = false;
    Property p = b;
    p.AsBool().Set(true);
    ASSERT_EQ(b, true);
    b = false;
    ASSERT_EQ(p.AsBool().Get(), false);
This illustrates the basic operation for value types. A Property has reference semantics and can be assigned directly to any supported value type. Integers, reals, and strings work in much the same way as bools - they provide get and set operations. Collections are more interesting:

    std::vector<int> v;
    v.push_back(0);
    v.push_back(1);
    v.push_back(2);
    Property p = v;
    ASSERT_EQ(p.AsCollection().GetSize(), 3);
    Property new_int = p.AsCollection().Grow();
    new_int.AsInt().Set(3);
    ASSERT_EQ(v.size(), 4);
    ASSERT_EQ(v.back(), 3);

As you can see, you can grow a collection, but you can't specify what value you're adding to it. That's because you don't know the actual type held by the collection, only that it can be encapsulated by a property. This implies that only collections of default-constructible types can be represented as properties. Here's how you can enumerate through a collection:

    std::vector<int> v;
    v.push_back(0);
    v.push_back(1);
    v.push_back(2);
    std::vector<int> o;
    Property p = v;
    p.AsCollection().Enumerate([&] (size_t, Property p)
    {
        return o.push_back(static_cast<int>(p.AsInt().Get())), true;
    });
    ASSERT_EQ(o.size(), 3);
    ASSERT_EQ(o[0], 0);
    ASSERT_EQ(o[1], 1);
    ASSERT_EQ(o[2], 2);
The return value for the enumerator allows you to short-circuit the enumeration. Filtering works in a similar manner - returning false removes the item from the collection.

Dictionaries are similar to collections, but for key-value stores. The keys must be STL strings.
    std::map<std::string, int> m;
    m["zero"] = 0;
    m["one"] = 1;
    m["two"] = 2;
    Property p = m;
    ASSERT_EQ(p.Type(), PropertyTypes::Dictionary);
    ASSERT_NO_THROW(p.AsDictionary());
    ASSERT_EQ(p.AsDictionary().Get("zero").AsInt().Get(), 0);
    ASSERT_EQ(p.AsDictionary().Get("one").AsInt().Get(), 1);
    ASSERT_EQ(p.AsDictionary().Get("two").AsInt().Get(), 2);
Dictionaries implement the same basic operations as collections - grow, enumerate, and filter.

Objects are a bit more complex. An object maps to a C++ composite type, but as C++ lacks reflection it's necessary to specify a field mapping. This is done using macros:

struct Foo
{
   int a;
   float b;
   bool c;
   std::string d;
};

BEGIN_PROPERTY_MAP(Foo)
    DEF_PROPERTY(a)
    DEF_PROPERTY(b)
    DEF_PROPERTY(c)
    DEF_PROPERTY(d)
END_PROPERTY_MAP()

Foo f;
Property p = f;
p.AsObject().Get("a").AsInt().Set(2);
p.AsObject().Get("b").AsReal().Set(12.0);
p.AsObject().Get("c").AsBool().Set(true);
p.AsObject().Get("d").AsString().Set("hello");
ASSERT_EQ(f.a, 2);
ASSERT_EQ(f.b, 12.0f);
ASSERT_EQ(f.c, true);
ASSERT_STREQ(f.d.c_str(), "hello");
It's also possible to specify a property mapping from within a type. Doing so allows it to be encapsulated as a property polymorphically, and to inherit the property mappings from parent classes. This does require that the type inherit from the PropertyHost interface.

Within an object, it's possible to specify a property mapping which maps to a pair of getter and setter functions instead of an actual member. This facilitates mapping complex data types which cannot be directly encapsulated by a Property, so long as they can be converted to another representation, such as a string. Here is an example:

class X
{
public:
   int value;
   int Get() const
   {
        return value;
   }
   void Set(int v)
   {
         return value = v, void();
   }
};

X x;
x.value = 0;
auto pw = common::PropertyWrapper<int, X, &X::Get, &X::Set>(x);
Property p = pw;
ASSERT_EQ(p.AsInt().Get(), 0);
x.value = 1;
ASSERT_EQ(p.AsInt().Get(), 1);
p.AsInt().Set(2);
ASSERT_EQ(p.AsInt().Get(), 2);
ASSERT_EQ(x.value, 2);
This can also be done via a macro called DEF_PROPERTY_WRAPPER() within the BEGIN_PROPERTY_MAP()/END_PROPERTY_MAP() macros.

With the property library in place, I no longer need to write glue code for each serializable class. Instead, all I have to do is write glue code once for properties, and then make sure each class can be represented by a Property object. (In most cases this is simply a matter of providing a property mapping.) I wrote a class called XmlSerializer, based on rapidxml, which implements XML serialization of properties:
struct CompositeOuter
{
    struct CompositeInner
    {
        int a;
    };
    std::vector<CompositeInner> inners;
};

BEGIN_PROPERTY_MAP(CompositeOuter)
    DEF_PROPERTY(inners)
END_PROPERTY_MAP()

BEGIN_PROPERTY_MAP(CompositeOuter::CompositeInner)
    DEF_PROPERTY(a)
END_PROPERTY_MAP()

    std::string src =
        "<object>"
        "   <inners>"
        "       <item><a>1</a></item>"
        "       <item><a>2</a></item>"
        "   </inners>"
        "</object>";
    CompositeOuter obj;
    src >> XmlSerializer(obj);
    ASSERT_EQ(obj.inners.size(), 2);
    ASSERT_EQ(obj.inners[0].a, 1);
    ASSERT_EQ(obj.inners[1].a, 2);
}
As you can see, composition of data types is allowed so long as they can all be encapsulated by Property. This is handy because often a vector or dictionary of structs is needed to serialize a class.

I should say a few words on the performance of this code. As it is designed to be used in a high performance game engine, there are zero dynamic memory allocations used by the property library. (This doesn't count things such as growing a std::vector wrapped in a property, obviously.) The only major overhead is a virtual method call required by each invocation of a Property method. However, since serialization is something not likely to occur on a per-frame basis, I don't consider that a significant performance cost.

The source code is available on github.




Thursday, April 12, 2012

Rebuilding the world, one iteration at a time

I haven't been happy with the shape of the worlds produced by Nyx, my voxel terrain engine. Previously, I had attempted to focus on more realistic terrain at geological scales. This didn't really work; the terrain was neither realistic, nor could the engine handle true geological scale while retaining high local detail, even with a 5 level octree. At the same time, I wanted to produce environments which could plausibly pass for something in a real video game.

Then I saw this image floating around - it's one of World of Warcraft's continents drawn from a massive distance. At this scale, you can get a pretty good sense of how the world is shaped; wide-open zones are bounded by sheer, impassable mountains, arranged in something like an irregular grid. I decided to use this concept as the basis for my new algorithm.

My terrain engine is based on this article from GPU Gems 3. As in their implementation, the shape of the world is determined by a density function which is evaluated in a pixel shader. All of the following code snippets are from this shader.

Step one was to create land: a large, flat plain. I already had some noise-based texturing code from my original algorithm which I left in place.

float height = 100.0f; // Slightly above "sea level", i.e. y=0
    output.density = -(pos.y - height);

Next, to produce mountains, I added a sine function:

float mountain = (1.0f + sin((pos.x / 1000.0f) + (pos.z / 1000.0f))) / 2.0f;
    float height = 100.0f + (mountain * 1000.0f);

The basic problem with a sine wave is that there's no flat part to it. However, applying a power function to the result sharpens the mountains by flattening out the low parts:

float mountain = (1.0f + sin((pos.x / 4000.0f) + (pos.z / 1000.0f) + 
                                 (simplex_noise(pos2D / 5000.0f) * 5.0f))) 
                     / 2.0f;

This produces mountains moving along a single axis, but one of the observations I had about WoW is that its zones are roughly arranged in a grid. By adding a second mountain range based on sin(x-y) I get a grid effect:

float mountain1 = (1.0f + sin((pos.x / 2000.0f) + (pos.z / 2000.0f)))
                      / 2.0f;

    float mountain2 = (1.0f + sin((pos.x / 2000.0f) - (pos.z / 2000.0f)))
                      / 2.0f;

    mountain1 = pow(mountain1, 5.0f);
    mountain2 = pow(mountain2, 5.0f);
    float height = 100.0f + (mountain1 * 2000.0f) + (mountain2 * 2000.0f);

Now that I have the basic shape of the mountains specified, I can add a noise term to make them irregular:

float mountainTerm1 = (pos.x / 4000.0f) + (pos.z / 4000.0f) + simplex_noise(pos2D / 9000.0f) * 5.0f;
    float mountain1 = pow((1.0f + sin(mountainTerm1)) / 2.0f, 5.0f);

    float mountainTerm2 = (pos.x / 4000.0f) + (pos.z / 4000.0f) + simplex_noise(pos2D / 7000.0f) * 5.0f;
    float mountain2 = pow((1.0f + sin(mountainTerm2)) / 2.0f, 5.0f);

    float height = 100.0f + (mountain1 * 1000.0f) + (mountain2 * 1000.0f);

Much better. However, because the basic shape was a closed grid, even after applying noise the resulting zones are completely closed off from one another. To counteract this, I'll add another factor, again based on a sine function, which will have the effect of canceling out mountains:

float mountainTerm1 = (pos.x / 4000.0f) + (pos.z / 4000.0f) + simplex_noise(pos2D / 6000.0f) * 3.0f;
    float mountain1 = pow((1.0f + sin(mountainTerm1)) / 2.0f, 5.0f);

    float mountainTerm2 = (pos.x / 4000.0f) - (pos.z / 4000.0f) + simplex_noise(pos2D / 6000.0f) * 3.0f;
    float mountain2 = pow((1.0f + sin(mountainTerm2)) / 2.0f, 5.0f);

    float canyonTerm = (pos.x / 2000.0f) - (pos.z / 2000.0f) + simplex_noise(pos2D / 4000.0f) * 3.0f;
    float canyon = 1.0f - pow((1.0f + sin(canyonTerm)) / 2.0f, 2.0f);

    float height = 100.0f + (mountain1 * canyon * 1000.0f) + (mountain2 * canyon * 1000.0f);

The mountain ranges are starting to look pretty good. There are clearly identifiable zones bounded by sheer walls, with passes between them allowing travel between zones. It's time to add some more detail - one of the issues with working at this perspective is that without much low-level noise, it's impossible to perceive scale. By passing the density value through a sign filter, we can achieve what I like to call the "Minecraft effect":

output.density = sign(output.density);

With this filter in place, individual voxels can be visualized. This makes it much easier to perceive the finer effects of modifying the density function. The next step is to add some rivers - these will essentially be cuts into the ground, yet again based on a sine function to get a nice curving effect. I also added a "feature term", responsible for low-impact, medium-scale features within the open areas.

float riverTerm = (pos.x / 3000.0f) + (pos.z / 3000.0f) + simplex_noise(pos2D / 7000.0f) * 5.0f;
    float river = pow((1.0f + sin(riverTerm)) / 2.0f, 9.0f);

    float height = 100.0f + (mountain1 * canyon * 1000.0f) + (mountain2 * canyon * 1000.0f) - (river * 50.0f);

At this point I was pretty satisfied with the high level shape of the world, so my next task was to add some detail. I turned the sign filter back off. I also made a quick modification to the material function so that mountains would be textured differently from the grassy plains:

float dirt = saturate((mountain1 + mountain2) * 2.0f);
    float grass = 1.0f - dirt;

(Note: after this point, there won't be any more code snippets, as I did a lot of minor tweaking in between iterations.) The next step was to add some low-level noise, multiplied against the mountain factors so that the grasslands would not be affected:
The reason this looks bad is that there's only a single octave of noise; the human eye immediately perceives the frequency of the detail as unnatural. Adding a few more octaves of noise largely fixes this problem:

Although I haven't yet implemented water, by lowering the height arbitrarily selected as "sea level" I was able to cause low terrain to clip through the bottom of the domain of the marching cubes function; with the skybox being a greyish blue, this made for a decent interim water.

At this point I was very happy with the land, but I wanted continental features. This was simply a matter of sampling noise at an extremely low frequency, however the result was a lot of mountains poking up through what should have been deep ocean. I got around this by using the ocean term as a multiplier with the mountain terms, so that mountains fall off in the ocean:
Also visible in the above picture are the sand textures. Because I have function terms corresponding to geographical features such as "mountain", "river", and "ocean", it's trivial to assign textures to match these regions. Oceans are sandy, while rivers are muddy.
Here's what the end result looks like. Although it's not nearly as regular as the human-designed continents of Warcraft, it's much closer than what I started with. I'm definitely happy with the results overall.

As a side note: when compiled, the pixel shader for the world generator uses 2706 instruction slots. I'm kind of amazed that my GPU doesn't choke on it, but I haven't noticed any slowdowns. The voxel generation pipeline is limited to processing a maximum of four nodes per frame in any case, which keeps the frame rate consistent at around 120 near sea level and 60 at maximum elevation, with a delta of roughly 20 FPS when nodes are being generated.

Wednesday, February 22, 2012

Generic constructors

C++ gives us two ways of enabling implicit type conversions - single-parameter constructors and overloaded conversion operators. Both of those, however, are limited in that they must be defined within the class that the conversion is occurring from or to. As C++ does not have the concept of partial classes, this can be an undesirable limitation.

Generic constructors are a handy idiom which can be used to circumvent this limitation. A generic constructor in C++11 looks like this:

class Foo
{
public:
    Foo();

    Foo(Foo&&);

    template <typename T>
    Foo(T&& param)
    : Foo(make_foo(std::forward<T>(param)))
    {
    }
};

First, a couple of requirements - either a copy or move constructor is required for this idiom to work. (This isn't a hard requirement, strictly speaking, but forwarding to the move constructor is the cleanest implementation.) A move constructor is preferable due to better performance. Second, some other sort of constructor is necessary so that it's possible to implement make_foo() functions. A default constructor works fine. Once we've defined this class, we can now define make_foo() functions for all of the implicit conversions we'd like to take place. For example:

Foo make_foo(int i)
{
    Foo f;
    f.set_value(i);
    return std::move(f);
}

Foo make_foo(std::string s)
{
    Foo f;
    f.set_value(s);
    return std::move(f);
}

The key advantage here is that make_foo can be defined somewhere else in the code, possibly not even in the same library. If Foo is provided in one library, and Bar in another, we can define a make_foo(Bar) function which allows Foo to be implicitly constructed from Bar, even though neither class is visible to the other.

Although implicit conversions are something to be avoided in general, in particular cases this can be extremely useful. I use this pattern in polymorphic_collections to allow the various adapters (such as enumerator) to be extensible, so if you need to interface with a non-STL collections library such as Qt, it's as straightforward as writing an adapter implementation and make_enumerator function. 

Monday, February 13, 2012

Constructing static class members on-demand

A little pattern I like to use when I have resources shared among all instances of a class which should only be constructed when an object of the class is instantiated (and destroyed with the last object):

class Foo
{
public:
    Foo();

private:
    struct SharedProperties
    {
        std::unique_ptr<Baz> baz;
    };
    static std::weak_ptr<SharedProperties> m_weakShared;
    std::shared_ptr<SharedProperties> m_shared;
};

Foo::Foo()
{
    if (!m_weakShared.expired())
    {
        m_shared = m_weakShared.lock();
    }
    else
    {
        m_shared = std::make_shared<SharedProperties>();
        m_weakShared = std::weak_ptr<SharedProperties>(m_shared);

        // Initialize the shared properties here
        m_shared->baz.reset(new Baz());
    }
}

Introducing polymorphic_collections: enumerator

Polymorphic_collections is a C++ library which provides a set of wrapper objects which expose an interface for performing operations on a collection without specifying the underlying type of the collection. For example, enumerator<> is an adapter which allows a collection to be enumerated through:

int sum(enumerator<int> e)
{
     int result = 0;
     while (auto i = e.next())
     {
      // Note: i is a boost::optional<int&>, so we need to dereference it to get
      // at the value.
      result += *i;
     }
     return result;
}

We can then call this function using enumerator<> to wrap a variety of collection types:

std::vector<int> v;
sum(v);

std::list<int> l;
sum(l);

int a[10];
sum(a);

Using enumerator<> allows some of the same flexibility in design that templates provide, but in locations where it would be impossible or unfeasible to use templates. For example, virtual functions:

struct Foo
{
     virtual void Baz(enumerator<int> e) = 0;
};

Enumerator is an extremely simple interface. It exposes only a single method, next(). The return type of next() is boost::optional<T&>, where T is enumerator's template parameter. When the enumerator has exhausted the collection, it will return an empty optional object. There is no mechanism for resetting an enumerator, retrieving an object without advancing the enumeration, or traversing the collection in any other way. Enumerators cannot be copied, but they can be moved.

Enumerators can wrap collections of a different type, so long as a reference of the enumerator's type can bind to the collection's element type. For example:

std::vector<int> v;
enumerator<const int> e1 = v; // Okay, const int& = int is allowed
enumerator<float> e2 = v; // Error, float& = int is not allowed

struct Base { };
struct Derived : public Base { };
std::vector<Derived> v;
enumerator<Base> e = v; // Okay, Base& = Derived is allowed

Enumerators normally allow modification of the objects within the collection. You can prevent this by specifying a const type as the enumerator's template parameter.

std::vector<int> v;
enumerator<int> e = v;
*e.next() = 1;        // Equivalent to *v.begin() = 1

enumerator<const int> e = v;
*e.next() = 1;        // Error, can't assign to const int&

Normally, an enumerator contains only a reference to the underlying collection. If assigned from an rvalue, however, the enumerator can embed the collection in itself. This is handy for returning enumerators from functions:

enumerator<int> foo()
{
     std::vector<int> v;
     v.push_back(0);
     v.push_back(1);
     v.push_back(2);
     return std::move(v);      // Note: without the std::move, it will return an enumerator
            // referencing a temporary, which will crash.
}

Any STL-compatible collection exposing an iterator-based interface can be embedded in an enumerator.

Enumerators can encapsulate pointers with the size specified at runtime, via the make_enumerator function:

int ar[5];
int* ptr = ar;
enumerator<int> e = make_enumerator(ptr, sizeof(ar) / sizeof(ar[0]));

Enumerators can wrap functions and function-like objects:

int a = 0, b = 1;
enumerator<int> fibo = [&] () -> boost::optional<int>
{
     int r = b + a;
     a = b;
     b = r;
     return r;
};

In order to be wrapped by enumerator, the callable object must return either a boost::optional<T> or a boost::optional<T&>. If it returns a value, then next() will return a reference to a temporary which will be destroyed on the following call to next().

Enumerators can be made thread-safe by specifying a locking policy via a template parameter:

void worker(enumerator<Task, atomic>& tasks)
{
     while (auto task = tasks.next())
     {
      // Perform task...
     }
}

void manager(enumerator<Task> tasks)
{
     // Create a locking enumerator from a non-locking one.
     // As enumerators cannot be copied, this requires consuming the
     // original enumerator.
     enumerator<Task, atomic> atomicTasks = std::move(tasks);
     
     for (size_t i = 0; i < THREAD_LIMIT; i++)
     {
      spawn([&] () { worker(atomicTasks); });
     }
}

The atomic policy inserts a mutex lock into every call to next(). Because of its simple interface design, it is extremely easy to write thread-safe code using enumerator.

Sunday, February 5, 2012

Using intrusive_ptr with COM

Writing COM code in C++ can be greatly simplified by using a reference-counting smart pointer. Microsoft provides an implementation in the ATL (CComPtr); however, I typically don't use ATL in my projects and am not interested in introducing it as a dependency just for the sake of a single utility class. Boost provides a lightweight reference-counting smart pointer in the form of intrusive_ptr, although it takes a few tweaks to get it to play nice with COM. First, we need to write a couple of functions to tell intrusive_ptr how to interface with COM objects. Since all COM objects are derived from IUnknown, which also defines the reference-counting interface, this is trivial:

inline void intrusive_ptr_add_ref(IUnknown* ptr)
{
    ptr->AddRef();
}

inline void intrusive_ptr_release(IUnknown* ptr)
{
    ptr->Release();
}

With these defined, intrusive_ptr will play nice with COM objects. However, getting them into the intrusive_ptr is a little trickier. The standard technique for COM functions is to take as a parameter a pointer-to-pointer which, if the operation is successful, will be assigned the address of the object being created or acquired. Also, incrementing the object's reference count is handled by the callee, not the caller. This behavior is the opposite of what intrusive_ptr expects. Both problems can be solved via a small helper class and function:

template <typename T>
class WrapPtr
{
public:
    WrapPtr(boost::intrusive_ptr<T>& ref)
    : m_ref(ref), m_ptr(0)
    {
    }
   
    ~WrapPtr()
    {
        // The second parameter indicates that the reference count should not be incremented
        m_ref = boost::intrusive_ptr<T>(m_ptr, false);
    }
   
    operator T**()
    {
        return &m_ptr;
    }
   
    operator void**()
    {
        // Some COM functions ask for a pointer to void pointer, such as QueryInterface
        return reinterpret_cast<void**>(&m_ptr);
    }
   
private:
    T* m_ptr;
    boost::intrusive_ptr<T> m_ref;
};

template <typename T>
WrapPtr<T> AttachPtr(boost::intrusive_ptr<T>& ref)
{
    return WrapPtr<T>(ref);
}

The WrapPtr class acts as a proxy for the intrusive_ptr object when calling COM functions. It is designed to be used in the following manner:

D3D11_TEXTURE3D_DESC desc;
boost::intrusive_ptr<ID3D11Texture3D> texture;
HRESULT hr = m_d3dDevice->CreateTexture3D(&desc, NULL, AttachPtr(texture));

Using intrusive_ptr for RAII vastly reduces the verbosity involved in interfacing with COM objects, and makes it much easier to write exception safe code.

Tuesday, January 31, 2012

Generating Stack Traces from C++ Exceptions

C++ exceptions can be a double-edged sword. While they are a fantastic tool for gracefully recovering from errors, when the error cannot be recovered from or is completely unexpected, the nature of exception handling tends to obfuscate the cause of the error. Effective error analysis, particularly in a postmortem environment, requires information about both the direct cause of the error (known at the throw site, and captured in both the exception type and error message) and the proximate cause, which requires information layered throughout the call stack to determine. C++'s standard exception model lacks the facilities to capture this type of information.

Tools such as boost::exception, which allow additional information to be attached to exceptions as they propagate upward, can help, but only if the programmers can be bothered to enclose every function in a try-catch block which embeds context information and rethrows the exception. That's far from an ideal solution.

In many other programming languages with exception-based error handling, there exists a mechanism to extract a stack trace from the exception object. Typically that's sufficient context to analyze the error. Unfortunately because there is no standard C++ ABI, this can't be written into the language specification. Nonetheless, on many platforms it is possible to implement call stack capturing within the C++ exception model. Here I will focus on Windows.

Windows provides a robust API (DbgHelp) for debugging as part of the core Win32 SDK. We can use this to implement a class to save the current call stack as an array of 64-bit frame pointers. We can also implement a function to take that array of frame pointers and transform it into a human-readable stack trace - if PDBs are available, it will include symbol names. Otherwise, it will simply list the frame pointer addresses, which are still useful (for example, when debugging a minidump.)

class sym_handler
{
public:
    static sym_handler& get_instance()
    {
        static sym_handler instance;
        return instance;
    }

    std::string get_symbol_info(uint64_t addr)
    {
        std::stringstream ss;
        DWORD64 displacement64;
        DWORD displacement;
        char symbol_buffer[sizeof(SYMBOL_INFO) + 256];
        SYMBOL_INFO* symbol = reinterpret_cast<SYMBOL_INFO*>(symbol_buffer);
        symbol->SizeOfStruct = sizeof(SYMBOL_INFO);
        symbol->MaxNameLen = 255;

        IMAGEHLP_LINE64 line;
        line.SizeOfStruct = sizeof(IMAGEHLP_LINE64);

        ss << boost::format("[0x%08X] ") % addr;
        if (m_initialized)
        {
            if (SymFromAddr(GetCurrentProcess(),
                            addr,
                            &displacement64,
                            symbol))
            {
                ss << symbol->Name;
                if (SymGetLineFromAddr64(GetCurrentProcess(),
                                            addr,
                                            &displacement,
                                            &line))
                {
                    ss << (boost::format(" (%s:%d)") % line.FileName % line.LineNumber).str();
                }
            }
        }
        return ss.str();
    }

    void capture_stack_trace(CONTEXT* context, uint64_t* frame_ptrs, size_t count, size_t skip)
    {
        if (m_initialized)
        {
            CONTEXT current_context;
            if (!context)
            {
                RtlCaptureContext(&current_context);
                context = &current_context;
            }

            DWORD machine_type;
            STACKFRAME64 frame;
            ZeroMemory(&frame, sizeof(frame));
            frame.AddrPC.Mode = AddrModeFlat;
            frame.AddrFrame.Mode = AddrModeFlat;
            frame.AddrStack.Mode = AddrModeFlat;
#ifdef _M_X64
            frame.AddrPC.Offset = context->Rip;
            frame.AddrFrame.Offset = context->Rbp;
            frame.AddrStack.Offset = context->Rsp;
            machine_type = IMAGE_FILE_MACHINE_AMD64;
#else
            frame.AddrPC.Offset = context->Eip;
            frame.AddrFrame.Offset = context->Ebp;
            frame.AddrStack.Offset = context->Esp;
            machine_type = IMAGE_FILE_MACHINE_I386;
#endif
            for (size_t i = 0; i < count + skip; i++)
            {
                if (StackWalk64(machine_type,
                                GetCurrentProcess(),
                                GetCurrentThread(),
                                &frame,
                                context,
                                NULL,
                                SymFunctionTableAccess64,
                                SymGetModuleBase64,
                                NULL))
                {
                    if (i >= skip)
                    {
                        frame_ptrs[i - skip] = frame.AddrPC.Offset;
                    }
                }
                else
                {
                    break;
                }
            }
        }
    }

private:
    sym_handler()
    {
        m_initialized = SymInitialize(GetCurrentProcess(), NULL, TRUE) == TRUE;
    }

    ~sym_handler()
    {
        if (m_initialized)
        {
            SymCleanup(GetCurrentProcess());
            m_initialized = false;
        }
    }

    bool m_initialized;
};

With that out of the way, the first thing to do is define an exception class which will be the base for all custom exceptions used in our application. We will derive from boost::exception so we can use its mechanisms for attaching arbitrary data to exception objects. While that isn't strictly necessary - we could easily store the stack trace directly within the my_exception object - it's cleaner. We'll also define a template class which can be used to create a hierarchy of exception types without needing to write any extra code, by specifying the parent class as a template parameter.

class my_exception : virtual public std::exception,
                     virtual public boost::exception
{
public:
    virtual const char* what() const throw()
    {
        return m_message;
    }

protected:
    my_exception(const char* msg)
    {
        strcpy_s(m_message, msg);
        *this << attach_stack_trace(stack_trace(NULL, 2));
    }

    char m_message[256];
};

template <typename Tag, typename Base = my_exception>
class error : public Base
{
public:
    error(const char* msg)
    : Base(msg)
    {
    }
};

The my_exception constructor attaches a stack_trace object to itself upon construction. Here's the code for attach_stack_trace:

class stack_trace
{
public:
    stack_trace(CONTEXT* context, size_t skip)
    {
        ZeroMemory(m_frame_ptrs, sizeof(m_frame_ptrs));
        sym_handler::get_instance().capture_stack_trace(context,
                                                        m_frame_ptrs,
                                                        max_frame_ptrs,
                                                        skip);

    }

    std::string to_string() const
    {
        std::stringstream ss;
        for (size_t i = 0;
             i < max_frame_ptrs && m_frame_ptrs[i];
             ++i)
        {
            ss << sym_handler::get_instance().get_symbol_info(m_frame_ptrs[i]) << "\n";
        }
        return ss.str();
    }

private:
    static const size_t max_frame_ptrs = 16;
    uint64_t m_frame_ptrs[max_frame_ptrs];
};

typedef boost::error_info<stack_trace, stack_trace> attach_stack_trace;

inline std::string to_string(const stack_trace& trace)
{
    return trace.to_string();
}

Now we can test it out:

typedef error<struct invalid_parameter_tag> invalid_parameter_error;

void foo(int x)
{
    if (x < 0) BOOST_THROW_EXCEPTION(invalid_parameter_error("x must be >= 0"));
}

int WINAPI WinMain(HINSTANCE, HINSTANCE, LPSTR, int)
{
    try
    {
        foo(-1);
    }
    catch (std::exception& e)
    {
        OutputDebugStringA(boost::diagnostic_information(e).c_str());
    }
    return 0;
}
   

Because I'm using the boost::exception framework, I use the BOOST_THROW_EXCEPTION macro to throw. It's entirely possible to implement this kind of system without using boost, but I find it more convenient. In any case, here's the output:

failfast.cpp(299): Throw in function void __cdecl foo(int)
Dynamic exception type: class boost::exception_detail::clone_impl<class error<struct invalid_parameter_tag,class my_exception> >
std::exception::what: x must be >= 0
[class stack_trace * __ptr64] = [0x13F2D4664] my_exception::my_exception (c:\dev\failfast\failfast\failfast.cpp:202)
[0x13F2D4C50] error<invalid_parameter_tag,my_exception>::error<invalid_parameter_tag,my_exception> (c:\dev\failfast\failfast\failfast.cpp:214)
[0x13F2D3AF1] foo (c:\dev\failfast\failfast\failfast.cpp:299)
[0x13F2D3BE1] WinMain (c:\dev\failfast\failfast\failfast.cpp:308)
[0x13F306656] __tmainCRTStartup (f:\dd\vctools\crt_bld\self_64_amd64\crt\src\crtexe.c:547)
[0x13F30634E] WinMainCRTStartup (f:\dd\vctools\crt_bld\self_64_amd64\crt\src\crtexe.c:371)
[0x7775652D] BaseThreadInitThunk
[0x7788C521] RtlUserThreadStart

That's a fantastic amount of information. Even without PDBs, we'd still have access to the call stack frame pointers, and it's simple enough to query a debugger for the symbol names. Unfortunately there's a big problem with this approach - it only attaches the stack trace to exceptions derived from my_exception. If an exception is being thrown elsewhere, for example from STL or Boost, and it propagates to the top-level catch block, it will be nearly impossible to determine its cause.

One possible approach would be to not try to catch them at all - just let the application crash. The crash dump will have the stack trace at the throw site. If you can get away with it, this works well enough, but it precludes you from doing any cleanup. Also, if you have any catch-all-and-rethrow blocks for disposing non-RAII resources, the crash site will be at the rethrow, which generally won't have usable information.

Another approach would be to take the opposite tack and try to catch everything - surround all calls to library functions which could potentially throw with a try/catch block, then rethrow one of our custom exceptions. Although the stack trace wouldn't extend into the library code, unless the bug is within the library that wouldn't be necessary. This is good practice in general because it allows us to transform generic exceptions (such as out_of_range) into specific ones (too_many_foos_in_the_foo_queue), which we might be able to handle gracefully without terminating.

In practice, however, trying to catch all exceptions thrown by 3rd party code isn't practical. And it's precisely those which slip through the cracks which having a stack trace would be most valuable for diagnosing. Fortunately, there is a workaround that will allow us to capture stack traces for uncaught exceptions, although the process is a little arcane and dips into some of the undocumented nethers of the Win32 API. If that hasn't scared you off, here's how it works: when a C++ exception is thrown without a corresponding catch block to catch it, an SEH exception is raised. When an SEH exception is raised, Windows checks for any registered handlers (i.e. __except blocks); if none accept the exception, it calls the UnhandledExceptionFilter function.

Two things to note: first, the top-level exception filter is run prior to the stack being unwound. Second, we can use SetUnhandledExceptionFilter to install our own callback function. It turns out we can also throw C++ exceptions from that callback function, and since the stack remains intact, they will propagate upward the same way that the original exception would have.

First, let's define the exception type that our filter will throw in place of any C++ exception that would otherwise go uncaught:

typedef error<struct unhandled_exception_tag> unhandled_exception;

Next, we'll define our unhandled exception filter:

LONG WINAPI RethrowCppExceptions(EXCEPTION_POINTERS* e)
{
    if (e->ExceptionRecord->ExceptionCode == 0xe06d7363)
    {
        BOOST_THROW_EXCEPTION(unhandled_exception("Unhandled C++ exception")
                              << attach_original_exception_type(e->ExceptionRecord->ExceptionInformation[2]));
    }
    return EXCEPTION_CONTINUE_SEARCH;
}

The magic number (0xe06d7363) is the exception code assigned to SEH exceptions which are triggered from uncaught C++ exceptions being thrown. There's another part to this function which I haven't explained yet - attach_original_exception_type. One of the interesting things that happens when SEH takes over for an uncaught C++ exception is that the address of the exception's constructor is passed as a parameter to the SEH exception. Since a constructor has the same name as the type, by examining this symbol we can determine what the type of the original exception was. We can do this either at runtime (if PDBs are available) or in a postmortem debugger.

class original_exception_type
{
public:
    original_exception_type(uint64_t addr)
    : m_addr(addr)
    {
    }

    std::string to_string() const
    {
        return sym_handler::get_instance().get_symbol_info(m_addr);
    }

private:
    uint64_t m_addr;
};

typedef boost::error_info<original_exception_type, original_exception_type> attach_original_exception_type;

inline std::string to_string(const original_exception_type& type)
{
    return type.to_string();
}

All that remains is to register the unhandled exception filter. This is actually a little on the tricky side - Windows only allows one top-level exception filter to be installed at a time, and debuggers like to overwrite it so they can be notified when an SEH exception is thrown. This means that simply calling SetUnhandledExceptionFilter won't work when we're being debugged (although it will work otherwise). Having radically different program behavior in a debugger isn't acceptable, but fortunately there's an easy workaround.

int WINAPI WinMain(HINSTANCE, HINSTANCE, LPSTR, int)
{
    SetUnhandledExceptionFilter(RethrowCppExceptions);
    AddVectoredContinueHandler(1, RethrowCppExceptions);
    // ...
}

AddVectoredContinueHandler won't do anything normally - the unhandled exception filter will rethrow the exception before it can run. But when the program is being debugged, it will be called by the debugger's filter function, and can rethrow from there.

Let's verify that everything is working as expected:

void foo()
{
    std::vector<int> v;
    v.at(1);
}

int WINAPI WinMain(HINSTANCE, HINSTANCE, LPSTR, int)
{
    SetUnhandledExceptionFilter(RethrowCppExceptions);
    AddVectoredContinueHandler(1, RethrowCppExceptions);
    try
    {
        foo();
    }
    catch (my_exception& e)
    {
        OutputDebugStringA(boost::diagnostic_information(e).c_str());
    }

    return 0;
}

This program outputs:

FailFast.cpp(228): Throw in function long __cdecl RethrowCppExceptions(struct _EXCEPTION_POINTERS *)
Dynamic exception type: class boost::exception_detail::clone_impl<class error<struct unhandled_exception_tag,class my_exception> >
std::exception::what: Unhandled C++ exception
[class original_exception_type * __ptr64] = [0x790AA430] TI3?AVout_of_rangestd
[class stack_trace * __ptr64] = [0x13F4E37E6] error<unhandled_exception_tag,my_exception>::error<unhandled_exception_tag,my_exception> (c:\dev\failfast\failfast\failfast.cpp:214)
[0x13F4E2B7E] RethrowCppExceptions (c:\dev\failfast\failfast\failfast.cpp:228)
[0x7787A59F] vsprintf_s
[0x7786DE03] RtlDeleteTimer
[0x778B1278] KiUserExceptionDispatcher
[0x7FEFDE1CACD] RaiseException
[0x792F1345] CxxThrowException
[0x7904F976] std::_Xout_of_range
[0x13F4E2C66] WinMain (c:\dev\failfast\failfast\failfast.cpp:309)
[0x13F4EE123] __tmainCRTStartup (f:\dd\vctools\crt_bld\self_64_amd64\crt\src\crtexe.c:547)
[0x7775652D] BaseThreadInitThunk
[0x7788C521] RtlUserThreadStart

There are a few things we need to be careful of when using this technique. First, there absolutely must be a top-level catch block in every thread. If not, any uncaught exception will result in an infinite loop. (It shouldn't be too difficult to work around this if needed - for example, if a third party library creates its own threads and throws exceptions on them without catching them. A thread-local boolean variable can track if the exception filter has already run and avoid rethrowing if true - this will simply crash the application, instead of hanging.)

Second, if you use the catch-all-and-rethrow idiom anywhere in your code, you may need to refactor it to catch only exceptions derived from my_exception. When you catch-all and rethrow, the SEH exception will be raised from the site of the last rethrow, which will greatly reduce the usefulness of the stack trace. However, you need to be careful of this type of scenario:

try
{
    try
    {
        foo();                //    throws my_exception
        my_vector.at(-1);    //    throws std::out_of_range
    }
    catch (my_exception& e)
    {
        cleanup();
        throw;
    }
}
catch (std::out_of_range&)
{
    //    handle out of range error
}

If foo throws an std::out_of_range exception, then cleanup() will never be called. You can avoid this situation by adopting a rule that you will only ever catch non-my_exception objects immediately, and if you need to defer handling them, rethrow a my_exception-derived type. For example:

try
{
    try
    {
        foo();
        try
        {
            my_vector.at(-1);
        }
        catch (std::out_of_range&)
        {
            throw my_out_of_range();
        }
    }
    catch (my_exception& e)
    {
        cleanup();
        throw;
    }
}
catch (my_out_of_range&)
{
    // handle out of range error
}

Although it can be messy, in practice this should be a very rare scenario.

Now that we have the framework in place to generate stack traces, the question becomes how to access this information. If your application is set up to use Windows Error Reporting, the easiest way to do this is to simply cause a crash in the top-level catch handler. Microsoft actually provides a function to do this - RaiseFailFastException. And since minidumps normally don't contain heap data, you'll want to first copy the stack trace to a buffer on the stack.

    catch (my_exception& e)
    {
        char buffer[4096];
        ZeroMemory(buffer, sizeof(buffer));
        try
        {
            boost::diagnostic_information(e).copy(buffer, sizeof(buffer), 0);
        }
        catch (...)
        {
        }
        OutputDebugStringA(buffer);
        RaiseFailFastException(NULL, NULL, 0);
    }
   

The inner try/catch block is extremely important, because without it, if boost::diagnostic_information throws, the application will hang.

We can test out our minidump debugging capabilities by forcing Windows to store local dump files. To do this, create the registry key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps if it doesn't already exist. Add the following values, editing as needed:
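The original post showed the values in a screenshot. As a sketch based on the documented LocalDumps settings (the dump folder path and count below are illustrative choices, not the post's values):

```shell
rem Values under the LocalDumps key:
rem   DumpFolder - where dumps are written
rem   DumpCount  - how many dumps to keep
rem   DumpType   - 1 for minidump, 2 for full dump
reg add "HKLM\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps" /v DumpFolder /t REG_EXPAND_SZ /d "C:\Dumps" /f
reg add "HKLM\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps" /v DumpCount /t REG_DWORD /d 10 /f
reg add "HKLM\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps" /v DumpType /t REG_DWORD /d 1 /f
```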



Next, run the application. It will crash, as expected, and a minidump will be created in the folder specified under DumpFolder. Fire up WinDbg and open the minidump, then press Alt+6 to bring up the call stack window. You'll see something that looks like this:

[screenshot of the WinDbg call stack window]

Double-click on the second item, which should be the top-level catch block. Finally, to view the contents of the buffer, enter the following command:

0:000> .printf "%ma", buffer
FailFast.cpp(228): Throw in function long __cdecl RethrowCppExceptions(struct _EXCEPTION_POINTERS *)
Dynamic exception type: class boost::exception_detail::clone_impl<class error<struct unhandled_exception_tag,class my_exception> >
std::exception::what: Unhandled C++ exception
[class original_exception_type * __ptr64] = [0x7887A430] TI3?AVout_of_rangestd
[class stack_trace * __ptr64] = [0x13F8F37E6] error<unhandled_exception_tag,my_exception>::error<unhandled_exception_tag,my_exception> (c:\dev\failfast\failfast\failfast.cpp:214)
[0x13F8F2B7E] RethrowCppExceptions (c:\dev\failfast\failfast\failfast.cpp:228)
[0x777D9450] UnhandledExceptionFilter
[0x778F43B8] MD5Final
[0x778785A8] _C_specific_handler
[0x77889D0D] RtlDecodePointer
[0x778791AF] RtlUnwindEx
[0x778797A8] RtlRaiseException
[0x7FEFDE1CACD] RaiseException
[0x79071345] CxxThrowException
[0x7881F976] std::_Xout_of_range
[0x13F8F2C66] WinMain (c:\dev\failfast\failfast\failfast.cpp:309)
[0x13F8FE123] __tmainCRTStartup (f:\dd\vctools\crt_bld\self_64_amd64\crt\src\crtexe.c:547)
[0x7775652D] BaseThreadInitThunk
[0x7788C521] RtlUserThreadStart

If the minidump was generated without PDBs available, you won't get symbol names, but you can query the debugger using the ln command:

0:000> ln 13F8F2C66
c:\dev\failfast\failfast\failfast.cpp(309)+0x1e
(00000001`3f8f2c00)   FailFast!WinMain+0x66   |  (00000001`3f8f2c90)   FailFast!std::basic_string<char,std::char_traits<char>,std::allocator<char> >::basic_string<char,std::char_traits<char>,std::allocator<char> >

And now you have everything you need to track down those pesky exceptions.

Sunday, January 29, 2012

Hacking concepts into C++11

One of the challenges when writing generic code in C++ is making the compiler choose the appropriate template specialization for a type which fulfills certain requirements. Concepts would have been a powerful tool but were dropped from the final C++11 standard. In this article I'll demonstrate a technique which can emulate some of the functionality of concepts - namely, selecting a template specialization based on the presence of a method.

Let's consider a hypothetical C++ template function which accepts, as a template parameter, some type of container, which it will search for a particular value. It will return true if that value is found and false otherwise. A basic implementation might look like this:

template <typename T, typename C>
bool exists(const C& cont, const T& value)
{
    return std::find(cont.begin(), cont.end(), value) != cont.end();
}

The STL's find function is a straightforward linear search. What if the container type is a map? Using the container's own logarithmic-time lookup would be much more effective. We could handle this by providing an overload for maps:

template <typename K, typename T>
bool exists(const std::map<K, T>& cont, const K& key)
{
    return cont.find(key) != cont.end();
}

That works for map, but what about set, multimap, multiset, unordered_map, unordered_set, and so forth? Trying to anticipate all of the possible container types is generally not feasible.

What we would like is to provide one specialization that can handle any container which provides a find() method with a signature similar to that of std::map.

A bit of Googling found this solution by Aleksey Gurtovoy posted on the boost mailing list, which works in pre-C++11 compilers:

typedef char (&no_tag)[1];
typedef char (&yes_tag)[2];
template< typename T > no_tag has_member_foo_helper(...);
template< typename T >
yes_tag has_member_foo_helper(int, void (T::*)() = &T::foo);
template< typename T >
struct has_member_foo
{
    BOOST_STATIC_CONSTANT(bool
        , value = sizeof(has_member_foo_helper<T>(0)) == sizeof(yes_tag)
        );
};
struct my {};
struct her { void foo(); };
int main()
{
    BOOST_STATIC_ASSERT(!has_member_foo<my>::value);
    BOOST_STATIC_ASSERT(has_member_foo<her>::value);
    return 0;
}

We can generalize this into a macro which generates has_xxx_method_const metafunctions for methods marked const and taking one parameter:

#define DEF_HAS_MEMBER_CONST_1(X)                                   \
    template <typename T, typename R, typename A>                   \
    struct has_##X##_method_const                                   \
    {                                                               \
        typedef char (&no_tag)[1];                                  \
        typedef char (&yes_tag)[2];                                 \
        template <typename T_, typename R_, typename A_>            \
        static no_tag test(...);                                    \
        template <typename T_, typename R_, typename A_>            \
        static yes_tag test(int, R_ (T_::*)(A_) const = &T_::X);   \
        static const bool value =                                   \
            sizeof(test<T, R, A>(0)) == sizeof(yes_tag);            \
    };
DEF_HAS_MEMBER_CONST_1(find)

The idea is that if T::foo doesn't exist, SFINAE will cause the no_tag overload to be selected. Unfortunately, possibly due to a bug in Visual C++ 2010, this doesn't happen. While it correctly evaluates to true when a foo() method is available, it causes a compiler error in all other cases:

error C2065: 'foo' : undeclared identifier

I believe the correct behavior would be for that error to trigger SFINAE instead of preventing compilation. In any case, this solution isn't usable.

There are a few other problems with this approach. It requires us to know the return type in advance - for this use case that's no problem, but for others it can be a dealbreaker. It also requires us to specify the method's cv-qualifiers exactly.

In compilers implementing partial C++11 support (specifically, decltype) a more robust solution is possible:

#define DEF_HAS_METHOD_1(x)          \
template <typename T, typename A>          \
struct has_##x##_method              \
{                    \
    typedef char (&yes_tag)[1];         \
    typedef char (&no_tag)[2];         \
    template <typename U>          \
    static yes_tag test(U&&, decltype(declval<U>().x(declval<A>()))* dummy = 0);  \
    static no_tag test(...);         \
    static const bool value = sizeof(test(declval<T>())) == sizeof(yes_tag);        \
};
DEF_HAS_METHOD_1(find)

Note: if declval is available in your STL (it's missing in VC++2010), you'll want to add:

using std::declval;

Otherwise, add:

template <typename T>
static typename boost::add_rvalue_reference<T>::type declval();

This implementation has a few advantages over the earlier one. It's not necessary to specify the return type, or to be exact about the parameter types: if the method's actual parameter types are implicitly convertible from the type arguments given to has_xxx_method, it will match.

Armed with this implementation, we can now implement our specializations:

template <typename T, typename U>
typename boost::disable_if<has_find_method<T, const U&>, bool>::type
    exists(const T& collection, const U& value)
{
    std::cout << "Using global find" << std::endl;
    return std::find(collection.cbegin(), collection.cend(), value) != collection.cend();
}

template <typename T, typename U>
typename boost::enable_if<has_find_method<T, const U&>, bool>::type
    exists(const T& collection, const U& value)
{
    std::cout << "Using method find" << std::endl;
    return collection.find(value) != collection.cend();
}

Note that it's necessary to disable the first specialization when the find method is available, otherwise both specializations would be active and the call would be ambiguous for any type implementing a find() method.

Let's test it out:

int main()
{
    std::vector<int> a;
    exists(a, 0);
    std::list<int> b;
    exists(b, 0);
    std::map<int, int> c;
    exists(c, 0);
    std::multimap<int, int> d;
    exists(d, 0);
    return 0;
}

And the output:

Using global find
Using global find
Using method find
Using method find

The requirement to explicitly disable the global find specialization can quickly get out of hand if we have a lot of other specializations. One idiom for dealing with this is to wrap all of the requirements in a metafunction. Let's imagine we have three variants: one to use a find() method (which is the highest priority if available), another to use an at() method (second highest priority) and a third to use the global std::find function.

namespace detail
{
    template <typename T, typename A>
    struct use_find_method
    {
        static const bool value = has_find_method<T, A>::value;
    };
   
    template <typename T, typename A>
    struct use_at_method
    {
        static const bool value = !use_find_method<T, A>::value &&
                                   has_at_method<T, A>::value;
    };
   
    template <typename T, typename A>
    struct use_global_find
    {
        static const bool value = !use_find_method<T, A>::value &&
                                  !use_at_method<T, A>::value;
    };
}

We can then implement the functions as follows:

template <typename T, typename U>
auto exists(const T& collection, const U& key)
    -> typename boost::enable_if<detail::use_find_method<T, U>, bool>::type
{
    // ...
}

template <typename T, typename U>
auto exists(const T& collection, const U& key)
    -> typename boost::enable_if<detail::use_at_method<T, U>, bool>::type
{
    // ...
}

template <typename T, typename U>
auto exists(const T& collection, const U& key)
    -> typename boost::enable_if<detail::use_global_find<T, U>, bool>::type
{
    // ...
}

Although it doesn't come close to the full power that concepts would have provided, it's still a useful idiom for dealing with multiple specializations which must be selected based on the interface provided by a type.