Monday, April 23, 2012

Designing a data-driven engine, part I - Properties

I've started a new project which is fairly ambitious: I'm building a new 3D engine from scratch. My goal is to make something flexible enough that I can easily build new demos and potentially integrate some of my existing projects under a common framework, so that I don't have to rewrite so much boilerplate.

One of my major goals for this project is to emphasize data-driven design wherever possible. This has become something of an ambiguous buzzword, so let me clarify what I mean by "data-driven": wherever possible, the program's behavior should be specified at run-time rather than at compile-time. There are several advantages, and some disadvantages, to this approach:

  • You can have the same compiled codebase run multiple, potentially very different, applications. This reduces the need to fork projects and have multiple versions of the same code floating around. As I want to use this engine for many small projects, the last thing I want to do is to have to repeat myself.
  • Extremely rapid turnaround time is possible. In fact, I intend on implementing the ability to reload many kinds of resources without needing to restart the application; for example, I could edit shader code in one window and watch the effects of my changes in real-time in another.
  • Collaboration with non-programmers is greatly facilitated. The more that can be done without needing to edit C++ code, the more work I can delegate in a group project.
  • Validation is potentially easier. Tracking down bugs in C++ code is hard; tracking down certain types of bugs in an XML document can be done automatically by running it through an XML validator.
  • Performance is potentially decreased. You need to load and parse the data, instead of the compiler doing it for you in a design where everything is hard-coded. Loose coupling of components means fewer optimizations are available to the compiler, as more abstractions (such as virtual method calls) are required to glue the components together.
  • Huge amounts of glue code are required. Serialization code is tedious and bug-prone, and therefore requires a lot of test coverage.

The last point is particularly concerning for me: since this is a hobby project by a single programmer, I need to be constantly aware of how much time each feature will take to implement. Most of the benefit of a data-driven engine lies in reducing the time I spend implementing new features, but if that saving is outweighed by the time spent writing glue code, I gain nothing from it.

If I were to write a similar engine in C# or Java, I wouldn't have to worry about glue code at all. Those languages provide reflection mechanisms which make the problem trivial, at least in the general case. The problem is C++'s complete lack of reflection. I decided that rather than writing glue code for every class in my engine, it would be far simpler to create a basic reflection system, and then write glue once. This should, in theory, also decrease the number of bugs in my program, because there's much less code to test.

I don't need a full reflection system, nor would I really want one, given the complexity of C++. Something which is capable of reflecting a small subset of data types would be more than sufficient. I decided on the following list:
  • Booleans
  • Integers
  • Reals (i.e. floats or doubles)
  • Strings
  • Collections (supporting Enumerate, Filter, and Grow operations)
  • Dictionaries (with string keys only)
  • Objects
To clarify the last point, I'm using "object" in the sense that it might be used in a language such as JavaScript: essentially, a dictionary where the set of valid keys is fixed and every key is always present.
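As a rough illustration of what a handle over such a type list could look like, here is a minimal sketch covering only two of the seven kinds. All names here (Kind, ValueRef, asBool, asInt) are hypothetical and are not the engine's Property API. It also uses a type tag plus a raw pointer, which is an assumption made for brevity; the real library dispatches through virtual calls, as noted near the end of this post.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical names throughout; this is not the engine's Property class.
enum class Kind { Bool, Int };

// A non-owning handle: a type tag plus a pointer to the caller's variable.
class ValueRef {
public:
    ValueRef(bool& target) : kind_(Kind::Bool), target_(&target) {}
    ValueRef(int& target) : kind_(Kind::Int), target_(&target) {}

    Kind kind() const { return kind_; }

    // Typed accessors throw if the handle refers to a different kind of value.
    bool& asBool() const { return *checked<bool>(Kind::Bool); }
    int& asInt() const { return *checked<int>(Kind::Int); }

private:
    template <typename T>
    T* checked(Kind expected) const {
        if (kind_ != expected) throw std::logic_error("property has a different type");
        return static_cast<T*>(target_);
    }

    Kind kind_;
    void* target_;
};

void DemoValueRef() {
    bool flag = false;
    ValueRef ref = flag;   // refers to flag; nothing is copied
    ref.asBool() = true;   // writes through to flag
    assert(flag);
    assert(ref.kind() == Kind::Bool);
}
```

Because the handle only stores a pointer, it must not outlive the variable it refers to.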

I am not going to go into the implementation details in this post, but here are a few examples from the unit tests for the property system:

    bool b = false;
    Property p = b;
    p.AsBool().Set(true);
    ASSERT_EQ(b, true);
    b = false;
    ASSERT_EQ(p.AsBool().Get(), false);
This illustrates the basic operation for value types. A Property has reference semantics and can be assigned directly from any supported value type. Integers, reals, and strings work in much the same way as bools: they provide get and set operations. Collections are more interesting:

    std::vector<int> v;
    v.push_back(0);
    v.push_back(1);
    v.push_back(2);
    Property p = v;
    ASSERT_EQ(p.AsCollection().GetSize(), 3);
    Property new_int = p.AsCollection().Grow();
    new_int.AsInt().Set(3);
    ASSERT_EQ(v.size(), 4);
    ASSERT_EQ(v.back(), 3);

As you can see, you can grow a collection, but you can't specify the value you're adding to it. That's because you don't know the actual type held by the collection, only that it can be encapsulated by a property. This implies that only collections of default-constructible types can be represented as properties. Here's how you can enumerate through a collection:

    std::vector<int> v;
    v.push_back(0);
    v.push_back(1);
    v.push_back(2);
    std::vector<int> o;
    Property p = v;
    p.AsCollection().Enumerate([&] (size_t, Property p)
    {
        return o.push_back(static_cast<int>(p.AsInt().Get())), true;
    });
    ASSERT_EQ(o.size(), 3);
    ASSERT_EQ(o[0], 0);
    ASSERT_EQ(o[1], 1);
    ASSERT_EQ(o[2], 2);
The return value for the enumerator allows you to short-circuit the enumeration. Filtering works in a similar manner - returning false removes the item from the collection.
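
To make the three collection operations concrete, here is a minimal sketch of how they could be written directly against a std::vector. The helper names (GrowByOne, EnumerateItems, FilterItems) are hypothetical and are not part of the property library. The sketch assumes that returning false from the enumerator stops the walk, which matches the example above, where the callback returns true to continue.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical helper: append a default-constructed element and return it.
// This is why only default-constructible element types can support Grow.
template <typename T>
T& GrowByOne(std::vector<T>& items) {
    static_assert(std::is_default_constructible<T>::value,
                  "Grow needs a default-constructible element type");
    items.emplace_back();
    return items.back();
}

// Calls visit(index, element) in order until visit returns false.
template <typename T, typename Visitor>
void EnumerateItems(std::vector<T>& items, Visitor visit) {
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (!visit(i, items[i])) break;
    }
}

// Keeps only the elements for which keep(index, element) returns true.
template <typename T, typename Predicate>
void FilterItems(std::vector<T>& items, Predicate keep) {
    std::size_t out = 0;
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (keep(i, items[i])) {
            if (out != i) items[out] = std::move(items[i]);
            ++out;
        }
    }
    items.erase(items.begin() + static_cast<std::ptrdiff_t>(out), items.end());
}

void DemoCollectionOps() {
    std::vector<int> v{0, 1, 2};
    GrowByOne(v) = 3;  // v is now {0, 1, 2, 3}
    int visited = 0;
    EnumerateItems(v, [&](std::size_t, int& x) { ++visited; return x < 1; });
    assert(visited == 2);  // stopped after visiting 0 and 1
    FilterItems(v, [](std::size_t, int& x) { return x % 2 == 1; });
    assert(v.size() == 2 && v[0] == 1 && v[1] == 3);
}
```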

Dictionaries are similar to collections, but for key-value stores. The keys must be STL strings.

    std::map<std::string, int> m;
    m["zero"] = 0;
    m["one"] = 1;
    m["two"] = 2;
    Property p = m;
    ASSERT_EQ(p.Type(), PropertyTypes::Dictionary);
    ASSERT_NO_THROW(p.AsDictionary());
    ASSERT_EQ(p.AsDictionary().Get("zero").AsInt().Get(), 0);
    ASSERT_EQ(p.AsDictionary().Get("one").AsInt().Get(), 1);
    ASSERT_EQ(p.AsDictionary().Get("two").AsInt().Get(), 2);
Dictionaries implement the same basic operations as collections - grow, enumerate, and filter.

Objects are a bit more complex. An object maps to a C++ composite type, but as C++ lacks reflection it's necessary to specify a field mapping. This is done using macros:

    struct Foo
    {
        int a;
        float b;
        bool c;
        std::string d;
    };

    BEGIN_PROPERTY_MAP(Foo)
        DEF_PROPERTY(a)
        DEF_PROPERTY(b)
        DEF_PROPERTY(c)
        DEF_PROPERTY(d)
    END_PROPERTY_MAP()

    Foo f;
    Property p = f;
    p.AsObject().Get("a").AsInt().Set(2);
    p.AsObject().Get("b").AsReal().Set(12.0);
    p.AsObject().Get("c").AsBool().Set(true);
    p.AsObject().Get("d").AsString().Set("hello");
    ASSERT_EQ(f.a, 2);
    ASSERT_EQ(f.b, 12.0f);
    ASSERT_EQ(f.c, true);
    ASSERT_STREQ(f.d.c_str(), "hello");
It's also possible to specify a property mapping from within a type. Doing so allows it to be encapsulated as a property polymorphically, and to inherit the property mappings from parent classes. This does require that the type inherit from the PropertyHost interface.
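
The definitions of the mapping macros are not shown in this post. One plausible technique, sketched here under that caveat, is to have the macros expand into a function template that passes each field's name and a reference to the field to a visitor. The SKETCH_ macros, Sample, FieldPrinter, and DescribeFields are all hypothetical names invented for this illustration.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical macros: together they expand into one function template that
// calls visit("name", obj.name) once per listed field.
#define SKETCH_BEGIN_FIELDS(Type) \
    template <typename Visitor>   \
    void VisitFields(Type& obj, Visitor&& visit) {
#define SKETCH_FIELD(name) visit(#name, obj.name);
#define SKETCH_END_FIELDS() }

struct Sample {
    int a;
    float b;
    bool c;
    std::string d;
};

SKETCH_BEGIN_FIELDS(Sample)
    SKETCH_FIELD(a)
    SKETCH_FIELD(b)
    SKETCH_FIELD(c)
    SKETCH_FIELD(d)
SKETCH_END_FIELDS()

// A visitor whose call operator is a template, so it accepts every field type.
struct FieldPrinter {
    std::ostringstream out;
    template <typename T>
    void operator()(const char* name, const T& value) {
        out << name << '=' << value << ';';
    }
};

// Writes every mapped field as "name=value;" without naming any field here.
std::string DescribeFields(Sample& s) {
    FieldPrinter printer;
    VisitFields(s, printer);
    return printer.out.str();
}

void DemoFieldMap() {
    Sample s{2, 1.5f, true, "hello"};
    assert(DescribeFields(s) == "a=2;b=1.5;c=1;d=hello;");
}
```

The appeal of this shape is that the field list is written once per type, while any number of visitors (a printer, a serializer, an editor UI) can reuse it.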

Within an object, it's possible to specify a property mapping which maps to a pair of getter and setter functions instead of an actual member. This facilitates mapping complex data types which cannot be directly encapsulated by a Property, so long as they can be converted to another representation, such as a string. Here is an example:

    class X
    {
    public:
        int value;
        int Get() const
        {
            return value;
        }
        void Set(int v)
        {
            value = v;
        }
    };

    X x;
    x.value = 0;
    auto pw = common::PropertyWrapper<int, X, &X::Get, &X::Set>(x);
    Property p = pw;
    ASSERT_EQ(p.AsInt().Get(), 0);
    x.value = 1;
    ASSERT_EQ(p.AsInt().Get(), 1);
    p.AsInt().Set(2);
    ASSERT_EQ(p.AsInt().Get(), 2);
    ASSERT_EQ(x.value, 2);
This can also be done via a macro called DEF_PROPERTY_WRAPPER() within the BEGIN_PROPERTY_MAP()/END_PROPERTY_MAP() macros.
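
For readers unfamiliar with member-function pointers as template arguments, the following sketch shows the general technique with a value that is exposed as a string. AccessorRef and Switch are hypothetical names; this is not the implementation of common::PropertyWrapper, only a guess at the idea behind its template parameter list.

```cpp
#include <cassert>
#include <string>

// Hypothetical wrapper: the getter and setter are baked in at compile time
// as non-type template arguments, so the object only stores a pointer.
template <typename V, typename C, V (C::*Getter)() const, void (C::*Setter)(V)>
class AccessorRef {
public:
    explicit AccessorRef(C& object) : object_(&object) {}
    V get() const { return (object_->*Getter)(); }
    void set(V value) { (object_->*Setter)(value); }

private:
    C* object_;
};

// A type whose state is a bool but whose external representation is a string.
class Switch {
public:
    bool on = false;
    std::string GetState() const { return on ? "on" : "off"; }
    void SetState(std::string state) { on = (state == "on"); }
};

void DemoAccessorRef() {
    Switch sw;
    AccessorRef<std::string, Switch, &Switch::GetState, &Switch::SetState> ref(sw);
    assert(ref.get() == "off");
    ref.set("on");  // goes through Switch::SetState
    assert(sw.on);
    assert(ref.get() == "on");
}
```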

With the property library in place, I no longer need to write glue code for each serializable class. Instead, all I have to do is write glue code once for properties, and then make sure each class can be represented by a Property object. (In most cases this is simply a matter of providing a property mapping.) I wrote a class called XmlSerializer, based on rapidxml, which implements XML serialization of properties:

    struct CompositeOuter
    {
        struct CompositeInner
        {
            int a;
        };
        std::vector<CompositeInner> inners;
    };

    BEGIN_PROPERTY_MAP(CompositeOuter)
        DEF_PROPERTY(inners)
    END_PROPERTY_MAP()

    BEGIN_PROPERTY_MAP(CompositeOuter::CompositeInner)
        DEF_PROPERTY(a)
    END_PROPERTY_MAP()

    // Body of the unit test: deserialize the XML into obj, then check the result.
    std::string src =
        "<object>"
        "   <inners>"
        "       <item><a>1</a></item>"
        "       <item><a>2</a></item>"
        "   </inners>"
        "</object>";
    CompositeOuter obj;
    src >> XmlSerializer(obj);
    ASSERT_EQ(obj.inners.size(), 2);
    ASSERT_EQ(obj.inners[0].a, 1);
    ASSERT_EQ(obj.inners[1].a, 2);
As you can see, composition of data types is allowed so long as they can all be encapsulated by Property. This is handy because often a vector or dictionary of structs is needed to serialize a class.

I should say a few words on the performance of this code. As it is designed to be used in a high-performance game engine, the property library itself performs no dynamic memory allocations. (This doesn't count things such as growing a std::vector wrapped in a property, obviously.) The only major overhead is the virtual method call required by each invocation of a Property method. However, since serialization is not likely to occur on a per-frame basis, I don't consider that a significant performance cost.
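
For the curious, one common way to combine virtual dispatch with zero heap allocations is to construct a small adapter object with placement new inside a buffer owned by the handle. The sketch below shows that general technique for integers only. It is an assumption about how such a design can work, not a description of the library's internals, and IntAccess, IntAdapter, and IntHandle are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <type_traits>

// Hypothetical interface: each call costs one virtual dispatch.
struct IntAccess {
    virtual long long get() const = 0;
    virtual void set(long long value) = 0;
protected:
    ~IntAccess() = default;  // adapters are never destroyed through this base
};

// Forwards to a referenced integer of any width.
template <typename T>
struct IntAdapter final : IntAccess {
    explicit IntAdapter(T& target) : target_(&target) {}
    long long get() const override { return static_cast<long long>(*target_); }
    void set(long long value) override { *target_ = static_cast<T>(value); }
    T* target_;
};

class IntHandle {
public:
    template <typename T>
    explicit IntHandle(T& target) {
        static_assert(std::is_integral<T>::value, "IntHandle wraps integer types only");
        static_assert(sizeof(IntAdapter<T>) <= sizeof(storage_), "inline buffer too small");
        static_assert(std::is_trivially_destructible<IntAdapter<T>>::value,
                      "adapter must not need a destructor call");
        // Placement new: the adapter lives inside storage_, not on the heap.
        impl_ = new (storage_) IntAdapter<T>(target);
    }

    // Copying would leave impl_ pointing into the other handle's buffer.
    IntHandle(const IntHandle&) = delete;
    IntHandle& operator=(const IntHandle&) = delete;

    long long get() const { return impl_->get(); }
    void set(long long value) { impl_->set(value); }

private:
    alignas(std::max_align_t) unsigned char storage_[2 * sizeof(void*)];
    IntAccess* impl_;
};

void DemoIntHandle() {
    short small = 5;
    IntHandle handle(small);
    handle.set(7);  // virtual call, no allocation
    assert(small == 7);
    assert(handle.get() == 7);
}
```

This sketch makes the handle non-copyable to stay short; a handle that is passed around by value would need a copy operation that rebuilds the adapter in the destination buffer.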

The source code is available on GitHub.



