
Reflection in C++11

14 March 2013
In recent years, with the reflective power of languages like Python and C#, the best way to add reflection to C++ has developed into an interesting discussion.

As the language currently stands, the only core features that enable anything approaching reflection are the dynamic_cast and typeid operators. These operators, while useful in their own right, tell us very little about a given object. In particular, what people have now come to expect from reflection is a list of class members with their associated types and names.

One of the most common uses of reflection is data serialization. For example, C# MVC has a one-line invocation to serialize a given class to JSON. Obviously this doesn't work in all cases, since C# is more expressive than JSON properly allows (circular references, for instance, are hard, if not impossible, to represent in a standard way). But it works most of the time, and is particularly useful when you are constructing small, data-only objects to exchange with an AJAX web app.

Classical Approaches

There are a number of non-standard approaches that either add a pre-processing step to actually parse the C++ and generate reflection information that can be read back into the program, or directly modify the compiler for the same purpose. These approaches are not viable for most people, since they limit portability and complicate the build.

Barring those, some method of directly annotating a class in code is required, whether by hand or via a macro. Usually this is done with specialized or statically initialized registration code that adds a new field/function/etc. object to a list held by a meta-class object. See Boost Mirror for probably the most concrete example of how this is currently done in C++.

The problem with these approaches is that you end up operating on virtual classes rather than on a properly templated class. Say, for instance, you have a vector of 'Field' objects, each backed by a templated class:

struct Field {
   std::string name;
};

template <typename T>
struct MetaClass_ {

   template <class S> struct Field_ : public Field {
      typedef S (T::*FieldT);
   };

   std::vector<Field*> fields;
};

Now we can pull a class's field names out of MetaClass_<MyClass>::fields, but since we can only poke at the actual field pointer through a virtual interface, basic operations like setting or getting a field's value become reliant on careful casting. It is a workable approach, but C++11 gives us tools that allow us to approach the problem differently.

Variadic Templates Approach

Consider, instead of using a standard vector to represent the series of fields, storing the list of all of our fields as a single tuple with one element per field. So let us say we have a class like so:

struct Book {
   std::string author;
   std::string name;
};

We could store all of its fields in a tuple of type 'std::tuple<Field_<Book, std::string>, Field_<Book, std::string>>'. This obviously breaks our example from earlier, since now every meta-class has a differently typed 'fields' member:

  template <typename T> struct MetaClass_{ };

  template <typename T, typename S>
    struct Field_ {

      typedef S (T::*FieldT);
      std::string name;
      FieldT field;
    };

  /** We define this so we can let the compiler fill in the types for us */
  template <typename T, typename S> 
    static Field_<T, S> make_field( const std::string& name, S (T::*field)) 
  { 
     return Field_<T, S>{name, field}; 
  }

  template <> struct MetaClass_<Book> {
    static std::tuple<Field_<Book, std::string>, Field_<Book, std::string>> fields()
        {
           return std::make_tuple(make_field("name", &Book::name),
                                  make_field("author", &Book::author));
        }
      };

Obviously it isn't viable for large structs to type out that return type each time. We can use the new C++11 'auto ... -> decltype' construct to alleviate this somewhat -- note, though, that the function has exactly the same type either way.

  #define RETURN(x) -> decltype(x) { return x; }

  template <> struct MetaClass_<Book> { 
    static auto fields() RETURN(std::make_tuple(make_field("name", &Book::name),
					        make_field("author", &Book::author)));
      };

See this stackoverflow post for why and how that macro saves us from having to type the return type twice. Essentially we are telling the compiler to use the return expression both to figure out the type and as the actual return expression.

So through use of a variadically defined struct (the tuple), the compiler is essentially going to make a struct that holds all of our field definitions for us. In fact, since we are just specifying a name and a pointer, we could feasibly optimize it into a constexpr (in general, the standard tuple object can't be used in constexpr contexts though; we'd have to roll a variant of our own).

This is all well and good, but how does this help us?

Well, instead of iterating through our virtualized list of object pointers now, we can define an iteration through the tuple, and when we do, we can preserve all type information via templating. For instance, let's say we want to define a stream operator for a json class from any class with metaclass information. We could do something like this:

template <int N> struct json_tuple {
   template <typename T, typename... Args>
      static inline void set_values(Json::Value& jv, T& t, std::tuple<Field_<T, Args>...> args){
      auto field = std::get<N>(args);

      jv[field.name] = (&t)->*field.field; 

      // Recursively steps through the fields.
      json_tuple<N-1>::set_values(jv, t, args);
   }
};

// Terminal condition -- don't do anything when we have N == -1
template <> struct json_tuple<-1> {
   template <typename T, typename... Args> 
   static inline void set_values(Json::Value& jv, T& t, std::tuple<Field_<T, Args>...> args){ }
};

template <typename T>
static auto operator<<(Json::Value& jv, T& t) -> decltype(MetaClass_<T>::fields(), jv) {
   auto fields = MetaClass_<T>::fields();
   // Starts the recursive call through the fields.
   json_tuple<std::tuple_size<decltype(fields)>::value - 1>::set_values(jv, t, fields);
   return jv;
}

Admittedly, this unroll operation is far from elegant or concise. But it is powerful. And despite the recursive definition used, when compiled in release mode it will boil down to something analogous to:

static auto operator<<(Json::Value& jv, Book& t) -> decltype(MetaClass_<Book>::fields(), jv) {
   auto fields = MetaClass_<Book>::fields();
   jv["author"] = t.author;
   jv["name"] = t.name;
   return jv;
}

Which is to say, it's going to be almost completely as fast as if the user had done it by hand, and plausibly faster.

Cons

There are a couple of warts to this approach. The loop unroll you need to have in place to use the field information is not the most intuitive construct, and even when someone has a full grasp of the approach, typos and other slight errors can produce the standard wall of template diagnostics most compilers emit when something goes wrong in nested template invocations.

And even then, since unrolling a variadic structure requires a function call at each iteration through, it will never be as concise as a normal loop construct, and will involve structures and functions that exist solely to make the compiler perform the heavy lifting for us. While this approach is becoming the standard way to interact with things like tuples, it is a bit hackish.

The last demerit is the general warning when doing anything involving templates -- gcc is going to emit at least one new function for each time you try to write a new type to a json object. Possibly a few more than that, depending on how the optimization pass goes. And for each type you define a metaclass for, it is going to create a new structure plus a new structure for each field defined. So you end up paying a price in both compile time and object size.

Pros

This implementation has a couple of attractive features, and actually surpasses the functionality of higher level languages like C# in a few respects.

First, since much of the process is static, it is not going to let you try setting a json property that JsonCpp doesn't support. You get a compiler error if you try assigning it a pointer, because JsonCpp has no overload for setting a property to a pointer. Analogous mistakes in C# are run-time errors.

Second, it's fast. There are a lot of layers from the code perspective, but from the compiled binary's perspective it is about as fast as any serializer is likely to be.

Conclusion

This method is going to be the basis for rikitiki's serialization outputs moving forward. The full implementation is currently available in the latest repo update, and it includes read/write support for Json, as well as nested objects and processing of containers.

It is currently wired up in such a way that if you just write an object of 'Book' to an active connection, it figures out from the request headers the best format to respond in and does the full conversion for you. Ditto with reading an object from a connection stream so long as the proper content-type is set.

Currently the plan is to also add support for XML and DB clients of metaclassed data, and eventual expansion to add support for constructors, methods, inheritance and typedefs as those features are needed.
