Sunday 8 June 2008

C++ pocket lambda library, part III


As I first wrote my C++ lambda code about a year and a half ago, I didn't know that I'm hitting such a hot topic. I wanted just to reduce the amount of code I had to write, and didn't have any high-church functional programming thingy in mind. But now lamdbads and closures are all the rage: look at all those Groovy articles on developerWorks for example. Even Java 7 will have closures (or won't it?). Definitely, Ruby has made programming languages an interesting topic again!

BTW, do you know how currying and lambdas looks like in Haskell, a popular functional (dynamically and strongly typed) language? If you don't know what it is, we've discussed currying in a previous posting of this mini-series*. In Haskell it is very natural syntactically, you can just write (well, almost, I skipped the T=>T=>T type definition):

    product = a b = a*b
    double = product 2    <-- curried!
    double 2
This is basic, but look at that:

    doubleEach = map (2 *)
i.e. we define a partially evaluated function (as map needs 2 arguments, a function and a list) waiting for application. It's like you'd be using the bind() template of our last posting* in C++. And look at the cute lambda function shortcut: (2 *) is a function (unsurprisingly, as in Haskell everything is a function, contrary to Ruby or Groovy where everything is an object, even the functions ;-)). I like it.


1. Getting exxpresive


Admittedly the code in the 2nd part of this mini series was rather bland*: some hyper technical stuff but not really very entertaining like the 1st part (which was really fun for me to code). I wrote it only for completness' sake, as it is part of my code. I hope this installment will be more fun again.

So let us tackle the last topic we need to implement as to have an usable lambda mini-library: the expression templates. The what? Wy do we need it exacltly? I'm certainly not going to write code like: __expr template <class Expressible> express_anger(Expressible& e); - not in my life!!! Ok, ok, let's introduce the concept gently.

Do you know for example the Boost (e)Xpressive library? It expresses (expressively) regular expressions (Boost Spirit library does in much greater style for grammars) by C++ code constructs. I.e. instead of:

    sregex rex = sregex::compile( "(\\w+) (\\w+)!" );
you can write

    sregex rex = (s1= +_w) >> ' ' >> (s2= +_w) >> '!';
using the domain specific language (buzzword alarm!!!) instead. The string: "\\w+" is replaced by an (Xpressive) expression +_w. You recognize perhaps the usage of placeholders, like our _$1 or _$2, operator overloading and assignment of partial matches to external variables (external to the closure, you'd say in Perl or Groovy). But in this case we don't have a single operation which should create a functor, neither a combination of two different operations. Here we have one operation applied again and again (>> concatenator), and we have to encapsulate it in a single lambda functor!

The same problem emerges in the context of out mini-library.

    for_each(vec.begin(), vec.end(), cout << "-->" << _$1 << "\n");
we have to collect all the items which have to be sent to cout, which can be infinite in number!

Here expression templates come to the rescue. First described by Todd Veldhuizen**, they let us to define recursive templates with operator overloading. And recursion can go infinitely deep down, so we can accomodate our long shift operator sequences with our usual aplomb! What we need is following tree structure:

                  expr
                    ¦
                   op >>
                 /    \
               op >>   \
             /   \      \
           op >>  \      \
         /        \      \
      s1=+_w  ' ' s2=+w_  '!'
True to the "Modern C++ design" book's ubiquitous typelists usage, we can express this runtime structure in compile time with a following monstrous type:

    Op<Op<Op<Char, Expr>, Expr>, Char> rex;
Here, all the structural information has been recorded: just read the type from left to right and compare it with the picture of the parse tree. Now we have to supply the arguments to the constructor of an object of this type and then call a method of the rex object. But we don't do it in classical sense, the construction and the type definition will be done recursively while compiler is parsing the expression. Hence the name of the expression templates #B-D.


2. First real exxpressive code


To convince the compiler to to some sensible work for us we first define the following recursive template:

    // Shift represents a node in the parse tree
    // ---
    template<typename Left, typename Op, typename Right>
        struct Shift  : public lambda_expr
    {
        Left leftNode;
        Right rightNode;
        std::ostream& out;

        Shift(Left t1, Right t2, std::ostream& os)
            : leftNode(t1), rightNode(t2), out(os) { }

        template <class T>
            void operator() (T& t) { Op::print(leftNode, rightNode, t); }
    };
You can see, the structure of the tree node is different form the monstrous type given as example above, well, it's even more complicated. Using this approach we would express the above example tree as:

    Node<Node<Node<Char, Op, Expr>, Op, Expr>, Op, Char> rex;
Ok, why not. If it's supposed to help, I couldn't care less... ;-)

But what has this all with our lambda library? The answer is, we can apply the same concept to the problem of priniting data to cout: supposed we have a following lambda function: cout lt;< "element:" << _$1, we'll can build a type tree like:

                  expr
                    ¦
                  << op
                 /    \
             << op     \
             /   \      \
            /     \      \
         cout "element:" _$1
One interesting thing to note is the ()-operator, which prints the actual node of the parse tree using a mysterious T& t argument: it is the actual parametr of the lambda function, i.e. the uderlying iterator itself!

So if we have the top level tree node (i.e. the expr in the diagram above), we'll just call its ()-operator an the whole tree is printed out, hopefully! Two problem pose themselves here:

1. how to descend to the subnodes of the tree while printing, and
2. how do we get the whole tree structure constructed in the first place?


3. Recursive printing


To answer the first question we need the representation of the recursive printing operation in code:

    struct shiftOp // Represents << operation
    {
        template <class Left, class Right, class T>
            static void print(Left& left, Right& right, T& t)
            {
              print(left.leftNode, left.rightNode, t); // walk down
              left.out << right(t);  // if right needs t? : << _$1*2 ???
            }
 
template <class Right, class T> static void print(std::ostream& left, Right& right, T& t) { left << right(t); // at the beginning }
 
template <class Left, class T> static void print(Left& left, placeholder<1>& right, T& t) { print(left.leftNode, left.rightNode, t); left.out << t; // special case placeholder } };
First we walk down the tree in the depth first, left to right mode (the first print() function). When we arrive at the lowest left node of the tree we do the first print, then go back and print the corresponding right node. Note that we don't walk a physical tree here, we walk a type expression which is organized like a tree! The print() functions will "match" a part of the type-tree, print the matched part, and match the subtype in a recursive manner. So we are treating types in compile time as we'd treat data in the runtime! That's why this is called template META-programming.

Then we need only 2 specialisations: one for the first invocation of the shift operator, where the left operand is the output stream itself, and the second one to deal with our lambda mini-library's placeholder types. The placeholder will be printed directly to the stream. Our tree expression for the simple example above will be then as follows:

    Shift<Shift<cout, shiftOp, "element"-Expr>, shiftOp, _$1> lambda;
Now just imagine how the print() function will work on it.


4. Growing the tree


To answer the second question we need 2 simple ;-) operator overloading definitions:

    // The overloaded operator which does parsing for expressions of the
    //  form "a<<b<<c<<d" => Shift<Shift<Shift<A, op, B>, op, C>, op, D>()
 
// for ostream: cannot be taken by value!
template<class Left, class Right> Shift<Left&, shiftOp, Right> operator<< (Left& left, Right right) { return Shift<Left&, shiftOp, Right>(left, right, left); // left IS an ostream! }
 
// for lambda_expr: must be taken by value! template<class Left1, class Left2, class Right> Shift<Shift<Left1, shiftOp, Left2>, shiftOp, Right> operator<< (Shift<Left1, shiftOp, Left2> left, Right right) { return Shift<Shift<Left1, shiftOp, Left2>, shiftOp, Right>(left, right, left.out); }
The first one starts the recursive template definition at the "cout <<"-expression, an the second one goes one nesting level deeper and one <<-operator application to the right. It's an elaborate syntax, but conceptually it's not a rocket science! Note how the stream (cout) parameter is handed down the expression tree.

Yeeee-ha! We are done now! Let us try it out:

    for_each(vec.begin(), vec.end(), cout << _$1);  // OK
    for_each(vec.begin(), vec.end(), cout << _$1 << "\n" ); // compile error???
    for_each(vec.begin(), vec.end(), cout << i++ << ":" << _$1 << " " ); // again!!!
Did you see that? We still cannot use our lambda library. What is it this time? Well, we need a last building block, and for discussioon of that we need a separate paragraph.


5. External data in lambda functions


The problem is, that inside of an lambda expression we are working with functors, and not with native C++ data types. This means, we need a function call operator for each element of the lambda expression. What can be easier than that! We can make a trivial functor, which, when evaluated returns our literal value i.e. "\n" or ":"!

    // constant_ref
    //  ---
    template<class T> struct Const : public lambda_expr
    {
        T& val;
        Const(T& t) : val(t) { }
        const T& operator()() const { return val; }
        const T& value() const { return val; }

        template <class S> // for different context: needed for Shift!!!
            const T& operator()(S& s) const { return val; } // ignore input
    };
 
template<class T> Const<T> delay(T& t) { return Const<T>(t); }
This can be described as a delayed evaluation: the compiler first wraps the constant in a functor, and the constant's value is used in runtime instead of compile time. Now a lambda like: auto lambda = (cout << _$1 << "\n"); will work. BTW I wonder if that would compile...

Maybe you don't remebmer, but it the IIa part of this series*** I showed the following code snippet:

    // assign to a external counter
    int aaa;
    for_each(vecx.begin(), vecx.end(), aaa += _$1->*(&XXX::value));
only to say, that it wouldn't work yet. What's the problem? Basically the same one: we are using native C++ type instead of a functor. The solution is of course the delayed evaluation from above, but now must cater for the basic operations:

    // variable_ref
    //  ---
    template<class T> struct Var : public lambda_expr
    {
        T& val;
        Var(T& t) : val(t) { }
        T& operator()() const { return val; }
        T& operator()(T& t) const { return val; } // ignore input
        T& value() const { return val; }

        AssgnTo<T> operator=(T t) { return AssgnTo<T>(val, t); }

        template <class S>  // assign from lambda_expr
            AssgnTo<T> operator=(S s) { return AssgnTo<T>(val, s); }
    };
 
template<class T> Var<T> globvar(T& t) { return Var<T>(t); }
Here we enabled the assignment to the C++ data type. The addional operators could be implemented like this:

    // OPEN todo: be more modular!!!
    //  --- return lambda_operation<lambda_exp, oper_type>
    template<class T>
        AddAssg<T> operator+=(Var<T> v, const T& t) { return AddAssg<T>(v.value(), t); }
    template<class T>
        AddAssg<T> operator+=(Var<T> v, placeholder<1>) { return AddAssg<T>(v.value()); }
 
template<class T> Incr<T> operator++(Var<T> v, int) { return Incr<T>(v.value()); }
Now we can finally write:

    int xxx = 0;
    for_each(vecx.begin(), vecx.end(), globvar(xxx)++);
    for_each(vecx.begin(), vecx.end(), globvar(xxx) += 5);
    for_each(vecx.begin(), vecx.end(), globvar(xxx) += _$1->*(&XXX::value));
and, for example:

    for_each(vec.begin(), vec.end(), cout << _$1 << delay("\n") );
    for_each(vec.begin(), vec.end(), cout << globvar(i++) << delay(":") << _$1 );
But not, among others: globvar(aaa) = _$1->*(&XXX::value), as it's not a complete, orthogonal, and idustrial strength library. It was only a little pastime of mine...


6. The End?


Ehm, I thought this will be the last part of the description of my simple implementation, but no, I'll need another blog entry as not to bore you to death with this one. Meantime, the writing of this description has taken more time than the actual implementation itself!!! But I have some more points to make, and at the end of a long post, there are rater unlikely to be given much attention. So long then!

---
* Part II: http://ib-krajewski.blogspot.com/2008/01/c-pocket-lambda-library-part-2.html
** Techniques for Scientific C++: http://ubiety.uwaterloo.ca/~tveldhui/papers/techniques/techniques01.html
*** Part IIa: http://ib-krajewski.blogspot.com/2008/04/pocket-c-lambda-library-part-iia.html


No comments: