Writing functional code in C++ II – Function composition

Luca Bolognese - Mar 2012

Function composition is at the core of functional programming. You start by being very conﬁdent that certain very small functions are correct, you compose them in well known ways and you end up being very conﬁdent that your ﬁnal program is correct.

You are very conﬁdent that the initial functions are correct because they are very small and side effect free. You are very conﬁdent that your program is correct because the means of composition are well known and generate functions that are themselves side effect free.

Such ‘almost primitive’ functions operate on data structures. Each functional language has its own set of data structures and ‘almost primitive’ functions. Probably the most famous ones are contained in the haskell prelude, but F# has similar ones. LINQ in C# is just such a set.

In this article I use the term functional composition in a broad sense as ‘putting functions together’, not in the more technical sense of f(g(x)): dot operator in Haskell or ‘>>’ operator in F#.

Code for this post is here. Thanks to Steve Bower and Andy Sawyer for review and comments.

Here is an example. The F# code below ﬁlters in the odd numbers from an array and doubles them. You have two logical operations, ﬁltering and doubling.

let v = [|1;2;3;4;5;6;7;8;9;0|]
let transformed =
    v
    |> Seq.filter(fun i -> i % 2 = 0)
    |> Seq.map ((*) 2)

How would the code look in C++? There are many ways. One is the ‘big fat for loop’, or in C++ 11, for each:

    const int tmp[] = {1,2,3,4,5,6,7,8,9,0};
    const vector<int> v(&tmp[0], &tmp[10]);
    vector<int> filtered;
    for(auto i: v)
    {
        if(i % 2 == 0)
            filtered.push_back(i * 2);
    }

This looks simple enough, apart from the initialization syntax that gets better with C++ 11. But it’s simplicity is misleading. We have now lost the information that our computation is composed by a ﬁlter and a map. It’s all intermingled. Obviously, the info is still there in some sense, but you need to read each line in your head and ﬁgure out the structure on your own. It is not readily apparent to the reader.

Obviously, things get much worse in a large program composed of several loops where each one does not trivial stuff. As for me, every time I indulge into the ‘big fat for loop pattern’, I end up regretting it either immediately or after some time when I have to make changes to the code. So I try to avoid it.

Wait a minute, you might say, what about STL algorithms? Aren’t you supposed to use those?

The syntax to call them is not very compositional. You cannot take the result of an algorithm and pass it to another easily. It is difﬁcult to chain them that way. Everything takes two iterators, even if 99% of the times you are just using begin and end. Moreover, you have to create all sort of temporary data structures for the sake of not modifying your original one. Once you have done all of that, the syntactic weight of your code overwhelm the simplicity of what you are trying to do.

For example, here is the same code using ‘transform’ for the ‘map’ part of the algorithm:

    vector<int> filtered;
    copy_if(begin(v), end(v), back_inserter(filtered), [](int x) { return x % 2 == 0;});
    vector<int> transformed;
    transform(begin(filtered), end(filtered), back_inserter(transformed), [](int x) { return x * 2;});

I’m not arguing that STL is a bad library, I think it is a great one. It’s just that its syntax is not very compositional. And that breaks the magic.

Luckily, help is on the way in the form of the Range library and OvenToBoost. These libraries ‘pack’ the two iterators in a single structure and override the ‘|’ operator to come up with something as below:

    auto result =
        v
        | filtered(   [] (int i) { return i % 2 == 0;})
        | transformed([] (int i) { return i * 2;});

If you use Boost lambdas (not sure I’d do that), the syntax gets even better:

    auto result =
        v
        | filtered(   _1 % 2 == 0)
        | transformed(_1 * 2);

Ok, so we regained compositionality and clarity, but what price are we paying? Apart from the dependency from Boost::Range and OvenToBoost, you might think that this code would be slower than a normal for loop. And you would be right. But maybe not by as much as you would think.

The code for my test is here. I’m testing the perf difference between the following code engineered to put in a bad light the most pleasing syntax. A for loop:

        int sum = 0;
        for(vector<int>::const_iterator it = v.begin(); it != v.end(); ++it)
            if(*it < 50)
                sum += *it * 2;
        return sum;

A Range based code using language lambdas (transformedF is a variation of this):

        auto lessThan50 = v | filtered(    [](int i) { return i < 50;})
                            | transformedF([] (int i) { return i * 2;});
        return boost::accumulate (lessThan50, 0);

A Range based code using two functors:

        auto lessThan50 = v | filtered(Filterer())
                            | transformed(Doubler());
        return boost::accumulate (lessThan50, 0);

Where:

struct Doubler {
    typedef int result_type;
    int operator() (int i) const { return i * 2;}
};
struct Filterer {
    typedef int result_type;
    bool operator() (int i) const { return i < 50;}
};

a Range based code using Boost lambdas:

        auto lessThan50 = v | filtered(_1 < 50)
                            | transformed(_1 * 2);
        return boost::accumulate (lessThan50, 0);

And ﬁnally, some code using the STL for_each function:

        int sum = 0;
        std::for_each( v.cbegin(), v.cend(),
            [&](int i){
                if( i < 50 ) sum += i*2;
            });
        return sum;

Notice that I need to run the test an humongous number of times (10000000) on a 100 integers array to start seeing some consistency in results.
And the differences that I see (in nanosecs below) are not huge:

1>                         Language lambda: {1775645516;1778411400;0}
1>                          Functor lambda: {1433377701;1435209200;0}
1>                            Boost lambda: {1327829199;1326008500;0}
1>                                For loop: {1338268336;1341608600;0}
1>                             STL foreach: {1338268336;1341608600;0}

True to be told, by changing the code a bit these numbers vary and, in some conﬁgurations, the Range code takes twice as long.

But given how many repetitions of the test I need to run to notice the difference, I’d say that in most scenarios you should be ok.

The ‘|>’ operator over collections is the kind of functional composition, broadly deﬁned, that I use the most, but you certainly can use it over other things as well. You’d like to write something like the below:

    BOOST_CHECK_EQUAL(pipe("bobby", strlen, boost::lexical_cast<string, int>), "5");

And you can, once you deﬁne the following:

    template< class A, class F1, class F2 >
    inline auto pipe(const A & a, const F1 & f1, const F2 & f2)
        -> decltype(REMOVE_REF_BUG(f2(f1(a)))) {
        return f2(f1(a));
    }

Where REMOVE_REF_BUG is a workaround for a bug in the decltype implementation under msvc (it should be ﬁxed in VS 11):

    #ifdef _MSC_VER
        #define REMOVE_REF_BUG(...) remove_reference<decltype(__VA_ARGS__)>::type()
    #else
        #define REMOVE_REF_BUG(...) __VA_ARGS__
    #endif

I didn’t do the work of creating ‘>>’ , ‘<<’ and ‘<|’ F# operators or wrapping the whole lot with better syntax (i.e. operator overloading?), but it certainly can be done.

Now that we have a good syntax for records and a good syntax for operating over collections, the next instalment will put them together and try to answer the question: how should I create collections of records and operate over them?

Comments

Richard Polton

2012-05-25T07:48:18Z

Hi, since I’ve known enough to understand the difference, my preference has always been for functional-style programming and I particularly enjoyed the STL when I discovered it way back when I was actively writing new C++ code. It has been, therefore, very interesting and refreshing to read your article and see a modern take on those things I was thinking about and on occasion doing, so many thanks for that. I have constructed the same sort of framework in C# recently, originally in .NET2.0 and latterly using .NET3.5. For example I have deﬁned a ‘>’ operator for currying function arguments, which I like and, for the same reasons as it exists in F#, think it adds to the readability of the code. It’s interesting to contrast my C# with your C++ - we’re doing similar things but the language leads us in different directions.

lucabol

2012-05-25T21:18:55Z

Thanks for the kind words. LINQ brings functional programming to C#, but not currying. Frankly, I don’t use curry all that much myself, probably I’ve not yet fully embraced the Haskell style of ‘extreme function composition’. Well, one thing at the time …

Richard Polton

2012-05-28T12:16:56Z

Hi, your reply prompted a further thought: if you don’t use currying ‘all that much’ do you use any heuristics to determine how to order your function parameters? I have found that, through currying considerations, I tend to construct my functions with their arguments in reverse order to many of my colleagues because I almost always place the most frequently-changing parameter at the end. It’s difﬁcult to deduce the answer from your article because the functions all seem to be functions of one parameter :-)

lucabol

2012-05-28T19:57:49Z

Yours is a technically sound strategy. I tend to put the most obvious parameters upfront, the ones that whatever user of the function would naturally expect. I then put the others, defaulted to something sensible if it exists. Maybe that’s why I don’t use currying all that much, because it doesn’t naturally put the most important things ﬁrst, which strongly appeals to my aesthetic sense.

Tags

Comments

Richard Polton

lucabol

Richard Polton

lucabol