Writing functional code in C++ II – Function composition

-

Function com­po­si­tion is at the core of func­tional pro­gram­ming. You start by be­ing very con­fi­dent that cer­tain very small func­tions are cor­rect, you com­pose them in well known ways and you end up be­ing very con­fi­dent that your fi­nal pro­gram is cor­rect.

You are very con­fi­dent that the ini­tial func­tions are cor­rect be­cause they are very small and side ef­fect free. You are very con­fi­dent that your pro­gram is cor­rect be­cause the means of com­po­si­tion are well known and gen­er­ate func­tions that are them­selves side ef­fect free.

Such almost prim­i­tive’ func­tions op­er­ate on data struc­tures. Each func­tional lan­guage has its own set of data struc­tures and almost prim­i­tive’ func­tions. Probably the most fa­mous ones are con­tained in the haskell pre­lude, but F# has sim­i­lar ones. LINQ in C# is just such a set.

In this ar­ti­cle I use the term func­tional com­po­si­tion in a broad sense as putting func­tions to­geth­er’, not in the more tech­ni­cal sense of f(g(x)): dot op­er­a­tor in Haskell or >>’ op­er­a­tor in F#.

Code for this post is here. Thanks to Steve Bower and Andy Sawyer for re­view and com­ments.

Here is an ex­am­ple. The F# code be­low fil­ters in the odd num­bers from an ar­ray and dou­bles them. You have two log­i­cal op­er­a­tions, fil­ter­ing and dou­bling.

let v = [|1;2;3;4;5;6;7;8;9;0|]
let transformed =
    v
    |> Seq.filter(fun i -> i % 2 = 0)
    |> Seq.map ((*) 2)

How would the code look in C++? There are many ways. One is the big fat for loop’, or in C++ 11, for each:

    const int tmp[] = {1,2,3,4,5,6,7,8,9,0};
    const vector<int> v(&tmp[0], &tmp[10]);
    vector<int> filtered;
    for(auto i: v)
    {
        if(i % 2 == 0)
            filtered.push_back(i * 2);
    }

This looks sim­ple enough, apart from the ini­tial­iza­tion syn­tax that gets bet­ter with C++ 11. But it’s sim­plic­ity is mis­lead­ing. We have now lost the in­for­ma­tion that our com­pu­ta­tion is com­posed by a fil­ter and a map. It’s all in­ter­min­gled. Obviously, the info is still there in some sense, but you need to read each line in your head and fig­ure out the struc­ture on your own. It is not read­ily ap­par­ent to the reader.

Obviously, things get much worse in a large pro­gram com­posed of sev­eral loops where each one does not triv­ial stuff. As for me, every time I in­dulge into the big fat for loop pat­tern’, I end up re­gret­ting it ei­ther im­me­di­ately or af­ter some time when I have to make changes to the code. So I try to avoid it.

Wait a minute, you might say, what about STL al­go­rithms? Aren’t you sup­posed to use those?

The syn­tax to call them is not very com­po­si­tional. You can­not take the re­sult of an al­go­rithm and pass it to an­other eas­ily. It is dif­fi­cult to chain them that way. Everything takes two it­er­a­tors, even if 99% of the times you are just us­ing be­gin and end. Moreover, you have to cre­ate all sort of tem­po­rary data struc­tures for the sake of not mod­i­fy­ing your orig­i­nal one. Once you have done all of that, the syn­tac­tic weight of your code over­whelm the sim­plic­ity of what you are try­ing to do.

For ex­am­ple, here is the same code us­ing transform’ for the map’ part of the al­go­rithm:

    vector<int> filtered;
    copy_if(begin(v), end(v), back_inserter(filtered), [](int x) { return x % 2 == 0;});
    vector<int> transformed;
    transform(begin(filtered), end(filtered), back_inserter(transformed), [](int x) { return x * 2;});

I’m not ar­gu­ing that STL is a bad li­brary, I think it is a great one. It’s just that its syn­tax is not very com­po­si­tional. And that breaks the magic.

Luckily, help is on the way in the form of the Range li­brary and OvenToBoost. These li­braries pack’ the two it­er­a­tors in a sin­gle struc­ture and over­ride the |’ op­er­a­tor to come up with some­thing as be­low:

    auto result =
        v
        | filtered(   [] (int i) { return i % 2 == 0;})
        | transformed([] (int i) { return i * 2;});

If you use Boost lamb­das (not sure I’d do that), the syn­tax gets even bet­ter:

    auto result =
        v
        | filtered(   _1 % 2 == 0)
        | transformed(_1 * 2);

Ok, so we re­gained com­po­si­tion­al­ity and clar­ity, but what price are we pay­ing? Apart from the de­pen­dency from Boost::Range and OvenToBoost, you might think that this code would be slower than a nor­mal for loop. And you would be right. But maybe not by as much as you would think.

The code for my test is here. I’m test­ing the perf dif­fer­ence be­tween the fol­low­ing code en­gi­neered to put in a bad light the most pleas­ing syn­tax. A for loop:

        int sum = 0;
        for(vector<int>::const_iterator it = v.begin(); it != v.end(); ++it)
            if(*it < 50)
                sum += *it * 2;
        return sum;

A Range based code us­ing lan­guage lamb­das (transformedF is a vari­a­tion of this):

        auto lessThan50 = v | filtered(    [](int i) { return i < 50;})
                            | transformedF([] (int i) { return i * 2;});
        return boost::accumulate (lessThan50, 0);

A Range based code us­ing two func­tors:

        auto lessThan50 = v | filtered(Filterer())
                            | transformed(Doubler());
        return boost::accumulate (lessThan50, 0);

Where:

struct Doubler {
    typedef int result_type;
    int operator() (int i) const { return i * 2;}
};
struct Filterer {
    typedef int result_type;
    bool operator() (int i) const { return i < 50;}
};

a Range based code us­ing Boost lamb­das:

        auto lessThan50 = v | filtered(_1 < 50)
                            | transformed(_1 * 2);
        return boost::accumulate (lessThan50, 0);

And fi­nally, some code us­ing the STL for_each func­tion:

        int sum = 0;
        std::for_each( v.cbegin(), v.cend(),
            [&](int i){
                if( i < 50 ) sum += i*2;
            });
        return sum;

Notice that I need to run the test an hu­mon­gous num­ber of times (10000000) on a 100 in­te­gers ar­ray to start see­ing some con­sis­tency in re­sults. 
And the dif­fer­ences that I see (in nanosecs be­low) are not huge:

1>                         Language lambda: {1775645516;1778411400;0}
1>                          Functor lambda: {1433377701;1435209200;0}
1>                            Boost lambda: {1327829199;1326008500;0}
1>                                For loop: {1338268336;1341608600;0}
1>                             STL foreach: {1338268336;1341608600;0}

True to be told, by chang­ing the code a bit these num­bers vary and, in some con­fig­u­ra­tions, the Range code takes twice as long.

But given how many rep­e­ti­tions of the test I need to run to no­tice the dif­fer­ence, I’d say that in most sce­nar­ios you should be ok.

The |>’ op­er­a­tor over col­lec­tions is the kind of func­tional com­po­si­tion, broadly de­fined, that I use the most, but you cer­tainly can use it over other things as well. You’d like to write some­thing like the be­low:

    BOOST_CHECK_EQUAL(pipe("bobby", strlen, boost::lexical_cast<string, int>), "5");

And you can, once you de­fine the fol­low­ing:

    template< class A, class F1, class F2 >
    inline auto pipe(const A & a, const F1 & f1, const F2 & f2)
        -> decltype(REMOVE_REF_BUG(f2(f1(a)))) {
        return f2(f1(a));
    }

Where REMOVE_REF_BUG is a workaround for a bug in the de­cltype im­ple­men­ta­tion un­der msvc (it should be fixed in VS 11):

    #ifdef _MSC_VER
        #define REMOVE_REF_BUG(...) remove_reference<decltype(__VA_ARGS__)>::type()
    #else
        #define REMOVE_REF_BUG(...) __VA_ARGS__
    #endif

I did­n’t do the work of cre­at­ing >>’ , <<’ and <|’ F# op­er­a­tors or wrap­ping the whole lot with bet­ter syn­tax (i.e. op­er­a­tor over­load­ing?), but it cer­tainly can be done.

Now that we have a good syn­tax for records and a good syn­tax for op­er­at­ing over col­lec­tions, the next in­stal­ment will put them to­gether and try to an­swer the ques­tion: how should I cre­ate col­lec­tions of records and op­er­ate over them?

Tags

4 Comments

Comments

Richard Polton

2012-05-25T07:48:18Z

Hi, since I’ve known enough to un­der­stand the dif­fer­ence, my pref­er­ence has al­ways been for func­tional-style pro­gram­ming and I par­tic­u­larly en­joyed the STL when I dis­cov­ered it way back when I was ac­tively writ­ing new C++ code. It has been, there­fore, very in­ter­est­ing and re­fresh­ing to read your ar­ti­cle and see a mod­ern take on those things I was think­ing about and on oc­ca­sion do­ing, so many thanks for that. I have con­structed the same sort of frame­work in C# re­cently, orig­i­nally in .NET2.0 and lat­terly us­ing .NET3.5. For ex­am­ple I have de­fined a >’ op­er­a­tor for cur­ry­ing func­tion ar­gu­ments, which I like and, for the same rea­sons as it ex­ists in F#, think it adds to the read­abil­ity of the code. It’s in­ter­est­ing to con­trast my C# with your C++ - we’re do­ing sim­i­lar things but the lan­guage leads us in dif­fer­ent di­rec­tions.

Thanks for the kind words. LINQ brings func­tional pro­gram­ming to C#, but not cur­ry­ing. Frankly, I don’t use curry all that much my­self, prob­a­bly I’ve not yet fully em­braced the Haskell style of extreme func­tion com­po­si­tion’. Well, one thing at the time …

Richard Polton

2012-05-28T12:16:56Z

Hi, your re­ply prompted a fur­ther thought: if you don’t use cur­ry­ing all that much’ do you use any heuris­tics to de­ter­mine how to or­der your func­tion pa­ra­me­ters? I have found that, through cur­ry­ing con­sid­er­a­tions, I tend to con­struct my func­tions with their ar­gu­ments in re­verse or­der to many of my col­leagues be­cause I al­most al­ways place the most fre­quently-chang­ing pa­ra­me­ter at the end. It’s dif­fi­cult to de­duce the an­swer from your ar­ti­cle be­cause the func­tions all seem to be func­tions of one pa­ra­me­ter :-)

Yours is a tech­ni­cally sound strat­egy. I tend to put the most ob­vi­ous pa­ra­me­ters up­front, the ones that what­ever user of the func­tion would nat­u­rally ex­pect. I then put the oth­ers, de­faulted to some­thing sen­si­ble if it ex­ists. Maybe that’s why I don’t use cur­ry­ing all that much, be­cause it does­n’t nat­u­rally put the most im­por­tant things first, which strongly ap­peals to my aes­thetic sense.