Writing functional code in C++ II – Function composition

-

Function com­po­si­tion is at the core of func­tional pro­gram­ming. You start by be­ing very con­ﬁ­dent that cer­tain very small func­tions are cor­rect, you com­pose them in well known ways and you end up be­ing very con­ﬁ­dent that your ﬁ­nal pro­gram is cor­rect.

You are very con­ﬁ­dent that the ini­tial func­tions are cor­rect be­cause they are very small and side ef­fect free. You are very con­ﬁ­dent that your pro­gram is cor­rect be­cause the means of com­po­si­tion are well known and gen­er­ate func­tions that are them­selves side ef­fect free.

Such almost prim­i­tive’ func­tions op­er­ate on data struc­tures. Each func­tional lan­guage has its own set of data struc­tures and almost prim­i­tive’ func­tions. Probably the most fa­mous ones are con­tained in the haskell pre­lude, but F# has sim­i­lar ones. LINQ in C# is just such a set.

In this ar­ti­cle I use the term func­tional com­po­si­tion in a broad sense as putting func­tions to­geth­er’, not in the more tech­ni­cal sense of f(g(x)): dot op­er­a­tor in Haskell or >>’ op­er­a­tor in F#.

Code for this post is here. Thanks to Steve Bower and Andy Sawyer for re­view and com­ments.

Here is an ex­am­ple. The F# code be­low ﬁl­ters in the odd num­bers from an ar­ray and dou­bles them. You have two log­i­cal op­er­a­tions, ﬁl­ter­ing and dou­bling.

```let v = [|1;2;3;4;5;6;7;8;9;0|]
let transformed =
v
|> Seq.filter(fun i -> i % 2 = 0)
|> Seq.map ((*) 2)
```

How would the code look in C++? There are many ways. One is the big fat for loop’, or in C++ 11, for each:

```    const int tmp[] = {1,2,3,4,5,6,7,8,9,0};
const vector<int> v(&tmp[0], &tmp[10]);
vector<int> filtered;
for(auto i: v)
{
if(i % 2 == 0)
filtered.push_back(i * 2);
}```

This looks sim­ple enough, apart from the ini­tial­iza­tion syn­tax that gets bet­ter with C++ 11. But it’s sim­plic­ity is mis­lead­ing. We have now lost the in­for­ma­tion that our com­pu­ta­tion is com­posed by a ﬁl­ter and a map. It’s all in­ter­min­gled. Obviously, the info is still there in some sense, but you need to read each line in your head and ﬁg­ure out the struc­ture on your own. It is not read­ily ap­par­ent to the reader.

Obviously, things get much worse in a large pro­gram com­posed of sev­eral loops where each one does not triv­ial stuff. As for me, every time I in­dulge into the big fat for loop pat­tern’, I end up re­gret­ting it ei­ther im­me­di­ately or af­ter some time when I have to make changes to the code. So I try to avoid it.

Wait a minute, you might say, what about STL al­go­rithms? Aren’t you sup­posed to use those?

The syn­tax to call them is not very com­po­si­tional. You can­not take the re­sult of an al­go­rithm and pass it to an­other eas­ily. It is dif­ﬁ­cult to chain them that way. Everything takes two it­er­a­tors, even if 99% of the times you are just us­ing be­gin and end. Moreover, you have to cre­ate all sort of tem­po­rary data struc­tures for the sake of not mod­i­fy­ing your orig­i­nal one. Once you have done all of that, the syn­tac­tic weight of your code over­whelm the sim­plic­ity of what you are try­ing to do.

For ex­am­ple, here is the same code us­ing transform’ for the map’ part of the al­go­rithm:

```    vector<int> filtered;
copy_if(begin(v), end(v), back_inserter(filtered), [](int x) { return x % 2 == 0;});
vector<int> transformed;
transform(begin(filtered), end(filtered), back_inserter(transformed), [](int x) { return x * 2;});
```

I’m not ar­gu­ing that STL is a bad li­brary, I think it is a great one. It’s just that its syn­tax is not very com­po­si­tional. And that breaks the magic.

Luckily, help is on the way in the form of the Range li­brary and OvenToBoost. These li­braries pack’ the two it­er­a­tors in a sin­gle struc­ture and over­ride the |’ op­er­a­tor to come up with some­thing as be­low:

```    auto result =
v
| filtered(   [] (int i) { return i % 2 == 0;})
| transformed([] (int i) { return i * 2;});```

If you use Boost lamb­das (not sure I’d do that), the syn­tax gets even bet­ter:

```    auto result =
v
| filtered(   _1 % 2 == 0)
| transformed(_1 * 2);```

Ok, so we re­gained com­po­si­tion­al­ity and clar­ity, but what price are we pay­ing? Apart from the de­pen­dency from Boost::Range and OvenToBoost, you might think that this code would be slower than a nor­mal for loop. And you would be right. But maybe not by as much as you would think.

The code for my test is here. I’m test­ing the perf dif­fer­ence be­tween the fol­low­ing code en­gi­neered to put in a bad light the most pleas­ing syn­tax. A for loop:

```        int sum = 0;
for(vector<int>::const_iterator it = v.begin(); it != v.end(); ++it)
if(*it < 50)
sum += *it * 2;
return sum;```

A Range based code us­ing lan­guage lamb­das (transformedF is a vari­a­tion of this):

```        auto lessThan50 = v | filtered(    [](int i) { return i < 50;})
| transformedF([] (int i) { return i * 2;});
return boost::accumulate (lessThan50, 0);```

A Range based code us­ing two func­tors:

```        auto lessThan50 = v | filtered(Filterer())
| transformed(Doubler());
return boost::accumulate (lessThan50, 0);
```

Where:

```struct Doubler {
typedef int result_type;
int operator() (int i) const { return i * 2;}
};
struct Filterer {
typedef int result_type;
bool operator() (int i) const { return i < 50;}
};
```

a Range based code us­ing Boost lamb­das:

```        auto lessThan50 = v | filtered(_1 < 50)
| transformed(_1 * 2);
return boost::accumulate (lessThan50, 0);```

And ﬁ­nally, some code us­ing the STL for_each func­tion:

```        int sum = 0;
std::for_each( v.cbegin(), v.cend(),
[&](int i){
if( i < 50 ) sum += i*2;
});
return sum;
```

Notice that I need to run the test an hu­mon­gous num­ber of times (10000000) on a 100 in­te­gers ar­ray to start see­ing some con­sis­tency in re­sults.
And the dif­fer­ences that I see (in nanosecs be­low) are not huge:

```1>                         Language lambda: {1775645516;1778411400;0}
1>                          Functor lambda: {1433377701;1435209200;0}
1>                            Boost lambda: {1327829199;1326008500;0}
1>                                For loop: {1338268336;1341608600;0}
1>                             STL foreach: {1338268336;1341608600;0}
```

True to be told, by chang­ing the code a bit these num­bers vary and, in some con­ﬁg­u­ra­tions, the Range code takes twice as long.

But given how many rep­e­ti­tions of the test I need to run to no­tice the dif­fer­ence, I’d say that in most sce­nar­ios you should be ok.

The |>’ op­er­a­tor over col­lec­tions is the kind of func­tional com­po­si­tion, broadly de­ﬁned, that I use the most, but you cer­tainly can use it over other things as well. You’d like to write some­thing like the be­low:

`    BOOST_CHECK_EQUAL(pipe("bobby", strlen, boost::lexical_cast<string, int>), "5");`

And you can, once you de­ﬁne the fol­low­ing:

```    template< class A, class F1, class F2 >
inline auto pipe(const A & a, const F1 & f1, const F2 & f2)
-> decltype(REMOVE_REF_BUG(f2(f1(a)))) {
return f2(f1(a));
}```

Where REMOVE_REF_BUG is a workaround for a bug in the de­cltype im­ple­men­ta­tion un­der msvc (it should be ﬁxed in VS 11):

```    #ifdef _MSC_VER
#define REMOVE_REF_BUG(...) remove_reference<decltype(__VA_ARGS__)>::type()
#else
#define REMOVE_REF_BUG(...) __VA_ARGS__
#endif```

I did­n’t do the work of cre­at­ing >>’ , <<’ and <|’ F# op­er­a­tors or wrap­ping the whole lot with bet­ter syn­tax (i.e. op­er­a­tor over­load­ing?), but it cer­tainly can be done.

Now that we have a good syn­tax for records and a good syn­tax for op­er­at­ing over col­lec­tions, the next in­stal­ment will put them to­gether and try to an­swer the ques­tion: how should I cre­ate col­lec­tions of records and op­er­ate over them?

Tags

• C++
• FUNCTIONAL

Richard Polton

2012-05-25T07:48:18Z

Hi, since I’ve known enough to un­der­stand the dif­fer­ence, my pref­er­ence has al­ways been for func­tional-style pro­gram­ming and I par­tic­u­larly en­joyed the STL when I dis­cov­ered it way back when I was ac­tively writ­ing new C++ code. It has been, there­fore, very in­ter­est­ing and re­fresh­ing to read your ar­ti­cle and see a mod­ern take on those things I was think­ing about and on oc­ca­sion do­ing, so many thanks for that. I have con­structed the same sort of frame­work in C# re­cently, orig­i­nally in .NET2.0 and lat­terly us­ing .NET3.5. For ex­am­ple I have de­ﬁned a >’ op­er­a­tor for cur­ry­ing func­tion ar­gu­ments, which I like and, for the same rea­sons as it ex­ists in F#, think it adds to the read­abil­ity of the code. It’s in­ter­est­ing to con­trast my C# with your C++ - we’re do­ing sim­i­lar things but the lan­guage leads us in dif­fer­ent di­rec­tions.

lucabol

2012-05-25T21:18:55Z

Thanks for the kind words. LINQ brings func­tional pro­gram­ming to C#, but not cur­ry­ing. Frankly, I don’t use curry all that much my­self, prob­a­bly I’ve not yet fully em­braced the Haskell style of extreme func­tion com­po­si­tion’. Well, one thing at the time …

Richard Polton

2012-05-28T12:16:56Z

Hi, your re­ply prompted a fur­ther thought: if you don’t use cur­ry­ing all that much’ do you use any heuris­tics to de­ter­mine how to or­der your func­tion pa­ra­me­ters? I have found that, through cur­ry­ing con­sid­er­a­tions, I tend to con­struct my func­tions with their ar­gu­ments in re­verse or­der to many of my col­leagues be­cause I al­most al­ways place the most fre­quently-chang­ing pa­ra­me­ter at the end. It’s dif­ﬁ­cult to de­duce the an­swer from your ar­ti­cle be­cause the func­tions all seem to be func­tions of one pa­ra­me­ter :-)

lucabol

2012-05-28T19:57:49Z

Yours is a tech­ni­cally sound strat­egy. I tend to put the most ob­vi­ous pa­ra­me­ters up­front, the ones that what­ever user of the func­tion would nat­u­rally ex­pect. I then put the oth­ers, de­faulted to some­thing sen­si­ble if it ex­ists. Maybe that’s why I don’t use cur­ry­ing all that much, be­cause it does­n’t nat­u­rally put the most im­por­tant things ﬁrst, which strongly ap­peals to my aes­thetic sense.