Writing functional code in C++ – Records

-

This is the first of a se­ries of posts about writ­ing func­tional code in C++.  My goal is dif­fer­ent from FC++, which is a full fledged environment’ to write func­tional code. Instead, I want to ex­per­i­ment with some of the new C++ 11 lan­guage fea­tures and see if one can build rea­son­ably look­ing func­tional code and stay pretty close to the lan­guage. The idea is to ju­di­ciously use macros and ex­ter­nal li­braries to build a thin layer on top of the lan­guage that does­n’t change the per­for­mance char­ac­ter­is­tics of it (aka it does­n’t slow it down) and in­te­grates fine with ex­ist­ing C++ code.

Think of it as an at­tempt to an­swer the ques­tion: is there a way to write C++ code in a func­tional style with­out loos­ing its C++sness’? We won’t at­tempt to be as type safe or syn­tac­ti­cally pleas­ing as Haskell or F# , but we’ll at­tempt to stay as fast and flex­i­ble as C++ and make it func­tional enough to be in­ter­est­ing.

Thanks to Steve Bower and Ganesh Sittampalam for re­view­ing and to Andy Sawyer for giv­ing me so much feed­back that this post can be con­sid­ered a co-au­thor­ship. Code for this post is here.

Let’s first talk about the data types that you typ­i­cally find in a func­tional lan­guage. Let’s start with Records. They are not part of the func­tional model per se. You can do with­out them and just use al­ge­braic data types and tu­ples. But they are damn con­ve­nient, and  most func­tional lan­guages (i.e. Haskell, Clojure, etc…) have them now. We start from them be­cause they map nat­u­rally to C++. Records are just like structs, but im­mutable and hav­ing struc­tural equal­ity.

In F# your vanilla record looks like this:

type Person = {
    Name: string
    Id: int
}

Nice and sim­ple. With C++, you be­come more ver­bose. A first at­tempt would be:

    struct Person {
        const string Name;
        const int Salary;
    };

Which looks nice and easy, but does­n’t quite work be­cause more of­ten than not you need to be able to com­pare two records for struc­tural equal­ity (which means the value of their fields, not the pointer in mem­ory, de­fines equal­ity). Records in func­tional lan­guages au­to­mat­i­cally sup­port that. But the syn­tax gets on the ugly side:

    struct Person {
        const string Name;
        const int Salary;
        bool operator==(const Person& other) const { return Salary == other.Salary && Name == other.Name;}
        bool operator!=(const Person& other) const { return !(*this == other);}
    };

We’ll see how to sim­plify the syn­tax later. The syn­tax on the cre­ation side is not too bad:

Person p = {Bobby”, 2};

Let’s call the above rep­re­sen­ta­tion, the ob­vi­ous one. Let’s con­sider two vari­a­tions on this scheme. The first one is use­ful if you want to make your records in­ter­op­er­a­ble with C or with other C++ com­pil­ers.

A full dis­cus­sion of how to achieve these goals would be very long. It will go about dis­cussing what POD types are and how their de­f­i­n­i­tion got more pre­cise in C++ 11.

You can look at my ex­per­i­men­ta­tions on pod, stan­dard lay­out and triv­ially con­structible types here. My sum­mary is pretty sim­ple, if you want to achieve all the goals above, you got to use C structs that con­tain C-compatible types. No const, strings or STL li­braries al­lowed.

The above class would then be­come:

    struct Person {
        char Name[20];
        int Salary;
        bool operator==(const Person& other) const { return Salary == other.Salary && !strcmp(Name,other.Name);}
        bool operator!=(const Person& other) const { return !(*this == other);}
    };

Obviously, not be­ing able to use strings or STL col­lec­tions in your record is a big lim­i­ta­tion, but in cer­tain cases you might be able to live with it.  Let’s call this rep­re­sen­ta­tion, the pod one.

You would think you can make the syn­tax bet­ter by do­ing some­thing like the be­low:

    struct _Person {
        int id;
        char name[20];
        pod_equality(_Person);
    };    

Where pod_e­qual­ity is de­fined as be­low:

        #define pod_equality(Record)                                                                 \
               bool operator==(const Record& other) const {                                          \
                      static_assert(std::is_trivial<Record>::value, "Not trivially copyable");       \
                      return memcmp(this, &other, sizeof(Record)) == 0;}                             \
               bool operator!=(const Record& other) const { return !(*this == other);}

But you would be wrong (as I was for a while), as com­par­ing mem­ory val­ues does­n’t work in this case be­cause of the padding that the com­piler can in­sert be­tween fields. Such padding can con­tain ran­dom value (whatever was on the stack, for ex­am­ple) which would make the equal­ity re­turn false for struc­turally equal ob­jects. Also this scheme fails for float­ing point types (i.e. NaN and signed ze­ros).

An al­ter­na­tive rep­re­sen­ta­tion for records, which nicely sep­a­rates con­st­ness from the struc­ture of your record is be­low. It does have some some ad­van­tages that we’ll look at, but in its raw form it is yet more syn­tax:

namespace Mutable {
    struct Person {
        string Name;
        int Salary;
        bool operator==(const Person& other) const { return Salary == other.Salary && Name == other.Name;}
        bool operator!=(const Person& other) const { return !(*this == other);}
    };
}
typedef const Mutable::Person Person;

Let’s call this rep­re­sen­ta­tion, the im­mutable one. Let’s give these guys some us­age and talk about their trade-offs. If you want to write a func­tion that in­crease some­one salary, you would write it like this:

template<class T>
T rise_salary1(const T& p) {
    T ret = { p.Name, p.Salary + 1000 };
    return ret;
}

This looks nice and clean, un­less your record has a lot of fields. Let me tell you, in real ap­pli­ca­tion it prob­a­bly does. For ex­am­ple:

namespace Mutable {
    struct foo {
        int value1;
        int value2;
        int value3;
        int value4;
        int value5;
        int value7;
        int value6;
        int value8;
        int value9;
    };
}
typedef const Mutable::foo foo;
foo increment_value7( const foo& f )
{
     foo tmp = { f.value1, f.value2, f.value3, f.value4, f.value5, f.value6, f.value7+1, f.value8 };
     return tmp;
}

Thanks to Andy for this ex­am­ple. BTW: did you spot the bug at first sight? What about the other one?

So this syn­tax is prob­lem­atic. True to be told, part of the prob­lem is in the sub-op­ti­mal ini­tial­iza­tion syn­tax in C++. If you could use named pa­ra­me­ters, it would be more dif­fi­cult to in­tro­duce bugs, but the syn­tax would still be ugly. You re­ally need some­thing like the F# syn­tax:

let r1 = {f = 0.2; k = 3}
let r2 = {r1 with f = 0.1}

Can we do some­thing like that? Well, if we are will­ing to pass the Mutable one around, we can write this:

template<class T>
T rise_salary2(const T& p) {
    T ret(p);
    ret.Salary += 1000;
    return ret;
}

Or even this:

template<class T>
T rise_salary3(T p) {
    p.Salary += 1000;
    return p;
}

But that does­n’t make us happy, does it? The whole point of mak­ing things const is that you want them to be const. If you pass around mu­ta­ble ones, who knows what’s go­ing to hap­pen in­side the method?

There is a mid­dle ground that might be ac­cept­able, which is to write func­tions so that their in­ter­face takes im­mutable records, but in­side the func­tion they op­er­ate on mu­ta­ble ones. This is not a bad pat­tern in gen­eral, as hav­ing mu­ta­ble ver­sions of your im­mutable records might come use­ful for op­ti­miz­ing cer­tain al­go­rithms. Luckily the cast­ing rules of C++ favour the bold, so the be­low works:

Person rise_salary_m(const Person& p) {
    Mutable::Person ret(p);
    ret.Salary += 1000;
    return ret;
}

And does­n’t look too bad ei­ther.

Now let’s talk syn­tax. Defining a record is still a lot of typ­ing (and a lot of read­ing if you are main­tain­ing the code). F# does it like this:

type Person = {
    Name: string
    Id: int
}

The best I came up with looks like this:

RECORD2(Person,
          string, Name,
          int,    Salary);

And you need a lot of those macros de­pend­ing on how many fields your record has. You can write this macro to ex­pand to ei­ther the Obvious or Immutable rep­re­sen­ta­tion triv­ially. It is a bit more com­plex for the Pod one be­cause of the in­ter­est­ing C++ ar­ray de­c­la­ra­tion syn­tax with the num­ber of el­e­ments af­ter the name of the field.

For the Obvious one it looks like this:

#define RECORD2(n, t1, f1, t2, f2)                                                            \
    struct n {                                                                                \
        const t1 f1;                                                                          \
        const t2 f2;                                                                          \
                                                                                              \
        bool operator==(const n& other) const { return f1 == other.f1 && f2 == other.f2;}     \
        bool operator!=(const n& other) const { return !(*this == other);}                    \
    };

All the usual con­cerns about macros ap­ply. Moreover all your fields need to have a mean­ing­ful == op­er­a­tor.

To sum­ma­rize, we have found three dif­fer­ent rep­re­sen­ta­tions of records in C++ and tried to al­le­vi­ate the syn­tax bur­den with some macro mag­ick. We’ll go wild in macro-land when we talk about dis­crim­i­nated unions.

Tags