Adventure in parserland – parsing lambda expressions in F# – Part I

-

This is part of my things that I do in the empty spaces be­tween one meet­ing and the next one, which might end up be­ing vaguely in­ter­est­ing’. It is a lambda ex­pres­sion parser.

The full source code is here.

I ac­tu­ally have two ver­sions of it: one writ­ten long­hand and the other one writ­ten with FParsec. Just to be clear: I’m no ex­pert of ei­ther.

And just to be more clear: I think writ­ing most parsers long­hand in the way I am about to show is crazy. You ei­ther use FParsec or  fslex / fsy­acc.

I have a strong dis­taste for ad­di­tional com­pi­la­tion steps. I think it lingers on from MFC pro­ject types of 15/20 years ago. I was one of these crazy folks that would gen­er­ate the pro­ject, wrap the gen­er­ated code (with some gen­er­al­iza­tions) in my own li­brary and use that one from then on.

So I pre­fer FParsec. I’m ok rewrit­ing left re­cur­sive rules and its per­for­mance has never been a prob­lem for me. Here is a table that com­pares the dif­fer­ent ap­proaches.

But I started won­der­ing about cod­ing a a re­cur­sive de­scent parser for a sim­ple gram­mar by hand, fully know­ing the fool­ish­ness of the idea. Thanks to Jose for code re­view­ing it.

The in­spi­ra­tion for the gram­mar comes from this book.

(*
        <expression> ::= <name> | <function> | <application>
        <name> ::= non­blank character sequence
        <function> ::= \ <name> . <body>
        <body> ::= <expression>
        <application> ::= ( <function expression> <argument expression> )
        <function expression> ::= <expression>
        <argument expression> ::= <expression>
    *)

In English, an ex­pres­sion is ei­ther a name, a func­tion or an ap­pli­ca­tion. A name is a bunch of char­ac­ters (better de­fined in the code). A func­tion is ', a name, .’ and an ex­pres­sion. An ap­pli­ca­tion is (‘, an ex­pres­sion, white­spaces, an ex­pres­sion and )’.

Some test­cases for the above gram­mar and the parsers writ­ten to parse it are be­low. It should be in­tu­itive what this code does just by the name of the func­tions. Even it is­n’t, check that the ex­pres­sions sym­bol con­tains valid pro­duc­tions from the gram­mar above.

module Test
open Microsoft.FSharp.Collections
open Xunit
open LambdaEngine
open Parser
open Lexer
open FParser
let writeTokenStream stream = Seq.fold (fun acc token -> acc + writeToken token) "" stream
let rec writeExpr = function
        | EName(s) -> s
        | Function(expr, body) -> writeToken Lambda + writeExpr expr + writeToken Dot + writeExpr body
        | Application(funExpr, argExpr) -> writeToken OpenParens + writeExpr funExpr + writeToken (Ws(" "))
                                            + writeExpr argExpr + writeToken CloseParens
        | EOT -> ""
let tokenStreams = [
    ""
    "(\xs.xs \y.(y \x.y))"
    "(\xst.xst \y.(y  \x.y))"
    " "
    "x"
    "(x y)"
    ]
let expressions = [
    ""
    "(\x.x \y.(y \x.y))"
    "x"
    "(x y)"
    ]
let stringToCharList s =
    let textReader = new System.IO.StringReader(s)
    textReaderToLazyList textReader
[<Fact>]
let testTokenizer () =
    let testTokenStream s =
        let stream = tokenStream <| stringToCharList s
        let s1 = writeTokenStream stream
        Assert.Equal(s, s1)
    tokenStreams |> List.iter testTokenStream
let testExpr parseFunction s =
    let exprs = parseFunction s
    let s1 = exprs |> Seq.fold (fun s expr -> s + writeExpr expr) ""
    Assert.Equal(s, s1)
[<Fact>]
let testParser () = expressions |> List.iter (testExpr parseString)
[<Fact>]
let testFParser () = expressions |> List.iter (testExpr fparseString)

In the next in­stal­ment, we’ll start look­ing at the real code for the parser.

Tags