Write Yourself a Scheme in 48 Hours in F# – Part VI

-

The eval­u­a­tor takes as an in­put a LispVal. Where does it come from? There must be some­thing that con­verts your tex­tual in­put into it. That is the job of the parser.

I have used FParsec to build my parser. FParsec is a fan­tas­tic li­brary to build parsers. It is a per­fect show­case of the com­po­si­tion po­ten­tial that func­tional code yields. 

When you write an FParsec parser you com­pose many lit­tle parsers to cre­ate the one parser that works for your lan­guage.  The re­sult­ing code looks very much like your lan­guage gram­mar, but you don’t need  a sep­a­rate code gen­er­a­tion com­pi­la­tion step to pro­duce it.

There is one el­e­ment of ug­li­ness in the syn­tax to cre­ate re­cur­sive parsers. You need to de­fine two global vari­ables that can be re­ferred to be­fore they are con­structed. This is an arte­fact of how F# works. So you need a line in your code that looks like this:

let parseExpr, parseExprRef : LispParser * LispParser ref = createParserForwardedToRef()

With that piece of ma­chin­ery out of the way, we can fo­cus on the parser it­self. Our goal here is to parse ex­pres­sions and gen­er­ate LispVal. We need a LispParser like the be­low (the sec­ond generic pa­ra­me­ter is for ad­vanced us­age).

type LispParser = Parser<LispVal, unit>

We need to parse all the kind of ex­pres­sions that the user can type. Notice in the be­low the use of a com­pu­ta­tion ex­pres­sion to sim­plify the syn­tax. Also note that lists and dot­ted lists look very much the same un­til you en­counter the .’ char­ac­ter. You could dis­am­biguate the sit­u­a­tion by ex­tract­ing out the com­mon­al­ity in a sep­a­rate kind of ex­pres­sion. I de­cided in­stead to in­struct the parser to back­track if it gets it wrong (at­tempt). This is slower, but keeps the code iden­ti­cal to our con­cep­tual model. I value that greatly.

do parseExprRef := parseAtom
                   <|> parseString
                   <|> parseNumber
                   <|> parseQuoted
                   <|> parse {
                           do! chr '('
                           let! x = (attempt parseList) <|> parseDottedList
                           do! chr ')'
                           return x
                       }

Let’s start from the top. Parsing an atom means pars­ing some­thing that starts with a let­ter or sym­bol and con­tin­ues with let­ters, sym­bols or dig­its. Also #t” and #f” can be re­solved at pars­ing time.

let parseAtom : LispParser = parse {
        let! first = letter <|> symbol
        let! rest = manyChars (letter <|> symbol <|> digit)
        return match first.ToString() + rest with
               | "#t" -> Bool true
               | "#f" -> Bool false
               | atom -> Atom atom
}

A string is just a bunch of chars (except ') sur­rounded by .

let parseString : LispParser = parse {
    do! chr '"'
    let! xs = manyChars (noneOf "\"")
    do! chr '"'
    return String(xs)
}

A num­ber is just one or more dig­its. I am afraid we just sup­port in­te­gers at this stage …

let parseNumber : LispParser = many1Chars digit |>> (System.Int32.Parse >> Number)

A quoted ex­pres­sion is jut a ' fol­lowed by an ex­pres­sion.

let parseQuoted : LispParser = chr '\'' >>. parseExpr |>> fun expr -> List [Atom "quote"; expr] 

A list is just a bunch of ex­pres­sions sep­a­rate by at least one space.

let parseList : LispParser = sepBy parseExpr spaces1 |>> List

A dot­ted list starts in the same way (hence the back­track­ing above), but then has a dot, one or more spaces and an ex­pres­sion.

let parseDottedList : LispParser = parse {
    let! head = endBy parseExpr spaces1
    let! tail = chr '.' >>. spaces1 >>. parseExpr
    return DottedList (head, tail)
}

And here are a bunch of func­tions used through­out the code, pre­sented here for com­plete­ness.

let spaces1 : LispParser<unit> = skipMany1 whitespace
    let chr c = skipChar c
    let endBy  p sep = many  (p .>> sep)
    let symbol : LispParser<char> = anyOf "!$%&|*+-/:<=>?@^_~#"

This is all the code you need to trans­late text to a LispVal to feed the eval­u­a­tor. That is pretty im­pres­sive.

There is also a func­tion to go the other way, from a LispVal to text. It is used in im­ple­ment­ing the test­cases and to print out di­ag­nos­tics.

let rec showVal = function
        | String contents -> "\"" + contents + "\""
        | Atom name -> name
        | Number num -> num.ToString()
        | Bool t -> if t then "#t" else "#f"
        | List l -> "(" + unwordsList l + ")"
        | DottedList (head, tail) -> "(" + unwordsList head + " . " + showVal tail + ")"
        | PrimitiveFunc(_) -> "<primitive>"
        | Port (_) -> "<IO port>"
        | Func({ parms = parms; varargs = varargs; body = body; closure = closure }) ->
                                                "(lambda (" + unwordsList (parms |> List.map (String)) +
                                                    (match varargs with
                                                        | None -> ""
                                                        | Some(arg) -> " . " + arg) + ") ...)"
    and
        unwordsList = List.map showVal >> String.concat " "

Tags