Adventure in parserland – parsing lambda expressions in F# – Part III

-

Let’s start from the lexer. Remember, I wrote this code based on my mem­ory of how a lexer ought to look like. I did­n’t read again the rel­e­vant chap­ters in the Dragon book. But I think it came out all right af­ter all.

The to­ken­Stream func­tion we looked at last time takes a LazyList and re­turns a LazyList. It uses the un­fold method on LazyList to call match­To­ken on each char un­til the stream is empty.

let rec tokenStream chars =
    LazyList.unfold
        (fun chList ->
            match chList with
            | LazyList.Nil -> None
            | chList ->
                let token, chList' = matchToken chList
                Some(token, chList')
        )
        chars 

A to­ken is what gets passed up to the parser to do syn­tac­tic analy­sis on. It is the vo­cab­u­lary of our lan­guage. The lexer di­vide a phrase in words, the parser put to­gether the words in a phrase. So, these are the words.

type Token =
    | Name of string
    | Dot
    | OpenParens
    | CloseParens
    | Lambda
    | Def
    | Ws of string
    | NewLine
    | EOF

Matching is a process whereas you try to re­turn the to­ken that you have read plus the list of char­ac­ters yet to be read. Matching a Token is de­fined be­low:

let matchToken = function
    | LazyList.Nil                 -> EOF, LazyList.empty
    | LazyList.Cons(h, t) as chars ->
        match h with
        | ch when isWs ch -> matchWs chars
        | ch when isSpecialChar ch -> matchSpecialChar ch t
        | _ -> matchString chars

A to­ken is ei­ther noth­ing, a white­space, a spe­cial char or any­thing else.

Let’s look at what match­ing each one of them means.  Matching white­spaces means con­sum­ing them and re­mem­ber­ing what was con­sumed.

let matchWs chars =
    let value, remainingChars = matchSeriesOfChars isWs chars
    Ws value, remainingChars

match­SeriesOfChars takes a pred­i­cate and a LazyList of chars and re­turns the string com­posed of all the con­sec­u­tive chars for which the pred­i­cate is true, plus, as al­ways, the re­main­ing chars to be matched. In this case the pred­i­cate re­turns true if the char is a white­space.

To write match­SeriesOfChars I need a func­tion that re­verses a LazyList. Not hav­ing found such thing, I wrote it.

let reversell l =
    let rec go l' a = match l', a with
                        | LazyList.Nil, a -> a
                        | LazyList.Cons(h, t), a -> go t (LazyList.cons h a)
    go l LazyList.empty

Then I wrote match­SeriesOfChars. The func­tion uses an ac­cu­mu­la­tor. It adds to the front when­ever the pred­i­cate is true, it re­verses it and trans­lates it to a string (I could have re­versed the string in­stead, it might have been bet­ter).

let matchSeriesOfChars comparer chars =
    let rec go result = function
        | LazyList.Nil    -> charListToString(reversell result), LazyList.empty
        | LazyList.Cons(h, t) -> if comparer h then go (LazyList.cons h result) t
                                 else charListToString (reversell result), LazyList.cons h t
    go LazyList.empty chars

These are  pred­i­cates we’ll use later on to rec­og­nize char­ac­ters:

let isInString (ch: char) (s: string) = s.IndexOf(ch) <> -1
let isWs (chr: char) = isInString chr wsChars
let isNameChar (chr: char) = not (isInString chr (wsChars + specialChars))
let isSpecialChar ch = isInString ch specialChars

wsChar and spe­cialChars are de­fined be­low:

let wsChars = " \t"
    let charTokens =
        Map.ofList [
            '.' , Dot
            '(' , OpenParens
            ')' , CloseParens
            '\\', Lambda
            '\n', NewLine
         ]
let specialChars = charTokens |> Map.fold (fun s k v -> s + k.ToString()) ""

Getting back to the more im­por­tant match­ing func­tions, match­ing a spe­cial char­ac­ter is de­fined as a sim­ple lookup in the char­To­ken map:

let matchSpecialChar ch chars = Map.find ch charTokens, chars

We are left with match­String, this sim­ply matches the char­ac­ters un­til it finds a char that can­not be part of a name. It then looks it up in a list of spe­cial strings. If it finds it, it re­turns it, oth­er­wise it just re­turns the name.

let stringTokens =
    Map.ofList [
        "Def", Def
    ]
let matchString chars =
    let value, remainingChars = matchSeriesOfChars isNameChar chars
    let specialString = Map.tryFind value stringTokens
    if specialString.IsSome
        then specialString.Value, remainingChars
        else Name(value), remainingChars

And we are done with the lexer, all of 100+ lines of it …

Tags