LLite : language friendly literate programming

-

Main ideas

My in­ter­est in lit­er­ate pro­gram­ming comes from some re­al­iza­tions on my part:

Unhappiness with existing tools

Many of the ex­ist­ing lit­er­ate pro­gram­ming tools work sim­i­larly to the orig­i­nal CWeb.

This scheme has the un­for­tu­nate lim­i­ta­tion of break­ing your code ed­i­tor. Given that your file is not a valid code file any­more, the ed­i­tor starts mis­be­hav­ing (i.e. in­tel­lisense breaks). The de­bug­ger starts to get con­fused (albeit peo­ple tried to re­me­di­ate that with cleaver use of #line. If your lan­guage has an in­ter­ac­tive con­sole, that would not work ei­ther.

A different interpretation

The main idea of this pro­gram is to add your nar­ra­tive to the com­ment part of a code file by ex­tend­ing the com­ment tag (i.e. in C you could use /** ). This keeps ed­i­tor, de­bug­ger and in­ter­ac­tive con­sole work­ing.

The weave phase as been re­tained and what you are read­ing is the pro­gram that goes over your code file and ex­tracts a nicely for­mat­ted (for this pro­gram in markdown’ for­mat) file that can then be trans­lated to HTML, PDF, la­tex, etc…

You got that? The doc­u­ment you are read­ing now is the pro­gram.

Multi-language, multi-document format

LLite works for any pro­gram­ming lan­guage, as­sum­ing it has open and close com­ment char­ac­ter se­quences, and any doc­u­men­ta­tion for­mat, as­sum­ing it has open and close code char­ac­ter se­quences (aka al­lows you to de­lim­i­tate your code some­how), or it needs the code to be in­dented. This doc­u­ment uses mark­down (with Pandoc extensions to gen­er­ate table of con­tents and ti­tles).

Usage

You in­voke the pro­gram as doc­u­mented be­low. The first set of pa­ra­me­ters lets you choose the sym­bols that de­lim­i­tate your lan­guage com­ments (or the de­fault sym­bols be­low). The sec­ond set of pa­ra­me­ters lets you choose how your tar­get doc­u­men­ta­tion lan­guage treats code. Either it de­lim­its it with some sym­bols or it in­dents it.

module LLite

let langParamsTable     = [ "fsharp", ("(*" + "*", "*" + "*)") // The + is not to confuse the parser
                            "c", ("/**", "**/")
                            "csharp", ("/**", "**/")
                            "java", ("/**", "**/")] |> Map.ofList

let languages = langParamsTable |> Map.fold (fun state lang _ -> state + lang + " ") ""

let usage   = sprintf @"
Usage: llite inputFile parameters
where:
One of the following two sets of parameters is mandatory
    -no string : string opening a narrative comment
    -nc string : string closing a narrative comment
or
    -l language: where language is one of (%s)

One of the following two sets of parameters is mandatory
    -co string : string opening a code block
    -cc string : string closing a code block
or
    -indent N  : indent the code by N whitespaces

The following parameters are optional:
    -o outFile : defaults to the input file name with mkd extension" languages


let getLangNoNC lang    =
    match Map.tryFind lang langParamsTable with
    | Some(no, nc) -> no, nc
    | None -> failwith (lang + " is not a valid programming language")

Programming Languages limitations

One of the main tenets of lit­er­ate pro­gram­ming is that the code should be writ­ten in the or­der that fa­cil­i­tates ex­po­si­tion to a hu­man reader, not in the or­der that makes the com­piler happy. This is very im­por­tant.

If you have writ­ten a blog post or tried to ex­plain a code­base to a new joiner, you must have no­ticed that you don’t start from the top of the file and go down, but jump here and there try­ing to bet­ter ex­plain the main con­cepts. Literate pro­gram­ming says that you should write your code the same way. But in our ver­sion of it, the com­piler needs to be kept happy be­cause the lit­er­ate file is the code file.

Some in­ge­nu­ity is re­quired to achieve such goal:

The F# trick be­low is used in the rest of the pro­gram. You’ll un­der­stand its us­age nat­u­rally by just read­ing the code

let declare<'a>  = ref Unchecked.defaultof<'a>

Implementation

At the core, this pro­gram is a sim­ple trans­la­tor that takes some code text and re­turn a valid mark­down/​what­ever text. We need to know:

type CodeSymbols =
    | Indent of int                 // indentation level in whitespaces
    | Surrounded of string * string // start code * end code

type Options = {
    startNarrative  : string
    endNarrative    : string
    codeSymbols     : CodeSymbols
}

let translate   = declare<Options -> string -> string>

Going over the parse tree

We need a func­tion that takes a string and re­turns a list with the var­i­ous blocks. We can then go over each block, per­form some op­er­a­tions and, in the end, trans­form it back to text

type Block =
| Code      of string
| Narrative of string

let blockize = declare<Options -> string -> Block list>

I could have used reg­u­lar ex­pres­sions to parse the pro­gram, but it seemed ugly. I could also have used FsParsec, but that brings with it an ad­di­tional dll. So I de­cided to roll my own parser. This has sev­eral prob­lems:

The lat­ter in par­tic­u­lar is trou­ble­some. You’ll need to use a trick in the code (i.e. con­cate­nat­ing strings) to foul this pro­gram in not see­ing an open­ing com­ment, but it is in­con­ve­nient.

With all of that, it works.

TODO: con­sider switch­ing to FsParsec

###Lexer

The lexer is go­ing to process list of char­ac­ters. We need func­tions to check if a list of char­ac­ters starts with cer­tain chars and to re­turn the re­main­ing list af­ter hav­ing re­moved such chars.

BTW: these func­tions are poly­mor­phic and tail re­cur­sive

let rec startWith startItems listToCheck =
    match startItems, listToCheck with
    | [], _             -> true
    | _ , []            -> false
    | h1::t1, h2::t2  when h1 = h2  -> startWith t1 t2
    | _, _              -> false

let rec remove itemsToRemove listToModify =
    match itemsToRemove, listToModify with
    | [], l             -> l
    | _ , []            -> failwith "Remove not defined on an empty list"
    | h1::t1, h2::t2  when h1 = h2  -> remove t1 t2
    | _, _              -> failwith "itemsToRemove are not in the list"

let isOpening options       = startWith (List.ofSeq options.startNarrative) 
let isClosing options       = startWith (List.ofSeq options.endNarrative)
let remainingOpen options   = remove (List.ofSeq options.startNarrative)
let remainingClose options  = remove (List.ofSeq options.endNarrative)

This is a pretty ba­sic to­k­enizer. It just an­a­lyzes the start of the text and re­turns what it finds. It also keeps track of the line num­ber for the sake of re­port­ing it in the er­ror mes­sage.

let NL = System.Environment.NewLine

type Token =
| OpenComment   of int
| CloseComment  of int
| Text          of string

let tokenize options source =

    let startWithNL = startWith (Seq.toList NL)

    let rec text line acc = function
        | t when isOpening options t    -> line, acc, t 
        | t when isClosing options t    -> line, acc, t
        | c :: t as full                ->
            let line' = if startWithNL full then line + 1 else line
            text line' (acc + c.ToString()) t
        | []                            -> line, acc, [] 
    let rec tokenize' line acc = function
        | []                            -> List.rev acc
        | t when isOpening options t    -> tokenize' line
                                            (OpenComment(line)::acc)  (remainingOpen options t)
        | t when isClosing options t    -> tokenize' line
                                            (CloseComment(line)::acc) (remainingClose options t)
        | t                             ->
            let line, s, t'= text line "" t
            tokenize' line (Text(s) :: acc) t'

    tokenize' 1 [] (List.ofSeq source)

###Parser

The parse tree is just a list of Chunks, where a chunk can be a piece of nar­ra­tive or a piece of code.

type Chunk =
| NarrativeChunk    of Token list
| CodeChunk         of Token list

let parse options source =

    let rec parseNarrative acc = function
        | OpenComment(l)::t         ->
            failwith ("Don't open narrative comments inside narrative comments at line "
                                                                                    + l.ToString())
        | CloseComment(_)::t        -> acc, t
        | Text(s)::t                -> parseNarrative (Text(s)::acc) t
        | []                        -> failwith "You haven't closed your last narrative comment"

    let rec parseCode acc = function
        | OpenComment(_)::t as t'   -> acc, t'
        | CloseComment(l)::t        -> parseCode (CloseComment(l)::acc) t
        | Text(s)::t                -> parseCode (Text(s)::acc) t
        | []                        -> acc, []
    let rec parse' acc = function
        | OpenComment(_)::t         ->
            let narrative, t' = parseNarrative [] t
            parse' (NarrativeChunk(narrative)::acc) t' 
        | Text(s)::t                ->
            let code, t' = parseCode [Text(s)] t
            parse' (CodeChunk(code)::acc) t'
        | CloseComment(l)::t           ->
            failwith ("Don't insert a close narrative comment at the start of your program at line "
                                                                                    + l.ToString())
        | []                -> List.rev acc

    parse' [] (List.ofSeq source)

###Flattener

The flat­ten­ing part of the al­go­rithm is a bit un­usual. At this point we have a parse tree that con­tains to­kens, but we want to re­duce it to two sim­ple node types con­tain­ing all the text in string form.

TODO: con­sider man­ag­ing nested com­ments and com­ments in strings (the lat­ter has to hap­pen in ear­lier phases)

let flatten options chunks =
    let tokenToStringNarrative = function
    | OpenComment(l) | CloseComment(l)  -> failwith ("Narrative comments cannot be nested at line "
                                                                                    + l.ToString())
    | Text(s)                           -> s

    let tokenToStringCode = function
    | OpenComment(l)                -> failwith ("Open narrative comment cannot be in code at line"
                                                                + l.ToString()) +
                                                 ". Perhaps you have an open comment in" +
                                                 " a code string before this comment tag?"
    | CloseComment(_)               -> string(options.endNarrative |> Seq.toArray)
    | Text(s)                       -> s

    let flattenChunk = function
    | NarrativeChunk(tokens)             ->
        Narrative(tokens |> List.fold (fun state token -> state + tokenToStringNarrative token) "")
    | CodeChunk(tokens)                  ->
        Code(tokens |> List.fold (fun state token -> state + tokenToStringCode token) "")

    chunks |> List.fold (fun state chunk -> flattenChunk chunk :: state) [] |> List.rev

We are get­ting there, now we have a list of blocks we can op­er­ate upon

blockize := fun options source -> source |> tokenize options |> parse options |> flatten options
 

Narrative comments phases

Each phase is a func­tion that takes the op­tions and a block list and re­turns a block list that has been processed’ in some way.

type Phase = Options -> Block List -> Block List

let removeEmptyBlocks   = declare<Phase>
let mergeBlocks         = declare<Phase>
let addCodeTags         = declare<Phase>

let processPhases options blockList = 
    blockList
    |> !removeEmptyBlocks   options
    |> !mergeBlocks         options
    |> !addCodeTags         options

We want to man­age how many new­lines there are be­tween dif­fer­ent blocks, be­cause we don’t trust the pro­gram­mer to have a good view of how many new­line to keep from com­ment blocks and code blocks. We’ll trim all new­lines from the start and end of a block, and then add our own.

let newLines = [|'\n';'\r'|]

type System.String with
    member s.TrimNl () = s.Trim(newLines) 

###Remove empty blocks

There might be empty blocks (i.e. be­tween two con­sec­u­tive com­ment blocks) in the file. For the sake of for­mat­ting the file beau­ti­fully, we want to re­move them.

let extract = function
    | Code(text)        -> text
    | Narrative(text)   -> text

removeEmptyBlocks := fun options blocks ->
                        blocks |> List.filter (fun b -> (extract b).TrimNl().Trim() <> "")

###Merge blocks

Consecutive blocks of the same kind need to be merged, for the sake of for­mat­ting the over­all text cor­rectly.

TODO: make tail re­cur­sive

let rec mergeBlockList = function
    | []        -> []
    | [a]       -> [a]
    | h1::h2::t -> match h1, h2 with
                   | Code(t1), Code(t2)             -> mergeBlockList (Code(t1 + NL + t2)::t)
                   | Narrative(n1), Narrative(n2)   -> mergeBlockList(Narrative(n1 + NL + n2)::t)
                   | _, _                           -> h1::mergeBlockList(h2::t)

mergeBlocks := fun options blocks -> mergeBlockList blocks

###Adding code tags

Each code block needs a tag at the start and one at the end or it needs to be in­dented by N chars.

let indent n (s:string) =
    let pad = String.replicate n " "
    pad + s.Replace(NL, NL + pad)

addCodeTags := fun options blocks ->
    match options.codeSymbols with
    | Indent(n)         ->
        blocks |> List.map (function Narrative(s) as nar -> nar | Code(s) -> Code(indent n s))
    | Surrounded(s, e)  -> 
        blocks |> List.map (function
                            | Narrative(text)   -> Narrative(NL + text.TrimNl() + NL)
                            | Code(text)        -> Code(NL + s + NL + text.TrimNl() + NL + e + NL))

###Flatten again

Once we have the ar­ray of blocks, we need to flat­ten them (transform them in a sin­gle string), which is triv­ial, and then fi­nally im­ple­ment our orig­i­nal trans­late func­tion.

let sumBlock s b2 = s + extract b2

let flattenB blocks = (blocks |> List.fold sumBlock "").TrimStart(newLines)

translate := fun options text -> text |> !blockize options |> processPhases options |> flattenB

Parsing command line arguments

Parsing com­mand lines in­volves writ­ing a func­tion that goes from a se­quence of strings to an in­put file name, out­put file name and Options record

let parseCommandLine = declare<string array -> string * string * Options>

To im­ple­ment it, we are go­ing to use a com­mand line parser taken from here. The parseArgs func­tion takes a se­quence of ar­gu­ment val­ues and map them into a (name,value) tu­ple. It scans the tu­ple se­quence and put com­mand name into all sub­se­quent tu­ples with­out name and dis­card the ini­tial (“”,“”) tu­ple. It then groups tu­ples by name and con­verts the tu­ple se­quence into a map of (name,value seq)

For now, I don’t need the value seq’ part as all my pa­ra­me­ters take a sin­gle ar­gu­ment, but I left it in there in case I will need to pass mul­ti­ple args later on.

open  System.Text.RegularExpressions

let (|Command|_|) (s:string) =
  let r = new Regex(@"^(?:-{1,2}|\/)(?<command>\w+)[=:]*(?<value>.*)$",RegexOptions.IgnoreCase)
  let m = r.Match(s)
  if m.Success
  then 
    Some(m.Groups.["command"].Value.ToLower(), m.Groups.["value"].Value)
  else
    None


let parseArgs (args:string seq) =
  args 
  |> Seq.map (fun i -> 
                    match i with
                    | Command (n,v) -> (n,v) // command
                    | _ -> ("",i)            // data
                  )
  |> Seq.scan (fun (sn,_) (n,v) -> if n.Length>0 then (n,v) else (sn,v)) ("","")
  |> Seq.skip 1
  |> Seq.groupBy (fun (n,_) -> n)
  |> Seq.map (fun (n,s) -> (n, s |> Seq.map (fun (_,v) -> v) |> Seq.filter (fun i -> i.Length>0)))
  |> Map.ofSeq

let paramRetrieve (m:Map<string,_>) (p:string) = 
  if Map.containsKey p m
  then Some(m.[p])
  else None

This is the main logic of pa­ra­me­ter pass­ing. Note that we give prece­dence to the -l and -indent pa­ra­me­ters, if pre­sent.

This is a func­tion that goes from the map of com­mand line pa­ra­me­ters to the in­put file name, out­put file name and op­tions. With that we can fi­nally de­fine the orig­i­nal par­seC­om­man­d­Line.

let safeHead errMsg s = if s |> Seq.isEmpty then failwith errMsg else s |> Seq.head 

let paramsToInputs paramsMap =
    let single p er     = match paramRetrieve paramsMap p with | Some(k) -> Some(k |> safeHead er)
                                                               | None -> None
    let get p s         = match paramRetrieve paramsMap p with |Some(k) -> k |> safeHead s
                                                               | None -> failwith s
    let defaultP p q er = match paramRetrieve paramsMap p with | Some(k) -> k |> safeHead er
                                                               | None -> q

    let inputFile       = get "" "You need to pass an input file"
    let outputFile      = defaultP  "o"
                                    (System.IO.Path.GetFileNameWithoutExtension(inputFile:string) + ".mkd")
                                    "You must pass a parameter to -o"

    let no, nc          = match single "l" "You must pass a language parameter to -l" with
                          | Some(l) -> getLangNoNC l
                          | None    ->
                                get "no" "The no (narrative open) parameter is mandatory, if no -l specified",
                                get "nc" "The nc (narrative close) parameter is mandatory, if no -l specified"

    let codeSymbs       = match single "indent" "You must pass a whitespace indentation number to -indent" with
                          | Some(n)     ->
                                let success, value = System.Int32.TryParse n
                                if success
                                    then Indent(value)
                                    else failwith "-i accepts just an integer value as parameter"                          
                          | None        ->
                                match single "l" "You must pass a language parameter to -l" with
                                | Some(l) -> Surrounded("~~~" + l,"~~~")
                                | None    ->
                                    Surrounded(
                                        get "co" "The co (code open) parameter is mandatory, if no -indent specified",
                                        get "cc" "The cc (code close) parameter is mandatory")
    inputFile, outputFile, {
        startNarrative  = no
        endNarrative    = nc
        codeSymbols     = codeSymbs
        }

parseCommandLine := parseArgs >> paramsToInputs

Main method

We can then write main as the only side ef­fect func­tion in the pro­gram. Here is where the IO monad would live …

let banner  = "LLite : language friendly literate programming\n"

let printMemory () =
    let bytesInMeg = 1048576.
    let peak = System.Diagnostics.Process.GetCurrentProcess().PeakWorkingSet64
    let work = System.Diagnostics.Process.GetCurrentProcess().WorkingSet64
    printfn "Peak working set: %A" ((float) peak / bytesInMeg)
    printfn "Working set: %A" ((float) peak / bytesInMeg)

let myMain args =
        try

            let inputFile, outputFile, options = !parseCommandLine args
            let input       = System.IO.File.ReadAllText inputFile
            let output      = !translate options input
            System.IO.File.WriteAllText (outputFile, output)

            //printMemory ()
            0
        with
        | e ->
            printfn "%s" "Failure"
            printfn "%s" e.Message 
            printfn "%s" usage
    #if DEBUG 
            printfn "\nDetailed Error Below:\n%A" e
    #endif
            -1

An aside: forward declaring functions in F#

A simple solution

You can achieve some­thing some­how sim­i­lar to for­ward de­c­la­ra­tion by the declare dirty trick used in this pro­gram. When­ever you want to do a for­ward de­c­la­ra­tion of a func­tion , or vari­able, you can type:

let testDeclare() =

    let add = declare<float -> float>

    let ``function where I want to use add without having defined it`` nums = nums |> Seq.map !add

This cre­ates a ref to a func­tion from float to float. It looks a bit like an Haskell type de­c­la­ra­tion. You can then use such func­tion as if it were ac­tu­ally de­fine and de­lay his de­f­i­n­i­tion to a later point in time when you are ready to ex­plain it.

When you are ready to talk about it, you can then de­fine it with:

    add := fun x -> x + 1.

The syn­tax is not too bad. You get that of­ten-sought Haskell like ex­plicit type de­c­la­ra­tion and you can regex the code­base to cre­ate an in­dex at the end of the pro­gram (maybe).

But is it too slow? After all, there is one more in­di­rec­tion call for each func­tion call.

Let’s test it: en­able #time in F# in­ter­ac­tive and ex­e­cute timeNor­malF and timeIndi­rectF vary­ing sleep­Time and how­ManyIter un­til you con­vince your­self that it is ok (or not).

    let sleepTime   = 50
    let howManyIter = 100
    let normalF x   = System.Threading.Thread.Sleep sleepTime
    let indirectF   = declare<int -> unit>
    indirectF      := fun x -> System.Threading.Thread.Sleep sleepTime
     
    let timeNormalF     = [1..howManyIter] |> List.iter normalF
    let timeIndirectF   = [1..howManyIter] |> List.iter !indirectF
    ()

A correct solution (but ugly)

Unfortunately, there is a big prob­lem with all of the above: it does­n’t work with generic func­tions and cur­ried func­tion in­vo­ca­tions. The code be­low works in all cases, but it is ugly for the user to use. In this pro­gram I’ve used the beau­ti­ful, but in­cor­rect, syn­tax.

type Literate() =
    static member Declare<'a, 'b>  (ref : obj ref) (x : 'a) : 'b =
        unbox <| (unbox<obj -> obj> !ref) x
    static member Define<'a, 'b> (func : 'a -> 'b) (ref : obj ref) (f : 'a -> 'b) =
        ref := box (unbox<'a> >> f >> box)

// Declaration    
let rec id (x : 'a) : 'a = Literate.Declare idImpl x
and idImpl = ref null

// Usage
let f () = id 100 + id 200

// Definition
Literate.Define id idImpl (fun x -> x)

The traditional way

The tra­di­tional way of do­ing some­thing like this is by pass­ing the func­tion as a pa­ra­me­ter in a man­ner sim­i­lar to the be­low.

// Declaration and usage intermingled
let calculate' (add: int -> int -> int) x y = add x y * add x y

// Definition
let add x y = x + y

let calculate = calculate' add

To my way of see­ing, this is the most cum­ber­some so­lu­tion and con­cep­tu­ally dis­hon­est be­cause you are not pa­ram­e­triz­ing your func­tion for tech­ni­cal rea­sons, but just for the sake of ex­plain­ing things. In a way, you are chang­ing the sig­na­ture of your func­tions for the sake of writ­ing a book. That can’t be right …

[<EntryPoint>]
let main args = myMain args

Tags