22. --dump-corefn

Today's post is a guest post by @rightfold, who recently added support for exporting PureScript's intermediate representation to a file. This post will show how we can use that representation to create entirely new compiler backends, while reusing much of the work done by the compiler front-end!


The internal functional representation, called CoreFn, is a data structure that the PureScript compiler uses as a simpler form of the input language. In particular, this simpler form lacks type classes and syntactic sugar (e.g. do), and names are fully qualified. The CoreFn data structure is used to generate JavaScript code. In PureScript 0.10 a new feature was added to dump this data structure to JSON files, making it easier to write tools that analyze this representation or generate code from it.

PureScript is quite a simple language, and while designed with JavaScript code generation in mind, it is independent enough to allow translation to other target languages. Such efforts already exist, namely for C++ and Erlang. However, those projects fork the entire compiler, mainly because they were written before the CoreFn dump feature was available.

To produce the dump, simply invoke the compiler with the --dump-corefn flag:

$ pulp build -- --dump-corefn

This will produce a file named corefn.json in each module output directory, i.e. output/*/corefn.json.
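As a starting point for a tool that consumes these files, here is a minimal Node.js sketch that gathers every dump under an output directory. The function name `loadCoreFnDumps` is made up for this illustration, and the only thing it assumes is the `output/*/corefn.json` layout described above.

```javascript
const fs = require("fs");
const path = require("path");

// Collect every <outputDir>/<Module>/corefn.json and parse it.
// Returns an object mapping the module directory name to the parsed JSON.
// (Hypothetical helper, not part of the compiler or of pulp.)
function loadCoreFnDumps(outputDir) {
  const dumps = {};
  for (const entry of fs.readdirSync(outputDir, { withFileTypes: true })) {
    if (!entry.isDirectory()) continue;
    const file = path.join(outputDir, entry.name, "corefn.json");
    if (!fs.existsSync(file)) continue; // no dump was written for this module
    dumps[entry.name] = JSON.parse(fs.readFileSync(file, "utf8"));
  }
  return dumps;
}

// Example use, once a build with --dump-corefn has run:
//   const dumps = loadCoreFnDumps("output");
//   console.log(Object.keys(dumps));
```

Keying the result by directory name keeps the sketch independent of the exact JSON layout inside each dump.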

Let's look at a simple example:

module Data.Pair where

type Pair a b = {fst :: a, snd :: b}

fst :: forall a b. Pair a b -> a
fst = _.fst

snd :: forall a b. Pair a b -> b
snd = _.snd

swap :: forall a b. Pair a b -> Pair b a
swap p = {fst: snd p, snd: fst p}

class FromPair a e | a -> e where
    fromPair :: Pair e e -> a

instance fromPairArray :: FromPair (Array e) e where
    fromPair p = [fst p, snd p]

Compile it as follows:

$ psc Pair.purs --dump-corefn
$ cat output/Data.Pair/corefn.json | json_pp | gist
https://gist.github.com/rightfold/ed38cacbac307322e2225d13ae68b4b9

Because it's quite a mouthful of JSON, I put it in a gist.

The dump somewhat resembles the original program: there is the module name ("Data.Pair"), a list of imports (["Prim"]), a list of exports (["fst", "snd", "swap"]) and a list of definitions. In the dump, expressions follow a syntax similar to S-expressions, but in JSON. For example, an expression like \x -> f (g x) would be encoded as:

["Abs", "x", ["App", ["Var", "f"], ["App", ["Var", "g"], ["Var", "x"]]]]

Where "Abs" means "lambda abstraction", and "App" means "function application".
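To make the encoding concrete, here is a small sketch that walks such an expression and prints it back in lambda syntax. It assumes exactly the three node shapes from the example above ("Abs", "App" and "Var"); the real dump contains more node types, and `showExpr` is a name invented for this illustration.

```javascript
// Render a CoreFn-style expression back into lambda syntax.
// Only the node shapes from the example are handled; anything else throws.
function showExpr(expr) {
  const [tag, ...args] = expr;
  switch (tag) {
    case "Var":
      return args[0];
    case "Abs":
      // ["Abs", parameter, body]
      return "\\" + args[0] + " -> " + showExpr(args[1]);
    case "App": {
      // ["App", function, argument]
      const [fn, arg] = args;
      // A lambda in function position and any non-variable argument need parentheses.
      const f = fn[0] === "Abs" ? "(" + showExpr(fn) + ")" : showExpr(fn);
      const a = arg[0] === "Var" ? showExpr(arg) : "(" + showExpr(arg) + ")";
      return f + " " + a;
    }
    default:
      throw new Error("unsupported node: " + tag);
  }
}

const example = ["Abs", "x", ["App", ["Var", "f"], ["App", ["Var", "g"], ["Var", "x"]]]];
console.log(showExpr(example)); // prints: \x -> f (g x)
```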

We can now write a program to translate this data structure into JavaScript or another language. Let's choose Lua, because Lua already supports records and tail-call optimization. The conversion program, lua.js, will output the following Lua program:

-- Data.Pair
Data_Pair_FromPair = function(fromPair)
   return {
      fromPair = fromPair,
   }
end
Data_Pair_snd = function(v)
   return (v).snd
end
Data_Pair_fst = function(v)
   return (v).fst
end
Data_Pair_swap = function(p)
   return {
      snd = (Data_Pair_fst)(p),
      fst = (Data_Pair_snd)(p),
   }
end
Data_Pair_fromPairArray = (Data_Pair_FromPair)(function(p)
   return {
      (Data_Pair_fst)(p),
      (Data_Pair_snd)(p),
   }
end)
Data_Pair_fromPair = function(dict)
   return (dict).fromPair
end

This resembles the original PureScript code quite well. :) Of course, this is a very minimal code generator, and doesn't support all the CoreFn syntax (in particular literals and case expressions), but it shows that the implementation is relatively simple.
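To give a feel for how little code the core of such a generator needs, here is a rough sketch of the expression-to-Lua step. This is not lua.js itself, just an illustration under a few assumptions: expressions use only the "Abs", "App" and "Var" shapes shown earlier, a variable carries its qualified name as a single dotted string (the real dump may encode names differently), and the names `mangle`, `toLua` and `toLuaBinding` are invented here.

```javascript
// Turn a qualified name such as Data.Pair.fst into a Lua global name,
// following the Data_Pair_fst convention visible in the output above.
function mangle(qualifiedName) {
  return qualifiedName.replace(/\./g, "_");
}

// Translate an expression into a Lua expression string.
function toLua(expr) {
  const [tag, ...args] = expr;
  switch (tag) {
    case "Var":
      return mangle(args[0]);
    case "Abs":
      // A one-argument lambda becomes a one-argument Lua function.
      return "function(" + args[0] + ") return " + toLua(args[1]) + " end";
    case "App":
      // Parenthesise the callee so that applying a function literal also parses.
      return "(" + toLua(args[0]) + ")(" + toLua(args[1]) + ")";
    default:
      throw new Error("unsupported node: " + tag);
  }
}

// Emit a top-level definition as an assignment to a Lua global.
function toLuaBinding(name, expr) {
  return mangle(name) + " = " + toLua(expr);
}

// Example.first is a hypothetical definition used only for this demo.
const body = ["Abs", "p", ["App", ["Var", "Data.Pair.fst"], ["Var", "p"]]];
console.log(toLuaBinding("Example.first", body));
// prints: Example_first = function(p) return (Data_Pair_fst)(p) end
```

A real backend would also have to rename identifiers that are legal in PureScript but not in Lua, such as names containing a prime.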

Note that, at the time of writing, the CoreFn dump does not include type or line number information. This will probably be added in the future, and will allow for more efficient code generation, and easier implementation of code generation for typed targets like the CLR and the JVM.