Formatters

Dávid Németi edited this page Sep 5, 2013 · 10 revisions

So, we have a grammar that drives the general unparser, which can unparse our structure to text. But we have not yet talked about whitespace, newlines and indentation, i.e. the format of our text representation.

By default, the formatter inserts a space (Formatter.WhiteSpaceBetweenUtokens, actually) between every pair of adjacent utokens, but the result can barely be called a readable format. To produce readable unparsed text you need to write a formatter, that is, implement a class that derives from Sarcasm.Unparsing.Formatter.

You can make your formatter the default formatter for your grammar by overriding the GetUnparseControl method in your grammar and returning your formatter:

protected override UnparseControl GetUnparseControl()
{
    return UnparseControl.Create(
        this,
        new MyFormatter(this),
        new ParenthesizedExpression(Expression, LEFT_PAREN, RIGHT_PAREN)
        );
}

Or you can just set the formatter for the given unparser instance:

unparser.Formatter = new MyFormatter(myGrammar);

Note that the formatter needs the grammar as a constructor parameter.

You can write several formatters for your grammar, but only one of them can be the default. If you want to use one of the others, you have to set it on the unparser. However, you can also implement several formatting variants inside one single formatter, which is the better solution.

Let's suppose you want one formatter that implements e.g. the Allman indent style, and another that implements Kernighan&Ritchie. Of course, both formatters have to do all the other formatting besides the indent style, so the two classes share a lot of code, which could be factored out into a common base class. However, let's suppose that you want other variants too: e.g. one that flattens 'if' hierarchies, and another that does not. Now, if you want to combine them, you would have to write 2*2=4 formatters, which is obviously not a good idea. Instead, you write one single formatter and define two public properties on it, through which you can set the indent style and the 'if' flattening behaviour, respectively.
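The following is a minimal, self-contained sketch of that idea. It does not use Sarcasm at all: ConfigurableFormatterSketch and its two helper methods are hypothetical stand-ins for a class that would really derive from Sarcasm.Unparsing.Formatter. It only shows that two independent properties cover all four combinations with a single class.

```csharp
using System;

public enum IndentStyle { Allman, KernighanAndRitchie }

// Hypothetical stand-in: a real formatter would derive from Sarcasm.Unparsing.Formatter
// and make these decisions inside its overridden methods.
public class ConfigurableFormatterSketch
{
    public IndentStyle IndentStyle { get; set; }
    public bool FlattenIfHierarchy { get; set; }

    // Text placed before "{": a newline for Allman, a space for Kernighan&Ritchie.
    public string BeforeBlockBegin()
    {
        return IndentStyle == IndentStyle.Allman ? "\n" : " ";
    }

    // Text placed between "else" and a directly following "if".
    public string BetweenElseAndIf()
    {
        return FlattenIfHierarchy ? " " : "\n";
    }
}

public static class ConfigurableFormatterDemo
{
    public static void Main()
    {
        // One class, 2*2 = 4 configurations.
        foreach (IndentStyle style in Enum.GetValues(typeof(IndentStyle)))
        {
            foreach (bool flatten in new[] { false, true })
            {
                var formatter = new ConfigurableFormatterSketch
                {
                    IndentStyle = style,
                    FlattenIfHierarchy = flatten
                };
                Console.WriteLine("{0}, flatten={1}", formatter.IndentStyle, formatter.FlattenIfHierarchy);
            }
        }
    }
}
```

Each new option adds one property rather than doubling the number of formatter classes.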

Implementing the formatter

Until now it was enough to define your bnfterm variables inside your grammar constructor, but if you use a formatter, in most cases it needs to access those bnfterms. To make that possible, you should define your bnfterms as public readonly fields/properties in your grammar. To separate the bnfterms from the other public members of the grammar, the common practice is to create a nested class called BnfTerms inside your grammar class, make your bnfterms members of this BnfTerms class, and give your grammar a public readonly member called B of type BnfTerms:

public class GrammarC : Sarcasm.GrammarAst.Grammar<D.Program>
{
    public class BnfTerms
    {
        public readonly BnfiTermRecord<D.Program> Program = new BnfiTermRecord<D.Program>();
        public readonly BnfiTermRecord<D.Function> Function = new BnfiTermRecord<D.Function>();

        public readonly BnfiTermKeyTerm PROGRAM;
        public readonly BnfiTermKeyTerm NAMESPACE;
        public readonly BnfiTermKeyTerm BEGIN;
        public readonly BnfiTermKeyTerm END;

        internal BnfTerms(TerminalFactoryS TerminalFactoryS)
        {
            this.PROGRAM = TerminalFactoryS.CreateKeyTerm("program");
            this.NAMESPACE = TerminalFactoryS.CreateKeyTerm("namespace");
            this.BEGIN = TerminalFactoryS.CreateKeyTerm("{");
            this.END = TerminalFactoryS.CreateKeyTerm("}");
        }
    }

    public readonly BnfTerms B;

    public GrammarC()
        : base(new Domain())
    {
        B = new BnfTerms(new TerminalFactoryS(this));

        this.Root = B.Program;

        B.Program.Rule =
            B.PROGRAM
            + B.Name.BindTo(B.Program, t => t.Name)
            + (B.NAMESPACE + B.NamespaceName).QRef().BindTo(B.Program, t => t.Namespace)
            + B.Function.StarList().BindTo(B.Program, t => t.Functions)
            + B.BEGIN
            + B.Statement.PlusList().BindTo(B.Program, t => t.Body)
            + B.END
            ;

        B.Function.Rule =
            B.Type.QVal().BindTo(B.Function, t => t.ReturnType)
            + B.Name.BindTo(B.Function, t => t.Name)
            + B.LEFT_PAREN
            + B.Parameter.StarList(B.COMMA).BindTo(B.Function, t => t.Parameters)
            + B.RIGHT_PAREN
            + B.BEGIN
            + B.Statement.PlusList().BindTo(B.Function, t => t.Body)
            + B.END
            ;
    }

    public class Formatter : Sarcasm.Unparsing.Formatter
    {
        private readonly BnfTerms B;

        public Formatter(GrammarC grammar)
            : base(grammar)
        {
            this.B = grammar.B;
        }
        
        // inside your formatter you can access your bnfterms easily like this: B.Program, B.PROGRAM, B.BEGIN etc.
    }
}

There are three methods you can override in order to change the behaviour of the default formatter. They deal with UnparsableAst parameters; an UnparsableAst is actually an (astValue, bnfterm) pair for a given astValue that has been placed in the parse tree built by the unparser.

Inserting utokens

The first two methods are responsible for inserting (whitespace) utokens. They return InsertedUtokens, which is basically a list of UtokenInsert utokens (utokens that can be inserted by the formatter) with a priority and a behaviour (see later).

These are the 'UtokenInsert' utokens you can use:

  • UtokenInsert.NewLine()
  • UtokenInsert.EmptyLine()
  • UtokenInsert.Space()
  • UtokenInsert.Tab()
  • UtokenInsert.NoWhitespace()

The first four are self-explanatory; only NoWhitespace might need some explanation: it means that we do not want to insert anything, but want to prevent the default formatter from automatically inserting a space between the specific utokens. The first four are converted to strings only when needed, e.g. if you call AsText() on the utokens. In this case the Formatter.NewLine, Formatter.Space and Formatter.Tab strings are used.
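To make the difference between "nothing inserted" and NoWhitespace concrete, here is a small self-contained model. It is only a sketch and not Sarcasm code: the Join helper and the NoWhitespace constant are hypothetical, with null standing for "the formatter returned nothing, so the default space applies".

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class NoWhitespaceSketch
{
    // Hypothetical marker standing in for UtokenInsert.NoWhitespace():
    // an explicit "insert nothing here".
    public const string NoWhitespace = "";

    // tokens: the content utokens.
    // between[i]: what was inserted between tokens[i] and tokens[i + 1],
    // or null when nothing was returned (the default single space is then used).
    public static string Join(IList<string> tokens, IList<string> between)
    {
        var text = new StringBuilder();
        for (int i = 0; i < tokens.Count; i++)
        {
            text.Append(tokens[i]);
            if (i < tokens.Count - 1)
                text.Append(between[i] ?? " ");
        }
        return text.ToString();
    }

    public static void Main()
    {
        var tokens = new[] { "foo", "(", "x", ")" };

        // No formatter decision at all: default spaces everywhere.
        Console.WriteLine(Join(tokens, new string[] { null, null, null }));

        // Explicit NoWhitespace at every position suppresses the default space.
        Console.WriteLine(Join(tokens, new[] { NoWhitespace, NoWhitespace, NoWhitespace }));
    }
}
```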

GetUtokensAround

By using this you can control what utokens should be inserted to the left and to the right of the given UnparsableAst.

protected override void GetUtokensAround(UnparsableAst target,
    out InsertedUtokens leftInsertedUtokens, out InsertedUtokens rightInsertedUtokens)
{
    base.GetUtokensAround(target, out leftInsertedUtokens, out rightInsertedUtokens);

    // no whitespace around "."
    if (target.BnfTerm == B.DOT)
        leftInsertedUtokens = rightInsertedUtokens = UtokenInsert.NoWhitespace();

    // no whitespace at the left of ")"
    else if (target.BnfTerm == B.RIGHT_PAREN)
        leftInsertedUtokens = UtokenInsert.NoWhitespace();

    // no whitespace at the right of "("
    else if (target.BnfTerm == B.LEFT_PAREN)
        rightInsertedUtokens = UtokenInsert.NoWhitespace();
}

GetUtokensBetween

By using this you can control what utokens should be inserted between the given UnparsableAst parameters. leftTerminalLeaveTarget is the left UnparsableAst, which always has a Terminal in its BnfTerm property, and rightTarget is the right UnparsableAst, which can be a leaf or a non-leaf node in the parse tree.

protected override InsertedUtokens GetUtokensBetween(UnparsableAst leftTerminalLeaveTarget, UnparsableAst rightTarget)
{
    // no whitespace between a nameref and "(" (typically: function call)
    if (leftTerminalLeaveTarget.AstImage != null &&
        leftTerminalLeaveTarget.AstImage.AstValue is DC.NameRef && rightTarget.BnfTerm == B.LEFT_PAREN)
    {
        return UtokenInsert.NoWhitespace();
    }

    // no whitespace between a "Write" keyterm and "("
    else if (leftTerminalLeaveTarget.BnfTerm == B.WRITE && rightTarget.BnfTerm == B.LEFT_PAREN)
        return UtokenInsert.NoWhitespace();

    else
        return base.GetUtokensBetween(leftTerminalLeaveTarget, rightTarget);
}

Priorities and behaviour

These methods are called for each node in the parse tree that is built by the unparser. This means that several inserted utokens can end up side by side.

E.g. if we unparse an expression Add(NumberLiteral(5), NumberLiteral(8)) the following parse tree will be built by the unparser:

Expression_1
    BinaryExpression
        Expression_2
            NumberLiteral_1 "5"
        BinaryOperator
            ADD_OP
        Expression_3
            NumberLiteral_2 "8"

The formatter methods are called for each (bnfterm, astValue) pair, i.e.:

(Expression_1,      Add(NumberLiteral(5), NumberLiteral(8)))
(BinaryExpression,  Add(NumberLiteral(5), NumberLiteral(8)))
(Expression_2,      NumberLiteral(5))
(NumberLiteral_1,   NumberLiteral(5))
...

This is the point where we reach the actual NumberLiteral(5) leaf in the parse tree, and any of the method calls can return InsertedUtokens. So the utoken list would look like this if every method call returned something:

Left    Expression_1
Left    BinaryExpression
Left    Expression_2
Left    NumberLiteral_1
Content NumberLiteral_1 "5"
Right   NumberLiteral_1
Right   Expression_2
Between NumberLiteral_1_BinaryOperator
Left    BinaryOperator
Between NumberLiteral_1_ADD_OP
Left    ADD_OP
Content ADD_OP
Right   ADD_OP
Right   BinaryOperator
Between ADD_OP_Expression_3
Left    Expression_3
Between ADD_OP_NumberLiteral_2
Left    NumberLiteral_2
Content NumberLiteral_2 "8"
Right   NumberLiteral_2
Right   Expression_3
...

So the inserted utokens that are side by side are in the following format: (Right)*((Between)?(Left)?)*
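As a quick sanity check of this pattern, the run of inserted utokens between two Content lines can be encoded as a string and matched with a regular expression. The one-letter encoding (R = Right, B = Between, L = Left) and the IsValid helper are assumptions made only for this sketch; they are not part of Sarcasm.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InsertSequenceSketch
{
    // (Right)*((Between)?(Left)?)* with R = Right, B = Between, L = Left.
    private static readonly Regex Pattern = new Regex("^R*(B?L?)*$");

    public static bool IsValid(string kinds)
    {
        return Pattern.IsMatch(kinds);
    }

    public static void Main()
    {
        // The run between Content NumberLiteral_1 "5" and Content ADD_OP in the listing:
        // Right, Right, Between, Left, Between, Left.
        Console.WriteLine(IsValid("RRBLBL"));

        // A Right after a Left does not fit the pattern.
        Console.WriteLine(IsValid("LR"));
    }
}
```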

Most of the time it is not desirable to keep all the inserted items, so there is a filtering logic inside Sarcasm's formatter that filters out inserted utokens. InsertedUtokens has a Priority (default is 0) and a Behavior (default is Overridable) that you can set to influence this logic. The rules are the following:

  • an InsertedUtokens with higher priority is stronger than one with lower priority
  • outer InsertedUtokens are stronger than inner ones, if they have the same priority
  • Between is stronger than Left, which is stronger than Right, if they have the same priority
  • an InsertedUtokens which is stronger than another will filter out the other one, if the other one's behavior is Overridable
  • if an InsertedUtokens has a NonOverridableSkipThrough behavior then it will be preserved in the sequence, and it will not interfere with the comparisons between the other InsertedUtokens instances
  • if an InsertedUtokens has a NonOverridableSeparator behavior then it will be preserved in the sequence, and it will filter out the InsertedUtokens instances to its left if they are weaker; otherwise the strongest of them remains at its left
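The first four rules can be modelled with a small self-contained sketch. It is not Sarcasm's implementation: InsertModel, InsertKind, Depth and FilterSketch are hypothetical names, only the Overridable case is covered, and the order in which the two tie-breakers are applied (outer/inner before Between/Left/Right) is an assumption, since the rules above do not state it.

```csharp
using System;
using System.Collections.Generic;

// Ordered so that a larger value is stronger: Between > Left > Right.
public enum InsertKind { Right = 0, Left = 1, Between = 2 }

// Hypothetical model of one InsertedUtokens with Overridable behavior.
public sealed class InsertModel
{
    public string Name { get; set; }
    public int Priority { get; set; }   // default 0 in Sarcasm
    public int Depth { get; set; }      // 0 = outermost node in the parse tree
    public InsertKind Kind { get; set; }
}

public static class FilterSketch
{
    // True if a is stronger than b.
    public static bool IsStronger(InsertModel a, InsertModel b)
    {
        if (a.Priority != b.Priority)
            return a.Priority > b.Priority;   // higher priority wins
        if (a.Depth != b.Depth)
            return a.Depth < b.Depth;         // outer beats inner (assumed to be checked first)
        return a.Kind > b.Kind;               // Between > Left > Right
    }

    // With only Overridable items in a run, the strongest one filters out all others.
    public static InsertModel Strongest(IEnumerable<InsertModel> run)
    {
        InsertModel best = null;
        foreach (var candidate in run)
        {
            if (best == null || IsStronger(candidate, best))
                best = candidate;
        }
        return best;
    }

    public static void Main()
    {
        var run = new List<InsertModel>
        {
            new InsertModel { Name = "Right NumberLiteral_1", Priority = 0, Depth = 3, Kind = InsertKind.Right },
            new InsertModel { Name = "Between NumberLiteral_1_ADD_OP", Priority = 10, Depth = 3, Kind = InsertKind.Between },
            new InsertModel { Name = "Left ADD_OP", Priority = 0, Depth = 3, Kind = InsertKind.Left }
        };
        Console.WriteLine(Strongest(run).Name);
    }
}
```

In this run the Between item survives because of its priority of 10; with equal priorities and depths it would still win, as Between is stronger than Left and Right.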

The SetPriority and SetBehavior methods can be used both on a single UtokenInsert and on an InsertedUtokens (which consists of several UtokenInserts):

return UtokenInsert.Space().SetPriority(10);
return UtokenInsert.Space().SetBehavior(Behavior.NonOverridableSkipThrough);
return UtokenInsert.Space().SetPriority(10).SetBehavior(Behavior.NonOverridableSkipThrough);

Control indentation

The third method is responsible for the indentation. It returns a BlockIndentation, which can be any of the following:

  • BlockIndentation.Indent
  • BlockIndentation.Unindent
  • BlockIndentation.ZeroIndent

GetBlockIndentation

By using it you can control the indentation of the current UnparsableAst and its descendants.

Note that the effect of a new Indent or Unindent is relative to the "current indentation" already in effect, i.e. BlockIndentation.Indent and BlockIndentation.Unindent add up. (So a new BlockIndentation.Indent on an UnparsableAst, assuming that one of its ancestors already did a BlockIndentation.Indent, means that the current UnparsableAst will be indented by two units.)

On the other hand, BlockIndentation.ZeroIndent is absolute: the indentation will be zero regardless of the ancestors' indentation.

protected override BlockIndentation GetBlockIndentation(UnparsableAst leftTerminalLeaveIfAny, UnparsableAst target)
{
    // indent statement
    if (target.BnfTerm == B.Statement)
        return BlockIndentation.Indent;

    else
        return base.GetBlockIndentation(leftTerminalLeaveIfAny, target);
}
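The relative versus absolute behaviour described above can be modelled with a short self-contained sketch. IndentModel and IndentSketch are hypothetical names, not Sarcasm types; the None value stands for "this node does not change the indentation", and treating an Indent below a ZeroIndent as counting from zero again is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of the BlockIndentation choices; None means "inherit unchanged".
public enum IndentModel { None, Indent, Unindent, ZeroIndent }

public static class IndentSketch
{
    // chain: the indentation decisions from the root of the parse tree down to
    // the current node (root first, current node last).
    public static int EffectiveLevel(IEnumerable<IndentModel> chain)
    {
        int level = 0;
        foreach (var step in chain)
        {
            switch (step)
            {
                case IndentModel.Indent:
                    level += 1;   // relative: one unit deeper than the ancestors
                    break;
                case IndentModel.Unindent:
                    level -= 1;   // relative: one unit shallower than the ancestors
                    break;
                case IndentModel.ZeroIndent:
                    level = 0;    // absolute: ignores the ancestors
                    break;
            }
        }
        return level;
    }

    public static void Main()
    {
        // An ancestor indents, an intermediate node does nothing, the current node indents again.
        Console.WriteLine(EffectiveLevel(new[] { IndentModel.Indent, IndentModel.None, IndentModel.Indent }));

        // ZeroIndent on the current node cancels the ancestors' indentation.
        Console.WriteLine(EffectiveLevel(new[] { IndentModel.Indent, IndentModel.Indent, IndentModel.ZeroIndent }));
    }
}
```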

Example

Suppose that we want to implement the Allman, Whitesmiths, Stroustrup and Kernighan&Ritchie indent styles:

// Allman
if (i == 0)
{
    return 0;
}
else if (i == 1)
{
    return 1;
}
else
{
    return 2;
}

// Whitesmiths
if (i == 0)
    {
    return 0;
    }
else if (i == 1)
    {
    return 1;
    }
else
    {
    return 2;
    }

// Stroustrup
if (i == 0) {
    return 0;
}
else if (i == 1) {
    return 1;
}
else {
    return 2;
}

// Kernighan&Ritchie
if (i == 0) {
    return 0;
} else if (i == 1) {
    return 1;
} else {
    return 2;
}

First we have to define an enum for it:

public enum IndentStyle { Allman, Whitesmiths, Stroustrup, KernighanAndRitchie }

Then a public property inside our formatter:

public IndentStyle IndentStyle { get; set; }

Then implement the formatter methods:

protected override void GetUtokensAround(UnparsableAst target,
    out InsertedUtokens leftInsertedUtokens, out InsertedUtokens rightInsertedUtokens)
{
    base.GetUtokensAround(target, out leftInsertedUtokens, out rightInsertedUtokens);

    if (target.BnfTerm == B.Statement)
        leftInsertedUtokens = rightInsertedUtokens = UtokenInsert.NewLine();

    else if (target.BnfTerm == B.BEGIN)
    {
        rightInsertedUtokens = UtokenInsert.NewLine();

        if (IndentStyle.EqualToAny(GrammarC.IndentStyle.Allman, GrammarC.IndentStyle.Whitesmiths))
            leftInsertedUtokens = UtokenInsert.NewLine();
        else if (IndentStyle.EqualToAny(GrammarC.IndentStyle.Stroustrup, GrammarC.IndentStyle.KernighanAndRitchie))
            leftInsertedUtokens = UtokenInsert.Space();
    }

    else if (target.BnfTerm == B.END)
        leftInsertedUtokens = rightInsertedUtokens = UtokenInsert.NewLine();
}

protected override InsertedUtokens GetUtokensBetween(UnparsableAst leftTerminalLeaveTarget, UnparsableAst rightTarget)
{
    if (rightTarget.BnfTerm == B.ELSE && leftTerminalLeaveTarget.BnfTerm == B.END &&
        IndentStyle == GrammarC.IndentStyle.KernighanAndRitchie)
    {
        return UtokenInsert.Space().SetPriority(10);
    }

    else
        return base.GetUtokensBetween(leftTerminalLeaveTarget, rightTarget);
}

protected override BlockIndentation GetBlockIndentation(UnparsableAst leftTerminalLeaveIfAny, UnparsableAst target)
{
    // in MiniPL a statement list is a statement itself with statements inside, so we have to avoid double indentation here
    if (target.BnfTerm == B.Statement && !(target.AstValue is D.StatementList))
        return BlockIndentation.Indent;

    else if (IndentStyle == GrammarC.IndentStyle.Whitesmiths && target.BnfTerm.EqualToAny(B.BEGIN, B.END))
        return BlockIndentation.Indent;

    else
        return base.GetBlockIndentation(leftTerminalLeaveIfAny, target);
}

Example 2

Suppose that we want to flatten 'if' hierarchies, as is common in editors:

// without flattening (because the second 'if-else' is actually inside the 'else' body of the first 'if-else'):
if (i == 0)
    return 0;
else
    if (i == 1)
        return 1;
    else
        return 2;

// with flattening:
if (i == 0)
    return 0;
else if (i == 1)
    return 1;
else
    return 2;

First we have to define a public property inside our formatter:

public bool FlattenIfHierarchy { get; set; }

Then implement the formatter methods:

protected override void GetUtokensAround(UnparsableAst target,
    out InsertedUtokens leftInsertedUtokens, out InsertedUtokens rightInsertedUtokens)
{
    base.GetUtokensAround(target, out leftInsertedUtokens, out rightInsertedUtokens);

    if (target.BnfTerm == B.Statement)
        leftInsertedUtokens = rightInsertedUtokens = UtokenInsert.NewLine();

    else if (target.BnfTerm == B.BEGIN)
        rightInsertedUtokens = UtokenInsert.NewLine();

    else if (target.BnfTerm == B.END)
        leftInsertedUtokens = rightInsertedUtokens = UtokenInsert.NewLine();
}

protected override InsertedUtokens GetUtokensBetween(UnparsableAst leftTerminalLeaveTarget, UnparsableAst rightTarget)
{
    if (FlattenIfHierarchy && leftTerminalLeaveTarget.BnfTerm == B.ELSE && rightTarget.BnfTerm == B.If)
        return UtokenInsert.Space().SetPriority(10);

    else
        return base.GetUtokensBetween(leftTerminalLeaveTarget, rightTarget);
}

protected override BlockIndentation GetBlockIndentation(UnparsableAst leftTerminalLeaveIfAny, UnparsableAst target)
{
    if (target.BnfTerm == B.Statement && !(target.AstValue is D.StatementList))
        return BlockIndentation.Indent;

    else if (target.BnfTerm.EqualToAny(B.BEGIN, B.END))
        return BlockIndentation.Indent;

    else if (FlattenIfHierarchy && leftTerminalLeaveIfAny != null &&
        leftTerminalLeaveIfAny.BnfTerm == B.ELSE && target.BnfTerm == B.If)
    {
        return BlockIndentation.Unindent;
    }

    else
        return base.GetBlockIndentation(leftTerminalLeaveIfAny, target);
}

Of course, as mentioned before, these two formatters can be combined. To see an example, look at the formatter in MiniPL's GrammarC.

If you are interested in how to do advanced syntax highlighting, continue with Decorations.