Formatters
So, we have a grammar that drives the general unparser, which can unparse our structure to text. But we have not yet talked about whitespace, newlines and indentation, i.e. the format of our text representation.
By default, the formatter inserts a space (Formatter.WhiteSpaceBetweenUtokens, actually) between every pair of utokens, which can barely be called a readable format. To produce readable unparsed text you need to write a formatter, i.e. implement a class that derives from Sarcasm.Unparsing.Formatter.
You can make your formatter the default formatter for your grammar by overriding the GetUnparseControl method in your grammar and returning your formatter:
protected override UnparseControl GetUnparseControl()
{
    return UnparseControl.Create(
        this,
        new MyFormatter(this),
        new ParenthesizedExpression(Expression, LEFT_PAREN, RIGHT_PAREN)
    );
}
Or you can just set the formatter for the given unparser instance:
unparser.Formatter = new MyFormatter(myGrammar);
Note that the formatter needs the grammar as a constructor parameter.
You can write several formatters for your grammar, but only one can be the default. If you want to use one of the others, you have to set it on the unparser. However, you can also implement several formatting variants inside one single formatter, which is usually the better solution.
Let's suppose you want a formatter that implements e.g. the Allman indent style, and another that implements Kernighan&Ritchie. Of course, both formatters have to do all the other formatting besides the indent style, so the two formatter classes share a lot of logic, which could be factored out into a common base class. However, let's suppose that you want other variants too: e.g. one that flattens if hierarchies, and another that does not. Now, if you want to combine your formatters, you would have to write 2*2=4 formatters, which is obviously not a good idea. Instead, you write one single formatter and define two public properties on it, through which you can set the indent style and the 'if' flattening behaviour, respectively.
Until now it was enough to define your bnfterm variables inside your grammar's constructor, but if you use a formatter, in most cases you need to access those bnfterms. In order to do that you should define your bnfterms as public readonly fields/properties in your grammar. To separate your bnfterms from the other public members of the grammar, the common practice is to create a separate class called BnfTerms inside your grammar class, make your bnfterms members of this BnfTerms class, and give your grammar a public readonly member called B of type BnfTerms:
public class GrammarC : Sarcasm.GrammarAst.Grammar<D.Program>
{
    public class BnfTerms
    {
        public readonly BnfiTermRecord<D.Program> Program = new BnfiTermRecord<D.Program>();
        public readonly BnfiTermRecord<D.Function> Function = new BnfiTermRecord<D.Function>();
        public readonly BnfiTermKeyTerm PROGRAM;
        public readonly BnfiTermKeyTerm NAMESPACE;
        public readonly BnfiTermKeyTerm BEGIN;
        public readonly BnfiTermKeyTerm END;
        internal BnfTerms(TerminalFactoryS TerminalFactoryS)
        {
            this.PROGRAM = TerminalFactoryS.CreateKeyTerm("program");
            this.NAMESPACE = TerminalFactoryS.CreateKeyTerm("namespace");
            this.BEGIN = TerminalFactoryS.CreateKeyTerm("{");
            this.END = TerminalFactoryS.CreateKeyTerm("}");
        }
    }
    public readonly BnfTerms B;
    public GrammarC()
        : base(new Domain())
    {
        B = new BnfTerms(new TerminalFactoryS(this));
        this.Root = B.Program;
        B.Program.Rule =
            B.PROGRAM
            + B.Name.BindTo(B.Program, t => t.Name)
            + (B.NAMESPACE + B.NamespaceName).QRef().BindTo(B.Program, t => t.Namespace)
            + B.Function.StarList().BindTo(B.Program, t => t.Functions)
            + B.BEGIN
            + B.Statement.PlusList().BindTo(B.Program, t => t.Body)
            + B.END
            ;
        B.Function.Rule =
            B.Type.QVal().BindTo(B.Function, t => t.ReturnType)
            + B.Name.BindTo(B.Function, t => t.Name)
            + B.LEFT_PAREN
            + B.Parameter.StarList(B.COMMA).BindTo(B.Function, t => t.Parameters)
            + B.RIGHT_PAREN
            + B.BEGIN
            + B.Statement.PlusList().BindTo(B.Function, t => t.Body)
            + B.END
            ;
    }
    public class Formatter : Sarcasm.Unparsing.Formatter
    {
        private readonly BnfTerms B;
        public Formatter(GrammarC grammar)
            : base(grammar)
        {
            this.B = grammar.B;
        }
        // inside your formatter you can access your bnfterms easily like this: B.Program, B.PROGRAM, B.BEGIN etc.
    }
}
There are three methods you can override to change the behaviour of the default formatter. They take UnparsableAst parameters; an UnparsableAst is an (astValue, bnfterm) pair for a given astValue that has been placed in the parse tree built by the unparser.
The first two methods are responsible for inserting (whitespace) utokens. They return InsertedUtokens, which is basically a list of UtokenInsert utokens (utokens that can be inserted by the formatter) with a priority and a behavior (see later).
These are the UtokenInsert utokens you can use:
UtokenInsert.NewLine()
UtokenInsert.EmptyLine()
UtokenInsert.Space()
UtokenInsert.Tab()
UtokenInsert.NoWhitespace()
The first four are self-explanatory; only NoWhitespace might need some explanation: it means that we do not want to insert anything, but want to prevent the default formatter from automatically inserting a space between the specific utokens. The first four are converted to strings only when needed, e.g. if you call AsText() on the utokens. In that case the Formatter.NewLine, Formatter.Space and Formatter.Tab strings are used.
By overriding GetUtokensAround you can control what utokens should be inserted to the left and to the right of the given UnparsableAst.
protected override void GetUtokensAround(UnparsableAst target,
    out InsertedUtokens leftInsertedUtokens, out InsertedUtokens rightInsertedUtokens)
{
    base.GetUtokensAround(target, out leftInsertedUtokens, out rightInsertedUtokens);
    // no whitespace around "."
    if (target.BnfTerm == B.DOT)
        leftInsertedUtokens = rightInsertedUtokens = UtokenInsert.NoWhitespace();
    // no whitespace at the left of ")"
    else if (target.BnfTerm == B.RIGHT_PAREN)
        leftInsertedUtokens = UtokenInsert.NoWhitespace();
    // no whitespace at the right of "("
    else if (target.BnfTerm == B.LEFT_PAREN)
        rightInsertedUtokens = UtokenInsert.NoWhitespace();
}
By overriding GetUtokensBetween you can control what utokens should be inserted between the given UnparsableAst parameters. leftTerminalLeaveTarget is the left UnparsableAst, which has a Terminal in its BnfTerm property, and rightTarget is the right UnparsableAst, which can be a leaf or a non-leaf node in the parse tree.
protected override InsertedUtokens GetUtokensBetween(UnparsableAst leftTerminalLeaveTarget, UnparsableAst rightTarget)
{
    // no whitespace between a nameref and "(" (typically: function call)
    if (leftTerminalLeaveTarget.AstImage != null &&
        leftTerminalLeaveTarget.AstImage.AstValue is DC.NameRef && rightTarget.BnfTerm == B.LEFT_PAREN)
    {
        return UtokenInsert.NoWhitespace();
    }
    // no whitespace between a "Write" keyterm and "("
    else if (leftTerminalLeaveTarget.BnfTerm == B.WRITE && rightTarget.BnfTerm == B.LEFT_PAREN)
        return UtokenInsert.NoWhitespace();
    else
        return base.GetUtokensBetween(leftTerminalLeaveTarget, rightTarget);
}
These methods are called for each node in the parse tree built by the unparser, so several inserted utokens can end up side by side.
E.g. if we unparse the expression Add(NumberLiteral(5), NumberLiteral(8)), the unparser builds the following parse tree:
Expression_1
    BinaryExpression
        Expression_2
            NumberLiteral_1 "5"
        BinaryOperator
            ADD_OP
        Expression_3
            NumberLiteral_2 "8"
The formatter methods are called for each (bnfterm, astValue) pair, i.e.:
(Expression_1, Add(NumberLiteral(5), NumberLiteral(8)))
(BinaryExpression, Add(NumberLiteral(5), NumberLiteral(8)))
(Expression_2, NumberLiteral(5))
(NumberLiteral_1, NumberLiteral(5))
...
This is the point where we reach the actual NumberLiteral(5) leaf in the parse tree, and any of the method calls can return InsertedUtokens. So the utoken list would look like this if every method returned something:
Left Expression_1
Left BinaryExpression
Left Expression_2
Left NumberLiteral_1
Content NumberLiteral_1 "5"
Right NumberLiteral_1
Right Expression_2
Between NumberLiteral_1_BinaryOperator
Left BinaryOperator
Between NumberLiteral_1_ADD_OP
Left ADD_OP
Content ADD_OP
Right ADD_OP
Right BinaryOperator
Between ADD_OP_Expression_3
Left Expression_3
Between ADD_OP_NumberLiteral_2
Left NumberLiteral_2
Content NumberLiteral_2 "8"
Right NumberLiteral_2
Right Expression_3
...
So a run of inserted utokens that are side by side (i.e. not separated by content) always has the following form: (Right)*((Between)?(Left)?)*
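As a quick sanity check of that pattern, the following minimal sketch encodes each inserted item by its first letter and matches the run against the same regular expression. It uses only the .NET standard library; InsertedRunDemo and IsValidRun are hypothetical names introduced for this illustration and are not part of Sarcasm.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not a Sarcasm type.
public static class InsertedRunDemo
{
    // R = Right, B = Between, L = Left; same pattern as in the text, anchored at both ends.
    private static readonly Regex RunPattern = new Regex(@"^(R)*((B)?(L)?)*$");

    // Encodes each kind by its first letter and checks the whole run against the pattern.
    public static bool IsValidRun(params string[] kinds)
    {
        string encoded = string.Concat(kinds.Select(k => k.Substring(0, 1)));
        return RunPattern.IsMatch(encoded);
    }

    public static void Main()
    {
        // The run between the content utokens "5" and ADD_OP in the listing above.
        Console.WriteLine(IsValidRun("Right", "Right", "Between", "Left", "Between", "Left")); // True

        // A Right item can never follow a Left item inside one run.
        Console.WriteLine(IsValidRun("Left", "Right")); // False
    }
}
```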
Most of the time it is not desirable to keep all the inserted items, so there is filtering logic inside Sarcasm's formatter that filters out inserted utokens. InsertedUtokens has a Priority (default is 0) and a Behavior (default is Overridable) with which you can influence this logic. The rules are as follows:
- an InsertedUtokens with higher priority is stronger than one with lower priority
- outer InsertedUtokens are stronger than inner ones, if they have the same priority
- Between is stronger than Left, which is stronger than Right, if they have the same priority
- an InsertedUtokens which is stronger than another will filter out the other one, if the other one's behavior is Overridable
- if an InsertedUtokens has a NonOverridableSkipThrough behavior then it will be preserved in the sequence, and it will not interfere with the comparisons between other InsertedUtokens
- if an InsertedUtokens has a NonOverridableSeparator behavior then it will be preserved in the sequence, and it will filter out the InsertedUtokens at its left if they are weaker, or will have the strongest one at its left
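To make the "stronger than" ordering more concrete, here is a deliberately simplified model of two of the rules above: priority first, then Between over Left over Right. It is only a sketch, not Sarcasm's actual implementation: the types Kind, Inserted and FilterSketch are hypothetical stand-ins, every item is assumed to be Overridable, and the outer/inner rule and the two NonOverridable behaviors are left out entirely.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for illustration; these are not Sarcasm types.
public enum Kind { Right = 0, Left = 1, Between = 2 } // ordered from weakest to strongest

public sealed class Inserted
{
    public string Text { get; }
    public Kind Kind { get; }
    public int Priority { get; }

    public Inserted(string text, Kind kind, int priority = 0)
    {
        Text = text;
        Kind = kind;
        Priority = priority;
    }
}

public static class FilterSketch
{
    // Returns the strongest item of a run: higher priority wins,
    // and on equal priority Between beats Left, which beats Right.
    public static Inserted Strongest(IEnumerable<Inserted> run)
    {
        return run
            .OrderByDescending(i => i.Priority)
            .ThenByDescending(i => (int)i.Kind)
            .First();
    }

    public static void Main()
    {
        var run = new[]
        {
            new Inserted("\n", Kind.Right),       // e.g. newline to the right of "}"
            new Inserted(" ", Kind.Between, 10),  // e.g. space between "}" and "else"
            new Inserted("\n", Kind.Left),        // e.g. newline to the left of "else"
        };
        Inserted winner = Strongest(run);
        Console.WriteLine(winner.Kind + " with priority " + winner.Priority); // Between with priority 10
    }
}
```

In this model the space with priority 10 survives and both newlines are dropped, which is the effect a raised priority is meant to achieve.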
Both for a single UtokenInsert and for an InsertedUtokens (which consists of several UtokenInserts), the SetPriority and SetBehavior methods can be used:
return UtokenInsert.Space().SetPriority(10);
return UtokenInsert.Space().SetBehavior(Behavior.NonOverridableSkipThrough);
return UtokenInsert.Space().SetPriority(10).SetBehavior(Behavior.NonOverridableSkipThrough);
The third method is responsible for the indentation. It returns a BlockIndentation, which can be one of the following:
BlockIndentation.Indent
BlockIndentation.Unindent
BlockIndentation.ZeroIndent
By overriding GetBlockIndentation you can control the indentation of the current UnparsableAst and its descendants.
Note that the effect of a new Indent or Unindent is relative to the "current indentation" already in effect, i.e. BlockIndentation.Indent and BlockIndentation.Unindent add up. (So a new BlockIndentation.Indent on an UnparsableAst, assuming that one of its ancestors already did a BlockIndentation.Indent, means that the current UnparsableAst will be indented by two units.)
On the other hand, BlockIndentation.ZeroIndent is absolute: the indentation will be zero regardless of the ancestors' indentation.
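The relative versus absolute behaviour can be pictured with a small stand-alone model. This is only a sketch under assumptions: IndentStep and IndentSketch are hypothetical names (not Sarcasm's BlockIndentation), a step of None stands for a node that does not change the indentation, and the model simply walks the path from the root of the tree down to the current node.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for illustration; this is not Sarcasm's BlockIndentation.
public enum IndentStep { None, Indent, Unindent, ZeroIndent }

public static class IndentSketch
{
    // Walks from the root down to the current node and returns the resulting indent level.
    public static int EffectiveLevel(IEnumerable<IndentStep> pathFromRoot)
    {
        int level = 0;
        foreach (IndentStep step in pathFromRoot)
        {
            switch (step)
            {
                case IndentStep.Indent: level += 1; break;     // relative: one unit deeper
                case IndentStep.Unindent: level -= 1; break;   // relative: one unit shallower
                case IndentStep.ZeroIndent: level = 0; break;  // absolute: ancestors are ignored
                case IndentStep.None: break;                   // no change
            }
        }
        return level;
    }

    public static void Main()
    {
        // An ancestor indents and the current node indents again: two units.
        Console.WriteLine(EffectiveLevel(new[] { IndentStep.Indent, IndentStep.None, IndentStep.Indent })); // 2

        // Indent and Unindent cancel each other out.
        Console.WriteLine(EffectiveLevel(new[] { IndentStep.Indent, IndentStep.Unindent })); // 0

        // ZeroIndent on the current node wins over any number of indenting ancestors.
        Console.WriteLine(EffectiveLevel(new[] { IndentStep.Indent, IndentStep.Indent, IndentStep.ZeroIndent })); // 0
    }
}
```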
protected override BlockIndentation GetBlockIndentation(UnparsableAst leftTerminalLeaveIfAny, UnparsableAst target)
{
    // indent statement
    if (target.BnfTerm == B.Statement)
        return BlockIndentation.Indent;
    else
        return base.GetBlockIndentation(leftTerminalLeaveIfAny, target);
}
Suppose that we want to implement the Allman, Whitesmiths, Stroustrup and Kernighan&Ritchie indent styles:
// Allman
if (i == 0)
{
    return 0;
}
else if (i == 1)
{
    return 1;
}
else
{
    return 2;
}
// Whitesmiths
if (i == 0)
    {
    return 0;
    }
else if (i == 1)
    {
    return 1;
    }
else
    {
    return 2;
    }
// Stroustrup
if (i == 0) {
    return 0;
}
else if (i == 1) {
    return 1;
}
else {
    return 2;
}
// Kernighan&Ritchie
if (i == 0) {
    return 0;
} else if (i == 1) {
    return 1;
} else {
    return 2;
}
First we have to define an enum for it:
public enum IndentStyle { Allman, Whitesmiths, Stroustrup, KernighanAndRitchie }
Then a public property inside our formatter:
public IndentStyle IndentStyle { get; set; }
Then implement the formatter methods:
protected override void GetUtokensAround(UnparsableAst target,
    out InsertedUtokens leftInsertedUtokens, out InsertedUtokens rightInsertedUtokens)
{
    base.GetUtokensAround(target, out leftInsertedUtokens, out rightInsertedUtokens);
    if (target.BnfTerm == B.Statement)
        leftInsertedUtokens = rightInsertedUtokens = UtokenInsert.NewLine();
    else if (target.BnfTerm == B.BEGIN)
    {
        rightInsertedUtokens = UtokenInsert.NewLine();
        if (IndentStyle.EqualToAny(GrammarC.IndentStyle.Allman, GrammarC.IndentStyle.Whitesmiths))
            leftInsertedUtokens = UtokenInsert.NewLine();
        else if (IndentStyle.EqualToAny(GrammarC.IndentStyle.Stroustrup, GrammarC.IndentStyle.KernighanAndRitchie))
            leftInsertedUtokens = UtokenInsert.Space();
    }
    else if (target.BnfTerm == B.END)
        leftInsertedUtokens = rightInsertedUtokens = UtokenInsert.NewLine();
}
protected override InsertedUtokens GetUtokensBetween(UnparsableAst leftTerminalLeaveTarget, UnparsableAst rightTarget)
{
    if (rightTarget.BnfTerm == B.ELSE && leftTerminalLeaveTarget.BnfTerm == B.END &&
        IndentStyle == GrammarC.IndentStyle.KernighanAndRitchie)
    {
        return UtokenInsert.Space().SetPriority(10);
    }
    else
        return base.GetUtokensBetween(leftTerminalLeaveTarget, rightTarget);
}
protected override BlockIndentation GetBlockIndentation(UnparsableAst leftTerminalLeaveIfAny, UnparsableAst target)
{
    // in MiniPL a statement list is a statement itself with statements inside, so we have to avoid double indentation here
    if (target.BnfTerm == B.Statement && !(target.AstValue is D.StatementList))
        return BlockIndentation.Indent;
    else if (IndentStyle == GrammarC.IndentStyle.Whitesmiths && target.BnfTerm.EqualToAny(B.BEGIN, B.END))
        return BlockIndentation.Indent;
    else
        return base.GetBlockIndentation(leftTerminalLeaveIfAny, target);
}
Suppose that we want to flatten if hierarchies, as is common in editors:
// without flattening (because the second 'if-else' is actually inside the 'else' body of the first 'if-else'):
if (i == 0)
    return 0;
else
    if (i == 1)
        return 1;
    else
        return 2;
// with flattening:
if (i == 0)
    return 0;
else if (i == 1)
    return 1;
else
    return 2;
First we have to define a public property inside our formatter:
public bool FlattenIfHierarchy { get; set; }
Then implement the formatter methods:
protected override void GetUtokensAround(UnparsableAst target,
    out InsertedUtokens leftInsertedUtokens, out InsertedUtokens rightInsertedUtokens)
{
    base.GetUtokensAround(target, out leftInsertedUtokens, out rightInsertedUtokens);
    if (target.BnfTerm == B.Statement)
        leftInsertedUtokens = rightInsertedUtokens = UtokenInsert.NewLine();
    else if (target.BnfTerm == B.BEGIN)
        rightInsertedUtokens = UtokenInsert.NewLine();
    else if (target.BnfTerm == B.END)
        leftInsertedUtokens = rightInsertedUtokens = UtokenInsert.NewLine();
}
protected override InsertedUtokens GetUtokensBetween(UnparsableAst leftTerminalLeaveTarget, UnparsableAst rightTarget)
{
    if (FlattenIfHierarchy && leftTerminalLeaveTarget.BnfTerm == B.ELSE && rightTarget.BnfTerm == B.If)
        return UtokenInsert.Space().SetPriority(10);
    else
        return base.GetUtokensBetween(leftTerminalLeaveTarget, rightTarget);
}
protected override BlockIndentation GetBlockIndentation(UnparsableAst leftTerminalLeaveIfAny, UnparsableAst target)
{
    if (target.BnfTerm == B.Statement && !(target.AstValue is D.StatementList))
        return BlockIndentation.Indent;
    else if (target.BnfTerm.EqualToAny(B.BEGIN, B.END))
        return BlockIndentation.Indent;
    else if (FlattenIfHierarchy && leftTerminalLeaveIfAny != null &&
        leftTerminalLeaveIfAny.BnfTerm == B.ELSE && target.BnfTerm == B.If)
    {
        return BlockIndentation.Unindent;
    }
    else
        return base.GetBlockIndentation(leftTerminalLeaveIfAny, target);
}
Of course, as mentioned before, these two formatters can be combined into one. For an example, look at the formatter in MiniPL's GrammarC.
If you are interested in how to do advanced syntax highlighting, continue with Decorations.