Significant Whitespace Parsing

John Gietzen edited this page Mar 30, 2019 · 8 revisions

Significant-whitespace parsing is possible in Pegasus by using state code regions (#{ }).

The principle is that you can make a rule (called INDENTATION below) that matches an exact number of spaces. Other rules change that number as the parser enters and exits them, which changes what INDENTATION will match.

The gist of the idea is:

program
  = #{ state["Indentation"] = 0; } otherRules

INDENTATION
  = spaces:" "* &{ spaces.Count == state["Indentation"] }

INDENT
  = #{ state["Indentation"] += 4; }

UNDENT
  = #{ state["Indentation"] -= 4; }
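
To make the predicate concrete, here is a minimal C# sketch of the check that INDENTATION performs, written outside of Pegasus. The class and method names (IndentationSketch, CountLeadingSpaces, Matches) are hypothetical and exist only for illustration. In the grammar itself, the count comes from spaces.Count and the expected value from state["Indentation"].

```csharp
public static class IndentationSketch
{
    // Mirrors `spaces:" "*`: greedily consume every leading space character.
    // Tabs are not counted, because the grammar rule only matches " ".
    public static int CountLeadingSpaces(string line)
    {
        int count = 0;
        while (count < line.Length && line[count] == ' ')
        {
            count++;
        }
        return count;
    }

    // Mirrors `&{ spaces.Count == state["Indentation"] }`: the line matches
    // only when it has exactly the expected number of leading spaces.
    public static bool Matches(string line, int expectedIndentation)
    {
        return CountLeadingSpaces(line) == expectedIndentation;
    }
}
```

Because the comparison is an equality rather than "at least", a line indented by 8 spaces does not match while the expected indentation is 4, and neither does a line with no indentation. This is what lets a block end: once INDENTATION fails, the line+ repetition stops and the enclosing rule continues with UNDENT.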

It would be feasible to use an immutable Stack<int> rather than a simple int in order to allow variable-sized indentation, but (since .NET doesn't have an immutable generic stack out of the box) this has been omitted for simplicity.
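
The immutable-stack variant mentioned above could look roughly like the following C# sketch. It assumes the System.Collections.Immutable package is referenced (it is distributed as a separate NuGet package for .NET Framework) and uses its ImmutableStack<int> type. The helper class IndentationStackSketch and its method names are hypothetical, and the sketch is not wired into a Pegasus grammar. It only shows how each indentation level can be pushed and popped without modifying the stack that an earlier parser state may still hold.

```csharp
using System.Collections.Immutable;

public static class IndentationStackSketch
{
    // Initial state: a single level at column 0, like `state["Indentation"] = 0`.
    public static ImmutableStack<int> Start()
    {
        return ImmutableStack<int>.Empty.Push(0);
    }

    // Entering a block: push the new column. Push returns a NEW stack,
    // so the caller's original stack is left untouched.
    public static ImmutableStack<int> Indent(ImmutableStack<int> levels, int extraSpaces)
    {
        return levels.Push(levels.Peek() + extraSpaces);
    }

    // Leaving a block: Pop also returns a new stack with the top level removed.
    public static ImmutableStack<int> Undent(ImmutableStack<int> levels)
    {
        return levels.Pop();
    }

    // The number of spaces INDENTATION would have to match at this point.
    public static int Current(ImmutableStack<int> levels)
    {
        return levels.Peek();
    }
}
```

For example, indenting Start() by 2 and then by 3 gives a stack whose Current value is 5. Calling Undent on it gives 2, and the stack originally returned by Start() still reports 0. Unlike the fixed += 4 and -= 4 steps, each block can therefore choose its own width, and UNDENT does not need to know how wide the block was.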


Here is a working prototype of significant whitespace parsing:

Significant.peg:

@namespace PegExamples
@classname SignificantWhitespaceParser
@using Pegasus.Common
@using System.Linq

program <object>
  = #{ state["Indentation"] = 0; } s:statements eof { s }

statements
  = line+

line <object>
  = INDENTATION s:statement { s }

statement <object>
  = s:simpleStatement eol { s }
  / "if" _ n:name _? ":" eol INDENT s:statements UNDENT { new { Condition = n, Statements = s } }
  / "def" _ n:name _? ":" eol INDENT s:statements UNDENT { new { Name = n, Statements = s } }

simpleStatement <object>
  = a:name _? "=" _? b:name { new { LValue = a, Expression = b } }

name
  = n:([a-zA-Z] [a-zA-Z0-9]*) { n }

_ = [ \t]+

eol = _? comment? ("\r\n" / "\n\r" / "\r" / "\n" / eof)

comment = "//" [^\r\n]*

eof = !.

INDENTATION
  = spaces:" "* &{ spaces.Count == state["Indentation"] }

INDENT
  = #{ state["Indentation"] += 4; }

UNDENT
  = #{ state["Indentation"] -= 4; }

Program.cs:

using System;
using System.IO;
using Newtonsoft.Json;
using PegExamples; // namespace declared by @namespace in Significant.peg

namespace PegTest
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            try
            {
                var result = new SignificantWhitespaceParser().Parse(File.ReadAllText("test.txt"));
                Console.WriteLine(JsonConvert.SerializeObject(result, Formatting.Indented));
            }
            catch (FormatException ex)
            {
                Console.WriteLine(ex.Message);
            }

            Console.ReadKey(true);
        }
    }
}

test.txt:

a = b
if a:
    a = b
    if q:
        a = z
        d = f
    b = c
    def q:
        a = c
c = d