John Gietzen edited this page Aug 5, 2016 · 18 revisions

This is a guide to the basic syntax of Pegasus. For more advanced topics, see the "How Do I... ?" article.

Grammar

A Pegasus grammar consists of a text file with two sections, in order:

  1. The "Settings" section.
  2. The "Rules" section.

Settings

Settings are specified in one of three ways:

  • @setting value: For simple values, just write the setting value out. This is parsed as a type name.
  • @setting { value }: For more complex values, wrap the setting value in curly braces. This is parsed as a code section.
  • @setting "value": As an alternative to curly braces, the value can be written as a string.

Supported settings

  • @namespace: Specifies the namespace in which the parser class will be placed.
  • @accessibility: Specifies the accessibility of the generated class.
  • @classname: Specifies the name of the generated class.
  • @resources: Specifies the resources class to be used for resource-based strings.
  • @start: Specifies the starting rule. Defaults to the first rule in the grammar.
  • @using: Adds a using directive to the generated class file. Multiple @using settings are allowed.
  • @members: Allows for the definition of additional class members.

Combined Example

@namespace MyProject.Parsers
@accessibility internal
@classname MyParser
@resources MyProject.Properties.Resources
@start startingRule
@using System.Linq
@using { Foo = System.String }
@members
{
    private static bool HelperFunction()
    {
        // Placeholder body so that the example compiles.
        return true;
    }
}

Rules

Basic Syntax

The basic syntax of a rule is:

name = expression

Rule Types

By default, rules infer their return type. For sequence expressions this is string, but this can be modified by specifying a type for the rule, like so:

name <type> = expression { ... }
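
As a rough illustration of a typed rule, the sketch below turns a decimal number into a double. The rule name decimal is chosen for this example only. The label value captures the text matched by the parenthesized sequence as a string, and the code expression converts it.

```
// Hypothetical rule: parses text such as 3.14 into a double.
decimal <double>
    = value:([0-9]+ ("." [0-9]+)?) { double.Parse(value) }
```

Without the <double> annotation, a sequence like this would yield the matched text as a string.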

Rule Flags

Rule flags are Boolean settings that are enabled on a per-rule basis. Flags come after the rule type, if there is one:

rule -flag = expression
rule <type> -flag = expression

Supported flags

  • -memoize: Enables memoization for the rule.
  • -lexical: Specifies that the rule should be included in the lexicalElements collection whenever it is successfully parsed.
  • -public: Specifies that this rule will be made a public entry point for the grammar.
  • -export: Specifies that this rule will be included in this grammar's exported rules.
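
A minimal sketch of flags in use, assuming rule names (identifier, boolean) invented for this example. It shows one flag on an untyped rule and one flag placed after a rule type.

```
// Untyped rule with a flag: results are cached per input position.
identifier -memoize
    = [a-zA-Z_] [a-zA-Z0-9_]*

// Typed rule with a flag: the flag follows the type.
boolean <bool> -public
    = "true" { true }
    / "false" { false }
```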

Expressions

Character Matching Expressions

  • String 'foo' or "bar": String expressions match a string literally.
  • Character Class [a-z]: Matches a single character that is within the character class.
  • Wildcard .: Wildcard expressions match any single character.

Strings and character classes can be marked as case-insensitive by suffixing the string or class with the letter i. For example: "foo"i, 'bar'i, [baz]i.

Strings can be read from resources by suffixing the string with the letter r. The string to be parsed is then read from the grammar's resources, specified via the @resources setting described above.
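
The character matching forms above can be sketched as follows. The rule names are hypothetical and exist only to give each expression a home.

```
// String literal; the i suffix makes the match case-insensitive.
selectKeyword = "select"i

// Character class; with the i suffix it also accepts A-F.
hexDigit = [0-9a-f]i

// Wildcard: matches any single character.
anyChar = .
```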

Control Flow Expressions

  • Name a: Name expressions refer to a rule by name.
  • Labeled foo:a: Labeled expressions store a parse result for use in code assertions and expressions.
  • Sequence a b c: Sequence expressions match each component consecutively.
  • Choice a / b / c: Choice expressions provide options for parsing. The alternatives are tried in order, and the first one that matches is used.
  • Assertions !a &b: Assertion expressions act as look-aheads. They only peek at the parsing subject and do not advance the cursor.
  • Code Assertions !{foo} &{bar}: Code assertions are similar to regular assertions. They represent C# code that returns a Boolean value, rather than performing a look-ahead.
  • Repetition a? b+ c* d<3> e<2,> f<1,5>: Repetition expressions allow another expression to be repeated.
  • Delimited Repetition a<0,,",">: Repetition expressions also support a delimiter that will match (and consume) in between each repeated match.
  • Parentheses ( ... ): Parentheses are used to group expressions.
  • Type (<type> ... ): Type expressions allow part of a rule to have a certain return type. This has the same meaning as having a type for a rule, except it is constrained to the expression wrapped by the parentheses.
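
Several of the control flow expressions above can be combined in one small grammar fragment. This is an untested sketch: the rule names (list, item, quoted, word) are invented for illustration, and it assumes that a repetition of string-valued expressions yields an IList<string>.

```
@using System.Collections.Generic

// Sequence with a labeled, delimited repetition:
// zero or more items separated by ",".
list <IList<string>>
    = "[" items:item<0,,","> "]" { items }

// Choice: quoted is tried first, then word.
item <string>
    = quoted
    / word

// Negative assertion: !"'" checks that the next character is not
// a quote without consuming it, then . consumes that character.
quoted <string>
    = "'" chars:(!"'" .)* "'" { string.Concat(chars) }

// One-or-more repetition of a character class.
word <string>
    = chars:[a-z]+ { string.Concat(chars) }
```

Under these assumptions, an input such as ['a b',cd] would produce the items a b and cd.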

State and Error Handling Expressions

  • Code { code }: Code expressions contain C# code that specifies the result of an expression. Code expressions must come at the end of a sequence.
  • Error #error{ code }: Error-type code expressions throw a System.FormatException with the error message specified by the code section. The exception that is thrown will also have the Data["cursor"] property set, so that the location of the error can be determined.
  • State #{ code; }: State-type code expressions allow for stateful parsing. The code in a state-type code expression is allowed to modify the state object in a way that supports backtracking and memoization. State expressions may appear anywhere in a rule definition.
  • Parse #parse{ code }: Parse-type code expressions not only allow mutation of the cursor like state expressions, but also return a ParseResult<T>, allowing the integration of more complex parsing logic. The canonical example of this would be using an exported rule from another Pegasus parser.
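
The fragment below sketches how state, code assertion, and error expressions might fit together for indentation tracking. It is not a complete grammar: the rule names and the "indentation" state key are assumptions made for this example, and the increment of 4 is arbitrary.

```
// State expressions: set up and adjust a value stored in the state object.
init   = #{ state["indentation"] = 0; }
indent = #{ state["indentation"] += 4; }
undent = #{ state["indentation"] -= 4; }

// Code assertion: succeeds only if the number of leading spaces
// matches the current indentation level.
indentation
    = spaces:" "* &{ spaces.Count == state["indentation"] }

// Error expression: report any input left over at the end.
eof
    = !.
    / unexpected:. #error{ "Unexpected character '" + unexpected + "'." }
```

Because the state changes are made through state expressions, they are undone when the parser backtracks out of a failed alternative.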

Miscellaneous

  • /* ... */ Multi-line comment
  • // ... Single-line comment