John Gietzen edited this page Feb 28, 2020 · 18 revisions

This is a guide to the basic syntax of Pegasus. For more advanced topics, see the "How Do I... ?" article.

Grammar

A Pegasus grammar consists of a text file with two sections, in order:

  1. The "Settings" section.
  2. The "Rules" section.

Settings

Settings are specified in one of three ways:

  • @setting value: For simple values, write the value directly. It is parsed as a type name.
  • @setting { value }: For more complex values, wrap the value in curly braces. It is parsed as a code section.
  • @setting "value": Alternatively, write the value as a string instead of using curly braces.

Supported settings

  • @namespace Specifies the namespace in which the parser class will be placed.
  • @accessibility Specifies the accessibility of the generated class.
  • @classname Specifies the name of the generated class.
  • @ignorecase Specifies the default behavior of the parser with regard to case sensitivity.
  • @resources Specifies the resources class to be used for resource-based strings.
  • @start Specifies the starting rule. Defaults to the first rule in the grammar.
  • @trace Enables or disables tracing. Defaults to false.
  • @using Adds a using directive to the generated class file. May be specified multiple times.
  • @members Allows for the definition of additional class members.

Combined Example

@namespace PegExamples.Foo
@accessibility internal
@classname MyParser
@ignorecase true
@resources MyProject.Properties.Resources
@start startingRule
@trace true
@using System.Linq
@using { Foo = System.String }
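@members
{
    // Additional members are emitted into the generated parser class.
    private static bool HelperFunction()
    {
        return true; // placeholder body so the example compiles
    }
}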

Rules

Basic Syntax

The basic syntax of a rule is:

name = expression

Rule Types

By default, a rule infers its return type from its expression. For sequence expressions, the inferred type is string. To override this, specify a type for the rule, like so:

name <type> = expression { ... }
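
As an illustration of an explicit rule type, the following sketch is adapted from the calculator example in the Pegasus README. The rule name decimal and the label value are illustrative only.

```
// The parenthesized sequence has no code expression, so it yields the matched
// text as a string. The <double> annotation declares that the rule's result is
// the value of the code expression instead.
decimal <double>
  = value:([0-9]+ ("." [0-9]+)?) { double.Parse(value) }
```

Without the type annotation and code expression, the same sequence would simply produce the matched text.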

Rule Flags

Rule flags are Boolean settings that are enabled on a per-rule basis. Flags come after the rule type, if there is one:

rule -flag = expression
rule <type> -flag = expression

Supported flags

  • -memoize Enables memoization for the rule.
  • -lexical Specifies that the rule should be included in the lexicalElements collection whenever it is successfully parsed.
  • -export Specifies that this rule will be included in this grammar's exported rules. Use this to make the rule available to other parsers in a convenient format. This is primarily used for #parse{} expressions.
  • -public Specifies that a public entry point will be made for this rule. Use this if it makes sense to parse an entire string using this rule. This could be used to provide user-input validation for primitive values supported by your parser.
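
A flag is written between the rule type and the equals sign. The sketch below, again adapted from the README's calculator example, memoizes a left-recursive rule. The rule names are illustrative, and the claim that memoization is what makes the left recursion workable follows the README's usage and should be treated as an assumption here.

```
// -memoize caches the result of this rule at each input position.
// In the README example, every left-recursive rule carries this flag.
additive <double> -memoize
  = left:additive _ "+" _ right:decimal { left + right }
  / decimal

decimal <double>
  = value:([0-9]+ ("." [0-9]+)?) { double.Parse(value) }

// Optional whitespace between tokens
_ = [ \t\r\n]*
```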

Expressions

Character Matching Expressions

  • String 'foo' or "bar": String expressions match a string literally.
  • Character Class [a-z] or [a-z.,0-9] or [\x1f-\xfe\u0100-\u1fff]: Matches a single character that is within the character class.
  • Negative Character Class [^a-z] or [^a-z.,0-9] or [^\x1f-\xfe\u0100-\u1fff]: Matches a single character that is not within the character class.
  • Wildcard .: Wildcard expressions match any single character.

Strings and character classes can be marked as case-insensitive by suffixing the string or class with the letter i, for example "foo"i, 'bar'i, or [baz]i. They can instead be marked as case-sensitive by suffixing the string or class with the letter s.

Strings can be read from resources by suffixing the string with the letter r. The string to be parsed is then read from the grammar's resources, specified via the @resources setting described above.
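
The character matching forms above can be combined into small rules. This is a minimal sketch and all rule names are hypothetical.

```
// Case-insensitive keyword: matches "select", "SELECT", "Select", and so on
selectKeyword = "select"i

// Case-sensitive literal, even if @ignorecase is set to true
nullLiteral = "null"s

// A letter or underscore, followed by letters, digits, or underscores
identifier = [a-z_]i [a-z0-9_]i*

// Any single character other than a double quote
notQuote = [^"]
```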

Control Flow Expressions

  • Name a: Name expressions refer to a rule by name.
  • Labeled foo:a: Labeled expressions store a parse result for use in code assertions and expressions.
  • Sequence a b c: Sequence expressions match each component consecutively.
  • Choice a / b / c: Choice expressions provide options for parsing. They are evaluated consecutively.
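  • Assertions !a &b: Assertion expressions act as look-aheads. &b succeeds only if b matches at the current position, and !a succeeds only if a does not match. In both cases they peek at the parsing subject and do not logically advance the cursor (although internally they do use a cursor).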
  • Code Assertions !{foo} &{bar}: Code assertions are similar to regular assertions, except they represent C# code that returns a Boolean value, rather than performing a look-ahead.
  • Repetition a? b+ c* d<3> e<2,> f<1,5>: Repetition expressions allow another expression to be repeated.
    • expr<3> matches an expression exactly three times.
    • expr<2,> matches an expression two or more times. Greedy.
    • expr<1,5> matches an expression one to five times. Greedy.
    • expr? matches an expression zero or one time. Equivalent to expr<0,1>.
    • expr+ matches an expression one or more times. Equivalent to expr<1,>.
    • expr* matches an expression zero or more times. Equivalent to expr<0,>.
  • Delimited Repetition a<0,,",">: Repetition expressions also support a delimiter, given after a second comma, that is matched (and consumed) between repeated matches.
  • Parentheses ( ... ): Parentheses are used to group expressions.
  • Type (<type> ... ): Type expressions allow part of a rule to have a certain return type. This has the same meaning as having a type for a rule, except it is constrained to the expression wrapped by the parentheses.
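
Several of the control flow expressions above can be seen together in a short sketch. The rule names are hypothetical, and the IList<string> result type for the delimited repetition is an assumption based on the item rule yielding a string.

```
@using System.Collections.Generic

// Delimited repetition: zero or more flags separated by commas.
// The label stores the repeated matches for use in the code expression.
flags <IList<string>>
  = "[" items:flag<0,,","> "]" { items }

// Choice: alternatives are tried in order
flag = "true" / "false"

// Counted repetition: exactly five digits
zipCode = [0-9]<5>

// Negative look-ahead: succeeds only when no character remains
EOF = !.
```

With these rules, flags is intended to accept inputs such as [] and [true,false]. The EOF rule follows the README's end-of-input idiom.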

State and Error Handling Expressions

  • Code { code }: Code expressions contain C# code that specifies the result of an expression. Code expressions must come at the end of a sequence.
  • Error #error{ code }: Error-type code expressions throw a System.FormatException with the error message specified by the code section. The exception that is thrown will also have the Data["cursor"] property set, so that the location of the error can be determined.
  • State #{ code; }: State-type code expressions allow for stateful parsing. The code in a state-type code expression is allowed to modify the state object in a way that supports backtracking and memoization. State expressions may appear anywhere in a rule definition.
  • Parse #parse{ code }: Parse-type code expressions not only allow mutation of the cursor like state expressions, but also return a ParseResult<T>, allowing the integration of more complex parsing logic. The canonical example of this would be using an exported rule from another Pegasus parser.
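
The state and error expressions can be sketched as follows. The EOF rule is taken from the Pegasus README. The indentation rules follow the pattern used in the wiki's indentation example and assume that the state object is available to code sections as state, with a string indexer. Treat that as an assumption and check it against your Pegasus version.

```
// Initialize the state before anything else is parsed.
start = #{ state["Indentation"] = 0; } line* EOF

// A line must begin with exactly the current indentation.
line = INDENTATION [^\r\n]+ "\n"

// Code assertion: compare the matched spaces against the tracked indentation.
INDENTATION = spaces:" "* &{ spaces.Count == state["Indentation"] }

// State expressions that change the indentation level. They are not used by
// "line" above and would be invoked by block-opening and block-closing rules.
INDENT = #{ state["Indentation"] += 4; }
UNDENT = #{ state["Indentation"] -= 4; }

// Report a descriptive error if anything is left over at the end of input.
EOF
  = !.
  / unexpected:. #error{ "Unexpected character '" + unexpected + "'." }
```

Because the changes are made through state expressions, they take part in backtracking and memoization as described above. Side effects made from ordinary code expressions do not have that guarantee.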

Miscellaneous

  • /* ... */ Multi-line comment
  • // ... Single-line comment