diff --git a/AUTHORS b/AUTHORS
index 7ec58f89..dbdfa352 100644
--- a/AUTHORS
+++ b/AUTHORS
@@ -25,6 +25,7 @@ Julian Aubourg If you need both the Documentation
Table of Contents
-
-
-
-
-
-
-
+
+
+
+
+
+ Installation
@@ -64,25 +65,25 @@ Node.js
$ npm install peggy
peggy
command and the JavaScript API,
-install Peggy both ways.
The easiest way to use Peggy from the browser is to pull the latest version
-from a CDN. Either of these should work:
+ from a CDN. Either of these should work:
<script src="https://unpkg.com/peggy"></script>
<script src="https://cdn.jsdelivr.net/npm/peggy"></script>
Both of those CDNs support pinning a version number rather than always
-taking the latest. Not only is that good practice, it will save several
-redirects, improving performance. See their documentation for more
-information:
+ taking the latest. Not only is that good practice, it will save several
+ redirects, improving performance. See their documentation for more
+ information:
When your document is done loading, there will be a global peggy
object.
Peggy generates parser from a grammar that describes expected input and can
-specify what the parser returns (using semantic actions on matched parts of the
-input). Generated parser itself is a JavaScript object with a simple API.
+ specify what the parser returns (using semantic actions on matched parts of the
+ input). Generated parser itself is a JavaScript object with a simple API.
To generate a parser from your grammar, use the peggy
-command:
$ peggy arithmetics.pegjs
This writes parser source code into a file with the same name as the grammar
-file but with “.js” extension. You can also specify the output file
-explicitly:
+ file but with “.js” extension. You can also specify the output file
+ explicitly:
$ peggy -o arithmetics-parser.js arithmetics.pegjs
If you omit both input and output file, standard input and standard output
-are used.
+ are used.
By default, the generated parser is in the Node.js module format. You can
-override this using the --format
option.
--format
option.
You can tweak the generated parser with several options:
--allowed-start-rules <rules>
--ast
-t/--test
, -T/--test-file
and -m/--source-map
-options.--cache
-d
, --dependency <[name:]module>
-D
, --dependencies <json>
-e
, --export-var <variable>
--extra-options <options>
peg.generate
.-c
, --extra-options-file <file>
peg.generate
.--format <format>
amd
, commonjs
,
-globals
, umd
, es
(default: commonjs
).-o
, --output <file>
.js
, or stdout if no input file is given.--plugin
-m
, --source-map <file>
inline
is given,
-the sourcemap will be embedded in the output file as a data URI. If the
-filename is prefixed with hidden:
, no mapping URL will be
-included so that the mapping can be specified with an HTTP SourceMap:
-header. This option conflicts with the -t/--test
and
--T/--test-file
options unless -o/--output
is also
-specified-S
, --start-rule <rule>
-t
, --test <text>
-T
, --test-file <text>
--trace
-v
, --version
-h
, --help
--allowed-start-rules <rules>
--ast
-t/--test
, -T/--test-file
and -m/--source-map
+ options.
+ --cache
-d
, --dependency <[name:]module>
-D
, --dependencies <json>
-e
, --export-var <variable>
--extra-options <options>
peg.generate
.
+ -c
, --extra-options-file <file>
peg.generate
.
+ --format <format>
amd
, commonjs
,
+ globals
, umd
, es
(default: commonjs
).
+ -o
, --output <file>
.js
, or stdout if no input file is given.--plugin
-m
, --source-map <file>
inline
is given,
+ the sourcemap will be embedded in the output file as a data URI. If the
+ filename is prefixed with hidden:
, no mapping URL will be
+ included so that the mapping can be specified with an HTTP SourceMap:
+ header. This option conflicts with the -t/--test
and
+ -T/--test-file
options unless -o/--output
is also
+ specified
+ -S
, --start-rule <rule>
-t
, --test <text>
-T
, --test-file <text>
--trace
-v
, --version
-h
, --help
-If you specify options using -c <file>
or
---extra-options-file <file>
, you will need to ensure you
-are using the correct types. In particular, you may specify "plugin" as a
-string, or "plugins" as an array of objects that have a use
-method. Always use the long (two-dash) form of the option, without the
-dashes, as the key. Options that contain internal dashes should be specified
-in camel case. You may also specify an "input" field instead of using the
-command line. For example:
-
If you specify options using -c <file>
or
+ --extra-options-file <file>
, you will need to ensure you
+ are using the correct types. In particular, you may specify "plugin" as a
+ string, or "plugins" as an array of objects that have a use
+ method. Always use the long (two-dash) form of the option, without the
+ dashes, as the key. Options that contain internal dashes should be specified
+ in camel case. You may also specify an "input" field instead of using the
+ command line. For example:
// config.js or config.cjs
module.exports = {
@@ -223,23 +227,22 @@ Command Line
};
-
-You can test generated parser immediately if you specify the
--t/--test
or -T/--test-file
-option. This option conflicts with the
---ast
option, and also conflicts with the
--m/--source-map
option unless -o/--output
is also
-specified.
-
You can test the generated parser immediately if you specify the
+ -t/--test
or -T/--test-file
+ option. This option conflicts with the
+ --ast
option, and also conflicts with the
+ -m/--source-map
option unless -o/--output
is also
+ specified.
The CLI will exit with the code:
0: if successful
1: if you supply incorrect or conflicting parameters
2: if you specified the
--t/--test
or -T/--test-file
option and the specified
-input fails parsing with the specified grammar
0: if successful
1: if you supply incorrect or conflicting parameters
2: if you specified the
+ -t/--test
or -T/--test-file
option and the specified
+ input fails parsing with the specified grammar
+ Examples:
@@ -286,156 +289,162 @@import * as peggy from "peggy";
For use in browsers, include the Peggy library in your web page or
-application using the <script>
tag. If Peggy detects an AMD loader, it will define
-itself as a module, otherwise the API will be available in the
-peg
global object.
<script>
tag. If Peggy detects an
+ AMD loader, it will define
+ itself as a module, otherwise the API will be available in the
+ peg
global object.
To generate a parser, call the peggy.generate
method and pass your
-grammar as a parameter:
const parser = peggy.generate("start = ('a' / 'b')+");
The method will return the generated parser object or its source code as a string
-(depending on the value of the output
option — see below). It will
-throw an exception if the grammar is invalid. The exception will contain a
-message
property with more details about the error.
output
option — see below). It will
+ throw an exception if the grammar is invalid. The exception will contain a
+ message
property with more details about the error.
You can tweak the generated parser by passing a second parameter with an
-options object to peg.generate
. The following options are
-supported:
peg.generate
. The following options are
+ supported:
allowedStartRules
cache
true
, makes the parser cache results, avoiding exponential
-parsing time in pathological cases but making the parser slower (default:
-false
).dependencies
format
is set to "amd"
,
-"commonjs"
, "es"
, or "umd"
.
-Dependencies variables will be available in both the global
-initializer and the per-parse initializer. Unless the parser is
-to be generated in different formats, it is recommended to rather import
-dependencies from within the global initializer (default:
-{}
).error
exportVar
format
is set to
-"globals"
or "umd"
(default:
-null
).format
"amd"
, "bare"
,
-"commonjs"
, "es"
, "globals"
, or
-"umd"
); valid only when output
is set to
-"source"
, "source-and-map"
, or
-"source-with-inline-map"
. (default: "bare"
).
-grammarSource
source
in the location objects, that returned by the
-location()
API function (default: undefined
). It is
-recommended that if you do not use a string, the object you supply has a
-useful toString()
implementation.info
output
A string, one of:
-"parser"
- return generated parser object."source"
- return parser source code as a string."source-and-map"
- return a
-SourceNode
-object; you can get source code by calling toString()
-method or source code and mapping by calling
-toStringWithSourceMap()
method, see the
-SourceNode
-documentation.
-"source-with-inline-map"
- return the parser source along
-with an embedded source map as a data:
URI. This option
-leads to a larger output string, but is the easiest to integrate with
-developer tooling."ast"
- return the internal AST of the grammar as a JSON
- string. Useful for plugin authors to explore internals of Peggy and
- for automation.(default: "parser"
)
--Note: You should also set
-grammarSource
-to a not-empty string if you set this value to -"source-and-map"
or -"source-with-inline-map"
. The path should be relative to -the location where the generated parser code will be stored. For -example, if you are generatinglib/parser.js
from -src/parser.peggy
, then your options should be: -{ grammarSource: "../src/parser.peggy" }
plugins
trace
false
).warning
allowedStartRules
cache
true
, makes the parser cache results, avoiding exponential
+ parsing time in pathological cases but making the parser slower (default:
+ false
).
+ dependencies
format
is set to "amd"
,
+ "commonjs"
, "es"
, or "umd"
.
+ Dependency variables will be available in both the global
+ initializer and the per-parse initializer. Unless the parser
+ is to be generated in different formats, it is recommended to rather
+ import dependencies from within the global initializer (default:
+ {}
).
+ error
exportVar
format
is set to
+ "globals"
or "umd"
(default:
+ null
).
+ format
"amd"
, "bare"
,
+ "commonjs"
, "es"
, "globals"
, or
+ "umd"
); valid only when output
is set to
+ "source"
, "source-and-map"
, or
+ "source-with-inline-map"
. (default: "bare"
).
+ grammarSource
source
in the location objects that are returned by the
+ location()
API function (default: undefined
). It
+ is recommended that if you do not use a string, the object you supply has
+ a useful toString()
implementation.
+ info
output
A string, one of:
+"parser"
- return generated parser object."source"
- return parser source code as a string."source-and-map"
- return a
+ SourceNode
+ object; you can get source code by calling toString()
+ method or source code and mapping by calling
+ toStringWithSourceMap()
method, see the
+ SourceNode
+ documentation.
+ "source-with-inline-map"
- return the parser source along
+ with an embedded source map as a data:
URI. This option
+ leads to a larger output string, but is the easiest to integrate with
+ developer tooling."ast"
- return the internal AST of the grammar as a JSON
+ string. Useful for plugin authors to explore internals of Peggy and
+ for automation.(default: "parser"
)
++Note: You should also set
+grammarSource
+ to a not-empty string if you set this value to +"source-and-map"
or +"source-with-inline-map"
. The path should be relative to + the location where the generated parser code will be stored. For + example, if you are generatinglib/parser.js
from +src/parser.peggy
, then your options should be: +{ grammarSource: "../src/parser.peggy" }
+
plugins
trace
false
).warning
While generating the parser, the compiler may throw a GrammarError
which collects
-all of the issues that were seen.
There is also another way to collect problems as fast as they are reported —
-register one or more of these callbacks:
+ register one or more of these callbacks:
error(stage: Stage, message: string, location?: LocationRange, notes?: DiagnosticNote[]): void
warning(stage: Stage, message: string, location?: LocationRange, notes?: DiagnosticNote[]): void
info(stage: Stage, message: string, location?: LocationRange, notes?: DiagnosticNote[]): void
error(stage: Stage, message: string, location?: LocationRange, notes?: DiagnosticNote[]): void
warning(stage: Stage, message: string, location?: LocationRange, notes?: DiagnosticNote[]): void
info(stage: Stage, message: string, location?: LocationRange, notes?: DiagnosticNote[]): void
All parameters are the same as the parameters of the reporting API except the first.
-The stage
represent one of possible stages during which execution a diagnostic was generated.
-This is a string enumeration, that currently has one of three values:
stage
represents the stage of execution during which a diagnostic was generated.
+ This is a string enumeration that currently has one of three values:
check
transform
generate
check
transform
generate
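As a rough illustration of the callback shape described above, the sketch below builds an options object whose `error`, `warning`, and `info` callbacks append each diagnostic to an array. The helper name `makeDiagnosticCollector`, the `severity` field, and the sample messages are hypothetical; the compiler is not invoked here, so the two calls at the end only stand in for what it might report.

```javascript
// Hypothetical helper: returns callbacks matching the documented
// (stage, message, location, notes) signature, plus the array they fill.
function makeDiagnosticCollector() {
  const diagnostics = [];
  const record = (severity) => (stage, message, location, notes) => {
    diagnostics.push({ severity, stage, message, location, notes: notes ?? [] });
  };
  return {
    diagnostics,
    options: {
      error: record("error"),
      warning: record("warning"),
      info: record("info"),
    },
  };
}

// Simulated calls standing in for what the compiler would report.
const collector = makeDiagnosticCollector();
collector.options.warning("check", "sample warning text");
collector.options.info("generate", "sample info text");

console.log(collector.diagnostics.map((d) => `${d.severity}@${d.stage}`));
```

In real use, `collector.options` would presumably be merged into the options object passed as the second parameter when generating a parser.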
To use the generated parser, call its parse
-method and pass an input string as a parameter. The method will return a parse
-result (the exact value depends on the grammar used to generate the parser) or
-throw an exception if the input is invalid. The exception will contain
-location
, expected
, found
,
-message
, and diagnostic
properties with more details about the error. The error
-will have a format(SourceText[])
function, to which you pass an array
-of objects that look like { source: grammarSource, text: string }
; this
-will return a nicely-formatted error suitable for human consumption.
location
, expected
, found
,
+ message
, and diagnostic
properties with more details about the error. The error
+ will have a format(SourceText[])
function, to which you pass an array
+ of objects that look like { source: grammarSource, text: string }
; this
+ will return a nicely-formatted error suitable for human consumption.
parser.parse("abba"); // returns ["a", "b", "b", "a"]
@@ -446,20 +455,20 @@ Using the Parser
supported:
-startRule
-- Name of the rule to start parsing from.
-
-tracer
--
- Tracer to use. A tracer is an object containing a
trace()
function.
- trace()
takes a single parameter which is an object containing
- "type" ("rule.enter", "rule.fail", "rule.match"), "rule" (the rule name as a
- string), "location", and, if the type is
- "rule.match", "result" (what the rule returned).
-
-
-...
(any others)
-- Made available in the
options
variable
+ startRule
+ - Name of the rule to start parsing from.
+
+ tracer
+ -
+ Tracer to use. A tracer is an object containing a
trace()
function.
+ trace()
takes a single parameter which is an object containing
+ "type" ("rule.enter", "rule.fail", "rule.match"), "rule" (the rule name as a
+ string), "location", and, if the type is
+ "rule.match", "result" (what the rule returned).
+
+
+ ...
(any others)
+ - Made available in the
options
variable
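To make the tracer contract above more concrete, here is a minimal sketch of an object with a `trace()` function that indents events by nesting depth. The class name `CollectingTracer` and its `lines` field are hypothetical, and the events at the end are written by hand to mimic the documented shape (the "location" field is omitted for brevity); no parser is run.

```javascript
// A tracer is any object with a trace() function. This one indents
// each event by nesting depth and stores the formatted lines.
class CollectingTracer {
  constructor() {
    this.depth = 0;
    this.lines = [];
  }

  trace(event) {
    if (event.type !== "rule.enter") {
      this.depth -= 1; // "rule.match" and "rule.fail" both close a rule
    }
    let line = `${"  ".repeat(this.depth)}${event.type} ${event.rule}`;
    if (event.type === "rule.match") {
      line += ` -> ${JSON.stringify(event.result)}`;
    }
    this.lines.push(line);
    if (event.type === "rule.enter") {
      this.depth += 1;
    }
  }
}

// Hand-written events standing in for what a traced parse might emit.
const tracer = new CollectingTracer();
tracer.trace({ type: "rule.enter", rule: "start" });
tracer.trace({ type: "rule.enter", rule: "integer" });
tracer.trace({ type: "rule.match", rule: "integer", result: 42 });
tracer.trace({ type: "rule.match", rule: "start", result: 42 });
console.log(tracer.lines.join("\n"));
```

An instance like this would be supplied through the `tracer` option when calling `parse`.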
As you can see above, parsers can also support their own custom options. For example:
@@ -483,12 +492,12 @@ Using the Parser
Grammar Syntax and Semantics
The grammar syntax is similar to JavaScript in that it is not line-oriented
-and ignores whitespace between tokens. You can also use JavaScript-style
-comments (// ...
and /* ... */
).
+ and ignores whitespace between tokens. You can also use JavaScript-style
+ comments (// ...
and /* ... */
).
Let's look at an example grammar that recognizes simple arithmetic expressions
-like 2*(3+4)
. A parser generated from this grammar computes their
-values.
+ like 2*(3+4)
. A parser generated from this grammar computes their
+ values.
start
= additive
@@ -509,42 +518,45 @@ Grammar Syntax and Semantics
= digits:[0-9]+ { return parseInt(digits.join(""), 10); }
On the top level, the grammar consists of rules (in our example,
-there are five of them). Each rule has a name (e.g.
-integer
) that identifies the rule, and a parsing
-expression (e.g. digits:[0-9]+ { return parseInt(digits.join(""),
-10); }
) that defines a pattern to match against the input text and
-possibly contains some JavaScript code that determines what happens when the
-pattern matches successfully. A rule can also contain human-readable
-name that is used in error messages (in our example, only the
-integer
rule has a human-readable name). The parsing starts at the
-first rule, which is also called the start rule.
-
-A rule name must be a JavaScript identifier. It is followed by an equality
-sign (“=”) and a parsing expression. If the rule has a human-readable name, it
-is written as a JavaScript string between the rule name and the equality sign.
-Rules need to be separated only by whitespace (their beginning is easily
-recognizable), but a semicolon (“;”) after the parsing expression is
-allowed.
+ there are five of them). Each rule has a name (e.g.
+ integer
) that identifies the rule, and a parsing
+ expression (e.g. digits:[0-9]+ { return parseInt(digits.join(""), 10); }
)
+ that defines a pattern to match against the input text and
+ possibly contains some JavaScript code that determines what happens when the
+ pattern matches successfully. A rule can also contain a human-readable
+ name that is used in error messages (in our example, only the
+ integer
rule has a human-readable name). The parsing starts at the
+ first rule, which is also called the start rule.
+
+A rule name must be a Peggy identifier. It is
+ followed by an equality sign (“=”) and a parsing expression. If the rule has a
+ human-readable name, it is written as a JavaScript string between the rule
+ name and the equality sign. Rules need to be separated only by whitespace
+ (their beginning is easily recognizable), but a semicolon (“;”) after the
+ parsing expression is allowed.
The first rule can be preceded by a global initializer and/or a
-per-parse initializer, in that order. Both are pieces of JavaScript
-code in double curly braces (“{{'{{'}}” and “}}”) and single curly braces (“{” and
-“}”) respectively. All variables and functions defined in both
-initializers are accessible in rule actions and semantic predicates.
-Curly braces in both initializers code must be balanced.
+ per-parse initializer, in that order. Both are pieces of JavaScript
+ code in double curly braces (“{{'{{'}}” and “}}”) and single curly braces (“{” and
+ “}”) respectively. All variables and functions defined in both
+ initializers are accessible in rule actions and semantic predicates.
+ Curly braces in the code of both initializers must be balanced.
+
The global initializer is executed once and only once, when the
-generated parser is loaded (through a require
or an
-import
statement for instance). It is the ideal location to
-require, to import, to declare constants, or to declare utility functions to be used in rule actions
-and semantic predicates.
+ generated parser is loaded (through a require
or an
+ import
statement for instance). It is the ideal location to
+ require, to import, to declare constants, or to declare utility functions to be used in rule actions
+ and semantic predicates.
+
The per-parse initializer is called before the generated parser
-starts parsing. The code inside the per-parse initializer can access
-the input string and the options passed to the parser using the
-input
variable and the options
variable respectively.
-It is the ideal location to create data structures that are unique to each
-parse or to modify the input before the parse.
+ starts parsing. The code inside the per-parse initializer can access
+ the input string and the options passed to the parser using the
+ input
variable and the options
variable respectively.
+ It is the ideal location to create data structures that are unique to each
+ parse or to modify the input before the parse.
+
Let's look at the example grammar from above using a global
-initializer and a per-parse initializer:
+ initializer and a per-parse initializer:
{{'{{'}}
function makeInteger(o) {
@@ -577,625 +589,645 @@ Grammar Syntax and Semantics
= digits:[0-9]+ { return makeInteger(digits); }
The parsing expressions of the rules are used to match the input text to the
-grammar. There are various types of expressions — matching characters or
-character classes, indicating optional parts and repetition, etc. Expressions
-can also contain references to other rules. See detailed
-description below.
+ grammar. There are various types of expressions — matching characters or
+ character classes, indicating optional parts and repetition, etc. Expressions
+ can also contain references to other rules. See
+ detailed
+ description below.
If an expression successfully matches a part of the text when running the
-generated parser, it produces a match result, which is a JavaScript
-value. For example:
+ generated parser, it produces a match result, which is a JavaScript
+ value. For example:
-- An expression matching a literal string produces a JavaScript string
-containing matched text.
+ - An expression matching a literal string produces a JavaScript string
+ containing matched text.
-- An expression matching repeated occurrence of some subexpression produces
-a JavaScript array with all the matches.
+ - An expression matching repeated occurrence of some subexpression produces
+ a JavaScript array with all the matches.
The match results propagate through the rules when the rule names are used in
-expressions, up to the start rule. The generated parser returns start rule's
-match result when parsing is successful.
+ expressions, up to the start rule. The generated parser returns the start rule's
+ match result when parsing is successful.
One special case of parser expression is a parser action — a
-piece of JavaScript code inside curly braces (“{” and “}”) that takes match
-results of the preceding expression and returns a JavaScript value.
-This value is then considered match result of the preceding expression (in other
-words, the parser action is a match result transformer).
+ piece of JavaScript code inside curly braces (“{” and “}”) that takes match
+ results of the preceding expression and returns a JavaScript value.
+ This value is then considered the match result of the preceding expression (in other
+ words, the parser action is a match result transformer).
In our arithmetics example, there are many parser actions. Consider the
-action in expression digits:[0-9]+ { return parseInt(digits.join(""), 10);
-}
. It takes the match result of the expression [0-9]+, which is an array
-of strings containing digits, as its parameter. It joins the digits together to
-form a number and converts it to a JavaScript number
object.
+ action in expression digits:[0-9]+ { return parseInt(digits.join(""), 10); }
.
+ It takes the match result of the expression [0-9]+, which is an array
+ of strings containing digits, as its parameter. It joins the digits together to
+ form a number and converts it to a JavaScript number
object.
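The body of that action is ordinary JavaScript, so it can be tried outside a parser. The sketch below assumes the input was "123" and hard-codes the array that `[0-9]+` would then produce; the variable `value` is only for illustration.

```javascript
// What the match result of [0-9]+ looks like for the input "123":
// an array with one single-character string per matched digit.
const digits = ["1", "2", "3"];

// The same expression as the action body in the `integer` rule.
const value = parseInt(digits.join(""), 10);

console.log(value, typeof value); // prints: 123 number
```

The explicit radix `10` keeps the conversion decimal regardless of how the digit string begins.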
Parsing Expression Types
There are several types of parsing expressions, some of them containing
-subexpressions and thus forming a recursive structure. Each example below is
-a part of a full grammar, which produces an
-object that contains match
and rest
.
-match
is the part of the input that matched the example,
-rest
is any remaining input after the match.
+ subexpressions and thus forming a recursive structure. Each example below is
+ a part of a full grammar, which produces an
+ object that contains match
and rest
.
+ match
is the part of the input that matched the example,
+ rest
is any remaining input after the match.
-"literal"
'literal'
-
--
-
Match exact literal string and return it. The string syntax is the same
-as in JavaScript. Appending i
right after the literal makes the
-match case-insensitive.
-
-
-
-Example: literal = "foo"
-Matches: "foo"
-Does not match: "Foo"
, "fOo"
, "bar"
, "fo"
-
-
-
-Example: literal_i = "foo"i
-Matches: "foo"
, "Foo"
, "fOo"
-Does not match: "bar"
, "fo"
-
-
-
-.
(U+002E: FULL STOP, or "period")
-
--
-
Match exactly one character and return it as a string.
-
-
-Example: any = .
-Matches: "f"
, "."
, " "
-Does not match: ""
-
-
-
-[characters]
-
--
-
Match one character from a set and return it as a string. The characters
-in the list can be escaped in exactly the same way as in JavaScript string.
-The list of characters can also contain ranges (e.g. [a-z]
-means “all lowercase letters”). Preceding the characters with ^
-inverts the matched set (e.g. [^a-z]
means “all character but
-lowercase letters”). Appending i
right after the literal makes
-the match case-insensitive.
-
-
-
-Example: class = [a-z]
-Matches: "f"
-Does not match: "A"
, "-"
, ""
-
-
-
-Example: class_i = [^a-z]i
-Matches: "="
, " "
-Does not match: "F"
, "f"
, ""
-
-
-
-rule
-
--
-
Match a parsing expression of a rule (perhaps recursively) and return its match
-result.
-
-
-
-Example: rule = child; child = "foo"
-Matches: "foo"
-Does not match: "Foo"
, "fOo"
, "bar"
, "fo"
-
-
-
-( expression )
-
--
-
Match a subexpression and return its match result.
-
-
-
-Example: paren = ("1" { return 2; })+
-Matches: "11"
-Does not match: "2"
, ""
-
-
-
-expression *
-
--
-
Match zero or more repetitions of the expression and return their match
-results in an array. The matching is greedy, i.e. the parser tries to match
-the expression as many times as possible. Unlike in regular expressions,
-there is no backtracking.
-
-
-
-Example: star = "a"*
-Matches: "a"
, "aaa"
-Does not match: (always matches)
-
-
-
-expression +
-
--
-
Match one or more repetitions of the expression and return their match
-results in an array. The matching is greedy, i.e. the parser tries to match
-the expression as many times as possible. Unlike in regular expressions,
-there is no backtracking.
-
-
-
-Example: plus = "a"+
-Matches: "a"
, "aaa"
-Does not match: "b"
, ""
-
-
-
-expression |count|
-
expression |min..max|
-
expression |count, delimiter|
-
expression |min..max, delimiter|
-
--
-
Match exact count
repetitions of expression
.
- If the match succeeds, return their match results in an array.
-
- -or-
-
- Match expression at least min
but not more then max
times.
- If the match succeeds, return their match results in an array. Both min
- and max
may be omitted. If min
is omitted, then it is assumed
- to be 0
. If max
is omitted, then it is assumed to be infinity.
- Hence
-
-
- expression |..|
is equivalent to expression |0..|
- and expression *
- expression |1..|
is equivalent to expression +
-
-
- Optionally, delimiter
expression can be specified. The
- delimiter is a separate parser expression, its match results are ignored,
- and it must appear between matched expressions exactly once.
-
- count
, min
and max
can be represented as:
-
-
- - positive integer:
-
start = "a"|2|;
-
- - name of the preceding label:
-
start = count:n1 "a"|count|;
+ "literal"
'literal'
+
+ -
+
Match exact literal string and return it. The string syntax is the same
+ as in JavaScript. Appending i
right after the literal makes the
+ match case-insensitive.
+
+
+
+ Example: literal = "foo"
+ Matches: "foo"
+ Does not match: "Foo"
, "fOo"
, "bar"
, "fo"
+
+
+
+
+ Example: literal_i = "foo"i
+ Matches: "foo"
, "Foo"
, "fOo"
+ Does not match: "bar"
, "fo"
+
+
+
+ .
(U+002E: FULL STOP, or "period")
+
+ -
+
Match exactly one character and return it as a string.
+
+
+ Example: any = .
+ Matches: "f"
, "."
, " "
+ Does not match: ""
+
+
+
+ [characters]
+
+ -
+
Match one character from a set and return it as a string. The characters
+ in the list can be escaped in exactly the same way as in JavaScript string.
+ The list of characters can also contain ranges (e.g. [a-z]
+ means “all lowercase letters”). Preceding the characters with ^
+ inverts the matched set (e.g. [^a-z]
means “all characters but
+ lowercase letters”). Appending i
right after the literal makes
+ the match case-insensitive.
+
+
+
+ Example: class = [a-z]
+ Matches: "f"
+ Does not match: "A"
, "-"
, ""
+
+
+
+ Example: class_i = [^a-z]i
+ Matches: "="
, " "
+ Does not match: "F"
, "f"
, ""
+
+
+
+ rule
+
+ -
+
Match a parsing expression of a rule (perhaps recursively) and return its match
+ result.
+
+
+
+ Example: rule = child; child = "foo"
+ Matches: "foo"
+ Does not match: "Foo"
, "fOo"
, "bar"
, "fo"
+
+
+
+ ( expression )
+
+ -
+
Match a subexpression and return its match result.
+
+
+
+ Example: paren = ("1" { return 2; })+
+ Matches: "11"
+ Does not match: "2"
, ""
+
+
+
+ expression *
+
+ -
+
Match zero or more repetitions of the expression and return their match
+ results in an array. The matching is greedy, i.e. the parser tries to match
+ the expression as many times as possible. Unlike in regular expressions,
+ there is no backtracking.
+
+
+
+ Example: star = "a"*
+ Matches: "a"
, "aaa"
+ Does not match: (always matches)
+
+
+
+ expression +
+
+ -
+
Match one or more repetitions of the expression and return their match
+ results in an array. The matching is greedy, i.e. the parser tries to match
+ the expression as many times as possible. Unlike in regular expressions,
+ there is no backtracking.
+
+
+
+ Example: plus = "a"+
+ Matches: "a"
, "aaa"
+ Does not match: "b"
, ""
+
+
+
+ expression |count|
+
expression |min..max|
+
expression |count, delimiter|
+
expression |min..max, delimiter|
+
+ -
+
Match exact count
repetitions of expression
.
+ If the match succeeds, return their match results in an array.
+
+ -or-
+
+ Match expression at least min
but not more than max
times.
+ If the match succeeds, return their match results in an array. Both min
+ and max
may be omitted. If min
is omitted, then it is assumed
+ to be 0
. If max
is omitted, then it is assumed to be infinity.
+ Hence
+
+
+ expression |..|
is equivalent to expression |0..|
+ and expression *
+ expression |1..|
is equivalent to expression +
+
+
+ Optionally, delimiter
expression can be specified. The
+ delimiter is a separate parser expression, its match results are ignored,
+ and it must appear between matched expressions exactly once.
+
+ count
, min
and max
can be
+ represented as:
+
+
+ - positive integer:
+
start = "a"|2|;
+
+ - name of the preceding label:
+
start = count:n1 "a"|count|;
n1 = n:$[0-9] { return parseInt(n); };
-
- - code block:
-
start = "a"|{ return options.count; }|;
-
- Any non-number values, returned by the code block, will be interpreted as 0
.
-
-
-
-
- Example: repetition = "a"|2..3, ","|
- Matches: "a,a"
, "a,a,a"
- Does not match: "a"
, "b,b"
,
- "a,a,a,"
, "a,a,a,a"
+
+ - code block:
+
start = "a"|{ return options.count; }|;
+
+ Any non-number values, returned by the code block, will be interpreted as 0
.
+
+
+
+
+ Example: repetition = "a"|2..3, ","|
+ Matches: "a,a"
, "a,a,a"
+ Does not match: "a"
, "b,b"
,
+ "a,a,a,"
, "a,a,a,a"
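The greedy, non-backtracking behaviour of a bounded repetition with a delimiter can be modelled in a few lines of plain JavaScript. This is a simplified sketch, not Peggy's implementation: the function `matchRepetition` is hypothetical, it only handles literal strings for the expression and the delimiter, and the `whole` helper assumes that "matches" in the lists above means the entire input is consumed.

```javascript
// Simplified model: `expr` and `delim` are literal strings, not full
// parsing expressions. Returns { results, rest } or null on failure.
function matchRepetition(input, expr, min, max, delim) {
  const results = [];
  let pos = 0;
  while (results.length < max) {
    let next = pos;
    // The delimiter is only expected between items, never before the first.
    if (results.length > 0 && delim !== undefined) {
      if (!input.startsWith(delim, next)) break;
      next += delim.length;
    }
    // On failure `pos` is left unchanged, so a trailing delimiter
    // that is not followed by another item is not consumed.
    if (!input.startsWith(expr, next)) break;
    results.push(expr);
    pos = next + expr.length;
  }
  if (results.length < min) return null;
  return { results, rest: input.slice(pos) };
}

// Model of: repetition = "a"|2..3, ","|
const whole = (s) => {
  const r = matchRepetition(s, "a", 2, 3, ",");
  return r !== null && r.rest === "";
};

for (const s of ["a,a", "a,a,a", "a", "b,b", "a,a,a,", "a,a,a,a"]) {
  console.log(JSON.stringify(s), whole(s));
}
```

Under these assumptions only the first two inputs report `true`; for "a,a,a," the model stops after three items and leaves the final comma unconsumed.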
+
+
+
+ expression ?
+
+ -
+
Try to match the expression. If the match succeeds, return its match
+ result, otherwise return null
. Unlike in regular expressions,
+ there is no backtracking.
+
+
+
+ Example: maybe = "a"?
+ Matches: "a"
, ""
+ Does not match: (always matches)
+
+
+
-
-
+
+
+
+ & expression
+
+ -
+
This is a positive assertion. No input is consumed.
+ Try to match the expression. If the match succeeds, just return
+ undefined
and do not consume any input, otherwise consider the
+ match failed.
+
+
+
+
+ Example: posAssertion = "a" &"b"
+ Matches: "ab"
+ Does not match: "ac"
, "a"
, ""
+
+
+
+ ! expression
+
+ -
+
This is a negative assertion. No input is consumed.
+
+ Try to match the expression. If the match does
+ not succeed, just return undefined
and do not consume any
+ input, otherwise consider the match failed.
+
+
+
+ Example: negAssertion = "a" !"b"
+ Matches: "a"
, "ac"
+ Does not match: "ab"
, ""
+
+
+
+ & { predicate }
+
+ -
+
This is a positive assertion. No input is consumed.
+
+ The predicate should be JavaScript code, and it's executed as a
+ function. Curly braces in the predicate must be balanced.
+
+ The predicate should return
a boolean value. If the result
+ is truthy, its match result is undefined
, otherwise the
+ match is considered failed. Failure to include the return
+ keyword is a common mistake.
+
+ The predicate has access to all variables and functions in the
+ Action Execution Environment.
+
+
+
+
+ Example:
posPredicate = @num:$[0-9]+ &{return parseInt(num, 10) < 100}
+ Matches: "0"
, "99"
+ Does not match: "100"
, "-1"
, ""
+
+
+ Try it:
+
+
+
+
+
+
+ ! { predicate }
+
+ -
+
This is a negative assertion. No input is consumed.
+
+ The predicate should be JavaScript code, and it's executed as a
+ function. Curly braces in the predicate must be balanced.
+
+ The predicate should return
a boolean value. If the result is
+ falsy, its match result is undefined
, otherwise the match is
+ considered failed.
+
+ The predicate has access to all variables and functions in the
+ Action Execution Environment.
+
+
+
+
+ Example:
negPredicate = @num:$[0-9]+ !{ return parseInt(num, 10) < 100 }
+ Matches: "100"
, "156"
+ Does not match: "56"
, "-1"
, ""
+
+
+ Try it:
+
+
+
-
-
-
-expression ?
-
--
-
Try to match the expression. If the match succeeds, return its match
-result, otherwise return null
. Unlike in regular expressions,
-there is no backtracking.
-
-
-
-Example: maybe = "a"?
-Matches: "a"
, ""
-Does not match: (always matches)
-
-
-Try it:
-
-
-
-
-
-
-
-& expression
-
--
-
This is a positive assertion. No input is consumed.
-Try to match the expression. If the match succeeds, just return
-undefined
and do not consume any input, otherwise consider the
-match failed.
-
-
-
-Example: posAssertion = "a" &"b"
-Matches: "ab"
-Does not match: "ac"
, "a"
, ""
-
-
-Try it:
-
-
-
-
-
-
-! expression
-
--
-
This is a negative assertion. No input is consumed.
-
-Try to match the expression. If the match does
-not succeed, just return undefined
and do not consume any
-input, otherwise consider the match failed.
-
-
-
-Example: negAssertion = "a" !"b"
-Matches: "a"
, "ac"
-Does not match: "ab"
, ""
-
-
-Try it:
-
-
-
-
-
-
-& { predicate }
-
--
-
This is a positive assertion. No input is consumed.
-
-The predicate should be JavaScript code, and it's executed as a
-function. Curly braces in the predicate must be balanced.
-
-The predicate should return
a boolean value. If the result
-is truthy, it's match result is undefined
, otherwise the
-match is considered failed. Failure to include the return
-keyword is a common mistake.
-
-The predicate has access to all variables and functions in the
-Action Execution Environment.
-
-
-
-Example:
posPredicate = [0-9]+ &{return parseInt(match, 10) < 100}
-Matches: "0"
, "99"
-Does not match: "100"
, "-1"
, ""
-
-
-Try it:
-
-
-
-
-
-
-! { predicate }
-
--
-
This is a negative assertion. No input is consumed.
-
-The predicate should be JavaScript code, and it's executed as a
-function. Curly braces in the predicate must be balanced.
-
-The predicate should return
a boolean value. If the result is
-falsy, it's match result is undefined
, otherwise the match is
-considered failed.
-
-The predicate has access to all variables and functions in the
-Action Execution Environment.
-
-
-
-Example:
negPredicate = $[0-9]+ !{ return parseInt(match, 10) < 100 }
-Matches: "100"
, "156"
-Does not match: "56"
, "-1"
, ""
-
-
-Try it:
-
-
-
-
-
-
-$ expression
-
--
-
Try to match the expression. If the match succeeds, return the matched
-text instead of the match result.
-
-If you need to return the matched text in an action, use the
-text()
function.
-
-
-
-Example: dollar = $"a"+
-Matches: "a"
, "aa"
-Does not match: "b"
, ""
-
-
-Try it:
-
-
-
-
-
-
-label : expression
-
--
-
Match the expression and remember its match result under given label. The
-label must be a JavaScript identifier, which includes not being in the list of
-reserved words. By default this is a list of JavaScript
-reserved words, but plugins can change it.
-
-Labeled expressions are useful together with actions, where saved match
-results can be accessed by action's JavaScript code.
-
-
-
-Example: label = foo:"bar"i { return {foo}; }
-Matches: "bar"
, "BAR"
-Does not match: "b"
, ""
-
-
-Try it:
-
-
-
-
-
-
-@ ( label : )? expression
-
--
-
Match the expression and if the label exists, remember its match result
-under given label. The label must be a JavaScript identifier if it
-exists, but not in the list of reserved words.
-By default this is a list of JavaScript reserved words,
-but plugins can change it.
-
-Return the value of this expression from the rule, or "pluck" it. You
-may not have an action for this rule. The expression must not be a
-semantic predicate (&{ predicate }
or
-!{ predicate }
). There may be multiple
-pluck expressions in a given rule, in which case an array of the plucked
-expressions is returned from the rule.
-
-Pluck expressions are useful for writing terse grammars, or returning
-parts of an expression that is wrapped in parentheses.
-
-
-
-Example: pluck_1 = @$"a"+ " "+ @$"b"+
-Matches: "aaa "
, "a "
-Does not match: "b"
, " "
-
-
-Try it:
-
-
-
-
-
-
-
-Example: pluck_2 = @$"a"+ " "+ @two:$"b"+
-Matches: "aaa b"
, "a bbb"
-Does not match: "b"
, " "
-
-
-Try it:
-
-
-
-
-
-
-expression1 expression2 ... expressionn
-
--
-
Match a sequence of expressions and return their match results in an array.
-
-
-
-Example: sequence = "a" "b" "c"
-Matches: "abc"
-Does not match: "b"
, " "
-
-
-Try it:
-
-
-
-
-
-
-expression { action }
-
--
-
If the expression matches successfully, run the action, otherwise
-consider the match failed.
-
-The action should be JavaScript code, and it's executed as a
-function. Curly braces in the action must be balanced.
-
-The action should return
some value, which will be used as the
-match result of the expression.
-
-The action has access to all variables and functions in the
-Action Execution Environment.
-
-
-
-Example: action = " "+ "a" { return location(); }
-Matches: " a"
-Does not match: "a"
, " "
-
-
-Try it:
-
-
-
-
-
-
-expression1 / expression2 / ... / expressionn
-
--
-
Try to match the first expression, if it does not succeed, try the second
-one, etc. Return the match result of the first successfully matched
-expression. If no expression matches, consider the match failed.
-
-
-
-Example: alt = "a" / "b" / "c"
-Matches: "a"
, "b"
, "c"
-Does not match: "d"
, ""
-
-
-Try it:
-
-
-
-
-
+
+
+ $ expression
+
+ -
+
Try to match the expression. If the match succeeds, return the matched
+ text instead of the match result.
+
+ If you need to return the matched text in an action, use the
+ text()
function.
+
+
+
+
+ Example: dollar = $"a"+
+ Matches: "a"
, "aa"
+ Does not match: "b"
, ""
+
+
+ Try it:
+
+
+
+
+
+
+ label : expression
+
+ -
+
Match the expression and remember its match result under the given label. The
+ label must be a Peggy identifier.
+
+ Labeled expressions are useful together with actions, where saved match
+ results can be accessed by the action's JavaScript code.
+
+
+
+ Example: label = foo:"bar"i { return {foo}; }
+ Matches: "bar"
, "BAR"
+ Does not match: "b"
, ""
+
+
+ Try it:
+
+
+
+
+
+
+ @ ( label : )? expression
+
+ -
+
Match the expression and if the label exists, remember its match result
+ under the given label. The label must be a Peggy
+ identifier, and must be valid as a function parameter
+ in the language that is being generated (by default, JavaScript).
+
+ Return the value of this expression from the rule, or "pluck" it. You
+ may not have an action for this rule. The expression must not be a
+ semantic predicate (&{ predicate }
or
+ !{ predicate }
). There may be multiple
+ pluck expressions in a given rule, in which case an array of the plucked
+ expressions is returned from the rule.
+
+
+ Pluck expressions are useful for writing terse grammars, or returning
+ parts of an expression that is wrapped in parentheses.
+
+
+
+ Example: pluck_1 = @$"a"+ " "+ @$"b"+
+ Matches: "aaa "
, "a "
+ Does not match: "b"
, " "
+
+
+ Try it:
+
+
+
+
+
+
+
+ Example: pluck_2 = @$"a"+ " "+ @two:$"b"+
+ Matches: "aaa b"
, "a bbb"
+ Does not match: "b"
, " "
+
+
+ Try it:
+
+
+
+
+
+
+ expression1 expression2 ... expressionn
+
+
+ -
+
Match a sequence of expressions and return their match results in an array.
+
+
+
+ Example: sequence = "a" "b" "c"
+ Matches: "abc"
+ Does not match: "b"
, " "
+
+
+ Try it:
+
+
+
+
+
+
+ expression { action }
+
+ -
+
If the expression matches successfully, run the action, otherwise
+ consider the match failed.
+
+ The action should be JavaScript code, and it's executed as a
+ function. Curly braces in the action must be balanced.
+
+ The action should return
some value, which will be used as the
+ match result of the expression.
+
+ The action has access to all variables and functions in the
+ Action Execution Environment.
+
+
+
+
+ Example: action = " "+ "a" { return location(); }
+ Matches: " a"
+ Does not match: "a"
, " "
+
+
+ Try it:
+
+
+
+
+
+
+ -
+
expression1 / expression2 / ... / expressionn
+
+
+ -
+
Try to match the first expression; if it does not succeed, try the second
+ one, etc. Return the match result of the first successfully matched
+ expression. If no expression matches, consider the match failed.
+
+
+
+ Example: alt = "a" / "b" / "c"
+ Matches: "a"
, "b"
, "c"
+ Does not match: "d"
, ""
+
+
+ Try it:
+
+
+
+
+
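The `?`, `&`, `!`, sequence and `/` operators described above can be modeled with a few hand-written matcher functions. This is only a sketch of the semantics, not the code Peggy generates: the helper names (`lit`, `seq`, `alt`, `opt`, `and`, `not`) and the `{ value, pos }` result shape are assumptions made for illustration.

```javascript
// Hand-written model of a few PEG operators; NOT Peggy's generated code.
// A matcher takes (input, pos) and returns { value, pos } on success, or null on failure.
const lit = (s) => (input, pos) =>
  input.startsWith(s, pos) ? { value: s, pos: pos + s.length } : null;

// expression1 expression2 ...: match in order and collect the results in an array.
const seq = (...matchers) => (input, pos) => {
  const values = [];
  for (const m of matchers) {
    const r = m(input, pos);
    if (r === null) return null;
    values.push(r.value);
    pos = r.pos;
  }
  return { value: values, pos };
};

// expression1 / expression2 / ...: the first alternative that succeeds wins.
const alt = (...matchers) => (input, pos) => {
  for (const m of matchers) {
    const r = m(input, pos);
    if (r !== null) return r;
  }
  return null;
};

// expression ?: never fails; yields null when the expression does not match.
const opt = (m) => (input, pos) => m(input, pos) ?? { value: null, pos };

// & expression: succeeds with undefined if the expression matches; consumes nothing.
const and = (m) => (input, pos) =>
  m(input, pos) !== null ? { value: undefined, pos } : null;

// ! expression: succeeds with undefined if the expression fails; consumes nothing.
const not = (m) => (input, pos) =>
  m(input, pos) === null ? { value: undefined, pos } : null;

const posAssertion = seq(lit("a"), and(lit("b"))); // models: "a" &"b"
const negAssertion = seq(lit("a"), not(lit("b"))); // models: "a" !"b"

// The returned position stays at 1: "b" was inspected but not consumed.
console.log(posAssertion("ab", 0).pos);
console.log(negAssertion("ab", 0)); // null, because "b" follows "a"
```

Because the assertions return the position they were given, whatever follows them in a sequence starts matching at the same place.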
Action Execution Environment
Actions and predicates have these variables and functions
-available to them.
+ available to them.
-All variables and functions defined in the initializer or the top-level
-initializer at the beginning of the grammar are available.
-
-Note, that all functions and variables, described below, are unavailable
-in the global initializer.
-
-Labels from preceding expressions are available as local variables,
-which will have the match result of the labelled expressions.
-A label is only available after its labelled expression is matched:
-rule = A:('a' B:'b' { /* B is available, A is not */ } )
-A label in a sub-expression is only valid within the
-sub-expression:
-rule = A:'a' (B:'b') (C:'b' { /* A and C are available, B is not */ })
-
-input
is a parsed string that was passed to the parse()
method.
-
-options
is a variable that contains the parser options.
-That is the same object that was passed to the parse()
method.
-
-error(message, where)
will report an error and throw an exception.
-where
is optional; the default is the value of location()
.
-
-expected(message, where)
is similar to error
, but reports
-
-Expected message but "other" found.
-
-where other
is, by default, the character in the location().start.offset
position.
-
-location()
returns an object with the information about the parse position.
-Refer to the corresponding section for the details.
-
-range()
is similar to location()
, but returns an object with offsets only.
-Refer to the "Locations" section for the details.
-
-offset()
returns only the start offset, i.e. location().start.offset
.
-Refer to the "Locations" section for the details.
-
-text()
returns the source text between start
and end
(which will be ""
for
-predicates). Instead of using that function as a return value for the rule consider
-using the $
operator.
-
+ -
+
All variables and functions defined in the initializer or the top-level
+ initializer at the beginning of the grammar are available.
+
+ -
+
Note that all functions and variables described below are unavailable
+ in the global initializer.
+
+ -
+
Labels from preceding expressions are available as local variables,
+ which will have the match result of the labelled expressions.
+ A label is only available after its labelled expression is matched:
+ rule = A:('a' B:'b' { /* B is available, A is not */ } )
+ A label in a sub-expression is only valid within the
+ sub-expression:
+ rule = A:'a' (B:'b') (C:'b' { /* A and C are available, B is not */ })
+
+ -
+
input
is the string being parsed, as passed to the parse()
method.
+
+ -
+
options
is a variable that contains the parser options.
+ That is the same object that was passed to the parse()
method.
+
+ -
+
error(message, where)
will report an error and throw an exception.
+ where
is optional; the default is the value of location()
.
+
+
+ -
+
expected(message, where)
is similar to error
, but reports
+
+ Expected message but "other" found.
+
+ where other
is, by default, the character in the location().start.offset
position.
+
+ -
+
location()
returns an object with the information about the parse position.
+ Refer to the corresponding section for the details.
+
+ -
+
range()
is similar to location()
, but returns an object with offsets only.
+ Refer to the "Locations" section for the details.
+
+ -
+
offset()
returns only the start offset, i.e. location().start.offset
.
+ Refer to the "Locations" section for the details.
+
+ -
+
text()
returns the source text between start
and end
(which will be
+ ""
for
+ predicates). Instead of using that function as a return value for the rule consider
+ using the $
operator.
+
Parsing Lists
One of the most frequent questions about Peggy grammars is how to parse a
-delimited list of items. The cleanest current approach is:
+ delimited list of items. The cleanest current approach is:
list
= word|.., _ "," _|
@@ -1227,11 +1259,51 @@ Parsing Lists
= [ \t]*
Note that the @
in the tail section plucks the word out of the
-parentheses, NOT out of the rule itself.
+ parentheses, NOT out of the rule itself.
+
+Peggy Identifiers
+
+Peggy Identifiers are used as rule names, rule references, and label names.
+ They are used as identifiers in the code that Peggy generates (by default,
+ JavaScript), and as such, must conform to the limitations of the Peggy grammar
+ as well as those of the target language.
+
+Like all Peggy grammar constructs, identifiers MUST contain only codepoints in the
+ Basic
+ Multilingual Plane. They must begin with a codepoint whose Unicode
+ General Category property is Lu, Ll, Lt, Lm, Lo, or Nl (letters),
+ "_" (underscore), or a Unicode escape in the form \uXXXX
.
+ Subsequent codepoints can be any of those that are valid as an initial
+ codepoint, "$", codepoints whose General Category property is Mn or Mc
+ (combining characters), Nd (numbers), Pc (connector punctuation),
+ "\u200C" (zero width non-joiner), or "\u200D" (zero width joiner).
+
+Labels have a further restriction, which is that they must be valid as
+ a function parameter in the language being generated. For JavaScript, this
+ means that they cannot be in the limited set of
+ JavaScript
+ reserved words. Plugins can modify the list of reserved words at compile time.
+
+
+Valid identifiers:
+
+ Foo
+ Bär
+ _foo
+ foo$bar
+
+
+Invalid identifiers:
+
+ const
(reserved word)
+ 𐓁𐒰͘𐓐𐓎𐓊𐒷
(valid in JavaScript, but not in the Basic Multilingual Plane)
+ $Bar
(starts with "$")
+ foo bar
(invalid JavaScript identifier containing space)
+
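The identifier rules above can be approximated with a Unicode-aware regular expression. This is a sketch under stated assumptions, not Peggy's own check: the names `IDENTIFIER`, `isBmpOnly`, `isValidLabel` and `SAMPLE_RESERVED_WORDS` are hypothetical, the reserved-word list is a small illustrative subset, and the `\uXXXX` escape form is not handled.

```javascript
// Illustrative subset only; the real default list is much longer.
const SAMPLE_RESERVED_WORDS = new Set(["const", "class", "return", "function"]);

// First code point: a letter category (Lu, Ll, Lt, Lm, Lo, Nl) or "_".
// Later code points: the same set, plus "$", Mn, Mc, Nd, Pc, U+200C and U+200D.
const IDENTIFIER =
  /^[\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}\p{Nl}_][\p{Lu}\p{Ll}\p{Lt}\p{Lm}\p{Lo}\p{Nl}_$\p{Mn}\p{Mc}\p{Nd}\p{Pc}\u200C\u200D]*$/u;

function isBmpOnly(text) {
  // Any code point above U+FFFF lies outside the Basic Multilingual Plane.
  return [...text].every((ch) => ch.codePointAt(0) <= 0xffff);
}

// Labels get the extra reserved-word check; rule names would skip it.
function isValidLabel(name, reserved = SAMPLE_RESERVED_WORDS) {
  return isBmpOnly(name) && IDENTIFIER.test(name) && !reserved.has(name);
}

for (const name of ["Foo", "Bär", "_foo", "foo$bar", "const", "$Bar", "foo bar"]) {
  console.log(JSON.stringify(name), isValidLabel(name));
}
```

The Basic Multilingual Plane check is separate from the regular expression because, with the `u` flag, `\p{Lu}` and the other letter categories also match letters outside that plane.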
Error Messages
As described above, you can annotate your grammar rules with human-readable
-names that will be used in error messages. For example, this production:
+ names that will be used in error messages. For example, this production:
integer "simple number"
= digits:[0-9]+
@@ -1240,8 +1312,8 @@ Error Messages
Expected simple number but "a" found.
when parsing a non-number, referencing the human-readable name "simple
-number." Without the human-readable name, Peggy instead uses a description of
-the character class that failed to match:
+ number." Without the human-readable name, Peggy instead uses a description of
+ the character class that failed to match:
Expected [0-9] but "a" found.
@@ -1266,8 +1338,15 @@ Error Messages
There are two classes of errors in Peggy:
-SyntaxError
: Syntax errors, found during parsing the input. This kind of errors can be thrown both during grammar parsing and during input parsing. Although name is the same, errors of each generated parser (including Peggy parser itself) has its own unique class.
-GrammarError
: Grammar errors, found during construction of the parser. These errors can be thrown only in the parser generation phase. This error signals a logical mistake in the grammar, such as having two rules with the same name in one grammar, etc.
+ SyntaxError
: Syntax errors, found during parsing the input.
+ This kind of error can be thrown both during grammar parsing and
+ during input parsing. Although the name is the same, each
+ generated parser (including the Peggy parser itself) has its own unique
+ error class.
+ GrammarError
: Grammar errors, found during construction of
+ the parser. These errors can be thrown only in the parser generation phase.
+ This error signals a logical mistake in the grammar, such as having two
+ rules with the same name in one grammar, etc.
Both of these errors have the format()
method that takes an array of mappings from source to grammar text:
@@ -1311,15 +1390,17 @@ Error Messages
3 | end = !start
| ^^^^^
-A plugin may register additional passes that can generate GrammarError
s to report about
-problems, but they shouldn't do that by throwing an instance of GrammarError
. They should
-use the session API instead.
+A plugin may register additional passes that can generate
+ GrammarError
s to report problems, but they shouldn't do
+ that by throwing an instance of GrammarError
. They should use the
+ session API instead.
Locations
-During the parsing you can access to the information of the current parse location,
-such as offset in the parsed string, line and column information. You can get this
-information by calling location()
function, which returns you the following object:
+During parsing you can access information about the current parse
+ location, such as the offset in the parsed string and line and column information.
+ You can get this information by calling the location()
function,
+ which returns the following object:
{
source: options.grammarSource,
@@ -1329,11 +1410,11 @@ Locations
source
is any object that was supplied in the grammarSource
option in
-the parse()
call. That object can be used to hold reference to the origin of
-the grammar, for example, it can be a filename. It is recommended that this
-object have a toString()
implementation that returns meaningful string,
-because that string will be used when getting formatted error representation
-with e.format()
.
+ the parse()
call. That object can be used to hold a reference to the origin of
+ the grammar; for example, it can be a filename. It is recommended that this
+ object have a toString()
implementation that returns a meaningful string,
+ because that string will be used when getting formatted error representation
+ with e.format()
.
For certain special cases, you can use an instance of the
GrammarLocation
class as the grammarSource
.
@@ -1342,14 +1423,14 @@
Locations
document.
If source
is null
or undefined
it doesn't appear in the formatted messages.
-The default value for source
is undefined
.
+ The default value for source
is undefined
.
For actions, start
refers to the position at the beginning of the preceding
-expression, and end
refers to the position after the end of the preceding
-expression.
+ expression, and end
refers to the position after the end of the preceding
+ expression.
For semantic predicates, start
and end
are equal, denoting the location where
-the predicate is evaluated.
+ the predicate is evaluated.
For the per-parse initializer, the location is the start of the input, i.e.
@@ -1364,11 +1445,11 @@ Locations
line
and column
are 1-based indices.
The line number is incremented each time the parser finds an end of line sequence in
-the input.
+ the input.
Line and column are somewhat expensive to compute, so if you just need the
-offset, there's also a function offset()
that returns just the
-start offset, and a function range()
that returns the object:
+ offset, there's also a function offset()
that returns just the
+ start offset, and a function range()
that returns the object:
{
source: options.grammarSource,
@@ -1377,31 +1458,30 @@ Locations
}
(i.e. it differs from the location()
result only in the type of the
-start
and end
properties, which contain just an
-offset instead of the Location
-object.)
+ start
and end
properties, which contain just an
+ offset instead of the Location
+ object.)
All of the notes about values for location()
object are also
-applicable to the range()
-and offset()
calls.
+ applicable to the range()
and offset()
calls.
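The relationship between an offset and its 1-based line and column can be sketched as a linear scan, which also shows why line and column cost more to compute than the offset alone. The helper `lineAndColumn` is hypothetical and not part of Peggy's API, and it assumes that `"\n"` is what ends a line.

```javascript
// Hypothetical helper, not part of Peggy's API.
// Assumption: a line ends at "\n" (so "\r\n" also ends a line).
function lineAndColumn(input, offset) {
  let line = 1;
  let column = 1;
  // Walk every code unit before the offset: the cost grows with the offset.
  for (let i = 0; i < offset && i < input.length; i++) {
    if (input[i] === "\n") {
      line++;
      column = 1;
    } else {
      column++;
    }
  }
  return { offset, line, column };
}

console.log(lineAndColumn("ab\ncd", 4)); // offset 4 is the "d": line 2, column 2
```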
-Currently, Peggy only works with the Basic Multilingual Plane (BMP) of Unicode.
-This means that all offsets are measured in UTF-16 code units. If you
-try to parse characters outside this Plane (for example, emoji, or any
-surrogate pairs), you may get an offset inside a code point.
+Currently, Peggy grammars may only contain codepoints from the
+ Basic
+ Multilingual Plane (BMP) of Unicode.
+ This means that all offsets are measured in UTF-16 code units. If you
+ include characters outside this Plane (for example, emoji, or any
+ surrogate pairs), you may get an offset inside a code point.
Changing this behavior might be a breaking change, so it will likely cause
-a major version number increase if it happens. You can join to the discussion
-for this topic on the GitHub Discussions
-page.
+ a major version number increase if it happens. You can join the discussion
+ for this topic on the GitHub Discussions
+ page.
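What "an offset inside a code point" means can be shown in plain JavaScript, whose strings are also indexed by UTF-16 code unit. The helper `isInsideCodePoint` is a hypothetical name used only for this sketch.

```javascript
// Returns true when `offset` falls between the two halves of a surrogate pair.
function isInsideCodePoint(input, offset) {
  if (offset <= 0 || offset >= input.length) return false;
  const prev = input.charCodeAt(offset - 1);
  const curr = input.charCodeAt(offset);
  const prevIsHigh = prev >= 0xd800 && prev <= 0xdbff;
  const currIsLow = curr >= 0xdc00 && curr <= 0xdfff;
  return prevIsHigh && currIsLow;
}

const sample = "a\u{1F600}b"; // "a", one emoji outside the BMP, "b"

console.log(sample.length); // 4 code units for 3 code points
console.log(sample.indexOf("b")); // 3, because the emoji occupies offsets 1 and 2
console.log(isInsideCodePoint(sample, 2)); // true: offset 2 splits the emoji
```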
Plugins API
A plugin is an object with the use(config, options)
method.
-That method will be called for all plugins in the options.plugins
-array, supplied to the generate()
-method.
+ That method will be called for all plugins in the options.plugins
+ array, supplied to the generate()
method.
use
accepts these parameters:
@@ -1409,108 +1489,121 @@ config
Object with the following properties:
-parser
-Parser
object, by default the peggy.parser
instance. That object
-will be used to parse the grammar. Plugin can replace this object
-
-passes
--
-
Mapping { [stage: string]: Pass[] }
that represents compilation
-stages that would applied to the AST, returned by the parser
object. That
-mapping will contain at least the following keys:
-
-
-check
— passes that check AST for correctness. They shouldn't change the AST
-transform
— passes that performs various optimizations. They can change
-the AST, add or remove nodes or their properties
-generate
— passes used for actual code generating
-
-
-A plugin that implements a pass should usually push it to the end of the correct
-array. Each pass is a function with the signature pass(ast, options, session)
:
-
-
-ast
— the AST created by the config.parser.parse()
method
-options
— compilation options passed to the peggy.compiler.compile()
method.
-If parser generation is started because generate()
function was called that
-is also an options, passed to the generate()
method
-session
— a Session
object that allows raising errors,
-warnings and informational messages
-
-
-
-reservedWords
--
-
String array with a list of words that shouldn't be used as
-label names. This list can be modified by plugins. That property is not required
-to be sorted or not contain duplicates, but it is recommend to remove duplicates.
-
-Default list contains JavaScript reserved words, and can be found
-in the peggy.RESERVED_WORDS
property.
-
+ parser
+ Parser
object, by default the peggy.parser
instance. That object
+ will be used to parse the grammar. A plugin can replace this object
+
+ passes
+ -
+
Mapping { [stage: string]: Pass[] }
that represents compilation
+ stages that will be applied to the AST returned by the parser
object. That
+ mapping will contain at least the following keys:
+
+
+ check
— passes that check AST for correctness. They shouldn't change the AST
+ transform
— passes that perform various optimizations. They can change
+ the AST, add or remove nodes or their properties
+ generate
— passes used for actual code generation
+
+
+ A plugin that implements a pass should usually push it to the end of the correct
+ array. Each pass is a function with the signature pass(ast, options, session)
:
+
+
+ ast
— the AST created by the
+ config.parser.parse()
method
+
+ options
— compilation options passed to the
+ peggy.compiler.compile()
method. If parser generation was
+ started by a call to the generate()
function, these are also the
+ options passed to the generate()
method
+
+ session
— a Session
+ object that allows raising errors, warnings and informational messages
+
+
+
+ reservedWords
+ -
+
String array with a list of words that shouldn't be used as label
+ names. This list can be modified by plugins. The list is not required
+ to be sorted or free of duplicates, but removing duplicates is
+ recommended.
+
+ The default list contains JavaScript
+ reserved words, and can be found in the peggy.RESERVED_WORDS
+ property.
+
options
-- Build options passed to the
generate()
method. A best practice for
-a plugin would look for its own options under a <plugin_name>
key.
+
+Build options passed to the generate()
method. As a best practice,
+ a plugin should look for its own options under a
+ <plugin_name>
key.
Session API
-Each compilation request is represented by a Session
instance. An object of this class
-is created by the compiler and given to each pass as a 3rd parameter. The session
-object gives access to the various compiler services. At the present time there is only
-one such service: reporting of diagnostics.
+Each compilation request is represented by a Session
instance.
+ An object of this class is created by the compiler and given to each pass as a
+ 3rd parameter. The session object gives access to the various compiler
+ services. At the present time there is only one such service: reporting of
+ diagnostics.
-All diagnostics are divided into three groups: errors, warnings and informational
-messages. For each of them the Session
object has a method, described below.
+All diagnostics are divided into three groups: errors, warnings and
+ informational messages. For each of them the Session
object has a
+ method, described below.
All reporting methods have an identical signature:
(message: string, location?: LocationRange, notes?: DiagnosticNote[]) => void;
-message
: a main diagnostic message
-location
: an optional location information if diagnostic is related to the grammar
-source code
-notes
: an array with additional details about diagnostic, pointing to the
-different places in the grammar. For example, each note could be a location of
-a duplicated rule definition
+ message
: the main diagnostic message
+ location
: optional location information, if the diagnostic is related to the grammar
+ source code
+ notes
: an array with additional details about the diagnostic, pointing to the
+ different places in the grammar. For example, each note could be a location of
+ a duplicated rule definition
-error(...)
--
-
Reports an error. Compilation process is subdivided into pieces called stages and
-each stage consist of one or more passes. Within the one stage all errors, reported
-by different passes, are collected without interrupting the parsing process.
-
-When all passes in the stage are completed, the stage is checked for errors. If one
-was registered, a GrammarError
with all found problems in the problems
property
-is thrown. If there are no errors, then the next stage is processed.
-
-After processing all three stages (check
, transform
and generate
) the compilation
-process is finished.
-
-The process, described above, means that passes should be careful about what they do.
-For example, if you place your pass into the check
stage there is no guarantee that
-all rules exists, because checking for existing rules is also performed during the
-check
stage. On the contrary, passes in the transform
and generate
stages can be
-sure that all rules exists, because that precondition was checked on the check
stage.
-
-
-warning(...)
-- Reports a warning. Warnings are similar to errors, but they do not interrupt a compilation.
-
-info(...)
-- Report an informational message. This method can be used to inform user about
-significant changes in the grammar, for example, replacing proxy rules.
+ error(...)
+ -
+
Reports an error. The compilation process is subdivided into pieces called stages, and
+ each stage consists of one or more passes. Within a single stage, all errors reported
+ by different passes are collected without interrupting the parsing process.
+
+ When all passes in the stage are completed, the stage is checked for errors. If one
+ was registered, a GrammarError
with all found problems in the problems
property
+ is thrown. If there are no errors, then the next stage is processed.
+
+ After processing all three stages (check
, transform
and generate
) the
+ compilation
+ process is finished.
+
+ The process, described above, means that passes should be careful about what they do.
+ For example, if you place your pass into the check
stage there is no guarantee that
+ all rules exist, because checking for existing rules is also performed during the
+ check
stage. On the contrary, passes in the transform
and generate
stages
+ can be
+ sure that all rules exist, because that precondition was checked in the check
stage.
+
+
+
+ warning(...)
+ - Reports a warning. Warnings are similar to errors, but they do not interrupt compilation.
+
+ info(...)
+ - Reports an informational message. This method can be used to inform the user about
+ significant changes in the grammar, for example, replacing proxy rules.
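Putting the Plugins API and Session API together, a plugin that adds a `check` pass might look like the sketch below. The plugin name, its `underscoreWarning` option key and the rule it enforces are hypothetical, and the code assumes that each rule node in `ast.rules` carries `name` and `nameLocation` properties. The mock `config` and `session` objects at the end exist only so the sketch runs without Peggy.

```javascript
// Hypothetical plugin: warns about rule names that contain an underscore.
// Assumption: each node in ast.rules has `name` and `nameLocation` properties.
const underscoreWarningPlugin = {
  use(config, options) {
    // Look for this plugin's own options under a key named after the plugin.
    const settings = options.underscoreWarning || {};
    if (settings.disabled) return;

    // Push the pass to the end of the "check" stage; it must not change the AST.
    config.passes.check.push((ast, passOptions, session) => {
      for (const rule of ast.rules) {
        if (rule.name.includes("_")) {
          // Report through the session instead of throwing a GrammarError.
          session.warning(
            `Rule "${rule.name}" contains an underscore`,
            rule.nameLocation
          );
        }
      }
    });
  },
};

// Stand-ins for the objects the compiler would supply.
const mockConfig = { passes: { check: [], transform: [], generate: [] } };
const collected = [];
const mockSession = { warning: (message) => collected.push(message) };

underscoreWarningPlugin.use(mockConfig, {});
mockConfig.passes.check[0](
  { rules: [{ name: "a_b", nameLocation: null }, { name: "ab", nameLocation: null }] },
  {},
  mockSession
);
console.log(collected);
```

In real use the plugin object would be listed in the `options.plugins` array supplied to `generate()`, and the compiler would provide `config` and `session`.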
Compatibility
Both the parser generator and generated parsers should run well in the
-following environments:
+ following environments:
- Node.js 14+
@@ -1521,9 +1614,9 @@ Compatibility
- Opera
-The generated parser is intended to run in older environments when the format
-chosen is "globals" or "umd". Extensive testing is NOT performed in these
-environments, but issues filed regarding the generated code will be fixed.
+The generated parser is intended to run in older environments when the format
+ chosen is "globals" or "umd". Extensive testing is NOT performed in these
+ environments, but issues filed regarding the generated code will be fixed.