Commit

Fixes peggyjs#347. Makes $ invalid as an identifier start character.
hildjj committed Feb 25, 2023
1 parent db2d723 commit 457ea8e
Showing 8 changed files with 90 additions and 56 deletions.
4 changes: 4 additions & 0 deletions CHANGELOG.md
@@ -14,6 +14,10 @@ Released: TBD

### Bug Fixes

+- [#347](https://github.com/peggyjs/peggy/issues/347) Disallow '$' as an initial
+character in identifiers. This is not a breaking change because no grammar
+could have successfully used these in the past. From @hildjj.
+
3.0.0
-----

56 changes: 42 additions & 14 deletions docs/documentation.html
@@ -40,6 +40,7 @@ <h2 id="table-of-contents">Table of Contents</h2>
<li><a href="#parsing-lists">Parsing Lists</a></li>
</ul>
</li>
+<li><a href="#identifiers">Peggy Identifiers</a></li>
<li><a href="#error-messages">Error Messages</a></li>
<li><a href="#locations">Locations</a></li>
<li>
@@ -519,12 +520,12 @@ <h2 id="grammar-syntax-and-semantics">Grammar Syntax and Semantics</h2>
<code>integer</code> rule has a human-readable name). The parsing starts at the
first rule, which is also called the <em>start rule</em>.</p>

-<p>A rule name must be a JavaScript identifier. It is followed by an equality
-sign (“=”) and a parsing expression. If the rule has a human-readable name, it
-is written as a JavaScript string between the rule name and the equality sign.
-Rules need to be separated only by whitespace (their beginning is easily
-recognizable), but a semicolon (“;”) after the parsing expression is
-allowed.</p>
+<p>A rule name must be a Peggy <a href="#identifiers">identifier</a>. It is
+followed by an equality sign (“=”) and a parsing expression. If the rule has a
+human-readable name, it is written as a JavaScript string between the rule
+name and the equality sign. Rules need to be separated only by whitespace
+(their beginning is easily recognizable), but a semicolon (“;”) after the
+parsing expression is allowed.</p>

<p>The first rule can be preceded by a <em>global initializer</em> and/or a
<em>per-parse initializer</em>, in that order. Both are pieces of JavaScript
@@ -1005,10 +1006,7 @@ <h3 id="grammar-syntax-and-semantics-parsing-expression-types">Parsing Expressio

<dd>
<p>Match the expression and remember its match result under given label. The
-label must be a JavaScript identifier, which includes not being in the list of
-reserved words. By default this is a list of <a
-href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Lexical_grammar#reserved_words">JavaScript
-reserved words</a>, but <a href="#plugins-api">plugins</a> can change it.</p>
+label must be a Peggy <a href="#identifiers">identifier</a>.</p>

<p>Labeled expressions are useful together with actions, where saved match
results can be accessed by action's JavaScript code.</p>
@@ -1031,10 +1029,7 @@ <h3 id="grammar-syntax-and-semantics-parsing-expression-types">Parsing Expressio

<dd>
<p>Match the expression and if the label exists, remember its match result
-under given label. The label must be a JavaScript identifier if it
-exists, but not in the list of reserved words.
-By default this is a list of <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Lexical_grammar#reserved_words">JavaScript reserved words</a>,
-but <a href="#plugins-api">plugins</a> can change it.</p>
+under given label. The label must be a Peggy <a href="#identifiers">identifier</a>.</p>

<p>Return the value of this expression from the rule, or "pluck" it. You
may not have an action for this rule. The expression must not be a
@@ -1229,6 +1224,39 @@ <h3 id="parsing-lists">Parsing Lists</h3>
<p>Note that the <code>@</code> in the tail section plucks the word out of the
parentheses, NOT out of the rule itself.</p>

+<h2 id="identifiers">Peggy Identifiers</h2>
+
+<p>Peggy Identifiers are used as rule names, rule references, and label names.
+They are used as JavaScript identifiers in the code that Peggy generates, and
+as such, must conform to the limitations of the Peggy grammar as well as those
+of the JavaScript grammar.</p>
+
+<p>Peggy identifiers are almost any valid JavaScript
+<a href="https://262.ecma-international.org/13.0/#prod-IdentifierName">IdentifierName</a> in the
+<a href="https://en.wikipedia.org/wiki/Plane_(Unicode)#Basic_Multilingual_Plane">Basic
+Multilingual Plane</a>. However, identifiers MUST NOT start with a "$" character
+to avoid interactions with Peggy's <code>$</code> operator.
+Any word that might be a
+<a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Lexical_grammar#reserved_words">JavaScript reserved word</a>
+in any context is also disallowed. Plugins can modify the list of reserved
+words at compile time.</p>
+
+<p>Valid identifiers:</p>
+<ul>
+<li><code>Foo</code></li>
+<li><code>Bär</code></li>
+<li><code>_foo</code></li>
+<li><code>foo$bar</code></li>
+</ul>
+
+<p><b>Invalid</b> identifiers:</p>
+<ul>
+<li><code>const</code> (reserved word)</li>
+<li><code>𐓁𐒰͘𐓐𐓎𐓊𐒷</code> (valid in JavaScript, but not in the Basic Multilingual Plane)</li>
+<li><code>$Bar</code> (starts with "$")</li>
+<li><code>foo bar</code> (invalid JavaScript identifier containing space)</li>
+</ul>
+
<h2 id="error-messages">Error Messages</h2>
<p>As described above, you can annotate your grammar rules with human-readable
names that will be used in error messages. For example, this production:</p>
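
Note on the docs hunk above: the new "Peggy Identifiers" section says that plugins can modify the list of reserved words at compile time. A minimal sketch of such a plugin might look like the following. It assumes that Peggy calls a plugin's `use(config, options)` method and exposes the list as the array `config.reservedWords`; the plugin name and the `extraReservedWords` option are hypothetical and exist only for this illustration.

```javascript
// Hypothetical plugin, for illustration only.
// Assumption: Peggy calls plugin.use(config, options) and exposes the
// reserved-word list as the array config.reservedWords.
const extraReservedWords = {
  use(config, options) {
    // "extraReservedWords" is a made-up option name, not a Peggy option.
    const extra = (options && options.extraReservedWords) || [];
    for (const word of extra) {
      // Skip words that are already reserved to avoid duplicates.
      if (!config.reservedWords.includes(word)) {
        config.reservedWords.push(word);
      }
    }
  },
};

// Stand-in for the real config object, so the sketch runs without Peggy.
const fakeConfig = { reservedWords: ["const", "class"] };
extraReservedWords.use(fakeConfig, { extraReservedWords: ["result", "const"] });
```

With a real Peggy installation, the plugin would presumably be passed through the `plugins` option of `peggy.generate`; after that, a rule or label named `result` should be rejected in the same way as `const`.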
2 changes: 1 addition & 1 deletion docs/js/benchmark-bundle.min.js

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/js/test-bundle.min.js

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion docs/vendor/peggy/peggy.min.js

Large diffs are not rendered by default.

74 changes: 37 additions & 37 deletions lib/parser.js

Some generated files are not rendered by default.

2 changes: 1 addition & 1 deletion src/parser.pegjs
@@ -323,12 +323,12 @@ IdentifierName "identifier"

IdentifierStart
= UnicodeLetter
-/ "$"
/ "_"
/ "\\" @UnicodeEscapeSequence

IdentifierPart
= IdentifierStart
+/ "$"
/ UnicodeCombiningMark
/ UnicodeDigit
/ UnicodeConnectorPunctuation
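
Note on the grammar hunk above: the change moves `"$"` from `IdentifierStart` to `IdentifierPart`. The effect can be sketched in plain JavaScript as shown below. This is an approximation under stated assumptions, not Peggy's implementation: it assumes the Unicode classes follow the ES5 definitions (letters are categories L and Nl, combining marks are Mn and Mc, digits are Nd, connector punctuation is Pc), it covers only the alternatives visible in this hunk, it ignores the `\uXXXX` escape form, and it uses a deliberately partial reserved-word list. The helper name `isPeggyIdentifierSketch` is hypothetical.

```javascript
// Hypothetical helper, for illustration only; not part of Peggy's API.
// Assumption: a partial reserved-word list. The real list is longer and
// can be changed by plugins.
const SAMPLE_RESERVED_WORDS = new Set(["class", "const", "function", "return", "var"]);

// Mirrors IdentifierStart after this commit: a Unicode letter or "_".
// "$" is deliberately absent.
const START = /^[\p{L}\p{Nl}_]$/u;

// Mirrors the visible IdentifierPart alternatives: everything in START,
// plus "$", combining marks, digits, and connector punctuation.
const PART = /^[\p{L}\p{Nl}_$\p{Mn}\p{Mc}\p{Nd}\p{Pc}]$/u;

function isPeggyIdentifierSketch(name, reserved = SAMPLE_RESERVED_WORDS) {
  if (typeof name !== "string" || name.length === 0) return false;
  if (reserved.has(name)) return false;

  // Array.from splits by code point, so astral characters stay whole.
  const chars = Array.from(name);

  // Reject anything outside the Basic Multilingual Plane.
  if (chars.some((ch) => ch.codePointAt(0) > 0xffff)) return false;

  return START.test(chars[0]) && chars.slice(1).every((ch) => PART.test(ch));
}
```

Under these assumptions, `foo$bar` and `_foo` pass, while `$Bar`, a bare `$`, `const`, and `foo bar` are rejected, which matches the examples in the documentation hunk and the updated unit test.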
4 changes: 3 additions & 1 deletion test/unit/parser.spec.js
@@ -902,7 +902,9 @@ describe("Peggy grammar parser", () => {
// Canonical IdentifierStart is "a".
it("parses IdentifierStart", () => {
expect("start = a").to.parseAs(ruleRefGrammar("a"));
-expect("start = $").to.parseAs(ruleRefGrammar("$"));
+expect("start = $").to.failToParse();
+expect("$start = a").to.failToParse();
+expect("start = a$b").to.parseAs(ruleRefGrammar("a$b"));
expect("start = _").to.parseAs(ruleRefGrammar("_"));
expect("start = \\u0061").to.parseAs(ruleRefGrammar("a"));
});