Finalize grammar composition story #30
Moved from a code comment: We can either have extending callers pull the rule text out of repr, or we could get fancy and define |
I guess we'll have to go with something heavier, like |
One nice thing about spitting out a new class for each Grammar is that we could use super() to implement parent-rule delegation. But OTOH, I don't want to get married to simply calling other rules; it stomps on the possibility for optimization. Strawman:
|
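To make the super() idea concrete: a minimal sketch, assuming a hypothetical compilation step that turns each grammar into a class with one method per rule (all names here are invented for illustration; this is not anything Parsimonious does today):

```python
class BaseWikiRules:
    def markup(self, text, pos):
        """Match the supergrammar's markup rule starting at pos."""
        ...

class ExtendedWikiRules(BaseWikiRules):
    def markup(self, text, pos):
        # Parent-rule delegation falls out of ordinary method resolution:
        parent_result = super().markup(text, pos)
        # ...then try this grammar's additional alternatives.
        return parent_result
```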
We'll use the
We don't override
We should move the custom rules into a
The question arose whether to support delegation to non-uniquely named rules within a single grammar:
I believe this is a bad idea. It causes the order of rules within a grammar to matter in one special case; this isn't true more generally and could lead to confusion. For instance, if the user views grammar addition as concatenation of the string-based rules, he will be surprised when
Thus, order of rules will not matter except that later rules of a given name within a single grammar will shadow earlier ones. (Subgrammar rules of a given name will also shadow rules of that name from the supergrammar, except for when

Delegated-to but otherwise shadowed supergrammar rules remain only as anonymous sub-Expressions of the delegating rule. We strip the name off so the subgrammar's visitor method receives the output of only the new rule. If it likes, the method can delegate to the supergrammar's visitation code as appropriate. (As a bonus, this keeps multiple same-named rules from existing in a grammar, which would be un-dict-like and make reprs inaccurate.)

To implement, we will pass zero or more supergrammars into RuleVisitor at initialization. A

Custom rules should not be a problem. They either reference other rules lazily, by dereferencing the grammar at parse time (and so are not in danger of accidentally referencing an overridden supergrammar rule), or they contain LazyReferences, which should resolve just as in any other Expression. Like with other Expressions, we will keep uncompiled versions of them around to allow for grammar extension. We might add some facility for custom rules to do delegation: perhaps a DelegatedReference akin to a LazyReference. |
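To make the name-stripping above concrete, here's a minimal sketch; `delegated_copy` is a hypothetical helper name, and a mutable `name` attribute on Expressions is assumed:

```python
from copy import copy

def delegated_copy(supergrammar_rule):
    """Return an anonymous copy of a delegated-to supergrammar rule.

    With the name stripped, the subgrammar's visitor fires only for the
    new rule, and no two same-named rules coexist in one grammar.
    """
    anonymous = copy(supergrammar_rule)
    anonymous.name = ''
    return anonymous
```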
@halst, I wouldn't mind your feedback on the above. |
Consider that OMeta's delegation syntax is just "super"; that is, it doesn't allow delegation to supergrammar rules other than the one you're currently overriding. Is it useful to be able to do so, or should we simplify and make our syntax shorter? |
Another thing I was thinking: inheritance is good, but maybe there is a place for composition. For example:

```python
version = Grammar('''
    version <- digit+ "." digit+ "." digit+
    digit <- [0-9]''')

semantic_version = Grammar('''
    version <- basic_version ("-" meta_data)? ("+" meta_data)?
    meta_data <- [0-9A-Za-z-]+''',
    basic_version=version)
```
|
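If the proposed basic_version kwarg worked, usage would presumably look like any other grammar (assuming the first rule is the default, as usual):

```python
# Hypothetical usage of the composition example above:
tree = semantic_version.parse('1.2.0-alpha+001')
```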
In that case, you might also like the critique of |
😄 |
You raise a good point about composition. Sometimes it's nice to overlay the namespaces and override rules; other times, it's nice to compose and let the grammar we're delegating to encapsulate its rules and keep the relationships between them. If nothing else, it would be nice to know that rules I add to my subgrammar won't collide with ones added in the future by the supergrammar. (Heh, most programming languages don't even do anything to help with that in subclassing.) At any rate, this requires more design thought. |
You're not going to believe this. It just so happens that, because of the way custom rules are built, we get composition for free. For example, let's compose 2 wiki grammars to make an uber-wiki parser that tries multiple dialects, one after the other, until one works:

```python
Grammar("""
    any_wiki = mediawiki / freakywiki
    """,
    # Now we do, in essence, "from mediawiki_grammar import markup"--and the
    # same for freakywiki:
    custom={'mediawiki': mediawiki_grammar['markup'],
            'freakywiki': freakywiki_grammar['markup']})
```

The external grammars already have all their references resolved, so there's no danger of

(We could even teach custom rules to take Grammars directly and go grab their default rules, meaning we wouldn't need to say |
Also, we should rename |
Here's something fun. I don't necessarily think it's a good idea.

```python
Grammar(
    any_wiki="mediawiki / freakywiki",
    a_custom_rule=lambda text, pos: ...,
    an_import=another_grammar,
    something_else="'hi' / an_import",
    options={'all': 'the things we might otherwise need specific kwargs for'}
)
```
) It sure makes everything consistent, though I don't like all the extra punctuation (commas and quotes) you have to type and, more importantly, read. I could see making it so you could use strings as custom rules so people could have the option of doing something like this within an |
I don't like an exceptional |
Even if grammars implemented the Expression interface, how would we make their entrypoints available in another grammar without a |
Also, you might want to call out to an expression of a grammar other than the default one. |
Not totally sure what you mean. Like this?

```python
Grammar("""
    any_wiki = mediawiki / freakywiki
    """,
    mediawiki=mediawiki_grammar['markup'],
    freakywiki=freakywiki_grammar['markup'])
```
I think that |
I think we have a failure to communicate. :-) Some clarifications:
Does that clear things up? Do you agree? |
I wish we could just have

Ok, just brainstorming here...

```python
anywiki_partial_grammar = Grammar.from_string("""
    any_wiki = mediawiki / freakywiki
    """)

wikis_mixin_grammar = Grammar.from_expressions(
    mediawiki=mediawiki_grammar['markup'],
    freakywiki=freakywiki_grammar['markup'],
)

Grammar.which_extends(anywiki_partial_grammar,
                      wikis_mixin_grammar)
```
|
Do you expect that you'll use custom rules much of the time? I am assuming they'd be fairly rare. |
Your multiple constructors idea brought to mind one alternative we hadn't considered: chaining methods.

```python
g = Grammar("""
    a = b / c
    b = c d
    """
    ).more(d=lambda text, pos: pos + 5,
           c=lambda text, pos: pos + 1
    ).extend(other_grammar)
```

I think it looks worse than a dedicated kwarg. It's a little shorter but actually no fewer tokens than the kwarg:

```python
g = Grammar("""
    a = b / c
    b = c d
    """,
    rules=dict(d=lambda text, pos: pos + 5,
               c=lambda text, pos: pos + 1),
    extend=other_grammar)
```

Plus we'd have to introduce partial grammars (likewise with your

For those reasons, I still favor the dedicated kwarg—or a spelling convention: we could reserve a certain segment of kwarg namespace for our non-rule kwargs. But no good conventions spring to mind.

I'm still curious whether you expect to need custom rules often; it seems a pretty niche case to optimize for. Let me know; I'm psyched to get something implemented. |
I don't think custom rules will be used very often. |
Realized that abstract supergrammars are inherently incomplete grammars and will need to be representable somehow. Rethinking based on that. |
Okay, let's think about going back to kwarg-style custom rules. They're nice because they (a) have less syntax and (b) are more dict-like. The problem with them is that they smash into future constructor args. The reasons I can think of to have a constructor arg are...
Barring throwing away our basest assumptions, we're clearly not going to add more requirements to instantiate a Grammar, so we're safe from (1). (3) can be addressed by making different flavors of Grammars (like AbstractGrammars) or adding alternate constructors. I don't foresee any of (2) at the moment; we're going to try to keep Parsimonious's optimizer pretty light-touch. So let's try

Problem: we make a big deal out of having to declare you want an

The motivation for making users say AbstractGrammar up front is that they get prompt error messages when designing grammars in the usual, non-abstract case. It makes intent clearer, too, when reading the code. AbstractGrammars will have parse() and match() methods that raise errors, and |
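For concreteness, a minimal sketch of the AbstractGrammar behavior described above; the exception name is a placeholder, and only the parse()/match() refusal comes from the comment:

```python
from parsimonious.grammar import Grammar

class IncompleteGrammarError(Exception):
    """Raised on any attempt to parse with a deliberately incomplete grammar."""

class AbstractGrammar(Grammar):
    """A grammar meant only to be extended, never parsed with directly."""

    def parse(self, text, pos=0):
        raise IncompleteGrammarError('Extend this abstract grammar before parsing.')

    def match(self, text, pos=0):
        raise IncompleteGrammarError('Extend this abstract grammar before matching.')
```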
Just writing down some misc. thoughts about this issue. I might make a PR implementing something like this; I think it's not really that much code. My main use case for this at the moment is extending Parsimonious's syntax to allow richer expressions in token grammars. However, I don't think the syntax would make sense in general, just for my particular case. So I'd want to say something like:

```
...somehow import parsimonious.grammar.rules_syntax...
quantifier = quantifier^ / attrs_expr
attrs_expr = "[" _ "@" identifier "=" atom "]"
```
Then I'd subclass `RulesVisitor` with an updated `visit_quantifier` and `visit_attrs_expr`, making new syntax usable for my use case without either complicating the Grammar DSL or forking the library.
The issue is that because of the way references are resolved, there's not a good way to replace `quantifier` in the rules in the "super" grammar that use it. There, I specifically want the "new" version to override the old in all expressions in the super grammar. A few options:
1. Deepcopy the original grammar, and traverse the exprs replacing all nodes with the given name with the new name.
2. Use custom_rules:
```python
from parsimonious.grammar import rules_syntax, rule_grammar, Grammar

new_grammar = Grammar(rules_syntax,
                      custom_rules={"quantifier": custom_quantifier_rule,
                                    "attrs_expr": attrs_expr_rule})
# ...extend RuleVisitor, etc.
```

This works, but it's a bit clunky.

It seems to me the key insight of parsimonious's good usability, compared to past parser-generator libraries in Python, is that a flexible string-parsing library can take its own input as strings and doesn't have to rely on the syntax of the host language. So the most natural kind of composition for parsimonious grammars feels like just composing strings in the ordinary way, possibly with a separator between sections. If grammars just retained their input string (and maybe custom rules), then this becomes easy. In a system like this, my example above would look like:

```python
from parsimonious.grammar import Grammar, RuleVisitor

new_rule_grammar = Grammar(fr"""
new_rule_grammar = Grammar(fr"""
{Grammar.rule_grammar.definition}
------------
quantifier = quantifier^ / attrs_expr
attrs_expr = "[" _ "@" identifier "=" atom "]"
""")

class MyCustomRuleVisitor(RuleVisitor):
    def visit_quantifier(self, node, visited_children):
        ...

    def visit_attrs_expr(self, node, visited_children):
        ...

class MyCustomGrammar(Grammar):
    visitor_cls = MyCustomRuleVisitor  # requires a small refactor of `_expressions_from_rules`
```

For the separator, I'm thinking at least two

This skirts past a lot of the problems with validating "abstract" grammars: they're just strings. To extend an abstract grammar, you do something like:

```python
Grammar(fr"""
{abstract_grammar} # references but does not define ruleX
ruleX = ...
""") |
Reconsider dict-like grammars:

However, pros:

`extends` will do; `extends` maintains Grammar boundaries (for delegation), while update() just paves over keys, one at a time.

If we were to store expressions on methods of a grammar, we could change or alias Expression.parse to `Expression.__call__`.