This isn't a bug per se, more of my thoughts on pest after trying to use it. I apologize if it seems like a rant.
TL;DR: instead of generating data structures for a proper AST, pest gives you a very generic "hierarchy of pairs" interface which is neither convenient to work with nor type safe.
Pest IMO is not elegant to use. While specifying the language via the pest grammar is fine, the architecture of the returned result - the Pairs and Rules - is overly simplistic and requires a lot of boilerplate, because the "constraints" specified in the grammar are not present in their generic API.
For example, consider the rule expr_plus = {"+" ~ integer ~ integer}. By the grammar definition, any text matching this rule is going to produce exactly two sub-pairs, both with the integer rule. However, that information isn't available on the Rust side - instead the code gets a Pair<Rule>, which can contain zero or more inner pairs that you now have to manually pull out of an iterator.
You may say that that's not too bad - just get the iterator and call .next().unwrap() two times... but now you are making an unchecked assumption that expr_plus will always produce two integer pairs. And unchecked assumptions mean bugs later down the line. Consider if I change the rule to expr_plus = {"+" ~ (integer)*} or expr_plus = {"+" ~ int_or_float ~ int_or_float } - in both cases, the existing Rust code that handles those rules will compile fine, because the interface has not changed, but the unchecked assumptions are now violated, either because the expr_plus rule now has more or fewer inner pairs than the code was expecting or because the pairs are no longer only integers.
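To make the failure mode concrete, here is a minimal sketch. It deliberately does not use pest itself: Node, Rule, leaf and eval_expr_plus are hypothetical stand-ins that only mimic the "rule tag plus untyped children" shape described above, and the handler returns an Option where real code would likely call .next().unwrap() and panic.

```rust
// Hypothetical stand-in for a generic parse-tree node. This is NOT pest's
// real API; it only mimics the "rule tag plus untyped children" shape.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Rule {
    ExprPlus,
    Integer,
    Float,
}

#[allow(dead_code)]
#[derive(Debug, Clone)]
struct Node {
    rule: Rule,
    text: String,
    children: Vec<Node>,
}

// Small helper for building childless nodes.
#[allow(dead_code)]
fn leaf(rule: Rule, text: &str) -> Node {
    Node { rule, text: text.to_string(), children: Vec::new() }
}

// Handler written against `expr_plus = {"+" ~ integer ~ integer}`.
// Returns None in the places where `.next().unwrap()` would have panicked.
#[allow(dead_code)]
fn eval_expr_plus(node: &Node) -> Option<i64> {
    let mut inner = node.children.iter();
    let a = inner.next()?; // assumes a first child exists
    let b = inner.next()?; // assumes a second child exists
    // Assumes both children are integers; nothing in the types says so.
    let a: i64 = a.text.parse().ok()?;
    let b: i64 = b.text.parse().ok()?;
    // Any further children are silently ignored.
    Some(a + b)
}
```

With two integer children this returns the sum. If the grammar changes to allow one child, or float children, the same function still compiles and only fails at runtime; with three integer children it quietly drops the third and returns a wrong answer.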
In my view, a truly "elegant" parser would emit Rust data structures based on the rules given in the grammar. For instance, ExprPlus = {"+" ~ first:Integer ~ second:Integer} would generate a struct ExprPlus { pub first: Integer, pub second: Integer }. Not only is this more convenient to work with, as the Rust code can simply access the fields (or better yet, destructure the rule), changing the grammar will also change the generated Rust structures, causing appropriate compiler warnings and errors based on the new grammar.
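As a rough illustration of what consuming such generated types could look like, here is a hand-written sketch. The Integer newtype and the eval function are my own assumptions for the example; only the shape of ExprPlus comes from the proposal above, and no existing generator is claimed to produce this.

```rust
// Hypothetical output of a typed generator for
// `ExprPlus = {"+" ~ first:Integer ~ second:Integer}`.
// `Integer` is assumed here to be a simple newtype over i64.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct Integer(i64);

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
struct ExprPlus {
    pub first: Integer,
    pub second: Integer,
}

// Consumer code: no iterator, no unwrap, no runtime shape checks.
#[allow(dead_code)]
fn eval(expr: &ExprPlus) -> i64 {
    // The pattern names every field, so adding, removing, renaming, or
    // retyping a field in the generated struct makes this line fail to
    // compile instead of misbehaving at runtime.
    let ExprPlus { first: Integer(a), second: Integer(b) } = expr;
    a + b
}
```

If the grammar changed to int_or_float, the field types would presumably become an enum, and the Integer(a) patterns would stop compiling, which is exactly the kind of error the untyped interface cannot give you.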
I hope that my feedback motivates the devs to change things. I do like specifying my grammars in a specific PEG DSL, unlike in say nom, which is excessively verbose and requires handling comments and whitespace manually. But the pain of using the Pairs output that pest provides is too great for me to use this project.
@ColonelThirtyTwo, thank you for taking the time to write this. This is exactly what I have in mind for pest3. Unfortunately, I've been quite busy lately and haven't been able to invest more time into making it a reality. However, I have a clear path forward if anyone would be interested in contributing.