-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for user defined member variables in the grammar #223
Comments
This is something that has come up before and I've been thinking about for a while. The closest prior art I know of is the C# library Pegasus. I never came up with a design for it I was happy with, or that seemed to be the best approach to solving the problems it would be used for, but it would be great to hear what you have in mind. How do you envision these working with backtracking and caching? We basically assume that rule and grammar arguments are immutable, but for a feature explicitly designed for mutation, that's asking for trouble. As I understand it, Pegasus saves the variables before entering alternation or repetition, and restores them on backtracking. It would also need to include the values of all variables (along with input position) in the key for memoization. What use cases do you have in mind for this? The primary one I've seen is parsing languages with significant whitespace, but I think you're way better off handling that in a custom lexer. |
Yes, this complicates things. I think saving the member variables on the parse stack would be the correct way of handling this (not familiar with Pegasus but it seems this is what it does). But IMO this is overkill for most purposes. I'd be happy if the parser generator can just fail with an error message when it detects a grammar where the member variables cannot be safely used with backtracking or caching. As for use cases, I needed this to deal with significant white-space (like you correctly guessed). For example in the line count grammar (see member_variables.rs in PR #224) the line count is updated only when there is a successful match. As far as I can tell backtracking does not invalidate the line count at any point. But it would have been great if rust-peg could confirm that. I am of-course assuming any such static analysis will not run afoul of the decidability problem. |
That analysis would break the encapsulation of rules. A rule may be safe on its own, but if it uses variables, it would be an error to call that rule within alternation or repetition. While we theoretically could do a whole-grammar analysis starting from the It would also mean parsing the Rust code to determine where a variable was read and written. Right now we just paste the action code blocks into the output without looking at their content. |
I think saving state on a "parse stack" (like Pegasus) is probably the best way to move forward with this.
What if we use some kind of sigil to flag member variable use? For example ... rule line()
= (terminated_line() / unterminated_line()) { @line_count += 1; } IIRC there is precedence for this in other parser generators and should not be too jarring. |
Have you tried parsing a language with significant indentation with a custom lexer? (Please suggest improvements for custom lexer support; it's the bare minimum right now). For a feature that would be as invasive and inelegant as this one, I'm reluctant to add it when the only thing people want to use it for can probably be done more elegantly in a different way. |
Sure, let me try a custom lexer and get back to you. |
A use case that I have where this feature would come in handy is creating a programming language AST where each node in the tree is tagged with a unique numeric ID upon creation (I have a bunch of AST processing code that assumes these unique IDs already exist). This needs state so that I can make sure I'm generating a fresh ID for each created tree node. |
@neunenak There's existing (but not documented) support for grammar arguments that can help with this. They're not saved and restored on backtracking, and are not really supposed to be mutable, but as long as you're either only assigning IDs after completing rules that can't backtrack into an alternative, or don't care about skipped IDs, it works: extern crate peg;
pub struct Item {
id: usize,
name: String,
}
peg::parser! {
grammar p(counter: &mut usize) for str {
pub rule list() -> Vec<Item> = "[" items: item() ** "," "]" { items }
rule item() -> Item = name: $(['a'..='z']+) {
Item {
id: { *counter += 1; *counter },
name: name.to_owned()
}
}
}
}
fn main() {
let mut counter = 0;
let r = p::list("[abcd,defg]", &mut counter).unwrap();
assert_eq!(r[0].id, 1);
assert_eq!(r[1].id, 2);
} |
Example ...
I understand rule arguments can be used to simulate this feature but IMO it makes the grammar unwieldy.
The text was updated successfully, but these errors were encountered: