-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New Parser #262
New Parser #262
Conversation
Codecov ReportBase: 88.73% // Head: 88.53% // Decreases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## main #262 +/- ##
==========================================
- Coverage 88.73% 88.53% -0.20%
==========================================
Files 64 65 +1
Lines 7864 8017 +153
Branches 1286 1270 -16
==========================================
+ Hits 6978 7098 +120
- Misses 631 660 +29
- Partials 255 259 +4
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
xdsl/parser_ng.py
Outdated
]) | ||
|
||
|
||
class MlirParser: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class MlirParser: | |
class MLIRParser: |
xdsl/parser_ng.py
Outdated
methods marked try_... will attempt to parse, and return None if they failed. If they return None | ||
they must make sure to restore all state. | ||
|
||
methods marked must_... will do greedy parsing, meaning they consume as much as they can. They will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"must parse" reads like a boolean expression to me, it took a little while to understand that it was actually parsing the text. Maybe replace with either parse
or parse_greedy
to make into an active verb phrase?
Puh, honestly this PR is a mouthful, so let me start at the top (or the thing I feel like I understand somewhat): The BNF description. I am reading this as the BNF description of the xDSL representation, not the MLIR one. I am personally more familiar with EBNF which, AFAIK, is the same, but with some "syntactic sugar". Therefore, I will argue with EBNF. If something is unclear, please ask! AFAIU, Optional and Group correspond to a choice and a list. In EBNF, we use Next, I am confused that blocks are not nested into regions. and that they get the And why is the entire thing wrapped inside a group? Fundamentally, I feel that it really should be possible to generate parsers and printers from this, if we are a little bit careful about how we write the grammars. Not sure whether you are familiar with the concept of left/right-recursive grammars and its implications on parsings such grammars. Anyway, will jump into the code now, but I don't believe I will be able to fully review/understand this today. I will probably revisit it on monday :) |
Hey, thanks @webmiche, I don't think you want to/should jump into the code right now. This PR is very WIP. It's more meant to be a "Hey, I'm working on this feature right now and am thinking about these concepts", and to get some feedback on the concepts. Edit: Had the wrong mention here |
Just to quickly respond to that, I think that this is the syntax for the successors, not the blocks themselves! |
Ah yes. But aren't we currently printing successors wrapped into |
I think we still wrap them in |
Looking at https://github.com/xdslproject/xdsl/blob/main/tests/filecheck/cf_ops.xdsl we use Sure, I agree with you. I also don't like that regions and attributes are wrapped into |
Okay, forget my comments, I'm probably the most confused xD I think |
Well, at the end of the day, aren't we all confused? xD Okay, yes I agree that I guess people from the python world can also look at it as a list of tuples, so there is a weak argument for |
A flyby comment (I don't think I'm going to make it up the hill to the hackathon this week, sorry). Is there a reason that you are not using an existing package (for example lark) for the parsing infrastructure. It seems to me that would help quite a bit because you just need to define the grammars and translation of parsed trees into XDSL (using their existing tree-visitor infrastructure). |
So the reason we cannot use most parser/printer generators is that we need to use arbitrary Python for the grammar (for attributes and operations). |
726dba6
to
a69de90
Compare
Current state: Filecheck:
Pytest: |
I removed the BNF stuff from this PR as well, the design was not ready, and the PR is big enough as is. |
77ec5ca
to
2fcfedb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be split into multiple PRs? I just scrolled through the text in the parser file and added the most obvious comments, but I cannot review the actual functionality like this, it is just too much.
tests/test_ir.py
Outdated
parser = Parser(ctx, program_func) | ||
program: ModuleOp = parser.parse_op() | ||
parser = XDSLParser(ctx, program_func) | ||
program: ModuleOp = parser.must_parse_operation() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel that must_parse_operation
sounds like an awful name for an API you want to expose. Maybe rename or wrap into a properly named function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are correct. There already is a function called begin_parsing
which is meant to be called from outside to parse a file. I'll get to it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i just realized that this breaks some tests. Specifically, begin_parse
makes sure that the operation parsed is a module_op
. Some tests don't wrap their input in a builtin.module
, and are therefore not "valid" xdsl/mlir programs.
I changed back to using must_parse_operation
, as we are wanting to parse just a single operation here. I don't think we should expose an interface like parse_op
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that must_parse_op
is not an API the parser wants to expose. It's only meant to be used parser internally. The test just has to use it as it isn't using the "whole" parser. If that makes sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's not an exposed API, preface it with _
please. (Python coding standards)
attr = DictionaryAttr.from_dict(data) | ||
|
||
with StringIO() as io: | ||
Printer(io).print(attr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this makes sense as this makes the Parser tests depend on the Printer. I feel that parser tests should really take strings and the data structure that is expected in order to test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Valid point. I'll change that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just noticed that most of the printer tests rely on the parser as well. What's up with that? Why is that more okay?
import itertools | ||
import re | ||
import sys | ||
import traceback |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are any of these new dependencies?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are all builtin python modules
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am surprised the parser needs the python ast?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use it to evaluate string literals, that is actually quite tricky, and instead of re-inventing the wheel I looked for a stdlib function that handles that for us. That's why I used ast.literal_eval
. We also could have used json.loads
, or something else. I just found literal_eval
first while searching/thinking about it.
@webmiche How would you go about splitting something like this into smaller PRs? I'm genuinely curious, I can't think of a sensible way. I could sit down with you and give you a high-level overview over the concepts, if that would help? The plan was to do the review during the Hackathon, sadly I couldn't get it done in time :( |
So I guess the issue is that a lot of tests need a full parser in order to run, right? I could imagine having a branch without the old parser and with all tests marked as UNSUPPORTED and then basically upstream into that. So start out with a xDSLParser that can just about parse an empty module and then develop that by adding more functionality, enabling tests along the way. And once you pass a good amount of tests, we can merge that parser into the main branch and continue upstreaming there. Or just remove everything that is around the MLIRParser. That should be relatively little anyhow. Then we can let that still flow through the old infrastructure, or maybe not support it at all. I think this would already cut down the number of lines by a lot. Also notice that you are pretty much removing the old parser, so github diffs look extremely bad. Simply renaming the parser file might already be quite an improvement on that side. |
We cam move the parser back to |
Co-authored-by: Fehr Mathieu <mathieu.fehr@gmail.com>
c3b2915
to
868e509
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! I think we can merge it now!
First of all, I want to note that none of this is final, and that I welcome any ciriticism and discussion on all parts of this! This is my "getting to know xDSL" project, so my understanding of the underlying principles is still basic at best. I hope the following makes sense.
The new parser is aimed at making it easier to:
For that a couple of things were implemented:
Spans
over the input, meaning we have location information attached to our string snippets we move aroundThe BNF stuff:
There are a couple different motivating factors, and it wasn't easy to find something that satisfied them all:
I also had the weird dream on generating printer and parser out of a single spec.
So let's begin adressing the first three points. For that I sketched up parsing of a generic operation:
Note that a
BNF.Literal
represents a fixed string in the input, and is not to be confused with the parsing of e.g. astring-literal
(so"some arbitrary string here"
)Each
Nonterminal
calls an underlying function in the parser.This is not exactly pure BNF. I instead provided something I feel is easier to use. For example,
Optional
andGroup
are often combined, so there is a wrapper for that. And parsing lists can be much more comfortable now usingListOf
which takes a containing token, a regex separator, and can be configured to either allow or disallow empty lists.There are also no plans to provide a
OR
(so basically( something | something-else )
) as this explodes the parsing complexity.Extracting parsed fields is done through the
bind=<name>
attributes on the nodes. After parsing is complete, you get a dictionary where the fields<name>
are populated with the parsing results of that parser.Note: There are constraints on what makes "sense" to bind to. If you, for example bind inside of a ListOf, you will only have the last element on the output dictionary. Instead you should probably bind the ListOf? I am not sure though, because you can, theoretically, have arbitrary BNF inside the ListOf, which would take away all the simplicity we gained from
bind
. This is not a solved problem yet.Printing with the Parser?
This whole
bind
stuff, gave me the idea, that it might be possible to now go fromdict[<name>, <value>]
and the BNF tree back to the source code and implement parsing/printing in one!On the surface it seems possible, but it is a lot more complicated than this sadly. How do we decide when to print an
OptionalGroup
, what to do with aListOf
that contains aGroup
? anOptionalGroup
? I have some ideas, but it's not very straight forward sadly.The best thing might be to restrict the nested complexity of the BNF to allow for good ergonomics there. lets see. I only just got "here" yesterday evening.
On Error Messages:
They currently don't fulfill their promise. At all. Sorry about that. It's all still very wonky right now!