-
Notifications
You must be signed in to change notification settings - Fork 93
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC007] Migrate the parser to the new AST #2083
Draft
yannham
wants to merge
18
commits into
master
Choose a base branch
from
rfc007/parsing
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
+2,709
−1,721
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
yannham
force-pushed
the
rfc007/parsing
branch
from
October 30, 2024 14:49
9655af8
to
751da65
Compare
yannham
force-pushed
the
rfc007/parsing
branch
3 times, most recently
from
November 20, 2024 09:12
8cedc56
to
2de6236
Compare
yannham
force-pushed
the
rfc007/parsing
branch
from
November 20, 2024 09:38
2de6236
to
f1da1dd
Compare
yannham
force-pushed
the
rfc007/parsing
branch
2 times, most recently
from
November 20, 2024 15:24
74f9906
to
eb338a1
Compare
Bencher Report
Click to view all benchmark results
|
Instead of elaborating piecewise definitions (such as `{foo.bar = 1, foo.baz = 2}`) directly at the parsing stage, this commit makes the new AST closer to the source language by making record a list of field definition, where the field "name" can be a sequence of identifiers and strings. This representation is used internally by the parser; we now make it the default in the AST, such that the migration of the parser won't have to do this elaboration at all. The elaboration is offloaded to the conversion to `RichTerm`, which happens in the `ast::compat` module. This makes the AST closer to the source language. The first motivation is that it'll be better for the LSP, where some open issues on the tracker are caused by the inability to trace what the LSP get back to the original piecewise definitions. The second reason is that we can't actually elaborate a piecewise definition while staying in the new AST correctly as of today: the new AST only has one record variant, which is recursive by default, but this doesn't match the way recursion and scoping work for piecewise definition. For example, `{foo.bar = 1, baz.foo = foo + 1}` works fine in today's Nickel (evaluate to `{foo = {bar = 1}, baz {foo = 2}}`), but if we elaborate it in the new AST naively, we'll get an infinite recursion: `{foo = {bar = 1}, baz = {foo = foo + 1}}`. Mailine Nickel currently uses a non recursive `Record` for that, but we don't want to introduce such "runtime dictionary" in the new AST as they can't be expressed in the source language. Instead, we rather keep record as defined piecewise and will do further elaboration when needed, during typechecking, future compilation, or in the meantime when converting the new AST representation to mainline Nickel.
First stab at making the parser compatible with the new AST representation (`bytecode::ast::Ast`). This is a heavy refactoring which required to update most of `parser::uniterm` and `parser::utils` as well as `grammar.lalrpop`. The current version is far from compiling; fixing compiler errors is planned in follow-up work.
As we move toward a bytecode compiler and a bytecode virtual machine, we are replacing the left part of the pipeline with the new AST representation. The bytecode module was previously gated by an experimental feature, thea idea being that this feature would enable the whole bytcode compiler pipeline. However, for now, we only have a new AST representation, and it's being used in the mainline Nickel parser (and soon, in the typechecker, etc.). Thus we need access to the new AST representation by default, and it doesn't make much sense to gate it behind a feature. We'll reintroduce the feature once we have a prototype compiler and a bytecode virtual machine, when it will then make sense to use the feature to toggle between the legacy tree-walking interpreter and the new bytecode compiler.
…esolution for RepeatSep1)
yannham
force-pushed
the
rfc007/parsing
branch
from
November 21, 2024 21:07
0117944
to
a394c35
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
WIP. Following the step by step implementation of RFC07, this PR migrates the parser to output the new AST, The plan is to convert this to the old AST for the remaining of the pipeline (typechecking, transformations, and evaluation), and to measure if that conversion is noticeable on various type of examples (small, big, small but importing big contracts, libraries, etc.)