[RFC007] Migrate the parser to the new AST #2083

yannham · 2024-10-29T13:59:10Z

WIP. Following the step-by-step implementation of RFC007, this PR migrates the parser to output the new AST. The plan is to convert this to the old AST for the remainder of the pipeline (typechecking, transformations, and evaluation), and to measure whether that conversion is noticeable on various types of examples (small, big, small but importing big contracts, libraries, etc.).

github-actions · 2024-11-20T15:39:35Z

Bencher Report

Branch	rfc007/parsing
Testbed	ubuntu-latest

Benchmark	Latency (ns)
fibonacci 10	498,550.00
foldl arrays 50	1,683,000.00
foldl arrays 500	6,727,600.00
foldr strings 50	7,023,100.00
foldr strings 500	61,956,000.00
generate normal 250	46,548,000.00
generate normal 50	2,090,000.00
generate normal unchecked 1000	3,285,500.00
generate normal unchecked 200	770,500.00
pidigits 100	3,283,800.00
pipe normal 20	1,489,200.00
pipe normal 200	9,985,800.00
product 30	825,760.00
scalar 10	1,510,800.00
sum 30	822,280.00

Instead of elaborating piecewise definitions (such as `{foo.bar = 1, foo.baz = 2}`) directly at the parsing stage, this commit makes the new AST closer to the source language by making a record a list of field definitions, where the field "name" can be a sequence of identifiers and strings. This representation is already used internally by the parser; we now make it the default in the AST, so that the migrated parser won't have to do this elaboration at all. The elaboration is offloaded to the conversion to `RichTerm`, which happens in the `ast::compat` module.

The first motivation is that it'll be better for the LSP, where some open issues on the tracker are caused by the inability to trace what the LSP gets back to the original piecewise definitions.

The second reason is that we can't currently elaborate a piecewise definition correctly while staying in the new AST: the new AST only has one record variant, which is recursive by default, but this doesn't match the way recursion and scoping work for piecewise definitions. For example, `{foo.bar = 1, baz.foo = foo.bar + 1}` works fine in today's Nickel (it evaluates to `{foo = {bar = 1}, baz = {foo = 2}}`), but if we elaborate it naively in the new AST, we get `{foo = {bar = 1}, baz = {foo = foo.bar + 1}}`, which recurses infinitely because the inner `foo` now refers to `baz.foo` itself. Mainline Nickel currently uses a non-recursive `Record` for that, but we don't want to introduce such a "runtime dictionary" in the new AST, as it can't be expressed in the source language. Instead, we keep records as defined piecewise and do further elaboration when needed: during typechecking, future compilation, or, in the meantime, when converting the new AST representation to mainline Nickel.
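To make the representation described above concrete, here is a minimal, self-contained sketch of a record stored as a flat list of field definitions with paths, plus a separate elaboration step that nests them. All names (`FieldDef`, `Elaborated`, `elaborate`) are hypothetical and are not the actual types of `bytecode::ast` or `ast::compat`; values are restricted to integers, path elements to plain strings, and conflicting definitions are rejected rather than merged, all purely to keep the example short.

```rust
use std::collections::BTreeMap;

/// Hypothetical piecewise field definition: `foo.bar = 1` becomes
/// `FieldDef { path: ["foo", "bar"], value: 1 }`.
#[derive(Debug, Clone, PartialEq)]
struct FieldDef {
    path: Vec<String>,
    value: i64, // assumption: integer values only, for illustration
}

/// Hypothetical result of elaboration: a tree of nested records.
#[derive(Debug, Clone, PartialEq)]
enum Elaborated {
    Leaf(i64),
    Record(BTreeMap<String, Elaborated>),
}

/// Insert one definition into the nested record, creating intermediate
/// records along the path as needed.
fn insert(
    rec: &mut BTreeMap<String, Elaborated>,
    path: &[String],
    value: i64,
) -> Result<(), String> {
    let (head, rest) = path
        .split_first()
        .ok_or_else(|| "empty field path".to_string())?;

    if rest.is_empty() {
        // Simplification: this sketch rejects duplicates instead of merging.
        if rec.contains_key(head) {
            return Err(format!("conflicting definitions for `{head}`"));
        }
        rec.insert(head.clone(), Elaborated::Leaf(value));
        return Ok(());
    }

    let entry = rec
        .entry(head.clone())
        .or_insert_with(|| Elaborated::Record(BTreeMap::new()));
    match entry {
        Elaborated::Record(inner) => insert(inner, rest, value),
        Elaborated::Leaf(_) => Err(format!("conflicting definitions for `{head}`")),
    }
}

/// Elaborate a flat list of piecewise definitions into nested records.
/// The parser would only produce the flat list; this step runs later.
fn elaborate(defs: &[FieldDef]) -> Result<Elaborated, String> {
    let mut root = BTreeMap::new();
    for def in defs {
        insert(&mut root, &def.path, def.value)?;
    }
    Ok(Elaborated::Record(root))
}
```

With this shape, `{foo.bar = 1, foo.baz = 2}` stays as two entries in the list until `elaborate` is called, so a tool that consumes the list can still see each original piecewise definition.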

First stab at making the parser compatible with the new AST representation (`bytecode::ast::Ast`). This is a heavy refactoring which required updating most of `parser::uniterm` and `parser::utils` as well as `grammar.lalrpop`. The current version is far from compiling; fixing compiler errors is planned in follow-up work.

As we move toward a bytecode compiler and a bytecode virtual machine, we are replacing the left part of the pipeline with the new AST representation. The bytecode module was previously gated by an experimental feature, the idea being that this feature would enable the whole bytecode compiler pipeline. However, for now, we only have a new AST representation, and it's being used in the mainline Nickel parser (and soon, in the typechecker, etc.). Thus we need access to the new AST representation by default, and it doesn't make much sense to gate it behind a feature. We'll reintroduce the feature once we have a prototype compiler and a bytecode virtual machine, when it will make sense to use the feature to toggle between the legacy tree-walking interpreter and the new bytecode compiler.

yannham force-pushed the rfc007/parsing branch from 9655af8 to 751da65 on October 30, 2024 14:49

yannham mentioned this pull request Oct 31, 2024

[RFC007] Add a builder module for the new AST #2085

Merged

yannham force-pushed the rfc007/parsing branch 3 times, most recently from 8cedc56 to 2de6236 on November 20, 2024 09:12

yannham mentioned this pull request Nov 20, 2024

Add missing implementation of from_ast for Record #2100

Merged

yannham force-pushed the rfc007/parsing branch from 2de6236 to f1da1dd on November 20, 2024 09:38

yannham mentioned this pull request Nov 20, 2024

[RFC007] Add Seal and Unseal to the new AST primops #2101

Merged

yannham force-pushed the rfc007/parsing branch 2 times, most recently from 74f9906 to eb338a1 on November 20, 2024 15:24

yannham mentioned this pull request Nov 21, 2024

[RFC007] Improve/simplify record representation in the new AST #2102

Merged

yannham added 18 commits November 21, 2024 22:05

Fix almost all grammar errors, fix parser/mod.rs

4f08a58

Fix last errors to make it compile

de5e9f4

Fix curried operator handling and make its impl nicer

3af6e10

Revert to the previous handling of last fields (might need conflict resolution for RepeatSep1)

e751489

Fix compilation errors and spurious grammar ambiguity

2324c35

Fix unwrapping position panicking

6b0426d

Fill todo!() when parsing seal/unseal

77db0bf

Entirely get rid of rec priorities leftovers

4059bca

Fix fix_type_vars for forall binders, improve code doc sporadically

f7d0403

Fix handling of zero-ary application/variable

1f7dbe1

Fix test code and corner case of new -> mainline conversion

751165d

[Maybe to drop?] Fix failing test (symbolic string being recursive records)

2259027

Fix swapped seal/unseal

d6e3573

Fix missing position for elaborated merge (piecewise defs)

4fe9aea

[WIP/TOSQ] move field defs and all out of parser utils

a394c35

yannham force-pushed the rfc007/parsing branch from 0117944 to a394c35 on November 21, 2024 21:07
