-
-
Notifications
You must be signed in to change notification settings - Fork 411
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Modularized parser #304
Modularized parser #304
Conversation
Benchmark for 40ed44eClick to view benchmark
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! The parser code is much easier to reason with!
} | ||
|
||
let mut lhs = ConditionalExpression::parse(cursor)?; | ||
// let mut lhs = self.read_block()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// let mut lhs = self.read_block()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll check what's happening here :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue with this is that I'm not 100% sure on how is this done. An AssignmentExpression
can be one of these:
ConditionalExpression
YieldExpression
ArrowFunction
AsyncArrowFunction
LeftHandSideExpression
=
AssignmentExpression
LeftHandSideExpression
AssignmentOperator
AssignmentExpression
The thing is that we do test for ArrowFunction
at the beginning, and then, we directly parse a ConditionalExpression
, but I don't understand why. Can this be explained a bit more, @jasonwilliams? maybe I can add some comments once I understand it properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
im not sure i understand what the question is here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just that the read_block()
call is commented out, not sure why, and more generally, how does this parsing work? It seems to only check the conditional expression, but it should check more things, right?
/// - The `$lower` identifier will contain the parser for lower level expressions. | ||
/// | ||
/// Those exressions are divided by the punctuators passed as the third (and following) parameters. | ||
macro_rules! expression { ( $name:ident, $lower:ident, $( $op:path ),* ) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did you remove [
...]
form expression macro?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought this made it easier to understand. What do you think? It's pretty easy to put it back, but I wanted to try how it looked like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I think separating the parameters of the expression
macro is better. So its not a continuous line of parameters.
I would changed even more to be:
macro_rules! expression {
(
name: $name:ident,
call: $lower:ident,
[$( $op:path ),*]
)
...
}
expression!(
name: update_expression,
call: some_other_expression,
[...]
);
This way it is clear to see that the name of the expression, what it will call and what it matches against.
What do you think? @Razican
Am I overdoing it a bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm the thing is that the name
would be a struct
or an enum
to implement TokenParser
in, so maybe a better name would suit. I will implement it this way for now, though, to see how it goes.
The tests are failing because you need to import |
It might be a good idea to refactor the Any thoughts @Razican and @jasonwilliams? 🤔 |
When working on iterative statements, I did notice that we were not following the spec 100%, as we directly parse them instead of using an intermediate parser for iterative statements. It can be a good idea to match the V8 engine in AST node types. |
Yeah i agree with this. |
I will review that. I'm not sure if it would make sense for the This would allow not having dangling functions and all parsers would implement BTW, I added a Then, there is another thing to be considered: The fact that the lexer doesn't take into account goal symbols (#294). This could be solved by passing the This would mean that we no longer can benchmark lexing and parsing separately, as they would happen at the same time, but I think it makes sense. What do you think? |
Benchmark for 661469aClick to view benchmark
|
I don't mind too much what you have now, but i'd like to see this alternative you mention (even if its just for 1 function) to get an idea of what you mean.
Looks good!
Would we send the lexer the current state we're in? Then the lexer has context of what we're looking for.
Yeah if there's no solution i'm fine with changing the benchmarks |
The idea is that we would change how the parsers are called now: let expr = ExpressionParser::parse(cursor)?; To something like this: let expr = ExpressionParser(cursor).parse()?; But this would allow having functions that require some extra information, such as let binding_list = BindingList(cursor, is_const).parse()?; where struct BindingList<'b>(&'b Cursor<'_>, bool);
Yes, I think that the lexer's Also, I think the cursor should always skip line terminators, and have the The only special case where this might be a bit difficult would be the |
The only thing I'm worried about is if this will make the code less performant, having to create an object first and calling its methods, instead of calling a static function with some arguments. I can try it as soon as I finish documenting this first iteration. |
aa47eda
to
321dafd
Compare
Benchmark for abbca39Click to view benchmark
|
Benchmark for 0d40842Click to view benchmark
|
I added some nice methods (WIP) to easily create AST nodes. This will help make tests more clear, and should also help with the parser implementation. So you can now create a local node like this: Node::local("a") instead of: Node::Local(String::from("a")); and you can create a Node::break_node("label"); instead of: Node::Break(Some(String::from("label"))); Also no more need to add |
Benchmark for 0862dd4Click to view benchmark
|
Benchmark for d98538eClick to view benchmark
|
I implemented automatic semicolon insertion after do-while statements. This means we have less things left for this to be mergeable:
|
05d3467
to
79816c1
Compare
Benchmark for f3bf8b0Click to view benchmark
|
Benchmark for c212135Click to view benchmark
|
Benchmark for 2d8eb20Click to view benchmark
|
Benchmark for 6eb0431Click to view benchmark
|
Something I see from the benchmarks is that more than 99% of the time is spent on execution. I think we should improve our interpreter :) |
e5072b3
to
bbc2269
Compare
Benchmark for 06c0c6bClick to view benchmark
|
I did al the requested changes, @HalidOdat, I will rebase today. Still, the |
Well I think that's good, the parser in a way should reflect what it generates. (not exactly). In the future we should refactor the AST too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks perfect to me!
Yes, I'm working a bit on that in a couple of places, with string interning and trying to parse source code byte by byte. |
e3e6d52
to
03c1dd7
Compare
Rebased and ready for a merge, @HalidOdat, @jasonwilliams :D |
Benchmark for bca05a0Click to view benchmark
|
Benchmark for 8781a01Click to view benchmark
|
Benchmark for 4f14555Click to view benchmark
|
cc43f06
to
bc2e924
Compare
Benchmark for 959685eClick to view benchmark
|
bc2e924
to
d23b34f
Compare
Rebased and ready :) |
Benchmark for 6305aa0Click to view benchmark
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks awesome, very clean too!
allow_yield: allow_yield.into(), | ||
allow_await: allow_await.into(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How come into() is needed here? (just for me to understand)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not realy needed, but I used generics in the function signature: I: Into<AllowIn>
and so on. This allows us to create a parser with booleans too: ArrowFunction(true, false, true)
.
We could just remove the generics from the new()
functions and pass the wrapping structures, which would improve compile times, and is probably just as clear / developer friendly.
Benchmark for 893a647Click to view benchmark
|
This is still work in progress, but tests pass, and even though I need to divide the code further, we now have multiple different structures. I also need to document a bunch of stuff.
I also modified the parser so that it would better respect the specification. So for example, an initializer expects to first read a
=
symbol, as in the spec.I added more tests for
break
throw
andcontinue
parsing, that I think they were failing with automatic semicolon insertion.About this last part, it gets done at the
Cursor
level. I think I changed all semicolon expects for the newexpect_semicolon()
function. This will automatically insert semicolons in all the situations where the spec says so. The only exception is the semicolon after thedo while
statement, so this should fail:In any case, do-while parsing is not done, so this doesn't affect us. It's now detected with a panic, though.
In any case, this is advanced enough so that you can give it a look and let me know what you think.