Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve lexer by make cursor iterate over bytes #915

Merged
merged 12 commits into from
Dec 3, 2020

Conversation

jevancc
Copy link
Contributor

@jevancc jevancc commented Oct 27, 2020

This Pull Request fixes/closes #335 . Notice that this PR does not change any behavior of the existing lexer.

It changes the following:

  • Rewrite cursor to iterate over bytes (u8) and Unicode chars (u32, code points)
  • Update lexers for the new cursor

Not covered in this PR:

  • Full UTF-16 support for string and regex
  • Fix bug and panic of the lexer
  • Read input in bytes instead of str (involving lots of changes on tests. I will create a separate PR for it once this PR is merged)

@jevancc jevancc marked this pull request as draft October 27, 2020 15:45
@codecov
Copy link

codecov bot commented Oct 27, 2020

Codecov Report

Merging #915 (fed6215) into master (ee8575d) will increase coverage by 0.07%.
The diff coverage is 67.09%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #915      +/-   ##
==========================================
+ Coverage   59.21%   59.29%   +0.07%     
==========================================
  Files         166      166              
  Lines       10570    10689     +119     
==========================================
+ Hits         6259     6338      +79     
- Misses       4311     4351      +40     
Impacted Files Coverage Δ
boa/src/syntax/lexer/regex.rs 39.36% <48.48%> (-1.67%) ⬇️
boa/src/syntax/lexer/string.rs 39.53% <52.63%> (+1.16%) ⬆️
boa/src/syntax/lexer/number.rs 63.12% <63.15%> (-0.65%) ⬇️
boa/src/syntax/lexer/mod.rs 64.86% <65.71%> (-2.79%) ⬇️
boa/src/syntax/lexer/cursor.rs 68.29% <71.69%> (+7.60%) ⬆️
boa/src/syntax/lexer/template.rs 61.11% <72.72%> (-3.18%) ⬇️
boa/src/syntax/lexer/operator.rs 69.56% <73.91%> (-1.55%) ⬇️
boa/src/syntax/lexer/comment.rs 50.00% <80.00%> (+3.84%) ⬆️
boa/src/syntax/lexer/identifier.rs 60.00% <80.00%> (+1.66%) ⬆️
boa/src/syntax/lexer/spread.rs 50.00% <100.00%> (ø)
... and 10 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ee8575d...fed6215. Read the comment docs.

@Razican Razican added this to the v0.11.0 milestone Oct 28, 2020
@Razican Razican added lexer Issues surrounding the lexer performance Performance related changes and issues labels Oct 28, 2020
@jevancc jevancc marked this pull request as ready for review November 4, 2020 03:54
@jevancc
Copy link
Contributor Author

jevancc commented Nov 4, 2020

This PR is ready for review now. Unfortunately, it is hard to tell that this idea, i.e. iterating over bytes instead of chars, and the implementation improves the performance from the existing benchmarks. However, it is still good to have it since we can now skip/handle/report the position of invalid char with the new lexer instead of panic when reading the input.

Copy link

@Lan2u Lan2u left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this - looks good :)

@jasonwilliams
Copy link
Member

This looks great to me, great work @jevancc

@Razican
Copy link
Member

Razican commented Dec 3, 2020

Benchmark results are looking good. No big speed up, but no regressions.

Test results look good too:

Test result master count PR count difference
Total 78,415 78,415 0
Passed 18,953 18,953 0
Ignored 15,547 15,547 0
Failed 43,915 43,915 0
Panics 1,127 1,127 0
Conformance 24.17 24.17 0.00%

No conformance changes, so I'm happy with this, it opens new doors. Thanks for your work!

@Razican Razican merged commit cc47385 into boa-dev:master Dec 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lexer Issues surrounding the lexer performance Performance related changes and issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Optimize the lexer by iterating over bytes
5 participants