Improve lexer by make cursor iterate over bytes #915

jevancc · 2020-10-27T15:42:25Z

This Pull Request fixes/closes #335. Note that this PR does not change any behavior of the existing lexer.

It changes the following:

- Rewrite the cursor to iterate over bytes (u8) and Unicode chars (u32, code points)
- Update the lexers for the new cursor
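To illustrate the idea of a cursor that can step over raw bytes and also decode Unicode code points, here is a minimal sketch. It is not the code from this PR: `ByteCursor` and its method names are hypothetical, and the decoder is deliberately simplified (it does not reject overlong encodings or surrogate code points).

```rust
/// Hypothetical byte-oriented cursor, for illustration only.
/// This is not Boa's actual `Cursor` type.
pub struct ByteCursor<'a> {
    input: &'a [u8],
    pos: usize,
}

impl<'a> ByteCursor<'a> {
    pub fn new(input: &'a [u8]) -> Self {
        Self { input, pos: 0 }
    }

    /// Byte offset of the next unread byte.
    pub fn pos(&self) -> usize {
        self.pos
    }

    /// Look at the next byte without consuming it.
    pub fn peek_byte(&self) -> Option<u8> {
        self.input.get(self.pos).copied()
    }

    /// Consume and return the next byte.
    pub fn next_byte(&mut self) -> Option<u8> {
        let b = self.peek_byte()?;
        self.pos += 1;
        Some(b)
    }

    /// Decode the next UTF-8 sequence into a code point (u32).
    /// Returns `Ok(None)` at end of input and `Err(offset)` when the
    /// sequence starting at `offset` is malformed. Nothing is consumed
    /// on error, so the caller decides whether to skip or report.
    pub fn next_char(&mut self) -> Result<Option<u32>, usize> {
        let start = self.pos;
        let first = match self.peek_byte() {
            Some(b) => b,
            None => return Ok(None),
        };
        // Sequence length and the payload bits carried by the lead byte.
        let (len, init) = match first {
            0x00..=0x7F => (1, u32::from(first)),
            0xC0..=0xDF => (2, u32::from(first & 0x1F)),
            0xE0..=0xEF => (3, u32::from(first & 0x0F)),
            0xF0..=0xF7 => (4, u32::from(first & 0x07)),
            // Stray continuation byte or invalid lead byte.
            _ => return Err(start),
        };
        let mut cp = init;
        for i in 1..len {
            match self.input.get(start + i) {
                // Continuation bytes have the form 10xxxxxx.
                Some(&b) if b & 0xC0 == 0x80 => cp = (cp << 6) | u32::from(b & 0x3F),
                _ => return Err(start),
            }
        }
        self.pos = start + len;
        Ok(Some(cp))
    }
}
```

With a design like this, lexers for ASCII-only tokens such as operators can work with `peek_byte`/`next_byte`, while identifier and string lexers call `next_char` only when they need a full code point.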

Not covered in this PR:

- Full UTF-16 support for strings and regexes
- Fixes for existing lexer bugs and panics
- Reading input as bytes instead of str (this involves many changes to the tests, so I will create a separate PR for it once this PR is merged)

codecov · 2020-10-27T17:20:33Z

Codecov Report

Merging #915 (fed6215) into master (ee8575d) will increase coverage by 0.07%.
The diff coverage is 67.09%.

@@            Coverage Diff             @@
##           master     #915      +/-   ##
==========================================
+ Coverage   59.21%   59.29%   +0.07%     
==========================================
  Files         166      166              
  Lines       10570    10689     +119     
==========================================
+ Hits         6259     6338      +79     
- Misses       4311     4351      +40

Impacted Files	Coverage Δ
boa/src/syntax/lexer/regex.rs	`39.36% <48.48%> (-1.67%)`	⬇️
boa/src/syntax/lexer/string.rs	`39.53% <52.63%> (+1.16%)`	⬆️
boa/src/syntax/lexer/number.rs	`63.12% <63.15%> (-0.65%)`	⬇️
boa/src/syntax/lexer/mod.rs	`64.86% <65.71%> (-2.79%)`	⬇️
boa/src/syntax/lexer/cursor.rs	`68.29% <71.69%> (+7.60%)`	⬆️
boa/src/syntax/lexer/template.rs	`61.11% <72.72%> (-3.18%)`	⬇️
boa/src/syntax/lexer/operator.rs	`69.56% <73.91%> (-1.55%)`	⬇️
boa/src/syntax/lexer/comment.rs	`50.00% <80.00%> (+3.84%)`	⬆️
boa/src/syntax/lexer/identifier.rs	`60.00% <80.00%> (+1.66%)`	⬆️
boa/src/syntax/lexer/spread.rs	`50.00% <100.00%> (ø)`
... and 10 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ee8575d...fed6215.

jevancc · 2020-11-04T04:31:45Z

This PR is ready for review now. Unfortunately, the existing benchmarks do not clearly show whether the idea (iterating over bytes instead of chars) or this implementation of it improves performance. It is still worth having, though: with the new lexer we can skip, handle, or report the position of an invalid char instead of panicking when reading the input.
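As a rough illustration of reporting the position of invalid input instead of panicking, the sketch below validates raw bytes with the standard library and turns the failure offset into a line and column. The names `InvalidInput`, `check_source`, and `line_and_column` are hypothetical and are not taken from this PR.

```rust
/// Hypothetical error type for this sketch (not part of Boa).
#[derive(Debug, PartialEq)]
pub struct InvalidInput {
    /// Byte offset of the first byte that is not valid UTF-8.
    pub offset: usize,
}

/// Validate raw source bytes. On failure, return the offset of the
/// first invalid byte rather than panicking.
pub fn check_source(bytes: &[u8]) -> Result<&str, InvalidInput> {
    std::str::from_utf8(bytes).map_err(|e| InvalidInput {
        offset: e.valid_up_to(),
    })
}

/// Convert a byte offset into a 1-based (line, column) pair.
/// The column is counted in bytes, which is an assumption of this sketch.
pub fn line_and_column(bytes: &[u8], offset: usize) -> (usize, usize) {
    let prefix = &bytes[..offset.min(bytes.len())];
    let line = 1 + prefix.iter().filter(|&&b| b == b'\n').count();
    let column = 1 + prefix.iter().rev().take_while(|&&b| b != b'\n').count();
    (line, column)
}
```

A caller could pass the `offset` from `InvalidInput` to `line_and_column` to build an error message, or resume lexing after the offending byte if skipping is preferred.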

boa/src/syntax/lexer/cursor.rs

Lan2u

Thanks for this - looks good :)

jasonwilliams · 2020-11-29T14:58:50Z

This looks great to me, great work @jevancc

Razican · 2020-12-03T15:36:39Z

Benchmark results are looking good. No big speed up, but no regressions.

Test results look good too:

Test result	master count	PR count	difference
Total	78,415	78,415	0
Passed	18,953	18,953	0
Ignored	15,547	15,547	0
Failed	43,915	43,915	0
Panics	1,127	1,127	0
Conformance	24.17	24.17	0.00%

No conformance changes, so I'm happy with this. It opens new doors. Thanks for your work!

jevancc added 5 commits October 26, 2020 23:12

(WIP) Add byte iterating methods to cursor

7057046

Fix cursor bugs

f7765ae

cargo fmt

a941c25

Fix cursor warnings

1d16e5c

Fix warnings

2079476

jevancc marked this pull request as draft October 27, 2020 15:45

Compatible with rust stable

7fd7d91

jevancc added 2 commits October 27, 2020 10:23

Fix clippy

239571e

Fix peek_n_bytes bug

e1dcc3a

Razican added this to the v0.11.0 milestone Oct 28, 2020

Razican added the lexer and performance labels Oct 28, 2020

jevancc added 3 commits November 2, 2020 14:35

Merge branch 'master' into lexer_on_bytes

b29ec33

Fix comments and add inline

2fb24d8

Add cursor tests

8bfe190

jevancc marked this pull request as ready for review November 4, 2020 03:54

RageKnify requested review from Lan2u, Razican, jasonwilliams and HalidOdat November 20, 2020 10:03

Lan2u reviewed Nov 21, 2020

Lan2u approved these changes Nov 21, 2020

Remove left over comment

fed6215

HalidOdat approved these changes Nov 27, 2020

jasonwilliams approved these changes Nov 29, 2020

Razican merged commit cc47385 into boa-dev:master Dec 3, 2020

jevancc mentioned this pull request Dec 18, 2020

Read file input in bytes instead of string #979

Merged
