Fix tokenizing Unicode escape sequence in string literal #826
Codecov Report
@@ Coverage Diff @@
## master #826 +/- ##
==========================================
+ Coverage 59.23% 59.26% +0.02%
==========================================
Files 157 157
Lines 10034 10035 +1
==========================================
+ Hits 5944 5947 +3
+ Misses 4090 4088 -2
Looks good! Check my comment on how we might improve this :)
boa/src/syntax/lexer/cursor.rs
let mut buf = [0u8; 4];
chr.encode_utf8(&mut buf);
Ok(Some(buf[0]))
Suggested change:
- let mut buf = [0u8; 4];
- chr.encode_utf8(&mut buf);
- Ok(Some(buf[0]))
+ Ok(Some(chr as u8))
Couldn't we just cast to u8? Since we already check that the char is ASCII, this should be faster than calling encode_utf8.
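Why the cast is safe here can be checked directly: for ASCII chars (U+0000 to U+007F) the code point fits in one byte and is identical to its one-byte UTF-8 encoding. The sketch below compares the two approaches; the helper names `ascii_byte_by_cast` and `ascii_byte_by_encoding` are hypothetical and not part of boa. It also shows why the ASCII check matters: for a non-ASCII char, `as u8` simply truncates the code point, which is not the first UTF-8 byte.

```rust
/// Hypothetical helper mirroring the reviewer's suggestion.
/// Assumes the caller has already verified `chr.is_ascii()`.
fn ascii_byte_by_cast(chr: char) -> u8 {
    debug_assert!(chr.is_ascii());
    // ASCII code points fit in one byte and equal their UTF-8 encoding.
    chr as u8
}

/// Hypothetical helper mirroring the original code: encode into a
/// 4-byte buffer and take the first byte.
fn ascii_byte_by_encoding(chr: char) -> u8 {
    let mut buf = [0u8; 4];
    chr.encode_utf8(&mut buf);
    buf[0]
}

fn main() {
    // Both approaches agree for every ASCII char.
    for b in 0u8..=0x7F {
        let chr = b as char;
        assert_eq!(ascii_byte_by_cast(chr), ascii_byte_by_encoding(chr));
    }

    // Without the ASCII check the cast would be wrong: U+00E9 truncates
    // to 0xE9, but its UTF-8 encoding (0xC3 0xA9) starts with 0xC3.
    assert_eq!('\u{E9}' as u8, 0xE9);
    assert_eq!(ascii_byte_by_encoding('\u{E9}'), 0xC3);
    println!("ok");
}
```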
Sounds great! I've made this change.
Besides, I can look into the invalid code point issue. Is there already an issue/PR for it?
There is an issue, #778, but no PR to fix it yet (and nobody is assigned, if you would like to take it).
Looks good to me! :)
It seems there are no regressions in the benchmarks, so this looks pretty good! Thanks!
This Pull Request fixes/closes #808.
It changes the following:
- Adds InnerIter::peek_char() and stores the peeked char in the InnerIter instance instead of in Cursor.
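The idea of keeping the peeked char inside InnerIter can be sketched as follows. This is a simplified, hypothetical stand-in, not boa's actual code: the real InnerIter reads bytes from an `io::Read`, while this version wraps any char iterator, and the `fill_bytes` signature (returning the number of bytes written) is an assumption. What it illustrates is the fix itself: because the peeked char lives in the same struct as `fill_bytes`, the byte-filling path writes that char first instead of skipping it.

```rust
/// Hypothetical, simplified stand-in for the lexer's InnerIter.
struct InnerIter<I: Iterator<Item = char>> {
    iter: I,
    /// A char that has been peeked but not consumed yet. Keeping it here,
    /// rather than in Cursor, lets every read method see it.
    peeked_char: Option<char>,
}

impl<I: Iterator<Item = char>> InnerIter<I> {
    fn new(iter: I) -> Self {
        InnerIter { iter, peeked_char: None }
    }

    /// Returns the next char without consuming it.
    fn peek_char(&mut self) -> Option<char> {
        if self.peeked_char.is_none() {
            self.peeked_char = self.iter.next();
        }
        self.peeked_char
    }

    /// Consumes the next char, returning the peeked one first if present.
    fn next_char(&mut self) -> Option<char> {
        self.peeked_char.take().or_else(|| self.iter.next())
    }

    /// Writes the UTF-8 bytes of upcoming chars into `buf` and returns how
    /// many bytes were written. The peeked char, if any, is written first.
    /// A char that does not fit in the remaining space stays peeked.
    fn fill_bytes(&mut self, buf: &mut [u8]) -> usize {
        let mut written = 0;
        while let Some(c) = self.next_char() {
            let len = c.len_utf8();
            if written + len > buf.len() {
                self.peeked_char = Some(c);
                break;
            }
            c.encode_utf8(&mut buf[written..written + len]);
            written += len;
        }
        written
    }
}

fn main() {
    let mut it = InnerIter::new("a\u{E9}b".chars());
    // Peek first, then fill: the peeked 'a' must not be lost.
    assert_eq!(it.peek_char(), Some('a'));
    let mut buf = [0u8; 8];
    let n = it.fill_bytes(&mut buf);
    // "a" (1 byte) + U+00E9 (2 bytes) + "b" (1 byte) = 4 bytes.
    assert_eq!(&buf[..n], "a\u{E9}b".as_bytes());
    println!("filled {} bytes", n);
}
```

Storing the peeked char as a decoded `char` means a later byte read has to re-encode it, which is cheap; the alternative of keeping it in Cursor is what made it invisible to the byte path in the first place.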
The issue is caused by a bug in Cursor::fill_bytes. When Cursor::fill_bytes was called after Cursor::peek, the underlying iterator had already been advanced to peek the next char, but the peeked char was never written into the buffer, so that char was lost. This PR introduces a new method, InnerIter::peek_char, and stores the peeked char in InnerIter, so that InnerIter::fill_bytes can correctly write the peeked char into the input buffer.

The test syntax::lexer::tests::codepoint_with_no_braces is updated in this PR. Since \uD83D is a surrogate code point, the test panics when decode_utf16 is called to decode it. This bug should be fixed in a separate issue/PR that handles invalid code points.
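For context on the surrogate case: Rust's standard `char::decode_utf16` does not panic on its own; for an unpaired surrogate such as \uD83D it yields a `DecodeUtf16Error`, so a panic would presumably come from unwrapping that result. The sketch below shows two possible policies, reporting the lone surrogate as an error or replacing it with U+FFFD. The helper names `decode_escapes` and `decode_escapes_lossy` are hypothetical, and this is not the fix chosen for #778. Note that JavaScript strings may legally contain lone surrogates while a Rust `String` cannot, so the lossy policy does change the string's contents.

```rust
/// Hypothetical helper: decodes UTF-16 code units (e.g. collected from
/// \uXXXX escapes) and reports a lone surrogate as an error instead of
/// panicking on it.
fn decode_escapes(units: &[u16]) -> Result<String, u16> {
    char::decode_utf16(units.iter().copied())
        .map(|r| r.map_err(|e| e.unpaired_surrogate()))
        .collect()
}

/// Hypothetical alternative policy: replace each lone surrogate with U+FFFD.
fn decode_escapes_lossy(units: &[u16]) -> String {
    char::decode_utf16(units.iter().copied())
        .map(|r| r.unwrap_or(char::REPLACEMENT_CHARACTER))
        .collect()
}

fn main() {
    // A valid surrogate pair decodes to a single char, U+1F600.
    assert_eq!(decode_escapes(&[0xD83D, 0xDE00]), Ok("\u{1F600}".to_string()));
    // A lone high surrogate, as in the updated test, is reported, not unwrapped.
    assert_eq!(decode_escapes(&[0xD83D]), Err(0xD83D));
    assert_eq!(decode_escapes_lossy(&[0xD83D]), "\u{FFFD}");
    println!("ok");
}
```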