-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix parsing for invalid control characters in class atoms #85
Conversation
9e28b12
to
699f7a6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some suggestions for simplifying.
fn try_consume_bracket_class_atom(&mut self) -> Result<Option<ClassAtom>, Error> { | ||
fn try_consume_bracket_class_atom( | ||
&mut self, | ||
) -> Result<Option<(ClassAtom, Option<ClassAtom>)>, Error> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a complicated return type. Here's two alternatives for simplifying it:
- When encountering "\c" in a bracket, instead of returning both, unread the "c" and only return the backslash. The next iteration through the loop will then pick up the "c".
- Instead return a
Result<Vec<ClassAtom>, Error>
I think the first will end up quite clean assuming it works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good, I will try to change it to the first if it works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works well. I implemented it in #88.
// ClassAtomNoDash :: \ [lookahead = c] | ||
_ => Ok(Some(( | ||
ClassAtom::CodePoint(u32::from('\\')), | ||
Some(ClassAtom::CodePoint(u32::from('c'))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't able to follow this - from what I can tell in the spec it says:
Return the numeric value of U+005C (REVERSE SOLIDUS).
but the actual behavior in Chrome agrees with your code - do you know where this is documented?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can read the spec like you have suggested to write to code in your other comment. It just says that if we encounter a \
and the next char is a c
we return U+005C
. Then the spec "returns"
and we read the c
as a single char next.
In non unicode mode
ClassAtomNoDash :: \ [lookahead = c]
was not being parsed correctly.