Fix parsing for invalid control characters in class atoms #85

Merged
merged 1 commit into from
Jan 28, 2024

raskad
Collaborator

@raskad raskad commented Jan 25, 2024

In non-Unicode mode, ClassAtomNoDash :: \ [lookahead = c] was not being parsed correctly.

@raskad raskad force-pushed the fix-class-atom-invalid-control-escapes branch from 9e28b12 to 699f7a6 Compare January 28, 2024 20:29
@raskad raskad merged commit 61d52fd into master Jan 28, 2024
7 checks passed
Owner

@ridiculousfish ridiculousfish left a comment

Some suggestions for simplifying.

- fn try_consume_bracket_class_atom(&mut self) -> Result<Option<ClassAtom>, Error> {
+ fn try_consume_bracket_class_atom(
+     &mut self,
+ ) -> Result<Option<(ClassAtom, Option<ClassAtom>)>, Error> {
Owner

This is a complicated return type. Here are two alternatives for simplifying it:

  1. When encountering "\c" in a bracket, instead of returning both, unread the "c" and only return the backslash. The next iteration through the loop will then pick up the "c".
  2. Instead, return a Result<Vec<ClassAtom>, Error>.

I think the first will end up quite clean assuming it works.
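
For illustration, the first alternative could look roughly like the sketch below. It is not the code from this PR or from #88: the `ClassAtom` enum is a hypothetical one-variant stand-in, the parser state is reduced to a `Peekable<Chars>`, `parse_class_body` is an invented helper, and error handling is omitted. It assumes the Annex B rule that `\c` inside a class is a control escape only when followed by an ASCII letter, digit, or underscore.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Hypothetical stand-in for the crate's class atom type.
#[derive(Debug, PartialEq)]
enum ClassAtom {
    CodePoint(u32),
}

/// Consume one class atom, or return None at `]` or end of input.
/// Only the paths relevant to `\c` are modeled.
fn try_consume_bracket_class_atom(input: &mut Peekable<Chars<'_>>) -> Option<ClassAtom> {
    match input.peek().copied()? {
        ']' => None,
        '\\' => {
            input.next(); // consume the backslash
            // Look ahead on a clone so nothing is consumed yet.
            let mut ahead = input.clone();
            if ahead.next() == Some('c') {
                let letter = ahead
                    .next()
                    .filter(|ch| ch.is_ascii_alphanumeric() || *ch == '_');
                if let Some(letter) = letter {
                    // Valid control escape: consume `c` and the letter.
                    input.next();
                    input.next();
                    return Some(ClassAtom::CodePoint(letter as u32 % 32));
                }
                // ClassAtomNoDash :: \ [lookahead = c]
                // Return only the backslash; the `c` stays unread and is
                // picked up as an ordinary character on the next call.
                return Some(ClassAtom::CodePoint(u32::from('\\')));
            }
            // Other escapes are out of scope here: treat as identity escapes.
            input.next().map(|ch| ClassAtom::CodePoint(u32::from(ch)))
        }
        ch => {
            input.next();
            Some(ClassAtom::CodePoint(u32::from(ch)))
        }
    }
}

/// Hypothetical driver loop: collect atoms until `]` or end of input.
fn parse_class_body(body: &str) -> Vec<ClassAtom> {
    let mut input = body.chars().peekable();
    let mut atoms = Vec::new();
    while let Some(atom) = try_consume_bracket_class_atom(&mut input) {
        atoms.push(atom);
    }
    atoms
}
```

With this shape the return type stays a single optional atom. Calling `parse_class_body(r"\c")` yields two atoms, the backslash followed by `c`, because the loop simply makes a second pass.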

Collaborator Author

That sounds good. I will try the first option and switch to it if it works.

Collaborator Author

Works well. I implemented it in #88.

// ClassAtomNoDash :: \ [lookahead = c]
_ => Ok(Some((
    ClassAtom::CodePoint(u32::from('\\')),
    Some(ClassAtom::CodePoint(u32::from('c'))),
Owner

I wasn't able to follow this - from what I can tell in the spec it says:

Return the numeric value of U+005C (REVERSE SOLIDUS).

but the actual behavior in Chrome agrees with your code - do you know where this is documented?

Collaborator Author

@raskad raskad Jan 28, 2024

I think you can read the spec the same way you suggested writing the code in your other comment. It just says that if we encounter a \ and the next char is a c, we return U+005C. Then the spec "returns", and we read the c as a single char next.

@raskad raskad deleted the fix-class-atom-invalid-control-escapes branch January 28, 2024 21:02