-
Notifications
You must be signed in to change notification settings - Fork 113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Raw & in attributes #134
Raw & in attributes #134
Conversation
Reference: https://www.w3.org/TR/html52/syntax.html#character-reference-state If the character reference was consumed as part of an attribute (return state is either attribute value (double-quoted) state, attribute value (single-quoted) state or attribute value (unquoted) state), and the last character matched is not a U+003B SEMICOLON character (;), and the next input character is either a U+003D EQUALS SIGN character (=) or an alphanumeric ASCII character, then, for historical reasons, switch to the character reference end state. If the last character matched is not a U+003B SEMICOLON character (;), this is a parse error.
@@ -622,13 +622,27 @@ public function testTagAttributes() | |||
), | |||
false | |||
), | |||
"<foo a='blue&red'>" => array( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with this cange blue&red
is a valid attribute and will not trigger errors
<img src="a&b"/> | ||
<img src="a&="/> | ||
<img src="a&=c"/> | ||
<img src="a&=9"/> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Forgive me if I'm missing something here but, shouldn't the written attribute be &
instead of &
since that was the input and it's within an attribute? I've not looked at the spec (or worked on this) in awhile so I could be off here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the specs says:
If the character reference was consumed as part of an attribute (return state is either attribute value (double-quoted) state, attribute value (single-quoted) state or attribute value (unquoted) state), and the last character matched is not a U+003B SEMICOLON character (;), and the next input character is either a U+003D EQUALS SIGN character (=) or an alphanumeric ASCII character, then, for historical reasons, switch to the character reference end state.
My interpretation if it is that the correct form is &
but for "historical reasons" a mere &
is accepted.
The current implementation is able to parse correctly &
, but when serializing it back, it fixes it transforming it into &
.
Does it make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me it does. Other "accepted syntax variations" are normalized as well when output is written.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense. I was curious about browser handling and compatibility. Quite often developers use an &
. For reference, it turns out that they translate &
to &
when calling the link. There used to be issues with this in older browser (e.g., amp;
being prepended to the next key/value pair in the query. I think this is fixed in all modern browser.
lgtm |
Fixes #124
Tries to implement the HTML spec:
https://www.w3.org/TR/html52/syntax.html#character-reference-state
More specifically: