Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raw & in attributes #134

Merged
merged 1 commit into from
Sep 1, 2017
Merged

Raw & in attributes #134

merged 1 commit into from
Sep 1, 2017

Conversation

goetas
Copy link
Member

@goetas goetas commented Aug 28, 2017

Fixes #124

Tries to implement the HTML spec:
https://www.w3.org/TR/html52/syntax.html#character-reference-state

More specifically:

If the character reference was consumed as part of an attribute (return state is either attribute value (double-quoted) state, attribute value (single-quoted) state or attribute value (unquoted) state), and the last character matched is not a U+003B SEMICOLON character (;), and the next input character is either a U+003D EQUALS SIGN character (=) or an alphanumeric ASCII character, then, for historical reasons, switch to the character reference end state.

Reference: https://www.w3.org/TR/html52/syntax.html#character-reference-state

If the character reference was consumed as part of an attribute (return state is either attribute value (double-quoted) state, attribute value (single-quoted) state or attribute value (unquoted) state), and the last character matched is not a U+003B SEMICOLON character (;), and the next input character is either a U+003D EQUALS SIGN character (=) or an alphanumeric ASCII character, then, for historical reasons, switch to the character reference end state.
If the last character matched is not a U+003B SEMICOLON character (;), this is a parse error.
@@ -622,13 +622,27 @@ public function testTagAttributes()
),
false
),
"<foo a='blue&red'>" => array(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with this cange blue&red is a valid attribute and will not trigger errors

<img src="a&amp;b"/>
<img src="a&amp;="/>
<img src="a&amp;=c"/>
<img src="a&amp;=9"/>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgive me if I'm missing something here but, shouldn't the written attribute be & instead of &amp; since that was the input and it's within an attribute? I've not looked at the spec (or worked on this) in awhile so I could be off here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the specs says:

If the character reference was consumed as part of an attribute (return state is either attribute value (double-quoted) state, attribute value (single-quoted) state or attribute value (unquoted) state), and the last character matched is not a U+003B SEMICOLON character (;), and the next input character is either a U+003D EQUALS SIGN character (=) or an alphanumeric ASCII character, then, for historical reasons, switch to the character reference end state.

My interpretation if it is that the correct form is &amp; but for "historical reasons" a mere & is accepted.
The current implementation is able to parse correctly &, but when serializing it back, it fixes it transforming it into &amp;.

Does it make sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me it does. Other "accepted syntax variations" are normalized as well when output is written.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense. I was curious about browser handling and compatibility. Quite often developers use an &. For reference, it turns out that they translate &amp; to & when calling the link. There used to be issues with this in older browser (e.g., amp; being prepended to the next key/value pair in the query. I think this is fixed in all modern browser.

@mattfarina
Copy link
Member

lgtm

@goetas goetas merged commit 39e2a7a into 2.x Sep 1, 2017
@goetas goetas deleted the ampersand-in-urls branch September 1, 2017 15:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants