HTML character reference processing is inconsistent #488
Comments
Only for legacy entities, which is consistent with HTML. The list of entities matched is taken directly from the HTML spec.
I don't think we have any uses of that form to check whether the decision to use that form was deliberate (e.g. the numeric value of the code point is relevant to the reader), so I didn't implement it. If there are usages out there which you think are definitely not deliberate, I'd be happy to accept a PR adding that.
Omitting that case was a deliberate decision, which I stand by. Rather than formatting those, I would prefer to have a lint rule for unknown entity-like things which will warn on that case.
Fixed in #489, thanks.
I don't think this is literally ever going to come up, so I don't want to spend time implementing it.
I think it makes as much sense to convert numeric references as it does to convert named references, so I'll try to submit that.
Perfect, that's the kind of context I was looking for. It would incline me towards requiring the semicolon for replacement and catching mistakes by linting (i.e., the first suggestion above). WDYT?
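The strategy described here (replace only semicolon-terminated references, lint everything else) could be sketched roughly as follows. This is an illustration in Python using the standard library's `html.entities.html5` table, not the project's actual code; the names `ENTITY_LIKE`, `strict_replace`, and `lint_entity_like` are hypothetical.

```python
import re
from html.entities import html5  # maps names such as "amp;" (and legacy "amp") to characters

# Hypothetical pattern: "&", a letter, more letters/digits, then an optional ";".
ENTITY_LIKE = re.compile(r"&([A-Za-z][A-Za-z0-9]*)(;?)")


def _is_strict_match(m: re.Match) -> bool:
    """True only for a known name that is terminated by a semicolon."""
    name, semi = m.groups()
    return bool(semi) and name + ";" in html5


def strict_replace(text: str) -> str:
    """Replace known, semicolon-terminated named references; leave the rest."""
    return ENTITY_LIKE.sub(
        lambda m: html5[m.group(1) + ";"] if _is_strict_match(m) else m.group(0),
        text,
    )


def lint_entity_like(text: str) -> list[str]:
    """Return the entity-like sequences that strict_replace leaves untouched."""
    return [m.group(0) for m in ENTITY_LIKE.finditer(text) if not _is_strict_match(m)]
```

With this sketch, `strict_replace("a &lt; b &amp c")` leaves `&amp` alone and `lint_entity_like` reports it. A real lint rule would likely need to avoid false positives on text such as `AT&T`.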
I prefer errors to be automatically fixed when possible, which only the formatter is capable of at the moment. (It would be nice to make the linter support So I'm inclined to stick with the current strategy unless there's some improvement to the user experience from the strategy you propose.
Works for me.
#481 introduced processing of named character references, but is inconsistent with the HTML spec (cf. https://html.spec.whatwg.org/multipage/parsing.html#tokenization and specifically https://html.spec.whatwg.org/multipage/parsing.html#named-character-reference-state ). It matches without a terminal semicolon but also fails to handle numeric references or the prefix matching that renders e.g. `I'm &notit; I tell you` as "I'm ¬it; I tell you". The inner logic lowercasing `≤` (which is its own bug and should be `<`) and `&` only with a terminal semicolon is inconsistent with the semicolon being optional in the initial match, and also inconsistent with the lack of e.g. `AMp;` in https://html.spec.whatwg.org/multipage/named-characters.html#named-character-references .

I think it should instead be either unconditionally strict[1] or else better aligned with HTML[2], and in either case should probably also be updated for numeric references and correct case handling.
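For reference, the HTML behaviors mentioned above (legacy prefix matching, numeric references, and case-sensitive names) can be observed with Python's standard `html.unescape`, which follows the HTML5 character reference rules. This is only an illustration of the spec behavior, not of the code in #481.

```python
import html

# Legacy names such as "not" match as a prefix even without a semicolon.
print(html.unescape("I'm &notit; I tell you"))  # I'm ¬it; I tell you

# Numeric references, decimal and hexadecimal.
print(html.unescape("&#60; &#x3C;"))  # < <

# Names are case-sensitive: "AMP;" is in the table, "AMp;" is not.
print(html.unescape("&AMP; &AMp;"))  # & &AMp;
```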
Footnotes
[1] e.g.,
[2] e.g.,