[css-syntax] Wrapping up the <unicode-range> thing #3591

Closed
tabatkins opened this issue Feb 1, 2019 · 1 comment
Labels
Closed as Duplicate, Commenter Satisfied, css-syntax-3, Testing Unnecessary

Comments

@tabatkins

(migrated from the mailing list)

Tab Atkins said:

So, unicode ranges aren't settled right now, and I'd like to wrap them up.

Quick history lesson:

  1. Unicode ranges were originally defined as a CSS token. They have
    to be specially handled, because they don't look like any other token.

  2. FF got some bug reports about the selector u+a {...} failing:
    it parses as a unicode-range token, which is invalid in selectors.

  3. I proposed we eliminate unicode-range as a token, and break it down
    into a complicated reimagining based on existing tokens, like I did
    for An+B.
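To make item 2 concrete, here is a rough sketch of how a tokenizer with a dedicated unicode-range token might consume input. It assumes the usual rules (U or u, then +, then up to six hex digits with optional trailing ? wildcards, or a hyphenated end value); the function name `consume_legacy_urange` is made up for illustration and is not taken from any implementation.

```python
import string

HEX = set(string.hexdigits)

def consume_legacy_urange(text):
    """Return (start, end, rest) if text begins with a legacy
    unicode-range token, else None. Illustrative sketch only."""
    if len(text) < 3 or text[0] not in "Uu" or text[1] != "+":
        return None
    if text[2] not in HEX and text[2] != "?":
        return None
    i = 2
    digits = ""
    # Take as many hex digits as possible, at most six.
    while i < len(text) and len(digits) < 6 and text[i] in HEX:
        digits += text[i]
        i += 1
    # Then take ? wildcards, keeping the total at six or fewer.
    wild = 0
    while i < len(text) and len(digits) + wild < 6 and text[i] == "?":
        wild += 1
        i += 1
    if wild:
        start = int(digits + "0" * wild, 16)
        end = int(digits + "F" * wild, 16)
        return start, end, text[i:]
    start = end = int(digits, 16)
    # Optional "-<hex digits>" suffix gives an explicit range end.
    if i + 1 < len(text) and text[i] == "-" and text[i + 1] in HEX:
        i += 1
        end_digits = ""
        while i < len(text) and len(end_digits) < 6 and text[i] in HEX:
            end_digits += text[i]
            i += 1
        end = int(end_digits, 16)
    return start, end, text[i:]

# The selector "u+a {...}" starts with a unicode-range token, not a selector.
print(consume_legacy_urange("u+a {...}"))
```

Because "a" is a hex digit, the tokenizer greedily claims it, so a selector parser never sees the ident, the combinator, and the type selector it was expecting.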

The major problem with this is that some hex numbers look like
exponented numbers. For example, "U+04e4" is supposed to be Ӥ, but it
parses as:

ident(U) delim(+) number(40000)

Obviously, 0x4e4 and 40000 are very different numbers! (U+40000 is
actually invalid!) I currently solve this by keeping around the
"representation" of the number token, which is the actual characters
it was written with, but no impl does that, or is willing to keep
around a string for every number and dimension they parse.
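A small illustration of the mismatch, as a sketch only: the `NumberToken` class and the helper names below are hypothetical, standing in for a tokenizer that keeps the source characters alongside the numeric value.

```python
from dataclasses import dataclass

@dataclass
class NumberToken:
    value: float          # what the tokenizer computes
    representation: str   # the exact characters from the source

def tokenize_number(source):
    # Hypothetical helper: parse the text as a CSS-style number,
    # where "e4" is read as an exponent.
    return NumberToken(value=float(source), representation=source)

def hex_from_token(token):
    # Recover the intended code point from the original characters.
    return int(token.representation.lstrip("+"), 16)

token = tokenize_number("04e4")
print(token.value)            # 40000.0, read as 4 x 10^4
print(hex_from_token(token))  # 1252, i.e. 0x4e4
print(chr(hex_from_token(token)))
```

Without the `representation` field, only 40000.0 survives tokenization, and there is no way to tell whether the author wrote 04e4, 40000, or 4e4.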

So I think there are two ways we can handle this:

  1. Abandon the project, restore the unicode-range token, and live with
    the fact that we have a weird almost-useless token that will
    occasionally cause problems for authors in unrelated contexts. (We
    can't even really do something like make Selectors treat unicode-range
    specially, because it can cut selectors in pieces - "u+area" parses as
    a urange(a) ident(rea)!)

  2. Produce a new, reliable syntax for unicode ranges, and keep around
    the old version for back-compat, with a warning that some values won't
    parse correctly. The most obvious fix is to just replace the + with a
    -, like "U-0404", "U-400-600", or "U-4??". This makes the entire
    thing an ident, which keeps around the characters properly (or an
    ident followed by some ? delims, which is also fine).
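As a sketch of why option 2 sidesteps the problem: once the whole value is an ident, the digits are still available as text when the range is interpreted. The grammar below (one to six hex digits, an optional hyphenated end, or trailing ? wildcards) is an assumption modelled on the existing U+ forms, and `parse_dash_urange` is a made-up name. For simplicity it takes the value as one string, although "U-4??" would really arrive as an ident followed by ? delims.

```python
import re

# Assumed grammar for the proposed "U-" form; not taken from any spec.
_RANGE = re.compile(r"[Uu]-([0-9A-Fa-f]{1,6})(?:-([0-9A-Fa-f]{1,6}))?")
_WILD = re.compile(r"[Uu]-([0-9A-Fa-f]*)(\?+)")

def parse_dash_urange(text):
    """Return (start, end) for a "U-" style range, or None."""
    m = _WILD.fullmatch(text)
    if m and len(m.group(1)) + len(m.group(2)) <= 6:
        digits, marks = m.group(1), m.group(2)
        # Each ? stands for any hex digit: lowest is 0, highest is F.
        low = int(digits + "0" * len(marks), 16)
        high = int(digits + "F" * len(marks), 16)
        return low, high
    m = _RANGE.fullmatch(text)
    if m:
        start = int(m.group(1), 16)
        end = int(m.group(2), 16) if m.group(2) else start
        return start, end
    return None

for example in ("U-0404", "U-400-600", "U-4??", "U-04e4"):
    print(example, parse_dash_urange(example))
```

Note that "U-04e4" comes out as 0x4e4 here, because no number token is ever produced for the digits.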

Thoughts?


Simon Sapin said:

On 22/06/15 17:26, Tab Atkins Jr. wrote:

> So I think there are two ways we can handle this:
>
>   1. Abandon the project, restore the unicode-range token, and live with
>     the fact that we have a weird almost-useless token that will
>     occasionally cause problems for authors in unrelated contexts. (We
>     can't even really do something like make Selectors treat unicode-range
>     specially, because it can cut selectors in pieces - "u+area" parses as
>     a urange(a) ident(rea)!)

Not sure if this is a good idea, but we could handle that in the
Selectors grammar as well. u+a/**/rea would also parse, which we might
not want, but it’s much harder for authors to accidentally do that than u+a.

>   2. Produce a new, reliable syntax for unicode ranges, and keep around
>     the old version for back-compat, with a warning that some values won't
>     parse correctly. The most obvious fix is to just replace the + with a
>     -, like "U-0404", "U-400-600", or "U-4??". This makes the entire
>     thing an ident, which keeps around the characters properly (or an
>     ident followed by some ? delims, which is also fine).

unicode-range: U+04e4 works today in multiple browsers. Breaking this
seems worse than the u+a selector not working. (Introducing an
alternative unicode-range syntax will not help existing unmaintained
content.)


fantasai said:

I agree with Simon. We should not break unicode-range syntax here.

If it's possible to fix this by munging the Selectors grammar,
that seems like the best option. I'd argue that we may want to
allow implementations to use context-specific parsing rules as
well, if they want to go that route instead, so the UA would be
allowed to either accept or reject u+a/**/rea. (A full CSS parser
might not want to do that, but a Selectors parser shouldn't have
to deal with unicode-range token munging. Ditto An+B, now I think
about it.)


Simon Sapin said:

Allowing a different behavior without mandating it reduces interop, and
this doesn’t seem to be a good enough reason to do it.


fantasai said:

The cases where there wouldn't be interop are just weird edge cases
like u+a/**/rea, right? I don't think interop on that case is worth
imposing the complexity of a CSS-token-munging parsing model on all
non-CSS implementations of Selectors.


Tab Atkins said:

On Fri, Jun 26, 2015 at 3:37 PM, Simon Sapin simon.sapin@exyr.org wrote:

> unicode-range: U+04e4 works today in multiple browsers. Breaking this
> seems worse than the u+a selector not working. (Introducing an alternative
> unicode-range syntax will not help existing unmaintained content.)

There's a difference between "it works" and "it's used". I'm going to
run some searches over our corpus and see if I can find any actual
uses of unicode-ranges that look like scientific-notation numbers.

@tabatkins added the css-syntax-3, Commenter Satisfied, Closed as Duplicate, and Testing Unnecessary labels Feb 1, 2019
@tabatkins

Dupe of #3588

@tabatkins tabatkins added this to the CSS Syntax 3 June 2019 CR milestone Jun 28, 2019