Case-sensitive as="" for <link> is weird #1665

Closed
domenic opened this issue Aug 12, 2016 · 79 comments

@domenic
Member

domenic commented Aug 12, 2016

In #1449 @annevk talked us into making as="" for <link> case-sensitive. Part of Anne's argument was that HTML has been inconsistent on this so far.

However, upon prompting from @esprehn, I checked all the other attributes in the HTML spec (using the attributes index). Everything that is comparing against a predefined set of values (including things like MIME types) is treated case-insensitively.

I think introducing this new, inconsistent way of matching attribute values was a mistake. I think we should make as="" a normal enumerated attribute that matches case-insensitively.

/cc @igrigorik @yoavweiss

@annevk
Member

annevk commented Aug 13, 2016

It was also a mistake, e.g., for HTTP methods. Everyone still thinks those are case-insensitive.

I'd prefer we just did case-sensitive from now on so there will eventually be less confusion between markup and APIs that use enums, unless you want to make those case-insensitive too?

@yoavweiss
Contributor

yoavweiss commented Aug 13, 2016

A functional requirement for case sensitivity is likely to cause confusion. I'd be fine with such a conformance requirement, but ignoring matching values because of case seems overly restrictive to me. Are there any precedents for that in HTML?

@esprehn

esprehn commented Aug 14, 2016

I'd prefer we just did case-sensitive from now on so there will eventually be less confusion between markup and APIs that use enums, unless you want to make those case-insensitive too?

I'd prefer we didn't fork the largely consistent behavior across attribute values here. Having rel@ be case-insensitive and as@ be case-sensitive is very strange and will lead to author confusion.

As an implementor this is also unfortunate since we'd need a whitelist for the case-sensitivity of this one attribute in our generated code.

@annevk
Member

annevk commented Aug 14, 2016

The problem is that by making it case-insensitive here you would subset Fetch. We cannot keep doing that to standards HTML integrates with.

@esprehn

esprehn commented Aug 14, 2016

I'm not sure what you mean by "subset fetch". If I was an author making a custom element I would just call toLowerCase() before passing the value to fetch, the platform can do the same.

@annevk
Member

annevk commented Aug 14, 2016

Because HTML would constrain the value space. E.g., as I said before, HTTP GET and Get are distinct.

@esprehn

esprehn commented Aug 14, 2016

At the spec level I would object to any spec that wants to have an enumerated value that varies only by case. Fetch is not going to have values for as@ that are not lowercase. I would formally object to any spec being implemented in Blink that tried to do that.

@annevk
Member

annevk commented Aug 14, 2016

Also, with custom elements you run into the problem that the most convenient lowercasing operation in JavaScript is not compatible with what HTML uses.

@rniwa

rniwa commented Aug 14, 2016

The assertion that no other attribute value does case-sensitive comparison is false. selectionDirection on HTMLInputElement, for example, uses (to set the selection range) case-sensitive comparison against a set of known values.

HTMLInputElement also has a name attribute, and if its value is case-sensitively equal to _charset_, we would ignore the value of the element.

HTMLOListElement's type attribute also does case-sensitive comparison against the list of known values.

A form-associated element's form content attribute uses case-sensitive comparison against IDs of form elements.

getElementsByName, for example, also uses case-sensitive comparison of element names.

There are a lot more examples of content attributes and features that use case-sensitive comparisons in HTML. Consistency is good, but there is hardly any consistency about case-sensitivity in HTML/Web. There are even multiple definitions of case-insensitivity, and some APIs are only case-insensitive inside an HTML document or with regard to HTML elements.

Given the increased use of inline SVG elements and other XML features being incorporated into modern Web sites and web apps, the simplest thing we can do today is to make new APIs case-sensitive and treat case-insensitivity as a compatibility feature for legacy APIs.

@domenic
Member Author

domenic commented Aug 14, 2016

Wow, thanks for that search @rniwa! I don't think the case-sensitive ID/name comparison is the same type of thing; mostly we're talking about cases where there's a finite set of enumerated values. And your selectionDirection example is an IDL attribute, not a content attribute; IDL attributes are indeed always case-sensitive. But the _charset_ and <ol type> cases are definitely counterexamples.

I still think it would be better to go with the vast majority and always be case-insensitive, but I no longer feel as strongly.

I do think @esprehn is right that we shouldn't be concerned about value constraints here; Fetch should never add a destination that differs from other destinations only in case. We shouldn't let one bad case (HTTP methods) guide us toward a bad precedent.

However, I agree with @annevk that using JS lowercasing in e.g. a custom element is very bad; you really should be using some kind of case-folding comparison, which isn't even something exposed to JavaScript. (At least, not in any way I know about; I'm not aware of all the Intl library stuff, however.) We can get away with toLowerCase() as long as nobody uses Unicode, but that's an unfortunate assumption to make.
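To make the lowercasing concern concrete, here is a small sketch of how `String.prototype.toLowerCase()` treats input outside the ASCII range. The two code points and the `isAscii` helper are chosen for illustration and do not come from the comments above.

```javascript
// toLowerCase() applies Unicode case mapping, not just A-Z -> a-z.
const kelvin = "\u212A";   // KELVIN SIGN, renders like "K"
const dottedI = "\u0130";  // LATIN CAPITAL LETTER I WITH DOT ABOVE

// The Kelvin sign lowercases to plain ASCII "k".
const kelvinLower = kelvin.toLowerCase();

// U+0130 lowercases to "i" followed by U+0307 COMBINING DOT ABOVE,
// so the result is two code units long and is not equal to ASCII "i".
const dottedILower = dottedI.toLowerCase();

// Hypothetical helper: true only if every code unit is in the ASCII range.
function isAscii(value) {
  return /^[\x00-\x7F]*$/.test(value);
}

console.log(kelvinLower === "k");               // true
console.log(dottedILower.length);               // 2
console.log(isAscii("IMAGE"), isAscii(kelvin)); // true false
```

As long as `isAscii` holds for the input, `toLowerCase()` and ASCII-only lowercasing give the same result; the two diverge only for inputs like the ones above.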

@annevk
Member

annevk commented Aug 14, 2016

I also think that everywhere where we use case-insensitive comparison it is now generally seen as a mistake. Avoiding it going forward should make things more predictable as everyone will converge on canonical casing.

@domenic
Member Author

domenic commented Aug 14, 2016

I generally disagree with that. I think enumerated values being case-insensitive is more author-friendly and consistent with HTML in general (e.g. <A CROSSORIGIN="ANONYMOUS"> is allowed just as much as <a crossorigin="anonymous"> is). We rejected the strictness of XHTML.

@rniwa

rniwa commented Aug 15, 2016

I'm not sure if "we rejected" is a good characterization of what has happened. It was more to do with backwards compatibility requiring case-insensitivity. And the whole situation is a bit of a mess due to case-insensitive behavior being inconsistent across the platform.

@annevk
Member

annevk commented Aug 15, 2016

We rejected not having error handling and forward compat for syntax. It's much more subtle than "embracing tagsoup" as some like to portray it.

@annevk
Member

annevk commented Aug 15, 2016

And indeed, rejected not being backwards compatible. None of those are at issue here.

@tabatkins
Contributor

I generally disagree with that.

Yes, I also disagree. Case-insensitivity for enumerated values constrained to be within the ASCII range is simple and easy for authors, and common throughout the web platform - CSS, in particular, uses it everywhere. If you're constrained to the ASCII range, the obvious JS method for lowercasing works 100% correctly.

As @esprehn says, any spec defining enumerated values that has two distinct values that differ only by case is doing something terribly wrong; such a value would draw strong objections from implementors, for good reason. So, there's no practical concern about value clashing here, just aesthetic/design concerns.

This is very distinct from arbitrary / open-ended names, particularly if you're allowing full Unicode (which is generally good practice). At that point there's no single "correct" way to lowercase, and so we should just be doing codepoint comparison. Again, CSS does precisely this.

But the _charset_ and <ol type> cases are definitely counterexamples.

<ol type> is iffy. It's not taking a word, it's taking "what symbol will the first <li> display". It'd be different if it was type=alpha vs type=ALPHA (that runs into the "don't define distinct enumerated values that differ only by case"), but I'm willing to bite the bullet on "single symbols that map directly to the visual result they cause". (<ol type> is not a good design by any metric - ideally it would take a CSS counter-style name - but we're stuck with it. I'd be against this design if it were proposed today.)

As well, input names are "open-ended", so it's more natural that they're compared codepoint-wise. Thus the _charset_ thing is ok.

@annevk
Member

annevk commented Aug 16, 2016

the obvious JS method for lowercasing works 100% correctly

> "İ".toLowerCase() == "i"

@domenic
Member Author

domenic commented Aug 16, 2016

@annevk, please read the first part of @tabatkins's sentence that you quoted.

@annevk
Member

annevk commented Aug 16, 2016

@domenic how are attribute values constrained in that way? They're not. Or do you mean you first have to check that all the code points are in the ASCII range? That's a lot more complicated than writing toLowerCase() and then comparing the result.

@domenic
Member Author

domenic commented Aug 16, 2016

They are in all the cases so far. We don't know yet whether authors plan to create enumerated attributes containing Turkish Is in their custom elements, but I think @tabatkins was assuming they would not, in his sentence which you misquoted.

@annevk
Member

annevk commented Aug 16, 2016

Again, if someone writes as="İmage" it would work in his JavaScript implementation and not in a browser. That's a problem.
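The polyfill-versus-browser mismatch can be sketched as follows. This is an illustration under assumptions: `DESTINATIONS` is an assumed subset of destination keywords, both matcher functions are hypothetical, and U+212A KELVIN SIGN stands in as the stray non-ASCII character because `toLowerCase()` maps it to ASCII "k".

```javascript
// Assumed subset of destination keywords, for illustration only.
const DESTINATIONS = ["font", "image", "script", "style", "track"];

// Naive polyfill-style matcher: full Unicode lowercasing via toLowerCase().
function naiveMatch(value) {
  const lowered = value.toLowerCase();
  return DESTINATIONS.includes(lowered) ? lowered : null;
}

// ASCII case-insensitive matcher: only A-Z are folded, everything else is left alone.
function asciiMatch(value) {
  const lowered = value.replace(/[A-Z]/g, (c) => c.toLowerCase());
  return DESTINATIONS.includes(lowered) ? lowered : null;
}

const stray = "trac\u212A"; // ends in KELVIN SIGN, not ASCII "K"

console.log(naiveMatch("IMAGE"), asciiMatch("IMAGE")); // image image
console.log(naiveMatch(stray), asciiMatch(stray));     // track null
```

The two matchers agree on every ASCII input and disagree only on values such as `stray`, which is the gap the comment above is concerned about.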

@tabatkins
Contributor

I'm saying that all existing enumerated attributes in HTML (and all we plan to add) are ascii-only. If you write an attribute value also in ASCII, JS's lowercasing works great. "Someone might use an API and input an attribute value that includes some random unicode characters that happen to JS-lowercase into ASCII values" is a super-bizarre case for us to care about. Why is this something we need to worry about and optimize our API design for?

And if people do end up, in their custom elements, including non-ASCII values, they should follow the web's common practices and match those codepoint-wise. No lowercasing involved there at all.
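For open-ended names, codepoint-wise matching amounts to plain string equality with no case folding and no normalization. A minimal sketch, assuming a hypothetical registry of field names (`fieldsByName` and `lookup` are illustrative, not an existing API):

```javascript
// Two spellings that render alike: precomposed "é" vs. "e" + combining acute accent.
const composed = "caf\u00E9";
const decomposed = "cafe\u0301";

// Hypothetical registry keyed by an open-ended name.
const fieldsByName = new Map([
  ["userName", "field A"],
  [composed, "field B"],
]);

// Exact lookup: Map keys are compared without case folding or normalization.
function lookup(name) {
  return fieldsByName.has(name) ? fieldsByName.get(name) : null;
}

console.log(lookup("userName"));  // field A
console.log(lookup("username"));  // null
console.log(lookup(decomposed));  // null
```

Because nothing is folded, there is no need to pick one of the several possible definitions of case-insensitivity for non-ASCII text.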

@annevk
Member

annevk commented Aug 17, 2016

You said it handles it 100% correctly, but that is simply not true. And we should care about error handling since the web tends to start depending on the errors.

Furthermore, it would be much easier if we required canonical case as then you can just copy it around without hassle.

@tabatkins
Contributor

You said it handles it 100% correctly, but that is simply not true.

No, it's true in the context I gave. If the value-space is ASCII and people are working in ASCII, then it's fine.

Your concern is just that polyfills will accidentally support people typing non-ASCII. I'm confused how that has anything to do with HTML itself; how would that freeze us into accepting non-ASCII? It's also just an incredibly weird thing to worry about imo.

Furthermore, it would be much easier if we required canonical case as then you can just copy it around without hassle.

What hassle is caused by ASCII-CI values? I don't see how copy-pasting is affected.

@annevk
Member

annevk commented Aug 17, 2016

They do not work in APIs that take enums. We are going in circles now.
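The enum behavior referred to here can be approximated in plain JavaScript. This is a simulation rather than a real platform API: `toEnum` is a hypothetical stand-in for the exact-match conversion that enum-typed API arguments go through, and `MODES` is an assumed list of values.

```javascript
// Assumed enum values, for illustration only.
const MODES = ["cors", "no-cors", "same-origin"];

// Sketch of enum conversion: the string must match an allowed value exactly,
// otherwise a TypeError is thrown. No case folding is applied.
function toEnum(value, allowed) {
  const s = String(value);
  if (!allowed.includes(s)) {
    throw new TypeError(`'${s}' is not a valid enum value`);
  }
  return s;
}

console.log(toEnum("cors", MODES)); // cors

try {
  toEnum("CORS", MODES);
} catch (e) {
  console.log(e instanceof TypeError); // true
}
```

A value copied from markup written as `AS="IMAGE"`-style uppercase would therefore need to be lowercased before being handed to an API of this shape.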

@zcorpan
Member

zcorpan commented Aug 19, 2016

I think that's a problem that can trip up beginners but in practice is not much of a problem at all. We have the same situation with attribute names and reflecting IDL attributes (e.g. <button onClick="..."> works but button.onClick = ...; does not).

Maybe the Web platform should have convenience functions for ASCII-lowercasing etc.
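Such convenience functions might look roughly like the following. Both names are hypothetical; no built-in with these names is assumed to exist.

```javascript
// Hypothetical helper: lowercase only the ASCII letters A-Z.
// Every other code unit, including non-ASCII letters, is left untouched.
function asciiLowercase(value) {
  return value.replace(/[A-Z]/g, (c) =>
    String.fromCharCode(c.charCodeAt(0) + 0x20)
  );
}

// Hypothetical helper: ASCII case-insensitive comparison of two strings.
function asciiCaseInsensitiveMatch(a, b) {
  return asciiLowercase(a) === asciiLowercase(b);
}

console.log(asciiLowercase("IMAGE"));                             // image
console.log(asciiCaseInsensitiveMatch("Anonymous", "ANONYMOUS")); // true
console.log(asciiCaseInsensitiveMatch("k", "\u212A"));            // false
```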

@domenic
Member Author

domenic commented Sep 13, 2016

It seems like there is not agreement on changing this to be case-insensitive. Someone should add web platform tests for the reflection, though, since this is so unusual compared to existing reflections. My understanding is that Chrome would fail such tests. I will open an issue on web platform tests and on Chrome.
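What such reflection tests would distinguish can be sketched without a DOM. This is a simulation under assumptions: `reflectAs` is a hypothetical function, `KNOWN` is an assumed subset of keywords, and the reflection is assumed to be limited to known values, returning the empty string for anything else.

```javascript
// Assumed subset of known keywords, for illustration only.
const KNOWN = ["font", "image", "script", "style"];

// Simulated getter for a reflecting IDL attribute limited to known values.
// contentValue is the content attribute's value, or null if it is absent.
function reflectAs(contentValue, caseSensitive) {
  if (contentValue === null) return "";
  const candidate = caseSensitive
    ? contentValue
    : contentValue.replace(/[A-Z]/g, (c) => c.toLowerCase());
  return KNOWN.includes(candidate) ? candidate : "";
}

// Case-sensitive matching: mixed-case markup reflects as the empty string.
console.log(JSON.stringify(reflectAs("Image", true)));  // ""
// ASCII case-insensitive matching: the canonical keyword is returned.
console.log(JSON.stringify(reflectAs("Image", false))); // "image"
```

A test asserting the first result would fail in an implementation that behaves like the second.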

@domenic
Member Author

domenic commented Sep 13, 2016

domenic closed this as completed Sep 13, 2016

@zcorpan
Member

zcorpan commented Feb 23, 2017

Case sensitive or not now came up for Feature Policy: w3c/webappsec-permissions-policy#54 (comment)

@foolip
Member

foolip commented Mar 3, 2017

Thanks @zcorpan, that would suggest that unless someone polices new specs very well, we'd keep getting some case-insensitive things even if we "decide" that we don't like it.

Out of curiosity, for content attribute values, what are the places where it's definitely case-sensitive if we're adding new APIs? Keywords and tokens are all that I can think of that's actually compared with other strings and not just passed along or parsed somehow.

@annevk
Member

annevk commented Mar 3, 2017

What do you mean by places? IDs, classes, URLs, all have various degrees of being case-sensitive (and sometimes not (and where not it's been a source of problems historically)).

@foolip
Member

foolip commented Mar 7, 2017

You got my meaning, and keywords and tokens are the "places" where I know it's usually case-insensitive. IDs and classes I suppose are a mess due to history, if doing them today I think we'd want them to be case-sensitive since the value space is open, to avoid having to pick a certain kind of case-insensitivity. URLs aren't compared, but we clearly shouldn't have any case folding in any case.

Namespaces are an odd case where we do compare against a set of known values, but do so case-sensitively, presumably because they look like URLs even though they're never resolved or otherwise used as URLs.

I don't think we could formulate a principle that explains the current mess, but my preference going forward would be case-insensitive for tokens and keywords like sandbox flags or track kinds, and case-sensitive for essentially everything else that I can think of. That is, if we're comparing attribute values against some (non-namespace) internal string, do so case-insensitively.

@annevk
Member

annevk commented Mar 7, 2017

So your preference would be to do it differently from how the surrounding programming environment and API would handle it?

Anyway, if everyone wants that inconsistency, fine, but don't count on me cleaning up the resulting mess.

@foolip
Member

foolip commented Mar 7, 2017

Consistency isn't on the menu here, but yes, that is the particular kind of inconsistency that I'd prefer, perhaps because the taste is already familiar.

FWIW, if I felt strongly the other way, I'd probably try to add use counters in the parsing of all existing attributes that could be turned into enums in the IDL if made case-sensitive. Then, if the usage was super low I'd argue that we should change them for better ergonomics even if it's already interoperable. Short of that, only inconsistency is on offer.

@annevk
Member

annevk commented Mar 7, 2017

Changing features that are already fully interoperable is rarely (if ever) worth it. Avoiding repeating past mistakes often is.

@domenic
Member Author

domenic commented Mar 7, 2017

Changing features that are already fully interoperable is rarely (if ever) worth it.

I strongly agree! Which is why I am upset that this change was made during the speccing process, when we already have interoperable implementations of as="" that are case-insensitive.

@annevk
Member

annevk commented Mar 7, 2017

as="" is far from fully interoperable. Let's not play games.

@domenic
Member Author

domenic commented Mar 7, 2017

??? It's interoperably implemented as case-insensitive in both UAs that implement it at all.

@annevk
Member

annevk commented Mar 7, 2017

Okay, what I mean by fully interoperable is implemented in the same way by all browsers. See also ancestorOrigins for why that distinction is important.

@domenic
Member Author

domenic commented Mar 7, 2017

Implemented in two UAs one way versus implemented in 0 UAs the way you specced it seems like a clear case of the editor overstepping.

@annevk
Member

annevk commented Mar 7, 2017

Let's not make it personal? I don't think I even wrote the text here.

@domenic
Member Author

domenic commented Mar 7, 2017

My apologies. My point was more that it goes against the process we are trying to embody. Such speculative changes without implementation interest should be left as PRs, not as part of the spec.

@foolip
Member

foolip commented Mar 7, 2017

So, hmm, given that these changes predate our recent adventures in editorial policy, let's treat it like any old decision that we'd like to revisit.

None of us have a formula for making trade-offs like this or the data to plug in to it, so we're probably not going to convince one another about what's actually best here.

If we just broadcast asking for implementer interest we will probably get none, so how about we try to find some relevant person for each engine to muster an opinion, which could be "don't care." From the list of usual suspects:

Chromium: @dominiccooney or @tkent-google?
EdgeHTML: anyone know who to ask?
Gecko: @Ms2ger or @smaug----?
WebKit: @cdumez or @rniwa?

Then, a week or two from now, our editor-in-chief @domenic can make the call if it's still not obvious. Yes?

@annevk
Member

annevk commented Mar 7, 2017

It would be good to hear what folks want for as="" and what they want going forward for new attributes.

@foolip
Member

foolip commented Mar 7, 2017

Good point, the answer may not be the same.

@annevk
Member

annevk commented Mar 8, 2017

Via email from @travisleithead:

So, I read through the issue and think I follow each side of the arguments given. For as="", I land on the side of local consistency/ergonomics. E.g., I think that authors have a particular style for when they work in HTML markup vs., say, CSS or JavaScript code. If an author is working in HTML and the surrounding content is cased loosely (upper case, mixed case, or whatever), they will naturally continue using the same pattern when they add as="" to the markup. This is just human nature, applying the familiar based on context (and why copy/paste is so popular). Given these human tendencies, I fall on the side of preferring case-insensitivity for the as attribute, as a way to avoid local inconsistency when authoring markup. The future-consistency and simplification arguments for case-sensitivity just don't sound practical to me.

@smaug----

I want a simpler platform in the future, when possible, and case-insensitive handling makes it more complicated. And there doesn't seem to be a strong reason not to have case-sensitivity here.
So, I agree with @annevk's #1665 (comment)

Given that 'as' is relatively new, using case-sensitive matching there sounds reasonable.

My rule of thumb for cases where there isn't a very strong reason for behavior X vs. Y is to think about what kind of behavior I'd like to see in the platform in general. And case-sensitivity certainly is such.

@domenic
Member Author

domenic commented Apr 10, 2017

The Blink bug has been wont-fixed, and WebKit continues to maintain case-insensitivity in their implementation. I'd like to close this as 3/4 browsers prefer case-insensitivity.

@domenic
Member Author

domenic commented Apr 10, 2017

Or, I guess, not close this issue, since we still need to fix the spec to be case-insensitive for as="" and match the existing implementations. But close the discussion, and not let it block further work such as #2515.

domenic added a commit that referenced this issue Apr 25, 2017
Closes #1665 by aligning with other enumerated attributes.
domenic added a commit that referenced this issue Apr 26, 2017
Closes #1665 by aligning with other enumerated attributes.
inikulin pushed a commit to HTMLParseErrorWG/html that referenced this issue May 9, 2017
Closes whatwg#1665 by aligning with other enumerated attributes.
alice pushed a commit to alice/html that referenced this issue Jan 8, 2019
Closes whatwg#1665 by aligning with other enumerated attributes.