Provide a way to do caseless comparison #99

domenic · 2016-08-14T15:31:43Z

Unicode defines several caseless comparisons: caseless, compatibility caseless, canonical caseless, and identifier caseless. It would be nice to be able to do these from JavaScript. @littledan says the best way is to expose the Unicode case-folding operations.
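For reference, the Unicode Standard (section 3.13) builds these matches from a case-folding operation, toCasefold, combined with normalization. JavaScript has String.prototype.normalize but no case-folding operation. The sketch below therefore assumes a hypothetical approxCaseFold helper (upper-casing, then lower-casing), which only approximates full case folding from CaseFolding.txt. Identifier caseless matching depends on the NFKC_Casefold mapping, which JavaScript does not expose, so it is left out. All function names here are illustrative.

```javascript
// Hypothetical helper, NOT a real JavaScript API: an approximation of full
// case folding. It handles many cases (e.g. "ß" -> "ss") but is not identical
// to Unicode's CaseFolding.txt.
function approxCaseFold(s) {
  return s.toUpperCase().toLowerCase();
}

// Default caseless matching: toCasefold(X) == toCasefold(Y)
function defaultCaselessMatch(x, y) {
  return approxCaseFold(x) === approxCaseFold(y);
}

// Canonical caseless matching:
// NFD(toCasefold(NFD(X))) == NFD(toCasefold(NFD(Y)))
function canonicalCaselessMatch(x, y) {
  const key = (s) => approxCaseFold(s.normalize("NFD")).normalize("NFD");
  return key(x) === key(y);
}

// Compatibility caseless matching:
// NFKD(toCasefold(NFKD(toCasefold(NFD(X))))) == the same for Y
function compatibilityCaselessMatch(x, y) {
  const key = (s) =>
    approxCaseFold(approxCaseFold(s.normalize("NFD")).normalize("NFKD")).normalize("NFKD");
  return key(x) === key(y);
}

console.log(defaultCaselessMatch("Straße", "STRASSE"));        // true
console.log(defaultCaselessMatch("\u00C9", "e\u0301"));        // false: precomposed vs. decomposed
console.log(canonicalCaselessMatch("\u00C9", "e\u0301"));      // true
console.log(compatibilityCaselessMatch("\u24B6", "a"));        // true: "Ⓐ" vs. "a"
```

Each stricter match adds normalization around the same folding step, which is why a single case-folding primitive would be enough to build all of them.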

littledan · 2016-08-15T13:35:03Z

Specifically, what if we had toCaseFold/toLocaleCaseFold methods on String.prototype? This would allow users to build all of the caseless comparisons that @domenic mentions. These are really basic functions for many Unicode algorithms, so they seem like a good building block to have. cc @jungshik

littledan · 2016-08-26T05:39:51Z

I wrote up a quick explainer for toCaseFold. Any thoughts? Would this group be interested in seeing case folding pushed forward, or is it an edge case that's not so relevant?

cc @sebmarkbage @jungshik @ericf @jswalden @caridy @zbraniecki

littledan · 2016-08-26T22:29:45Z

Apologies, this is unnecessary; you can already set the sensitivity lower for comparison, e.g.,

"foo".localeCompare("FOO", "en", {sensitivity: "accent"})  // 0

This bug can be closed.
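When many comparisons are needed, the same options can be bound once in an Intl.Collator instead of being passed to localeCompare on every call. This is a minimal sketch; the helper name equalsIgnoringCase is illustrative, and it assumes a runtime with ICU collation data for "en" (as in current browsers and Node.js).

```javascript
// sensitivity "accent": case differences are ignored, accent differences are kept.
const accentCollator = new Intl.Collator("en", { sensitivity: "accent" });

// Hypothetical helper: equal when the strings differ only in case.
function equalsIgnoringCase(a, b) {
  return accentCollator.compare(a, b) === 0;
}

console.log(equalsIgnoringCase("foo", "FOO"));       // true
console.log(equalsIgnoringCase("résumé", "RÉSUMÉ")); // true
console.log(equalsIgnoringCase("résumé", "resume")); // false: the accent still counts
```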

domenic · 2016-08-26T22:48:55Z

Fascinating. Do you know how those sensitivity options map to Unicode's comparison algorithms? The spec doesn't seem super clear, but maybe I am not reading it fully...

There's still value in toCaseFold for map keys and such, though. But I guess it probably loses priority.
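The map-key point is that a collator only compares pairs of strings; it produces nothing that a Map can hash. A folding function does. The sketch below is a hypothetical CaselessMap that stores keys in folded form; since JavaScript has no case-folding API, it assumes an approximation (NFC, upper-case, lower-case) that is not identical to Unicode case folding.

```javascript
// Hypothetical key function: an approximation of case folding plus NFC
// normalization. Not a real toCaseFold; results differ from CaseFolding.txt
// for some characters.
function caselessKey(s) {
  return s.normalize("NFC").toUpperCase().toLowerCase().normalize("NFC");
}

// Hypothetical Map wrapper whose lookups ignore case (approximately).
class CaselessMap {
  constructor() {
    this.inner = new Map();
  }
  set(key, value) {
    this.inner.set(caselessKey(key), value);
    return this;
  }
  get(key) {
    return this.inner.get(caselessKey(key));
  }
  has(key) {
    return this.inner.has(caselessKey(key));
  }
}

const streets = new CaselessMap();
streets.set("Straße", 1);
console.log(streets.get("STRASSE")); // 1
console.log(streets.has("strasse")); // true
```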

littledan · 2016-08-26T23:06:32Z

It's based on an earlier part of the collation key, described in UTS 10. I think the next thing to do from here if we want to improve performance for a case like that would be to expose an API to get the collation key of a string, probably as a Uint8Array, though that would be rather inconvenient as a Map key.
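The Map-key inconvenience comes from typed arrays being objects: a Map compares object keys by identity, not by contents. JavaScript has no API that returns a collation key today, so the byte values below are placeholders rather than real collation keys, and bytesToKey is a hypothetical helper showing one workaround (encoding the bytes as a hex string).

```javascript
// Two Uint8Arrays with the same contents are still different Map keys.
const byBytes = new Map();
byBytes.set(new Uint8Array([0x29, 0x2b, 0x01]), "foo");
console.log(byBytes.has(new Uint8Array([0x29, 0x2b, 0x01]))); // false

// Workaround: turn the bytes into a string first (hex here).
function bytesToKey(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

const byHex = new Map();
byHex.set(bytesToKey(new Uint8Array([0x29, 0x2b, 0x01])), "foo");
console.log(byHex.has(bytesToKey(new Uint8Array([0x29, 0x2b, 0x01])))); // true
```

The extra conversion on every insert and lookup is the kind of overhead that would make a byte-array sort key awkward to use as a Map key.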

jungshik · 2016-08-27T07:17:41Z

@domenic See http://unicode.org/reports/tr10/#Multi_Level_Comparison
and http://userguide.icu-project.org/collation/concepts

'base', 'accent' and 'variant' correspond to level 1 (primary strength), level 2 (secondary strength) and level 3 (tertiary strength) in UTS 10.
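A small probe of that mapping, assuming a runtime with ICU collation data for "en". It compares an accent difference ("a" vs. "á") and a case difference ("a" vs. "A") at each of the three sensitivities; the names levels and equalAt are illustrative.

```javascript
// Sensitivity values and the UTS 10 strength levels they correspond to.
const levels = { base: 1, accent: 2, variant: 3 };

function equalAt(sensitivity, a, b) {
  return new Intl.Collator("en", { sensitivity }).compare(a, b) === 0;
}

for (const [sensitivity, level] of Object.entries(levels)) {
  console.log(
    `${sensitivity} (level ${level}):`,
    "a=á", equalAt(sensitivity, "a", "á"),
    "a=A", equalAt(sensitivity, "a", "A"),
  );
}
// base (level 1): a=á true a=A true
// accent (level 2): a=á false a=A true
// variant (level 3): a=á false a=A false
```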

My memory is fuzzy as to what exactly 'case' is for (there was a lot of back-and-forth on this issue while the version 1.0 spec was being written).

http://www.ecma-international.org/ecma-402/3.0/#collator-objects has the following:

Collator, however, requires that the usage is specified through the usage property of the options object, alternate handling through the ignorePunctuation property of the options object, and case level and the strength through the sensitivity property of the options object.

I believe that setting the sensitivity to 'case' turns on 'case level' in UTS 10. OK, I read the V8 implementation (which was written while the 1.0 spec was being drafted). Setting the sensitivity to 'case' uses level 1 in UTS 10 (primary strength) AND turns on 'case level'. That is level 1.5, in a sense: primary and case differences are taken into account, but accents are ignored.
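The "level 1.5" behaviour can be checked directly, assuming a runtime with ICU collation data for "en": with sensitivity 'case', an accent difference compares equal while a case difference does not. The variable name caseCollator is illustrative.

```javascript
const caseCollator = new Intl.Collator("en", { sensitivity: "case" });

// Accent differences are ignored (primary strength)...
console.log(caseCollator.compare("a", "á") === 0); // true
// ...but case differences still count, because case level is on.
console.log(caseCollator.compare("a", "A") === 0); // false
console.log(caseCollator.resolvedOptions().sensitivity); // case
```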

As a result, 'case level' cannot be set independently of 'collation level'. That's mostly OK, except that 'level 2.5' (a level between level 2 and level 3) cannot be expressed. At level 2.5, level 3 differences other than case (regular kana vs. small kana, "A" vs. "Ⓐ") would be ignored.
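A sketch of a probe for that missing level, assuming a runtime with ICU collation data for "en". The hypothetical behavesLikeLevel2_5 checks the three properties described above: accents count, case counts, and the "A" vs. "Ⓐ" tertiary difference is ignored. If the analysis above holds, no sensitivity value satisfies all three, so every line prints false.

```javascript
// "Level 2.5": accent and case differences matter, other tertiary
// differences (such as "A" vs. "Ⓐ") do not.
function behavesLikeLevel2_5(sensitivity) {
  // The compare getter returns a bound function, so it can be detached.
  const cmp = new Intl.Collator("en", { sensitivity }).compare;
  return (
    cmp("a", "á") !== 0 && // accent difference counts
    cmp("a", "A") !== 0 && // case difference counts
    cmp("A", "\u24B6") === 0 // "A" vs. "Ⓐ" ignored
  );
}

for (const s of ["base", "accent", "case", "variant"]) {
  console.log(s, behavesLikeLevel2_5(s));
}
```

'base', 'accent' and 'case' each fail one of the first two checks; 'variant' fails the last one, because it also distinguishes "A" from "Ⓐ".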

Maybe the spec needs a (non-normative) note explaining what setting 'sensitivity' means.

littledan · 2016-08-28T16:09:14Z

It may be useful to note that this is a different meaning of case-insensitive comparison than case folding. For example, case folding leaves punctuation in place, whereas, IIRC, this notion of strength would not. I think the collation definition is probably more useful semantically, but some standards/algorithms (e.g. HTML) make reference to case folding.
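Accents show the same kind of divergence: case folding keeps them, while a 'base' collator ignores them. This sketch assumes a hypothetical approxCaseFold (upper-case then lower-case, only an approximation of Unicode case folding) and ICU collation data for "en".

```javascript
// Hypothetical approximation of full case folding (not a real JS API).
const approxCaseFold = (s) => s.toUpperCase().toLowerCase();
const baseCollator = new Intl.Collator("en", { sensitivity: "base" });

// Both approaches agree on a pure case difference, but only the collator
// treats "Résumé" and "resume" as equal.
for (const [a, b] of [["FOO", "foo"], ["Résumé", "resume"]]) {
  console.log(a, b, {
    caseFold: approxCaseFold(a) === approxCaseFold(b),
    collation: baseCollator.compare(a, b) === 0,
  });
}
```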

domenic · 2016-08-29T19:06:23Z

Wow, thanks @jungshik for the detailed answer! This stuff is complicated...

> Maybe the spec needs a (non-normative) note explaining what setting 'sensitivity' means.

IMO the spec already has that in "The sensitivity of collator is interpreted as follows:". Although it looks normative, I guess it is non-normative, since the actual behavior is delegated to UTS 10 (I assume?).

littledan · 2016-09-07T17:43:36Z

@eaenet You and I discussed this in person, and I believe our tentative conclusion was that it could still be valuable to provide case folding, because of the differences from collation that I mentioned in my previous comment. If we don't provide case folding, end users may implement it incorrectly themselves. Any thoughts from anyone on that proposition? Note that case folding is used internally by ECMAScript in case-insensitive RegExps.
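The RegExp case folding is observable from script. With the u flag, case-insensitive matching uses Unicode simple case folding, so the Kelvin sign (U+212A) matches "k"; without u it does not. Simple folding never expands a character, so "ß" does not match "SS" even though full case folding maps it to "ss". The helper regExpCaselessEquals below is hypothetical and only for illustration.

```javascript
// Hypothetical helper: whole-string comparison using a case-insensitive RegExp.
function regExpCaselessEquals(a, b, flags) {
  // Escape RegExp syntax characters so `a` is matched literally.
  const escaped = a.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`^${escaped}$`, flags).test(b);
}

console.log(regExpCaselessEquals("\u212A", "k", "iu")); // true: simple case folding
console.log(regExpCaselessEquals("\u212A", "k", "i"));  // false: legacy toUpperCase rules
console.log(regExpCaselessEquals("ß", "SS", "iu"));     // false: no one-to-many folding
```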

eaenet · 2016-09-08T13:09:24Z

@littledan agreed. At some point we should have a discussion about what APIs make sense as part of ECMAScript and which make more sense as Web APIs.

jungshik · 2016-09-08T21:05:27Z

> Maybe the spec needs a (non-normative) note explaining what setting 'sensitivity' means.

> IMO the spec already has that in "The sensitivity of collator is interpreted as follows:". Although it looks normative, I guess it is non-normative, since the actual behavior is delegated to UTS 10 (I assume?).

Sorry, I overlooked that. Yes, the spec does explain it. I think it's still normative: a lot of things in ECMA-402 refer to the Unicode Standard, and UTS 10 is a Unicode Technical Standard.

jungshik · 2016-09-08T21:06:14Z

@littledan, @eaenet: Are you aware of the ignorePunctuation (boolean) option in Intl.Collator?

littledan · 2017-08-10T17:08:47Z

@jungshik Doesn't ignorePunctuation do something different from case folding? Not sure if it's needed, but some things definitely use case folding (e.g., some case-insensitive file systems and databases).
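The two options are orthogonal: ignorePunctuation only changes how punctuation is weighted, so ignoring case as well still requires a lower sensitivity, while folding never touches punctuation at all. A sketch assuming ICU collation data for "en" and a hypothetical approxCaseFold (an approximation of case folding, not a real API).

```javascript
// Hypothetical approximation of full case folding (not a real JS API).
const approxCaseFold = (s) => s.toUpperCase().toLowerCase();

// ignorePunctuation alone: punctuation ignored, case still significant.
const punctOnly = new Intl.Collator("en", { ignorePunctuation: true });
// Combined with sensitivity "accent": punctuation and case both ignored.
const punctAndCase = new Intl.Collator("en", {
  ignorePunctuation: true,
  sensitivity: "accent",
});

console.log(punctOnly.compare("co-op", "coop") === 0);           // true
console.log(punctOnly.compare("co-op", "CO-OP") === 0);          // false: case still counts
console.log(punctAndCase.compare("co-op", "COOP") === 0);        // true
console.log(approxCaseFold("co-op") === approxCaseFold("COOP")); // false: folding keeps "-"
```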

domenic added the enhancement label Aug 14, 2016

littledan mentioned this issue Nov 10, 2018 in "add String.toTitleCase, String.toLocaleTitleCase" #294 (open)

sffc added the "s: help wanted" and "c: text" labels and removed the "enhancement" label Mar 19, 2019

sffc added the "Proposal" and "s: comment" labels and removed the "s: help wanted" label Jun 5, 2020

sffc mentioned this issue Oct 14, 2020 in "Add locale sensitive substring matching functions to Intl.Collator" #506 (open)
