Separate to_lowercase() into correct Unicode and simple implementations #26244
Comments
+1
cc #39659
As to locale-aware case mapping, I think this is starting to be outside of standard-library territory and should be on crates.io instead. Such a library might want not just language-dependent entries of
I don't believe we need three different variations of to_lowercase in std. There are four tiers that have been raised in this issue:
I am closing this issue because I would like to see this explored in a crate instead. Once there is an implementation and more clarity around what algorithm we mean by codepointwise lowercase, if people still believe this needs to be in std, I would be open to reconsidering.
I think there are two distinct use cases for string lowercasing:
Currently the locale-unaware `to_lowercase` tries to do both, but doesn't do either one quite right. It isn't quite correct for the first case (it handles Greek #26035, but doesn't handle Turkish), and it's quirky which makes it difficult to be used safely in the second case.
Therefore I suggest splitting this function into two, e.g., `to_locale_lowercase(locale)` and `to_partial_lowercase()`: one that fully implements Unicode (requires locale specified and is good for displaying strings to people), and another which is incorrect in many cases, shouldn't be displayed to users, but preserves simple invariants of ASCII lowercasing that make it useful and safe for algorithms that need code-point-wise lowercasing.
The partial implementation should meet invariants for every valid string `a` and `b`: