-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Restore eszett ß - which is not accented. #44
Conversation
This reverts commit 4b147a0.
I'm not sure how universally true this is, but... *if performing unicode decomposition of a single character yields two code points, and one of those belongs to a combining diacritical block, this is a clear candidate for diacritical removal. In fact this method can be quite effective. > 'Š'.normalize('NFKD').split('')
< ["S", "̌"] (2)
> 'Š'.normalize('NFKD').split('').map(c => c.charCodeAt(0).toString(16))
< ["53", "30c"] (2) On the other hand, if a character cannot be decomposed into two code points then I'd argue it's not 'accented', although again, I'm not sure how universally true this is. > 'ß'.normalize('NFKD').split('')
< ["ß"] (1) So yeah, 👍 Eszett is not a 'accented' IMO |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR 👍
Seems like tests are currently failing - see my comment below.
|
||
// See https://github.com/tyxla/remove-accents/issues/12 | ||
tape('ß is not accented', function(t) { | ||
t.same(removeAccents.remove('Straße'), 'Straße'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tape needs t.end()
to end the suite - see prior existing tests.
I'd very much like to refactor from tape
to jest
or another modern runner, but that is work for another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh geeze sorry. Apparently I didn't run the tests. I've fixed this (and run the tests this time).
I forgot to call `t.end()` 🤦 The now outdated "ß" -> "ss" was added to the "remove accents from string" test case as part of 1fe0b90 It seems like maybe the misunderstanding was that the string contained every to-be-sanitized character, but that's not true. Since ß now has it's own unit test, I've removed it from the "remove accents from string" test.
Thanks again, @michaelkirk 🙌
I guess folks can manually |
As requested at #12 (comment), ß should not be replaced.
Originally discussed in #12, this behavior regressed at #31
Since someone requested #31 in the first place, it's likely that this change will surprise at least one person (/cc @gollenia), and probably others.
Is there a recommended alternative for the behavior that people like @gollenia want? Maybe something like https://github.com/anyascii/anyascii ? (disclaimer: I've never used it)