-
Yes, that's correct. The problem is that most emojis are actually multi-byte Unicode characters. This is also the case with many non-English alphabets. My proposed solution is that, instead of treating the alphabet as one long string, we ask the user to input the alphabet as a list of strings, i.e.
It's easy to transform a plain language alphabet string to an Array by invoking
The only caveat is that in such a case one of two statements about the alphabet needs to be true:
In practice, this is not a problem for regular alphabets as used with Hashids, or even with alphabets containing any multibyte Unicode characters, which guarantee uniqueness. If we don't care about multi-character "letters", the simplest way to solve this is to just disallow them. Alternatively, if we want to support multi-character letters, e.g. for token usage, we could add a simple

```js
const sqids = new Sqids({ alphabet: { tokens: ANIMAL_NAMES, separator: '-' } })
const result = sqids.encode([1, 2, 3])
result === 'snake-octopus-zebra-elephant-rabbit'
```

or

```js
const sqids = new Sqids({ alphabet: EMOJIS })
const result = sqids.encode([1, 2, 3])
result === `🔥🐈⬛✈️😮🧠`
```
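To make the array-alphabet idea a bit more concrete, here is a minimal sketch of how a list of "letters" could drive a plain base-N conversion. This is not the Sqids algorithm (no shuffling, no blocklist), and `toAlphabetArray`, `encodeNumber` and `decodeNumber` are hypothetical helper names, not part of any released API. It assumes that multi-character tokens are always joined with a separator that does not occur inside any token, and that separator-less alphabets consist of single code points only.

```javascript
// Hypothetical helpers for illustration only; none of these exist in Sqids.

// Accept a plain string (split by code point) or an array of tokens,
// and reject duplicates so every index maps to exactly one letter.
function toAlphabetArray(alphabet) {
  const letters = typeof alphabet === 'string' ? Array.from(alphabet) : [...alphabet];
  if (new Set(letters).size !== letters.length) {
    throw new Error('alphabet entries must be unique');
  }
  return letters;
}

// Plain base-N conversion of a non-negative integer, most significant letter first.
function encodeNumber(num, letters, separator = '') {
  const out = [];
  let n = num;
  do {
    out.unshift(letters[n % letters.length]);
    n = Math.floor(n / letters.length);
  } while (n > 0);
  return out.join(separator);
}

// Inverse of encodeNumber. Without a separator the ID is split by code point,
// which only works when every letter is a single code point.
function decodeNumber(id, letters, separator = '') {
  const parts = separator ? id.split(separator) : Array.from(id);
  return parts.reduce((acc, part) => {
    const index = letters.indexOf(part);
    if (index === -1) throw new Error(`unknown token: ${part}`);
    return acc * letters.length + index;
  }, 0);
}

// Word tokens joined by a separator.
const animals = toAlphabetArray(['snake', 'octopus', 'zebra', 'elephant', 'rabbit']);
const animalId = encodeNumber(7, animals, '-'); // 7 is "12" in base 5 -> 'octopus-zebra'

// Single-code-point emojis, no separator needed.
const emojis = toAlphabetArray('🔥😮🧠🐈');
const emojiId = encodeNumber(6, emojis); // 6 is "12" in base 4 -> '😮🧠'
```

Emojis made of several code points (flags, ZWJ sequences, variation selectors) would not survive the `Array.from` split, so in this sketch they would have to be passed as an explicit array and used with a separator.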
-
@niieani, I see, thank you for clarifying. At first glance, I don't think it would be difficult to accept an array of chars/strings in the constructor instead of a single alphabet string (other than typing/formatting for end users who supply a custom alphabet). My concerns are about the possibilities and complexities that this opens up. But first, I did a quick search to see if anyone had similar issues; so far I've found these:
[Maybe there's more?]

So far I see that encoding/decoding emojis would be possible, but I'm wondering what the practical use-case for this is. Right now, v2 advertises support for avoiding most common profanity words (because it'd be unfortunate if they showed up randomly in the URL). I'm a bit worried that if we claim to officially support Unicode characters, people might assume we have baked in a solution to treat characters like

Regarding word tokens: if we support IDs like

Generally, I do see it as a plus when random codebases support Unicode, but for this library I'm a bit concerned about the complexities this opens up and the use-cases it encourages. Therefore, let me ask this: what's the one practical use-case where supporting Unicode would make this library better (and justify the complexity)?
-
I would personally NOT make UTF-8 a direct requirement of the library, since it's quirky and cumbersome to implement in "classic" programming languages, and most people will never actually use it. It would additionally add more ambiguity (i.e.

Using custom tokens (multi-character strings) and separators instead of just characters is one other thing I'd happily avoid. Again, no library in the world can (or SHOULD) accommodate all possible needs. Think of it the UNIX way: do one thing and do it well.

As for general UTF-8 support, I actually plan to have it behind a few preprocessor directives, which would allow enabling/disabling it at compile time (I ported a micro-UTF8 library a few years ago while working on a Lua project).
-
@niieani thoughts on any of the above? My impression is that Unicode requests are mainly for emoji shuffling. I'm trying to figure out if there's an important use-case I'm missing...
-
Closing for now. Let's revisit when we get more feedback.
-
Carrying over discussion of unicode from https://github.com/orgs/hashids/discussions/92#discussioncomment-6223192
@niieani, I want to be sure I understand the problem correctly first. If the alphabet was 1 string of many emojis of different lengths, was the issue that the length of the string did not match the number of emojis (which I assume broke encoding/decoding)?
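For reference, the kind of mismatch being asked about can be reproduced with a short sketch. It assumes a JavaScript runtime that ships `Intl.Segmenter` (recent Node.js and browsers); the sample string is made up for illustration and is written with escapes so the invisible joiner characters are explicit.

```javascript
// Three emojis as a reader perceives them:
// fire (1 code point), a flag (2 code points), a family (3 code points + 2 zero-width joiners).
const sample = '\u{1F525}\u{1F1EF}\u{1F1F5}\u{1F468}\u200D\u{1F469}\u200D\u{1F467}';

// String.prototype.length counts UTF-16 code units.
const codeUnits = sample.length; // 14

// Spreading a string iterates by Unicode code point.
const codePoints = [...sample].length; // 8

// Intl.Segmenter groups code points into user-perceived characters (grapheme clusters).
const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });
const graphemes = [...segmenter.segment(sample)].map((s) => s.segment);
const graphemeCount = graphemes.length; // 3
```

Any code that indexes into the alphabet string by position, or uses `.length` as the base, would therefore see 14 "letters" here instead of 3.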