-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
U+1F6xx block: emoticons and dingbats #18
base: master
Are you sure you want to change the base?
Conversation
Yes, emoticons are problematic. I would try to be consistent with the textual description provided by Unicode. e.g. have all emoticons where description says "open mouth" use "D" etc. In your patch for example U+1F603 "Smiling face with open mouth" is Regarding other emojis, I agree colon codes seem to be the best solution. Problems I see are:
But honestly I don't see any other solution. They seem to be the de-facto way people put emojis into plain ASCII. |
Sorry it's taken so long for me to follow up on your comments, but I appreciate them a lot. Making some standards on the emoticons, matching representations based on the descriptions seems to be a good idea. We can base the more graphic-oriented emojis similarly, making up colon syntaxes based on Unicode name rather than any specific application. My only problem is that the name of the character can be rather verbose and long. As an example, 🖖 is represented in Keybase with |
I haven't seen actual Unicode names used in this way. I agree that using them for colon codes wouldn't be the best. The de-facto standard seems to be "short codes", like listed here: https://www.webpagefx.com/tools/emoji-cheat-sheet I don't think these are condoned by Unicode and I don't know where they originally came from. Some software library or Wordpress perhaps? I seem to remember seeing a page that listed differences in these codes between different services, but I can't find it at the moment. |
Unidecode already translates based on romanisation and pronunciation of some foreign characters, and I believe the intention of this library is to convey representation - after all, our output is ambiguous at best, as the actual truth lies within the original unicode literal. For this reason, I think the best effort approach is to convey the clearest meaning, and I would suggest the cheat-sheet that @avian2 linked, as the shortcodes described there are a de-facto implementation used by most chat clients. There is nothing to say we can't change this in future, right? |
May I suggest not using smiley faces made of punctuation? IMO shortcodes like
|
On the other hand, The problem of a smiley possibly merging with an adjacent word can be solved by surrounding them with leading and/or trailing space. Unidecode already does that for some symbols. |
Ah, I assumed it was targeting English the whole time, since it mentions the US keyboard layout in the readme, and also this: https://github.com/avian2/unidecode/blob/master/unidecode/x033.py But fair enough I guess. |
I wouldn't like to target English any more than the fact that Unidecode transliterates into a character set with an US origin. US keyboard layout is mentioned because that is the most common layout used to enter ASCII text. I used it as an illustration of what problem Unidecode tries to solve - imaging a person trying to enter non-English words into a computer that only accepts ASCII through an American keyboard. I am not familiar with the U+33xx Unicode page you mention. Codepoint descriptions suggest these represent those specific English words (I'm guessing for use in Japanese text?) To be honest, I see no perfect solution for emojis, and Unidecode is about compromise. I think English short-codes as discussed above would be a good start. |
The emoticons might end up being especially contentious, with styles and opinions both varying wildly, and I've only tried to replicate ones that seem to be pretty simple.
For the emoji that aren't really representable in ASCII, I'm not sure what should be done. Maybe just the colon-sandwiched codes as seen in some messengers and GitHub? Eg,
:cheese_wedge:
or:wolf
: