Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Noto Color Emoji Migration #554

Merged
merged 13 commits into from
Mar 18, 2023
Merged

Noto Color Emoji Migration #554

merged 13 commits into from
Mar 18, 2023

Conversation

sunkhaskasis
Copy link
Contributor

@sunkhaskasis sunkhaskasis commented Feb 22, 2023

This is my attempt at migrating Twitch Downloader from Twemoji to Google Noto Color Emoji. I tried my best to do it, but of course there are certain things I couldn't fix.

Here are some tests with emoji I did, shamelessly stolen from https://getemoji.com/.

Normal Emoji

smileys and gestures
food and sport
travel and objects
flags and symbols

Skin Tones

image

And now, for the notes and bugs:

  • Right now at least, Noto is bulkier than Twemoji by ~8 MB. I've trimmed as much as I could, leaving only 72x72 images. If you could strip it down a bit more, that'd be great.
  • Emoji 14 emoji don't render, for some reason. Maybe it has something to do with the regex string, but the archive does contain images needed for them.
  • On their own, hair components and skin modifiers (🏻🏼🏽🏾🏿🦰🦱🦲🦳) don't render. The images for them do exist, however.
  • Numbers (0️⃣ 1️⃣ 2️⃣ 3️⃣ 4️⃣ 5️⃣ 6️⃣ 7️⃣ 8️⃣ 9️⃣ 🔟) render as partially broken.
  • Emoji that are supposed to include the FE0F character sometimes render with an extra empty character near them (this is just my hypothesis, I don't actually know what's causing it).

Here is the JSON file I've used to test the emoji. It's rather messy since I just took the order from getemoji, but it should contain everything, including Emoji 14 emoji + hair components and skin modifiers on their own.
https://www.dropbox.com/s/w9tugy84ydlbpoc/emojitest.json?dl=1

I'll also include an empty comment JSON, just so that you could test individual emoji on your own without having to render everything.
https://www.dropbox.com/s/v820dshjohb2zyc/emojitest_blank.json?dl=1

@ScrubN
Copy link
Collaborator

ScrubN commented Feb 22, 2023

Emoji that are supposed to include the FE0F character sometimes render with an extra empty character near them (this is just my hypothesis, I don't actually know what's causing it).

That was indeed the issue for some, though there are more than one variation character that also exhibit the behavior.

Emoji 14 emoji don't render, for some reason. Maybe it has something to do with the regex string, but the archive does contain images needed for them.

This is also the case for twemoji. Once again the issue is probably an artifact of Span<T>.StartsWith() not being the same as String.StartsWith() for emojis. I'll try reverting back to strings and see if anything changes, it's probably worth the increased memory usage to users.

Numbers (0️⃣ 1️⃣ 2️⃣ 3️⃣ 4️⃣ 5️⃣ 6️⃣ 7️⃣ 8️⃣ 9️⃣ 🔟) render as partially broken.

They render just fine with Twemoji, probably means that the filename isn't the same in Noto Color?

Copy link
Collaborator

@ScrubN ScrubN left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The README needs to be updated to say Noto Color Emojis instead of Twemoji

@ScrubN ScrubN linked an issue Feb 22, 2023 that may be closed by this pull request
2 tasks
@sunkhaskasis
Copy link
Contributor Author

sunkhaskasis commented Feb 22, 2023

They render just fine with Twemoji, probably means that the filename isn't the same in Noto Color?

Yeah, you're right. Here's how they're named in Twemoji:
image

And here's Noto (and it appears there's actually two different sets):
image
Don't mind the "emoji_u" prefix, I just took it from the unchanged repo. The one I've embedded loosed it.

@ScrubN
Copy link
Collaborator

ScrubN commented Feb 23, 2023

I think you're gonna love this.

1.51.1:
image
image

Using a proper fix:
image
image

Yes there are a few issues here and there left to tackle but its MUCH better.

@ScrubN
Copy link
Collaborator

ScrubN commented Feb 23, 2023

Emoji 14 emoji don't render, for some reason. Maybe it has something to do with the regex string, but the archive does contain images needed for them.

The issue seems to be that NeoSmart.Unicode does not contain a definition for emojis above Unicode 13.0

@sunkhaskasis
Copy link
Contributor Author

Emoji 14 emoji don't render, for some reason. Maybe it has something to do with the regex string, but the archive does contain images needed for them.

The issue seems to be that NeoSmart.Unicode does not contain a definition for emojis above Unicode 13.0

I made a draft PR in their repo today where I attempt to update it to use emoji from Unicode 14.0. Hopefully, the maintainers will be able to check it out.

neosmart/unicode.net#22

@ScrubN
Copy link
Collaborator

ScrubN commented Feb 24, 2023

I managed to fix the broken skin tones on some specific emojis
image
image

However some flags have stopped working correctly and I'm not even remotely sure why. The English, Scottish, and Wales flags have very different sequence encodings and yet emojiString.StartsWith(emoji.Sequence.AsString) seems to mix them up. Clearly at some point they did work but I have no clue what I did differently that worked, and also basically every other compound emoji is broken in that render.
image
image

I think I'll settle for this right now as its the least broken :p

@sunkhaskasis
Copy link
Contributor Author

sunkhaskasis commented Feb 24, 2023

I managed to fix the broken skin tones on some specific emojis image image

However some flags have stopped working correctly and I'm not even remotely sure why. The English, Scottish, and Wales flags have very different sequence encodings and yet emojiString.StartsWith(emoji.Sequence.AsString) seems to mix them up. Clearly at some point they did work but I have no clue what I did differently that worked, and also basically every other compound emoji is broken in that render. image image

I think I'll settle for this right now as its the least broken :p

Have you tested the skin and hair modifiers (🏻🏼🏽🏾🏿🦰🦱🦲🦳) on their own? Last time they didn't work for me.

@ScrubN
Copy link
Collaborator

ScrubN commented Feb 26, 2023

Have you tested the skin and hair modifiers (🏻🏼🏽🏾🏿🦰🦱🦲🦳) on their own? Last time they didn't work for me.

Yes but seems to use system emojis.
image

@sunkhaskasis
Copy link
Contributor Author

sunkhaskasis commented Feb 27, 2023 via email

@Ashdemai
Copy link

Ashdemai commented Mar 5, 2023

Is this related to my issue #578 (comment) and why I can't render chat?

@sunkhaskasis
Copy link
Contributor Author

Is this related to my issue #578 (comment) and why I can't render chat?

No, this PR hasn't even been merged yet. Your issue is something different.

@ScrubN
Copy link
Collaborator

ScrubN commented Mar 7, 2023

I just made some improvements to emoji fetching so you'll have to sort that out, sorry.

I was also thinking maybe we could make the emoji type customizable during the migration. That would mean making a separate function for Noto and then can just do an inline if inside the renderer fetch function

@sunkhaskasis
Copy link
Contributor Author

I just made some improvements to emoji fetching so you'll have to sort that out, sorry.

Did it fix those flags?

I was also thinking maybe we could make the emoji type customizable during the migration. That would mean making a separate function for Noto and then can just do an inline if inside the renderer fetch function

I mean, it's not like this PR is gonna go anywhere until the newer NuGet package of Unicode.net is released, which might take some time.

@ScrubN
Copy link
Collaborator

ScrubN commented Mar 7, 2023

I just made some improvements to emoji fetching so you'll have to sort that out, sorry.

Did it fix those flags?

No, just switching from reading everything into memory to streaming directly to the disk, streaming directly to SKBitmap.Decode. I might have an idea to fix those flags but there's no telling if it'll actually work.

@ScrubN
Copy link
Collaborator

ScrubN commented Mar 7, 2023

83680e4 ce19f7d I managed to fix the emoji flags by using ordinal comparison specifically on flags, it breaks some other emojis, and I also had the brilliant idea to use a parallelized LINQ query which brought emojitest.json from 1.95x to ~4x (1st run)/4.5x (3rd run)

@sunkhaskasis
Copy link
Contributor Author

sunkhaskasis commented Mar 11, 2023

I just made some improvements to emoji fetching so you'll have to sort that out, sorry.

OK, I think I was able to resolve the conflicts. LMK if something is wrong, it's been ages since I've done it. Also, I'm not opposed to having both Twemoji and Noto, if that's what you meant by customizable emoji type, but my personal preference would be to keep Noto as the default vendor, as it's being maintained unlike Twemoji.

@ScrubN
Copy link
Collaborator

ScrubN commented Mar 12, 2023

I think you accidentally undid everything you changed in TwitchHelper.cs. I can help revert that if you'd like

@sunkhaskasis
Copy link
Contributor Author

Yes, please do. I haven't done any major Git things in years now.

@sunkhaskasis
Copy link
Contributor Author

sunkhaskasis commented Mar 12, 2023

Thanks. I believe it can be merged now. We'll take care of Unicode.net 14 later, as it might take a few more months.

@sunkhaskasis sunkhaskasis marked this pull request as ready for review March 12, 2023 00:32
@ScrubN
Copy link
Collaborator

ScrubN commented Mar 12, 2023

Pushing that to the PR from my local editor instead of vscode.dev was a bit of a pain. The changes I made:

  • Adds the ability to switch vendors
  • Fixed a bug where emojis would not render on the first run with a fresh cache, which definitely confused me while adding the switching
  • Reduced the noto-emoji zip by about 300KB

The noto-emoji zip will result in a near doubling in the size for the CLI which made me think maybe we should stop embedding the emojis inside the binary and instead fetch it from this git repository when it is required. This also means no emojis when rendering offline with a fresh cache, but I personally think it's worth the smaller initial download size. Does anyone else agree or disagree?

@ScrubN
Copy link
Collaborator

ScrubN commented Mar 15, 2023

For the hell of it, I tried compressing the emojis zips as .tar.gz and got much better results than zip. The only problem is native TAR support is net7+, but upgrading to net7 severely impacted render performance seemingly unpredictably. I guess this can just be planned for the future?

@ScrubN
Copy link
Collaborator

ScrubN commented Mar 18, 2023

I'll merge this for now I guess since nobody commented about fetching emojis at runtime. I'll look into .tar.gz archives when we upgrade to NET7, and hopefully we can discuss runtime emoji fetching then too.

@ScrubN ScrubN merged commit 05b4c33 into lay295:master Mar 18, 2023
@sunkhaskasis sunkhaskasis deleted the noto-emoji branch March 18, 2023 20:04
@ScrubN
Copy link
Collaborator

ScrubN commented Mar 18, 2023

+23MB for google noto emojis, that hurts. I really want to implement downloading them at runtime instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Migrate to other emoji vendor
3 participants