Commit

Bump CLDR version to 2019-10-06
Mainly keyword additions for some languages.
Mange committed Oct 6, 2019
1 parent c7ecc89 commit 6eab084
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion cldr
Submodule cldr updated 833 files

6 comments on commit 6eab084

@rainypixels

It turns out that this bump also updates to Unicode 13.0 (CLDR Release 36), which is still not supported by major platforms. Is there any way to specify CLDR release 35.1 instead? I’ve been playing around with this but have had no luck... 😞

@Mange
Owner Author

@Mange Mange commented on 6eab084 Apr 14, 2020

I don't think the CLDR has numbered releases like that.

If there were a way to determine the "version" an emoji was added in, it would be great to have that in the data files used by this repo, to build different reports for different versions.

To solve your specific problem, the only thing I can think of is to bisect the cldr repo, lock yourself to that version, and then regenerate the data files.

@rainypixels

> I don't think the CLDR has numbered releases like that.

Actually, I poked around a bunch, and it turns out that they do: they use tagged releases. I tried to modify the .gitmodules file to point to a tag, but that turned out to be a rabbit hole, since the branch setting there accepts only branch names (not tags or SHAs).

> To solve your specific problem, the only thing I can think of is to bisect the cldr repo, lock yourself to that version, and then regenerate the data files.

That’s actually exactly what I did. For posterity, I’ll share it here in case someone else comes looking. I pulled the emoji-test.txt file from the release-35-1 tag on the CLDR repo, removed the comments at the beginning and end, and replaced the local version. I also commented out the keyword-merging code in compile.rb, since that code was now merging in keywords from different languages because the annotation files were from a later branch. I imagine replacing the two annotation files with those from the tagged release would resolve that issue; I merge in other keywords myself downstream that are far better than the official keywords, so I didn’t bother. And then I ran make all.
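
For anyone repeating the "removed the comments at the beginning and end" step, here is a minimal Ruby sketch. It is not the code used above; the helper name `strip_outer_comments` is hypothetical, and it assumes the header is the first run of `#` lines (ending at the first blank line) and the trailer is everything after the last data line. Interior `# group:` and `# subgroup:` lines are kept.

```ruby
# Sketch: drop the header and trailer comment blocks of emoji-test.txt.
# Assumption: the header is the leading run of "#" lines, terminated by a
# blank line; the trailer is any blank/comment lines after the last data line.
def strip_outer_comments(text)
  lines = text.lines

  comment_or_blank = lambda do |line|
    line.strip.empty? || line.lstrip.start_with?("#")
  end

  # Header: leading comment lines, up to the first blank line.
  lines.shift while lines.first && lines.first.lstrip.start_with?("#")
  # Blank lines that separated the header from the body.
  lines.shift while lines.first && lines.first.strip.empty?

  # Trailer: blank and comment lines after the last data line.
  lines.pop while lines.last && comment_or_blank.call(lines.last)

  lines.join
end
```

Usage would be along the lines of `File.write(path, strip_outer_comments(File.read(path)))`, with `path` pointing at the local copy of emoji-test.txt.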

> If there were a way to determine the "version" an emoji was added in, it would be great to have that in the data files used by this repo, to build different reports for different versions.

I suppose technically there is, but it’d be tedious and would involve diffing release tags. It’d probably suffice to generate the diffs from the end-user data files Unicode provides: https://www.unicode.org/Public/emoji/. I’ve seen a couple of repos provide the version in their own variants of emoji JSONs.
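
A rough Ruby sketch of that diffing idea, assuming you have already downloaded one emoji-test.txt per version and pass them in oldest first. The helper names are hypothetical, and the parsing assumes data lines of the form `code points ; status # name`.

```ruby
# Extract code point sequences from emoji-test.txt-style text.
# Assumption: data lines look like "1F600 ; fully-qualified # grinning face".
def emoji_sequences(text)
  text.each_line.map do |line|
    data = line.split("#", 2).first.to_s.strip # drop trailing comment
    next if data.empty?                        # blank or comment-only line
    data.split(";", 2).first.strip             # keep the code points only
  end.compact
end

# Map each sequence to the first version it appears in.
# texts_by_version must be ordered oldest first (Ruby hashes keep
# insertion order), e.g. { "12.0" => text_12, "13.0" => text_13 }.
def first_seen_versions(texts_by_version)
  first_seen = {}
  texts_by_version.each do |version, text|
    emoji_sequences(text).each { |seq| first_seen[seq] ||= version }
  end
  first_seen
end
```

Note that this treats fully-qualified and unqualified variants as separate sequences; filtering on the status field first may be preferable.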

Anyhow, hope that helps! Thanks for your quick response.

@Mange
Owner Author

@Mange Mange commented on 6eab084 Apr 16, 2020

It should be possible to just stop using submodules and download named releases instead. The version could even be an input to the scripts, so you get to pick one.
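
A minimal Ruby sketch of what that input could look like. Nothing here exists in the repo: the helper names and the `CLDR_VERSION` variable are hypothetical, the tag naming is inferred from the `release-35-1` tag mentioned in this thread, and the URL assumes the CLDR repository is hosted at github.com/unicode-org/cldr with GitHub's usual tag-archive path.

```ruby
# Hypothetical helper: map a version such as "35.1" to a tag name,
# following the "release-35-1" tag mentioned in this thread.
def cldr_tag(version)
  "release-#{version.to_s.tr('.', '-')}"
end

# Assumption: CLDR is hosted at github.com/unicode-org/cldr and GitHub
# serves tag archives under /archive/refs/tags/<tag>.tar.gz.
def cldr_archive_url(version)
  "https://github.com/unicode-org/cldr/archive/refs/tags/#{cldr_tag(version)}.tar.gz"
end

# Read the requested version from the environment (hypothetical variable
# name), falling back to a default when it is not set.
def requested_cldr_version(env = ENV, default = "35.1")
  env.fetch("CLDR_VERSION", default)
end
```

A build script could then fetch `cldr_archive_url(requested_cldr_version)` and unpack it where the submodule used to live.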

> I merge in other keywords myself downstream that are far better than the official keywords

Is this something you would mind elaborating on?

@rainypixels

> It should be possible to just stop using submodules and download named releases instead. The version could even be an input to the scripts, so you get to pick one.

Totally. If ya ever get bored 😛

> Is this something you would mind elaborating on?

Of course. The emoji JSON your repo provides is actually an input for our own generator. Our generator takes the raw JSON and spits out a Swift enum that provides a bunch of extra functionality our app needs, like dominant background colors for emoji. We were using the keywords to provide instant search on our emoji picker screen, but as you probably already know, the keywords are often completely useless because they don’t map well to any colloquialisms. So a few months ago, we started generating our own keywords (particularly for popular emoji), and they’ve turned into another input for our generator. Turns out this is the approach Apple and WhatsApp use as well to power their own emoji search. I’m surprised this hasn’t been standardized yet, considering the requirements and needs across platforms and languages are fairly uniform.
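
As an illustration only (this is not our generator), merging a curated keyword list into parsed emoji records could look like the Ruby sketch below. The record shape with `"characters"` and `"keywords"` fields is an assumption and may not match the JSON this repo emits; `merge_keywords` is a hypothetical name.

```ruby
# Merge curated keywords into emoji records without mutating the input.
# Assumption: each record is a Hash with "characters" (the emoji itself)
# and "keywords" (an Array of Strings); the real JSON may differ.
def merge_keywords(emojis, custom_keywords)
  emojis.map do |emoji|
    extra = custom_keywords.fetch(emoji["characters"], [])
    official = emoji.fetch("keywords", [])
    # Curated keywords go first so a search can rank them higher;
    # uniq removes anything both lists share.
    emoji.merge("keywords" => (extra + official).uniq)
  end
end
```

Here `emojis` would come from `JSON.parse` on the generated file, and `custom_keywords` from a hand-maintained hash keyed by emoji.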

@Mange
Owner Author

@Mange Mange commented on 6eab084 Apr 18, 2020

Thank you for elaborating. I'm happy that this repo was useful for you, and I agree: I wish more and better sources of keywords were available.

Please let me know if you ever find any open data related to it (and I shall do the same) so I could possibly integrate it here.
