Compact data from the Unicode Common Locale Data Repository
For anyone interested, I just dumped most of the CLDR data into a compact form (see the provided CLDR.INI file).
The final data for all languages is 13,727,497 bytes, but it is still highly compressible, as the results below show (ratio is the space saved relative to RAW; times are in microseconds).
BSC: 1,023,137 bytes, ratio=92.5468% enctime=1211822us dectime=757387us
BROTLI: 1,287,148 bytes, ratio=90.6236% enctime=67236011us dectime=50243us
LZMA25: 1,369,212 bytes, ratio=90.0258% enctime=3961970us dectime=98609us
LZIP: 1,369,811 bytes, ratio=90.0214% enctime=3895218us dectime=131528us
LZMA20: 1,423,667 bytes, ratio=89.6291% enctime=3334007us dectime=103905us
MINIZ: 1,892,977 bytes, ratio=86.2103% enctime=697077us dectime=31830us
ZSTD: 2,108,053 bytes, ratio=84.6436% enctime=65694us dectime=34525us
LZ4HC: 2,151,652 bytes, ratio=84.3260% enctime=491871us dectime=13641us
LZ4: 2,918,991 bytes, ratio=78.7362% enctime=37851us dectime=13775us
RAW: 13,727,497 bytes, ratio=0% enctime=16242us dectime=7658us
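For reference, the ratio column above matches 100 * (1 - packed / raw). Below is a minimal sketch of how such a table could be reproduced. It only uses the codecs that ship with Python (zlib, bz2, lzma), so it does not cover BSC, Brotli, ZSTD or LZ4, and it is not the benchmark tool used for the numbers above. The helper names `saving_ratio` and `bench` are made up for illustration.

```python
import bz2
import lzma
import time
import zlib


def saving_ratio(raw_size: int, packed_size: int) -> float:
    """Percentage of space saved relative to the raw size."""
    return 100.0 * (1.0 - packed_size / raw_size)


def bench(name, compress, decompress, data: bytes) -> dict:
    """Time one compress/decompress round trip (hypothetical helper)."""
    t0 = time.perf_counter()
    packed = compress(data)
    t1 = time.perf_counter()
    restored = decompress(packed)
    t2 = time.perf_counter()
    if restored != data:
        raise ValueError(f"{name}: round trip failed")
    return {
        "name": name,
        "size": len(packed),
        "ratio": saving_ratio(len(data), len(packed)),
        "enc_us": int((t1 - t0) * 1_000_000),
        "dec_us": int((t2 - t1) * 1_000_000),
    }


# Standard-library codecs only; these are stand-ins, not the codecs listed above.
CODECS = [
    ("ZLIB", lambda d: zlib.compress(d, 9), zlib.decompress),
    ("BZ2", bz2.compress, bz2.decompress),
    ("LZMA", lzma.compress, lzma.decompress),
]


def report(data: bytes) -> list:
    """Return one formatted line per codec, smallest output first."""
    results = [bench(name, enc, dec, data) for name, enc, dec in CODECS]
    results.sort(key=lambda r: r["size"])
    return [
        f"{r['name']}: {r['size']:,} bytes, ratio={r['ratio']:.4f}% "
        f"enctime={r['enc_us']}us dectime={r['dec_us']}us"
        for r in results
    ]
```

To try it on the real dump, something like `print("\n".join(report(open("CLDR.INI", "rb").read())))` should do, assuming the file sits in the current directory.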
This is what is currently processed from the CLDR repos:
- skipped
- extracted
cldr-core/supplemental/
- aliases.json
- calendarData.json
- calendarPreferenceData.json
- characterFallbacks.json
- codeMappings.json
- currencyData.json
- gender.json
- languageData.json
- languageMatching.json
- likelySubtags.json
- measurementData.json
- metaZones.json
- numberingSystems.json
- ordinals.json
- parentLocales.json
- plurals.json
- primaryZones.json
- references.json
- telephoneCodeData.json
- territoryContainment.json
- territoryInfo.json (interesting!)
- timeData.json
- weekData.json
- windowsZones.json
cldr-dates-modern/main/xx-XX
- ca-generic.json
- ca-gregorian.json
- dateFields.json
- timeZoneNames.json
cldr-localenames-modern/main/xx-XX
- languages.json
- localeDisplayNames.json
- scripts.json
- territories.json
- transformNames.json
- variants.json
cldr-misc-modern/main/xx-XX
- characters.json
- contextTransforms.json
- delimiters.json
- layout.json
- listPatterns.json
- posix.json
cldr-numbers-modern/main/xx-XX
- currencies.json
- numbers.json
cldr-segments-modern/segments/xx-XX
- suppressions.json
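To give an idea of what "dumping" one of these JSON files can look like, here is a rough sketch that flattens a nested JSON document into `key=value` lines under an INI-style section. The actual layout of CLDR.INI is not described in this post, so the dotted-key format, the section naming and the helper names `flatten` and `to_ini_lines` are all assumptions. The sample document is a made-up fragment, not real CLDR content.

```python
import json


def flatten(node, prefix=""):
    """Yield (dotted_key, value) pairs for every leaf of a nested JSON value."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from flatten(child, f"{prefix}.{index}" if prefix else str(index))
    else:
        yield prefix, node


def to_ini_lines(document, section):
    """Render a parsed JSON document as an INI-style section (assumed format)."""
    lines = [f"[{section}]"]
    for key, value in flatten(document):
        # Leaf values are written with str(); no escaping is attempted here.
        lines.append(f"{key}={value}")
    return lines


# Made-up fragment for illustration only; real CLDR files are much larger.
SAMPLE = '{"supplemental": {"example": {"A": "1", "B": ["x", "y"]}}}'

if __name__ == "__main__":
    print("\n".join(to_ini_lines(json.loads(SAMPLE), "example.json")))
```

A real converter would also have to decide how to handle `=` or newlines inside values and how to merge the per-locale `xx-XX` folders, which this sketch leaves out.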
- CLDR, public domain.
- http://unicode.org/copyright.html, original license for the data.