This release updates Demicode with support for the upcoming release of Unicode 16.0. That includes the ability to run with prerelease data in general and to run code generation without full access to the Unicode Character Database files. Previously, requiring that access created a circular dependency and resulted in a crash.
Unicode 16.0 again makes substantial changes to the definition of grapheme clusters. Nonetheless, Demicode's implementation of grapheme cluster breaking passed all updated tests without requiring any changes. I see that as validation of Demicode's approach, which uses a clever encoding of Unicode properties as Unicode letters and a straightforward regular expression obtained by applying the encoding to the rules from Unicode Standard Annex #29 on text segmentation.
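To make the idea concrete, here is a minimal sketch in Python. It is not Demicode's actual code: the letter assignments, the `property_of` helper with its rough code point ranges, and the reduced rule set (no Hangul, Prepend, or Indic conjunct rules) are all assumptions chosen for illustration. A real implementation would look up properties in the Unicode Character Database instead.

```python
import re

# Hypothetical one-letter codes for a few grapheme cluster break properties.
CODES = {
    "CR": "c", "LF": "l", "Control": "C", "Extend": "e", "ZWJ": "z",
    "SpacingMark": "s", "Regional_Indicator": "r",
    "Extended_Pictographic": "p", "Other": "o",
}

def property_of(ch: str) -> str:
    """Crude stand-in for a UCD lookup; the ranges are approximations only."""
    cp = ord(ch)
    if cp == 0x0D:
        return "CR"
    if cp == 0x0A:
        return "LF"
    if cp < 0x20 or 0x7F <= cp < 0xA0:
        return "Control"
    if cp == 0x200D:
        return "ZWJ"
    if 0x0300 <= cp <= 0x036F or 0xFE00 <= cp <= 0xFE0F or 0x1F3FB <= cp <= 0x1F3FF:
        return "Extend"
    if 0x1F1E6 <= cp <= 0x1F1FF:
        return "Regional_Indicator"
    if 0x1F300 <= cp <= 0x1FAFF:
        return "Extended_Pictographic"
    return "Other"

# The UAX #29 rules, restated over the letter codes (simplified subset):
# CR LF stays together, controls stand alone, and every other core
# (flag pair, emoji ZWJ sequence, or single code point) absorbs trailing
# Extend, ZWJ, and SpacingMark.
CLUSTER = re.compile(r"cl|[clC]|(?:rr|p(?:e*zp)*|[^clC])[ezs]*")

def grapheme_clusters(text: str) -> list[str]:
    # One letter per code point, so match offsets index both strings.
    encoded = "".join(CODES[property_of(ch)] for ch in text)
    return [text[m.start():m.end()] for m in CLUSTER.finditer(encoded)]
```

Because the encoded string has exactly one letter per code point, the offsets of each regular expression match apply directly to the original text. Tracking a new version of the standard then mostly means updating the property data and, where the rules themselves changed, the one pattern.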
Since the preliminary files for version 16.0 of the Unicode Character Database have already been posted on Unicode's website, you too can run Demicode 1.4 with the prerelease data. Just add the --ucd-version 16.0.0 option on the command line. Without that option, Demicode continues to default to Unicode 15.1 until the next weekly update check after the release of Unicode 16.0. By contrast, Demicode 1.3 fails with an error declaring that Unicode 16.0 is "from future." Well, with Demicode 1.4, the future is now! 🎉