Releases: apparebit/demicode
v1.4: Ready for Unicode 16.0
This release updates Demicode with support for the upcoming release of Unicode 16.0. That includes the ability to run with prerelease data in general and to run code generation without requiring full access to the Unicode character database files (which creates a circular dependency and results in a crash).
Unicode 16.0 again makes substantial changes to the definition of grapheme clusters. Nonetheless, Demicode's implementation of grapheme cluster breaking passed all updated tests without requiring any changes. I see that as validation of Demicode's approach, which uses a clever encoding of Unicode properties as Unicode letters and a straightforward regular expression obtained by applying the encoding to the rules from Unicode Standard Annex #29 on text segmentation.
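To make the idea concrete, here is a toy sketch of letter encoding plus regular expression. It is not Demicode's actual code: the letters, the property lookup (rough approximations built on Python's unicodedata module instead of real UCD data), and the reduced rule set (only GB3, GB4, GB5, GB9, GB11, GB12, and GB13 of UAX #29) are all assumptions made for illustration.

```python
import re
import unicodedata


def encode(ch: str) -> str:
    """Map one code point to a single letter standing for its break property.

    The letters and the property approximations are hypothetical.
    """
    cp = ord(ch)
    if ch == "\r":
        return "r"  # CR
    if ch == "\n":
        return "n"  # LF
    if cp == 0x200D:
        return "z"  # ZWJ
    if 0x1F1E6 <= cp <= 0x1F1FF:
        return "R"  # Regional_Indicator
    if 0x1F3FB <= cp <= 0x1F3FF:
        return "e"  # emoji modifiers (skin tones) behave like Extend
    if 0x1F300 <= cp <= 0x1FAFF:
        return "p"  # rough stand-in for Extended_Pictographic
    category = unicodedata.category(ch)
    if category == "Cc":
        return "c"  # Control (approximation)
    if category in ("Mn", "Me"):
        return "e"  # Extend (approximation)
    return "o"  # everything else


# One grapheme cluster, expressed over the letters produced by encode().
CLUSTER = re.compile(
    r"rn"             # GB3: keep CR LF together
    r"|[rnc]"         # GB4, GB5: other controls stand alone
    r"|(?:RR"         # GB12, GB13: pair up regional indicators
    r"|p(?:e*zp)*"    # GB11: emoji ZWJ sequences
    r"|[ezRpo])"      # otherwise, any single code point
    r"[ez]*"          # GB9: absorb trailing Extend and ZWJ
)


def graphemes(text: str) -> list[str]:
    """Split text into (approximate) grapheme clusters."""
    encoded = "".join(encode(ch) for ch in text)
    # Each code point maps to exactly one letter, so match offsets in the
    # encoded string are also valid offsets into the original text.
    return [text[m.start():m.end()] for m in CLUSTER.finditer(encoded)]
```

Because the encoding is one letter per code point, adding or revising a rule only means editing the regular expression, not the scanning code. For example, `graphemes("e\u0301x")` keeps the combining acute accent attached to the "e" and returns two clusters.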
Since the preliminary files for version 16.0 of the Unicode Character Database have already been posted on Unicode's website, you too can run Demicode 1.4 with the prerelease data. Just add the --ucd-version 16.0.0 option on the command line. Without that option, Demicode continues to default to Unicode 15.1 until the next weekly update check after the release of Unicode 16.0. By contrast, Demicode 1.3 fails with an error declaring that Unicode 16.0 is "from future." Well, with Demicode 1.4, the future is now! 🎉
v1.3: Easier experiments and a bug fix
This release greatly simplifies running demicode across several popular terminal emulators, at least on macOS. It also fixes #1.
v1.2 gains benchmarking, improves mirroring and testing
With this release, demicode gains the ability to benchmark page rendering. Initial results for nine terminal applications suggest that all of them are reasonably fast at rendering styled text, taking 4–9ms for a 120×40 page on a four-year-old macOS laptop. But when demicode queries the terminal for the current column (once each for 38 of those 40 lines), the spread of average latencies explodes to 10–946ms. Judging by these results, it seems that a few terminals strongly oversell their nimbleness.
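For readers unfamiliar with such queries, the following is a minimal sketch of how the column query and its timing might look. It is not Demicode's benchmarking code. The escape sequences are the standard device status report request and cursor position report; the function names and the `query` callable, which stands in for writing the request to the terminal and reading back the raw reply, are assumptions.

```python
import re
import statistics
import time
from typing import Callable

# A terminal answers the request ESC [ 6 n with a cursor position report
# of the form ESC [ <row> ; <column> R.
REQUEST = b"\x1b[6n"
REPORT = re.compile(rb"\x1b\[(\d+);(\d+)R")


def parse_column(response: bytes) -> int:
    """Extract the 1-based column from a cursor position report."""
    match = REPORT.search(response)
    if match is None:
        raise ValueError(f"not a cursor position report: {response!r}")
    return int(match.group(2))


def average_latency_ms(query: Callable[[], bytes], repeat: int = 38) -> float:
    """Average time in milliseconds for one query-and-parse round trip.

    `query` is a hypothetical callable that sends REQUEST to the terminal
    and returns the bytes of the reply.
    """
    samples = []
    for _ in range(repeat):
        start = time.perf_counter()
        parse_column(query())
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.mean(samples)
```

Each query is a full round trip through the terminal emulator, which is why its latency can differ so much from plain rendering speed.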
This release also improves the mirroring of UCD and CLDR data, introducing a from-the-ground-up rewrite that uses an explicit manifest to track what data has been mirrored. To see for yourself, run demicode with --ucd-list-versions, which lists the UCD versions included in the current mirror. The implementation also is more structured and performs more aggressive error checking. As of today, demicode is using GitHub Actions for CI, which hopefully ensures that demicode releases become only more robust.
v1.1: A critical bug fix, a nice-to-have feature, and better tooling
User-Visible Changes
This release makes the following major changes:
- It fixes a crashing bug for mirrored CLDR files.
- It improves terminal input/output, notably by displaying character blots incrementally with --incrementally/-I. That does markedly slow down tool output. But it also allows for measuring the size of character blots by querying the terminal.
Internal Changes
This release also makes significant internal changes. Notably, the UCD implementation is becoming more uniform and more decoupled. The long-term goal is to provide a generally useful UCD abstraction that may not be the fastest but has excellent support for exploratory coding against the UCD.
The development setup has also been updated. Instead of mypy, demicode now uses pyright for type-checking. In my experience, pyright is more accurate than mypy for the same annotations. It has also surfaced two very subtle bugs. They both are fixed.
The runtest.py script runs both type checker and unit tests. Tests are based on Python's unittest package because I find pytest too invasive and too magical, which always ends up interfering with tests in the long term. Unfortunately, unittest is rather baroque and hard to extend because (1) its interfaces are too wide and (2) it hides critical state. The test.runtime module introduces adapter classes that fix these issues for unittest.TestCase and unittest.TestResult. The test script uses them to provide more readable and helpful output.
v1.0 Demicode Is All Grown Up 🎉
This version adds support for Unicode 15.1. Notably, it incorporates the changes to the grapheme cluster breaking algorithm, which changed substantially since Unicode 15.0. The changes are automatically activated when UnicodeCharacterDatabase is instantiated with 15.1 and they are effectively no-ops for 15.0 and earlier.
The --stats option now prints the bit-width for Unicode properties, too. It also includes data on code points that have non-default values for both the Indic_Conjunct_Break and Grapheme_Cluster_Break properties. Such overlap matters because both properties help determine grapheme cluster breaks. If feasible, integrating both into the same enumeration with single letter enumeration constant values simplifies the implementation of the break algorithm significantly.
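As a rough illustration of what such a merged enumeration and a bit-width computation could look like, consider the sketch below. The class name, the members (a small subset), and the letters are hypothetical and do not reflect Demicode's actual definitions.

```python
from enum import Enum


class BreakClass(Enum):
    """Hypothetical merged enumeration; every value is a single letter."""

    OTHER = "o"
    CONTROL = "c"
    EXTEND = "e"
    ZWJ = "z"
    PREPEND = "p"
    SPACING_MARK = "s"
    INDIC_CONSONANT = "k"  # Indic_Conjunct_Break=Consonant
    INDIC_LINKER = "l"     # Indic_Conjunct_Break=Linker
    INDIC_EXTEND = "x"     # Indic_Conjunct_Break=Extend


def bit_width(value_count: int) -> int:
    """Bits needed to tell value_count distinct values apart (at least one)."""
    return max(1, (value_count - 1).bit_length())


def encode(classes: list[BreakClass]) -> str:
    """Turn a sequence of break classes into a string of letters."""
    return "".join(c.value for c in classes)
```

With single-letter values, a sequence of code points becomes an ordinary string, so break rules can be written as patterns over that string. The nine members above would need `bit_width(len(BreakClass))`, that is, four bits per code point.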
v1.0.b1 A Better UI, Refactored Unicode Database
Demicode's user experience is much improved: It now pages back and forth. On Linux and macOS, it only takes a keypress to select the previous or next page. Take your pick: ‹left›/‹right›, b/f, p/n, ‹tab›/‹shift-tab›, or ‹space›/‹delete›. For now, Windows still requires you to type a letter or a full command (backward/forward and previous/next work too) and then follow it with ‹return›. Though ‹return› by itself continues to page forward as well.
This release has been tested with all known Unicode versions from 4.1 forward and does run with them. It also removes several unused Unicode properties that are likely to remain so and introduces several more, which will be needed for implementing grapheme cluster breaks according to the revised Unicode 15.1 algorithm.
The new --with-ucd-extended-pictographic command line option blots all characters that have the Extended_Pictographic property, including unassigned ones. Since that's quite the mouthful and the set of characters is especially important for fixed-width rendering, the much shorter -x works, too. Similarly, --with-curation has -q as an alias.
Internally, this release incorporates a significant refactor of the code for loading Unicode Character Database files. Much of the clutter and boilerplate has been eliminated, since I finally found a pattern that is both simple and also flexible enough to accommodate the loading of most files: It requires two lines, one for the context manager that mirrors and opens the file and one for the parser, with a callback constructing the desired datatype. The global UCD singleton instance has been eliminated as well. A direct beneficiary is statistics collection with --stats: It now uses its own private instance and can hence print counts for both the unoptimized and optimized internal representation in one run.
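The two-line loading pattern might look roughly like the sketch below. It only illustrates the shape of the pattern and is not Demicode's implementation: `open_ucd` and `parse` are hypothetical names, and the context manager reads from an in-memory string instead of mirroring a real file. The line format (semicolon-separated fields, `#` comments, `..` ranges of hexadecimal code points) is the one used by UCD data files.

```python
import io
from contextlib import contextmanager
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


@contextmanager
def open_ucd(text: str) -> Iterator[io.StringIO]:
    """Hypothetical stand-in for mirroring and opening a UCD file."""
    handle = io.StringIO(text)
    try:
        yield handle
    finally:
        handle.close()


def parse(
    lines: Iterable[str], make: Callable[[range, list[str]], T]
) -> Iterator[T]:
    """Parse UCD-style lines, calling `make` to build each record."""
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        first, *fields = [field.strip() for field in line.split(";")]
        start, _, stop = first.partition("..")
        low = int(start, 16)
        high = int(stop, 16) if stop else low
        yield make(range(low, high + 1), fields)


SAMPLE = """\
# Sample lines in the style of GraphemeBreakProperty.txt
000D          ; CR # Cc       <control-000D>
0000..0009    ; Control # Cc  [10] <control-0000>..<control-0009>
"""

# The two lines: one for the context manager, one for the parser.
with open_ucd(SAMPLE) as lines:
    data = list(parse(lines, lambda cps, fields: (cps, fields[0])))
```

The callback decides the resulting datatype, so the same parser can feed tuples, enumerations, or dataclasses without changes.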
There are no more features to add nor modules to refactor. At least not in the short term. Once Unicode 15.1 has been released, I'll update the grapheme cluster breaking algorithm to account for Indic syllables as well. So please consider this first beta more or less a release candidate for the big 1.0.0, too.
v0.7 Approaching 1.0
Starting with this release, demicode clearly distinguishes between user errors and unexpected exceptions, even if it internally uses exceptions for both. For the former, it only prints the error message. For the latter, it also prints an exception trace and points to the issue tracker. Demicode's output of statistics with the --stats option has been significantly improved as well.
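One common way to draw that line in Python is sketched below. The `UserError` class, the `run` wrapper, the exit codes, and the message wording are all assumptions for illustration, not Demicode's actual code.

```python
import io
import sys
import traceback
from typing import Callable, TextIO


class UserError(Exception):
    """Hypothetical marker for errors caused by bad input or options."""


def run(main: Callable[[], None], err: TextIO = sys.stderr) -> int:
    """Run main() and translate failures into messages and exit codes."""
    try:
        main()
        return 0
    except UserError as x:
        # User error: the message alone is enough.
        print(f"error: {x}", file=err)
        return 1
    except Exception:
        # Unexpected: show the trace and ask for a bug report.
        traceback.print_exc(file=err)
        print("Please report this on the project's issue tracker.", file=err)
        return 2


def bad_option() -> None:
    raise UserError("unknown option")


def bug() -> None:
    raise KeyError("oops")


log = io.StringIO()
user_status = run(bad_option, log)
bug_status = run(bug, log)
```

Both failure modes still travel as exceptions internally; only the outermost handler treats them differently.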
The test script has been modularized using Python's builtin unittest module. You can run tests with ./runtest.py or with Visual Studio Code, the latter thanks to the configuration in .vscode/settings.json. In preparation for the release of Unicode 15.1, the versions for code generation have been locked down. In particular, testing grapheme cluster breaks now is specific to Unicode 15.0, since 15.1 updates the algorithm.
v0.6 Handle Older Unicode Versions
Demicode won't crash when ingesting UCD files from Unicode versions before 13.0.0 any more. The lack of some information and the presence of outdated property values are now gracefully ignored.
This release also changes how unassigned code points and sequences of more than one grapheme cluster are handled. Assuming that they may just be valid for a later version of the Unicode standard than the currently active one, demicode now elides blots for them and adds an explanatory note instead of the (non-existent) name.
v0.5 Faster UCD look ups
In addition to considerable clean-up of demicode's internal code, the tool now optimizes UCD data for faster look ups. Several of the --with-… selections have been improved. In particular, --with-version-oracle now displays exactly one emoji per detectable Unicode version.
v0.4 Make Mirroring and Width Computation Great Again
- This release fixes a bug in the URL creation logic for mirroring and now mirrors UCD and CLDR files to the operating system's cache directory.
- Furthermore, it significantly streamlines the computation of grapheme cluster width, which now takes all emoji into account. That yields markedly better and more consistent results than a wcwidth based solely on Unicode's East Asian Width.
- Finally, this release further modularizes the code, with the mirroring logic now in its own module.
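To illustrate why East Asian Width alone falls short, here is a simplified width heuristic. It is a sketch under stated assumptions, not Demicode's algorithm: the function name is hypothetical, and treating any cluster that contains the emoji variation selector U+FE0F as two columns wide is a deliberate simplification.

```python
import unicodedata

VS16 = "\uFE0F"  # variation selector requesting emoji presentation


def cluster_width(cluster: str) -> int:
    """Estimate the number of terminal columns for one grapheme cluster."""
    if not cluster:
        return 0
    # Emoji presentation makes a cluster wide, even if its base character
    # has a narrow or neutral East Asian Width.
    if VS16 in cluster:
        return 2
    if any(unicodedata.east_asian_width(ch) in ("W", "F") for ch in cluster):
        return 2
    # Clusters made up only of combining marks or format characters
    # take no column of their own.
    if all(unicodedata.category(ch) in ("Mn", "Me", "Cf") for ch in cluster):
        return 0
    return 1
```

For example, U+2764 HEAVY BLACK HEART has a neutral East Asian Width, so a purely width-property-based computation treats it as one column, whereas the sequence U+2764 U+FE0F is typically rendered as a two-column emoji.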