Releases · apparebit/demicode

23 Aug 21:28

147df91

Latest

v1.4: Ready for Unicode 16.0

This release updates Demicode with support for the upcoming release of Unicode 16.0. That includes the ability to run with prerelease data in general and to run code generation without requiring full access to the Unicode Character Database files. Previously, that requirement created a circular dependency and resulted in a crash.

Unicode 16.0 again makes substantial changes to the definition of grapheme clusters. Nonetheless, Demicode's implementation of grapheme cluster breaking passed all updated tests without requiring any changes. I see that as validation of Demicode's approach, which uses a clever encoding of Unicode properties as Unicode letters and a straightforward regular expression obtained by applying the encoding to the rules from Unicode Standard Annex #29 on text segmentation.
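To make the idea concrete, here is a minimal sketch of the general technique, not Demicode's actual code. It assumes a hand-rolled, approximate classifier (`encode`, a hypothetical name) because Python's standard library does not expose the Grapheme_Cluster_Break property, and it covers only a subset of the UAX #29 rules: CR LF, controls, Extend, ZWJ, SpacingMark, regional indicator pairs, and emoji ZWJ sequences. Hangul, Prepend, and Indic conjuncts are left out.

```python
import re
import unicodedata


def encode(ch: str) -> str:
    """Map one code point to a letter standing for its approximate break class."""
    cp = ord(ch)
    if ch == "\r":
        return "r"
    if ch == "\n":
        return "n"
    if cp == 0x200D:
        return "Z"  # zero-width joiner
    if 0x1F1E6 <= cp <= 0x1F1FF:
        return "R"  # regional indicator
    if cp == 0x200C or 0x1F3FB <= cp <= 0x1F3FF:
        return "E"  # ZWNJ and emoji modifiers behave like Extend
    category = unicodedata.category(ch)
    if category in ("Mn", "Me"):
        return "E"  # approximation of Extend
    if category == "Mc":
        return "S"  # approximation of SpacingMark
    if category in ("Cc", "Cf", "Zl", "Zp"):
        return "C"  # approximation of Control
    if 0x1F300 <= cp <= 0x1FAFF or 0x2600 <= cp <= 0x27BF:
        return "P"  # rough stand-in for Extended_Pictographic
    return "O"


# One alternative per kind of cluster, written over the letters above.
CLUSTER = re.compile(r"rn|[rnC]|(?:RR|P(?:E*ZP)*|[^rnC])[EZS]*")


def grapheme_clusters(text: str) -> list[str]:
    """Split text into clusters by matching the regex against the encoding."""
    encoded = "".join(encode(ch) for ch in text)
    # Every code point maps to exactly one letter, so match offsets in the
    # encoded string are also valid offsets into the original text.
    return [text[m.start():m.end()] for m in CLUSTER.finditer(encoded)]
```

Because the regular expression only ever sees the letters, changes to the rules translate into changes to the pattern or the encoding, while the matching code stays the same.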

Since the preliminary files for version 16.0 of the Unicode Character Database have already been posted on Unicode's website, you too can run Demicode 1.4 with the prerelease data. Just add the --ucd-version 16.0.0 option on the command line. Without that option, Demicode continues to default to Unicode 15.1 until the next weekly update check after the release of Unicode 16.0. By contrast, Demicode 1.3 fails with an error declaring that Unicode 16.0 is "from future." Well, with Demicode 1.4, the future is now! 🎉

07 Jan 22:27

apparebit

v1.3

5d153fb

v1.3: Easier experiments and a bug fix

This release greatly simplifies running demicode across several popular terminal emulators, at least on macOS. It also fixes #1.

30 Oct 20:59

apparebit

v1.2

c36ac0a

v1.2 gains benchmarking, improves mirroring and testing

With this release, demicode gains the ability to benchmark page rendering. Initial results for nine terminal applications suggest that all of them are reasonably fast at rendering styled text, taking 4–9ms for a 120×40 page on a four-year-old macOS laptop. But when demicode queries the terminal for the current column (once each for 38 of those 40 lines), the spread of average latencies explodes to 10–946ms. Judging by these results, it seems that a few terminals strongly oversell their nimbleness.
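For readers unfamiliar with the query, the following sketch shows one way to time a single cursor position request on a POSIX terminal. It is an illustration under assumptions, not demicode's benchmarking code: it uses the standard `ESC [ 6 n` request, whose reply has the form `ESC [ row ; column R`, and the function names are made up for this example.

```python
import os
import re
import time

# Reply to the cursor position request: ESC [ <row> ; <column> R
CURSOR_REPORT = re.compile(rb"\x1b\[(\d+);(\d+)R")


def parse_cursor_report(data: bytes) -> tuple[int, int]:
    """Extract (row, column) from a cursor position report."""
    match = CURSOR_REPORT.search(data)
    if match is None:
        raise ValueError(f"not a cursor position report: {data!r}")
    return int(match.group(1)), int(match.group(2))


def query_column(fd_in: int = 0, fd_out: int = 1) -> tuple[int, float]:
    """Return the current column and the round-trip latency in seconds."""
    import termios  # POSIX only, hence imported lazily
    import tty

    saved = termios.tcgetattr(fd_in)
    try:
        tty.setcbreak(fd_in)  # read the reply byte by byte, without echo
        start = time.perf_counter()
        os.write(fd_out, b"\x1b[6n")
        reply = b""
        while not reply.endswith(b"R"):
            chunk = os.read(fd_in, 1)
            if not chunk:
                raise EOFError("terminal closed before replying")
            reply += chunk
        elapsed = time.perf_counter() - start
    finally:
        termios.tcsetattr(fd_in, termios.TCSADRAIN, saved)
    return parse_cursor_report(reply)[1], elapsed
```

Calling `query_column()` from an interactive terminal session and averaging over many calls gives a rough per-query latency. The terminal has to receive, process, and answer each request, which is why this round trip can be far slower than plain output.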

This release also improves the mirroring of UCD and CLDR data, introducing a from-the-ground-up rewrite that uses an explicit manifest to track what data has been mirrored. To see for yourself, --ucd-list-versions lists the UCD versions included in the current mirror. The implementation is also more structured and performs more aggressive error checking. As of today, demicode is using GitHub Actions for CI, which hopefully ensures that demicode releases only become more robust.

17 Oct 21:33

apparebit

v1.1

de5d7c9

v1.1: A critical bug fix, a nice-to-have feature, and better tooling

User-Visible Changes

This release makes the following major changes:

It fixes a crashing bug for mirrored CLDR files.
It improves terminal input/output, notably by displaying character blots incrementally with --incrementally/-I. That does markedly slow down tool output. But it also allows for measuring the size of character blots by querying the terminal.

Internal Changes

This release also makes significant internal changes. Notably, the UCD implementation is becoming more uniform and more decoupled. The long-term goal is to provide a generally useful UCD abstraction that may not be the fastest but has excellent support for exploratory coding against the UCD.

The development setup has also been updated. Instead of mypy, demicode now uses pyright for type-checking. In my experience, pyright is more accurate than mypy for the same annotations. It has also surfaced two very subtle bugs. Both are fixed.

The runtest.py script runs both the type checker and the unit tests. Tests are based on Python's unittest package because I find pytest too invasive and too magical, which always ends up interfering with tests in the long term. Unfortunately, unittest is rather baroque and hard to extend because (1) its interfaces are too wide and (2) it hides critical state. The test.runtime module introduces adapter classes that fix these issues for unittest.TestCase and unittest.TestResult. The test script uses them to provide more readable and helpful output.

19 Sep 16:13

apparebit

v1.0

605f42a

v1.0 Demicode Is All Grown Up 🎉

This version adds support for Unicode 15.1. Notably, it incorporates the changes to the grapheme cluster breaking algorithm, which has changed substantially since Unicode 15.0. The changes are automatically activated when UnicodeCharacterDatabase is instantiated with 15.1, and they are effectively no-ops for 15.0 and earlier.

The --stats option now prints the bit-width for Unicode properties, too. It also includes data on code points that have non-default values for both the Indic_Conjunct_Break and Grapheme_Cluster_Break properties. Such overlap matters because both properties help determine grapheme cluster breaks. If feasible, integrating both into the same enumeration with single-letter enumeration constant values simplifies the implementation of the break algorithm significantly.
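As a small worked example of what a bit-width means here, the sketch below computes the minimum number of bits needed to tell apart the values of an enumerated property. The value lists follow the Unicode Character Database as of 15.1. The helper `bit_width` and the merged count of 17 are illustrative assumptions, not output from --stats.

```python
# Values of Grapheme_Cluster_Break and Indic_Conjunct_Break in Unicode 15.1
GCB_VALUES = [
    "Other", "Prepend", "CR", "LF", "Control", "Extend", "Regional_Indicator",
    "SpacingMark", "L", "V", "T", "LV", "LVT", "ZWJ",
]
INCB_VALUES = ["None", "Linker", "Consonant", "Extend"]


def bit_width(value_count: int) -> int:
    """Return the minimum number of bits that distinguishes value_count values."""
    if value_count < 1:
        raise ValueError("a property needs at least one value")
    return (value_count - 1).bit_length()


separate = bit_width(len(GCB_VALUES)) + bit_width(len(INCB_VALUES))
# Upper bound for a merged enumeration: every combination gets its own value.
merged_worst_case = bit_width(len(GCB_VALUES) * len(INCB_VALUES))
# Hypothetical merged enumeration if only 17 combinations occur in practice.
merged_hypothetical = bit_width(17)
print(separate, merged_worst_case, merged_hypothetical)
```

The fewer combinations of the two properties actually occur, the more attractive a single merged enumeration becomes, which is why the overlap statistics are worth printing.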

12 Sep 21:06

apparebit

v1.0.b1

c5b1496

v1.0.b1 A Better UI, Refactored Unicode Database

Demicode's user experience is much improved: It now pages back and forth. On Linux and macOS, it only takes a keypress to switch pages, and you can take your pick: ‹left›/‹right›, b/f, p/n, ‹tab›/‹shift-tab›, ‹space›/‹delete›. For now, Windows still requires you to type a letter or command (backward/forward and previous/next work, too) and then follow it with ‹return›. That said, ‹return› by itself continues to page forward as well.

This release has been tested with all known Unicode versions from 4.1 forward and does run with them. It also removes several unused Unicode properties that are likely to remain so and introduces several more, which will be needed for implementing grapheme cluster breaks according to the revised Unicode 15.1 algorithm.

The new --with-ucd-extended-pictographic command line option blots all characters that have the Extended_Pictographic property, including unassigned ones. Since that's quite the mouthful and the set of characters is especially important for fixed-width rendering, the much shorter -x works, too. Similarly, --with-curation has -q as an alias.

Internally, this release incorporates a significant refactor of the code for loading Unicode Character Database files. Much of the clutter and boilerplate has been eliminated, since I finally found a pattern that is both simple and also flexible enough to accommodate the loading of most files: It requires two lines, one for the context manager that mirrors and opens the file and one for the parser, with a callback constructing the desired datatype. The global UCD singleton instance has been eliminated as well. A direct beneficiary is statistics collection with --stats: It now uses its own private instance and can hence print counts for both the unoptimized and optimized internal representation in one run.
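A rough sketch of what such a two-line pattern can look like follows. The names `open_ucd` and `parse_records` are hypothetical, the context manager merely wraps an in-memory string instead of mirroring a real file, and the parser handles only records that start with a single code point or a range of code points.

```python
import io
from contextlib import contextmanager


@contextmanager
def open_ucd(text: str):
    """Stand-in for a context manager that mirrors and opens a UCD file."""
    stream = io.StringIO(text)
    try:
        yield stream
    finally:
        stream.close()


def parse_records(lines, make):
    """Parse semicolon-separated UCD records, building each item with make()."""
    for line in lines:
        line = line.partition("#")[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code_points, *fields = (field.strip() for field in line.split(";"))
        first, _, last = code_points.partition("..")
        start = int(first, 16)
        stop = int(last, 16) if last else start
        yield make(start, stop, fields)


SAMPLE = """\
# Sample in the style of GraphemeBreakProperty.txt
000A          ; LF # Cc       <control-000A>
0300..036F    ; Extend # Mn [112] COMBINING GRAVE ACCENT..
"""

# The two lines: one opens the file, one parses it with a callback.
with open_ucd(SAMPLE) as lines:
    records = list(parse_records(lines, lambda start, stop, f: (start, stop, f[0])))
```

Keeping the callback as the only file-specific piece is what lets one parser serve most files: each caller decides which fields to keep and which datatype to build.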

There are no more features to add nor modules to refactor. At least not in the short term. Once Unicode 15.1 has been released, I'll update the grapheme cluster breaking algorithm to account for Indic syllables as well. So please consider this first beta more or less a release candidate for the big 1.0.0, too.

07 Sep 02:30

apparebit

v0.7

f05419f

v0.7 Approaching 1.0

Starting with this release, demicode clearly distinguishes between user errors and unexpected exceptions, even if it internally uses exceptions for both. For the former, it only prints the error message. For the latter, it also prints an exception trace and points to the issue tracker. Demicode's output of statistics with the --stats option has been significantly improved as well.
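The distinction can be sketched as follows. This is an illustration of the pattern rather than demicode's code: the `UserError` class, the `run` wrapper, the exit codes, and the placeholder for the issue tracker address are all assumptions.

```python
import sys
import traceback


class UserError(Exception):
    """An error caused by invalid input or usage, not by a bug in the tool."""


def run(main, argv, stderr=None) -> int:
    """Invoke main(argv) and turn exceptions into messages and exit codes."""
    stderr = sys.stderr if stderr is None else stderr
    try:
        main(argv)
        return 0
    except UserError as x:
        # The user can fix this, so the message alone is enough.
        print(f"error: {x}", file=stderr)
        return 1
    except Exception:
        # Unexpected: show the trace and ask for a bug report.
        traceback.print_exc(file=stderr)
        print("This looks like a bug. Please report it at <issue tracker>.", file=stderr)
        return 2
```

A script would typically end with `sys.exit(run(main, sys.argv[1:]))`, so that both kinds of failure also produce a nonzero exit status.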

The test script has been modularized using Python's built-in unittest module. You can run tests with ./runtest.py or with Visual Studio Code, the latter thanks to the configuration in .vscode/settings.json. In preparation for the release of Unicode 15.1, the versions for code generation have been locked down. In particular, testing grapheme cluster breaks now is specific to Unicode 15.0, since 15.1 updates the algorithm.

05 Sep 17:58

apparebit

v0.6

c7c39cb

v0.6 Handle Older Unicode Versions

Demicode no longer crashes when ingesting UCD files from Unicode versions before 13.0.0. The lack of some information and the presence of outdated property values are now gracefully ignored.

This release also changes how unassigned code points and sequences of more than one grapheme cluster are handled. Assuming that they may just be valid in a later version of the Unicode standard than the currently active one, demicode now elides blots for them and adds an explanatory note instead of the (non-existent) name.

04 Sep 14:32

apparebit

v0.5

1a6766d

v0.5 Faster UCD lookups

In addition to considerable clean-up of demicode's internal code, the tool now optimizes UCD data for faster lookups. Several of the --with-… selections have been improved. In particular, --with-version-oracle now displays exactly one emoji per detectable Unicode version.
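The release note does not say how the data is optimized. One common approach, sketched here purely as an assumption, is to merge adjacent ranges that share a value and then look up code points by binary search. The `RangeTable` class is hypothetical.

```python
from bisect import bisect_right


class RangeTable:
    """Map code points to property values via sorted, merged ranges."""

    def __init__(self, ranges, default):
        merged = []
        for start, stop, value in sorted(ranges, key=lambda r: r[0]):
            # Fuse a range into its predecessor if both are adjacent and
            # carry the same value.
            if merged and merged[-1][2] == value and merged[-1][1] + 1 == start:
                merged[-1] = (merged[-1][0], stop, value)
            else:
                merged.append((start, stop, value))
        self._starts = [r[0] for r in merged]
        self._stops = [r[1] for r in merged]
        self._values = [r[2] for r in merged]
        self._default = default

    def __len__(self) -> int:
        return len(self._starts)

    def lookup(self, code_point: int):
        """Return the value for code_point in O(log n) time."""
        index = bisect_right(self._starts, code_point) - 1
        if index >= 0 and code_point <= self._stops[index]:
            return self._values[index]
        return self._default


table = RangeTable(
    [
        (0x0300, 0x036F, "Extend"),
        (0x000A, 0x000A, "LF"),
        (0x0483, 0x0487, "Extend"),
        (0x0488, 0x0489, "Extend"),
    ],
    default="Other",
)
```

In this example, the two adjacent Extend ranges collapse into one, so the table holds three entries instead of four, and every lookup is a single binary search followed by one bounds check.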

02 Sep 00:56

apparebit

v0.4

0acbc65

v0.4 Make Mirroring and Width Computation Great Again

This release fixes a bug in the URL creation logic for mirroring and now mirrors UCD and CLDR files to the operating system's cache directory.
Furthermore, it significantly streamlines the computation of grapheme cluster width, which now takes all emoji into account. That yields significantly better and more consistent results than a wcwidth based solely on Unicode's East Asian Width.
Finally, this release further modularizes the code, with the mirroring logic now in its own module.
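To illustrate why East Asian Width alone falls short, here is a deliberately simplified heuristic. It is not demicode's algorithm: the function `cluster_width` is hypothetical, and it recognizes only two emoji cases, namely clusters containing the emoji variation selector U+FE0F and pairs of regional indicators. A faithful implementation would consult the emoji properties in the Unicode data files instead.

```python
import unicodedata

VARIATION_SELECTOR_16 = "\uFE0F"  # requests emoji presentation


def is_regional_indicator(ch: str) -> bool:
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF


def cluster_width(cluster: str) -> int:
    """Estimate the number of terminal columns for one grapheme cluster."""
    if not cluster:
        return 0
    # Emoji presentation makes the whole cluster two columns wide, even if
    # the base character by itself is narrow.
    if VARIATION_SELECTOR_16 in cluster:
        return 2
    # A flag is a pair of regional indicators.
    if len(cluster) == 2 and all(is_regional_indicator(ch) for ch in cluster):
        return 2
    # Otherwise fall back on the East Asian Width of the first code point.
    if unicodedata.east_asian_width(cluster[0]) in ("W", "F"):
        return 2
    return 1
```

For example, U+2764 HEAVY BLACK HEART has a neutral East Asian Width, so the fallback alone yields one column, whereas the same character followed by U+FE0F is treated as a two-column emoji.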
