Skip to content
This repository has been archived by the owner on Mar 7, 2023. It is now read-only.

Meeting 8 minutes #50

Merged
merged 1 commit into from
Mar 25, 2021
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
174 changes: 174 additions & 0 deletions minutes/meeting-20210209-8.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,174 @@
# Minutes of Meeting 8, 2021-02-09, 20:30–22:00 UTC

- Acting chair: Liang Hai
- Scribe: Simon Cozens
- Agenda: https://github.com/w3c/font-text-cg/issues/47

## Resolutions

None

## Actions

1. Drafting a proposal that can be submitted to the Unicode Consortium for consideration.
2. Liang: Start collecting test cases for segmentation/itemization.

## Attendance

Benedikt Bramböck, Bobby de Vos,\* Christopher Chapman,\* David Corbett,\* Duncan MacMichael, Erin McLaughlin,\* Erwin Denissen, Fredrick Brennan,\* Greg Hitchcock, Jany Belluz,\* John Hudson,\* Josh Hadley,\* Kamile Demir, Laurence Penney,\* Liang Hai,\* Ned Holbrook,\* Neil Patel,\* Norbert Lindenberg, Peter Constable, Renzhi Li, Sascha Brawer, Simon Cozens,\* Tamye Riggs, Vladimir Levantovsky\*

\* Formal participants of the FTCG.

## Minutes

### 1. Self-introduction of new participants

Sacha Brawer: Started Unicode test suite.

Benedikt Bramböck: Font technologist, mostly here to listen.

Kamile Demir: Just started as font designer at Adobe, here to listen.

Tamye Riggs: Executive Director ATYPI (not new...)

### 2. Review the last meeting’s minutes (note actions) [link]

### 3. Plan major agenda items for next meeting (20:30–22:00 UTC Tue 9 Mar)

### 4. Meeting 7 action 1: Discuss unicode-org/text-rendering-tests [@brawer]

Liang introduced the project, used to be a pretty well-known test suite online but at some point active development stopped. Effort to provide extensive testing. May overlap with the immediate issue we were trying to discuss, script/font segmentation/itemization. May also provide a good foundation for other tests because it’s for text rendering, everything we care about - not only segmentation. Sascha to introduce later. Prepare questions.

Saschaors, so ; 3-4 years ago, owned by Unicode. Motivation was about the unpredictability of text rendering. At the time, was working at Google I18N on text rendering/Noto. Ran into issues with platforms rendering differently, so useful to have a test suite. Putting it at Google would make it problematic for other contributors, so conclusion was to host it at Unicode.

Just a number of tests that can run on a free stack (Harfbuzz/fribidi/raqm), coretext stack, the Tehreer Stack and two Javascript implementations.

What the tests do: generates a couple of HTML files by running tests. Font variation test has documentation of expected and observed rendering, and test suite compares expected to observed and displays conformance or failure.

The test cases are defined as HTML snippets with embedded SVG and the test suite asks the stack to render the text as an SVG path. Test harness calls native implementation which converts to a path. The test suite checks if the paths are the same. Shows difference between expected and observed in IVS tests.

Roughly organised by font table. Lots of variation tests. Failures (with opentype.js!) are because implementation is pretty crappy! FreeStack (harfbuzz/etc.) mostly succeeds.

Many variations tests because at the time we were working to revive TTGX/font variations, so a lot of the corner cases that came in that development ended up being tests in the test suite. Later Behdad Esfahbod was working on the Apple AAT support for Harfbuzz, which is semi-documented - public documentation but not always complete. We did a reverse engineering effort to figure out how AAT really works, and made these test fonts part of the Unicode text rendering tests.

Most tests compare paths expected/observed. Some tests the engine doesn’t crash when faced with malicious fonts. Because OT is Turing complete, you can basically have infinite loops, and you can’t really detect that so you need to have a counter to make sure the program stops. Also AAT version of infinite loops.

Last part to show is the Noto shaping corner cases. Bugs were reported to Noto which ended up being bugs in the shaping engine, so we added test cases there.

Would be great if someone could add more test cases, maintain it, clean up things. One thing that’s needed is to test the Microsoft implementation, but nobody has the time to write the test harness. Unicode CLA, so if you send a pull request a bot will ask you to sign. Otherwise, fork and PR! If anyone wants to take over, please do.

Scope: point is to have a test suite for text rendering. Whatever you want to do, feel free?

Liang; Realised it’s not an abstract project but many tests came from Noto bug reports.

Sascha: Asked around for contributions. Adobe and MS also contributed test fonts.

Shaping is incomplete - when a bug came in I added tests.

Harfbuzz subsets fonts to primary test case and uses that; we did not do that, cleanroom tests. Rasterisation and hinting still completely missing. Not much development going on. If you want a test suite, go ahead, but you will need to think hard about how to make the test harness. Could be expanded to have more tests for other things - color fonts etc.. Clean up test harnesses.

Hosting at Unicode was a good idea - very noncontroversial for the companies. It felt like hosting it at Google may have been offensive to people, but doing the same thing at Unicode worked.

### 5. Proposal: maintain a list of text runs as OpenType shaping test cases [#41, @Lorp]

Just a feeling that, having seen the Unicode tests. Simon seemed to be suggesting that things were working well already and shouldn’t go end-to-end, and that a visual approach was not necessary. I felt for implementors, seeing visual mistakes was useful. Maybe having done the variation work it was useful to see it visually but maybe for shapers it is OK just to see that the right glyphs come out. Simon/Sascha to debate the merits? By the way, the HarfBuzz test suite incorporates the Unicode text rendering tests; HarfBuzz checks glyph positions and glyph IDs [see source code] without doing a “visual” comparison on glyph paths.

### 6. Meeting 7 action 3: Itemization [#44, @PeterConstable]

John’s proposal to define a spec for itemization. My action to follow up with Unicode Consortium to see if they would be willing to be a home for it. Some interest/openness in UTC, some questions to be fleshed out. How many in UTC able/interested in contributing? Mentioned this may bring in additional people.(UTC=Unicode Technical Committee, one of the committees within Unicode, the one responsible for Unicode Standard.)

Work of UTC divided into multiple subcommittees. Properties/algorithms subcommittee may be a reasonable home but those currently in that not working on things related to text rendering. Different subgroup may be more appropriate. Was asked that we come up with a more detailed proposal to give to Unicode about what the project would be. We talked about specifically itemization with the idea that this could broaden. UTC has a specification for Arabic mark ordering; not necessarily in terms of encoded but in terms of how glyphs are sequenced when shaping. Comments that a similar spec would be useful for other scripts (e.g. Hebrew). Some obvious other candidates for specs that such a group within UTC could work on.

Request back from Unicode leadership to work on a more complete proposal. Woudl be happy to collaborate with others on that.

In terms of the issues; sorry for not responding to query about splitting the issue - didn’t go looking for an existing issue! In #44, specifically trying to frame the scope of the itemization problem and try to come up with some parameters to characterise the problem. Some people provided comments, haven’t had a chance to respond yet. Need more specific examples of problem cases that people have encountered, either to provide more concrete examples or to help narrow the scope. There was some discussion about how itemization interacts with other kinds of segmentation that is done on text. Have started writing up some notes about how I saw different kinds of segmentation interacting. Didn’t put that into this issue as that starts to get into technical details which are beyond the scope of the charter of the group.

John: One thought was the degree to which this is/is not OT specific. My interaction with segmentation and itemization is OT specific. I want to make OT lookups and want to know whether they’re going to get processed between certain glyphs by predicting whether or not they’ll be in the same run or separate runs. My understanding from Ned is that AAT doesn’t care at all. Don’t know whether Graphite does. If we’re looking at this in the context of a pre-step for rendering, it seems to be quite OT specific and I wonder if we can confirm that is the case - whether UTC are happy coming up with an algorithm specifically for OT or if it makes sense to look at it in more generic terms.

Fred: It isn’t OT specific, at least at the implementation level. In Graphite, same segmentation that is used for other things is presented before Graphite run application. It sees the same segmented runs as other things that support OT.

Norbert: When you do rendering there is not just script segmentation, but other kinds of segmentation - line breaks, style runs, bidi, etc. All those runs provide opportunities to break the text which may be unfortunate. In Brahmic scripts, you deal with clusters and if cluster is incomplete a dotted circle is added. If that is separated from the rest of its cluster another dotted circle is added! SImilarly for John Hudson’s example of parentheses which may end up getting separated from their run. Doesn’t make sense to break out script segmentation as a separate issue. UTC already specifies other ways of segmentation - line breaking, extended grapheme clusters, etc. - should get them all specified when thinking about segmentation for shaping.

Peter: UTC does have some specifications. There isn’t one for itemization. Bidi/line breaking/clustering all done, although they may not be complete. These segmentation types do have some interactions. For example, if itemization has broken down a string into a script-specific run, then you’re not going to be able to form clusters across those boundaries because the shaping engine is where the clusters get resolved. But on the other hand, if you look at line breaking, you cannot determine line breaks until you’ve measured the text and you can’t do that until you have shaped. So shaping the entire paragraph as a single string needs to be done to measure and determine which line break opportunities are appropriate. (Unicode line breaking algorithm suggests opportunities.) So there is a logical sequencing: shape, then look for line breaking opportunities. Some questions about interaction between bidi and itemization. Will itemization or bidi handle neutral characters any differently? That’s where the interesting issues may lie.

Renzhi: Deciding linebreaking doesn’t require a shaped paragraph. Different software does this very differently. Knuth-style line breaking shapes all the different intervals of possible linebreaks. Some software uses a fetching process - you have a line, and you fetch “words” (up to a potential linebreaking point) from a text stream, shape that, and once you have enough you form the line.

Fred: Conceptual question. Segmentation currently too strict? Segmentation into scripts and not having a feature that can cross scripts too script? Example a cursive font for Japanese. If you have an ideograph and kana you could never create a ligature with those on the current segmentation model. Is the problem that it’s too strict?

Peter: That is one of the various problems. You could characterize that as: should every Unicode script be the level of granularity for segmentation, or should the kana and kanji be merged? That’s an open question. Another aspect: (discussed here in the past) after you’ve done essential shaping for a script, you may have things you want to apply across segment boundaries. Thought of having a “late pass” in shaping which is not limited to a specific script - opportunity to do cross-script contextual things / bidi level boundaries.

Norbert: One more eaxmple of linebreaking. Partly specified by Unicode can identify opportunities within a cluster! Seen clusters being broken across lines. Should be looked at together.

Bobby: When you’re talking about what script, you should consider script extensions part of Unicode - character has a primary script but some codepoints legimitately exist in multiple scripts.

Liang: Inside Unicode, Roozbeh has been envisioning a new property, a better script extension as script extensions has its own architectural limitations. Should be some ongoing drafts in his Unicode repo.

Peter: …

---

Liang: Any actionable tasks? Documenting.

**ACTION:** Drafting a proposal that can be submitted to the Unicode Consortium for consideration.

**ACTION:** Liang: Start collecting test cases for segmentation/itemization.

### 7. Actionable proposals [#36]

### 8. Stalled proposals [#9] and other proposals in the issue tracker

### 9. Informal liaison report from MPEG-OTSPEC [@vlevantovsky, list archives]

See the meeting agenda! Updated working draft has been made available for public review. Link is there if you want to look at it. Minor updates being actively worked on, various stages of discussion, PRs are open and some already closed. We have a lot of time until the next meeting so will be in good shape to finalize and submit before next meeting in April.

Liang: This is the due process, it’s your responsibility as the public to participate in standardization, don’t miss the opportunity!

### 10. Terminological glossary? [#42, @tiroj]

Passing thought based on discussions so far - terminology were being used in different ways based on where people were coming from. “Rendering” for some people very broadly to mean everything, whereas others from a rasterization background might see it just as rasterization/hinting. Wary of the idea of a glossary as is a gigantic invitation to bikeshedding! Maybe we just acknowledge that terms are used in different ways and we need to be aware that terms we use may mean different things to different people and encourage people to define as they use it. Maybe be more helpful as a community practice than a glossary in which we define all the different ways that a term is used.

Liang: Maybe just a wordlist rather than definitions?

John: Yeah, deal with it in a single issue and log it as they turn up.

Liang: We have lots of opportunities to work on documentation, but beware of bikeshedding. Can be very helpful as a way to recruit newcomers into the effort,.

### 11. UTN#11 versus OpenType Myanmar shaping [#43, @simoncozens]

Simon: Looking at Myanmar texts. Wondering what should happen in the absence of Unicode normalization. The Unicode Technical Note #11 suggests some order, but it’s not an official document. When you have to do it yourself in font, how should you do it.

Liang:

### 12. Font metrics to report blanks in CJK punctuation [#45, @frivoal]

### 13. Introduce fontFeatures 1.0 [@simoncozens]

Simon:

[slides] Intermediate representation of OTL. Substitute, position, attach, chain.

shaperLib. It’s not a complete shaper. Passes about 80% of HarfBuzz tests.

FEE (Font Engineering with Extension Language). In the status quo, the feature code doesn’t really access your font. Define glyph class with filtering glyph names, or other predicates. Can also load plugins that do complex tasks for you. [Examples.]

A shaping engine. A way to implement the language, a new way to specify OTL rules.

Renzhi: Curious why new DSL.

Fred B: This DSL already very close to Python.

David C: Can you modify glyphs?

John H: Font makers shouldn’t need to learn Python in order to do a font.

Peter C: Regarding David C’s question, it’ll be nice some day if we have only one glyph and adjust the outline (eg Myanmar sign-ra) using variation.

Laurence P: DSL vs Python, I’d like to use JavaScript instead of having a further locking down in Python.

John H: What would involve to get VOLT’s VTP into this?

Simon C: Probably for me to first understand VTP.

Peter C: [Terminology suggestions.]

Jany B: Slides to be available?