Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[css-text] Add new CSS text-transform values for math #3745

Closed
fred-wang opened this issue Mar 19, 2019 · 28 comments
Closed

[css-text] Add new CSS text-transform values for math #3745

fred-wang opened this issue Mar 19, 2019 · 28 comments
Labels
a11y-tracker Group bringing to attention of a11y, or tracked by the a11y Group but not needing response. css-text-4 i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response.

Comments

@fred-wang
Copy link

fred-wang commented Mar 19, 2019

cc @mrego @emilio @rwlbuis @bfgeek

The proposal is to introduce new values for the text-transform properties in order to map alphanumeric text to equivalent Mathematical Alphanumeric Symbols. For details, see the explainer from the MathML Refresh CG repository.

Tentative tests:
web-platform-tests/wpt#16922
All the tests pass in Igalia's chromium-mathml build.

Draft:
https://mathml-refresh.github.io/mathml-core/#new-text-transform-values

@fred-wang fred-wang changed the title [css-text] Add new CSS text-transform values for math transforms [css-text] Add new CSS text-transform values for math Mar 19, 2019
@bkardell
Copy link
Contributor

After discussing this with @fred-wang and @mrego, I think that we agree that we probably want the values list to be just

none | [capitalize | uppercase | lowercase ] || full-width || full-size-kana | math-auto | math-bold | math-italic | math-bold-italic | math-double-struck | math-bold-fraktur | math-script | math-bold-script | math-fraktur | math-sans-serif | math-bold-sans-serif | math-sans-serif-italic | math-sans-serif-bold-italic | math-monospace | math-initial | math-tailed | math-looped | math-stretched

That's more in keeping with what is there today, it's one less character to type, and it doesn't get us into a whole host of other questions about why is it a function, what does the function produce and so on. Ultimately this is really just a name that identifies the transform, I think, and unless we want to start tackling the thing at the end of section 2

Note: A future level of CSS may introduce the ability to create custom mapping tables for less common text transforms, such as by an @text-transform rule similar to @counter-style from

And try explaining these in terms of those then I don't think there is a need for more. But, I also don't think that's necessary or maybe even desirable because of what/where this sits?

@AmeliaBR
Copy link
Contributor

AmeliaBR commented Mar 20, 2019

I think this is a good idea. In addition to their use in math typesetting, these characters have gained some notoriety as faux-fonts & it would be nice to have a way to use them without destroying the meaning of the text content.

(It wouldn't get rid of the underlying reason they became popular: social media that supports full Unicode but not rich text formatting. But, it would mean that a conscientious web developer working an a CSS environment could recreate that visual effect in an accessible way.)

I agree with Brian about using keywords instead of a function notation, since the math() function in the proposal seems to be merely a namespacing technique, and doesn't imply an operation

Some other thoughts:

  1. Can the auto value be handled by user agent stylesheets? We don't really have a pseudoclass for "has no element children", but that seems to be all you want: math|mi:no-children { text-transform: math-italic }

  2. If we're considering the general/decorative use case, and not just math typesetting, there are a few other conversions that I'd include:

    • circled for ASCII letters and digits (ⒶⒷⒸⓐⓑⓒ①②③) and maybe some punctuation (⊜⊛) and Hangul, katakana, and CJK ideographs (I don't know enough about what the circled CJK characters mean/how they are used to know if it makes sense as a stylistic transform)

    • circled-black or circled-negative for digits and capital ASCII letters (🅐🅑🅒❶❷❸)

    • squared (🄰🄱🄲) and squared-black or squared-negative (🅰🅱🅲) for capital ASCII letters (Note: would have to deal with browsers that upgrade some of the negative squared letters to "blood type marker emoji")

    Both the circled and squared transformations could be made more general if we accept that a text-transform can consist of adding a combining character after each character in the sequence.

  3. Should there be any Unicode normalization happening before the transformation? E.g., to decompose é into e +´ so that the letter can be transformed to a math formatting character and then the accent combined with the replacement?

  4. Based on the grammar, text-transform: capitalize math-fraktur would be allowed. It should probably be clear what the order of operations is, because there are a few case conversions that switch letters in/out of the ASCII range and therefore affect whether there are formatting conversions available.

  5. I think the grammar Brian posted should be modified to include square brackets around full-width || full-size-kana, otherwise it also allows text-transform: full-width math-fraktur. I don't think full-width variants of the math characters exist, at least not as dedicated Unicode points.

@frivoal
Copy link
Collaborator

frivoal commented Mar 20, 2019

unless we want to start tackling the thing at the end of section 2

Note: A future level of CSS may introduce the ability to create custom mapping tables for less common text transforms, such as by an @text-transform rule similar to @counter-style from

I did have a go at it a while back. If someone wants to revive / refine / debug this, let me know: https://specs.rivoal.net/css-custom-tt/

@fred-wang
Copy link
Author

I think this is a good idea. In addition to their use in math typesetting, these characters have gained some notoriety as faux-fonts & it would be nice to have a way to use them without destroying the meaning of the text content.

(It wouldn't get rid of the underlying reason they became popular: social media that supports full Unicode but not rich text formatting. But, it would mean that a conscientious web developer working an a CSS environment could recreate that visual effect in an accessible way.)

Thanks for the feedback!

About keeping the meaning of the text content, from the math point of view, a A with fraktur style should have the same meaning as the transformed character MATHEMATICAL FRAKTUR CAPITAL A' (U+1D504). Or at least a11y tools should find some way to indicate that it is a fraktur A and not a normal A. Not sure if that's the case currently for text-transform, see for example https://groups.google.com/a/chromium.org/forum/#!msg/chromium-accessibility/enk1PBjEfRc/InWurOYUg0EJ

I agree with Brian about using keywords instead of a function notation, since the math() function in the proposal seems to be merely a namespacing technique, and doesn't imply an operation

Yes, sorry that was a misunderstanding on my side. I thought the CSS WG had requested to change my initial proposal to use math( ) but that was just some random idea from @rego. I'm happy with any solution that makes implementation easy.

1. Can the `auto` value be handled by user agent stylesheets? We don't really have a pseudoclass for "has no element children", but that seems to be all you want: `math|mi:no-children { text-transform: math-italic }`

AFAIK not with current CSS capability. It's not "mi with no children" but "mi with a unique text node which itself has a single character" (e.g. <mi>x</mi> is italic but <mi>sin</mi> is not). Gecko handles that outside the CSS engine (which does not know about the DOM) by adding some special flag on the text node.

3. Should there be any Unicode normalization happening before the transformation? E.g., to decompose é into e +´ so that the letter can be transformed to a math formatting character and then the accent combined with the replacement?

I personally don't know math use cases for that. At least for MathML, accents (https://en.wikibooks.org/wiki/LaTeX/Special_Characters#Math_mode) are generally placed in a separate <mo> elements.

4. Based on the grammar, `text-transform: capitalize math-fraktur` would be allowed. It should probably be clear what the order of operations is, because there are a few case conversions that switch letters in/out of the ASCII range and therefore affect whether there are formatting conversions available.

5. I think the grammar Brian posted should be modified to include square brackets around `full-width || full-size-kana`, otherwise it also allows `text-transform: full-width math-fraktur`. I don't think full-width variants of the math characters exist, at least not as dedicated Unicode points.

I think Brian's grammar does not allow mixing math-* values with others (if I understand correctly https://drafts.csswg.org/css-values-4/#component-combinators). That was the initial intention since I'm not sure there is a clear use case for that (again from the math point of view).

However, I agree that mixing math-* values with capitalize, uppercase or lowercase has obvious interpretation and could be acceptable, at least for latin letters. If so, I believe the grammar should be something like

none | [capitalize | uppercase | lowercase ] || full-width || full-size-kana | [capitalize | uppercase | lowercase ] || [ math-auto | math-bold | math-italic | math-bold-italic | math-double-struck | math-bold-fraktur | math-script | math-bold-script | math-fraktur | math-sans-serif | math-bold-sans-serif | math-sans-serif-italic | math-sans-serif-bold-italic | math-monospace | math-initial | math-tailed | math-looped | math-stretched ]

But maybe we just want to allow to mix everything and ignore transforms that don't make sense, so that the grammar is just

none | [capitalize | uppercase | lowercase ] || full-width || full-size-kana || [ math-auto | math-bold | math-italic | math-bold-italic | math-double-struck | math-bold-fraktur | math-script | math-bold-script | math-fraktur | math-sans-serif | math-bold-sans-serif | math-sans-serif-italic | math-sans-serif-bold-italic | math-monospace | math-initial | math-tailed | math-looped | math-stretched ]

@AmeliaBR
Copy link
Contributor

About keeping the meaning of the text content, from the math point of view, a A with fraktur style should have the same meaning as the transformed character MATHEMATICAL FRAKTUR CAPITAL A' (U+1D504). Or at least a11y tools should find some way to indicate that it is a fraktur A and not a normal A.

That's a good point, and is different from how we want assistive text to expose text-transform changes to case. It doesn't mean it isn't do-able, it just means that the accessible version of the transformed text needs to be clearly defined. You'd want to get input from people who work on tools that expose math to screen readers and Braille display and so on, to find out how they implement it.

@joanmarie
Copy link

That's a good point, and is different from how we want assistive text to expose text-transform changes to case. It doesn't mean it isn't do-able, it just means that the accessible version of the transformed text needs to be clearly defined. You'd want to get input from people who work on tools that expose math to screen readers and Braille display and so on, to find out how they implement it.

As a screen-reader developer, my preference is that the transformed/rendered text is what gets exposed via the platform accessibility APIs. For instance if the DOM text is "foo", but that gets CSS transformed into rendered text of "FOO", atk_text_get_text(0, -1) (and all other AtkText text-retrieval methods) should return "FOO"; not "foo".

Screen readers, as a general rule, want to present the content sighted users see. Exactly how it got rendered is, more often than not, irrelevant.* Furthermore, it's a drag for an AT to have to take every single bit of accessible text, retrieve some additional property "just in case", and occasionally do additional work to transform it. The render tree already contains the final/displayed version, so please share that with me. 😄

*But if any ATs might have a need, exposing the transform property as an attribute on the accessible object shouldn't hurt anything and should be relatively easy to implement within the user agent.

@frivoal
Copy link
Collaborator

frivoal commented Mar 25, 2019

@joanmarie, I am genuiely curious, can you explain a bit more about why you'd want the transform to be preserved in general? How do screen readers read words in all uppercase? For instance, if text-transform:uppercase is applied to a first line, how does that get read out?

@joanmarie
Copy link

@joanmarie, I am genuiely curious, can you explain a bit more about why you'd want the transform to be preserved in general?

Because one of the primary functions of a screen reader is to present (in speech and in braille) what is on the screen. If sighted users see capital letters and blind users see lower-case letters, the latter group is missing out on information. And at least some of them -- if not many of them -- would report it as a bug in the screen reader.

How do screen readers read words in all uppercase? For instance, if text-transform:uppercase is applied to a first line, how does that get read out?

That depends on two things: 1) The screen reader and 2) User preference. For instance some screen readers have options like:

  • Use higher pitch
  • Play a tone
  • Say "cap"

The user can then pick which option(s) work best, and often configure the options on a per-app basis.

For the braille display, the indication would be based on the braille code being used. For instance, in the U.S. each capitalized word would be preceded by two dot-6 cells.

@AmeliaBR
Copy link
Contributor

Thanks for the insights, @joanmarie. This is very different from the "common sense" (but I guess outdated) advice I'd been given — that one should always use text-transform: uppercase instead of using all caps in the text content because all caps text content is often read out as initials instead of words.

For the math use case, it does greatly simplify things if the proper accessibility practice is to expose the formatted text.

It doesn't help the accessibility issues of decorative-characters, though, but I think that's beside the point, since this is already an accessibility issue without text-transform. Misuse of math characters has become so common on social media that screen readers probably need to add smart heuristics so they can read out 𝕸𝖆𝖙𝖍 in a way similar to how they'd announce Math or MATH.

@joanmarie
Copy link

FWIW, user agents are already exposing the rendered (transformed) text. Below are three screenshots, the first from Firefox Nightly, the second from Epiphany (WebKitGtk), and the third a locally-built Chrome. All three are from Linux. I can grab macOS next in case that would be helpful.

In the meantime, what you'll see in each is that I've:

  • Loaded data:text/html,<div style="text-transform: uppercase;">foo</div>
  • As a result of the above, I'm seeing a rendered all-caps version of the text, namely "FOO"
  • Using Accerciser's iPython console, I've used the Python bindings for AT-SPI2, to retrieve all of the accessible text for the accessible object backing the div whose text has been transformed. The command happens to be acc.queryText().getText(0, -1). What I get (and what the screen reader Orca gets) is the all-caps, rendered text, namely "FOO"

So the good news is that I'm already getting what I'd expect/want. 😄 Assuming this is the same on other platforms (my guess is that it is, with one caveat I'll come back to momentarily), it really doesn't matter if you type a literal all-caps or use CSS to transform it into all caps -- as far as the end user is concerned -- because the end user will wind up with the same result regardless of whether or not he/she happens to be sighted.

The caveat I mentioned is this: Some (Windows) screen readers historically have had virtual buffers or other recreations of the content. For any that still do: If they are grabbing the text from the DOM rather than from the accessibility tree, then one might indeed see differences in what the end user gets.

gecko-uppercase-transform
wk-uppercase-transform
blink-uppercase-transform

@joanmarie
Copy link

Update: On the Mac, I just performed the same experiment using Chrome Canary and Safari Technology Preview. And I got the same results, namely the accessible value obtained by inspecting the element with XCode's Accessibility Inspector is the all-caps, rendered "FOO."

@frivoal
Copy link
Collaborator

frivoal commented Mar 26, 2019

@joanmarie (disclaimer: you clearly know what you're talking about. I am not dismissing your expertise, but trying to understand it, as it doesn't quite mesh with my own sense of how this is supposed to work).

This is a bit counter intuitive to me, as I'd expect an upper-casing text transform to be the stylistic in the same way as a font-feature or a color change would be, and therefore that it should not be preserved, to avoid having ::first-line { text-transform: uppercase} result in the first line being read in a screaming voice or spelled out (neither of which seem the intended effect), or to avoid having 𝕸𝖆𝖙𝖍 being read as "mathematical bold fraktur capital M, mathematical bold fraktur small a, mathematical bold fraktur small t, mathematical bold fraktur small h" instead of as "math". This is not to say that this styling information should not be available to a user of the screen reader in some form (but then I'd expect similar announcements for text-transform: uppercase and font-variant-caps: all-small-caps to be announced in a similar way, or for text-transform: math-bold-fraktur and font-family: "Fraktur Bold"), but when reading "just the text", I'd expect the transform not to be applied. If it's semantically important, shouldn't it be in the source?

Also, only having access to the post-transform text seems to play poorly of the i18n oriented values:

  • It breaks text-transform: full-width. Using it to display "IBM" as "IBM" can be desirable in Chinese / Japanese / Korean text (particularly but not only in vertical text), but that doesn't mean that it should be read aloud as "full width Latin capital letter I, full width Latin capital letter B, full width Latin capital letter M". I still want to hear IBM. It's even worse if the word isn't an accronym to begin with. Maybe the Text-to-speech engine can be smart enough to know how to read words regardless of which character variant is being used, but short of reversing the transform (but then why have it applied in the first place), I expect it will, at least occasionally, run into things it doesn't know how to say other than by enumerating the Unicode character names. That sounds bad.
  • It defeats the point of text-transform: full-size-kana: "りょ" and "りよ" are different (the first is ryo, the second is riyo), but at the very small font sizes sometimes used in ruby, using the smaller ょ in りょ can make the text hard to see, and so authors sometimes want to display りよ instead of りょ despite the difference. The whole point of doing it using text-transform: full-size-kana rather than by hard-coding りよ into the DOM is so that screen readers can read it as ryo instead of riyo. For example, if the ruby for 無料 (which means free / costless) is encoded (as it should) as むりょう, that is read is muryō, and all is understandable. If it is encoded (or rendered into) むりよう/muriyō, this could be misinterpreted as 無利用, which means "useless". For a sighted reader, the potential confusion is probably preferable to the letter being too small to read, and they would see the original ideographs next to it, clearing things up if they can read them. But for someone going through a screen reader, it could be pretty confusing. If the screen reader reads both the original text and the ruby, you still hear "free and useless", and if it sustitutes the ruby for the original text, you would just hear "useless", instead of the intended "free (free)".

All in all, I'd definitely expect the styling information to be available to the screen reader so that it can do something useful with it and inform the user of notable styling features that are deemed relevant (such as, as you mentioned, playing some tone, changing the voice pitch, displaying two dot-6 cells on a braille display, or whatever is appropriate), but I'd expect the text itself to be the untransformed text. If we don't have that, I don't really understand what the point of having text-transform in CSS instead of doing some server side preprocessing change to the document content itself (or the same DOM change in js/react/vue/… for people who're more fashionable than me).

@frivoal frivoal added i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response. a11y labels Mar 26, 2019
@frivoal
Copy link
Collaborator

frivoal commented Mar 26, 2019

Also, picking form the explainer:

Mathematical formulae rely on mathematical alphanumeric symbols to provide specific meanings. For example, the Planck constant is denoted by an italic h, vectors in physics are often written as bold variables[...]

That seems like an argument for having it in markup rather than via CSS. If it's in CSS, this nformation would get discarded as stylistic for example when displaying in reader mode, or when the same content is picked up by a web crawler…

@fred-wang
Copy link
Author

Exposing 𝕸, as "M" rather than "mathematical bold fraktur capital M" is wrong. It's very common in math text to use the same letter with different "style" to convey different meanings, so we really want all users to be able to distinguish them. That was the rationale for proposing these new characters to the Unicode standard and the reason why they were accepted. Arguably, people who rely on these in social networks for purely stylistic purpose are doing misappropriate use, they should rather keep the original ASCII letters with a different font (which as said above is not easy/possible in social networks).

This text-transform proposal is based on the MathML @mathvariant attribute. Originally, the idea for this MathML attribute was to help systems that do not support non-BMP characters very well while still trying to preserve the semantic. It turned out that it is also useful for bottom-to-top LaTeX-to-MathML converters (e.g. fast ones generated LALR parsers) or other MathML generators. The MathML CG has been discussing deprecating/removing this attribute (since it actually duplicates direct use of transformed character) but unfortunately it seems it is still widely used ( w3c/mathml#77 ). Hence we still need one implementation.

For browser implementation, experience has shown that we want to implement mathvariant by relying on CSS inheritance (rather than implementing a MathML-specific inheritance like in WebKit or older Gecko versions) and to do the character transform at the same place as text-transform (otherwise we get limited and hacky support like in WebKit or older Gecko versions). Since WebKit (and Chromium?) does not have this notion of "internal" CSS property and since text-transform is very similar, this proposal tries to rely on this existing support to make implementation easy.

To make things more complicate, there is this "math-auto" value to render variables with a math italic (that is visually different from CSS italic) and is provided by math fonts in the math italic unicode range. This is instrumental to get beautiful rendering of math equations but does not really convey semantics, so should not be exposed to assistive technologies so the spec should probably mention it. Authors won't use the transformed characters in that case.

I guess in any case the spec should be amended to say how the characters are exposed to assistive technologies. math and Asian examples mentioned above suggests it could depend on text-transform values.

@Crissov
Copy link
Contributor

Crissov commented Mar 26, 2019

In school and university, I was exposed to several approaches to styling vectors, all of them conventionally using a single lowercase roman letter:

  • rightwards arrow above
  • underlined, horizontal line below
  • broken, blackletter: handwritten Sütterlin or printed Fraktur
  • italic (print only)
  • no special styling at all (but, like scalar variables, always lowercase, while matrices use uppercase letters)

There are probably other customs as well.

As an author, I would expect to have all of these choices in a single location or in a unicorn way:

  • hard-coding the character code point(s), perhaps with character entity references
  • choosing a dedicated markup element type or attribute value for each instance, e. g. <var font=blackletter>
  • defining a central style for certain semantic markup like <vector>

CSS is obviously hardly concerned with the first of these three options and best suited for the third one. It can be used as the (overridable) backbone for the second variant as well.

The first variant, Unicode, has all cases covered, either using the infamous math alphabets from SMP or combining diacritics like U+20D7 and U+0331.
The second variant very much depends on capabilities of the markup language, CSS-wise it is identical to the third variant, just using different selectors.
The final variant, which is on topic here, lacks support for adding an arrow above. Underlines are considered text decoration, casing is text transformation, italic face a font variant and blackletter a font family.
This feature request, if accepted by the WG, seeks to unify the final three styles inside text-transform, while underlines remain in a separate property (which is intended for runs of characters, not isolated ones) and arrows would still not be possible at all (as far as I know).

I guess that makes sense, but also constitutes an argument to make text-transform even more powerful – as in @frivoalʼs stale proposal. #3132

@fantasai
Copy link
Collaborator

fantasai commented Mar 26, 2019

Wrt accessibility, agreed 100% with @frivoal's comment in #3745 (comment) The entire point of 'text-transform' is to alter visual styling without changing the underlying content as it is read aloud or presented in alternative ways (e.g. alternate styling, reader mode, etc.). If AT is exposing these transformations when reading text aloud, this means we have no way to make these visual transformations without distorting the text for AT. I think @frivoal's examples of ::first-line capitalization and full-size-kana show why dissociating the visual and AT texts is sometimes appropriate and useful.

Currently text-transform is defined to affect rendering only. https://www.w3.org/TR/css-text-3/#text-transform-property It does not change the underlying document semantics. It does not affect plaintext copy/paste (and imo should not affect speech rendering). Using text-transform, like using order, is about intentionally creating a discrepency between the underlying text and the visual rendering, because the ideal visual rendering, if reflected in the document content, would inappropriately distort that content. If the underlying content should be affected, then the underlying content should be changed.

Which brings us back to the original point: should we add these values to better support MathML. I'm not entirely clear on the problem we're trying to solve here and why this would be the right approach. If math wants to shift visual styling to better reflect semantics already in the document, that would be inherent even if CSS is turned off, then that's one thing. But if this request is because it is more convenient to do the transformation in CSS than in a preprocessor even though it should be done in a preprocessor, then I don't think that's a good justification for adding it to CSS. (If the concern is just about fallbacks between glyphs assigned to these math Unicode codepoints vs font variations, we can talk about how to solve that at the glyph selection level.) Are there fundamental use cases other than supporting the mathvariant attribute that we need to solve?

@jcsteh
Copy link

jcsteh commented Mar 27, 2019

Regarding accessibility, the line between content, semantics and styling is not as clear-cut as one might hope. Puristically, it's tempting to say that CSS is purely for styling and nothing else. In reality, that isn't entirely true any more in the real world.

Two examples which demonstrate this clearly:

  1. If we say that CSS is purely about styling, then display: inline vs display: block would be irrelevant to accessibility. In reality, that isn't true. Whether several elements are rendered on the same line (e.g. a horizontal navigation bar or a credit card number input divided into 4 fields) can be just as useful/relevant for screen reader users as any other user.
  2. If we say that CSS is purely about styling, then :before and :after content, list-style, etc. would not be exposed to AT users. In reality, this information can be very helpful (if not essential) to the user.

On the flip side, we've come to realise that we do not want to reflect CSS display: block on tables, display: contents, etc. for accessibility.

text-transform is a grey area, IMO. On one hand, it can cause weirdness for speech. On the other, if someone chose to render something a certain way visually in a particular view of the text, they clearly did that for a reason. Braille in particular tries to provide a mapping to prevalent visual text features. Braille has indicators for capitalisation, bold, italics, etc. You'll note that these are not called emphasis, strong emphasis, etc.; they are not purely semantic. When we talk about accessibility, we cannot think about speech alone.

Finally, there's a technical concern. We absolutely must support things like :before/:after content, list-style, etc. In order to do that, we need the rendered text, unless we want to duplicate that rendering in accessibility code, which opens up a world of mapping (e.g. text offsets) and sustainability problems. That means text-transform ends up coming along for the ride whether we like it or not.

@frivoal
Copy link
Collaborator

frivoal commented Mar 27, 2019

Exposing 𝕸, as "M" rather than "mathematical bold fraktur capital M" is wrong. It's very common in math text to use the same letter with different "style" to convey different meanings, so we really want all users to be able to distinguish them.

In that case, the difference should be preserved even when css isn't applied, and so it should be in the document itself, not in its CSS. Removing css from a document is a thing that happens (bad network, reader mode, search engine crawler, same data in an RSS feed…), and it should not modify document semantics. Authors should not rely on css for document semantics.

So the question is not whether "𝕸" should be read as "M", but whether "M" should be read as "M" when it's been styled to look like "𝕸". When authors actually mean 𝕸 and not "M styled to look like 𝕸", they should put 𝕸 in the document.

Just because CSS has selectors and selectors are a convenient generic mechanism doesn't make CSS into the right mechanism for solving all kinds of problems.

text-transform is a grey area, IMO. On one hand, it can cause weirdness for speech. On the other, if someone chose to render something a certain way visually in a particular view of the text, they clearly did that for a reason. Braille in particular tries to provide a mapping to prevalent visual text features. Braille has indicators for capitalisation, bold, italics, etc. You'll note that these are not called emphasis, strong emphasis, etc.; they are not purely semantic. When we talk about accessibility, we cannot think about speech alone.

To me, this just means that braille terminals are capable of applying some styling, so they should look at CSS to decide what styling to apply.

The point is not that styling is useless. It certainly isn't, and assistive technologies should style things to the extent they can given their particular constraints. But things that must remain even when styling is dropped should be in the document, not in css.

Puristically, it's tempting to say that CSS is purely for styling and nothing else. In reality, that isn't entirely true any more in the real world.

I'm not a purist about this. I think the line between style and not style is sometimes fuzzy, particularly around UI/UX things. But "if you don't apply this bit of CSS, the meaning of the document changes and becomes wrong or misleading" is a very strong sign that CSS is the wrong tool for the job.

"It's hard to type" is an argument for a better IME or a better processor. Not for new features in CSS.

@jcsteh
Copy link

jcsteh commented Mar 27, 2019

To me, this just means that braille terminals are capable of applying some styling, so they should look at CSS to decide what styling to apply.

That's what happens now. Braille terminals are not a browser. They are controlled by screen readers, which in turn get their information from browser accessibility APIs. And those APIs currently expose "text content". As explained in my previous comment, we expose "rendered" text because of :before, :after, list-style, etc.

@fred-wang
Copy link
Author

Thank you everybody for the feedback.

First let me do a small digression as I'd like to clarify the somewhat separate MathML thing, which is the main motivation for this proposal:

  1. Both mathvariant or using the transformed character will be preserved on MathML formulas (and their containing document). So it's still correct to use the mathvariant attribute, even if one could just using the transformed character.

  2. In practice there are still a lot of existing tools/content using the mathvariant attribute so we are not going to just get rid of it, we need a native implementation. This is used by e.g. all MathJax or Wikipedia pages or those generated by the bottom-to-top itex2MML converter (which has been there for 20 years).

  3. Existing math fonts (including Cambria Math pre-installed on Windows or the popular STIX fonts) don't provide glyph-level selection for these mathvariant transforms. Instead it is expected that math rendering engines (modern LaTeX, browsers...) will just know how to perform the transform to the Math Alphanumeric Symbols. Again, we cannot just expect the fonts to be modified and updated so we need to do something at the browser level.

  4. Experience for native MathML implementations showed that appropriate way to implement mstyle attributes is to rely on CSS properties rather than duplicating the mechanism somewhere (that's how it used to be done in Firefox and caused some issues). In the case of mathvariant, the feature is very similar to text-transform so the most natural thing is to rely on it. Some Apple reviewer commented that WebKit does not allow "internal" CSS properties so we have a hacky/partial implementation until we can standardize a CSS property for mathvariant.

  5. Experience in browsers/assistive technologies by @joanmarie to make mathvariant accessible showed that if text-transform does not expose the transformed text we would need to perform the mathvariant transform somewhere else. This is basically the general problem @jcsteh described.

Now coming back to the CSS issue, I thought extending text-transform for these existing math characters is sensible and would be easy to implement and test (do everybody agree with that?). Igalia offers to implement it and we already have some similar tests for mathvariant, and it is important for native MathML implementations as I previously explained. In addition, it could be useful for people implementing "math polyfill" e.g. using some CSS stylesheet with mstyle[mathvariant="double-struck"] { text-transform: math-double-struck; } is much more efficient and lightweight than a script that goes over all the DOM and modifies the text nodes to transform characters. Of course, nobody uses text-transform for such use case yet, so it shouldn't be a concern for now, contrary to the MathML use case.

Regarding the a11y issue, I believe a11y and CSS people should first agree on whether to expose the text-transform text as what the CSS spec says and the actual a11y implementations seem to differ. I don't see any consensus on this thread or https://groups.google.com/a/chromium.org/forum/#!msg/chromium-accessibility/enk1PBjEfRc/InWurOYUg0EJ. Probably this can be discussed in a separate GitHub issue and we can still add these math text-transforms in the meantime. Depending on the outcome, we would either have nothing to do to make mathvariant accessible (my personal preference) or keep doing the extra effort that we experimented in (5).

@AmeliaBR
Copy link
Contributor

On the question of whether semantically essential styling should be in the document (@frivoal's comments), I think a good comparison is HTML dir vs CSS direction. We recommend that authors use the attribute to keep the semantics in the document, but the CSS property exists so that the same effect can be achieved with other content languages (e.g., it maps to a presentation attribute in SVG), and regardless of the markup used, it gets mapped into the same text processing pathways.

So if MathML supports ASCII text that gets upgraded to mathematical formatting using an attribute, that is still preserving the semantics in the document. But if it removes some of the "magic" of the rendering to describe the effect of that attribute through a CSS style, then keeping semantics in the document shouldn't be an objection to the style rule.

@AmeliaBR
Copy link
Contributor

FYI to all, @bkardell has started a dedicated thread to discuss text-transform and accessibility, beyond just the math proposal: #3775

@asmusf
Copy link

asmusf commented Mar 30, 2019

What is essential is that different types of variables, e.g. scalars and vectors, be typographically distinct. However, the actual choice of appearance is a matter of style. Someone gave this example already: vectors in particular can be presented in many ways.

That makes a text-transform a reasonable tool in defining how to render certain constructs in different environments (different documents).

For accessibility, what should be passed on to the screen reader would ideally be the semantic value (e.g. vector) and not the raw choice of rendered value. Certainly, it would be more useful to hear "vector a" than "mathematical bold small a".

That would presuppose that styling is associated in a logical way with semantic information somewhere, e.g. via a class=vector, perhaps.

@briansmith
Copy link

math-auto

math-auto is proposed here because there is no CSS selector for the text content of an element. However, there are other areas in MathML and elsewhere where we need similar behavior. For example, deciding how much space to render before/after a math operator <mo> also depends on the text inside the element. It would be better to create some new minimal CSS selector that would allow us to match on the text content, and then implement the auto behavior in pure CSS using that selector. So, I think it would be good to conditionally accept the proposal as is, but also open an issue about replacing the math-auto value with a more general mechanism.

Ideally browsers' "Find in Page" feature would allow searching for the text pre-transform and post-transform. That way, for example, I could search for the variable 𝒙 on the page by searching for either "x" or "𝒙". Firefox's MathML implementation does this, for example.

@css-meeting-bot
Copy link
Member

The CSS Working Group just discussed (css-text) Add new CSS text-transform values for math.

The full IRC log of that discussion <emilio> Topic: (css-text) Add new CSS text-transform values for math
<emilio> Github: https://github.com//issues/3745
<emilio> fantasai: I really think this should be done with i18n experts
<emilio> ... probably at tpac
<emilio> Rossen: who added it to the agenda?
<emilio> bkardell_: I did
<emilio> Rossen: are we ready to discuss this?
<emilio> bkardell_: there were very strong objections when I added it to the agenda but it seems that has been solved
<emilio> bkardell_: I'd like to make some progress on that issue
<emilio> astearns: what's the points you're talking about?
<emilio> bkardell_: the philosophy of text-transforms and how it interacts with a11y
<AmeliaBR> q+
<iank_> q+
<emilio> bkardell_: we have a new transform that prob. needs to do something different
<emilio> ... the are at least two answers to the questions
<emilio> florian: I think that's why we need other experts on the discussions
<emilio> ... because whether text-transform affects the semantics is complicated
<emilio> fantasai: I think text-transform, if we're stuck for compat with a11y for capitalization that's unfortunate, but we shouldn't introduce the same for math
<fantasai> https://github.com//issues/3745#issuecomment-478206009
<emilio> florian: once we have the markup communicate the semantics, we probably want text-transform to affect some of the presentation but not a11y
<emilio> bkardell_: my understanding is that we'd encourage putting it in the markup, but there's a bunch of legacy content which is what would make this necessary
<emilio> AmeliaBR: I think florian's point is that you could have a presentation transform but the accessibility stuff should be in the markup
<emilio> bkardell_: I'd like to get some kind of checkpoint on this
<emilio> AmeliaBR: the question is: "for math-variant to be exposed and accessible, does it need to be exposed through text-transform transforming the text-content, or via an attribute that gets passed to the a11y tree, which _also_ gets a UA rule to make the presentation change'
<emilio> fantasai: I think that's the right way to go, and gives you the ability to give more accurate info to a11y that doing unicode transformation
<emilio> ... unicode transformation doesn't contain math semantics
<emilio> ... that needs to be in the markup, if that information is in the content it should be given to a11y that way, rather than via text-transform
<emilio> ... a11y doesn't say "Bold", it uses the fact that you're on <strong>
<emilio> AmeliaBR: actually that's not true, a11y is trying to introduce roles for <strong> and similar, authors don't distinguish <i> or <em>
<emilio> ... if there's an italic <span> in a header it'd be called out depending on the verbosity level
<emilio> Rossen: it calls out format stops
<AmeliaBR> s/a11y/ARIA WG/
<emilio> bkardell_: there's a similarity here with html's kinda-weak semantics, a11y tools get mostly plain text with other hints, for math a11y my understanding is that some tools use the unicode information
<emilio> Rossen: we don't support mathml in Edge
<emilio> bkardell_: well, Edge does parse mathml as XML content
<emilio> ... which is why mathml is a kinda weird place
<emilio> Rossen: [describes how a11y works in Edge and how screen readers can poke at the DOM / render tree in non-Edge (Chromium / Firefox)]
<emilio> Rossen: Edge doesn't allow third-party processes in its process
<emilio> bkardell_: please fact-check me :)
<emilio> florian: I think this is a topic for TPAC, since different a11y experts have diverging oppinions
<emilio> Rossen: are we going to make any progress?
<Rossen_> q?
<emilio> bkardell_: do we have some specific questions/
<emilio> *?
<astearns> ack AmeliaBR
<astearns> ack iank_
<emilio> AmeliaBR: I think my point was previously said, so to wrap up: We're saying that the specific proposal is to take it back to see if we can decouple the style and the markup a11y, and defer for TPAC?
<emilio> bkardell_: yeah
<Rossen_> ack iank_
<emilio> iank_: seems like the people objecting about text-transform is just about a11y, is that correct?
<emilio> AmeliaBR: yeah, it's about whether it sees the characters for a11y before or after transform
<emilio> florian: it's not just about a11y but more generally the semantics, like what would happen in reader mode and such?
<emilio> iank_: another point is that mathml has a lot of legacy content, and mathml does this in a very magic way. We don't want to get in a situation where other math libraries cannot opt-in to this power
<emilio> Rossen_: let's stop here for now

@frivoal
Copy link
Collaborator

frivoal commented Jun 8, 2019

This point relates to the comment @tabatkins made in #3746 (comment) If we're in situation (1), then this isn't needed, but it is if we're in the others.

Personally, my feeling is that the mathml mathvariant attribute should continue to be the main way that such variant letters are selected, and this attribute should be the thing that causes the accessibility tree to carry the transformed variant. In addition, it seems reasonable to also have the text-transform values, but these would not cause the text in the AT to be transformed (although there could/should an annotation in the AT indicating that the text has been transformed, separately from the text itself). These text transforms could certainly be used in the UA stylesheet as the mean by which the visual rendition is achieved though.

This way:

  • alternative views, such as reader mode, would show the correct glyphs despide dropping all author css
  • alternative uses of these transforms (such as playful use as often seen in social media user names) could still be read as the original text (or be read out as transformed, if the screenreader so desires, using the metadata)

If we wish to generalize the ability to character transform that affect the AT (and readermode) outside of mathml, what we should do is generalize the mathvariant attribute and allow it on arbitrary markup.

But again, this depends on the answer to #3746 (comment)

@plehegar plehegar added the a11y-tracker Group bringing to attention of a11y, or tracked by the a11y Group but not needing response. label Feb 15, 2020
@fred-wang
Copy link
Author

@bkardell It looks like this can be closed in favor of #5386 ?

@astearns
Copy link
Member

Sounds good. The new issue links to this discussion so we're not losing it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
a11y-tracker Group bringing to attention of a11y, or tracked by the a11y Group but not needing response. css-text-4 i18n-tracker Group bringing to attention of Internationalization, or tracked by i18n but not needing response.
Projects
None yet
Development

No branches or pull requests