Sending both the base text and ruby text to the text-to-speech engine #9

murata2makoto · 2021-11-03T07:58:56Z

> I agree there are edge cases, and that the は example is likely to be pronounced better if the base text is sent to the text-to-speech engine. However, once the text-to-speech engines understand ruby context, I think exposing both (to be pronounced as a single instance) is likely to result in better results, not worse. Ruby-unaware speech engines should just attempt to pronounce the base text in those instances of "phonetic-optional."

I also think that sending both the base text and the phonetic ruby text to the text-to-speech engine would be useful. But I am not aware of any text-to-speech API that accepts both.

One idea is to mark up ruby with the Unicode interlinear annotation characters listed below. Text-only APIs would then be good enough.

U+FFF9 (Interlinear annotation anchor): marks start of annotated text
U+FFFA (Interlinear annotation separator): marks start of annotating character(s)
U+FFFB (Interlinear annotation terminator): marks end of annotated text

But most engines would simply ignore these characters and read aloud both the base text and the ruby text one after the other, which usually sounds very bad.
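
To make the idea concrete, here is a minimal sketch in Python. The helper names (`annotate`, `base_only`, `ruby_only`, `ignore_markers`) and the sample sentence are my own illustrations and do not come from any TTS API. It shows how base text and ruby text could travel in one plain string, what a ruby-aware consumer could extract, and what an engine that merely drops the three characters would end up reading.

```python
import re

ANCHOR = "\uFFF9"      # INTERLINEAR ANNOTATION ANCHOR
SEPARATOR = "\uFFFA"   # INTERLINEAR ANNOTATION SEPARATOR
TERMINATOR = "\uFFFB"  # INTERLINEAR ANNOTATION TERMINATOR

# One annotated run: anchor, base text, separator, ruby text, terminator.
# Nested annotations are deliberately not handled in this sketch.
_ANNOTATION = re.compile(
    "\uFFF9([^\uFFF9-\uFFFB]*)\uFFFA([^\uFFF9-\uFFFB]*)\uFFFB"
)


def annotate(base: str, ruby: str) -> str:
    """Hypothetical helper: wrap a base/ruby pair in the three characters."""
    return f"{ANCHOR}{base}{SEPARATOR}{ruby}{TERMINATOR}"


def base_only(text: str) -> str:
    """Keep the base text and drop the ruby text (ruby-unaware fallback)."""
    return _ANNOTATION.sub(r"\1", text)


def ruby_only(text: str) -> str:
    """Keep the ruby text and drop the base text."""
    return _ANNOTATION.sub(r"\2", text)


def ignore_markers(text: str) -> str:
    """What an engine does if it only skips the three characters."""
    return text.translate({0xFFF9: None, 0xFFFA: None, 0xFFFB: None})


sentence = annotate("今日", "きょう") + "は晴れ"
print(base_only(sentence))       # 今日は晴れ
print(ruby_only(sentence))       # きょうは晴れ
print(ignore_markers(sentence))  # 今日きょうは晴れ (the word is read twice)
```

The last line is the failure mode described above: the string is still plain text, but an engine that does not interpret the markers speaks the base and the reading back to back.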

cookiecrook · 2021-11-03T19:48:07Z

Yes. My suggestion above is out of context, but this possibility would require updates to 1) the Ruby spec, 2) Web Engines, 3) Speech Engines, and possibly 4) Assistive Technology like screen readers that may interface between 2 and 3.

Your suggestion could work, but would require updates to either the Speech Engine or some intermediary service.

aleventhal · 2021-11-08T16:14:53Z

Are speech engines even the right place to implement heuristics? They can lack context. For example, when the user is navigating by word or character, there is much less context. It's possible that a sentence, paragraph or even the entire document is the most useful context for applying ML.

Also, if the rules are applied at a higher level (in the browser or AT for example), then TTS APIs would not need to change.
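
As a rough sketch of what "rules at a higher level" could look like, the Python below flattens ruby segments into a plain string before anything reaches the TTS engine, so the engine API stays text-only. Everything here is hypothetical: the `Segment` shape, the function names, and especially the rule itself (use the ruby reading only when the base contains kanji), which is only a stand-in for whatever heuristic or ML model a browser or AT would really use.

```python
from typing import Callable, List, Optional, Tuple

# (base text, ruby annotation or None). Hypothetical shape, not a real API.
Segment = Tuple[str, Optional[str]]


def contains_kanji(s: str) -> bool:
    """True if any character is in the CJK Unified Ideographs block."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in s)


def prefer_ruby_for_kanji(base: str, ruby: str, context: str) -> bool:
    """Placeholder rule: trust the ruby reading only for kanji bases.

    `context` is unused here; it is passed so a smarter rule could look at
    the whole sentence, paragraph, or document.
    """
    return contains_kanji(base)


def flatten_for_tts(
    segments: List[Segment],
    rule: Callable[[str, str, str], bool] = prefer_ruby_for_kanji,
) -> str:
    """Pick base or ruby per segment and return one plain string."""
    context = "".join(base for base, _ in segments)
    parts = []
    for base, ruby in segments:
        if ruby is not None and rule(base, ruby, context):
            parts.append(ruby)
        else:
            parts.append(base)
    return "".join(parts)


print(flatten_for_tts([("今日", "きょう"), ("は晴れ", None)]))  # きょうは晴れ
print(flatten_for_tts([("は", "わ")]))                          # は
```

Because the rule is an ordinary function argument, it can be swapped without touching the engine, which is the point being made about not having to change TTS APIs.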

murata2makoto · 2021-11-11T14:31:13Z

@aleventhal

> Are speech engines even the right place to implement heuristics?

It is not clear to me how TTS engines and user agents (or other ATs) interact, and I am not aware of any documents that describe their interactions. In the Japan DAISY Consortium, we tried to write such a document (in Japanese). I admit that it is still immature, although it may contain some useful information about Japanese TTS.

aleventhal · 2021-11-11T16:16:33Z

First, the screen reader loads the entire document, or at least what it thinks is currently relevant, into its own memory space, using accessibility APIs.
As the user uses the web page, events are fired by the web browser to let the screen reader know there are changes to content. The screen reader responds to these events with more requests via the accessibility API, and updates its model.
Whenever the user's point of regard changes (e.g. focus/caret/selection), the screen reader looks at its model and builds a string to send to the text-to-speech or Braille engine, and sends it. To build this string, it utilizes not only the text/semantics at the current location, but also contextual information from the parent nodes, nearby nodes or other related nodes.
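
The flow above can be sketched as a toy model in Python. This is not how any particular screen reader or accessibility API is implemented: `Node`, `build_utterance`, and the set of roles that get announced are all invented for illustration. The sketch only shows the last step, where the string is built from the current node plus context taken from its ancestors.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Roles whose presence is worth announcing as context (an arbitrary choice).
ANNOUNCED_ROLES = ("link", "heading", "list")


@dataclass
class Node:
    """Toy stand-in for one node of the screen reader's internal model."""
    role: str
    text: str = ""
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach a child node and return it, so calls can be chained."""
        child.parent = self
        self.children.append(child)
        return child


def build_utterance(node: Node) -> str:
    """Build the string for the TTS engine from a node and its ancestors."""
    parts = [node.text]
    ancestor = node.parent
    while ancestor is not None:
        if ancestor.role in ANNOUNCED_ROLES:
            parts.append(ancestor.role)
        ancestor = ancestor.parent
    return ", ".join(p for p in parts if p)


document = Node("document")
heading = document.add(Node("heading"))
title = heading.add(Node("text", "Ruby and speech"))
print(build_utterance(title))  # Ruby and speech, heading
```

In this picture the TTS engine only ever sees the final string, while the decision about what goes into it is made with the whole model available.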

murata2makoto · 2024-05-26T07:20:30Z

The latest draft has a note:

NOTE
This option does not necessarily ignore ruby annotations. Although text-to-speech engines mainly use ruby bases, they may also use ruby annotations as a hint.

Is this good enough to close this issue?

murata2makoto self-assigned this Nov 3, 2021
