Heuristic for distinguishing phonetic and non-phonetic ruby? #8

murata2makoto · 2021-11-03T07:47:00Z

@aleventhal wrote here:

> Finally, I would like to know what the possibilities are for a heuristic that detects the note/complementary situation. Can we get an evaluation on how accurate that could be?

Such a heuristic is possible, but I do not believe that it would be very reliable. For example, manga and light-novel authors often invent unconventional readings of kanji characters for character names. 不死川 (Shinazugawa) is a good example. It is now widely recognized thanks to the commercial success of Demon Slayer, but it is simply impossible to enumerate all such unusual readings.
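A minimal sketch of such a heuristic, under assumptions that are not part of this thread: ruby text written entirely in kana over a base containing kanji is treated as a phonetic candidate, and it is confirmed only if the reading appears in a lexicon. `KNOWN_READINGS` and `classify_ruby` are hypothetical names, and the toy lexicon stands in for the large, regularly updated dictionary a real implementation would need.

```python
# Hypothetical toy lexicon (an assumption for illustration only).
KNOWN_READINGS = {
    "漢字": {"かんじ"},
    "東京": {"とうきょう"},
}


def is_kana(ch: str) -> bool:
    """True for characters in the Hiragana and Katakana blocks (U+3040 to U+30FF)."""
    return "\u3040" <= ch <= "\u30ff"


def is_kanji(ch: str) -> bool:
    """True for characters in the main CJK Unified Ideographs block (U+4E00 to U+9FFF)."""
    return "\u4e00" <= ch <= "\u9fff"


def classify_ruby(base: str, ruby: str) -> str:
    """Return 'phonetic', 'non-phonetic', or 'unknown' for one base/ruby pair."""
    # Assumption of this sketch: ruby that is not pure kana is a note, not a reading.
    if not ruby or not all(is_kana(c) for c in ruby):
        return "non-phonetic"
    # Kana ruby over a base without kanji: the script check alone cannot decide.
    if not any(is_kanji(c) for c in base):
        return "unknown"
    # Kana ruby over kanji: accept only readings the lexicon knows about.
    if ruby in KNOWN_READINGS.get(base, set()):
        return "phonetic"
    return "unknown"


print(classify_ruby("漢字", "かんじ"))
print(classify_ruby("不死川", "しなずがわ"))
```

With this toy lexicon, the 不死川 pair cannot be confirmed as phonetic because the reading is not listed, which is the enumeration problem described above.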

murata2makoto · 2021-11-06T12:57:54Z

The technical committee of the Japan DAISY Consortium discussed this topic today. We think that (1) automatic detection can be quite reliable, (2) it cannot be perfect, and (3) it requires a lot of development cost and run-time cost. In other words, if we can mimic the morphological analysis of modern machine translation engines and the timely updates that input methods make for commonly used unusual names, heuristics can be quite reliable, though not perfect. But can we expect this much of browser engines, which must also run well on mobile devices?

aleventhal · 2021-11-08T16:10:49Z

> The technical committee of the Japan DAISY Consortium discussed this topic today. We think that (1) automatic detection can be quite reliable, (2) it cannot be perfect

Understood. The heuristic can be the fallback, and a careful author can still use explicit semantics to indicate that something is a note, or to say "do not use the default / do not guess". Falling back on the heuristic will help in the 95+% of cases where the author has not added semantics.
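The fallback order described above could be sketched roughly as follows. This is only an illustration: the hint values `"phonetic"`, `"note"`, and `"no-guess"` are hypothetical and do not correspond to any existing HTML attribute or ARIA semantic, and `resolve_ruby_kind` is an invented name.

```python
from typing import Callable, Optional


def resolve_ruby_kind(
    author_hint: Optional[str],
    base: str,
    ruby: str,
    heuristic: Callable[[str, str], str],
) -> str:
    """Prefer explicit author semantics; guess only when none are given."""
    # Hypothetical opt-out: the author asked the user agent not to guess.
    if author_hint == "no-guess":
        return "unknown"
    # Hypothetical explicit semantics always win over the heuristic.
    if author_hint in ("phonetic", "note"):
        return author_hint
    # No author semantics: fall back on whatever heuristic is available.
    return heuristic(base, ruby)


def always_phonetic(base: str, ruby: str) -> str:
    """Stand-in heuristic used only to exercise the fallback path."""
    return "phonetic"


print(resolve_ruby_kind("note", "宇宙", "そら", always_phonetic))
print(resolve_ruby_kind(None, "漢字", "かんじ", always_phonetic))
```

The heuristic is passed in as a parameter so that the override logic stays the same whether the guess comes from a dictionary lookup or from an ML model.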

> and (3) it requires a lot of development cost and run-time cost. In other words, if we can mimic the morphological analysis of modern machine translation engines and the timely updates that input methods make for commonly used unusual names, heuristics can be quite reliable, though not perfect. But can we expect this much of browser engines, which must also run well on mobile devices?

I think it can, and it is definitely worth exploring. As an example of what's possible: some browsers and screen readers already use ML to provide automatic labelling for images that are missing alt text. This has proven to be quite useful.

murata2makoto · 2023-10-20T11:39:42Z

Two years have passed since this discussion. Having seen ChatGPT, I am more optimistic about automatic detection of phonetic ruby and non-phonetic ruby.

murata2makoto changed the title ~~Heuristic for detecting phonetic and non-phonetic ruby?~~ Heuristic for distinguishing phonetic and non-phonetic ruby? Nov 3, 2021

murata2makoto self-assigned this Nov 3, 2021

murata2makoto mentioned this issue Nov 6, 2021, in "Reading aloud ruby without reading aloud ruby base #12" (Open)
