Heuristic for distinguishing phonetic and non-phonetic ruby? #8

Open
murata2makoto opened this issue Nov 3, 2021 · 3 comments

murata2makoto commented Nov 3, 2021

@aleventhal wrote here:

> Finally, I would like to know what the possibilities are for a heuristic that detects the note/complementary situation. Can we get an evaluation on how accurate that could be?

Such a heuristic is possible, but I do not believe that it will be very reliable. For example, manga and light-novel authors go crazy and think of bizarre ways of reading kanji characters for human names. 不死川 (shinazugawa) is a good example. Now, it is widely recognized thanks to the commercial success of Demon Slayer. But it is simply impossible to enumerate all such bizarre readings.
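As a rough illustration of what a surface-level heuristic might look like (a sketch only, not something proposed in this thread): it checks the scripts used and nothing else. The function name `guess_ruby_kind` and the three labels are assumptions made for this example. Note that it accepts any kana annotation over kanji as "phonetic" without verifying that the reading is real, which is exactly the part that unusual names make hard.

```python
def is_kana(ch: str) -> bool:
    # Hiragana block (U+3040-U+309F) or Katakana block (U+30A0-U+30FF).
    return "\u3040" <= ch <= "\u309f" or "\u30a0" <= ch <= "\u30ff"


def is_kanji(ch: str) -> bool:
    # CJK Unified Ideographs (U+4E00-U+9FFF) plus the iteration mark U+3005.
    return "\u4e00" <= ch <= "\u9fff" or ch == "\u3005"


def guess_ruby_kind(base: str, annotation: str) -> str:
    """Guess 'phonetic', 'non-phonetic', or 'unknown' from scripts alone."""
    if not base or not annotation:
        return "unknown"
    if all(is_kana(ch) for ch in annotation):
        # Kana over kanji is the typical shape of a reading.
        if any(is_kanji(ch) for ch in base):
            return "phonetic"
        return "unknown"
    # Anything else (Latin text, kanji glosses, ...) is treated as a note.
    return "non-phonetic"


print(guess_ruby_kind("不死川", "しなずがわ"))
print(guess_ruby_kind("基本的人権", "fundamental human rights"))
```

Under these assumptions the first call is classified as phonetic and the second as non-phonetic, but only because of the scripts involved, not because the heuristic knows the name.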

@murata2makoto murata2makoto changed the title Heuristic for detecting phonetic and non-phonetic ruby? Heuristic for distinguishing phonetic and non-phonetic ruby? Nov 3, 2021
@murata2makoto murata2makoto self-assigned this Nov 3, 2021
@murata2makoto
Contributor Author

The technical committee of the Japan DAISY Consortium discussed this topic today. We think that (1) automatic detection can be quite reliable, (2) it cannot be perfect, and (3) it requires considerable development cost and run-time cost. In other words, if we can mimic the morphological analysis of modern machine translation engines and the timely updates that input methods receive for commonly used unusual names, heuristics can be quite reliable, though not perfect. But can we expect this much of browser engines, which must also run well on mobile devices?
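To make the dependence on up-to-date name data concrete, here is a minimal sketch of a dictionary check. The `lexicon` mapping is a hypothetical toy stand-in for the reading dictionaries that a morphological analyzer or input method would ship; the function names are likewise assumptions for this example.

```python
from typing import Dict, Set


def to_hiragana(text: str) -> str:
    # Katakana U+30A1-U+30F6 maps onto hiragana by subtracting 0x60.
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


def is_known_reading(base: str, annotation: str,
                     lexicon: Dict[str, Set[str]]) -> bool:
    """True if the annotation is a listed reading of the base text."""
    return to_hiragana(annotation) in lexicon.get(base, set())


# Hypothetical toy lexicon; real data would be far larger.
lexicon: Dict[str, Set[str]] = {"東京": {"とうきょう"}}

print(is_known_reading("東京", "トウキョウ", lexicon))    # True
print(is_known_reading("不死川", "しなずがわ", lexicon))  # False: name not listed yet

# A dictionary update for a newly popular name changes the answer.
lexicon["不死川"] = {"しなずがわ"}
print(is_known_reading("不死川", "しなずがわ", lexicon))  # True
```

The lookup itself is cheap; the cost discussed above lies in segmenting real text and in keeping the dictionary current.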

@aleventhal

> The technical committee of the Japan DAISY Consortium discussed this topic today. We think that (1) automatic detection can be quite reliable, (2) it cannot be perfect

Understood. The heuristic can be the fallback, but a clever author can still use semantics to indicate whether something is a note, which in effect says "do not use the default / do not guess". Falling back on the heuristic will help in the 95+% of cases where the author has not added semantics.
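The fallback order described here could be sketched as follows. The `author_hint` value stands for whatever semantic the author supplies; no such attribute is defined in this thread, so its name and the two accepted values are assumptions for illustration. The heuristic is passed in as a plain function so the sketch does not depend on any particular detection method.

```python
from typing import Callable, Optional

# Hypothetical values an author-supplied semantic might carry.
VALID_HINTS = ("phonetic", "non-phonetic")


def resolve_ruby_kind(author_hint: Optional[str],
                      base: str,
                      annotation: str,
                      heuristic: Callable[[str, str], str]) -> str:
    """Prefer explicit author semantics; guess only when none are given."""
    if author_hint in VALID_HINTS:
        return author_hint  # "do not use the default / do not guess"
    return heuristic(base, annotation)


def always_phonetic(base: str, annotation: str) -> str:
    # Stand-in heuristic for the demo; a real one would inspect its inputs.
    return "phonetic"


print(resolve_ruby_kind("non-phonetic", "本気", "マジ", always_phonetic))
print(resolve_ruby_kind(None, "東京", "とうきょう", always_phonetic))
```

With an explicit hint the heuristic is never consulted; without one, its guess is used as the default.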

> and (3) it requires considerable development cost and run-time cost. In other words, if we can mimic the morphological analysis of modern machine translation engines and the timely updates that input methods receive for commonly used unusual names, heuristics can be quite reliable, though not perfect. But can we expect this much of browser engines, which must also run well on mobile devices?

I think it can; it is definitely worth exploring. As an example of what's possible: some browsers and screen readers already use ML to provide automatic labelling for images that are missing alt text. This has proven to be quite useful.

@murata2makoto
Contributor Author

Two years have passed since this discussion. Having seen ChatGPT, I am more optimistic about automatic detection of phonetic ruby and non-phonetic ruby.
