Heuristic for distinguishing phonetic and non-phonetic ruby? #8
The technical committee of the Japan DAISY Consortium discussed this topic today. We think that (1) automatic detection can be quite reliable, (2) it cannot be perfect, and (3) it requires considerable development and run-time cost. In other words, heuristics can be quite reliable, though not perfect, if they mimic the morphological analysis done by modern machine translation engines and are updated as promptly as input methods are for commonly used unusual names. But can we expect this much of browser engines, which must also run well on mobile devices?
Understood. The heuristic can be the fallback, but a clever author can still add a semantic annotation to indicate whether something is a note or not, or to say "do not use the default / do not guess". Falling back on the heuristic will help in the 95+% of cases where the author has not added semantics.
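The "explicit semantics first, heuristic as fallback" idea can be sketched as follows. This is a minimal illustration, not anything proposed in this thread or implemented by a browser: the names `RubyKind` and `classify_ruby`, and the kana-only rule used as the fallback heuristic, are hypothetical assumptions chosen for brevity.

```python
from enum import Enum
from typing import Optional


class RubyKind(Enum):
    PHONETIC = "phonetic"
    NON_PHONETIC = "non-phonetic"


def is_kana(ch: str) -> bool:
    """True for characters in the Hiragana or Katakana Unicode blocks."""
    code = ord(ch)
    return 0x3041 <= code <= 0x309F or 0x30A0 <= code <= 0x30FF


def classify_ruby(base: str, ruby: str,
                  author_hint: Optional[RubyKind] = None) -> RubyKind:
    # 1. Explicit author semantics always win ("do not guess").
    if author_hint is not None:
        return author_hint
    # 2. Hypothetical fallback heuristic: ruby made only of kana is
    #    assumed to be a reading of the base text.
    if ruby and all(is_kana(c) for c in ruby):
        return RubyKind.PHONETIC
    # 3. Anything else (e.g. Latin glosses, kanji annotations) is
    #    assumed to be non-phonetic. `base` is unused by this crude rule.
    return RubyKind.NON_PHONETIC


print(classify_ruby("本", "book"))                          # RubyKind.NON_PHONETIC
print(classify_ruby("強敵", "とも"))                          # RubyKind.PHONETIC (wrong guess)
print(classify_ruby("強敵", "とも", RubyKind.NON_PHONETIC))  # RubyKind.NON_PHONETIC
```

The second call shows why an author override matters: 強敵 glossed as とも ("friend") is a semantic gloss written in kana, so a script-based guess misclassifies it, while the explicit hint in the third call corrects it.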
I think it can, and it is definitely worth exploring. As an example of what's possible, some browsers and screen readers already use ML to provide automatic labels for images that are missing alt text, and this has proven quite useful.
Two years have passed since this discussion. Having seen ChatGPT, I am more optimistic about the automatic detection of phonetic and non-phonetic ruby.
@aaeventhal wrote here:
Such a heuristic is possible, but I do not believe that it will be very reliable. For example, manga and light-novel authors go wild and invent bizarre readings of kanji for characters' names. 不死川 (shinazugawa) is a good example. It is now widely recognized thanks to the commercial success of Demon Slayer, but it is simply impossible to enumerate all such bizarre readings.
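To make the enumeration problem concrete, here is a minimal sketch of a dictionary-based check, assuming a hypothetical `KNOWN_READINGS` table and helper `ruby_matches_known_reading` (neither comes from any real engine or lexicon). It returns three outcomes: a listed reading, a listed base with an unlisted ruby, or no entry at all.

```python
from typing import Dict, Optional, Set

# Hypothetical, tiny reading dictionary. A real one would come from a
# morphological-analysis lexicon and would still be incomplete.
KNOWN_READINGS: Dict[str, Set[str]] = {
    "川": {"かわ", "がわ", "せん"},
    "強敵": {"きょうてき"},
}


def ruby_matches_known_reading(base: str, ruby: str) -> Optional[bool]:
    """True if ruby is a listed reading of base, False if base is listed
    but ruby is not, None if base is absent (no evidence either way)."""
    readings = KNOWN_READINGS.get(base)
    if readings is None:
        return None
    return ruby in readings


print(ruby_matches_known_reading("不死川", "しなずがわ"))  # None
print(ruby_matches_known_reading("強敵", "とも"))          # False
print(ruby_matches_known_reading("川", "がわ"))            # True
```

An invented name such as 不死川 simply has no entry, so the lookup gives no evidence until someone adds it by hand. That is the maintenance burden that cannot keep up with new coinages.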