<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Text to Speech of Electronic Documents Containing Ruby: User Requirements</title>
<script
defer
class="remove"
src="https://www.w3.org/Tools/respec/respec-w3c"
></script>
<script class="remove">
var respecConfig = {
shortName: "ruby-t2s-req",
specStatus: "ED", // "WG-NOTE"
noRecTrack: true,
edDraftURI: "https://w3c.github.io/ruby-t2s-req/",
editors: [
{ name: "MURATA Makoto [FAMILY Given]", company: "DAISY Consortium", w3cid: "32937", },
],
group: "i18n",
github: {
repoURL: "w3c/ruby-t2s-req",
branch: "gh-pages",
},
localBiblio: {
JISX4051: {
title: "Formatting rules for Japanese documents (『日本語文書の組版方法』; JIS X 4051)",
publisher: "Japanese Standards Association",
date: "2004",
id: "JIS X 4051:2004",
},
ACCESSIBLE_E_BOOKS: {
title: "Guidelines for creating accessible e-books for text-to-speech",
publisher: "the Ministry of Internal Affairs and Communications",
date: "2015",
href: "https://web.archive.org/web/20220118065321/https://www.soumu.go.jp/main_content/000354698.pdf",
},
'epub-32': {
title: "EPUB 3.2",
href: "https://www.w3.org/publishing/epub3/epub-spec.html",
status: "W3C Final Community Group Specification",
publisher: "W3C",
date: "08 May 2019",
},
'epub-32-Packages': {
title: "EPUB Packages 3.2",
href: "https://www.w3.org/publishing/epub3/epub-packages.html",
status: "W3C Final Community Group Specification",
publisher: "W3C",
date: "08 May 2019",
},
'epub-32-ContentDocs': {
title: "EPUB Content Documents 3.2",
href: "https://www.w3.org/publishing/epub3/epub-contentdocs.html",
status: "W3C Final Community Group Specification",
publisher: "W3C",
date: "08 May 2019",
},
'JEITA_IT-4002': {
title: "Symbols for Japanese Text-to-Speech Synthesizer",
id: "JEITA IT-4002",
date: "May 2005",
publisher: "Japan Electronics and Information Technology Industries Association",
},
'JEITA_IT-4006': {
title: "Symbols for Japanese Text-to-Speech Synthesizer",
id: "JEITA IT-4006",
date: "March 2010",
publisher: "Japan Electronics and Information Technology Industries Association",
},
'Transliteration Training Course': {
title: "Textbook for the Volunteer Transliteration Training Course: Basics (In Japanese, 音訳ボランティア養成講習会テキスト 基礎課程編)",
href: "https://naiiv-books.net/shopdetail/000000000023/",
date: "March 2010",
publisher: "National Council of Japan for the Visually Impaired (In Japanese, 全国視覚障害者情報提供施設協会)",
},
},
xref: true,
postProcess: [ ],
};
</script>
<style>
.todo {
background-color: #BBEFFC;
}
</style>
</head>
<body>
<section id="abstract">
<p>This document describes user requirements for text to speech of electronic documents containing ruby.</p>
</section>
<section id="sotd">
<p></p>
</section>
<section id="purpose">
<h2>Purpose</h2>
<p>
This document addresses concerns related to the text-to-speech
functionality in HTML documents and EPUB publications that
contain ruby annotations. While typographical aspects of ruby
are covered by [[?JLREQ]], text-to-speech
issues in this context have not received widespread
recognition. The primary focus of this document is to outline
user requirements.</p>
<p>
In Section 2, we enumerate the various roles of ruby
annotations in relation to their associated ruby
bases. Section 3 provides an overview of potential options for
using ruby bases and/or ruby annotations in text-to-speech,
along with a discussion of the advantages and disadvantages of
each option. Section 4 addresses markup issues related to the
text-to-speech of ruby annotations. Section 5 introduces
alternative mechanisms, such as SSML and PLS. Section 6 delves
into the use of ruby annotations in translating HTML or EPUB
documents to braille.</p>
</section>
<section>
<h2>Roles of ruby annotations</h2>
<section>
<h3 id="furigana-background">Furigana, background</h3>
<p>
The primary purpose of ruby annotations is to indicate how
to pronounce CJK ideographic characters, a practice known
as <dfn>Furigana</dfn> (see also <a data-cite="JLREQ#term.furigana">JLReq terminology</a>).
</p>
<p>
In contemporary usage, it is uncommon to attach ruby
annotations to all CJK ideographic characters
in a given document. Instead, it is more common to
attach ruby annotations to only some of the CJK ideographic characters.
</p>
<p>Ruby annotations find their application in various
contexts, including trade books, newspapers, textbooks,
teaching materials, and more, but are rarely utilized in
business documents.</p>
<p>
Even for simple CJK ideographic characters, ruby annotations may be added for some users who have particular
difficulties with CJK ideographic characters
(in electronic documents, it is easy to make ruby annotations visible or invisible based on user preferences).
Such ruby annotations are called furigana-added-for-enhanced-accessibility.
</p>
<p>
Some simple CJK ideographic characters have more than one possible reading and thus require ruby annotations for disambiguation.
This is common for names of people and places. For example, <span lang="ja">山崎</span> (a person's name) may be read as
YAMAZAKI or YAMASAKI.
</p>
<p>
If ruby annotations are attached to only some of the CJK ideographic characters in a given document, the first occurrence
of a CJK ideographic character or a word composed of such characters may have a ruby annotation, while subsequent occurrences typically do not. This practice
assumes that users will learn the correct pronunciation from the first occurrence.
</p>
</section>
<section>
<h3>Gikun, background</h3>
<p>
Especially in Japan, ruby annotations are
also used to indicate something different from the reading of a CJK ideographic character.
Such ruby annotations are referred to as <dfn>Gikun</dfn>. Gikun is commonly employed in light novels and comics.
</p>
<p>
Here are some examples of Gikun:
</p>
<ul>
<li>
<span lang="ja"><ruby><rb>敵</rb><rt>とも</rt></ruby></span>
(where <span lang="ja">敵</span> means 'enemy' and <span lang="ja">とも</span> means 'friend'). The combination means 'frenemy'.
</li>
<li>
<span lang="ja"><ruby><rb>生命</rb><rt>いのち</rt></ruby></span>
(where the typical reading of <span lang="ja">生命</span> is SEIMEI rather than <span lang="ja">いのち</span> (INOCHI),
both of which mean 'Life')
</li>
<li>
<span lang="ja"><ruby><rb>背景</rb><rt>バック</rt></ruby></span>
(where the typical reading of <span lang="ja">背景</span> is HAIKEI rather than <span lang="ja">バック</span> (back),
an English translation)
</li>
<li>
<span lang="ja"><ruby><rb>牛乳</rb><rt>ミルク</rt></ruby></span>
(where the typical reading of <span lang="ja">牛乳</span> is GYUUNYUU rather than <span lang="ja">ミルク</span> (milk),
an English translation)
</li>
</ul>
<p>
Even when Gikun is used for a compound word, it is unlikely to be repeated for later occurrences of the same word.
Moreover, different [=GIKUN=] may be added for subsequent occurrences of the same word.
For example, the next occurrence of <span lang="ja">生命</span> may well be
<span lang="ja"><ruby><rb>生命</rb><rt>ライフ</rt></ruby></span>
where <span lang="ja">ライフ</span> (life) is an English translation.
</p>
</section>
<section>
<h3>Unusual names of people and places, background</h3>
<p>
Unusual names of people in Japan are typically written
using CJK ideographic characters but are pronounced quite
differently from the standard reading of these
characters. For instance, <span lang="ja"><ruby>
<rb>男</rb><rt>あだむ</rt></ruby></span> is an unusual name,
where <span lang="ja">男</span> (usually read as OTOKO)
means 'man', and <span lang="ja">あだむ</span> represents
'Adam' in Kana.
</p>
<p>
Character names in comics, animations, and light novels can
sometimes be extremely challenging to pronounce. Many of the
character names
in <a href="https://en.wikipedia.org/wiki/Demon_Slayer:_Kimetsu_no_Yaiba">Demon
Slayer (Kimetsu no Yaiba)</a> fall into this category. For
example, almost no one can read
<span lang="ja">不死川 玄弥</span> as SHINAZUGAWA GENYA without assistance.
</p>
<p>
Names of places can also be difficult to read due to
historical reasons. For instance, <span lang="ja"><ruby>
<rb>神居古潭</rb><rt>かむいこたん</rt></ruby></span>,
<span lang="ja"><ruby><rb>温根沼</rb><rt>おんねとう</rt></ruby></span>,
<span lang="ja"><ruby><rb>音威子府</rb><rt>おといねっぷ</rt></ruby></span>
are names of places in Hokkaido (the northern
island of Japan). These names are challenging to pronounce
because they originated from <a href="https://en.wikipedia.org/wiki/Ainu_people#Language">the Ainu language</a>,
which is
entirely different from the Japanese language.
</p>
<p>
In many instances, the first occurrence of an unusual name is accompanied by a ruby annotation, but subsequent occurrences are not.
</p>
</section>
<section>
<h3>Interlinear notes, background</h3>
<p>
<dfn>Interlinear notes</dfn> resemble ruby annotations in appearance.
A <a data-cite="JLREQ#n224">note in JLreq</a> introduces interlinear notes as follows:
</p>
<aside class="note" title="Quoted note from JLReq" id="n20211101001">
Other than these styles of note, explanations of facts and persons in study aid books and history texts,
and modern translations of Japanese classic texts are sometimes set between lines.
These notes are called interlinear notes (see <a data-cite="JLREQ#fig3_2_8">Figure 241</a>).
</aside>
<p>
In the example shown in
<a data-cite="JLREQ#fig3_2_8">a figure referenced in the quoted note ("An example of a note in inter lines")</a>,
<span lang="ja">徳川慶喜</span> (Tokugawa Yoshinobu) is accompanied by an interlinear note
"1837-1913 <span lang="ja">江戸幕府最後の将軍</span>" (1837-1913 the last shogun of the Edo shogunate).
Other examples are: a modern kana phrase as an interlinear note for a historical kana phrase,
a standard Japanese expression as an interlinear note for an expression in a dialect,
a modern CJK ideographic character as an interlinear note for a traditional CJK ideographic character,
an English text chunk as an interlinear note for a Japanese text chunk,
and an official name as an interlinear note for an abbreviated name.
</p>
<p>
One could argue that HTML ruby elements should not be used for representing interlinear notes
(see <a href="https://lists.w3.org/Archives/Public/public-i18n-japanese/2021AprJun/0051.html">Kobayashi Sensei's mail in Japanese</a>).
However, it is not difficult to imagine that ruby elements are actually used for representing interlinear notes.
</p>
</section>
<section>
<h3>Ruby annotations for indicating the pronunciation of foreign phrases in language textbooks, background</h3>
<p>
In language textbooks, ruby annotations are at times
employed to indicate the pronunciation of foreign phrases
written in hiragana or katakana. For example, a Chinese
phrase <span lang="zh-hans">我去学校</span> may include
<span lang="ja">ウオ チュー シュエシャオ</span> as a ruby annotation.
</p>
</section>
<section>
<h3>Double-sided ruby, background</h3>
<p>
A sequence of characters can be accompanied by two ruby annotations,
typically consisting of [=Furigana=] and either [=GIKUN=] or an [=interlinear note=].
In <a data-cite="JLREQ#fig2_3_12">an example provided in JLreq</a>
("An example of ruby annotations attached to both sides of the base characters"),
<span lang="ja">東南</span> is accompanied by <span lang="ja">たつみ</span> and <span lang="ja">とうなん</span>.
Here <span lang="ja">東南</span> means 'southeast', with <span lang="ja">とうなん</span> (TOUNAN) serving as [=Furigana=],
and <span lang="ja">たつみ</span> (TATSUMI) as [=GIKUN=],
as <span lang="ja">辰巳</span> (read as <span lang="ja">TATSUMI</span>) indicates the same direction as <span lang="ja">東南</span>.
</p>
<p>We offer two additional illustrative examples.</p>
<figure id="f20211101001">
<img src="./img/rdr001.svg" alt="Double-sided ruby example 1" width="65" height="54" />
<figcaption><span lang="ja">東洋</span> features an upper-side ruby annotation <span lang="ja">オリエント</span> and a lower-side ruby annotation <span lang="ja">とうよう</span></figcaption>
</figure>
<p>
In this example, <span lang="ja">とうよう</span> serves as [=Furigana=], while <span lang="ja">オリエント</span> is used as [=Gikun=].
</p>
<figure id="f20211101002">
<img src="./img/rdr002.svg" alt="Double-sided ruby example 2" width="110" height="53" />
<figcaption><span lang="ja">織田信長</span> features an upper-side ruby annotation <span lang="ja">"1534〜82"</span> and a lower-side ruby annotation <span lang="ja">おだのぶなが</span></figcaption>
</figure>
<p>
In this example, <span lang="ja">おだのぶなが</span> serves as [=Furigana=], while <span lang="ja">"1534〜82"</span> is presented as an [=interlinear note=].
</p>
</section>
</section>
<section>
<h2>Which should be read aloud, ruby bases or ruby annotations, or both?</h2>
<p>
There are three possible options: (1) both ruby bases and ruby annotations, (2) ruby annotations only, and (3) ruby bases only.
</p>
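<aside class="example" title="Text handed to a text-to-speech engine under each option (illustrative sketch)">
<p>
As a minimal sketch, and not a description of any particular implementation,
the three options can be compared by the text that would reach a text-to-speech engine
for the placeholder markup used in the subsections below.
</p>
<pre class="nohighlight">
&lt;ruby&gt;&lt;rb&gt;foo&lt;/rb&gt;&lt;rt&gt;bar&lt;/rt&gt;&lt;/ruby&gt;

(1) both ruby bases and ruby annotations: 'foo bar' or 'bar foo'
(2) ruby annotations only:                'bar'
(3) ruby bases only:                      'foo'
</pre>
</aside>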
<section>
<h3>Reading aloud both ruby bases and ruby annotations</h3>
<p>
In this option, both ruby bases and ruby annotations are read aloud (double reading).
Many implementations (screen readers, in particular) support only this option.
For example, <ruby><rb>foo</rb><rt>bar</rt></ruby> is read aloud as 'foo bar' or 'bar foo'.
</p>
<section>
<h4 id="furigana_both_read_aloud">Furigana, when both read aloud</h4>
<p>
The option of reading aloud both interferes with readers' understanding significantly.
</p>
<section>
<h5>Examples of harmful double reading: Japanese</h5>
<p lang="ja">彼の名前は<ruby><rb>出羽内</rb><rt>でわない</rt></ruby>です。</p>
<p>This sentence is intended to mean "His name is
Dewanai". Double reading completely changes the meaning:
it will be interpreted as "His name is NOT Dewanai".</p>
<p lang="ja">それでは<ruby><rb>話</rb><rt>はなし</rt></ruby>にならない。</p>
<p>This sentence is intended to mean "Nonsense!". Double
reading completely changes the meaning: it will be
interpreted as "You have to deal with it".</p>
</section>
<section>
<h5>Examples of harmful double reading: English</h5>
<p>Consider this English sentence having a ruby annotation: "My name is <ruby><rb>Knot</rb><rt>not</rt></ruby>".</p>
<p>Double reading completely changes the meaning: it will be interpreted as "My name is not Knot".</p>
<p>Another example: "There is a road in Austin spelled
both <ruby><rb>Manchaca</rb><rt>Man-Chack</rt></ruby>
and <ruby><rb>Menchaca</rb><rt>Man-Chack</rt></ruby>".</p>
<p>Double reading makes the road name read aloud twice,
possibly differently.</p>
<p>Yet another example: "<ruby><rb>Oxoerythromycin</rb><rt>oxo-eur-ithro-mycin</rt></ruby>
is a ketone derived from erythromycin".</p>
<p>Double reading makes this compound name read aloud twice,
possibly differently.</p>
</section>
</section>
<section>
<h4 id="gikun_both_read_aloud">Gikun, when both read aloud</h4>
<p>
The option of reading aloud both is sensible. It is common to read aloud ruby annotations first and ruby bases next, but it is sometimes better to read aloud ruby bases first and ruby annotations next (see [[?Transliteration Training Course]]).
</p>
<p>
<span lang="ja"><ruby><rb>敵</rb><rt>とも</rt></ruby></span> is read aloud as TEKI TOMO or TOMO TEKI, which means 'enemy friend' or 'friend enemy' (equal to 'frenemy').</p>
<p>
<span lang="ja"><ruby><rb>生命</rb><rt>いのち</rt></ruby></span> is read aloud as SEIMEI INOCHI or INOCHI SEIMEI,
where SEIMEI is a loan word from Chinese and INOCHI is a native Japanese word. Both mean 'life'.
</p>
</section>
<section>
<h4 id="unusual_names_both_read_aloud">Unusual names of people and places, when both read aloud</h4>
<p>
The option of reading aloud both interferes with readers' understanding significantly.
</p>
<p>
<span lang="ja"><ruby><rb>不死川玄弥</rb><rt>しなずがわげんや</rt></ruby></span>
is read aloud as FUSHIKAWA GENYA SHINAZUGAWA GENYA or SHINAZUGAWA GENYA FUSHIKAWA GENYA, which suggests two persons rather than one person.
</p>
</section>
<section>
<h4 id="interlinear_note_both_read_aloud">Interlinear notes, when both read aloud</h4>
<p>
The option of reading aloud both is sensible. It is necessary to read aloud ruby bases first then ruby annotations next.
</p>
<p>
For example, <span lang="ja"><ruby><rb>徳川慶喜</rb><rt>1837-1913 江戸幕府最後の将軍</rt></ruby></span>
is read aloud as TOKUGAWA YOSHINOBU 1837-1913 EDO BAKUFU SAIGONO SHOUGUN,
which means 'Tokugawa Yoshinobu 1837-1913, the last shogun of the Edo shogunate'.
</p>
</section>
<section>
<h4>Ruby annotations for indicating the pronunciation of foreign phrases in language books, when both read aloud</h4>
<p>
The option of reading aloud both interferes with readers' understanding significantly.
</p>
<p>
In the example of <span lang="zh-hans">我去学校</span>,
even if <span lang="ja">ウオ チュー シュエシャオ</span> is read aloud using the Japanese text-to-speech engine,
the result will not be helpful to learners because both the pronunciation and the four tones are rendered incorrectly.
Katakana approximations are equally unhelpful for languages such as English.
</p>
</section>
<section>
<h4>Double-sided ruby, when both read aloud</h4>
<p>
Since there are two ruby annotations, double-sided ruby leads to reading aloud three times.
One of the ruby annotations is typically furigana, so the description in <a href="#furigana_both_read_aloud"><span class="secno">3.1.1</span></a> applies.
If the other ruby annotation is a Gikun, the description in <a href="#gikun_both_read_aloud"><span class="secno">3.1.2</span></a> applies;
if it is an interlinear note, the description in <a href="#interlinear_note_both_read_aloud"><span class="secno">3.1.4</span></a> applies.
</p>
</section>
</section>
<section>
<h3>Reading aloud ruby annotations only</h3>
<p>
In this option, ruby annotations are read aloud but ruby bases
are not. For example, <ruby><rb>foo</rb><rt>bar</rt></ruby> is
read aloud as 'bar'.
</p>
<section>
<h4 id="furigana_annotation_read_aloud">Furigana, when ruby annotations read aloud</h4>
<p>
Even native Japanese speakers may easily assume, without
thorough consideration, that the option of reading only ruby
annotations aloud will provide reasonable results. However,
this is not always the case.
</p>
<section>
<h5>Incorrect pitch accent</h5>
<p>Each hiragana character represents a mora (a basic timing
unit in phonology), which is typically a single vowel or a
consonant followed by a single vowel. The same sequence of
moras may mean different words depending on the pitch accent.
For example, both <span lang="ja">雨</span> (rain) and
<span lang="ja">飴</span> (candy) consist of the same moras:
<span lang="ja">あ</span> and <span lang="ja">め</span>.
However, if the Tokyo accent is used as a basis, the first mora in <span lang="ja">雨</span> has a
high pitch, and the second has a low pitch;
<span lang="ja">飴</span> has the opposite pitch accent.</p>
<p>Reading aloud ruby annotations rather than ruby bases often
leads to incorrect pitch accent. As an example, consider
<span lang="ja"><ruby><rb>雨</rb><rt>あめ</rt></ruby>が好き</span>
(I like rain) and <span lang="ja"><ruby><rb>飴</rb><rt>あめ</rt></ruby>が好き</span>
(I like candy). In both cases, reading aloud ruby annotations
rather than ruby bases implies that the TTS engine will receive
<span lang="ja">あめが好き</span> and produce the same result.</p>
<p>A similar example is <span lang="ja"><ruby><rb>牡蠣</rb><rt>かき</rt></ruby>を食べる</span>
(I eat oysters) and <span lang="ja"><ruby><rb>柿</rb><rt>かき</rt></ruby>を食べる</span>
(I eat persimmons), where <span lang="ja">牡蠣</span> and
<span lang="ja">柿</span> have the same two moras but opposite
pitch accents.</p>
</section>
<section>
<h5>Incorrectly pronouncing non-particle は or へ as particles</h5>
<p>In modern Japanese, there is basically only one way to read
each hiragana character. But <span lang="ja">は</span> and
<span lang="ja">へ</span> are exceptions. <span lang="ja">は</span>
is usually read aloud as /ha/ but is read aloud as /wa/ when
it is used as a particle. Likewise, <span lang="ja">へ</span>
is usually read aloud as /he/ but is read aloud as /e/ when
it is used as a particle.</p>
<p>Reading aloud ruby annotations rather than ruby bases implies
that CJK ideographic characters in ruby bases will not be
passed to the TTS engine, only hiragana characters in ruby
annotations will be.</p>
<p>Without CJK ideographic characters, Japanese morphological
analysis is likely to fail. For example, <span lang="ja">やがてはいしになる</span> may be
misinterpreted as <span lang="ja">やがては いしに なる</span> ("I will eventually become a doctor") rather
than <span lang="ja">やがて はいしに なる</span> ("It will be abolished eventually").
Occurrences of
<span lang="ja">は</span> or <span lang="ja">へ</span> as
non-particles in ruby annotations may well be mistakenly
interpreted as particles. Consequently, such occurrences
of <span lang="ja">は</span> and <span lang="ja">へ</span>
may well be mistakenly read aloud as /wa/ and /e/, respectively.</p>
<p>For example, consider
<span lang="ja">やがて<ruby><rb>廃止</rb><rt>はいし</rt></ruby>になる</span>.
This sentence means "It will be abolished eventually". But
if <span lang="ja">やがてはいしになる</span> is passed to the
TTS engine, <span lang="ja">は</span> may well be mistakenly
read aloud as /wa/ rather than /ha/. The result means
"I will eventually become a doctor".
</p>
<p>Here are some similar examples. All occurrences of
<span lang="ja">は</span> and <span lang="ja">へ</span>
in ruby annotations are likely to be mistakenly read aloud.</p>
<ul>
<li><span lang="ja">人員<ruby><rb>配置</rb><rt>はいち</rt></ruby></span> </li>
<li><span lang="ja">自然<ruby><rb>破壊</rb><rt>はかい</rt></ruby></span> </li>
<li><span lang="ja">社会<ruby><rb>波紋</rb><rt>はもん</rt></ruby></span> </li>
<li><span lang="ja">天皇<ruby><rb>陛下</rb><rt>へいか</rt></ruby></span></li>
<li><span lang="ja">大学<ruby><rb>併願</rb><rt>へいがん</rt></ruby></span></li>
<li><span lang="ja">学級<ruby><rb>閉鎖</rb><rt>へいさ</rt></ruby></span></li>
</ul>
</section>
<section>
<h5>Inconsistency between the first and subsequent occurrences</h5>
<p>
As described in <a href="#furigana-background"><span class="secno">2.1</span></a>,
furigana as a ruby annotation may be attached to only the
first occurrence of a CJK ideographic character or a word
composed from such characters. Thus, there is a risk that
the first occurrence and the others are read aloud differently.
For example, consider <span lang="ja">智子</span> as the name
of a character in a novel. There are several possible readings of this name, such as
<span lang="ja">さとこ</span> and
<span lang="ja">ともこ</span>. If <span lang="ja">さとこ</span>
as a ruby annotation is attached only to the first occurrence
of the name, it will be read as <span lang="ja">さとこ</span>
and the other occurrences may be read as <span lang="ja">ともこ</span>.
The reader would then think that <span lang="ja">さとこ</span>
and <span lang="ja">ともこ</span> are different characters.</p>
<aside class="note" title="" id="n20211101002">
One approach to avoid this problem is to create a table of
ruby base-annotation pairs. When a CJK ideographic character
or a word composed of such characters is encountered, this
table allows the TTS engine to receive the ruby annotation
for not only the first occurrence but also for the subsequent
occurrences.
</aside>
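<aside class="example" title="Applying a table of ruby base-annotation pairs (illustrative sketch)">
<p>
The approach in the note above can be sketched as follows.
This is only an illustration, under the assumption that the table is built from the annotated first occurrence;
how the table is stored and how later text is matched against it are left open here.
</p>
<pre class="nohighlight">
Annotated first occurrence:
  <span lang="ja">&lt;ruby&gt;&lt;rb&gt;智子&lt;/rb&gt;&lt;rt&gt;さとこ&lt;/rt&gt;&lt;/ruby&gt;</span>

Table entry derived from it:
  <span lang="ja">智子</span> → <span lang="ja">さとこ</span>

Later occurrence without a ruby annotation:
  <span lang="ja">智子</span>

Text passed to the TTS engine for that later occurrence:
  <span lang="ja">さとこ</span>
</pre>
</aside>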
</section>
</section>
<section>
<h4 id="gikun_annotation_read_aloud">Gikun, when ruby annotations read aloud</h4>
<p>
The option of reading aloud ruby annotations only provides an understandable result but does not properly convey the author's intention.
</p>
<p>
<span lang="ja"><ruby><rb>敵</rb><rt>とも</rt></ruby></span> is read aloud as TOMO, which means 'friend', but 'frenemy' is intended.</p>
<p>
<span lang="ja"><ruby><rb>生命</rb><rt>いのち</rt></ruby></span> will be read aloud as INOCHI (<span lang="ja">いのち</span>).
</p>
</section>
<section>
<h4>Unusual names of people and places, when ruby annotations read aloud</h4>
<p>
The option of reading aloud ruby annotations only works correctly.
However, if the first occurrence of a name is accompanied by a ruby annotation and the other occurrences are not,
the first occurrence is read aloud differently from the others, thus suggesting different persons or places.
</p>
<p>
For example, <span lang="ja"><ruby><rb>不死川玄弥</rb><rt>しなずがわげんや</rt></ruby></span>
is read aloud as SHINAZUGAWA GENYA correctly.
But later occurrences of <span lang="ja">不死川玄弥</span> are read aloud as FUSHIKAWA GENYA if they do not have ruby annotations.
</p>
<aside class="note" title="" id="n20211101003">
The workaround as described in the note in <a href="#furigana_annotation_read_aloud"><span class="secno">3.2.1</span></a> is available.
</aside>
</section>
<section>
<h4 id="interlinear_note_annotation_read_aloud">Interlinear notes, when ruby annotations read aloud</h4>
<p>
The option of reading aloud ruby annotations only often provides incomprehensible results.
</p>
<p>
If <span lang="ja">"1837-1913 江戸幕府最後の将軍"</span> is attached to <span lang="ja">徳川慶喜</span> as a ruby annotation,
it will be read aloud as 1837-1913 EDO BAKUFU SAIGONO SHOUGUN
(1837-1913 the last shogun of the Edo shogunate), which is reasonable.
But if only "1837-1913" is attached as a ruby annotation, the result is 1837-1913, which does not make any sense.
</p>
</section>
<section>
<h4>Ruby annotations for indicating the pronunciation of foreign phrases in language books, when ruby annotations read aloud</h4>
<p>
The option of reading aloud ruby annotations only interferes with readers' understanding significantly.
</p>
<p>
In the example of <span lang="zh-hans">我去学校</span> (<span lang="ja">ウオ チュー シュエシャオ</span>),
even if <span lang="ja">ウオ チュー シュエシャオ</span> is read out in the Japanese style,
it will not be helpful to learners because both the pronunciation and the four tones are rendered incorrectly.
Katakana approximations are equally unhelpful for languages such as English.
</p>
</section>
<section>
<h4>Double-sided ruby, when ruby annotations read aloud</h4>
<p>
The option of reading aloud ruby annotations only causes both ruby annotations to be read aloud while their ruby base is ignored.
Since one of the two ruby annotations is typically furigana, the description in <a href="#furigana_annotation_read_aloud"><span class="secno">3.2.1</span></a> applies.
If the other ruby annotation is a Gikun, the description in <a href="#gikun_annotation_read_aloud"><span class="secno">3.2.2</span></a> applies;
if it is an interlinear note, the description in <a href="#interlinear_note_annotation_read_aloud"><span class="secno">3.2.4</span></a> applies.
</p>
</section>
</section>
<section>
<h3>Reading aloud ruby bases only</h3>
<p>
In this option, ruby bases are read aloud but ruby annotations are not.
For example, <ruby><rb>foo</rb><rt>bar</rt></ruby> is read aloud as 'foo'.
</p>
<aside class="note" title="" id="n20211101004">
This option does not necessarily ignore ruby annotations.
Although text-to-speech engines mainly use ruby bases, they may also use ruby annotations as a hint.
</aside>
<section>
<h4 id="furigana_base_read_aloud">Furigana, when bases read aloud</h4>
<p>
The option of reading aloud ruby bases only may or may not provide good results, depending on text-to-speech engines.
</p>
<p>
The following is a quote from [[?ACCESSIBLE_E_BOOKS]].
</p>
<aside class="note" title="" id="n20211101005">
Many TTS engines support characters in JIS X 0208:1997 but do not typically support characters beyond it.
Thus, more than a half of the JIS CJK ideographic characters cannot be read aloud.
</aside>
<p>
Furthermore, compound words made up from CJK ideographic characters in JIS X 0208 are sometimes read aloud incorrectly.
</p>
<p>
As the importance of accessibility is well recognized and text-to-speech engines are improved,
more and more words will be read aloud correctly.
However, there are some words, such as the aforementioned YAMAZAKI,
that cannot be read aloud correctly by text-to-speech engines or even by native Japanese speakers.
</p>
</section>
<section>
<h4 id="gikun_base_read_aloud">Gikun, when bases read aloud</h4>
<p>
The option of reading aloud ruby bases only results in a perfectly understandable result.
However, since gikun is ignored, the author's intent is not completely conveyed.
</p>
<p>
<span lang="ja"><ruby><rb>敵</rb><rt>とも</rt></ruby></span> is read aloud as TEKI, which means 'enemy', but 'frenemy' is intended.</p>
<p>
<span lang="ja"><ruby><rb>生命</rb><rt>いのち</rt></ruby></span> is read out as SEIMEI.
</p>
</section>
<section>
<h4>Unusual names of people and places, when bases read aloud</h4>
<p>
The option of reading ruby bases only leads to incorrect results.
However, since every occurrence of a name is read aloud in the same way, users will not be confused.
</p>
<p>
Every occurrence of <span lang="ja"><ruby><rb>不死川 玄弥</rb><rt>しなずがわ げんや</rt></ruby></span>
will always be incorrectly read aloud as FUSHIKAWA GENYA, regardless of the presence or absence of ruby annotations.
</p>
</section>
<section>
<h4 id="interlinear_note_base_read_aloud">Interlinear notes, when bases read aloud</h4>
<p>
The option of reading ruby bases only provides a perfectly understandable result.
However, since interlinear notes are ignored, the author's intention is not conveyed well.
</p>
<p>
<span lang="ja"><ruby><rb>徳川慶喜</rb><rt>1837-1913 江戸幕府最後の将軍</rt></ruby></span>
(Tokugawa Yoshinobu 1837-1913, the last shogun of the Edo shogunate)
will be read aloud as <span lang="ja">とくがわよしのぶ</span> (Tokugawa Yoshinobu).
</p>
</section>
<section>
<h4>Ruby annotations for indicating the pronunciation of foreign phrases in language books, when bases read aloud</h4>
<p>
The option of reading ruby bases only is most appropriate when the natural language of the ruby bases is correctly identified
and the bases are read aloud by a text-to-speech engine for that language.
On the other hand, if the natural language cannot be identified or no text-to-speech engine for that language is available,
the result is not understandable.
</p>
</section>
<section>
<h4>Double-sided ruby, when bases read aloud</h4>
<p>
The option of reading ruby bases only will ignore the two ruby annotations and read their ruby base only.
When one of the two ruby annotations is furigana, the description in <a href="#furigana_base_read_aloud"><span class="secno">3.3.1</span></a> applies.
If the other is a gikun, the description in <a href="#gikun_base_read_aloud"><span class="secno">3.3.2</span></a> applies, and if it is an interlinear note, the description in <a href="#interlinear_note_base_read_aloud"><span class="secno">3.3.4</span></a> applies.
</p>
</section>
</section>
</section>
<section>
<h2>Miscellaneous issues around ruby markup</h2>
<section>
<h3>Conversion from small kana characters to full-size kana characters</h3>
<p>
Small kana characters <span lang="ja">ゃ</span>, <span lang="ja">ゅ</span>, <span lang="ja">ょ</span>, and
<span lang="ja">っ</span> become too small to read comfortably when they appear in ruby annotations.
For this reason, full-size kana characters <span lang="ja">や</span>,
<span lang="ja">ゆ</span>, <span lang="ja">よ</span>, and <span lang="ja">つ</span> are used in ruby annotations instead of these small characters.
</p>
<p>
However, since full-size kana characters are pronounced differently from small kana,
ruby annotations containing full-size kana are read aloud differently.
</p>
<p>
CSS has a mechanism for overcoming this problem.
The '<a data-cite="css-text-3" data-xref-type="css-value" data-xref-for="text-transform">full-size-kana</a>' value of
the <a data-cite="css-text-3" data-xref-type="css-property">text-transform</a> property, as specified in CSS Text, converts
small kana characters to full-size kana at rendering time.
It is thus possible to write small kana in ruby annotations while rendering them as full-size kana.
Text-to-speech engines can then provide correct results even when ruby annotations are read aloud.
</p>
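<p>
As a minimal sketch of this approach, a style rule of the following shape keeps small kana in the markup while asking the user agent to display them at full size.
The <code>rt</code> selector and the sample word are illustrative assumptions, and support for this value may vary among user agents.
</p>
<pre class="example">
&lt;style&gt;
  /* Display small kana inside ruby annotations as full-size kana. */
  rt { text-transform: full-size-kana; }
&lt;/style&gt;

&lt;!-- The annotation text itself still contains the small kana (ょ). --&gt;
&lt;span lang="ja"&gt;&lt;ruby&gt;&lt;rb&gt;東京&lt;/rb&gt;&lt;rt&gt;とうきょう&lt;/rt&gt;&lt;/ruby&gt;&lt;/span&gt;
</pre>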
</section>
<section>
<h3>A single ruby element or multiple ruby elements per one compound word</h3>
<p>When attaching a ruby annotation to a compound word consisting of multiple CJK ideographic characters in an HTML or EPUB document, one way is to create a single HTML <code>ruby</code> element for the entire word. However, in some cases, a separate <code>ruby</code> element is created for each CJK ideographic character.
For example, to attach the ruby annotation <span lang="ja">せいめい</span> to the word <span lang="ja">生命</span> (meaning “life” in Japanese), the typical approach is to create a single <code>ruby</code> element for this word. This <code>ruby</code> element may have a single <code>rt</code> element for “せいめい” or two <code>rt</code> elements (one for “せい” and another for “めい”). However, it is not entirely uncommon to see two <code>ruby</code> elements for this word: one for “生” and another for “命”.</p>
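<p>
The three alternatives just described might be marked up roughly as follows.
This is only an illustrative sketch; other serializations (for example, ones using <code>rb</code> elements) are equally possible.
</p>
<pre class="example">
&lt;!-- One ruby element with a single rt element for the whole word --&gt;
&lt;ruby&gt;生命&lt;rt&gt;せいめい&lt;/rt&gt;&lt;/ruby&gt;

&lt;!-- One ruby element with two rt elements, one per character --&gt;
&lt;ruby&gt;生&lt;rt&gt;せい&lt;/rt&gt;命&lt;rt&gt;めい&lt;/rt&gt;&lt;/ruby&gt;

&lt;!-- Two ruby elements, one per character --&gt;
&lt;ruby&gt;生&lt;rt&gt;せい&lt;/rt&gt;&lt;/ruby&gt;&lt;ruby&gt;命&lt;rt&gt;めい&lt;/rt&gt;&lt;/ruby&gt;
</pre>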
<p>Some people argue that
creating a ruby element per compound word is better than creating a ruby element for each character in a compound word. They argue that
it becomes easier for the text-to-speech engine to maintain a correspondence table between ruby bases and ruby annotations so that subsequent occurrences of the compound word without ruby can be pronounced correctly.
</p>
<p>Meanwhile, others argue that there is a good reason to attach ruby
annotations to some, but not all, characters in a compound word.
For example, consider <span lang="ja">佳人</span>, where
<span lang="ja">佳</span> is taught in junior high schools while
<span lang="ja">人</span> is taught in the first grade of elementary
schools. Therefore, it makes sense to attach a ruby annotation to
<span lang="ja">佳</span> only (one <code>ruby</code> element for
<span lang="ja">佳</span> and no <code>ruby</code> element for
<span lang="ja">人</span>). Similarly, it is reasonable to attach ruby
annotations only to the first and third CJK ideographic characters in
<span lang="ja">屯田兵</span>, and not to the second one (thus,
two <code>ruby</code> elements).</p>
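<p>
A sketch of such partial annotation is shown below.
The readings <span lang="ja">か</span>, <span lang="ja">とん</span>, and <span lang="ja">へい</span> are supplied here for illustration only.
</p>
<pre class="example">
&lt;!-- Only the first character carries a ruby annotation --&gt;
&lt;ruby&gt;佳&lt;rt&gt;か&lt;/rt&gt;&lt;/ruby&gt;人

&lt;!-- The first and third characters are annotated; the second is plain text --&gt;
&lt;ruby&gt;屯&lt;rt&gt;とん&lt;/rt&gt;&lt;/ruby&gt;田&lt;ruby&gt;兵&lt;rt&gt;へい&lt;/rt&gt;&lt;/ruby&gt;
</pre>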
</section>
<section>
<h3>Markup for indicating furigana-added-for-enhanced-accessibility</h3>
<p>
Although furigana-added-for-enhanced-accessibility is necessary for those readers who have particular
difficulties with CJK ideographic characters, it is unnecessary or even mildly distracting for others.
If furigana-added-for-enhanced-accessibility is distinguishable from normal furigana,
it can be made visible or invisible depending on user preferences.
It is thus necessary to standardize a markup mechanism for indicating furigana-added-for-enhanced-accessibility.
</p>
</section>
<section>
<h3>Markup for indicating ruby annotations used as gikun or interlinear note</h3>
<p>
In Section 3, we have seen that ruby annotations used as gikun or interlinear notes should be read aloud differently from the other cases.
It is thus necessary to standardize a markup mechanism for clearly indicating ruby annotations used as gikun or interlinear notes.
</p>
</section>
</section>
<section>
<h2>Alternatives to ruby</h2>
<p>[[?SSML]] and [[?PRONUNCIATION-LEXICON]] offer alternatives for
conveying phonemic and phonetic pronunciations of CJK ideographic
characters to speech synthesis engines. These methods are not intended for visual
presentations but can offer superior control over text-to-speech compared to using ruby annotations.</p>
<section>
<h3>SSML</h3>
<p>
[[?SSML]] employs symbol collections (such as IPA and
[[?JEITA_IT-4006]]) to represent the sounds of human
languages. Phonemic and phonetic pronunciations are conveyed
through sequences of these symbols.
</p>
<p>
[[?epub-32]] allows the use of SSML attributes within
<a data-cite="epub-33#dfn-xhtml-content-document">XHTML
content documents</a> in EPUB publications. In [[?epub-33]],
these attributes are relocated to
[[?epub-tts-10]]. Meanwhile, the W3C Accessible Platform
Architectures Working Group is developing [[?spoken-html]],
which outlines two potential methods for incorporating SSML
attributes into HTML elements.</p>
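<p>
As an illustrative sketch only, SSML attributes in an EPUB XHTML content document could look roughly like the following.
The IPA string is an approximate transcription supplied here for illustration, and the use of a <code>span</code> element is an assumption rather than a requirement of the cited specifications.
</p>
<pre class="example">
&lt;!-- Assumes xmlns:ssml="http://www.w3.org/2001/10/synthesis" is declared on the root html element. --&gt;
&lt;!-- ssml:alphabet names the symbol collection; ssml:ph gives the pronunciation. --&gt;
&lt;span ssml:alphabet="ipa" ssml:ph="inotɕi"&gt;生命&lt;/span&gt;
</pre>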
<p>
In Japan, SSML is used extensively in digital
textbooks and has been adopted by the largest textbook publisher in
Japan. However, it has been noted that attaching SSML
attributes to CJK ideographic characters significantly
raises authoring costs. DAISY textbooks in
Japan do not use SSML, as they contain recorded voice.
Trade books in Japan do not typically employ SSML
either.</p>
</section>
<section>
<h3>PLS</h3>
<p>PLS ([[PRONUNCIATION-LEXICON]]) enables the use of pronunciation lexicons, which map words to
sequences of symbols drawn from collections such as IPA or
[[?JEITA_IT-4006]].
</p>
<p>
While SSML attributes are embedded within <a data-cite="epub-33#dfn-xhtml-content-document">XHTML content
documents</a> in EPUB publications, PLS lexicons
in EPUB publications are stored
externally to and referenced by <a data-cite="epub-33#dfn-xhtml-content-document">XHTML content documents</a>
(see the <a data-cite="epub-tts-10#pls">Pronunciation Lexicons
section</a> in [[?epub-tts-10]]). At present,
[[spoken-html]] does not offer a mechanism for associating
PLS lexicons with HTML documents.
</p>
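<p>
As a rough sketch of this external association, an XHTML content document might reference a lexicon from its <code>head</code> as shown below.
The file path is hypothetical. The referenced lexicon would then supply, for example, the intended reading of an unusual name.
</p>
<pre class="example">
&lt;head&gt;
  &lt;!-- lexicons/ja.pls is a hypothetical path to a PLS lexicon inside the publication. --&gt;
  &lt;link rel="pronunciation" type="application/pls+xml" hreflang="ja" href="lexicons/ja.pls"/&gt;
&lt;/head&gt;
</pre>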
<p>
PLS is a robust tool for rendering unusual names of people
and places in text-to-speech applications. In particular, PLS allows
every occurrence of a word or phrase to be consistently pronounced,
regardless of the presence of ruby annotations. At the time of this writing, PLS is
used by at least one digital textbook publisher in Japan.
</p>
</section>
</section>
<section>
<h2>Use of ruby for automatic braille translation</h2>
<p>
The conversion of HTML documents and EPUB publications to
braille is expected to become increasingly important in the
near future.
</p>
<p>
Japanese braille lacks CJK ideographic characters and does not
distinguish between hiragana and katakana. (Note: Han braille
in Japan does include CJK ideographic characters, but it is
not widely used.)
</p>
<p>
Japanese braille differs from the ordinary Japanese writing system in several respects. First, space characters are inserted
as delimiters between words. Second, two Japanese particles,
<span lang="ja">は</span> and <span lang="ja">へ</span>, are transcribed as they are pronounced, meaning that
<span lang="ja">は</span> and <span lang="ja">へ</span> are represented as if they were
<span lang="ja">わ</span> and <span lang="ja">え</span>,
respectively. Third, <span lang="ja">う</span> pronounced as an elongated sound is
represented using the long vowel character. For example,
to translate <span lang="ja">たいよう</span> to braille,
<span lang="ja">たいよう</span> is first converted to <span lang="ja">たいよー</span> and then translated to braille.
</p>
<p>
Natural language processing is required to handle these
differences during the conversion to braille. However, unlike
the case of text-to-speech, intonation is not relevant.
</p>
<p>
When converting HTML or EPUB content to braille, it is
essential to select the correct reading for each CJK
ideographic character. Choosing an incorrect reading can
result in erroneous braille output. Similar to text-to-speech,
ruby annotations provide valuable hints, while [[?SSML]] and PLS
([[?PRONUNCIATION-LEXICON]]) serve as effective alternatives.
</p>
<p>
For furigana and the transcription of unusual names of people
and places, natural language processing is more effective when
it takes ruby bases (typically containing CJK ideographic
characters) as its input. In contrast, the correct
readings are obtained directly when ruby annotations are used as the
input. It is also possible to combine both ruby bases and ruby
annotations.
</p>
</section>
</body>
</html>