<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>Text to Speech of Electronic Documents Containing Ruby: User Requirements</title>
    <script
      defer
      class="remove"
      src="https://www.w3.org/Tools/respec/respec-w3c"
    ></script>
    <script class="remove">
      var respecConfig = {
        shortName: "ruby-t2s-req",
        specStatus: "ED", // "WG-NOTE"
        noRecTrack: true,
        edDraftURI: "https://w3c.github.io/ruby-t2s-req/",
        editors: [
          { name: "MURATA Makoto [FAMILY Given]", company: "DAISY Consortium", w3cid: "32937", },
        ],
        group: "i18n",
        github: {
          repoURL: "w3c/ruby-t2s-req",
          branch: "gh-pages",
        },
        localBiblio: {
          JISX4051: {
            title: "Formatting rules for Japanese documents (&#12302;&#26085;&#26412;&#35486;&#25991;&#26360;&#12398;&#32068;&#29256;&#26041;&#27861;&#12303;; JIS X 4051)",
            publisher: "Japanese Standards Association",
            date: "2004",
            id: "JIS X 4051:2004",
          },
          ACCESSIBLE_E_BOOKS: {
            title: "Guidelines for creating accessible e-books for text-to-speech",
            publisher: "the Ministry of Internal Affairs and Communications",
            date: "2015",
            href: "https://web.archive.org/web/20220118065321/https://www.soumu.go.jp/main_content/000354698.pdf",
          },
          'epub-32': {
            title: "EPUB 3.2",
            href: "https://www.w3.org/publishing/epub3/epub-spec.html",
            status: "W3C Final Community Group Specification",
            publisher: "W3C",
            date: "08 May 2019",
          },
          'epub-32-Packages': {
            title: "EPUB Packages 3.2",
            href: "https://www.w3.org/publishing/epub3/epub-packages.html",
            status: "W3C Final Community Group Specification",
            publisher: "W3C",
            date: "08 May 2019",
          },
          'epub-32-ContentDocs': {
            title: "EPUB Content Documents 3.2",
            href: "https://www.w3.org/publishing/epub3/epub-contentdocs.html",
            status: "W3C Final Community Group Specification",
            publisher: "W3C",
            date: "08 May 2019",
          },
          'JEITA_IT-4002': {
            title: "Symbols for Japanese Text-to-Speech Synthesizer",
            id:  "JEITA IT-4002",
            date: "May 2005",
            publisher: "Japan Electronics and Information Technology Industries Association",
          },
          'JEITA_IT-4006': {
            title: "Symbols for Japanese Text-to-Speech Synthesizer",
            id:  "JEITA IT-4006",
            date: "March 2010",
            publisher: "Japan Electronics and Information Technology Industries Association",
          },
          'Transliteration Training Course': {
              title: "Textbook for the Volunteer Transliteration Training Course: Basics (In Japanese, 音訳ボランティア養成講習会テキスト　基礎課程編)",
	      href: "https://naiiv-books.net/shopdetail/000000000023/",
            date: "March 2010",
            publisher: "National Council of Japan for the Visually Impaired (In Japanese, 全国視覚障害者情報提供施設協会)",
          },
        },
        xref: true,
        postProcess: [  ],
      };
    </script>
    <style>
      .todo {
        background-color: #BBEFFC;
      }
    </style>
  </head>
  <body>
    <section id="abstract">
      <p>This document describes user requirements for text to speech of electronic documents containing ruby.</p>
    </section>
    <section id="sotd">
      <p></p>
    </section>
    <section id="purpose">
      <h2>Purpose</h2>
      <p>
	This document addresses concerns related to the text-to-speech
	functionality in HTML documents and EPUB publications that
	contain ruby annotations. While typographical aspects of ruby
	are covered by [[?JLREQ]], text-to-speech
	issues in this context have not received widespread
	recognition. The primary focus of this document is to outline
	user requirements.</p>
      <p>
	In Section 2, we enumerate the various roles of ruby
	annotations in relation to their associated ruby
	bases. Section 3 provides an overview of potential options for
	using ruby bases and/or ruby annotations in text-to-speech,
	along with a discussion of the advantages and disadvantages of
	each option. Section 4 addresses markup issues related to the
	text-to-speech of ruby annotations. Section 5 introduces
	alternative mechanisms, such as SSML and PLS. Section 6 delves
	into the use of ruby annotations in translating HTML or EPUB
	documents to braille.</p>
    </section>

    <section>
      <h2>Roles of ruby annotations</h2>

      <section>
        <h3 id="furigana-background">Furigana, background</h3>
        <p>
	  The primary purpose of ruby annotations is to indicate how
	  to pronounce CJK ideographic characters, a practice known
	  as <dfn>Furigana</dfn> (see also <a data-cite="JLREQ#term.furigana">JLReq terminology</a>).
        </p>
        <p>
	  In contemporary usage, it is uncommon to attach ruby
	  annotations to all CJK ideographic characters
	  in a given document. Instead, it is more common to
	  attach ruby annotations to only some of the CJK ideographic characters.
        </p>
        <p>Ruby annotations find their application in various
        contexts, including trade books, newspapers, textbooks,
        teaching materials, and more, but are rarely utilized in
        business documents.</p>
        <p>
          Even for simple CJK ideographic characters, ruby annotations may be added for some users who have particular 
          difficulties with CJK ideographic characters 
          (in electronic documents, it is easy to make ruby annotations visible or invisible based on user preferences). 
          Such ruby annotations are called furigana-added-for-enhanced-accessibility.
        </p>
        <p>
          Some simple CJK ideographic characters have more than one possible reading and thus require ruby annotations for disambiguation.  
          This is common for names of people and places. For example, <span lang="ja">山崎</span> (a person's name) may be read as 
          YAMAZAKI or YAMASAKI. 
        </p>
        <p>
	  If ruby annotations are attached to only some of the CJK ideographic characters in a given document, the first occurrence
	  of a CJK ideographic character or a word composed of such characters may have a ruby annotation, while subsequent occurrences typically do not. This practice
	  assumes that users will learn the correct pronunciation from the first occurrence.
        </p>
      </section>

      <section>
        <h3>Gikun, background</h3>
        <p>
          Especially in Japan, ruby annotations are
	  also used to indicate something different from the reading of a CJK ideographic character.
          Such ruby annotations are referred to as <dfn>Gikun</dfn>.  Gikun is commonly employed in light novels and comics. 
        </p>
        <p>
          Here are some examples of Gikun:
        </p>
        <ul>
          <li>
            <span lang="ja"><ruby><rb>敵</rb><rt>とも</rt></ruby></span> 
            (where <span lang="ja">敵</span> means 'enemy' and <span lang="ja">とも</span> means 'friend').  The combination means 'frenemy'.
          </li>
	  <li>
            <span lang="ja"><ruby><rb>生命</rb><rt>いのち</rt></ruby></span> 
            (where the typical reading of <span lang="ja">生命</span> is SEIMEI rather than <span lang="ja">いのち</span> (INOCHI), 
            both of which mean 'life')
          </li>
          <li>
            <span lang="ja"><ruby><rb>背景</rb><rt>バック</rt></ruby></span> 
            (where the typical reading of <span lang="ja">背景</span> is HAIKEI rather than <span lang="ja">バック</span> (back), 
            an English translation)
          </li>
          <li>
            <span lang="ja"><ruby><rb>牛乳</rb><rt>ミルク</rt></ruby></span> 
            (where the typical reading of <span lang="ja">牛乳</span> is GYUUNYUU rather than <span lang="ja">ミルク</span> (milk), 
            an English translation)
          </li>
        </ul>
        <p>
          Even when Gikun is used for a compound word, it is unlikely to be repeated for later occurrences of the same word. 
          Moreover, different [=GIKUN=] may be added for subsequent occurrences of the same word. 
          For example, the next occurrence of <span lang="ja">生命</span> may well be 
          <span lang="ja"><ruby><rb>生命</rb><rt>ライフ</rt></ruby></span>
          where <span lang="ja">ライフ</span> (life) is an English translation.
        </p>
      </section>

      <section>
        <h3>Unusual names of people and places, background</h3>
	<p>
	  Unusual names of people in Japan are typically written
	  using CJK ideographic characters but are pronounced quite
	  differently from the standard reading of these
	  characters. For instance, <span lang="ja"><ruby>
	      <rb>男</rb><rt>あだむ</rt></ruby></span> is an unusual name,
	  where <span lang="ja">男</span> (usually read as OTOKO)
	  means 'man', and <span lang="ja">あだむ</span> represents
	  'Adam' in Kana.
	</p>

	<p>
	  Character names in comics, animations, and light novels can
	  sometimes be extremely challenging to pronounce. Many of the
	  character names
	  in <a href="https://en.wikipedia.org/wiki/Demon_Slayer:_Kimetsu_no_Yaiba">Demon
	  Slayer (Kimetsu no Yaiba)</a> fall into this category. For
	  example, almost no one can read
	  <span lang="ja">不死川 玄弥</span> as SHINAZUGAWA GENYA without assistance.
	</p>

	<p>
	  Names of places can also be difficult to read due to
	  historical reasons. For instance, <span lang="ja"><ruby>
	      <rb>神居古潭</rb><rt>かむいこたん</rt></ruby></span>,
          <span lang="ja"><ruby><rb>温根沼</rb><rt>おんねとう</rt></ruby></span>, 
          <span lang="ja"><ruby><rb>音威子府</rb><rt>おといねっぷ</rt></ruby></span>
	  are names of places in Hokkaido (the northern
	  island of Japan). These names are challenging to pronounce
	  because they originated from <a href="https://en.wikipedia.org/wiki/Ainu_people#Language">the Ainu language</a>,
	  which is
	  entirely different from the Japanese language.
	</p>
	<p>	  
	In many instances, the first occurrence of an unusual name is accompanied by a ruby annotation, but subsequent occurrences are not.
	</p>

      </section>

      <section>
        <h3>Interlinear notes, background</h3>
        <p>
          <dfn>Interlinear notes</dfn> resemble ruby annotations in appearance.  
          A <a data-cite="JLREQ#n224">note in JLReq</a> introduces interlinear notes as follows:
        </p>
        <aside class="note" title="Quoted note from JLReq" id="n20211101001">
          Other than these styles of note, explanations of facts and persons in study aid books and history texts, 
          and modern translations of Japanese classic texts are sometimes set between lines. 
          These notes are called interlinear notes (see <a data-cite="JLREQ#fig3_2_8">Figure 241</a>).
        </aside>
        <p>
          In the example shown in 
          <a data-cite="JLREQ#fig3_2_8">a figure referenced in the quoted note ("An example of a note in inter lines")</a>, 
          <span lang="ja">徳川慶喜</span> (Tokugawa Yoshinobu) is accompanied by an interlinear note 
          "1837-1913 <span lang="ja">江戸幕府最後の将軍</span>" (1837-1913 the last shogun of the Edo shogunate). 
          Other examples are: a modern kana phrase as an interlinear note for a historical kana phrase, 
          a standard Japanese expression as an interlinear note for an expression in a dialect, 
          a modern CJK ideographic character as an interlinear note for a traditional CJK ideographic character, 
          an English text chunk as an interlinear note for a Japanese text chunk, 
          and an official name as an interlinear note for an abbreviated name.
        </p>
        <p>
          One could argue that HTML ruby elements should not be used for representing interlinear notes 
          (see <a href="https://lists.w3.org/Archives/Public/public-i18n-japanese/2021AprJun/0051.html">Kobayashi Sensei's mail in Japanese</a>). 
          However, it is not difficult to imagine that ruby elements are actually used for representing interlinear notes.
        </p>
      </section>

      <section>
        <h3>Ruby annotations for indicating the pronunciation of foreign phrases in language textbooks, background</h3>
        <p>
	  In language textbooks, ruby annotations are at times
	  employed to indicate the pronunciation of foreign phrases
	  written in hiragana or katakana. For example, a Chinese
	  phrase <span lang="zh-hans">我去学校</span> may include
	  <span lang="ja">ウオ チュー シュエシャオ</span> as a ruby annotation.
        </p>
      </section>

      <section>
        <h3>Double-sided ruby, background</h3>
        <p>
          A sequence of characters can be accompanied by two ruby annotations,
	  typically consisting of [=Furigana=] and either [=GIKUN=] or an [=interlinear note=].  
          In <a data-cite="JLREQ#fig2_3_12">an example provided in JLReq</a> 
          ("An example of ruby annotations attached to both sides of the base characters"), 
          <span lang="ja">東南</span> is accompanied by <span lang="ja">たつみ</span> and <span lang="ja">とうなん</span>. 
          Here <span lang="ja">東南</span> means 'southeast', with <span lang="ja">とうなん</span> (TOUNAN) serving as [=Furigana=], 
          and <span lang="ja">たつみ</span> (TATSUMI) as [=GIKUN=], 
          because <span lang="ja">辰巳</span> (read as TATSUMI) indicates the same direction as <span lang="ja">東南</span>.
        </p>

	<p>We offer two additional illustrative examples.</p>

        <figure id="f20211101001">
          <img src="./img/rdr001.svg" alt="Double-sided ruby example 1" width="65" height="54" />
          <figcaption><span lang="ja">東洋</span> features an upper-side ruby annotation <span lang="ja">オリエント</span> and a lower-side ruby annotation <span lang="ja">とうよう</span></figcaption>
        </figure>
        <p>
	  In this example, <span lang="ja">とうよう</span> serves as [=Furigana=], while <span lang="ja">オリエント</span> is used as [=Gikun=].
        </p>
        <figure id="f20211101002">
          <img src="./img/rdr002.svg" alt="Double-sided ruby example 2" width="110" height="53" />
          <figcaption><span lang="ja">織田信長</span> features an upper-side ruby annotation <span lang="ja">"1534〜82"</span> and a lower-side ruby annotation <span lang="ja">おだのぶなが</span></figcaption>
        </figure>
        <p>
          In this example, <span lang="ja">おだのぶなが</span> serves as [=Furigana=], while <span lang="ja">"1534〜82"</span> is presented as an [=interlinear note=].
        </p>
      </section>
    </section>
    
    <section>
      <h2>Which should be read aloud, ruby bases or ruby annotations, or both?</h2>
      <p>
        There are three possible options: (1) both ruby bases and ruby annotations, (2) ruby annotations only, and (3) ruby bases only.
      </p>
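      <p>
        As a rough illustration only, the three options can be viewed as three ways of flattening a ruby into the string handed to a text-to-speech engine.
        The JavaScript sketch below assumes a hypothetical data model (an array of plain strings and <code>{ base, annotation }</code> objects), a hypothetical helper named <code>textForSpeech</code>, and a made-up sample sentence; none of these come from an existing specification or engine.
        Placing the base before the annotation in the first option is also an assumption, since implementations may use either order.
      </p>

```javascript
// Hypothetical data model: a plain string is ordinary text; an object is a
// ruby with one base and one annotation. The sentence itself is made up.
const sentence = [
  "こちらは",
  { base: "山崎", annotation: "やまざき" },
  "さんです。",
];

// Hypothetical helper: flatten the segments into the string that would be
// passed to a TTS engine under one of the three options.
function textForSpeech(segments, option) {
  return segments
    .map((segment) => {
      if (typeof segment === "string") return segment;
      switch (option) {
        case "both":
          // Assumption: base first, then annotation.
          return segment.base + segment.annotation;
        case "annotation-only":
          return segment.annotation;
        case "base-only":
          return segment.base;
        default:
          throw new Error("Unknown option: " + option);
      }
    })
    .join("");
}

console.log(textForSpeech(sentence, "both"));            // こちらは山崎やまざきさんです。
console.log(textForSpeech(sentence, "annotation-only")); // こちらはやまざきさんです。
console.log(textForSpeech(sentence, "base-only"));       // こちらは山崎さんです。
```

      <p>
        The sketch only shows which characters reach the engine; how the engine then pronounces them is the subject of the subsections that follow.
      </p>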

      <section>
        <h3>Reading aloud both ruby bases and ruby annotations</h3>
        <p>
          In this option, both ruby bases and ruby annotations are read aloud (double reading).
          Many implementations (screen readers, in particular) support only this option. 
          For example, <ruby><rb>foo</rb><rt>bar</rt></ruby> is read aloud as 'foo bar' or 'bar foo'.
        </p>

        <section>
          <h4 id="furigana_both_read_aloud">Furigana, when both read aloud</h4>
          <p>
            The option of reading aloud both interferes with readers' understanding significantly. 
          </p>
	  <section>
	    <h5>Examples of harmful double reading: Japanese</h5>

	    <p lang="ja">彼の名前は<ruby><rb>出羽内</rb><rt>でわない</rt></ruby>です。</p>

	    <p>This sentence is intended to mean "His name is
	    Dewanai".  Double reading completely changes the meaning:
	    it will be interpreted as "His name is NOT Dewanai".</p>

	    <p lang="ja">それでは<ruby><rb>話</rb><rt>はなし</rt></ruby>にならない。</p>

	    <p>This sentence is intended to mean "Nonsense!".  Double
	    reading completely changes the meaning: it will be
	    interpreted as "You have to deal with it".</p>
	  </section>

	  <section>
	    <h5>Examples of harmful double reading: English</h5>

	    <p>Consider this English sentence having a ruby annotation: "My name is <ruby><rb>Knot</rb><rt>not</rt></ruby>".</p>

	    <p>Double reading completely changes the meaning: it will be interpreted as "My name is not Knot".</p>

	    <p>Another example: "There is a road in Austin spelled
	    both <ruby><rb>Manchaca</rb><rt>Man-Chack</rt></ruby>
	    and <ruby><rb>Menchaca</rb><rt>Man-Chack</rt></ruby>".</p>

	    <p>Double reading makes the road name read aloud twice,
	    possibly differently.</p>

	  <p>Yet another example: "<ruby><rb>Oxoerythromycin</rb><rt>oxo-eur-ithro-mycin</rt></ruby>
	    is a ketone derived from erythromycin".</p>

	  <p>Double reading makes this compound name read aloud twice,
	    possibly differently.</p>
	</section>
	</section>

        <section>
          <h4 id="gikun_both_read_aloud">Gikun, when both read aloud</h4>
          <p>
            The option of reading aloud both is sensible.  It is common to read aloud ruby annotations first and ruby bases next, but it is sometimes better to read aloud ruby bases first and ruby annotations next ([[?Transliteration Training Course]]).
          </p>
	  <p>
            <span lang="ja"><ruby><rb>敵</rb><rt>とも</rt></ruby></span> is read aloud as TEKI TOMO or TOMO TEKI, which means 'enemy friend' or 'friend enemy'  (equal to 'frenemy').</p>
		      
          <p>
            <span lang="ja"><ruby><rb>生命</rb><rt>いのち</rt></ruby></span> is read aloud as SEIMEI INOCHI or INOCHI SEIMEI, 
            where SEIMEI is a loan word from Chinese and INOCHI is a native Japanese word.  Both mean 'life'.
          </p>
        </section>

        <section>
          <h4 id="unusual_names_both_read_aloud">Unusual names of people and places, when both read aloud</h4>
          <p>
            The option of reading aloud both interferes with readers' understanding significantly. 
          </p>
          <p>
            <span lang="ja"><ruby><rb>不死川玄弥</rb><rt>しなずがわげんや</rt></ruby></span>
            is read aloud as FUSHIKAWA GENYA SHINAZUGAWA GENYA or SHINAZUGAWA GENYA FUSHIKAWA GENYA, which suggests two persons rather than one person.
          </p>
        </section>

        <section>
          <h4 id="interlinear_note_both_read_aloud">Interlinear notes, when both read aloud</h4>
          <p>
            The option of reading aloud both is sensible.  It is necessary to read aloud ruby bases first and ruby annotations next.
          </p>
          <p>
            For example, <span lang="ja"><ruby><rb>徳川慶喜</rb><rt>1837-1913 江戸幕府最後の将軍</rt></ruby></span> 
            is read aloud as TOKUGAWA YOSHINOBU 1837-1913 EDO BAKUFU SAIGONO SHOUGUN, 
            which means 'Tokugawa Yoshinobu 1837-1913, the last shogun of the Edo shogunate'.        
          </p>
        </section>

        <section>
          <h4>Ruby annotations for indicating the pronunciation of foreign phrases in language books, when both read aloud</h4>
          <p>
            The option of reading aloud both interferes with readers' understanding significantly.
          </p>
          <p>
            In the example of <span lang="zh-hans">我去学校</span>, 
            even if <span lang="ja">ウオ チュー シュエシャオ</span> is read aloud using the Japanese text-to-speech engine, 
            the result will not be helpful to learners because the pronunciation is inaccurate and the four tones are lost. 
            Katakana-based pronunciation is equally unhelpful for languages such as English.
          </p>
        </section>

        <section>
          <h4>Double-sided ruby, when both read aloud</h4>
          <p>
            Since there are two ruby annotations, double-sided ruby leads to reading aloud three times. 
            One of the ruby annotations is typically furigana, so the description in  <a href="#furigana_both_read_aloud"><span class="secno">3.1.1</span></a> applies. 
            If the other ruby annotation is a Gikun, the description in <a href="#gikun_both_read_aloud"><span class="secno">3.1.2</span></a> applies; 
            if it is an interlinear note, the description in <a href="#interlinear_note_both_read_aloud"><span class="secno">3.1.4</span></a> applies.
          </p>
        </section>
      </section>

      <section>
        <h3>Reading aloud ruby annotations only</h3>
        <p>
          In this option, ruby annotations are read aloud but ruby bases
	  are not. For example, <ruby><rb>foo</rb><rt>bar</rt></ruby> is
	  read aloud as 'bar'.
        </p>
        <section>
          <h4 id="furigana_annotation_read_aloud">Furigana, when ruby annotations read aloud</h4>
          <p>
	    Even native Japanese speakers may easily assume, without
	    thorough consideration, that the option of reading only ruby
	    annotations aloud will provide reasonable results. However,
	    this is not always the case.
          </p>

	  <section>
	    <h5>Incorrect pitch accent</h5>

	    <p>Each hiragana character represents a mora (a basic timing
	      unit in phonology), which is typically a single vowel or a
	      consonant followed by a single vowel.  The same sequence of
	      moras may mean different words depending on the pitch accent.
	      For example, both <span lang="ja">雨</span> (rain) and
	      <span lang="ja">飴</span> (candy) consist of the same moras:
	      <span lang="ja">あ</span> and <span lang="ja">め</span>.
	      However, if the Tokyo accent is used as a basis, the first mora in <span lang="ja">雨</span> has a
	      high pitch, and the second has a low pitch;
	      <span lang="ja">飴</span> has the opposite pitch accent.</p>

	    <p>Reading aloud ruby annotations rather than ruby bases often
	      leads to incorrect pitch accent. As an example, consider
	      <span lang="ja"><ruby><rb>雨</rb><rt>あめ</rt></ruby>が好き</span>
	      (I like rain) and <span lang="ja"><ruby><rb>飴</rb><rt>あめ</rt></ruby>が好き</span>
	      (I like candy). In both cases, reading aloud ruby annotations
	      rather than ruby bases implies that the TTS engine will receive
	      <span lang="ja">あめが好き</span> and produce the same result.</p>

	    <p>A similar example is <span lang="ja"><ruby><rb>牡蠣</rb><rt>かき</rt></ruby>を食べる</span>
	      (I eat oysters) and <span lang="ja"><ruby><rb>柿</rb><rt>かき</rt></ruby>を食べる</span>
	      (I eat persimmons), where <span lang="ja">牡蠣</span> and
	      <span lang="ja">柿</span> have the same two moras but opposite
	      pitch accents.</p>
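	    <p>
	      The loss of information can be made concrete with a small JavaScript sketch.
	      It assumes a hypothetical data model in which a sentence is an array of plain strings and <code>{ base, annotation }</code> objects, and a hypothetical helper <code>annotationsOnly</code>; it does not reflect any particular screen reader or TTS engine.
	    </p>

```javascript
// The two sentences from the text, expressed in a hypothetical segment model.
const iLikeRain = [{ base: "雨", annotation: "あめ" }, "が好き"];
const iLikeCandy = [{ base: "飴", annotation: "あめ" }, "が好き"];

// Hypothetical helper: keep ruby annotations and drop ruby bases, as an
// annotation-only implementation would before calling the TTS engine.
function annotationsOnly(segments) {
  return segments
    .map((segment) => (typeof segment === "string" ? segment : segment.annotation))
    .join("");
}

console.log(annotationsOnly(iLikeRain));  // あめが好き
console.log(annotationsOnly(iLikeCandy)); // あめが好き

// The engine receives identical input, so it cannot choose different
// pitch accents for the two sentences.
console.log(annotationsOnly(iLikeRain) === annotationsOnly(iLikeCandy)); // true
```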
	  </section>	    

	  <section>
	    <h5>Incorrectly pronouncing non-particle は or へ as particles</h5>

	    <p>In modern Japanese, there is basically only one way to read
	      each hiragana character. But <span lang="ja">は</span> and
	      <span lang="ja">へ</span> are exceptions. <span lang="ja">は</span>
	      is usually read aloud as /ha/ but is read aloud as /wa/ when
	      it is used as a particle. Likewise, <span lang="ja">へ</span>
	      is usually read aloud as /he/ but is read aloud as /e/ when
	      it is used as a particle.</p>

	    <p>Reading aloud ruby annotations rather than ruby bases implies
	      that CJK ideographic characters in ruby bases will not be
	      passed to the TTS engine, only hiragana characters in ruby
	      annotations will be.</p>

	    <p>Without CJK ideographic characters, Japanese morphological
	      analysis is likely to fail. For example, <span lang="ja">やがてはいしになる</span> may be
	      misinterpreted as <span lang="ja">やがては いしに なる</span> ("I will eventually become a doctor") rather
	      than <span lang="ja">やがて はいしに なる</span> ("It will be abolished eventually").
	      Occurrences of
	      <span lang="ja">は</span> or <span lang="ja">へ</span> as
	      non-particles in ruby annotations may well be mistakenly
	      interpreted as particles.  Consequently, such occurrences
	      of <span lang="ja">は</span> and <span lang="ja">へ</span>
	      may well be mistakenly read aloud as /wa/ and /e/, respectively.</p>

	    <p>For example, consider  
	      <span lang="ja">やがて<ruby><rb>廃止</rb><rt>はいし</rt></ruby>になる</span>.
	      This sentence means "It will be abolished eventually".  But
	      if <span lang="ja">やがてはいしになる</span> is passed to the
	      TTS engine, <span lang="ja">は</span> may well be mistakenly
	      read aloud as /wa/ rather than /ha/.  The result means
	      "I will eventually become a doctor".
	    </p>

	    <p>Here are some similar examples.  All occurrences of
	      <span lang="ja">は</span> and <span lang="ja">へ</span>
	      in ruby annotations are likely to be mistakenly read aloud.</p>
	    
	    <ul>
	      <li><span lang="ja">人員<ruby><rb>配置</rb><rt>はいち</rt></ruby></span> </li>
	      <li><span lang="ja">自然<ruby><rb>破壊</rb><rt>はかい</rt></ruby></span> </li>
	      <li><span lang="ja">社会<ruby><rb>波紋</rb><rt>はもん</rt></ruby></span> </li>
	      <li><span lang="ja">天皇<ruby><rb>陛下</rb><rt>へいか</rt></ruby></span></li>
	      <li><span lang="ja">大学<ruby><rb>併願</rb><rt>へいがん</rt></ruby></span></li>
	      <li><span lang="ja">学級<ruby><rb>閉鎖</rb><rt>へいさ</rt></ruby></span></li>
	    </ul>
	  </section>
	  <section>
	    <h5>Inconsistency between the first and subsequent occurrences</h5>
            <p>
	      As described in <a href="#furigana-background"><span class="secno">2.1</span></a>,
	      furigana as a ruby annotation may be attached to only the
	      first occurrence of a CJK ideographic character or a word
	      composed of such characters.  Thus, there is a risk that
	      the first occurrence and the others are read aloud differently.
	      For example, consider <span lang="ja">智子</span> as the name
	      of a character in a novel.  There are several possible readings of this name, such as 
	      <span lang="ja">さとこ</span> and
	      <span lang="ja">ともこ</span>. If <span lang="ja">さとこ</span>
	      as a ruby annotation is attached only to the first occurrence
	      of the name, it will be read as <span lang="ja">さとこ</span>
	      and the other occurrences may be read as <span lang="ja">ともこ</span>.
	      The reader would then think that <span lang="ja">さとこ</span>
	      and <span lang="ja">ともこ</span> are different characters.</p>
          <aside class="note" title="" id="n20211101002">
            One approach to avoid this problem is to create a table of
	    ruby base-annotation pairs. When a CJK ideographic character
	    or a word composed of such characters is encountered, this
	    table allows the TTS engine to receive the ruby annotation
	    for not only the first occurrence but also for the subsequent
	    occurrences.
          </aside>
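          <p>
            A minimal JavaScript sketch of such a table follows.
            The segment model (plain strings and <code>{ base, annotation }</code> objects), the helper names <code>buildReadingTable</code> and <code>applyReadingTable</code>, and the sample sentences are all hypothetical and serve only to illustrate the idea.
            Keeping the first annotation seen for each base is likewise an assumption.
          </p>

```javascript
// Hypothetical helper: record the first annotation seen for each ruby base.
function buildReadingTable(segments) {
  const table = new Map();
  for (const segment of segments) {
    if (typeof segment !== "string") {
      if (!table.has(segment.base)) {
        table.set(segment.base, segment.annotation);
      }
    }
  }
  return table;
}

// Hypothetical helper: produce annotation-only text, also substituting the
// recorded annotation for later, un-annotated occurrences of a known base.
function applyReadingTable(segments, table) {
  return segments
    .map((segment) => {
      if (typeof segment !== "string") return segment.annotation;
      let text = segment;
      for (const [base, annotation] of table) {
        // Naive substring replacement; a real system would need word
        // segmentation to avoid matching inside longer words.
        text = text.split(base).join(annotation);
      }
      return text;
    })
    .join("");
}

// Made-up sample: only the first occurrence of the name carries a ruby.
const novel = [
  { base: "智子", annotation: "さとこ" },
  "は笑った。智子は歩いた。",
];

const table = buildReadingTable(novel);
console.log(applyReadingTable(novel, table)); // さとこは笑った。さとこは歩いた。
```

          <p>
            This sketch addresses only the consistency problem. It still passes hiragana rather than CJK ideographic characters to the engine, so the pitch accent and particle problems described above remain.
          </p>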

	  </section>
	  </section>
	  
        <section>
          <h4 id="gikun_annotation_read_aloud">Gikun, when ruby annotations read aloud</h4>
          <p>
            The option of reading aloud ruby annotations only provides an understandable result but does not properly convey the author's intention.
          </p>
  	  <p>
            <span lang="ja"><ruby><rb>敵</rb><rt>とも</rt></ruby></span> is read aloud as TOMO, which means 'friend', but 'frenemy' is intended.</p>

          <p>
            <span lang="ja"><ruby><rb>生命</rb><rt>いのち</rt></ruby></span> will be read aloud as INOCHI (<span lang="ja">いのち</span>).
          </p>
        </section>

        <section>
          <h4>Unusual names of people and places, when ruby annotations read aloud</h4>
          <p>
            The option of reading aloud ruby annotations only works correctly. 
            However, if the first occurrence of a name is accompanied by a ruby annotation and the other occurrences are not, 
            the first occurrence is read aloud differently from the others, thus suggesting different persons or places.
          </p>
          <p>
            For example, <span lang="ja"><ruby><rb>不死川玄弥</rb><rt>しなずがわげんや</rt></ruby></span> 
            is correctly read aloud as SHINAZUGAWA GENYA. 
            But later occurrences of <span lang="ja">不死川玄弥</span> are read aloud as FUSHIKAWA GENYA if they do not have ruby annotations.
          </p>
          <aside class="note" title="" id="n20211101003">
            The workaround described in the note in <a href="#furigana_annotation_read_aloud"><span class="secno">3.2.1</span></a> is available.
          </aside>
        </section>

        <section>
          <h4 id="interlinear_note_annotation_read_aloud">Interlinear notes, when ruby annotations read aloud</h4>
          <p>
            The option of reading aloud only ruby annotations often provides incomprehensible results.
          </p>
          <p>
            If <span lang="ja">"1837-1913 江戸幕府最後の将軍"</span> is attached to <span lang="ja">徳川慶喜</span> as a ruby annotation, 
            it will be read aloud as 1837-1913 EDO BAKUFU SAIGONO SHOUGUN
            (1837-1913 the last shogun of the Edo shogunate), which is reasonable.
            But if only "1837-1913" is attached as a ruby annotation, the result is just 1837-1913, which makes no sense without the ruby base.
          </p>
        </section>

        <section>
          <h4>Ruby annotations for indicating the pronunciation of foreign phrases in language books, when ruby annotations read aloud</h4>
          <p>
            The option of reading aloud ruby annotations only interferes with readers' understanding significantly.
          </p>
          <p>
            In the example of <span lang="zh-hans">我去学校</span> (<span lang="ja">ウオ チュー シュエシャオ</span>), 
            even if <span lang="ja">ウオ チュー シュエシャオ</span> is read out in the Japanese style, 
            it will not be helpful to learners because the pronunciation is inaccurate and the four tones are lost. 
            Katakana-based pronunciation is equally unhelpful for languages such as English.
          </p>
        </section>

        <section>
          <h4>Double-sided ruby, when ruby annotations read aloud</h4>
          <p>
            The option of reading aloud ruby annotations only causes both ruby annotations to be read aloud while their ruby base is ignored. 
            Since one of the two ruby annotations is typically furigana, the description in <a href="#furigana_annotation_read_aloud"><span class="secno">3.2.1</span></a> applies. 
            If the other ruby annotation is a Gikun, the description in <a href="#gikun_annotation_read_aloud"><span class="secno">3.2.2</span></a> applies; 
            if it is an interlinear note, the description in <a href="#interlinear_note_annotation_read_aloud"><span class="secno">3.2.4</span></a> applies.
          </p>
        </section>
      </section>

      <section>
        <h3>Reading aloud ruby bases only</h3>
        <p>
          In this option, ruby bases are read aloud but ruby annotations are not. 
          For example, <ruby><rb>foo</rb><rt>bar</rt></ruby> is read aloud as 'foo'.
        </p>
        <aside class="note" title="" id="n20211101004">
          This option does not necessarily ignore ruby annotations. 
          Although text-to-speech engines mainly use ruby bases, they may also use ruby annotations as a hint.
        </aside>

        <section>
          <h4 id="furigana_base_read_aloud">Furigana, when bases read aloud</h4>
          <p>
            The option of reading aloud ruby bases only may or may not provide good results, depending on text-to-speech engines.
          </p>
          <p>
            The following is a quote from [[?ACCESSIBLE_E_BOOKS]].
          </p>
          <aside class="note" title="" id="n20211101005">
            Many TTS engines support characters in JIS X 0208:1997 but do not typically support characters beyond it.
            Thus, more than a half of the JIS CJK ideographic characters cannot be read aloud.
          </aside>
          <p>
            Furthermore, compound words made up of CJK ideographic characters in JIS X 0208 are sometimes read aloud incorrectly.
          </p>
          <p>
            As the importance of accessibility becomes more widely recognized and text-to-speech engines improve, 
            more and more words will be read aloud correctly. 
            However, there are some words, such as the aforementioned YAMAZAKI, 
            that neither text-to-speech engines nor even native Japanese speakers can read aloud correctly.
          </p>
        </section>

        <section>
          <h4 id="gikun_base_read_aloud">Gikun, when bases read aloud</h4>
          <p>
            The option of reading aloud ruby bases only produces a perfectly understandable result. 
            However, since gikun is ignored, the author's intent is not completely conveyed.
          </p>
          <p>
            <span lang="ja"><ruby><rb>敵</rb><rt>とも</rt></ruby></span> is read aloud as TEKI, which means 'enemy', but 'frenemy' is intended.
          </p>

          <p>
            <span lang="ja"><ruby><rb>生命</rb><rt>いのち</rt></ruby></span> is read aloud as SEIMEI, not as INOCHI, the reading given by the gikun.
          </p>
        </section>

        <section>
          <h4>Unusual names of people and places, when bases read aloud</h4>
          <p>
            The option of reading aloud ruby bases only leads to incorrect results. 
            However, since every occurrence of a name is read aloud in the same way, users will not be confused.
          </p>
          <p>
            Every occurrence of <span lang="ja"><ruby><rb>不死川 玄弥</rb><rt>しなずがわ　げんや</rt></ruby></span> 
            will be incorrectly read aloud as FUSHIKAWA GENYA, regardless of the presence or absence of ruby annotations.
          </p>
        </section>

        <section>
          <h4 id="interlinear_note_base_read_aloud">Interlinear notes, when bases read aloud</h4>
          <p>
            The option of reading aloud ruby bases only provides a perfectly understandable result. 
            However, since interlinear notes are ignored, the author's intention is not conveyed well.
          </p>
          <p>
            <span lang="ja"><ruby><rb>徳川慶喜</rb><rt>1837-1913 江戸幕府最後の将軍</rt></ruby></span>
              (Tokugawa Yoshinobu 1837-1913, the last shogun of the Edo shogunate) 
              will be read aloud as <span lang="ja">とくがわよしのぶ</span> (Tokugawa Yoshinobu); the dates and the description are not read aloud.
          </p>
        </section>

        <section>
          <h4>Ruby annotations for indicating the pronunciation of foreign phrases in language books, when bases read aloud</h4>
          <p>
            The option of reading aloud ruby bases only is most appropriate when the natural language of each ruby base is correctly identified 
            and the ruby base is read aloud by a text-to-speech engine for that language. 
            On the other hand, if the natural language cannot be identified or no text-to-speech engine for that language is available, 
            the result is not understandable.
          </p>
        </section>

        <section>
          <h4>Double-sided ruby, when bases read aloud</h4>
          <p>
            With the option of reading aloud ruby bases only, the two ruby annotations are ignored and only their ruby base is read aloud. 
            When one of the two ruby annotations is furigana, the description in <a href="#furigana_base_read_aloud"><span class="secno">3.3.1</span></a> applies. 
            If the other is a gikun, the description in <a href="#gikun_base_read_aloud"><span class="secno">3.3.2</span></a> applies; 
            if it is an interlinear note, the description in <a href="#interlinear_note_base_read_aloud"><span class="secno">3.3.4</span></a> applies.
          </p>
        </section>
      </section>
    </section>

    <section>
      <h2>Miscellaneous issues around ruby markup</h2>

      <section>
        <h3>Conversion from small kana characters to full-size kana characters</h3>
        <p>
          Small kana characters <span lang="ja">ゃ</span>, <span lang="ja">ゅ</span>, <span lang="ja">ょ</span>, and 
          <span lang="ja">っ</span> become too small to read easily when they appear in ruby annotations. 
          For this reason, the full-size kana characters <span lang="ja">や</span>, 
          <span lang="ja">ゆ</span>, <span lang="ja">よ</span>, and <span lang="ja">つ</span> are used in ruby annotations instead.
        </p>
        <p>
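          However, since full-size kana characters are pronounced differently from small kana, 
          ruby annotations containing full-size kana are read aloud differently from what is intended. 
          For example, <span lang="ja">きょう</span> (KYOU) written as <span lang="ja">きよう</span> is read aloud as KIYOU.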
        </p>
        <p>
          CSS has a mechanism for overcoming this problem. 
          The '<a data-cite="css-text-3" data-xref-type="css-value" data-xref-for="text-transform">full-size-kana</a>' value of 
          the <a data-cite="css-text-3" data-xref-type="css-property">text-transform</a> property, as specified in CSS Text, converts 
          small kana characters to full-size kana for rendering only. 
          It is thus possible to write small kana in ruby annotations while displaying them as full-size kana. 
          Because the underlying text is unchanged, text-to-speech engines can provide correct results even when ruby annotations are read aloud.
        </p>
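        <aside class="example" title="Small kana in the source, full-size kana on screen">
          <p>
            The following fragment is a minimal, non-normative sketch of this technique. 
            The <code>rt</code> selector and the sample word are illustrative assumptions, 
            and support for this value may vary among user agents.
          </p>
          <pre class="html">
&lt;style&gt;
  /* Display small kana in ruby annotations as full-size kana.
     Only the rendering changes; the text in the document is untouched. */
  rt { text-transform: full-size-kana; }
&lt;/style&gt;

&lt;!-- The annotation is written with small kana (きょう),
     which is what a text-to-speech engine receives. --&gt;
&lt;p lang="ja"&gt;&lt;ruby&gt;今日&lt;rt&gt;きょう&lt;/rt&gt;&lt;/ruby&gt;&lt;/p&gt;</pre>
        </aside>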
      </section>

      <section>
        <h3>A single ruby element or multiple ruby elements per one compound word</h3>
        <p>
          When attaching a ruby annotation to a compound word consisting of multiple CJK ideographic characters in an HTML or EPUB document, 
          one way is to create a single HTML <code>ruby</code> element for the entire word. 
          However, in some cases, a separate <code>ruby</code> element is created for each CJK ideographic character. 
          For example, to attach the ruby annotation <span lang="ja">せいめい</span> to the word <span lang="ja">生命</span> (meaning “life” in Japanese), 
          the typical approach is to create a single <code>ruby</code> element for this word. 
          This <code>ruby</code> element may have a single <code>rt</code> element for <span lang="ja">せいめい</span> 
          or two <code>rt</code> elements (one for <span lang="ja">せい</span> and another for <span lang="ja">めい</span>). 
          However, it is not entirely uncommon to see two <code>ruby</code> elements for this word: 
          one for <span lang="ja">生</span> and another for <span lang="ja">命</span>.
        </p>
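        <aside class="example" title="Three ways of marking up one compound word">
          <p>
            The three approaches described above can be sketched as follows. 
            This is a non-normative illustration; the <code>rb</code> element is omitted for brevity, 
            and the comments are explanatory only.
          </p>
          <pre class="html">
&lt;!-- (a) One ruby element, one rt element for the whole word --&gt;
&lt;ruby&gt;生命&lt;rt&gt;せいめい&lt;/rt&gt;&lt;/ruby&gt;

&lt;!-- (b) One ruby element, one rt element per character --&gt;
&lt;ruby&gt;生&lt;rt&gt;せい&lt;/rt&gt;命&lt;rt&gt;めい&lt;/rt&gt;&lt;/ruby&gt;

&lt;!-- (c) Two ruby elements, one per character --&gt;
&lt;ruby&gt;生&lt;rt&gt;せい&lt;/rt&gt;&lt;/ruby&gt;&lt;ruby&gt;命&lt;rt&gt;めい&lt;/rt&gt;&lt;/ruby&gt;</pre>
          <p>
            In (a) and (b) the word boundary is explicit in the markup, whereas in (c) 
            nothing indicates that the two <code>ruby</code> elements form a single word.
          </p>
        </aside>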

        <p>
          Some people argue that creating one <code>ruby</code> element per compound word is better than 
          creating a <code>ruby</code> element for each character in the compound word. 
          They argue that a single element makes it easier for the text-to-speech engine to maintain a correspondence table 
          between ruby bases and ruby annotations, so that subsequent occurrences of the compound word without ruby can be pronounced correctly.
        </p>
	<p>Meanwhile, others argue that there is a good reason to attach ruby
	  annotations to some, but not all, characters in a compound word.
	  For example, consider <span lang="ja">佳人</span>, where
	  <span lang="ja">佳</span> is taught in junior high schools while
	  <span lang="ja">人</span> is taught in the first grade of elementary
	  schools.  Therefore, it makes sense to attach a ruby annotation to
	  <span lang="ja">佳</span> only (one <code>ruby</code> element for
	  <span lang="ja">佳</span> and no <code>ruby</code> element for
	  <span lang="ja">人</span>).  Similarly, it is reasonable to attach ruby
	  annotations only to the first and third CJK ideographic characters in
	  <span lang="ja">屯田兵</span> and not to the second one (thus,
	  two <code>ruby</code> elements).</p>
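        <aside class="example" title="Ruby on some characters of a compound word only">
          <p>
            As a non-normative sketch, the two cases above might be marked up as follows. 
            The readings shown (<span lang="ja">か</span>, <span lang="ja">とん</span>, and <span lang="ja">へい</span>) 
            are the usual readings of these characters in these words; the <code>rb</code> element is omitted for brevity.
          </p>
          <pre class="html">
&lt;!-- 佳人: ruby on the first character only --&gt;
&lt;span lang="ja"&gt;&lt;ruby&gt;佳&lt;rt&gt;か&lt;/rt&gt;&lt;/ruby&gt;人&lt;/span&gt;

&lt;!-- 屯田兵: ruby on the first and third characters, none on the second --&gt;
&lt;span lang="ja"&gt;&lt;ruby&gt;屯&lt;rt&gt;とん&lt;/rt&gt;&lt;/ruby&gt;田&lt;ruby&gt;兵&lt;rt&gt;へい&lt;/rt&gt;&lt;/ruby&gt;&lt;/span&gt;</pre>
        </aside>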
      </section>

      <section>
        <h3>Markup for indicating furigana-added-for-enhanced-accessibility</h3>
        <p>
          Although furigana-added-for-enhanced-accessibility is necessary for those readers who have particular 
          difficulties with CJK ideographic characters, it is unnecessary or mildly distracting for others. 
          If furigana-added-for-enhanced-accessibility is distinguishable from normal furigana, 
          it can be made visible or invisible depending on user preferences. 
          It is thus necessary to standardize a markup mechanism for indicating furigana-added-for-enhanced-accessibility.
        </p>
      </section>

      <section>
        <h3>Markup for indicating ruby annotations used as gikun or interlinear note</h3>
        <p>
          In Section 3, we have seen that ruby annotations used as gikun or interlinear notes should be read aloud differently from the other cases. 
          It is thus necessary to standardize a markup mechanism for clearly indicating ruby annotations used as gikun or interlinear notes.
        </p>
      </section>
    </section>

    <section>
      <h2>Alternatives to ruby</h2>

      <p>[[?SSML]] and [[?PRONUNCIATION-LEXICON]] offer alternatives for
      conveying phonemic and phonetic pronunciations of CJK ideographic 
      characters to speech synthesis engines. These methods are not intended for visual 
      presentation but can offer finer control over text-to-speech than ruby annotations.</p>
      
      <section>
        <h3>SSML</h3>
	<p>
	  [[?SSML]] employs symbol collections (such as IPA and
	  [[?JEITA_IT-4006]]) to represent the sounds of human
	  languages. Phonemic and phonetic pronunciations are conveyed
	  through sequences of these symbols.
	</p>
	<p>
	  [[?epub-32]] allows the use of SSML attributes within
	  <a data-cite="epub-33#dfn-xhtml-content-document">XHTML
	  content documents</a> in EPUB publications. In [[?epub-33]],
	  these attributes have been moved to
	  [[?epub-tts-10]]. Meanwhile, the W3C Accessible Platform
	  Architectures Working Group is developing [[?spoken-html]],
	  which outlines two potential methods for incorporating SSML
	  attributes into HTML elements.</p>
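	<aside class="example" title="SSML attributes in an XHTML content document">
	  <p>
	    The following fragment is a minimal, non-normative sketch of SSML attributes 
	    in an XHTML content document of an EPUB publication. 
	    The sample word and its IPA transcription (an approximation of
	    <span lang="ja">いのち</span>) are illustrative assumptions.
	  </p>
	  <pre class="html">
&lt;html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:ssml="http://www.w3.org/2001/10/synthesis"
      xml:lang="ja" lang="ja"
      ssml:alphabet="ipa"&gt;
  &lt;head&gt;&lt;title&gt;SSML attribute example&lt;/title&gt;&lt;/head&gt;
  &lt;body&gt;
    &lt;!-- ssml:ph gives the pronunciation of the element's content,
         written in the alphabet named by ssml:alphabet. --&gt;
    &lt;p&gt;&lt;span ssml:ph="inotɕi"&gt;生命&lt;/span&gt;&lt;/p&gt;
  &lt;/body&gt;
&lt;/html&gt;</pre>
	</aside>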

	<p>
	  In Japan, SSML is used extensively in digital
	  textbooks and has been adopted by the largest textbook publisher in
	  Japan. However, it has been noted that attaching SSML
	  attributes to CJK ideographic characters significantly
	  raises authoring costs. DAISY textbooks in
	  Japan do not use SSML, as they contain recorded voice.
	  Trade books in Japan do not typically employ SSML
	  either.</p>
      </section>

      <section>
        <h3>PLS</h3>
	<p>PLS ([[PRONUNCIATION-LEXICON]]) enables the use of pronunciation lexicons, which map words to
	  sequences of symbols drawn from collections such as IPA or
	  [[?JEITA_IT-4006]].
	</p>
        <p>
          While SSML attributes are embedded within <a data-cite="epub-33#dfn-xhtml-content-document">XHTML content 
            documents</a> in EPUB publications, PLS lexicons 
          in EPUB publications are stored outside of, 
          and referenced by, <a data-cite="epub-33#dfn-xhtml-content-document">XHTML content documents</a>
          (see <a data-cite="epub-tts-10#pls">Pronunciation Lexicons
          section</a> in [[?epub-tts-10]]).  At present,
          [[spoken-html]] does not offer a mechanism for associating
          PLS lexicons with HTML documents.
        </p>
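        <aside class="example" title="Referencing a PLS lexicon from an XHTML content document">
          <p>
            As a non-normative sketch, an XHTML content document in an EPUB publication can reference 
            an external PLS lexicon with a <code>link</code> element in its header. 
            The file path <code>lexicon/ja.pls</code> is a hypothetical name chosen for illustration.
          </p>
          <pre class="html">
&lt;head&gt;
  &lt;title&gt;PLS reference example&lt;/title&gt;
  &lt;!-- rel="pronunciation" marks the linked resource as a pronunciation lexicon;
       hreflang states the language to which the lexicon applies. --&gt;
  &lt;link rel="pronunciation"
        type="application/pls+xml"
        hreflang="ja"
        href="lexicon/ja.pls"/&gt;
&lt;/head&gt;</pre>
        </aside>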
        <p>
	  PLS is a robust tool for rendering unusual names of people
	  and places in text-to-speech applications.  In particular, PLS allows
	  every occurrence of a word or phrase to be consistently pronounced, 
	  regardless of the presence of ruby annotations.  At the time of this writing, PLS is
	  used by at least one digital textbook publisher in Japan.
        </p>
      </section>
    </section>

    <section>
      <h2>Use of ruby for automatic braille translation</h2>
      <p>
	The conversion of HTML documents and EPUB publications to
	braille is expected to become increasingly important in the
	near future.
      </p>
      <p>
	Japanese braille lacks CJK ideographic characters and does not
	distinguish between hiragana and katakana. (Note: Han braille
	in Japan does include CJK ideographic characters, but it is
	not widely used.)
      </p>
      <p>
	Japanese braille differs in several respects from the ordinary
	Japanese writing system. First, space characters are inserted
	as delimiters between words. Second, two Japanese particles,
	<span lang="ja">は</span> and <span lang="ja">へ</span>, are transcribed as they are pronounced, meaning
	<span lang="ja">は</span> and <span lang="ja">へ</span> are represented as if they were
	<span lang="ja">わ</span> and <span lang="ja">え</span>,
	respectively. Third, <span lang="ja">う</span> pronounced as an elongated sound is
	represented using the long vowel character. For example,
	to translate <span lang="ja">たいよう</span> to braille,
	<span lang="ja">たいよう</span> is first converted to <span lang="ja">たいよー</span> and then translated to braille.
      </p>
      <p>
	Natural language processing is required to handle these
	differences during the conversion to braille. However, unlike
	the case of text-to-speech, intonation is not relevant.
      </p>
      <p>
	When converting HTML or EPUB content to braille, it is
	essential to select the correct reading for each CJK
	ideographic character. Choosing an incorrect reading can
	result in erroneous braille output. Similar to text-to-speech,
	ruby annotations provide valuable hints, while [[?SSML]] and PLS
	([[?PRONUNCIATION-LEXICON]]) serve as effective alternatives.
      </p>
      <p>
        For furigana and the transcription of unusual names of people
        and places, natural language processing is more effective when
        using ruby bases (typically containing CJK ideographic
        characters) as the foundation. In contrast, using ruby
        annotations as the basis ensures that the correct readings
        are chosen. It is also possible to combine both ruby bases and ruby
        annotations.
      </p>
    </section>

  </body>
</html>