docs(cli): document the recent changes to the CLI
joanise committed Nov 17, 2021
1 parent 8fa98dd commit 67f0593
Showing 3 changed files with 142 additions and 59 deletions.
2 changes: 1 addition & 1 deletion docs/README.md
@@ -20,7 +20,7 @@ To view the documentation, run an HTTP server in the directory where the build
is found, e.g.,

cd _build/html
python -m http.server
python3 -m http.server

and navigate to http://127.0.0.1:8000 to view the results (or whatever port
your local web server displays).
193 changes: 138 additions & 55 deletions docs/cli-guide.rst
@@ -16,9 +16,8 @@ The ReadAlongs CLI has two main commands: ``readalongs prepare`` and
in).

- Alternatively, if your plain text file does not need to be modified, you can
run ``align`` directly and use the ``-i`` option to indicate that the input
is plain text and not xml. You'll also need the ``-l <language>`` option to
indicate what language your text is in.
run ``align`` directly on it, since it also accepts plain text input. You'll
need the ``-l <language(s)>`` option to indicate what language your text is in.

Two additional commands are sometimes useful: ``readalongs tokenize`` and
``readalongs g2p``.
@@ -52,10 +51,12 @@ breaks are marked by two blank lines.
+-----------------------------------+-----------------------------------+
| Key Options | Option descriptions |
+===================================+===================================+
| ``-l, --language`` (required) | The language code for story.txt. |
| ``-l, --language(s)`` (required) | The language code for story.txt. |
| | Specifying multiple languages |
| | triggers :ref:`g2p-cascade`. |
+-----------------------------------+-----------------------------------+
| ``-f, --force-overwrite`` | Force overwrite output files |
| | (handy if youre troubleshooting |
| | (handy if you're troubleshooting |
| | and will be aligning repeatedly) |
+-----------------------------------+-----------------------------------+
| ``-h, --help`` | Displays CLI guide for |
@@ -68,9 +69,10 @@ code <https://en.wikipedia.org/wiki/ISO_639-3>`__ as an argument.
The languages supported by RAS can be listed by running ``readalongs prepare -h``
and they can also be found in the :ref:`cli-prepare` reference.

So, a full command for a story in Algonquin would be something like:
So, a full command for a story in Algonquin, with a g2p fallback to
Undetermined, would be something like:

``readalongs prepare -l alq Studio/story.txt Studio/story.xml``
``readalongs prepare -l alq:und Studio/story.txt Studio/story.xml``

The generated XML will be parsed in to sentences. At this stage you can
edit the XML to have any modifications, such as adding ``do-not-align``
@@ -100,7 +102,7 @@ xml file.
it, e.g., <p do-not-align="true">...</p>, or
<s>Some text <foo do-not-align="true">do not align this</foo> more text</s> -->

To use DNA audio, you can specify a frame of time in milliseconds in the
To use DNA audio, you can specify a timeframe in milliseconds in the
``config.json`` file which you want the aligner to ignore.

::
@@ -144,20 +146,15 @@ created, as ``output_base*``
+-----------------------------------+---------------------------------------+
| Key Options | Option descriptions |
+===================================+=======================================+
| ``-l, --language`` | The language code for story.txt. |
| ``-l, --language(s)`` | The language code for story.txt. |
| | Specifying multiple languages |
| | triggers :ref:`g2p-cascade`. |
| | (required if input is plain text) |
+-----------------------------------+---------------------------------------+
| ``-c, --config PATH`` | Use ReadAlong-Studio |
| | configuration file (in JSON |
| | format) |
+-----------------------------------+---------------------------------------+
| ``-i, --text-input`` | Input is plain text (TXT) |
| | (otherwise it’s assumed to be |
| | XML) |
+-----------------------------------+---------------------------------------+
| ``--g2p-fallback G2P_FALLBACK`` | Colon-separated list of fallback langs|
| | for g2p; enables the g2p cascade |
+-----------------------------------+---------------------------------------+
| ``--g2p-verbose`` | Display verbose g2p error messages |
+-----------------------------------+---------------------------------------+
| ``-s, --save-temps`` | Save intermediate stages of |
@@ -174,9 +171,18 @@ created, as ``output_base*``

See above for more information on the ``-l, --language`` argument.

A full command would be something like:
A full command could be something like:

``readalongs align -f -c config.json story.xml story.mp3 story-aligned``

**Is the text file plain text or XML?**

``readalongs align`` accepts its text input as a plain text file or an XML file.

``readalongs align -f -c Studio/config.json Studio/story.xml Studio/story.mp3 Studio/story/aligned``
- If the file name ends with ``.txt``, it will be read as plain text.
- If the file name ends with ``.xml``, it will be read as XML.
- With other extensions, the beginning of the file is examined to
automatically determine if it's XML or plain text.
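The extension-then-content rule above can be sketched in Python. This is a minimal illustration, not the readalongs implementation: ``guess_input_format`` is a hypothetical name, and treating a leading ``<`` as the sign of XML is an assumption about what "examining the beginning of the file" means.

```python
def guess_input_format(path: str, head: str) -> str:
    """Return "text" or "xml" for a text input file (hypothetical helper).

    `head` is assumed to be the first few hundred characters of the file.
    """
    if path.endswith(".txt"):
        return "text"
    if path.endswith(".xml"):
        return "xml"
    # Any other extension: peek at the content. Assumption: a file whose first
    # non-blank character is "<" is XML; the real readalongs check may differ.
    return "xml" if head.lstrip().startswith("<") else "text"


print(guess_input_format("story.txt", "<not really xml>"))  # text: extension wins
print(guess_input_format("story.readalong", "<?xml version='1.0'?><TEI/>"))  # xml
print(guess_input_format("story.input", "Once upon a time..."))  # text
```

Note that the extension always takes precedence, so a ``.txt`` file is read as plain text even if it happens to contain markup.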

The config.json file
~~~~~~~~~~~~~~~~~~~~
@@ -223,76 +229,96 @@ separate elements in a list or dictionary, but if you accidentally have
a comma after the last element (e.g., by cutting and pasting whole
lines), you will get a syntax error.

.. _g2p-cascade:

The g2p cascade
~~~~~~~~~~~~~~~

Sometimes the g2p conversion of the input text will not succeed, for
various reasons. A word might use characters not recognized by the g2p
various reasons. A word might use characters not recognized by the g2p mapping
for the language, or it might be in a different language. Whatever the
reason, the output for the g2p conversion will not be valid ARPABET, and
so the system will not be able to proceed to alignment by the readalongs
so the system will not be able to proceed to alignment by the
aligner, SoundSwallower.

If you know the language for that text, you can mark it as such in the
XML. E.g., ``<s xml:lang="eng">This sentence is in English.</s>``. The
``xml:lang`` attribute can be added to any element in the XML structure
XML. E.g.:

.. code-block:: xml

    <s xml:lang="eng">This sentence is in English.</s>

The ``xml:lang`` attribute can be added to any element in the XML structure
and will apply to text at any depth within that element, unless the
attribute is specified again at a deeper level, e.g.,
``<s xml:lang="eng">English mixed with <foo xml:lang="fra">français</foo>.</s>``.
attribute is specified again at a deeper level, e.g.:

.. code-block:: xml

    <s xml:lang="eng">English mixed with <foo xml:lang="fra">français</foo>.</s>

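The inheritance rule for ``xml:lang`` can be illustrated with Python's standard ``xml.etree.ElementTree``. The helper ``text_languages`` is hypothetical: it only demonstrates the rule as described here (the nearest enclosing ``xml:lang`` wins) and is not how readalongs itself resolves languages.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predeclared "xml:" prefix to this namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def text_languages(elem, inherited=None):
    """Yield (text, lang) pairs; lang is the nearest enclosing xml:lang value."""
    lang = elem.get(XML_LANG, inherited)
    if elem.text and elem.text.strip():
        yield elem.text.strip(), lang
    for child in elem:
        yield from text_languages(child, lang)
        # A child's tail text sits after the child but belongs to this element.
        if child.tail and child.tail.strip():
            yield child.tail.strip(), lang


root = ET.fromstring(
    '<s xml:lang="eng">English mixed with <foo xml:lang="fra">français</foo>.</s>'
)
for text, lang in text_languages(root):
    print(lang, text)
# eng English mixed with
# fra français
# eng .
```

The final ``.`` is back in English because it follows the closing ``</foo>`` tag, outside the element that overrode the language.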
There is also a simpler option available: the g2p cascade. When the g2p
cascade is enabled, the g2p mapping will be done by first trying the
language specified in the XML file (or with the ``-l`` flag on the
language specified by the ``xml:lang`` attribute in the XML file
(or with the first language provided to the ``-l`` flag on the
command line, if the input is plain text). For each word where the
result is not valid ARPABET, the g2p mapping will be attempted again
with each of the languages specified in the g2p cascade, in order, until
a valid ARPABET conversion is obtained. If not valid conversion is
a valid ARPABET conversion is obtained. If no valid conversion is
possible, an error message is printed and alignment is not attempted.
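A minimal Python sketch of the cascade logic described above, assuming toy per-language conversion functions. The names ``g2p_cascade`` and ``toy_mappings`` are hypothetical and are not the g2p library's API. Validity is checked against the 39 ARPABET phones used by CMUdict, without stress digits; the real validity check may differ.

```python
# The 39 CMUdict ARPABET phones (stress digits omitted for simplicity).
ARPABET = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY", "IH", "IY", "OW",
    "OY", "UH", "UW", "B", "CH", "D", "DH", "F", "G", "HH", "JH", "K", "L",
    "M", "N", "NG", "P", "R", "S", "SH", "T", "TH", "V", "W", "Y", "Z", "ZH",
}


def is_valid_arpabet(phones: str) -> bool:
    """True if `phones` is a non-empty, space-separated list of ARPABET phones."""
    tokens = phones.split()
    return bool(tokens) and all(token in ARPABET for token in tokens)


def g2p_cascade(word, langs, mappings):
    """Try each language in order; return (lang, phones) for the first valid result."""
    for lang in langs:
        phones = mappings[lang](word)
        if is_valid_arpabet(phones):
            return lang, phones
    # readalongs prints an error and skips alignment; this sketch raises instead.
    raise ValueError(f"no valid ARPABET for {word!r} with cascade {langs}")


# Hypothetical toy stand-ins for real g2p mappings, which are not word lists.
toy_mappings = {
    "fra": lambda w: {"bonjour": "B OW N ZH UW R"}.get(w, "?"),
    "eng": lambda w: {"hello": "HH AH L OW"}.get(w, "?"),
}

print(g2p_cascade("bonjour", ["fra", "eng"], toy_mappings))  # ('fra', 'B OW N ZH UW R')
print(g2p_cascade("hello", ["fra", "eng"], toy_mappings))    # ('eng', 'HH AH L OW')
```

The cascade is tried per word, so one sentence can end up with words converted by different languages.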

To enable the g2p cascade, add the ``--g2p-fallback l1:l2:...`` option
to ``readalongs g2p`` or ``readalongs align``:
To enable the g2p cascade, provide multiple languages via the ``-l`` switch
(for plain text input) or add the ``fallback-langs="l2:l3:..."`` attribute to
any element in the XML file:

::
.. code-block:: xml

    <s xml:lang="eng" fallback-langs="fra:und">English mixed with français.</s>

Command-line examples that set the language to ``fra``, with the g2p cascade
falling back to ``eng`` and then ``und`` when needed:

readalongs g2p --g2p-fallback fra:eng:und myfile.tokenize.xml myfile.g2p.xml
readalongs align --g2p-fallback fra:eng:und myfile.xml myfile.wav output
.. code-block:: bash

    readalongs prepare -l fra:eng:und myfile.txt myfile.xml
    readalongs align -l fra:eng:und myfile.txt myfile.wav output-dir

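According to the ``--help`` text in ``readalongs/cli.py``, the codes given to ``-l`` can be joined with ``:`` or supplied by repeating the option. The first code is the primary language and the remaining codes form the cascade. The hypothetical Python helper below only illustrates that flattening. It is not the actual ``joiner_callback``, which may behave differently (for example, by validating codes against the supported languages).

```python
def join_langs(values):
    """Flatten repeated -l values, each possibly ':'-joined, into one ordered list."""
    langs = []
    for value in values:
        langs.extend(code for code in value.split(":") if code)
    return langs


primary, *fallbacks = join_langs(["fra:eng:und"])
print(primary, fallbacks)                 # fra ['eng', 'und']
print(join_langs(["fra", "eng", "und"]))  # ['fra', 'eng', 'und'], -l given three times
```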
The "Undetermined" language code: und
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Notice that the two examples above use ``und`` as the last language in the
cascade. ``und``, for Undetermined, is a special language mapping that
uses the Unicode definition of all known characters in all alphabets, and
uses the definition of all characters in all alphabets that are part of the
Unicode standard, and
maps them as if the name of that character was how it is pronounced.
While crude, this mapping works surprisingly well for the purposes of
forced alignment, and allows ``readalongs align`` to successfully align
most text with a few foreign words without any manual intervention. We
recommend systematically using ``und`` at the end of the cascade. Note
that adding another language after ``und`` will have no effect, since
that adding other languages after ``und`` will have no effect, since
the Undetermined mapping will map any string to valid ARPABET.
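The first step of the "Undetermined" idea, reading a letter off each character's Unicode name, can be tried with Python's ``unicodedata.name``. The ``und_letters`` helper is hypothetical and deliberately crude. The real ``und`` mapping goes on to produce ARPABET and handles many more cases, so treat this only as an illustration of the principle.

```python
import unicodedata


def und_letters(word: str) -> str:
    """Spell `word` using the base letter named in each character's Unicode name."""
    out = []
    for ch in word:
        name = unicodedata.name(ch, "")
        # e.g. "LATIN SMALL LETTER C WITH CEDILLA" -> "c",
        #      "GREEK SMALL LETTER ALPHA" -> "alpha"
        if " LETTER " in name:
            base = name.split(" LETTER ", 1)[1].split(" WITH ")[0]
            out.append(base.lower())
        # Characters whose name has no LETTER (digits, punctuation) are skipped here.
    return "".join(out)


print(und_letters("français"))  # francais
print(und_letters("αβ"))        # alphabeta
```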

Debugging g2p mapping issues
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The warning messages issued by ``readalongs g2p`` and
``readalongs align`` indicate which words are causing g2p problems. It
can be worth inspecting to input text to fix any encoding or spelling
The warning messages issued by ``readalongs g2p`` and ``readalongs align``
indicate which words are causing g2p problems and what fallbacks were tried.
It can be worth inspecting the input text to fix any encoding or spelling
errors highlighted by these warnings. More detailed messages can be
produced by adding the ``--g2p-verbose`` switch, to obtain a lot more
information about g2ping words in each language g2p was unsucessfully
information about g2p'ing words in each language for which g2p was
unsuccessfully attempted.

Breaking up the pipeline
~~~~~~~~~~~~~~~~~~~~~~~~

Two commands were added to the CLI in the last year to break processing up step
Some commands were added to the CLI in the last year to break processing up step
by step.

The following series of commands:

::

readalongs prepare -l lang file.txt file.xml
readalongs prepare -l l1:l2:und file.txt file.xml
readalongs tokenize file.xml file.tokenized.xml
readalongs g2p file.tokenized.xml file.g2p.xml
readalongs align file.g2p.xml file.wav output
@@ -301,11 +327,13 @@ is equivalent to the single command:

::

readalongs align -i -l lang file.txt file.wav output
readalongs align -l l1:l2:und file.txt file.wav output

except that when running the pipeline as four separate commands, you can
edit the XML files between each step to make any required adjustments
and corrections.
edit the XML files between each step to make manual adjustments and
corrections if you want, such as inserting anchors or silences, changing the
language for individual elements, or even manually editing the ARPABET encoding
for some words.
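If you script the step-by-step pipeline, you can still stop between steps to edit the intermediate XML. Here is a small Python driver, offered as a sketch: ``pipeline_commands`` and ``run_steps`` are hypothetical helpers, and only the command lines themselves come from the example above. By default it prints each command without running it.

```python
import shlex
import subprocess


def pipeline_commands(base, langs, audio, output):
    """The four step-by-step commands from the example above, as argv lists."""
    return [
        ["readalongs", "prepare", "-l", langs, f"{base}.txt", f"{base}.xml"],
        ["readalongs", "tokenize", f"{base}.xml", f"{base}.tokenized.xml"],
        ["readalongs", "g2p", f"{base}.tokenized.xml", f"{base}.g2p.xml"],
        ["readalongs", "align", f"{base}.g2p.xml", audio, output],
    ]


def run_steps(commands, dry_run=True):
    """Print each command; actually run it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step


steps = pipeline_commands("file", "l1:l2:und", "file.wav", "output")
run_steps(steps[:2])  # prepare + tokenize; then edit file.tokenized.xml by hand
run_steps(steps[2:])  # g2p + align on the edited file
```

Pass ``dry_run=False`` to execute the commands for real; ``check=True`` makes the script stop if a step fails, instead of feeding a broken file to the next one.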

Anchors: marking known alignment points
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -329,9 +357,10 @@ element or text.

Example:

::
.. code-block:: xml
<?xml version='1.0' encoding='utf-8'?> <TEI> <text xml:lang="eng"> <body>
<?xml version='1.0' encoding='utf-8'?>
<TEI> <text xml:lang="eng"> <body>
<anchor time="143ms"/>
<div type="page">
<p>
@@ -358,17 +387,20 @@ The beginning and end of files are implicit anchors: *n* anchors define
anchor, between pairs of anchors, and from the last anchor to the end of
the audio and text.

Special cases equivalent to do-not-align audio: - If an anchor occurs
before the first word in the text, the audio up to that anchor’s
timestamps is excluded from alignment. - If an anchor occurs after the
last word, the end of the audio is excluded from alignment. - If two
anchors occur one after the other, the time span between them in the
audio is excluded from alignment. Using anchors to define do-not-align
audio segments is effectively the same as marking them as "do-not-align"
in the ``config.json`` file, except that DNA segments declared using
anchors have a known alignment with respect to the text, while the
position of DNA segments declared in the config file are inferred by the
aligner.
Special cases equivalent to do-not-align audio:

- If an anchor occurs before the first word in the text, the audio up to that
anchor’s timestamps is excluded from alignment.
- If an anchor occurs after the last word, the end of the audio is excluded
from alignment.
- If two anchors occur one after the other, the time span between them in the
audio is excluded from alignment.

Using anchors to define do-not-align audio segments is effectively the same as
marking them as "do-not-align" in the ``config.json`` file, except that DNA
segments declared using anchors have a known alignment with respect to the
text, while the position of DNA segments declared in the config file is
inferred by the aligner.
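Once the start and end of the audio are treated as implicit anchors, the three special cases collapse into a single rule: any two consecutive anchors with no text between them bound a span that is skipped. The hypothetical Python sketch below applies that rule to a simplified token list rather than real XML, with times in milliseconds. It is only an illustration of the rule, not how readalongs processes anchors.

```python
def dna_spans_from_anchors(items, audio_ms):
    """Return the (start_ms, end_ms) audio spans that anchors exclude from alignment.

    `items` is a hypothetical flattened view of the XML in document order:
    ("anchor", time_ms) or ("word", text).
    """
    # The beginning and end of the audio act as implicit anchors.
    seq = [("anchor", 0)] + list(items) + [("anchor", audio_ms)]
    spans = []
    for (kind1, t1), (kind2, t2) in zip(seq, seq[1:]):
        # Two anchors with no words between them delimit audio with nothing to align.
        if kind1 == "anchor" and kind2 == "anchor" and t2 > t1:
            spans.append((t1, t2))
    return spans


items = [
    ("anchor", 143),                     # anchor before the first word
    ("word", "Hello"), ("word", "world"),
    ("anchor", 5000), ("anchor", 6000),  # two anchors in a row
    ("word", "Goodbye"),
    ("anchor", 9000),                    # anchor after the last word
]
print(dna_spans_from_anchors(items, audio_ms=10000))
# [(0, 143), (5000, 6000), (9000, 10000)]
```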

Anchor use cases
^^^^^^^^^^^^^^^^
@@ -387,3 +419,54 @@ Anchor use cases
alignments.

These known timestamps can be converted to anchors.

Silences: inserting pause-like silences
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are times when you might want a read-along to pause at a particular
place for a specific time and resume again after. This can be accomplished by
inserting silences in your audio stream. You can do it manually by editing your
audio file ahead of time, but you can also have ``readalongs align`` insert the
silences for you.

Silence syntax
^^^^^^^^^^^^^^

Silences are inserted in the audio stream wherever a ``silence`` element is
found in the XML input.
**TODO: say something about how the silence placement is determined.**
The syntax is like the anchor syntax: ``<silence dur="4.2s"/>`` or
``<silence dur="100ms"/>``. Like anchors, silence elements can be inserted
anywhere.

Example:

.. code-block:: xml

    <?xml version='1.0' encoding='utf-8'?>
    <TEI> <text xml:lang="eng"> <body>
      <silence dur="1s"/>
      <div type="page">
        <p>
          <s>Hello.</s>
          <silence dur="10s"/>
          <s>After this pregnant pause, <silence dur="100ms"/> we'll pause
          again before it's all over!</s>
        </p>
        <silence dur="1s"/>
      </div>
    </body> </text> </TEI>

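To estimate how much time a set of ``silence`` elements adds, here is a hypothetical Python helper that parses the two ``dur`` formats shown above (``s`` and ``ms``) and sums them over the example. It assumes each silence lengthens the output by exactly its duration. Other units that readalongs may accept are not handled here.

```python
import xml.etree.ElementTree as ET


def duration_ms(value: str) -> float:
    """Convert a dur value such as "4.2s" or "100ms" to milliseconds."""
    if value.endswith("ms"):  # check "ms" first, since it also ends with "s"
        return float(value[:-2])
    if value.endswith("s"):
        return float(value[:-1]) * 1000
    raise ValueError(f"unsupported duration: {value!r}")


def total_silence_ms(xml_bytes: bytes) -> float:
    """Sum the durations of all <silence> elements, at any depth."""
    root = ET.fromstring(xml_bytes)
    return sum(duration_ms(s.get("dur", "")) for s in root.iter("silence"))


example = b"""<?xml version='1.0' encoding='utf-8'?>
<TEI><text xml:lang="eng"><body>
  <silence dur="1s"/>
  <div type="page"><p>
    <s>Hello.</s> <silence dur="10s"/>
    <s>After this pregnant pause, <silence dur="100ms"/> we'll pause again before it's all over!</s>
  </p><silence dur="1s"/></div>
</body></text></TEI>"""

print(total_silence_ms(example))  # 12100.0
```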
Silence use cases
^^^^^^^^^^^^^^^^^

1. Your read-along has a title page that is not read out in the audio stream:
   insert a silence at the beginning so that it stays on the first page for
   the specified time.
   **TODO: test that a silence before the first word really keeps the RA on the
   first page during that silence, even if all text on the first page is DNA.**

2. Your read-along has a credits page at the end that is not read out in the
   audio stream: insert a silence at the end so that readers see the credits
   page for the specified time before the audio stream ends.
   **TODO: also test that this use case works as described.**
6 changes: 3 additions & 3 deletions readalongs/cli.py
@@ -212,8 +212,8 @@ def cli():
multiple=True,
callback=joiner_callback(LANGS),
help=(
"The language code(s) for text in TEXTFILE (use only with -i, i.e., with plain text input); "
"multiple codes can be joined by ':' or by repeating the option; "
"The language code(s) for text in TEXTFILE (use only with plain text input); "
"multiple codes can be joined by ':', or by repeating the option, to enable the g2p cascade; "
"run 'readalongs langs' to list all supported languages."
),
)
@@ -416,7 +416,7 @@ def align(**kwargs):
callback=joiner_callback(LANGS),
help=(
"The language code(s) for text in PLAINTEXTFILE; "
"multiple codes can be joined by ':' or by repeating the option; "
"multiple codes can be joined by ':', or by repeating the option, to enable the g2p cascade; "
"run 'readalongs langs' to list all supported languages."
),
)