
HfstFst2Strings

eaxelson edited this page Nov 24, 2017 · 6 revisions

hfst-fst2strings

Purpose

Display the string pairs recognized by a transducer, i.e. the input:output strings spelled out by the paths that lead from the initial state to a final state.
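To show what "paths from the initial state to a final state" means, here is a minimal Python sketch. It assumes a hypothetical toy encoding: a dict from each state to its outgoing arcs, with the empty string standing for epsilon. It also assumes the transducer is acyclic. It is only an illustration and not how HFST stores or traverses transducers. The example relation is the cat2mouse transducer used under Formatting output.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy encoding (not HFST's format):
# state -> list of (input symbol, output symbol, target state); "" means epsilon.
Arcs = Dict[int, List[Tuple[str, str, int]]]


def string_pairs(arcs: Arcs, finals: Set[int], start: int = 0) -> List[str]:
    """Collect 'input:output' for every path from start to a final state.

    Assumes the transducer is acyclic; otherwise the recursion never ends.
    """
    results: List[str] = []

    def walk(state: int, inp: str, out: str) -> None:
        if state in finals:
            results.append(f"{inp}:{out}")
        for isym, osym, target in arcs.get(state, []):
            # An epsilon ("") adds nothing to the printed string.
            walk(target, inp + isym, out + osym)

    walk(start, "", "")
    return results


# [c:m a:o t:u 0:s 0:e], i.e. the cat2mouse transducer
cat2mouse: Arcs = {
    0: [("c", "m", 1)],
    1: [("a", "o", 2)],
    2: [("t", "u", 3)],
    3: [("", "s", 4)],
    4: [("", "e", 5)],
}

for pair in string_pairs(cat2mouse, finals={5}):
    print(pair)  # cat:mouse
```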

Usage

The help message:

Usage: hfst-fst2strings [OPTIONS...] [INFILE]
Display the strings recognized by a transducer

Common options:
  -h, --help             Print help message
  -V, --version          Print version info
  -v, --verbose          Print verbosely while processing
  -q, --quiet            Only print fatal errors and requested output
  -s, --silent           Alias of --quiet
Fst2strings options:
  -n, --max-strings=NSTR     print at most NSTR strings
  -N, --nbest=NBEST          print at most NBEST best strings
  -r, --random=NRAND         print at most NRAND random strings
  -c, --cycles=NCYC          follow cycles at most NCYC times
  -w, --print-weights        display the weight for each string
  -S, --print-separator      print separator '--' after each transducer
  -e, --epsilon-format=EPS   print epsilon as EPS
  -X, --xfst=VARIABLE        toggle xfst compatibility option VARIABLE
Ignore paths if:
  -b, --beam=B               output string weight not within B from the weight
                             of the best output string
  -l, --max-in-length=MIL    input string longer than MIL
  -L, --max-out-length=MOL   output string longer than MOL
  -p, --in-prefix=OPREFIX    input string not beginning with IPREFIX
  -P, --out-prefix=OPREFIX   output string not beginning with OPREFIX
  -u, --in-exclude=IXSTR     input string containing IXSTR
  -U, --out-exclude=OXST     output string containing OXSTR

If OUTFILE or INFILE is missing or -, standard streams will be used.
Format of result depends on format of INFILE
If all NSTR, NBEST and NCYC are omitted, all possible paths are printed:
NSTR, NBEST and NCYC default to infinity.
NBEST overrides NSTR and NCYC
NRAND overrides NBEST, NSTR and NCYC
B must be a non-negative float
If EPS is not given, default is empty string.
Numeric options are parsed with strtod(3).
Xfst variables supported are { obey-flags, print-flags,
print-pairs, print-space, quote-special }.

Examples:
  hfst-fst2strings lexical.hfst  generates all forms of lexical.hfst

Known bugs:
  Does not work correctly for hfst optimized lookup format.

Report bugs to <hfst-bugs@helsinki.fi> or directly to our bug tracker at:
<https://sourceforge.net/tracker/?atid=1061990&group_id=224521&func=browse>

Options

Fst2strings options

-n, --max-strings=NSTR
    Print at most NSTR shortest strings. Defaults to infinity. Overridden by NBEST.
-N, --nbest=NBEST
    Print at most NBEST best strings. For weighted transducers, the best strings are those with the lowest weight; for unweighted transducers, they are the shortest ones. Defaults to infinity. Overrides NSTR and NCYC. This option is intended for weighted transducers; it is not guaranteed to work correctly for all unweighted transducers.
-r, --random=NRAND
    Print at most NRAND random strings. The option tries to sample evenly from all strings, i.e. the result should contain both short and long strings with different symbols. Overrides NBEST, NSTR and NCYC.
-c, --cycles=NCYC
    Follow cycles at most NCYC times. Defaults to infinity. Overridden by NBEST. Intended to limit the number of strings in the result for cyclic transducers; ignored for acyclic transducers.
-w, --print-weights
    Display the weight of each string.
-S, --print-separator
    Print the separator -- after each transducer has been processed. See String separator below.
-e, --epsilon-format=EPS
    Print epsilon as EPS. The default is the empty string.
-X, --xfst=VARIABLE
    Toggle the xfst compatibility option VARIABLE. Supported variables are { obey-flags, print-flags, print-pairs, print-space, quote-special }. See Xfst compatibility options below.
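The --nbest description defines the best strings of a weighted transducer as those with the lowest weight. One standard way to find them is a best-first search over partial paths, sketched below in Python. This is an assumption about a possible technique, not HFST's actual algorithm. The sketch uses a hypothetical toy encoding with weights that add up along a path (tropical semiring). It is correct only when no weight is negative (compare Shortcomings below), and it counts paths rather than distinct strings.

```python
import heapq
import itertools
from typing import Dict, List, Tuple

# Hypothetical toy encoding: state -> list of (input, output, weight, target);
# "" means epsilon. Final states map to a final weight.
Arcs = Dict[int, List[Tuple[str, str, float, int]]]


def nbest(arcs: Arcs, finals: Dict[int, float], n: int,
          start: int = 0) -> List[Tuple[str, float]]:
    """Return up to n (input:output, weight) pairs, lowest weight first.

    Weights add up along a path and lower is better. Correct only if no
    weight is negative. Counts paths, not distinct strings.
    """
    tiebreak = itertools.count()  # keeps heapq from comparing the later fields
    # Heap entries: (weight so far, tiebreak, complete?, state, input, output)
    heap = [(0.0, next(tiebreak), False, start, "", "")]
    results: List[Tuple[str, float]] = []
    while heap and len(results) < n:
        weight, _, complete, state, inp, out = heapq.heappop(heap)
        if complete:
            # With non-negative weights nothing left in the heap can beat this.
            results.append((f"{inp}:{out}", weight))
            continue
        if state in finals:
            heapq.heappush(heap, (weight + finals[state], next(tiebreak),
                                  True, state, inp, out))
        for isym, osym, w, target in arcs.get(state, []):
            heapq.heappush(heap, (weight + w, next(tiebreak), False,
                                  target, inp + isym, out + osym))
    return results


# [a:b]* with weight 1 per a:b transition; state 0 is initial and final.
ab_star: Arcs = {0: [("a", "b", 1.0, 0)]}
print(nbest(ab_star, finals={0: 0.0}, n=3))
# [(':', 0.0), ('a:b', 1.0), ('aa:bb', 2.0)]
```

In the sketch, the empty path shows up as ':'. The tool itself counts that path but does not print it.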

String separator

The option --print-separator is useful, for example, in the following case. We have a transducer fi2en.hfst that maps Finnish words to English ones, and we want to look up words in it with the following pipeline of HFST command line tools:

hfst-strings2fst | hfst-compose fi2en.hfst | hfst-fst2strings --print-separator

Now we can simply type a Finnish word on the standard input and press Enter. The pipelined commands then print the English equivalent followed by a separator line. On screen this looks as follows:

kissa
cat
--
koira
dog
--
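When a program consumes this output instead of a person reading it on screen, the separator makes it possible to tell which results belong to which input word. The minimal Python sketch below regroups the lines. It assumes that every group of results ends with a line consisting of exactly --, so an output string that is itself -- would be ambiguous. It also assumes that the Finnish words in the session above were typed by the user and are not part of the tool's output. The function name is hypothetical.

```python
from typing import Iterable, List


def split_on_separator(lines: Iterable[str], separator: str = "--") -> List[List[str]]:
    """Group output lines into one list per processed transducer.

    Assumes each group is terminated by a line equal to the separator.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line == separator:
            groups.append(current)  # may be empty if nothing matched
            current = []
        else:
            current.append(line)
    if current:  # trailing lines with no final separator
        groups.append(current)
    return groups


tool_output = "cat\n--\ndog\n--\n"
print(split_on_separator(tool_output.splitlines()))  # [['cat'], ['dog']]
```

An empty group means that the corresponding input word had no translation in fi2en.hfst.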

Xfst compatibility options

These options are toggled with the command line option -X (--xfst). To toggle several of them, repeat -X, as in the examples under Formatting output below.

obey-flags
    Obey flag diacritics, i.e. do not print strings whose paths violate them. Defaults to true.
print-flags
    Print flag diacritics. Defaults to false.
print-pairs
    For each transition in the path, print the input and output symbol separated by a colon. Defaults to false.
print-space
    Print a space between the transitions of the path. Defaults to false.
quote-special
    Print special characters quoted. See Output below. Defaults to false.

Ignore path options

-b, --beam=B
    Do not print paths whose output string weight is not within B of the weight of the best output string. B must be a non-negative float.
-l, --max-in-length=MIL
    Do not print paths whose input string is longer than MIL.
-L, --max-out-length=MOL
    Do not print paths whose output string is longer than MOL.
-p, --in-prefix=IPREFIX
    Print only paths whose input string begins with IPREFIX.
-P, --out-prefix=OPREFIX
    Print only paths whose output string begins with OPREFIX.
-u, --in-exclude=IXSTR
    Do not print paths whose input string contains IXSTR.
-U, --out-exclude=OXSTR
    Do not print paths whose output string contains OXSTR.
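The length, prefix and exclusion options act as independent tests on each path's input and output string. A path is printed only if it passes all of the tests that were requested. The minimal Python sketch below shows that logic. It assumes that each symbol is a single character, so that length means character count, and it leaves out --beam because that option needs the weight of the best string. The function and its parameter names are hypothetical and are not part of HFST.

```python
from typing import Optional


def keep_path(inp: str, out: str,
              max_in_length: Optional[int] = None,   # -l MIL
              max_out_length: Optional[int] = None,  # -L MOL
              in_prefix: str = "",                   # -p IPREFIX
              out_prefix: str = "",                  # -P OPREFIX
              in_exclude: Optional[str] = None,      # -u IXSTR
              out_exclude: Optional[str] = None,     # -U OXSTR
              ) -> bool:
    """Return True if the path survives every requested filter.

    Lengths are counted in characters, assuming one character per symbol.
    """
    if max_in_length is not None and len(inp) > max_in_length:
        return False
    if max_out_length is not None and len(out) > max_out_length:
        return False
    if not inp.startswith(in_prefix) or not out.startswith(out_prefix):
        return False
    if in_exclude is not None and in_exclude in inp:
        return False
    if out_exclude is not None and out_exclude in out:
        return False
    return True


pairs = [("cat", "mouse"), ("dog", "cat"), ("catfish", "fish")]
print([p for p in pairs if keep_path(*p, in_prefix="cat", in_exclude="fish")])
# [('cat', 'mouse')]
```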

Output

Special symbols are printed as follows unless options -e or -X are used:

epsilon
    Printed as the empty string ''. Can be changed to EPS with -e EPS.
colon
    Printed as such; printed as '@_COLON_@' if -X quote-special is requested.
tabulator
    Printed as such; printed as '@_TAB_@' if -X quote-special is requested.
space
    Printed as such; printed as '@_SPACE_@' if -X quote-special is requested.
flag diacritics
    Not printed (''); printed as such if -X print-flags is requested.
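The rules in the table above can be read as a per-symbol rendering function. The Python sketch below is one possible version. It assumes the same toy convention of "" for epsilon. It also recognizes flag diacritics with a simple heuristic: a symbol of the form @X.FEATURE...@ where X is one of P, N, R, D, C, U. HFST's own check may be different. The function names are hypothetical.

```python
# Replacements used with -X quote-special, as listed in the table above.
QUOTED = {":": "@_COLON_@", "\t": "@_TAB_@", " ": "@_SPACE_@"}


def is_flag_diacritic(sym: str) -> bool:
    """Heuristic: flag diacritics look like @P.FEATURE.VALUE@ (P, N, R, D, C, U)."""
    return (len(sym) > 4 and sym[0] == "@" and sym[-1] == "@"
            and sym[1] in "PNRDCU" and sym[2] == ".")


def render_symbol(sym: str, eps: str = "", quote_special: bool = False,
                  print_flags: bool = False) -> str:
    """Render one symbol according to -e EPS, -X quote-special and -X print-flags."""
    if sym == "":  # epsilon in this toy encoding
        return eps
    if is_flag_diacritic(sym):
        return sym if print_flags else ""
    if quote_special:
        return QUOTED.get(sym, sym)
    return sym


print("".join(render_symbol(s, quote_special=True) for s in ["a", " ", "b"]))
# a@_SPACE_@b
```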

Limiting output

For transducers containing an infinite number of strings, it is useful to limit the number of strings generated. The following examples use the transducer ab_star.hfst, which contains the relation [a:b]*. The command

cat ab_star.hfst | hfst-fst2strings

yields the following results with different options:

--max-strings 3
       a:b
       aa:bb
       aaa:bbb

--nbest 4
       a:b
       aa:bb
       aaa:bbb
       (The empty epsilon path is counted as one of the four paths but is not printed.)

--cycles 5
       a:b
       aa:bb
       aaa:bbb
       aaaa:bbbb
       aaaaa:bbbbb
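The --max-strings and --cycles limits can be illustrated with a breadth-first enumeration. Breadth-first order produces paths with fewer transitions first. The Python sketch below is built on assumptions and is not HFST's implementation. It uses a hypothetical toy encoding. It interprets "follow cycles at most NCYC times" as "a path may re-enter a state it has already visited at most NCYC times". It skips the empty path, which reproduces the --max-strings and --cycles results above.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy encoding: state -> list of (input, output, target); "" is epsilon.
Arcs = Dict[int, List[Tuple[str, str, int]]]


def enumerate_paths(arcs: Arcs, finals: Set[int], start: int = 0,
                    max_strings: Optional[int] = None,
                    cycles: Optional[int] = None) -> List[Tuple[str, str]]:
    """Breadth-first enumeration, so paths with fewer transitions come first.

    cycles: how many times a path may re-enter a state it has already visited.
    With a cyclic transducer, pass max_strings or cycles, or this never ends.
    The empty path is skipped.
    """
    results: List[Tuple[str, str]] = []
    # Queue entries: (state, input so far, output so far, visits per state)
    queue = deque([(start, "", "", {start: 1})])
    while queue:
        state, inp, out, visits = queue.popleft()
        if state in finals and (inp or out):
            results.append((inp, out))
            if max_strings is not None and len(results) >= max_strings:
                break
        for isym, osym, target in arcs.get(state, []):
            count = visits.get(target, 0) + 1
            if cycles is not None and count > cycles + 1:
                continue  # would follow a cycle more than `cycles` times
            queue.append((target, inp + isym, out + osym,
                          {**visits, target: count}))
    return results


ab_star: Arcs = {0: [("a", "b", 0)]}  # [a:b]*, state 0 initial and final
for inp, out in enumerate_paths(ab_star, {0}, max_strings=3):
    print(f"{inp}:{out}")  # a:b, aa:bb, aaa:bbb (one per line)
```

For an acyclic transducer no state is ever re-entered, so the cycles limit never applies. This matches the note that -c is ignored for acyclic transducers.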

Formatting output

The paths of a transducer can be formatted in many ways, most of which have been copied from other systems, such as Xerox's XFST utility. The output format is controlled with the command line option -X, and the way epsilons are printed with the option -e. The following examples use the transducer cat2mouse.hfst, which contains the relation [c:m a:o t:u 0:s 0:e].

       hfst-fst2strings cat2mouse.hfst
              cat:mouse

       hfst-fst2strings -X print-pairs cat2mouse.hfst
              c:ma:ot:u:s:e

       hfst-fst2strings -X print-space cat2mouse.hfst
              c a t : m o u s e

       hfst-fst2strings -X print-space -X print-pairs cat2mouse.hfst
              c:m a:o t:u :s :e

       hfst-fst2strings -e 0 cat2mouse.hfst
              cat00:mouse
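The five outputs above follow a small set of rules. With print-pairs, each transition is printed as input:output. Without it, the whole input side is printed, then the whole output side. With print-space, the printed units are separated by spaces. The -e option substitutes EPS for each epsilon. The Python sketch below reproduces exactly these examples. It is inferred from the examples, assumes a toy path encoding with "" for epsilon, and may not cover every case that hfst-fst2strings handles.

```python
from typing import List, Tuple

# One (input, output) pair per transition; "" is epsilon (toy encoding).
Path = List[Tuple[str, str]]


def format_path(path: Path, print_pairs: bool = False,
                print_space: bool = False, eps: str = "") -> str:
    def show(sym: str) -> str:
        return sym if sym else eps  # -e EPS replaces epsilons

    sep = " " if print_space else ""
    if print_pairs:
        return sep.join(f"{show(i)}:{show(o)}" for i, o in path)
    # Otherwise print the whole input side, then the whole output side,
    # dropping symbols that render as the empty string.
    inp = sep.join(s for s in (show(i) for i, _ in path) if s)
    out = sep.join(s for s in (show(o) for _, o in path) if s)
    return f"{inp} : {out}" if print_space else f"{inp}:{out}"


cat2mouse = [("c", "m"), ("a", "o"), ("t", "u"), ("", "s"), ("", "e")]
print(format_path(cat2mouse))                                      # cat:mouse
print(format_path(cat2mouse, print_pairs=True))                    # c:ma:ot:u:s:e
print(format_path(cat2mouse, print_space=True))                    # c a t : m o u s e
print(format_path(cat2mouse, print_pairs=True, print_space=True))  # c:m a:o t:u :s :e
print(format_path(cat2mouse, eps="0"))                             # cat00:mouse
```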

Examples

hfst-fst2strings cat.hfst

or

cat cat.hfst | hfst-fst2strings

Display all string pairs recognized by the transducer cat.hfst.

Shortcomings

The option --nbest does not work correctly if the transducer has cycles with negative weights.
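If you are not sure whether a weighted transducer has such cycles, one option is to check before using --nbest. The Python sketch below applies the standard Bellman-Ford negative-cycle test to a hypothetical toy encoding. This check is not an HFST feature. It is conservative: it also reports negative cycles that no successful path passes through.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy encoding: state -> list of (input, output, weight, target).
Arcs = Dict[int, List[Tuple[str, str, float, int]]]


def has_negative_cycle(arcs: Arcs) -> bool:
    """Bellman-Ford with an implicit source linked to every state at weight 0.

    Starting every distance at 0 plays the role of that source, so a negative
    cycle anywhere in the transducer is found, even one no path uses.
    """
    states: Set[int] = set(arcs)
    for edges in arcs.values():
        states.update(edge[-1] for edge in edges)
    dist = {s: 0.0 for s in states}
    for _ in range(len(states) - 1):
        changed = False
        for s in states:
            for _, _, w, t in arcs.get(s, []):
                if dist[s] + w < dist[t]:
                    dist[t] = dist[s] + w
                    changed = True
        if not changed:
            return False  # distances settled: no negative cycle
    # If any arc can still be relaxed, a negative cycle exists.
    return any(dist[s] + w < dist[t]
               for s in states for _, _, w, t in arcs.get(s, []))


print(has_negative_cycle({0: [("a", "b", -1.0, 0)]}))  # True
print(has_negative_cycle({0: [("a", "b", 1.0, 0)]}))   # False
```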

See also

hfst-strings2fst, hfst-lookup
