HfstFst2Strings
Display the string pairs recognized by a transducer, i.e. paths that lead from the initial state to a final state.
The help message:
Usage: hfst-fst2strings [OPTIONS...] [INFILE]
Display the strings recognized by a transducer
Common options:
-h, --help Print help message
-V, --version Print version info
-v, --verbose Print verbosely while processing
-q, --quiet Only print fatal erros and requested output
-s, --silent Alias of --quiet
Fst2strings options:
-n, --max-strings=NSTR print at most NSTR strings
-N, --nbest=NBEST print at most NBEST best strings
-r, --random=NRAND print at most NRAND random strings
-c, --cycles=NCYC follow cycles at most NCYC times
-w, --print-weights display the weight for each string
-S, --print-separator print separator '--' after each transducer
-e, --epsilon-format=EPS print epsilon as EPS
-X, --xfst=VARIABLE toggle xfst compatibility option VARIABLE
Ignore paths if:
-b, --beam=B output string weight not within B from the weight
of the best output string
-l, --max-in-length=MIL input string longer than MIL
-L, --max-out-length=MOL output string longer than MOL
-p, --in-prefix=OPREFIX input string not beginning with IPREFIX
-P, --out-prefix=OPREFIX output string not beginning with OPREFIX
-u, --in-exclude=IXSTR input string containing IXSTR
-U, --out-exclude=OXST output string containing OXSTR
If OUTFILE or INFILE is missing or -, standard streams will be used.
Format of result depends on format of INFILE
If all NSTR, NBEST and NCYC are omitted, all possible paths are printed:
NSTR, NBEST and NCYC default to infinity.
NBEST overrides NSTR and NCYC
NRAND overrides NBEST, NSTR and NCYC
B must be a non-negative float
If EPS is not given, default is empty string.
Numeric options are parsed with strtod(3).
Xfst variables supported are { obey-flags, print-flags,
print-pairs, print-space, quote-special }.
Examples:
hfst-fst2strings lexical.hfst generates all forms of lexical.hfst
Known bugs:
Does not work correctly for hfst optimized lookup format.
Report bugs to <hfst-bugs@helsinki.fi> or directly to our bug tracker at:
<https://sourceforge.net/tracker/?atid=1061990&group_id=224521&func=browse>
option | explanation | note |
---|---|---|
-n, --max-strings=NSTR | Print at most NSTR shortest strings. | Defaults to infinity. Overridden by NBEST. |
-N, --nbest=NBEST | Print at most NBEST best strings. For weighted transducers, the best strings are those with the lowest weight. For unweighted transducers, the best strings are the shortest ones. | Defaults to infinity. Overrides NSTR and NCYC. This option is intended for weighted transducers; it is not guaranteed to work correctly for all unweighted transducers. |
-r, --random=NRAND | Print at most NRAND random strings. | This option tries to get an even distribution from all strings, i.e. the result should have both short and long strings containing different symbols. |
-c, --cycles=NCYC | Follow cycles at most NCYC times. | Defaults to infinity. Overridden by NBEST. Intended to limit the number of strings in the result for cyclic transducers. For acyclic transducers, the option is ignored. |
-w, --print-weights | Display the weight for each string. | |
-S, --print-separator | Print separator -- after each transducer is processed. | See a use example below. |
-e, --epsilon-format=EPS | Print epsilon as EPS. | The default is the empty string. |
-X, --xfst=VARIABLE | Toggle xfst compatibility option VARIABLE. | Variables supported are { obey-flags, print-flags, print-pairs, print-space, quote-special }. See compatibility options. |
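To make the weight-based options concrete, here is a toy Python model of the selection rules described above. It is not HFST code. It assumes paths are given as (string, weight) pairs, that a lower weight is better, and that the beam bound is inclusive (the help text does not say whether it is). The function names and the sample weights are made up for illustration.

```python
def nbest(weighted_strings, n):
    """Return the n lowest-weight (string, weight) pairs, best first."""
    return sorted(weighted_strings, key=lambda sw: sw[1])[:n]


def within_beam(weighted_strings, beam):
    """Keep pairs whose weight is at most `beam` above the best weight.

    Assumption: the bound is inclusive.
    """
    if not weighted_strings:
        return []
    best = min(w for _, w in weighted_strings)
    return [(s, w) for s, w in weighted_strings if w - best <= beam]


# Hypothetical weighted strings, not taken from any real transducer.
paths = [("cat", 2.5), ("cart", 0.5), ("car", 1.0), ("care", 4.0)]

print(nbest(paths, 2))         # [('cart', 0.5), ('car', 1.0)]
print(within_beam(paths, 2.0)) # [('cat', 2.5), ('cart', 0.5), ('car', 1.0)]
```

In this sketch the beam filter keeps the original order, while the n-best selection sorts by weight. The order in which hfst-fst2strings itself prints strings is not specified here.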
The option --print-separator can be useful, for example, in the following case. We have a transducer fi2en.hfst that maps Finnish words to English ones and want to look up words in it using the following pipeline of HFST command line tools:
hfst-strings2fst | hfst-compose fi2en.hfst | hfst-fst2strings --print-separator
Now we can simply write a Finnish word to the standard input, press enter, and let the pipelined commands print the English equivalent and a separating line. This will look as follows on the screen:
kissa
cat
--
koira
dog
--
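When the output of such a pipeline is consumed by a script rather than read on screen, the separator lines make it possible to tell which results belong to which input. The following Python sketch groups result lines by separator. It assumes that the captured standard output contains only result lines and `--` lines (the typed words shown above are terminal echo), and the function name `split_results` is hypothetical.

```python
def split_results(lines, separator="--"):
    """Group output lines into one list per processed transducer."""
    groups, current = [], []
    for line in lines:
        line = line.rstrip("\n")
        if line == separator:
            # A separator closes the group for the current transducer.
            groups.append(current)
            current = []
        else:
            current.append(line)
    if current:
        # Trailing lines without a closing separator form a last group.
        groups.append(current)
    return groups


# Result lines as they would be captured for the two lookups shown above.
output = ["cat", "--", "dog", "--"]
print(split_results(output))  # [['cat'], ['dog']]
```

A result string that is itself `--` would be indistinguishable from a separator in this simple scheme.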
The xfst compatibility options below are toggled with the command line option --xfst.
option | explanation | note |
---|---|---|
obey-flags | Obey the flag diacritics, i.e. do not print strings that violate them. | Defaults to true. |
print-flags | Print the flag diacritics. | Defaults to false. |
print-pairs | For each transition in the path, print the input and output symbol separated by a colon. | Defaults to false. |
print-space | Print a space between each transition in the path. | Defaults to false. |
quote-special | Print special characters quoted. See below. | Defaults to false. |
option | explanation |
---|---|
-l, --max-in-length=MIL | Do not print paths having input string longer than MIL . |
-L, --max-out-length=MOL | Do not print paths having output string longer than MOL . |
-p, --in-prefix=IPREFIX | Print only paths whose input string begins with IPREFIX . |
-P, --out-prefix=OPREFIX | Print only paths whose output string begins with OPREFIX . |
-u, --in-exclude=IXSTR | Do not print paths whose input string contains IXSTR . |
-U, --out-exclude=OXSTR | Do not print paths whose output string contains OXSTR . |
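The filtering options combine as independent conditions: a path is printed only if it passes every filter that was given. A toy Python predicate can illustrate this. It is a sketch and not HFST code: the function and parameter names are hypothetical, and it measures length in Python characters, whereas the tool may count transducer symbols.

```python
def keep_path(inp, out, max_in=None, max_out=None,
              in_prefix="", out_prefix="",
              in_exclude=None, out_exclude=None):
    """Return True if the (input, output) string pair passes all filters."""
    if max_in is not None and len(inp) > max_in:
        return False  # like -l: input string too long
    if max_out is not None and len(out) > max_out:
        return False  # like -L: output string too long
    if not inp.startswith(in_prefix):
        return False  # like -p: wrong input prefix
    if not out.startswith(out_prefix):
        return False  # like -P: wrong output prefix
    if in_exclude is not None and in_exclude in inp:
        return False  # like -u: input contains excluded string
    if out_exclude is not None and out_exclude in out:
        return False  # like -U: output contains excluded string
    return True


# Hypothetical string pairs, for illustration only.
pairs = [("cat", "cats"), ("dog", "dogs"), ("catalog", "catalogs")]

print([p for p in pairs if keep_path(*p, in_prefix="cat", max_in=5)])
# [('cat', 'cats')]
print([p for p in pairs if keep_path(*p, out_exclude="log")])
# [('cat', 'cats'), ('dog', 'dogs')]
```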
Special symbols are printed as follows unless options -e or -X are used:
symbol | printed as | note |
---|---|---|
epsilon | '' | can be changed to EPS with -e EPS |
colon | as such | printed as '@_COLON_@' if -X quote-special is requested |
tabulator | as such | printed as '@_TAB_@' if -X quote-special is requested |
space | as such | printed as '@_SPACE_@' if -X quote-special is requested |
flag diacritics | '' | printed as such if -X print-flags is requested |
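The table above can be read as a per-symbol rendering rule. The Python sketch below restates it as a function. It is an illustration, not HFST code: `EPSILON`, `render_symbol` and the `flags` parameter are hypothetical names, the caller is assumed to know which symbols are flag diacritics, and the flag symbol used in the example is made up.

```python
EPSILON = None  # stand-in for the epsilon symbol in this sketch

# Replacements used when quote-special is requested (from the table above).
QUOTED = {":": "@_COLON_@", "\t": "@_TAB_@", " ": "@_SPACE_@"}


def render_symbol(symbol, eps="", flags=frozenset(),
                  print_flags=False, quote_special=False):
    """Render one symbol following the special-symbol table."""
    if symbol is EPSILON:
        return eps                        # '' unless -e EPS is given
    if symbol in flags:
        return symbol if print_flags else ""
    if quote_special:
        return QUOTED.get(symbol, symbol)
    return symbol


flag_symbols = frozenset({"@U.NUM.SG@"})  # hypothetical flag diacritic

print(render_symbol(":"))                       # :
print(render_symbol(":", quote_special=True))   # @_COLON_@
print(render_symbol(EPSILON, eps="0"))          # 0
print(repr(render_symbol("@U.NUM.SG@", flags=flag_symbols)))  # ''
```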
For transducers containing an infinite number of strings, it is useful to limit the number of strings generated.
The following examples can be generated from the transducer ab_star.hfst, which contains the language a:b*.
The command
cat ab_star.hfst | hfst-fst2strings
yields the following results with different options:
options | output | note |
---|---|---|
--max-strings 3 | a : b<br>aa : bb<br>aaa : bbb | |
--nbest 4 | a : b<br>aa : bb<br>aaa : bbb | Epsilon path is counted as one path but is not printed. |
--cycles 5 | a : b<br>aa : bb<br>aaa : bbb<br>aaaa : bbbb<br>aaaaa : bbbbb | |
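Why a limit is needed can be seen from a toy model of a:b* in Python. The sketch below is not HFST code and covers only this one language: it treats a:b* as a single self-loop that can be followed any number of times, and it assumes, in line with the table above, that the empty (epsilon) path exists but is not printed. The names `ab_star`, `follow_cycles` and `printable` are hypothetical.

```python
from itertools import count, islice


def ab_star():
    """Yield the (input, output) pairs of a:b* forever, shortest first."""
    for k in count():
        yield ("a" * k, "b" * k)


def follow_cycles(max_cycles):
    """Pairs obtained by following the a:b loop 0..max_cycles times."""
    return list(islice(ab_star(), max_cycles + 1))


def printable(pairs):
    """Drop the epsilon path, which produces no visible string."""
    return [p for p in pairs if p != ("", "")]


for inp, out in printable(follow_cycles(5)):
    print(inp, ":", out)
# a : b
# aa : bb
# aaa : bbb
# aaaa : bbbb
# aaaaa : bbbbb
```

Without the bound, iterating over `ab_star()` would never terminate, which mirrors what happens when all of NSTR, NBEST and NCYC are left at their infinite defaults on a cyclic transducer.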
The paths of a transducer can be formatted in many ways, most of which have been copied from other systems, such as Xerox's XFST utility.
The output format is controlled with the command line parameter -X. The way epsilons are printed can be controlled with the option -e.
The following examples use a transducer cat2mouse.hfst that contains the language [c:m a:o t:u 0:s 0:e].
hfst-fst2strings cat2mouse.hfst
cat:mouse
hfst-fst2strings -X print-pairs cat2mouse.hfst
c:ma:ot:u:s:e
hfst-fst2strings -X print-space cat2mouse.hfst
c a t : m o u s e
hfst-fst2strings -X print-space -X print-pairs cat2mouse.hfst
c:m a:o t:u :s :e
hfst-fst2strings -e 0 cat2mouse.hfst
cat00:mouse
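The five outputs above follow a simple pattern, which the Python sketch below reproduces for this one path. It is a toy formatter written to match the examples, not HFST's implementation. `EPSILON`, `format_path` and its parameters are hypothetical names, and the rule that empty symbols are skipped when print-pairs is off is inferred from the example outputs.

```python
EPSILON = None  # stand-in for the epsilon symbol in this sketch


def format_path(path, print_pairs=False, print_space=False, eps=""):
    """Render one path, given as a list of (input, output) symbol pairs."""
    sep = " " if print_space else ""

    def show(symbol):
        return eps if symbol is EPSILON else symbol

    if print_pairs:
        # One "input:output" item per transition.
        return sep.join(f"{show(i)}:{show(o)}" for i, o in path)
    # Otherwise print the whole input string, a colon, the whole output.
    inputs = [show(i) for i, _ in path if show(i) != ""]
    outputs = [show(o) for _, o in path if show(o) != ""]
    return sep.join([sep.join(inputs), ":", sep.join(outputs)])


# The single path of [c:m a:o t:u 0:s 0:e].
CAT2MOUSE = [("c", "m"), ("a", "o"), ("t", "u"),
             (EPSILON, "s"), (EPSILON, "e")]

print(format_path(CAT2MOUSE))                    # cat:mouse
print(format_path(CAT2MOUSE, print_pairs=True))  # c:ma:ot:u:s:e
print(format_path(CAT2MOUSE, print_space=True))  # c a t : m o u s e
print(format_path(CAT2MOUSE, print_pairs=True, print_space=True))
# c:m a:o t:u :s :e
print(format_path(CAT2MOUSE, eps="0"))           # cat00:mouse
```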
hfst-fst2strings cat.hfst
or
cat cat.hfst | hfst-fst2strings
Both commands display all string pairs recognized by the transducer cat.hfst.
The option --nbest does not work if the transducer has cycles with negative weights.