Error transducer has weird behaviour with lttoolbox #2

Open
ftyers opened this issue Apr 13, 2018 · 0 comments
ftyers commented Apr 13, 2018

Some extra strings seem to be added to the transducer when it is used with lt-proc. For example, the capitalised input Татарстан gets the analysis Татарстан<np><top><attr><err_orth>, which is not in the original.

$ echo "татарстан" | lt-proc tat.automorf.bin  | tr '/' '\n'
^татарстан
Татарстан<np><top><attr><err_orth>
Татарстан<np><top><nom><err_orth>
Татарстан<np><top><nom>+и<cop><aor><p3><pl><err_orth>
Татарстан<np><top><nom>+и<cop><aor><p3><sg><err_orth>$

$ echo "Татарстан" | lt-proc tat.automorf.bin  | tr '/' '\n'
^Татарстан
Татарстан<np><top><attr>
Татарстан<np><top><nom>
Татарстан<np><top><attr><err_orth>
Татарстан<np><top><nom><err_orth>
Татарстан<np><top><nom>+и<cop><aor><p3><pl>
Татарстан<np><top><nom>+и<cop><aor><p3><sg>
Татарстан<np><top><nom>+и<cop><aor><p3><pl><err_orth>
Татарстан<np><top><nom>+и<cop><aor><p3><sg><err_orth>$

Compare to hfst-lookup:

$ hfst-lookup tat.automorf.hfst 
татарстан
татарстан	Татарстан<np><top><attr><err_orth>	0,000000
татарстан	Татарстан<np><top><nom><err_orth>	0,000000
татарстан	Татарстан<np><top><nom>+и<cop><aor><p3><pl><err_orth>	0,000000
татарстан	Татарстан<np><top><nom>+и<cop><aor><p3><sg><err_orth>	0,000000

Татарстан
Татарстан	Татарстан<np><top><attr>	0,000000
Татарстан	Татарстан<np><top><nom>	0,000000
Татарстан	Татарстан<np><top><nom>+и<cop><aor><p3><pl>	0,000000
Татарстан	Татарстан<np><top><nom>+и<cop><aor><p3><sg>	0,000000
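To make the comparison above easier to repeat, here is a rough sketch of two shell helpers that reduce each tool's output to one analysis per line so the two lists can be diffed. The function names `lt_analyses` and `hfst_analyses` are made up for this example and are not part of lttoolbox or HFST. The sketch assumes one input word per run and the output formats shown in this issue.

```shell
# Hypothetical helpers, not part of lttoolbox or HFST.

# lt-proc prints ^surface/analysis1/analysis2$ ; keep only the analyses.
lt_analyses() {
  tr '/' '\n' | sed -e '/^\^/d' -e 's/\$$//' | LC_ALL=C sort
}

# hfst-lookup prints input<TAB>analysis<TAB>weight ; keep the second field.
# Lines without a tab (blank lines, the echoed input) are skipped.
hfst_analyses() {
  awk -F '\t' 'NF >= 2 { print $2 }' | LC_ALL=C sort
}

# Possible usage (assumes both binaries exist in the current directory):
#   echo "Татарстан" | lt-proc tat.automorf.bin | lt_analyses > lt.txt
#   echo "Татарстан" | hfst-lookup tat.automorf.hfst | hfst_analyses > hfst.txt
#   comm -23 lt.txt hfst.txt   # analyses that only lt-proc returns
```

With the outputs quoted in this issue, the final `comm -23` would be expected to list only the `<err_orth>` analyses for the capitalised input.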

Note that this doesn't happen with lt-proc -c:

$ echo "татарстан" | lt-proc -c tat.automorf.bin  | tr '/' '\n'
^татарстан
Татарстан<np><top><attr><err_orth>
Татарстан<np><top><nom><err_orth>
Татарстан<np><top><nom>+и<cop><aor><p3><pl><err_orth>
Татарстан<np><top><nom>+и<cop><aor><p3><sg><err_orth>$

$ echo "Татарстан" | lt-proc -c tat.automorf.bin  | tr '/' '\n'
^Татарстан
Татарстан<np><top><attr>
Татарстан<np><top><nom>
Татарстан<np><top><nom>+и<cop><aor><p3><pl>
Татарстан<np><top><nom>+и<cop><aor><p3><sg>$

Although of course with lt-proc -c (case-sensitive lookup), other case forms won't work.

The extra strings can be easily removed with some CG rules for the <err_orth> tag, but they are undesirable for the model in general.
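As a quick way to see which analyses are the unwanted ones before writing any CG rules, the following sketch flags every `<err_orth>` analysis whose error-free counterpart is also present for the same word. The function name `extra_err_orth` is hypothetical, and the sketch assumes a single word of lt-proc output per run, with `<err_orth>` as the last tag, as in the examples above.

```shell
# Hypothetical helper, not part of lttoolbox: reads lt-proc output for ONE word
# on stdin and prints the <err_orth> analyses that duplicate a clean analysis.
extra_err_orth() {
  tr '/' '\n' | sed -e '/^\^/d' -e 's/\$$//' | awk '
    { seen[$0] = 1; order[n++] = $0 }       # remember every analysis, in order
    END {
      for (i = 0; i < n; i++) {
        a = order[i]
        if (a ~ /<err_orth>$/) {
          # Strip the trailing tag and check whether the clean form was seen.
          base = substr(a, 1, length(a) - length("<err_orth>"))
          if (base in seen) print a
        }
      }
    }'
}

# Possible usage:
#   echo "Татарстан" | lt-proc tat.automorf.bin | extra_err_orth
```

For the lowercase input татарстан, where only `<err_orth>` analyses exist, this prints nothing, since those analyses are the intended ones.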
