HFSTTokenizer chokes on input longer than 550(?) characters #49

Open
reynoldsnlp opened this issue Oct 5, 2020 · 2 comments
Labels
bug Something isn't working

Comments

@reynoldsnlp
Owner

The interactive shell (accessed using pexpect) appears to limit input lines to about 550 characters (I am not sure of the exact number). If a longer line is given, bell characters (ASCII code point 7, displayed as ^G in less) are printed to the logfile, and pexpect hangs because it receives no output.

@reynoldsnlp reynoldsnlp added the bug Something isn't working label Oct 5, 2020
@reynoldsnlp
Owner Author

Submitted issue to HFST about this: hfst/hfst#483.

The maximum buffer size appears to be 1024 bytes, so a workaround could check whether len(bytes(input_str, encoding='utf8')) < 1000 and, if it is not, process that string with a regular subprocess instead of the interactive shell. This check shouldn't be too expensive.
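
A rough sketch of that check, not the code from the actual fix. The names `MAX_INTERACTIVE_BYTES`, `fits_interactive_buffer`, `run_once`, and `tokenize`, as well as the `interactive_fn` and `fallback_cmd` parameters, are made up for illustration; the 1000-byte threshold is just the safety margin suggested above. Note that the limit is counted in bytes, so text with multi-byte UTF-8 characters hits it after fewer characters than ASCII text does.

```python
import subprocess

# Assumed safety margin below the apparent 1024-byte buffer.
MAX_INTERACTIVE_BYTES = 1000


def fits_interactive_buffer(input_str, limit=MAX_INTERACTIVE_BYTES):
    """Return True if the UTF-8 encoding of input_str is under the limit."""
    return len(input_str.encode('utf8')) < limit


def run_once(cmd, input_str):
    """Process one string with a fresh, non-interactive subprocess."""
    proc = subprocess.run(cmd, input=input_str.encode('utf8'),
                          stdout=subprocess.PIPE, check=True)
    return proc.stdout.decode('utf8')


def tokenize(input_str, interactive_fn, fallback_cmd):
    """Use the interactive shell for short input, a subprocess otherwise.

    interactive_fn: hypothetical callable wrapping the pexpect session.
    fallback_cmd: hypothetical argument list for the one-off command.
    """
    if fits_interactive_buffer(input_str):
        return interactive_fn(input_str)
    return run_once(fallback_cmd, input_str)
```

Starting a new process for each long string is slower than reusing the interactive session, but it should only happen for the occasional oversized input.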

@reynoldsnlp
Owner Author

Workaround implemented in 765a2af.
