apertium crashes with bad_alloc #95
Comments
Can you see what part of the pipeline causes it? Take the contents of the cat-spa.mode file (if you installed the package, it'll be in /usr/share/apertium/modes), remove the
It's the very first lt-proc that crashes.
The input line is, and I kid you not, 7401 bytes without spaces:
If I clip the input to 6001 bytes, it works. With 6002 bytes, it segfaults. Also, moving this to lttoolbox.
Indeed. This is the backtrace I get when I hook gdb to that first lt-proc:
The 6000 doesn't seem to be a magic threshold. I found another one of these "sentences", except that the difference between okay and error is
Attached: trouble.txt
The threshold will vary between runs. The Buffer class that causes it holds 2048 characters by default, so any garbage input longer than 2 KiB ends up out of bounds. The OS won't notice this until the access goes beyond the bounds of the allocated pages, which is why the crash point moves around.
Here is the relevant code: https://github.com/apertium/lttoolbox/blob/master/lttoolbox/buffer.h#L74
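To illustrate the failure mode discussed in this thread, here is a minimal sketch of a fixed-capacity buffer that rejects an oversized token instead of silently writing past its end. This is not the lttoolbox `Buffer` class: `BoundedBuffer` and `fits` are hypothetical names made up for this example, and only the 2048 default capacity is taken from the discussion above.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical fixed-capacity buffer for illustration only;
// this is NOT the lttoolbox Buffer implementation.
class BoundedBuffer {
public:
    // 2048 mirrors the default capacity mentioned in the thread.
    explicit BoundedBuffer(std::size_t capacity = 2048)
        : data_(capacity), size_(0) {}

    // Checked append: throws instead of writing past the end.
    // An unchecked write here would be undefined behaviour, and the
    // OS would only fault once the access left the mapped pages.
    void push(char c) {
        if (size_ >= data_.size()) {
            throw std::length_error("BoundedBuffer: capacity exceeded");
        }
        data_[size_++] = c;
    }

    std::size_t size() const { return size_; }
    std::size_t capacity() const { return data_.size(); }

private:
    std::vector<char> data_;
    std::size_t size_;
};

// Returns true if every character of `token` fits into a buffer of the
// given capacity, false if the checked buffer rejected the token.
bool fits(const std::string& token, std::size_t capacity = 2048) {
    BoundedBuffer buf(capacity);
    try {
        for (char c : token) {
            buf.push(c);
        }
    } catch (const std::length_error&) {
        return false;
    }
    return true;
}
```

With a check like this, a 7401-byte token without spaces is rejected deterministically at the capacity limit, rather than crashing at a length that depends on how the allocator happened to lay out memory on that run.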
I've attached an excerpt of about 25 lines that causes apertium to be busy for a long time and eventually crash with a bad_alloc exception.
Troublesome excerpt: apertium-bug-ca-es-bad-alloc.txt
I'm feeding apertium (with the cat-spa model) Internet Archive text to translate line by line. The data itself is pretty noisy, which might give me garbage output, but I'm fine with that.
My "pipeline":
Apertium is compiled from the source in this repository, at 267d7555af270261916d89980250cf3cd7df8f0c.