File compressed with wlog=25 gets corrupted #3350
Comments
This is surprising. For information, I tested the exact same command, using the Your report implies that you have observed this problem multiple times, that it's reproducible ? edit : I can confirm that the file provided as |
I will drop you binaries later.
Yes, I compiled the dev branch with gcc and clang, with the same result. I would compile a Windows version, but my wine has trouble with the functions `InitializeConditionVariable` and `SleepConditionVariableCS` on that computer.
This is definitely unexpected... I'll take a look at the compressed file and see if I see anything obviously wrong. In the meantime, can you provide more information about your system? Anything you can think of would help, but at least:
I could test with the exact Windows is a bit more complex to validate, |
It is failing because an offset is too large, specifically My best guess is that your filesystem is somehow different, and maybe exposing some bug in the zstd CLI's I/O code. Can you try not writing the file to the filesystem, and decompressing it directly, e.g. this command:
That will help us narrow down the issue by taking the filesystem out of the picture. |
Can you also try with |
Here are the binaries. I tried gcc 4.9.4, 6.5, and 8.1, and clang 3.8 and 6. Only two are left; I threw out the rest. They all produced bit-identical output, verified with md5sum. The archive also contains a Windows binary I built, which produces correct output. This (windoze) program is single-threaded. I suspect that is the reason, at least on Windows. If not, then I don't know. It occurred to me after running into those crazy functions I had never heard of. IMO there are simpler solutions for that, even winpthreads, even for msvc. [ed] Here are the results for these Windows binaries, which you can check yourself:
What is funny is that the released Windows binaries produce different output.
One interesting point is that this operating system is 32-bit.
As I said, though I wasn't precise about it, these Windows binaries (x32/x64) fail on 64-bit Windows (10). All single-threaded builds, on Linux and Windows, were correct. PS. (As a side note) Is it normal that clang is faster at compression and gcc faster at decompression?
I've reproduced the issue! Thanks @tansy for providing clear instructions for how to reproduce it! I was only able to reproduce it with the file provided by @tansy with SHA1
We'll get this debugged & fixed shortly! |
Yeah, they vary a bit. Every release there is a bit of noise, but that is what we currently observe. We're working on transitioning to compiling zstd with clang internally, and as part of that process one of the engineers working on clang is helping us get clang's decompression performance up to par with gcc.
The first bad commit is c90e81a. To work around this issue, please use release |
Fix an off-by-one error in the compressor that emits corrupt blocks if:
* Zstd is compiled in 32-bit mode
* The windowLog == 25 exactly
* An offset of 2^25-3, 2^25-2, 2^25-1, or 2^25 is emitted
* The bitstream had 7 bits leftover before writing the offset

This bug has been present since before v1.0, but wasn't able to easily be triggered, since until somewhat recently zstd wasn't able to find matches that were within 128KB of the window size.

Add a test case, and fix 2 bugs in `ZSTD_compressSequences()`:
* The `ZSTD_isRLE()` check was incorrect. It wouldn't produce corruption, but it could waste CPU and not emit RLE even if the block was RLE
* One windowSize was `1 << windowLog`, not `1u << windowLog`

Thanks to @tansy for finding the issue, and giving us a reproducer! Fixes Issue facebook#3350.
PR #3361 should fix the issue. Specifically the patch to this line (zstd/lib/compress/zstd_compress.c, line 2627 in 6be3181):
- const int longOffsets = cctxParams->cParams.windowLog > STREAM_ACCUMULATOR_MIN;
+ const int longOffsets = cctxParams->cParams.windowLog >= STREAM_ACCUMULATOR_MIN;
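To make the off-by-one in the patch above concrete, here is a minimal sketch of the two comparisons. It is not the zstd source. It assumes that the accumulator threshold is 25 in 32-bit builds (the thread only states that the bug needs 32-bit mode and windowLog == 25), and that an offset is stored as `offset + 3`, with as many extra bits as the position of the highest set bit of that value. The helper names (`long_offsets_old`, `long_offsets_new`, `window_size`, `offset_extra_bits`) and the macro `ACCUMULATOR_MIN_32` are hypothetical, introduced only for illustration.

```c
#include <assert.h>

/* Assumption: accumulator threshold for 32-bit builds. The thread only says
 * the bug requires 32-bit mode and windowLog == 25. */
#define ACCUMULATOR_MIN_32 25u

/* floor(log2(v)) for v > 0. */
static unsigned highbit32(unsigned v)
{
    unsigned n = 0;
    assert(v > 0);
    while (v >>= 1) n++;
    return n;
}

/* Comparison before the patch: windowLog == 25 is not flagged. */
static int long_offsets_old(unsigned windowLog)
{
    return windowLog > ACCUMULATOR_MIN_32;
}

/* Comparison after the patch: windowLog == 25 is flagged. */
static int long_offsets_new(unsigned windowLog)
{
    return windowLog >= ACCUMULATOR_MIN_32;
}

/* Unsigned shift, in the spirit of the `1u << windowLog` fix:
 * avoids shifting into the sign bit of a signed int. */
static unsigned window_size(unsigned windowLog)
{
    assert(windowLog < 32);
    return 1u << windowLog;
}

/* Assumption: the stored value is offset + 3, and the number of extra
 * bits equals the position of its highest set bit. */
static unsigned offset_extra_bits(unsigned offset)
{
    return highbit32(offset + 3u);
}
```

Under these assumptions, `long_offsets_old(25)` is 0 while `long_offsets_new(25)` is 1, and `offset_extra_bits` returns 25 for every offset from `window_size(25) - 3` through `window_size(25)`, which are the four offsets listed in the commit message. An offset of `window_size(25) - 4` needs only 24 bits, which would be consistent with the bug appearing only at the very edge of the window.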
Please reopen if you find that the issue persists on the latest |
Two questions: Is -19 the only level that triggers this error? I tried some smaller levels and they didn't. The new test fails with the old source at
Is it the error? |
Describe the bug
File compressed with wlog=25 gets corrupted
To Reproduce
I tried to compress silesia.tar with wlog=25. It went fine, but when tested it reported an error, and decompression to stdout failed in the middle of the file.
I uploaded these files to the cloud. You can check them here:
Original, compressed: silesia.tar.lz
Corrupted zst: silesia.tar.zst
Expected behavior
To be able to decompress compressed file.
Desktop (please complete the following information):
Just tested brand new dev branch with the same result.