
Fix chaintable check to include rowhash in ZSTD_reduceIndex() #2598

Merged
1 commit, merged May 3, 2021

Conversation

senhuang42
Contributor

senhuang42 commented May 1, 2021

The row hash strategy, like ZSTD_fast, doesn't use a chainTable. Therefore, existing checks of `strategy != ZSTD_fast` whose actual intent was to check for a chainTable are no longer accurate, and all such sites must be migrated to ZSTD_allocateChainTable(). I didn't find any other instances of this in the codebase.

In this case, ZSTD_reduceIndex() with row hash could end up accessing memory outside the cctx when the chainLog was large enough (even though chainLog isn't a parameter that row hash actually uses).

ZSTD_reduceIndex() already has just the right params available, so the fix is fairly straightforward. The params passed are appliedParams, so the matchfinder mode won't be ZSTD_urm_auto (we assert that in ZSTD_allocateChainTable()).
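The shape of the migrated check can be sketched as follows. This is a minimal sketch with simplified, hypothetical type and function names; zstd's actual ZSTD_allocateChainTable() operates on its own internal types:

```c
#include <assert.h>

/* Illustrative stand-ins only, not zstd's exact internal definitions. */
typedef enum { strat_fast, strat_dfast, strat_lazy } strategy_e;
typedef enum { urm_auto, urm_enabled, urm_disabled } rowMatchFinder_e;

/* The predicate call sites are migrated to: a chain table is needed only
 * when the strategy isn't ZSTD_fast AND the row-hash matchfinder is not
 * in use. Because appliedParams are fully resolved, the "auto" mode can
 * be asserted away, mirroring the assert in ZSTD_allocateChainTable(). */
static int allocateChainTable(strategy_e strategy, rowMatchFinder_e mode)
{
    assert(mode != urm_auto);              /* appliedParams never carry auto */
    if (strategy == strat_fast) return 0;  /* ZSTD_fast has no chain table   */
    if (mode == urm_enabled) return 0;     /* row hash has none either       */
    return 1;
}
```

With this predicate, ZSTD_reduceIndex() skips the chainTable reduction for row hash just as it already did for ZSTD_fast, instead of indexing a table that was never allocated.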

Repro of the bug:

yes | ./zstd -5 --single-thread -v -c --zstd=clog=30 -o /dev/null

*** zstd command line interface 64-bits v1.4.10, by Yann Collet ***
(L5) Buffered :   0 MB - Consumed :3583 MB - Compressed :   0 MB => 0.01% Caught SIGSEGV signal, printing stack:
4   zstd                                0x000000010a4bdbaf ZSTD_compressContinue_internal + 1103
5   zstd                                0x000000010a4c34a0 ZSTD_compressStream2 + 1424
6   zstd                                0x000000010a5eb4c2 FIO_compressFilename_srcFile + 3586
7   zstd                                0x000000010a5e8e88 FIO_compressFilename + 136
8   zstd                                0x000000010a5f948d main + 21805
9   libdyld.dylib                       0x00007fff204ae621 start + 1
[1]    38461 broken pipe         yes | 
       38462 segmentation fault  ./zstd -5 --single-thread -v -c --zstd=clog=30 -o /dev/null

@Cyan4973
Contributor

Cyan4973 commented May 1, 2021

Would it make sense to add a test that covers this use case?

We already have a "large data test" category.
Could this fit in there? Or would such a test take an unreasonably long time?

Alternatively, could we retrofit one of these tests to cover this case?
I'm surprised the issue was not detected by these tests, but I presume none of the tests large enough to trigger ZSTD_reduceIndex() used the rowHash strategy, due to speed concerns.

@senhuang42
Contributor Author

> Would it make sense to add a test that covers this use case?
>
> We already have a "large data test" category.
> Could this fit in there? Or would such a test take an unreasonably long time?
>
> Alternatively, could we retrofit one of these tests to cover this case?
> I'm surprised the issue was not detected by these tests, but I presume none of the tests large enough to trigger ZSTD_reduceIndex() used the rowHash strategy, due to speed concerns.

I think @terrelln mentioned that he might add some more fuzzer tests. #2601 adds a simple test that would catch this.

@senhuang42 senhuang42 merged commit cc31bb8 into facebook:dev May 3, 2021