Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extra bits on block are not reported as error. #3089

Open
klauspost opened this issue Mar 9, 2022 · 2 comments
Open

Extra bits on block are not reported as error. #3089

klauspost opened this issue Mar 9, 2022 · 2 comments

Comments

@klauspost
Copy link

klauspost commented Mar 9, 2022

Describe the bug

Extra bits on stream does not get reported, even after #1598

To Reproduce

Decompress: fea5d210d01530bfeb0130452ef432734b23e744.zst.gz
Execute zstd -d fea5d210d01530bfeb0130452ef432734b23e744.zst

Expected behavior

The sample contains 15 extra bits when the sequence has been decoded. These should be reported as an error.

zstd with debugging reports:

zstd-git\lib\decompress\zstd_decompress.c: ZSTD_decompressContinue (srcSize:17)
zstd-git\lib\decompress\zstd_decompress.c: ZSTD_decompressContinue: case ZSTDds_decompressBlock
zstd-git\lib\decompress\zstd_decompress.c: ZSTD_decompressContinue: case bt_compressed
zstd-git\lib\decompress\zstd_decompress_block.c: ZSTD_decompressBlock_internal (size : 17)
zstd-git\lib\decompress\zstd_decompress_block.c: ZSTD_decodeLiteralsBlock
zstd-git\lib\decompress\zstd_decompress_block.c: ZSTD_decodeLiteralsBlock : cSize=6, nbLiterals=5
zstd-git\lib\decompress\zstd_decompress_block.c: ZSTD_decodeSeqHeaders
zstd-git\lib\decompress\zstd_decompress_block.c: ZSTD_decompressSequences
zstd-git\lib\decompress\zstd_decompress_block.c: ZSTD_decompressSequences_body
zstd-git\lib\decompress\zstd_decompress_block.c: ZSTD_initFseState : val=44 using 6 bits
zstd-git\lib\decompress\zstd_decompress_block.c: ZSTD_initFseState : val=0 using 5 bits
zstd-git\lib\decompress\zstd_decompress_block.c: ZSTD_initFseState : val=18 using 6 bits
zstd-git\lib\decompress\zstd_decompress_block.c: seq: litL=1, matchL=62, offset=1
zstd-git\lib\decompress\zstd_decompress_block.c: regenerated sequence size : 63
zstd-git\lib\decompress\zstd_decompress_block.c: seq: litL=2, matchL=62, offset=1
zstd-git\lib\decompress\zstd_decompress_block.c: regenerated sequence size : 64
zstd-git\lib\decompress\zstd_decompress_block.c: seq: litL=2, matchL=33, offset=1
zstd-git\lib\decompress\zstd_decompress_block.c: regenerated sequence size : 35
zstd-git\lib\decompress\zstd_decompress_block.c: ZSTD_decompressSequences_body: after decode loop, remaining nbSeq : 0
zstd-git\lib\decompress\zstd_decompress.c: ZSTD_decompressContinue: decoded size from block : 162
zstd-git\lib\decompress\zstd_decompress.c: ZSTD_decompressContinue: decoded size from frame : 162
zstd-git\lib\decompress\zstd_decompress.c: ZSTD_decompressContinue: decoded size from frame : 162
zstd-git\lib\decompress\zstd_decompress.c: stage zdss_read
zstd-git\lib\decompress\zstd_decompress.c: neededInSize = 0
fea5d210d01530bfeb0130452ef432734b23e744.zst: 162 bytes

Also tested with v1.5.2 release.

It seems we "agree" on decompression:

2022/03/09 12:13:06 Literals: 5 hash: 51174239790196767 and 3 sequences.
2022/03/09 12:13:06 Seq 0 Litlen: 1 mo: 1 (abs) ml: 62 . bits remain: 49
2022/03/09 12:13:06 Seq 1 Litlen: 2 mo: 1 (abs) ml: 62 . bits remain: 31
2022/03/09 12:13:06 Seq 2 Litlen: 2 mo: 1 (abs) ml: 33 . bits remain: 15
2022/03/09 12:13:06 error after: 15 extra bits on block, should be 0

Desktop (please complete the following information):

  • OS: Windown 10, 64 bits.
  • Compiler Visual Studio

Edit: Second example, 0 sequences, but 4 bytes left on stream after literal decoding: 447902c8e4faf807f9b8f9c1861abe7990c476dd.zst.gz

@terrelln
Copy link
Contributor

Thanks for the bug report!

We'll aim to fix this one, but we're okay with not reporting all types of corruption. Our policy is that if you want error detection, enable checksumming. That said, we still like to report corruption whenever we can.

The differential fuzzing you're doing is super interesting! Glad to see that there is another robust enough implementation out there that differential fuzzing can reveal bugs in the reference decoder!

@klauspost
Copy link
Author

One thing I've learnt from the decoder is that it checks everything :)

This should be a good (but not reliable) corruption detection method. Of course not as reliable as CRC, but it can also be seen as a good supplement for the chance of the 32 bit CRC colliding, just like the other checks.

Yeah. It is not a clean-room implementation, but still based on the spec and with peeks at the code for the "not-clear-enough-for-me" parts. Most of the remaining mismatches are because of how the wrapper is implemented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants