
[doc] Remove Limitation that Compressed Block is Smaller than Uncompressed Content #1689

Merged
2 commits merged into facebook:dev on Jul 17, 2019

Conversation

felixhandte
Contributor

This changes the size limit on compressed blocks to match that of the other
block types: they may not be larger than the `Block_Maximum_Decompressed_Size`,
which is the smaller of the `Window_Size` and 128 KB. This removes the additional
restriction that had been placed on `Compressed_Block`s, namely that they be
smaller than the decompressed content they represent.
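
For concreteness, here is a minimal sketch (illustrative names, not taken from any zstd implementation) of the bound described above, expressed as the kind of validity check a decoder might apply:

```c
/* A sketch of the size bound, not code from the zstd sources. */
#include <stdio.h>
#include <stddef.h>

#define BLOCK_SIZE_CAP ((size_t)128 * 1024)  /* the 128 KB limit from the spec */

/* Block_Maximum_Decompressed_Size = min(Window_Size, 128 KB) */
static size_t block_maximum_size(size_t windowSize)
{
    return windowSize < BLOCK_SIZE_CAP ? windowSize : BLOCK_SIZE_CAP;
}

/* Under the revised wording, a Compressed_Block only needs to fit within
 * Block_Maximum_Decompressed_Size; it no longer has to be smaller than the
 * content it decodes to. */
static int compressed_block_size_ok(size_t blockSize, size_t windowSize)
{
    return blockSize <= block_maximum_size(windowSize);
}

int main(void)
{
    /* With a 1 MB window, a 120 KB compressed block is acceptable even if it
     * decodes to only 100 KB of content. */
    printf("%d\n", compressed_block_size_ok(120 * 1024, 1024 * 1024));  /* prints 1 */
    return 0;
}
```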

Several things motivate removing this restriction. On the one hand, this
restriction is not useful for decoders: the decoder must nonetheless be
prepared to accept compressed blocks that are the full
`Block_Maximum_Decompressed_Size`. On the other hand, this bound is
artificially limiting. If block representations were entirely independent,
a compressed representation of a block that is larger than the contents of the
block would be ipso facto useless, and it would be strictly better to send it
as a `Raw_Block`. However, blocks are not entirely independent, and it can
make sense to pay the cost of encoding custom entropy tables in a block, even
if that pushes that block's size over the size of the data it represents,
because those tables can be reused by subsequent blocks.
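
As a purely illustrative back-of-the-envelope example (the numbers below are invented, not taken from this PR or the spec), paying for fresh entropy tables in one block can still win overall once later blocks reuse those tables:

```c
/* Illustrative numbers only: one "oversized" block that carries fresh tables
 * can still beat sending everything as Raw_Blocks, because later blocks reuse
 * the tables for free. */
#include <stdio.h>

int main(void)
{
    const int content_per_block = 1000; /* bytes of content in each block            */
    const int payload           = 950;  /* entropy-coded payload per block           */
    const int table_cost        = 100;  /* one-time cost of the freshly coded tables */
    const int later_blocks      = 10;   /* subsequent blocks reusing the same tables */

    /* First block: 950 + 100 = 1050 bytes for 1000 bytes of content,
     * i.e. larger than the data it represents. */
    int with_tables = (payload + table_cost) + later_blocks * payload;

    /* Alternative: send every block raw instead. */
    int all_raw = (1 + later_blocks) * content_per_block;

    printf("with shared tables: %d bytes, all raw: %d bytes\n",
           with_tables, all_raw);  /* 10550 vs 11000 */
    return 0;
}
```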

Finally, as far as I can tell, this restriction in the spec is not currently
enforced in any Zstandard implementation, nor has it ever been. This change
should therefore be safe to make.

felixhandte merged commit a2861d7 into facebook:dev on Jul 17, 2019
@Cyan4973
Contributor

Cyan4973 commented Jul 18, 2019

FYI:
@vivekmig just found an interesting corner case directly applicable to this part of the spec.

This is a reminder that `Block_Maximum_Size` is necessarily <= `Window_Size`,
but `Window_Size` is not always defined as `2^WindowLog`:
`Window_Size` can be equal to `Frame_Content_Size` when the `Single_Segment` flag is set.

And `Frame_Content_Size` can be any value. This situation frequently happens when `Frame_Content_Size` is very small, including 0 (zero).

In this last case, an RLE block, which always has a "compressed" size of 1, becomes an invalid block,
since its compressed size is now larger than `Block_Maximum_Size` (which is 0).
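
A small sketch of that corner case (illustrative names only, not code from zstd or decodecorpus): with the `Single_Segment` flag set, `Window_Size` equals `Frame_Content_Size`, which can be 0, so `Block_Maximum_Size` collapses to 0 and even a 1-byte RLE block exceeds it.

```c
/* Corner case: an empty single-segment frame makes Block_Maximum_Size == 0,
 * so a 1-byte RLE block is already too large. */
#include <stdio.h>
#include <stddef.h>

#define BLOCK_SIZE_CAP ((size_t)128 * 1024)  /* the 128 KB limit from the spec */

static size_t block_maximum_size(size_t windowSize)
{
    return windowSize < BLOCK_SIZE_CAP ? windowSize : BLOCK_SIZE_CAP;
}

int main(void)
{
    size_t frameContentSize = 0;                        /* empty frame                      */
    size_t windowSize = frameContentSize;               /* Single_Segment: window == content */
    size_t blockMax   = block_maximum_size(windowSize); /* min(0, 128 KB) == 0               */

    size_t rleBlockSize = 1;                            /* an RLE block always carries 1 byte */
    printf("RLE block valid here? %s\n",
           rleBlockSize <= blockMax ? "yes" : "no");    /* prints "no" */
    return 0;
}
```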

This outcome is compatible with both wordings of the spec (past and present).
But up to now, decodecorpus was able to generate such blocks
and incorrectly classified them as "valid".
This used to pass tests, because the decoder did not check that the size constraint was respected.

@vivekmig's work addresses this last point.
