
[doc] Remove Limitation that Compressed Block is Smaller than Uncompressed Content #1689

Merged
2 commits merged into facebook:dev on Jul 17, 2019

Conversation

felixhandte
Contributor

This changes the size limit on compressed blocks to match that of the other
block types: they may not be larger than the `Block_Maximum_Decompressed_Size`,
which is the smaller of the `Window_Size` and 128 KB. This removes the additional
restriction that had been placed on `Compressed_Block`s, namely that they be
smaller than the decompressed content they represent.
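
For concreteness, here is a minimal sketch (illustrative names, not taken from any zstd implementation) of the bound described above, expressed as the kind of validity check a decoder might apply:

```c
/* A sketch of the size bound, not code from the zstd sources. */
#include <stdio.h>
#include <stddef.h>

#define BLOCK_SIZE_CAP ((size_t)128 * 1024)  /* the 128 KB limit from the spec */

/* Block_Maximum_Decompressed_Size = min(Window_Size, 128 KB) */
static size_t block_maximum_size(size_t windowSize)
{
    return windowSize < BLOCK_SIZE_CAP ? windowSize : BLOCK_SIZE_CAP;
}

/* Under the revised wording, a Compressed_Block only needs to fit within
 * Block_Maximum_Decompressed_Size; it no longer has to be smaller than the
 * content it decodes to. */
static int compressed_block_size_ok(size_t blockSize, size_t windowSize)
{
    return blockSize <= block_maximum_size(windowSize);
}

int main(void)
{
    /* With a 1 MB window, a 120 KB compressed block is acceptable even if it
     * decodes to only 100 KB of content. */
    printf("%d\n", compressed_block_size_ok(120 * 1024, 1024 * 1024));  /* prints 1 */
    return 0;
}
```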

Several things motivate removing this restriction. On the one hand, this
restriction is not useful for decoders: the decoder must nonetheless be
prepared to accept compressed blocks that are the full
`Block_Maximum_Decompressed_Size`. On the other hand, this bound is
artificially limiting. If block representations were entirely independent,
a compressed representation of a block that is larger than the contents of the
block would be ipso facto useless, and it would be strictly better to send it
as a `Raw_Block`. However, blocks are not entirely independent, and it can
make sense to pay the cost of encoding custom entropy tables in a block, even
if that pushes that block's size over the size of the data it represents,
because those tables can be reused by subsequent blocks.
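
As a purely illustrative back-of-the-envelope example (the numbers below are invented, not taken from this PR or the spec), paying for fresh entropy tables in one block can still win overall once later blocks reuse those tables:

```c
/* Illustrative numbers only: one "oversized" block that carries fresh tables
 * can still beat sending everything as Raw_Blocks, because later blocks reuse
 * the tables for free. */
#include <stdio.h>

int main(void)
{
    const int content_per_block = 1000; /* bytes of content in each block            */
    const int payload           = 950;  /* entropy-coded payload per block           */
    const int table_cost        = 100;  /* one-time cost of the freshly coded tables */
    const int later_blocks      = 10;   /* subsequent blocks reusing the same tables */

    /* First block: 950 + 100 = 1050 bytes for 1000 bytes of content,
     * i.e. larger than the data it represents. */
    int with_tables = (payload + table_cost) + later_blocks * payload;

    /* Alternative: send every block raw instead. */
    int all_raw = (1 + later_blocks) * content_per_block;

    printf("with shared tables: %d bytes, all raw: %d bytes\n",
           with_tables, all_raw);  /* 10550 vs 11000 */
    return 0;
}
```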

Finally, as far as I can tell, this restriction in the spec is not currently
enforced in any Zstandard implementation, nor has it ever been. This change
should therefore be safe to make.

felixhandte merged commit a2861d7 into facebook:dev on Jul 17, 2019
@Cyan4973
Contributor

Cyan4973 commented Jul 18, 2019

FYI:
@vivekmig just found an interesting corner case directly applicable to this part of the spec.

This is a reminder that `Block_Maximum_Size` is necessarily <= `Window_Size`,
but `Window_Size` is not always defined as `2^WindowLog`:
`Window_Size` can be equal to `Frame_Content_Size` when the `Single_Segment` flag is set.

And `Frame_Content_Size` can be any value. This situation frequently happens when `Frame_Content_Size` is very small, including 0 (zero).

In this last case, an RLE block, which always has a "compressed" size of 1, becomes an invalid block,
since its compressed size is now larger than `Block_Maximum_Size` (which is 0).
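
A small sketch of that corner case (illustrative names only, not code from zstd or decodecorpus): with the `Single_Segment` flag set, `Window_Size` equals `Frame_Content_Size`, which can be 0, so `Block_Maximum_Size` collapses to 0 and even a 1-byte RLE block exceeds it.

```c
/* Corner case: an empty single-segment frame makes Block_Maximum_Size == 0,
 * so a 1-byte RLE block is already too large. */
#include <stdio.h>
#include <stddef.h>

#define BLOCK_SIZE_CAP ((size_t)128 * 1024)  /* the 128 KB limit from the spec */

static size_t block_maximum_size(size_t windowSize)
{
    return windowSize < BLOCK_SIZE_CAP ? windowSize : BLOCK_SIZE_CAP;
}

int main(void)
{
    size_t frameContentSize = 0;                        /* empty frame                      */
    size_t windowSize = frameContentSize;               /* Single_Segment: window == content */
    size_t blockMax   = block_maximum_size(windowSize); /* min(0, 128 KB) == 0               */

    size_t rleBlockSize = 1;                            /* an RLE block always carries 1 byte */
    printf("RLE block valid here? %s\n",
           rleBlockSize <= blockMax ? "yes" : "no");    /* prints "no" */
    return 0;
}
```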

This outcome is compatible with both wordings of the spec (past and present).
But up to now, decodecorpus was able to generate such blocks
and incorrectly classified them as "valid".
This used to pass tests, because the decoder did not check that the size constraint was respected.

@vivekmig's work addresses this last point.
