-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Block splitter control parameter #4180
Conversation
not yet exposed to the interface. Also: renames `useBlockSplitter` to `postBlockSplitter` to better qualify the difference between the 2 settings.
test both that the new parameter works as intended, and that the over-split protection works as intended
lib/zstd.h
Outdated
* to ensure expansion guarantees in presence of incompressible data. | ||
*/ | ||
#define ZSTD_BLOCKSPLITTER_LEVEL_MAX 6 | ||
#define ZSTD_c_blockSplitter_level ZSTD_c_experimentalParam20 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Lets call this ZSTD_c_blockSplitterLevel
to match the naming convention of the rest of the parameters.
I think I'll keep both parameters, But it's necessary to change the names, so that it's less misleading. In particular, Current name in mind: |
from ZSTD_c_useBlockSplitter to ZSTD_c_splitAfterSequences.
Finally settled on |
Make it possible to explicit select a block splitter level, via a new
CCtx
parameterZSTD_c_blockSplitter_level
.This capability is then exploited in a test, ensuring that incompressible data is not overly split, even in presence of an adversarial input (with full knowledge of the sampling pattern).
Note: possible follow-up:
This PR just adds a new parameter, to control the behavior of the new block splitter.
It doesn't modify the existing parameters.
But as a consequence, there are now 2 parameters for block splitters,
one (legacy) that is controlling the post block-splitter (after sequences are determined)
and a new one that is controlling the new pre block-splitter (before sequences are produced).
Already, it's debatable if it's useful for a user to be exposed to these concepts.
More importantly, the distinction between pre and post block splitter is not clear from the current parameters' names (
ZSTD_c_useBlockSplitter
vsZSTD_c_blockSplitter_level
).So it opens the question of a refactoring of these parameters.
For example, maybe both parameters could be fused into a single one, the new
ZSTD_c_blockSplitter_level
, that would be charged to enable both when level is high enough.Or maybe there is still value in keeping both these parameters separated, for example for an optimizer tool which could more naturally influence both code paths and maybe find a better combination for some specific use case. In which case, it's probably still useful to debate about meaningful parameter names.