Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Controls to Constrain Window Size to Protocol Limits #4261

Open
wants to merge 4 commits into
base: dev
Choose a base branch
from

Conversation

felixhandte
Copy link
Contributor

@felixhandte felixhandte commented Jan 22, 2025

This PR adds convenience options that ensure Zstd produces compressed outputs compatible with protocols that place limits on the window size. Currently, it supports two protocols:

This option is exposed as a new CCtx param ZSTD_c_constrainWindowForProtocol and a corresponding CLI argument --constrain-window.

To-Do:

  • Add CCtx Param.
  • Add CLI Flags.
  • Add tests.

@felixhandte felixhandte force-pushed the constrain-window-for-protocols branch 2 times, most recently from ca836a3 to 82fcdce Compare January 24, 2025 20:50
This commit adds a new CCtxParam which lets a user specify a protocol whose
requirements they want the compressed frame to comply with.
@felixhandte felixhandte force-pushed the constrain-window-for-protocols branch from 82fcdce to 8366bb7 Compare January 24, 2025 20:56
@felixhandte felixhandte marked this pull request as ready for review January 24, 2025 20:56
@felixhandte felixhandte force-pushed the constrain-window-for-protocols branch from 8366bb7 to 48014ab Compare January 24, 2025 21:01
@felixhandte
Copy link
Contributor Author

cc @pmeenan, @nidhijaju. We'd talked about this being desirable for adoption of dcz. Take a look and make sure it meets your needs? It should go out in the next Zstd release which is coming very soon.

Copy link

@pmeenan pmeenan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just the one nit.

We also talked about an option to add the dictionary hash in the skippable frame automatically. I'm assuming that will come separately, maybe after the RFC publishes.

FWIW, the draft was approved for publication and is just held up by the shared brotli normative reference. The shared brotli draft was also just approved for publication but has a normative reference to highway hash which will probably drag out the actual publication a bit.


/* Constrains the window size to comply with the limits imposed on the
* `dcz` Content-Encoding, as specified by the Compression Dictionary
* Transport protocol.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth mentioning what the actual limit is?

Copy link
Contributor

@nidhijaju nidhijaju left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thanks for putting this together!

*/
typedef enum {
/* Equivalent to disabling. */
ZSTD_ConstrainWindow_auto = ZSTD_ps_auto,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, what is the difference between "auto" and "disable", they seem basically the same? Do you see a situation where they aren't the same in the future?

The command line option also only has "none".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're just here by analogy to the ZSTD_paramSwitch_e enum, which is used for most cctx param options. They behave identically.

@felixhandte felixhandte force-pushed the constrain-window-for-protocols branch from 48014ab to 7c1caba Compare January 27, 2025 14:45
@Cyan4973 Cyan4973 assigned Cyan4973 and unassigned Cyan4973 Jan 28, 2025
wlog = 27; /* 128 MB */
} else {
dictSize += dictSize >> 2; /* dictSize *= 1.25 */
wlog = ZSTD_highbit32((U32)dictSize);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will need something more accurate than a power of 2.

The Zstandard specification makes it possible to specify a window size which is a power of 2 + n * 1/8th of the power of 2. This is the reason why the *1.25 rule was selected, because it guarantees that such a window size, both larger the dictSize, but not by too much, necessarily exists.

If we stick to pure power of 2, such a window size may not exist.
For example, 12 MB window size will become 15 MB max, which will be round down to 8 MB.
As a consequence, the dictionary will only remain valid up to 8 MB.
In contrast, using the 1/8th increment, it would be possible to match the 15 MB maximum window size.

One of the issues though is that the current code base is unprepared to handle window sizes which are not power of 2. Even if it can, this information cannot be returned as wlog.
So we probably need some in-depth tech talk on this topic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants