-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Zstd multithreaded output can depend on number of threads #2327
Comments
It's a shortcut to say that the outcome of multithreaded Actually, the feature supported is that the outcome of streaming multithreaded This definition makes it possible to consider another potential fix : This could be less disruptive than trying to adapt the single-pass MT compressor, Another (potentially positive) side effect is that it would guarantee that streaming multithreaded compression is always non-blocking, since it would no longer delegate to the (blocking) single-pass mode. |
I once wanted to propose adding a If the caller keeps checking the non-blocking progress, edit: Just found, checking the progress is not very inconvenient: do {
zstd_ret = ZSTD_compressStream2(self->cctx, &out, &in, ZSTD_e_continue);
} while (out.pos != out.size && in.pos != in.size && !ZSTD_isError(zstd_ret)); But it's better to have an always blocking |
Yeah, that is probably easier. I had forgotten that all the jobs in the single pass MT compressor needed to be launched at once.
Generally, the way people write streaming compression loops, it shouldn't be terribly inconvenient to not make maximal forward progress. If we were to add something like this, it wouldn't require a new API. We'd probably just need to add a compression parameter to control it. But, I don't currently see a great need for it. |
Simplifies the code and removes blocking from zstdmt. At this point we could completely delete `ZSTDMT_compress_advanced_internal()`. However I'm leaving it in because I think we want to do that in the zstd-1.5.0 release, in case anyone is still using the ZSTDMT API, even though it is not installed by default. Fixes facebook#2327.
Describe the bug
As reported by @animalize in Issue #2238:
When using ZSTD_e_end end directive and output buffer size >= ZSTD_compressBound() the job number is calculated by ZSTDMT_computeNbJobs() function. This function produces a different number of jobs depending on
nbWorkers
:zstd/lib/compress/zstdmt_compress.c
Lines 1243 to 1255 in b706286
Expected behavior
The output of zstd multithreaded compression must be independent of the number of threads.
Fix
ZSTDMT_computeNbJobs()
independent ofnbWorkers
.Workaround
If you need to work around this bug, don't start your streaming job with ZSTD_e_end. Pass at least one byte of input with ZSTD_e_continue before calling ZSTD_e_end, or ensure your output buffer is
< ZSTD_compressBound(inputSize)
.The text was updated successfully, but these errors were encountered: