22 compression level file is 100 times bigger than on default #2011

Closed
532910 opened this issue Feb 21, 2020 · 12 comments
@532910

532910 commented Feb 21, 2020

xsession-errors is about 16G; it contains a lot of repeated lines.
xsession-errors.zst is 1.4M
xsession-errors-ultra.zst is 442M

@felixhandte felixhandte self-assigned this Feb 21, 2020
@felixhandte
Contributor

That's surprising! Can you share any more details about the data you're compressing and how you're invoking Zstd to compress it?

@Cyan4973
Contributor

Cyan4973 commented Feb 21, 2020

We recently identified an efficiency issue with very large files (> 3 GB) and --ultra mode.
It is fixed in current dev branch (see #1957).
It can also be circumvented in any previous version by using the --single-thread option.

Could you try testing your file again with either:

  • the newer version in the dev branch
  • --single-thread added to your command line?
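
For reference, the three invocations discussed in this thread can be sketched as follows. This is only an illustration under assumptions: the sample file and its contents are hypothetical stand-ins for xsession-errors, and a file this small will not reproduce the size blow-up, which was only observed on inputs larger than 3 GB.

```shell
#!/bin/sh
set -eu
# Skip quietly if the zstd CLI is not installed.
command -v zstd >/dev/null 2>&1 || { echo "zstd not found; skipping"; exit 0; }

workdir=$(mktemp -d)
sample="$workdir/sample.log"   # hypothetical stand-in for xsession-errors

# Build a small, highly repetitive log (about 1 MB).
awk 'BEGIN { for (i = 0; i < 20000; i++) print "warning: repeated line from a hypothetical X client" }' > "$sample"

# Default level, as in the first case reported above.
zstd -q "$sample" -o "$workdir/default.zst"
# Level 22 with --ultra, as in the second case.
zstd -q --ultra -22 "$sample" -o "$workdir/ultra.zst"
# Same level with the suggested workaround.
zstd -q --ultra -22 --single-thread "$sample" -o "$workdir/ultra-st.zst"

# Print the compressed sizes in bytes for comparison.
wc -c "$workdir"/*.zst
```

On the real 16G file, the same three commands would be run against xsession-errors directly, with different output names.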

@532910
Author

532910 commented Feb 21, 2020

Just zstd in the first case, and zstd --ultra -22 in the second.

@532910
Author

532910 commented Feb 21, 2020

I'm using 1.4.4+dfsg-3 from Debian.
With --single-thread the size is 1.3M.
Should I try the dev branch?

@Cyan4973
Contributor

Yes please,
it should give you an equivalent result, but without the need to specify --single-thread.

@532910
Author

532910 commented Feb 21, 2020

The dev version's output is also 1.3M, but it's not exactly identical:

1323322 xsession-errors2.zst # 1.4.4 + --single-thread
1323410 xsession-errors3.zst # dev without --single-thread

@532910
Author

532910 commented Feb 21, 2020

dev and 1.4.4, both with --single-thread, produce exactly the same output:

% md5sum xsession-errors*
5a8e11808e77df3e4c6fb21401055909  xsession-errors2.zst # 1.4.4 with single-thread
3d2e2d9d332de854be3cff657356539f  xsession-errors3.zst # dev without single-thread
5a8e11808e77df3e4c6fb21401055909  xsession-errors4.zst # dev with single-thread

@Cyan4973
Contributor

OK, so it seems that we have our answer.
Huge files (>3 GB) currently suffer from an efficiency issue when combined with --ultra.
This will be fixed in the next release.
In the meantime, one can use --single-thread to circumvent the issue.

@532910
Author

532910 commented Feb 22, 2020

Is the difference in output with and without --single-thread a new issue?

@Cyan4973
Contributor

Cyan4973 commented Feb 22, 2020

No, it's always been the case.
We are just trying to keep the difference as small as possible,
but internally, there are some big differences in the data flow between multi-threaded and single-threaded compressions, so there's no point trying to produce exactly the same output.
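
A quick way to check that such a difference is harmless is to confirm that both outputs decompress to the same data. The sketch below assumes a small hypothetical sample file and level 19; whether the two compressed files differ byte for byte on a given input is not guaranteed either way, so the script only reports it.

```shell
#!/bin/sh
set -eu
# Skip quietly if the zstd CLI is not installed.
command -v zstd >/dev/null 2>&1 || { echo "zstd not found; skipping"; exit 0; }

workdir=$(mktemp -d)
sample="$workdir/sample.log"   # hypothetical input file

# Generate some repetitive but not constant content.
awk 'BEGIN { for (i = 0; i < 50000; i++) print "event", i % 100, "from a hypothetical client" }' > "$sample"

# Default CLI mode versus --single-thread, same compression level.
zstd -q -19 "$sample" -o "$workdir/default-mode.zst"
zstd -q -19 --single-thread "$sample" -o "$workdir/single-thread.zst"

# Report whether the compressed files happen to be byte-identical.
if cmp -s "$workdir/default-mode.zst" "$workdir/single-thread.zst"; then
    echo "compressed outputs are identical"
else
    echo "compressed outputs differ"
fi

# Either way, both must decompress to exactly the original data.
zstd -q -d -c "$workdir/default-mode.zst" | cmp - "$sample"
zstd -q -d -c "$workdir/single-thread.zst" | cmp - "$sample"
echo "both outputs round-trip to the original"
```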

@532910
Author

532910 commented Feb 22, 2020

Yann, thank you for the explanation.
Should I close this issue, or do you need it open until the release?

@Cyan4973
Contributor

I believe it can be closed.
The fix is already in; now it's just a matter of reaching release status.

@532910 532910 closed this as completed Feb 22, 2020