zstd: Decoder assembly (tracking) #515
@klauspost this will be in my next PR; I've already done this. I also implemented an optimistic path in asm for copying from literals and output. I wrote in asm the
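For readers unfamiliar with the term, here is a minimal Go sketch of what an "optimistic" literal copy can look like. It is not the assembly from the PR. It assumes that literals live in a separate buffer (so source and destination never overlap) and that bytes written past the end of the copy are scratch space that later output overwrites. `copyLiterals` and the 16-byte chunk size are hypothetical choices for illustration.

```go
package main

import "fmt"

// chunk is a hypothetical fixed copy width, chosen for illustration only.
const chunk = 16

// copyLiterals copies n bytes from src[s:] to dst[d:] and returns the new
// write position. Assumption: src and dst do not overlap.
func copyLiterals(dst []byte, d int, src []byte, s, n int) int {
	// Optimistic path: both buffers have enough slack to round the copy up
	// to whole chunks. Up to chunk-1 bytes past d+n may be overwritten;
	// they are treated as scratch that later output will replace.
	if d+n+chunk <= len(dst) && s+n+chunk <= len(src) {
		for i := 0; i < n; i += chunk {
			copy(dst[d+i:d+i+chunk], src[s+i:s+i+chunk])
		}
		return d + n
	}
	// Safe path near the end of a buffer: copy exactly n bytes.
	copy(dst[d:d+n], src[s:s+n])
	return d + n
}

func main() {
	src := make([]byte, 64)
	copy(src, "hello, zstd literals!")
	dst := make([]byte, 64)
	d := copyLiterals(dst, 0, src, 0, 5)
	fmt.Println(string(dst[:d])) // prints "hello"
}
```

The point of the fast path is that the common case avoids per-byte length handling, while the bounds check keeps the overshoot inside allocated memory.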
@WojciechMula Sounds great. Looking forward to that!
@WojciechMula I fixed a decompression issue in #569, but other than that it seems to me we are good to do a release. #543 can come later. Do you have any blockers before we release?
@klauspost Thanks for fixing the issue. I agree that translation into avo can be done later; I don't have anything that blocks the release.
Thank you @klauspost for the great support and testing! It took longer than I expected (heh, as always), but the final result is really nice. I hope it enables further speedups.
@WojciechMula Thanks for taking the lead on this. You definitely made sure that it was added much sooner than I had hoped.
The decompressor rewrite #498 was also intended to make assembly rewrites of the important parts possible. These parts were split into separate functions to make them easy to optimize in assembly.
The major one is the sequence execution: https://github.com/klauspost/compress/blob/master/zstd/seqdec.go#L244
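As a rough illustration of what "sequence execution" means in zstd, the Go sketch below applies already-decoded sequences: copy `litLen` literal bytes, then copy `matchLen` bytes from `offset` bytes back in the output. It is a simplified sketch, not the code in `seqdec.go`. The `seq` type and `executeSequences` are hypothetical, offsets are assumed to be already resolved (no repeat-offset codes), and the output slice is assumed to already contain any history the matches may reference.

```go
package main

import (
	"errors"
	"fmt"
)

// seq is a hypothetical decoded sequence, not the library's type.
type seq struct {
	litLen   int // literal bytes to copy first
	matchLen int // bytes to copy from earlier output
	offset   int // distance back into the output, already resolved
}

// executeSequences applies seqs to out, consuming literals in order.
// Literals left over after the last sequence are appended at the end.
func executeSequences(out []byte, seqs []seq, literals []byte) ([]byte, error) {
	for _, s := range seqs {
		if s.litLen > len(literals) {
			return nil, errors.New("literal length exceeds remaining literals")
		}
		out = append(out, literals[:s.litLen]...)
		literals = literals[s.litLen:]

		if s.offset <= 0 || s.offset > len(out) {
			return nil, errors.New("match offset out of range")
		}
		start := len(out) - s.offset
		// Byte-by-byte copy so that overlapping matches (offset < matchLen)
		// repeat the pattern, as LZ77-style matches require.
		for i := 0; i < s.matchLen; i++ {
			out = append(out, out[start+i])
		}
	}
	return append(out, literals...), nil
}

func main() {
	out, err := executeSequences(nil, []seq{{litLen: 3, matchLen: 6, offset: 3}}, []byte("abcXY"))
	fmt.Println(string(out), err) // prints "abcabcabcXY <nil>"
}
```

A fast implementation replaces the byte loop with wide copies whenever `offset` is large enough that source and destination do not overlap within a chunk.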
Sequence decoding is a somewhat harder task: https://github.com/klauspost/compress/blob/master/zstd/seqdec.go#L102
Right now these two functions split execution time approximately 50/50, so both would need to be faster to make a significant difference.
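A back-of-the-envelope Amdahl's law estimate shows why. It assumes an exact 50/50 split, ignores all other decoding costs, and uses hypothetical 2× speedups. With fraction $p$ of the time sped up by a factor $s$:

$$
S = \frac{1}{(1-p) + p/s}, \qquad
S\big|_{p=0.5,\,s=2} = \frac{1}{0.5 + 0.25} \approx 1.33, \qquad
\lim_{s \to \infty} S\big|_{p=0.5} = 2
$$

Making only one function twice as fast gives about 1.33× overall, and even an infinitely fast version of one function caps out at 2×. Making both twice as fast gives $1/(0.25 + 0.25) = 2$.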
The "combined" version of these affects single-threaded decoding: https://github.com/klauspost/compress/blob/master/zstd/seqdec.go#L341
My hope was that writing each of the above as modular avo code would allow me to combine them into the sync decoder as well, since it is basically the two functions above without any intermediate state.
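The difference between the split and combined approaches can be sketched in Go as a fused loop that executes each sequence as soon as it is decoded, so no slice of sequences is stored between the two steps. This is a hypothetical illustration, not the library's sync decoder: `next` stands in for the real entropy decoder, `sliceSource` is a test helper invented here, and offsets are assumed to be already resolved.

```go
package main

import (
	"errors"
	"fmt"
)

// seq is a hypothetical decoded sequence with an already-resolved offset.
type seq struct{ litLen, matchLen, offset int }

// decodeAndExecute fuses decoding and execution: each sequence returned by
// next is applied immediately, with no intermediate []seq buffer.
// next returns ok=false when there are no more sequences.
func decodeAndExecute(out, literals []byte, next func() (seq, bool)) ([]byte, error) {
	for {
		s, ok := next()
		if !ok {
			break
		}
		if s.litLen > len(literals) {
			return nil, errors.New("literal length exceeds remaining literals")
		}
		out = append(out, literals[:s.litLen]...)
		literals = literals[s.litLen:]
		if s.offset <= 0 || s.offset > len(out) {
			return nil, errors.New("match offset out of range")
		}
		start := len(out) - s.offset
		for i := 0; i < s.matchLen; i++ {
			out = append(out, out[start+i]) // overlap-safe byte copy
		}
	}
	return append(out, literals...), nil
}

// sliceSource adapts a fixed list of sequences to the next callback,
// standing in for a real sequence decoder in this sketch.
func sliceSource(seqs []seq) func() (seq, bool) {
	i := 0
	return func() (seq, bool) {
		if i == len(seqs) {
			return seq{}, false
		}
		i++
		return seqs[i-1], true
	}
}

func main() {
	out, err := decodeAndExecute(nil, []byte("abXY"), sliceSource([]seq{{litLen: 2, matchLen: 4, offset: 2}}))
	fmt.Println(string(out), err) // prints "abababXY <nil>"
}
```

Keeping the decoder state and the output position in the same loop is what removes the intermediate state; in assembly that also means both halves can share registers instead of spilling sequences to memory.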