compress/flate: very memory intensive #32371
Comments
On my phone, but most gzip (and maybe flate?) types have a Reset method that you can call to allow their re-use. That should help considerably. If there’s a reason that you can’t use Reset, or a critical type is missing a Reset option, please detail that.
Yes, they both have a Reset method and it does help. However, the amount of memory used by flate is still excessive compared to compress/gzip or compress/zlib.
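On the read side, flate.NewReader returns an io.ReadCloser, so reuse goes through the flate.Resetter interface, which the package documents as implemented by the readers it returns. The sketch below assumes payloads are decompressed sequentially; inflateAll and deflate are hypothetical helper names introduced only for this example.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"io"
)

// deflate (hypothetical helper) compresses msg so the example has input.
func deflate(msg []byte) ([]byte, error) {
	var buf bytes.Buffer
	w, err := flate.NewWriter(&buf, flate.BestSpeed)
	if err != nil {
		return nil, err
	}
	if _, err := w.Write(msg); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// inflateAll (hypothetical helper) decompresses every payload with a single
// reader, resetting it between payloads instead of allocating a new one.
func inflateAll(payloads [][]byte) ([][]byte, error) {
	fr := flate.NewReader(bytes.NewReader(nil))
	defer fr.Close()

	rs, ok := fr.(flate.Resetter)
	if !ok {
		return nil, fmt.Errorf("flate reader does not implement Resetter")
	}

	out := make([][]byte, 0, len(payloads))
	for _, p := range payloads {
		// A nil dictionary means no preset dictionary is used.
		if err := rs.Reset(bytes.NewReader(p), nil); err != nil {
			return nil, err
		}
		b, err := io.ReadAll(fr)
		if err != nil {
			return nil, err
		}
		out = append(out, b)
	}
	return out, nil
}

func main() {
	a, err := deflate([]byte("first message"))
	if err != nil {
		panic(err)
	}
	b, err := deflate([]byte("second message"))
	if err != nil {
		panic(err)
	}
	msgs, err := inflateAll([][]byte{a, b})
	if err != nil {
		panic(err)
	}
	for _, m := range msgs {
		fmt.Println(string(m))
	}
}
```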
Since both gzip and zlib use deflate and store the [...]

But let's put this into context. The deflate compressor does a lot of upfront allocation. The allocations are made so that standard operation needs no additional allocations, which is why 'Reset' is available for reuse.

As I wrote on the gorilla ticket: For compression level 1 (fastest), this also means a lot of unneeded allocations. I made an experiment a couple of years ago to be more selective about allocations. This is mainly for use when using [...]

A simpler optimization could be: for levels 1, 0 and -2, the following arrays in the compressor are not needed: [...]
Makes sense.
For writing such small messages, I think 1.2 MB is a very large price to pay. I'm not an expert in compression algorithms but can't the buffers just be adjusted to grow as needed dynamically instead of always allocating so much?
That would make a massive difference but still leaves 560 KB per writer. That still seems excessive to me for the WebSocket use case.
Here is a simpler version of the PR above: klauspost/compress#107. It is fairly risk-free (compared to the other), so it should be feasible for a merge soon.
That would come at a massive performance cost. The big allocations are a hash table and a chain table. The hash table is sort of a [...] In the stdlib, "level 1" uses its own (smaller, 128 KB) hash table, so the allocations made for the more expensive levels are not used.
Let's break down the rest: 64 KB is allocated in [...] Finally, the last thing I can think of is the output buffer, which is 256 bytes. Not much to gain there. Let me see if I can fix up your benchmark to get some real numbers.
The [...] A reasonable addition would be a [...]
I have updated klauspost/compress#107 with the actual numbers remaining, and here is a gist with a realistic benchmark: https://gist.github.com/klauspost/f5df3a3522ac4bcb3bcde448872dffe6 Most of the remaining allocations are for the Huffman table generators, which are pretty much unavoidable no matter your input size. Again, note that "level 2" in my lib is the "level 1" in stdlib. So yes, the baseline for stdlib is about 540 KB. If you switch to my lib, it is about 340 KB for level 1.
Some exciting updates from @klauspost in gorilla/websocket#203 (comment)
If you run that, you'll get:
1.2 MB per writer and 45 KB per reader is a lot, especially for usage with WebSockets where most messages are rather small, often on average 512 bytes. Why is compress/flate allocating so much and is there a way to reduce it?
gzip (along with zlib, though I didn't include it in the benchmark) uses much less memory.
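The benchmark this report refers to is not reproduced here. As a rough stand-in, the sketch below estimates bytes allocated per 512-byte message for a fresh flate.Writer versus a single writer reused through Reset, using the cumulative TotalAlloc counter from runtime.MemStats. The function names, the message size, and the iteration count are assumptions for illustration, and the figures will vary with Go version and compression level.

```go
package main

import (
	"compress/flate"
	"fmt"
	"io"
	"runtime"
)

// allocPerOp returns the average number of heap bytes allocated per call of f.
// TotalAlloc is cumulative, so garbage collection during the loop does not
// distort the result.
func allocPerOp(n int, f func()) uint64 {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)
	for i := 0; i < n; i++ {
		f()
	}
	runtime.ReadMemStats(&after)
	return (after.TotalAlloc - before.TotalAlloc) / uint64(n)
}

// measure (hypothetical helper) compares a fresh writer per message with one
// writer that is Reset between messages.
func measure(n int) (fresh, reused uint64) {
	msg := make([]byte, 512) // assumed typical WebSocket message size

	fresh = allocPerOp(n, func() {
		w, err := flate.NewWriter(io.Discard, flate.BestSpeed)
		if err != nil {
			panic(err)
		}
		w.Write(msg)
		w.Close()
	})

	shared, err := flate.NewWriter(io.Discard, flate.BestSpeed)
	if err != nil {
		panic(err)
	}
	reused = allocPerOp(n, func() {
		shared.Reset(io.Discard)
		shared.Write(msg)
		shared.Close()
	})
	return fresh, reused
}

func main() {
	fresh, reused := measure(200)
	fmt.Printf("fresh writer:  %d bytes/op\n", fresh)
	fmt.Printf("reused writer: %d bytes/op\n", reused)
}
```

This measures allocation volume per message rather than resident memory per connection, so it shows what Reset saves but not the cost of keeping one writer alive per connection.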
Related: