Zlib compression needs to be sync #1369
This issue nodejs/node#8871 (comment) is the problem in ws, and yes, it will probably go away if sync calls are used, but it's still valid as the async usage is what is actually recommended. As written in the comment, a different allocator can work around the issue and, in fact, there is no leak on macOS. I don't know if it's viable to use the Node.js zlib sync methods. It will certainly require extensive testing and a big production environment to validate the results. |
I'm not going to argue with you either. I've tried in the past, it's useless, there is no constructive discussion. |
Again, experimentation with this in a production environment would be greatly appreciated. Just back it up with numbers. |
Where? BitMEX? In that case why isn't @STRML proposing a patch? They proposed multiple patches in the past to work around the issue. Or are they no longer interested now since they are using uws? Open a PR, convince maintainers that it is the right thing to do (with numbers), PR merged, discussion ends, all users of ws benefit from this. How awesome is that. |
Hey @lpinca - sorry for not updating you further. I'm happy to contribute a sync patch and put it through our benchmarks for completeness' sake. Even with concurrency=1 we still saw fragmentation-related blowups pretty much daily in production. I have not yet tested if sync will fix it but it rings true. Unfortunately this is a difficult bug to reproduce in test. |
Cool, looking forward to that. My biggest fear is that using sync calls will fix the memory fragmentation issue but introduce other issues like blocking the event loop for too long. This is what I meant above with "will probably go away". |
Yeah it depends on how you've clustered out your app. If you're doing one app per hyperthread it's already pretty close to counterproductive to thread out zlib, and the fragmentation bug takes it over the edge. |
Well, this is why many computationally intensive tasks are delegated to the libuv thread pool, no? |
Well they can be. Zlib is a bit of an outlier because the standard task delegated to the libuv pool is IO, not CPU. If you assume that any process can actually spawn 1-5 CPU-intensive processes this completely breaks assumptions when clustering. Ideally you want to spawn one cluster worker per hyperthread otherwise. |
And here we are again with that arrogant attitude. The point is that it's not ws's fault. Any Node.js app/library that uses the asynchronous zlib deflate methods is potentially affected by that memory fragmentation issue. Is it viable to use the sync methods without incurring other issues? I don't know. I'd be very happy if that's the case, as the code will actually also be a lot easier to maintain. |
The issue should stay open, and I already said thank you for your suggestion; not sure what else I should do, haha. Please explain to me exactly what ws is doing wrong, because I'm shortsighted. Is using a recommended and public API ws's fault? Is it ws's fault if that API leads to memory fragmentation? |
So, #1204 does not fix the issue completely, right? |
@luislobo according to the above comments, yes that's right. |
After some testing I saw that the async version of gzip takes up to 10 times longer than the sync version, depending on the content size. 100 KB: the sync version takes 5 ms and the async version 20 ms. As websockets are usually used for small chunks of JSON data, I think that, regardless of the memory leak, it's better to use sync deflate for small data, considering that operations under 1 ms are negligible. I suggest adding another threshold option for sync compression (syncThreshold maybe) with a default value of 10 KB or more. This way anyone who wants to go totally sync can increase this value. |
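For illustration, a minimal sketch of what such a dual-threshold send path could look like (syncThreshold is the proposed option, not a real ws option, and deflatePayload is a hypothetical helper; context takeover is ignored here for simplicity):

```js
'use strict';

const zlib = require('zlib');

// Proposed behavior: compress small payloads synchronously on the event
// loop, hand larger ones to the async zlib API (libuv thread pool).
const syncThreshold = 10 * 1024; // suggested default: 10 KB or more

function deflatePayload(data, callback) {
  if (data.length <= syncThreshold) {
    // Small message: a sync deflate here is well under 1 ms.
    callback(null, zlib.deflateRawSync(data));
  } else {
    // Big message: keep the event loop free.
    zlib.deflateRaw(data, callback);
  }
}

deflatePayload(Buffer.from('{"hello":"world"}'), (err, compressed) => {
  if (err) throw err;
  console.log(compressed.length);
});
```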
Sorry for not having read it properly a minute ago. 👍 for the threshold option. |
There should be two thresholds, one to enable compression and another to do it async... Sometimes, gzip-compressed data gets bigger than the original one for small data chunks, and also there's the CPU usage for the compression.
On Sat, Aug 25, 2018, 00:46, Alex Hultman <notifications@github.com> wrote:
… As websockets are usually used for small chunks of JSON data, I think that, regardless of the memory leak, it's better to use sync deflate for small data, considering operations under 1 ms are negligible.
Bingo.
At these small messages (most users send less than 500 bytes a message), the threading, eventing, copying and synchronization overhead it takes to "make it async" becomes a bigger cost than the actual compression itself. Making it "async" only fools oneself, as it accomplishes nothing of value. I would never put compression as "async" because it is simply logically impossible to do so, but I also don't care enough to argue anymore. Setting a threshold would solve it enough to be usable.
|
I'm assuming you toss the compressed variant if it turns out bigger than the original..... |
I'll assume that, for each length of input, gzip achieves maximum efficiency if and only if all input bytes are the same. In this case, and using values from 0…127, the break-even is at 23 bytes. |
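That break-even point is easy to check empirically; a quick sketch (gzip's fixed overhead alone is 18 bytes: a 10-byte header plus an 8-byte trailer):

```js
'use strict';

const zlib = require('zlib');

// Compress n identical bytes and find where output stops exceeding input.
for (let n = 18; n <= 28; n++) {
  const out = zlib.gzipSync(Buffer.alloc(n, 0x61));
  console.log(`${n} bytes in -> ${out.length} bytes out`);
}
```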
Node.js does not have an official API to use inflate/deflate streams synchronously. We can't use https://nodejs.org/api/zlib.html#zlib_zlib_deflaterawsync_buffer_options as that does not work as intended for fragmented messages or if context takeover is enabled. We need the ability to reuse the same https://nodejs.org/api/zlib.html#zlib_class_zlib_deflateraw instance. In order to use a deflate stream synchronously we need to copy this internal function https://github.com/nodejs/node/blob/5d210d4ac938a16d132a1433857927c5c58a8954/lib/zlib.js#L459-L542 but of course that can change at any time. As said in #1369 (comment), experimentation (backed by numbers) is welcome. |
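For reference, a sketch of what calling into that internal machinery looks like (_processChunk() is undocumented, unsupported, and may change in any Node.js release; this is the kind of experiment meant above, not an endorsed API):

```js
'use strict';

const zlib = require('zlib');

// One long-lived DeflateRaw instance, so the compression context
// (the sliding window) is preserved across messages.
const deflate = new zlib.DeflateRaw();

function deflateSync(chunk) {
  // Internal method: synchronously pushes a chunk through the stream
  // and returns the output. Z_SYNC_FLUSH keeps the context alive.
  return deflate._processChunk(chunk, zlib.constants.Z_SYNC_FLUSH);
}

const msg = Buffer.from('{"type":"tick","price":123.45}');
console.log(deflateSync(msg).length); // first message
console.log(deflateSync(msg).length); // smaller: reuses the shared window
```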
Yes, but where is the code that produced those results? How many deflate instances were used (we have one per socket)? How long is the event loop blocked when doing that for say 50k clients? I would like to see this kind of stuff. |
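One way to put a number on the event-loop question; a rough sketch (the 512-byte message and the 50k client count are made up for illustration):

```js
'use strict';

const zlib = require('zlib');

// Simulate broadcasting one small message to 50k clients, compressing
// synchronously for each one, and time how long the loop is blocked.
const message = Buffer.alloc(512, 'a');
const clients = 50000;

const start = process.hrtime.bigint();
for (let i = 0; i < clients; i++) {
  zlib.deflateRawSync(message);
}
const blockedMs = Number(process.hrtime.bigint() - start) / 1e6;
console.log(`event loop blocked for ~${blockedMs.toFixed(1)} ms`);
```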
@rahbari can you please share your code? I did a dummy and irrelevant quick test and this is what I get:

Code:

'use strict';

const crypto = require('crypto');
const zlib = require('zlib');

const buf = crypto.randomBytes(2048);

// One-shot sync deflate: a new zlib context for every call.
console.time('deflate sync');
for (let i = 0; i < 1000; i++) {
  zlib.deflateRawSync(buf);
}
console.timeEnd('deflate sync');

// Single reusable stream, flushed after each write (async path).
const deflate = zlib.createDeflateRaw();
deflate.resume(); // drain and discard the output

let runs = 0;

function runAsync() {
  deflate.write(buf);
  deflate.flush(() => {
    if (++runs === 1000) {
      console.timeEnd('deflate async');
      return;
    }
    runAsync();
  });
}

console.time('deflate async');
runAsync();

Results:
|
I think it would be nicer to run the benchmark a few hundred times in the same Node.js process for warmup, and only then collect measurements. V8 might be able to optimize functions once it knows we want to use them a lot. |
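Applied to the sync half of the benchmark above, that suggestion would look something like this (a sketch; the warmup count is arbitrary):

```js
// Warmup: give V8 a chance to optimize the hot path before measuring.
for (let i = 0; i < 500; i++) {
  zlib.deflateRawSync(buf);
}

// Only now collect the measurement.
console.time('deflate sync');
for (let i = 0; i < 1000; i++) {
  zlib.deflateRawSync(buf);
}
console.timeEnd('deflate sync');
```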
@piranna please read the full comment. That creates a new instance for each buffer, see https://github.com/nodejs/node/blob/5d210d4ac938a16d132a1433857927c5c58a8954/lib/zlib.js#L692-L706. We can't use that for fragmented messages or if context takeover is enabled. |
But is compression done at the fragment level? Isn't it at the message level? |
It is done at the frame/fragment level, yes. |
The option to enable compression based on the message byte size is available since version 2.0.0, see https://github.com/websockets/ws/blob/1e78aba9b9b29407517ec7ba1da97427c1678e5c/doc/ws.md#new-websocketserveroptions-callback
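For example (threshold is a documented perMessageDeflate option; messages smaller than that many bytes are not compressed, and 1024 is the default):

```js
'use strict';

const WebSocket = require('ws');

const wss = new WebSocket.Server({
  port: 8080,
  perMessageDeflate: {
    threshold: 1024 // skip compression for payloads under 1 KiB
  }
});
```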
|
I noticed that there is:

const deflate = new zlib.DeflateRaw();
const data = deflate._processChunk(chunk, zlib.Z_SYNC_FLUSH); |
I made the following code to reproduce the problem (still happening on Alpine Linux or Debian Buster with ws@7.2.0 and Node.js 12 or 13):

const WebSocket = require('ws');
const wss = new WebSocket.Server({
port: 8080,
perMessageDeflate: {
zlibDeflateOptions: {
memLevel: 3,
},
serverMaxWindowBits: 10,
threshold: 16,
},
});
wss.on('connection', (ws) => {
ws.on('message', (message) => {
ws.send(message);
});
ws.on('close', () => {
console.log(`close, ${wss.clients.size} listeners left`);
});
});
const message = Buffer.alloc(256);
function sendMessage(ws, j) {
if (j < 100) {
setTimeout(() => {
ws.send(message, sendMessage.bind(null, ws, j + 1));
}, 16);
} else {
ws.close();
}
}
for (let i = 0; i < 1000; ++i) {
const ws = new WebSocket('ws://localhost:8080/');
ws.on('open', () => {
sendMessage(ws, 0);
});
}

I tried to make the zlib compression sync, but I only managed to break the unit tests. However, from an external point of view of the
By the way, the memory leak doesn't happen on Windows 10. I don't feel good about deploying Windows servers to fix a memory leak, though. |
Would it help to deflate/inflate buffers instead of the stream? That way we could use zlib.deflateSync. ("Added in: v0.11.12") |
The difficulty is to support "context takeover". The RFC defines it like this:
In my opinion, context takeover is what makes websocket compression very cool. You can compress based on the previous messages, which is very convenient when you have many similar messages, which I think is very common with websockets. So, |
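A small sketch of the difference (the repeated JSON message is made up; exact sizes will vary):

```js
'use strict';

const zlib = require('zlib');

const msg = Buffer.from('{"type":"tick","price":123.45}');

// One-shot deflate: every call starts a fresh context, so a repeated
// message compresses no better the second time.
console.log(zlib.deflateRawSync(msg).length, zlib.deflateRawSync(msg).length);

// Shared stream (context takeover): the second message can reference
// the first through the sliding window and comes out much smaller.
const deflate = zlib.createDeflateRaw();
const sizes = [];
deflate.on('data', (chunk) => sizes.push(chunk.length));
deflate.write(msg);
deflate.flush(() => {
  deflate.write(msg);
  deflate.flush(() => console.log(sizes));
});
```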
@fungiboletus I can't reproduce on Ubuntu 18.04 using your example and Node.js 13.1.0:

server.js:

'use strict';
const WebSocket = require('ws');
const message = Buffer.alloc(256);
const wss = new WebSocket.Server({
port: 8080,
perMessageDeflate: {
zlibDeflateOptions: {
memLevel: 3
},
serverMaxWindowBits: 10,
threshold: 16
}
});
function onClose() {
if (wss.clients.size === 0) {
console.log('connections: 0');
}
}
function broadcast() {
console.log('Broadcasting message');
for (const ws of wss.clients) {
ws.send(message);
ws.close();
}
}
wss.on('connection', (ws) => {
if (wss.clients.size === 1000) {
console.log('connections: 1000');
broadcast();
}
ws.on('close', onClose);
});
wss.on('listening', function() {
console.log('Listening on *:8080');
});
setInterval(function() {
console.log(process.memoryUsage());
}, 10000);

clients.js:

'use strict';
const WebSocket = require('ws');
const total = 1000;
let connected = 0;
function connect() {
const ws = new WebSocket('ws://localhost:8080/');
ws.on('open', function() {
if (++connected < total) connect();
ws.on('close', function() {
if (--connected === 0) setTimeout(connect, 5000);
});
});
}
connect();

Memory usage stabilizes at ~100 MiB. |
Why not send and receive compressed buffers in userland? |
|
Greetings, I am experimenting with synchronous zlib compression in ws (Windows 10 build 19041, Node.js 14.8.0, Intel i5-7300HQ 2.5 GHz):

> node speedzlib.js
Generating 100 MiB of test data...
Testing ws on [::]:8181
10000 roundtrips of 64 B binary data: 2.9s 437.94 KiB/s
5000 roundtrips of 16 KiB binary data: 2.8s 55.45 MiB/s
1000 roundtrips of 128 KiB binary data: 13.5s 18.48 MiB/s
100 roundtrips of 1 MiB binary data: 9.6s 20.87 MiB/s
1 roundtrips of 100 MiB binary data: 9.4s 21.23 MiB/s
10000 roundtrips of 64 B text data: 3.6s 352.02 KiB/s
5000 roundtrips of 16 KiB text data: 2.6s 59.87 MiB/s
1000 roundtrips of 128 KiB text data: 13.2s 18.95 MiB/s
100 roundtrips of 1 MiB text data: 9.8s 20.34 MiB/s
1 roundtrips of 100 MiB text data: 9.5s 20.96 MiB/s
> npm install fast-zlib
> node speedzlib.js
Generating 100 MiB of test data...
Testing ws on [::]:8181
10000 roundtrips of 64 B binary data: 1.1s 1.11 MiB/s
5000 roundtrips of 16 KiB binary data: 1.5s 101.33 MiB/s
1000 roundtrips of 128 KiB binary data: 10.1s 24.77 MiB/s
100 roundtrips of 1 MiB binary data: 7.3s 27.53 MiB/s
1 roundtrips of 100 MiB binary data: 8s 24.93 MiB/s
10000 roundtrips of 64 B text data: 1.2s 1.05 MiB/s
5000 roundtrips of 16 KiB text data: 1.6s 96.19 MiB/s
1000 roundtrips of 128 KiB text data: 9.7s 25.65 MiB/s
100 roundtrips of 1 MiB text data: 7.7s 26.09 MiB/s
1 roundtrips of 100 MiB text data: 8.2s 24.29 MiB/s

Synchronous zlib appears to provide performance gains of 20% to 40%+, depending on packet size. If anyone wants to experiment with it and test for memory issues, the code is available in my fork of ws. Cheers! |
The problem, as I see it, is doing this synchronously when there are thousands of clients. Yes, it will be slower, but doing it asynchronously does not block the event loop. The benchmark above uses only a single client and reflects the results of running this benchmark: #1369 (comment). There is no need to use
One more thing: it doesn't make sense to compress small messages. It is actually counterproductive. This is why |
Indeed, I intentionally disabled the threshold to test zlib in all situations, to avoid having mixed compressed and non-compressed results. |
What about a flag to let the user decide whether to use them sync or async? |
It's an option, but it complicates everything :(. The user can already compress the message synchronously before sending it in a WebSocket message. It's not the same, but close enough. This also allows optimizing a broadcast a lot, because it allows creating only one frame and sending the same frame to all clients. See the discussion in #617. |
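A sketch of that userland approach (it assumes the clients inflate the payload themselves, bypassing permessage-deflate; the per-message compress option is documented ws API):

```js
'use strict';

const zlib = require('zlib');
const WebSocket = require('ws');

// Compress once, then send the identical buffer to every client with
// permessage-deflate disabled for this message.
function broadcast(wss, message) {
  const payload = zlib.deflateRawSync(message);
  for (const ws of wss.clients) {
    if (ws.readyState === WebSocket.OPEN) {
      ws.send(payload, { compress: false });
    }
  }
}
```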
What is this based on? Probably true for stock trading apps, but "99% of all WS apps" seems a bit too broad. The default value can be changed, but:
|
I've seen and worked on WS apps used to transfer huge files, used to stream audio/video data, etc. Those apps used messages way bigger than 1 KB (64 KB / 16 KB). |
Yes, some use it like that. Strange. But it doesn't invalidate the observation that the vast majority of use cases don't. And it doesn't matter, really; all that matters is that small-message sending is a huge use case for WS. So having an optimized/stable execution path that works well for small messages is important. That's all. |
I agree, we need an optimized path for applications that are basically signalling or chat, because that's how I will primarily use it. (Of course we shall still support large media transfer as well.) |
@mk-pmb to stay on topic: from my experience, compressing very small messages (regardless of how compression is done, sync or async) does not make an app faster, but slower. If you are not happy with |
Thanks for the reminder. I should have expressed myself more clearly. I meant that even in scenarios where network traffic is the bottleneck, where a few bytes saved make a difference and it's thus acceptable to invest some more CPU cycles, our compression should still try to be economical in how many extra CPU cycles users would need to invest. You're right that most of my chat-like apps won't usually run in such transmission-restricted circumstances, though. |
The discussion about the performance is interesting, but I want to mention that the memory leak is still there with ws@7.3.1 and Node.js 14.9.0. The current implementation may be faster in some use cases and slower in some others, but it does leak memory. |
@fungiboletus I was not able to reproduce the memory fragmentation issue using your code above, see #1369 (comment). Also, it should be better now, as nodejs/node#34048 mitigated the original issue. |
I've read and followed the zlib "memory fragmentation" issue down to where people blame the Linux kernel, glibc, and whatnot.
My two cents:
The issue is with websockets/ws.
Doing CPU-bound tasks "async" is nothing more than a delusion - there is no such thing as an async CPU-bound task. If it takes 1 second of CPU time, it takes 1 second of CPU time no matter when you perform it. Passing a CPU-bound task to some thread pool doesn't improve anything; in fact, it only adds more overhead:
If you cannot keep up with what you buffer up, you're obviously going to blow the process up from too much memory usage since you produce more than you consume.
You have two solutions:
OR if you really want to try and use multithreading here (which goes against how Node.js is built and deployed):