-
Notifications
You must be signed in to change notification settings - Fork 883
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
question: benefit of worker thread? #2077
Comments
There are no performance benefits in using transports to write to a file. The main benefit is to allow external transports (Elastic, Loki, syslog, etc) to be able to flush data before the main process exits, preventing the loss of data. This also increase performance by reducing the memory load in the main thread, and allowing better batching. However, I think you should be able to write to the same file from different threads, as long as you write to the same file descriptor. I have never seen an issue with stdout, but that might be special. |
Thanks for helping me understand the motivation behind the transport thread. I'll do a little experimentation this week and see if I have anything to add; if not, I'll close this, as it's already been helpful. |
i'm going to try using a single file handle that we pass to the esm-loader thread to see if a shared file handle stays in sync, but i suspect it will be a problem unless both threads are using Atomics on the same shared memory. presuming that doesn't work, i see a couple of alternatives:
the second approach seems like a more natural fit because the mechanism to place things in a shared array buffer and to synchronize access to it is already present. but i don't think i can pass the entire pino transport to a thread using messaging because it has methods associated with it - so i'd need something like "createTransportUsingThisSharedArrayBuffer". i'm still trying to familiarize myself pino overall; i didn't implement what we have now and am trying to decode what is possible with pino and how we should be use it simultaneously. net: i'd like to keep this open a bit and will continue to update it. |
If the same file descriptor is used in both threads, things work correctly. Makes sense in that the FD is an index into the control structure and there is only one per process. It's possible that there could still be a problem if one thread wrote more than whatever buffer size is available for it but the second thread tried to write before the first thread was scheduled to complete the write (not sure of the low-level implementation, so this may not be possible). But for our application, it has worked well so far. The negative is that there are still N worker threads and N shared buffers where N is the number of threads that create a pino logger to a single file. @mcollina - does this seem like a common (or increasingly common) use case that it would be useful to have a solution for? If so, I'm willing to spend some time trying to come up with a solution. I'm still figuring out pino, so any guidance would be appreciated, but I'm thinking something like: "if the file/destination (or another mutually exclusive property) is a shared array buffer then short-circuit spawning the worker and just use that as the target". There may be some knock-on problems that come from this, but it seems like a the right basic idea. What do you think? |
I still don't understand why you are using the transports to write to a file, instead of just using |
I might well be missing the obvious part of this, but (I think) the reason the code was originally using transports was so it could write to multiple destinations - both a log file and stdout, if configured. Is there a better way to handle that? If so, I just missed it and will be thrilled to change things in our code. I have managed to write to the file descriptor using the transports as they are now, so I've been able to demonstrate that both threads are using the same descriptor. I did this by passing the FD as the destination in the |
We use pino and are quite happy with the its performance and function. I recently spent some time walking through it and have some questions to 1) check my understanding of what's going on and 2) understand the benefit of spawning multiple worker threads that write to a single file. I understand that point 2 might be an uncommon case; we encounter it because we invoke pino in both node's ESM loader thread and in the main thread. Both write to the same log file.
From my reading:
So the only work that seems to be done in the worker thread is to write to the stream (and the Atomics synchronization).
Considering that the worker thread then relies on libuv to handle the actual disk IO, it seems like the worker thread is another layer that introduces some latency, memory, synchronization, scheduling, etc.
The benefit I see is that it allows multiple threads to write to the same file transparently without having any central controller.
My main question is that the docs suggests it's about performance, but my (admittedly less than perfect) understanding of how it works looks like it's more about being able to write to the same file from multiple threads.
I appreciate any clarity you can provide.
The text was updated successfully, but these errors were encountered: