Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

question: benefit of worker thread? #2077

Open
bmacnaughton opened this issue Nov 4, 2024 · 6 comments
Open

question: benefit of worker thread? #2077

bmacnaughton opened this issue Nov 4, 2024 · 6 comments

Comments

@bmacnaughton
Copy link

We use pino and are quite happy with the its performance and function. I recently spent some time walking through it and have some questions to 1) check my understanding of what's going on and 2) understand the benefit of spawning multiple worker threads that write to a single file. I understand that point 2 might be an uncommon case; we encounter it because we invoke pino in both node's ESM loader thread and in the main thread. Both write to the same log file.

From my reading:

  • each thread applies mixins and formats the message as JSON (in our case), puts the string in a shared buffer, and wakes the worker thread(s)
  • whichever worker wakes up first (from the Atomics FIFO queue) ends up writing the bytes to the output stream

So the only work that seems to be done in the worker thread is to write to the stream (and the Atomics synchronization).

Considering that the worker thread then relies on libuv to handle the actual disk IO, it seems like the worker thread is another layer that introduces some latency, memory, synchronization, scheduling, etc.

The benefit I see is that it allows multiple threads to write to the same file transparently without having any central controller.

My main question is that the docs suggests it's about performance, but my (admittedly less than perfect) understanding of how it works looks like it's more about being able to write to the same file from multiple threads.

I appreciate any clarity you can provide.

@mcollina
Copy link
Member

There are no performance benefits in using transports to write to a file. The main benefit is to allow external transports (Elastic, Loki, syslog, etc) to be able to flush data before the main process exits, preventing the loss of data. This also increase performance by reducing the memory load in the main thread, and allowing better batching. However, pino.destination() provides peek performance to write things to a file.

I think you should be able to write to the same file from different threads, as long as you write to the same file descriptor. I have never seen an issue with stdout, but that might be special.

@bmacnaughton
Copy link
Author

Thanks for helping me understand the motivation behind the transport thread. I'll do a little experimentation this week and see if I have anything to add; if not, I'll close this, as it's already been helpful.

@bmacnaughton
Copy link
Author

bmacnaughton commented Nov 18, 2024

stdout is special in at least one respect - there is no concept of write-position. and the way our code is working now, the main thread and the esm-loader thread each create a pino instance, each with a transport to the same file path. and things are not quite right.

i'm going to try using a single file handle that we pass to the esm-loader thread to see if a shared file handle stays in sync, but i suspect it will be a problem unless both threads are using Atomics on the same shared memory. presuming that doesn't work, i see a couple of alternatives:

  • we pass the objects from the loader thread to the main thread for logging
  • find a way, when creating a pino instance, to have it use an existing worker thread from an already instantiated transport.

the second approach seems like a more natural fit because the mechanism to place things in a shared array buffer and to synchronize access to it is already present. but i don't think i can pass the entire pino transport to a thread using messaging because it has methods associated with it - so i'd need something like "createTransportUsingThisSharedArrayBuffer".

i'm still trying to familiarize myself pino overall; i didn't implement what we have now and am trying to decode what is possible with pino and how we should be use it simultaneously.

net: i'd like to keep this open a bit and will continue to update it.

@bmacnaughton
Copy link
Author

If the same file descriptor is used in both threads, things work correctly. Makes sense in that the FD is an index into the control structure and there is only one per process. It's possible that there could still be a problem if one thread wrote more than whatever buffer size is available for it but the second thread tried to write before the first thread was scheduled to complete the write (not sure of the low-level implementation, so this may not be possible). But for our application, it has worked well so far.

The negative is that there are still N worker threads and N shared buffers where N is the number of threads that create a pino logger to a single file. @mcollina - does this seem like a common (or increasingly common) use case that it would be useful to have a solution for? If so, I'm willing to spend some time trying to come up with a solution. I'm still figuring out pino, so any guidance would be appreciated, but I'm thinking something like: "if the file/destination (or another mutually exclusive property) is a shared array buffer then short-circuit spawning the worker and just use that as the target". There may be some knock-on problems that come from this, but it seems like a the right basic idea.

What do you think?

@mcollina
Copy link
Member

I still don't understand why you are using the transports to write to a file, instead of just using pino.destination() directly. In that way, you can just write to the file descriptor.

@bmacnaughton
Copy link
Author

bmacnaughton commented Nov 19, 2024

I might well be missing the obvious part of this, but (I think) the reason the code was originally using transports was so it could write to multiple destinations - both a log file and stdout, if configured. Is there a better way to handle that? If so, I just missed it and will be thrilled to change things in our code.

I have managed to write to the file descriptor using the transports as they are now, so I've been able to demonstrate that both threads are using the same descriptor. I did this by passing the FD as the destination in the targets array.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants