Make No. of Transport Threads == Available CPUs #56488
Conversation
We never do any file IO or other blocking work on the transport threads, so no tangible benefit can be derived from using more threads than CPUs for IO. There are, however, significant downsides to using more threads than necessary with Netty in particular. Since we use the default setting for `io.netty.allocator.useCacheForAllThreads`, which is `true`, we end up using `16MB` of thread-local buffer cache for each transport thread, meaning we potentially waste 2 * CPUs * 16MB of heap across both the TCP and HTTP transports.
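For illustration only (not code from this PR), a minimal sketch of tying the IO worker-thread count to the available CPU count when building a Netty event loop group; the class name and the printed message are assumptions for the example:

```java
import io.netty.channel.nio.NioEventLoopGroup;

public class TransportWorkerSizing {
    public static void main(String[] args) {
        // Size the IO event loop group to the number of available CPUs instead of
        // letting Netty pick its no-arg default (2 * CPUs). With the default
        // io.netty.allocator.useCacheForAllThreads=true, every extra IO thread can
        // pin thread-local allocator cache that buys nothing for non-blocking IO.
        int workerCount = Runtime.getRuntime().availableProcessors();
        NioEventLoopGroup workers = new NioEventLoopGroup(workerCount);

        System.out.println("transport worker threads: " + workerCount);
        workers.shutdownGracefully();
    }
}
```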
Pinging @elastic/es-distributed (:Distributed/Network)
This is a discuss, so I'm sure we will discuss it. I'm on board with this, and I will also bring back and update the shared event loops PR.
These chunks are allocated in arenas. The number of arenas is by default the number of CPUs. The setting
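For context (not part of the original comment), a small sketch of how the default arena counts can be inspected with Netty 4.x static accessors; that these defaults are derived from the CPU count on the machine at hand is the assumption being checked here:

```java
import io.netty.buffer.PooledByteBufAllocator;

public class ArenaDefaults {
    public static void main(String[] args) {
        // Netty derives its default arena counts from the available CPU count
        // (subject to memory-based caps and -Dio.netty.allocator.* overrides),
        // so keeping the IO thread count near the CPU count keeps threads and
        // arenas roughly aligned.
        System.out.println("CPUs:          " + Runtime.getRuntime().availableProcessors());
        System.out.println("heap arenas:   " + PooledByteBufAllocator.defaultNumHeapArena());
        System.out.println("direct arenas: " + PooledByteBufAllocator.defaultNumDirectArena());
    }
}
```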
Thanks Tim, updated the OP with your input. The 16MB/thread figure is the upper bound (if you get really unlucky), not a set number.
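To put a rough number on that upper bound (a hypothetical example, not a measurement from this PR): on a 16-CPU node with the previous default of 2 * CPUs transport threads, the 16 surplus threads could in the worst case pin up to 16 * 16MB = 256MB of thread-local cache per transport, and up to twice that across the TCP and HTTP transports.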
We discussed this during our team meeting and, since there were no objections to doing this, I'm removing the discuss label.
@elasticmachine update branch
LGTM
Thanks Tim!
We never do any file IO or other blocking work on the transport threads, so no tangible benefit can be derived from using more threads than CPUs for IO. There are, however, significant downsides to using more threads than necessary with Netty in particular. Since we use the default setting for `io.netty.allocator.useCacheForAllThreads`, which is `true`, we end up using up to `16MB` of thread-local buffer cache for each transport thread, meaning we potentially waste CPUs * 16MB of heap on unnecessary IO threads, in addition to the obvious inefficiency of artificially adding extra context switches.
We never do any file IO or other blocking work on the transport threads, so no tangible benefit can be derived from using more threads than CPUs for IO. There are, however, significant downsides to using more threads than necessary with Netty in particular. Since we use the default setting for `io.netty.allocator.useCacheForAllThreads`, which is `true`, we end up using up to `16MB` of thread-local buffer cache for each transport thread (as Tim points out, this is the upper bound/worst-case scenario), meaning we potentially waste CPUs x 16MB of heap.