
Reduce default concurrency #3892

Closed
mperham opened this issue Jul 17, 2018 · 17 comments

Comments

@mperham
Collaborator

mperham commented Jul 17, 2018

Today Sidekiq uses a default concurrency of 25. This means Sidekiq will spawn 25 worker threads and execute up to 25 jobs concurrently in a process.

glibc has a major memory fragmentation issue which gets worse with more threads, causing many people to move to jemalloc.

I also happen to think, based on time and experience but no hard data, that 25 is pretty aggressive and that most apps can peg a CPU with far fewer threads. Developers testing locally on macOS rarely need such high concurrency.

I'd suggest we reduce the default concurrency from 25 to 15 in Sidekiq 5.2.0. This will save memory and reduce fragmentation and bloat on Linux. Anyone who wants to retain the old value can add -c 25 to their command line.

WDYT?
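For reference, two conventional ways to pin the value explicitly so a default change would not affect you (a sketch; the YAML location is the conventional one, not taken from this thread):

```shell
# Keep the old value via the command line:
bundle exec sidekiq -c 25

# ...or in config/sidekiq.yml:
#   :concurrency: 25
```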

@jc00ke
Contributor

jc00ke commented Jul 17, 2018

I'd be curious to see the hard data. I don't think it would hurt to lower it, and my guess is that anyone who needs more than 15 will knowingly tune it to a much higher number in an environment with more processing power (i.e. not most Heroku dynos).

@Fredar

Fredar commented Jul 17, 2018

Usually using between 5 and 10 myself.
For hardcore CPU usage we had to run 20x1 once (20 processes with 1 worker)

@noahgibbs

My experience with Rails (maybe similar to your workload, maybe not) is that you need about 6 threads per hyperthreaded CPU core to roughly saturate the CPU. CPU-heavy background jobs might need fewer threads, and very I/O-heavy ones even more. But 5 or so per core is probably about right for many/most Ruby tasks.

So 25 would be enough to fully saturate a medium AWS instance. 15 would still run decently, but probably leave a bit of CPU idle - around 5%-10% with the workloads I run.

Obviously depends on the task. If you're just calculating giant Mandelbrot sets or late digits of Pi, 5-6 threads would saturate it just fine :-)
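The I/O-versus-CPU distinction above is easy to demonstrate. A minimal sketch (not Sidekiq code, and using `sleep` as a stand-in for network or database waits): I/O-style waits release the GVL, so many threads can overlap even in MRI.

```ruby
require "benchmark"

# Run `count` fake I/O-bound jobs, one thread each, and time the batch.
def run_io_jobs(count, wait: 0.2)
  Benchmark.realtime do
    count.times.map { Thread.new { sleep(wait) } }.each(&:join)
  end
end

elapsed = run_io_jobs(10)
puts format("10 jobs x 0.2s wait finished in %.2fs", elapsed)
# Wall time stays close to a single job's wait, not 10 x 0.2s, which is
# why I/O-heavy workloads benefit from a higher thread count.
```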

@amcaplan
Contributor

Don't think I've ever seen a need for >10. 4-6 workers per process, with 1 process per core, is far more common in my experience.

@schneems

Heroku has plenty of processing power thank you very much. The “performance” and “private” large dyno has 14GB of ram and 8 dedicated VCPUs (8 hyperthreads backed by 4 real cores on top of a hypervisor). Though I do realize you said “most”.

FWIW I think 25 is pretty high. Puma's default is 16, and even then most people tune it down on the web side. It would be helpful to get some kind of standardized metric for when it is helpful to add extra Sidekiq workers on a box.

@noahgibbs

I can't tell here if you mean per process, or total. For total across multiple processes, 25 is probably great. Per process, yeah, 5 is reasonable, 10 is high and 25 is very high. Given the GIL, it's very hard to get CRuby to productively use more than 10-ish threads for real tasks -- and in cases where you can, it's because something like EventMachine or Node.js would have been a better choice than Ruby threads.

@mperham
Collaborator Author

mperham commented Jul 17, 2018

I could be talked down to 10. I think 5 is too low; most business apps are I/O heavy, allowing pretty decent concurrency even with GIL.

@noahgibbs

10-15 is also a reasonable insurance policy against pathological cases that really wish they were evented, but got written with threads anyway.

@jagthedrummer

10 sounds about right to me, and even 15 would be better than 25.

@zachmccormick

Is there a nice way to use Etc.nprocessors to determine a happy number for most people? I think Celery uses that sort of logic to determine default concurrency FWIW. I do agree with those above that in optimized cases you're going to want to look at the nature of your workload and tweak accordingly.

@amcaplan
Contributor

@zachmccormick number of processors is irrelevant for MRI (i.e. what the vast majority of the community uses AFAICT); each Sidekiq process can be handled by only 1 processor due to the GIL, since Sidekiq achieves concurrency via threads. You need to run multiple Sidekiq processes to achieve true parallel processing.
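A sketch of the distinction (this is not Sidekiq's own logic, and `THREADS_PER_PROCESS` is a hypothetical setting): sizing total concurrency from the core count only makes sense across *processes* in MRI, since the GVL keeps each process's Ruby threads from running Ruby code in parallel.

```ruby
require "etc"

THREADS_PER_PROCESS = 10  # hypothetical per-process concurrency

# One process per core gives true parallelism; threads within each
# process then cover I/O waits.
processes = Etc.nprocessors
puts "#{processes} processes x #{THREADS_PER_PROCESS} threads = " \
     "#{processes * THREADS_PER_PROCESS} concurrent jobs"
```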

@zachmccormick

Ah I see - didn't realize that! Thanks!

@mperham
Collaborator Author

mperham commented Jul 18, 2018

Sidekiq doesn't fork or scale processes, only threads. You need to start multiple Sidekiqs yourself, using the tool/init of your choice. Sidekiq Enterprise has a multi-process sidekiqswarm binary which scales Sidekiq processes according to CPU count.

https://github.com/mperham/sidekiq/wiki/Ent-Multi-Process
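A rough sketch of doing that by hand (a hypothetical deployment snippet, not the sidekiqswarm implementation; process supervision is left to your init tool):

```shell
# Start one Sidekiq process per desired core; Sidekiq will not fork
# these for you. sidekiqswarm (Enterprise) automates the same idea.
CORES=4
for i in $(seq 1 "$CORES"); do
  bundle exec sidekiq -c 10 &
done
wait
```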

@noahgibbs

@amcaplan @zachmccormick The number of processors matters if you add processes. But as @mperham says, you'd have to do that yourself - Sidekiq won't do that automatically.

Also, while the GIL means each Ruby thread blocks all others when running Ruby, there are some non-Ruby operations (e.g. network or disk I/O, database, some parts of garbage collection, many things done with native extensions) which can happen on a background thread and don't block your Ruby process. Those things can happen in parallel if you have more than one processor, but not if you don't. That's a lot of what I was talking about above with "saturating" a processor with 6+ threads per process - that makes sure that even when most of your Ruby code is blocked, something is running and making forward progress.
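The flip side is also easy to show. A small illustration (assumes MRI): CPU-bound Ruby code in threads stays correct but gains no parallel speedup, because only one thread holds the GVL while executing Ruby bytecode.

```ruby
# Naive recursive Fibonacci: pure Ruby, so it never releases the GVL.
def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

# Four threads compute concurrently, but the GVL runs them one at a time.
results = 4.times.map { Thread.new { fib(20) } }.map(&:value)
puts results.inspect  # four identical answers, no parallel speedup
```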

@amcaplan
Contributor

Hehe, I gave a talk about this once (https://speakerdeck.com/amcaplan/threads-and-processes-lightning-talk-given-at-rails-israel-2015), maybe the slides will be useful to future issue watchers...

[Slide screenshots from the talk: rough guidance on sizing threads vs. processes]

Obviously it's a bit oversimplified, but pretty good as a round estimate. Most Rails jobs I've seen hover around that figure.

Also worth noting the first 10 minutes of this 2015 talk by @schneems (a personal favorite - first time we were in the same room!) where playing with the setting led them to change concurrency from 30 to 4.

@noahgibbs

Yup! That's a great summary. But since the CPU percentage for a given task can vary, there's a bit of an asymptote as far as how many threads are necessary to saturate...

@mperham
Collaborator Author

mperham commented Jul 20, 2018

It's obvious that glibc's memory bloat, as discussed on my blog, gets worse as concurrency increases. I think reducing concurrency from 25 to 10 will reduce memory usage AND bloat, giving us a double win in memory.

Pro tip: you can get the old behavior by adding -c 25 to the command line.

tnir pushed a commit to tnir/gitlabhq that referenced this issue Sep 11, 2018
The most significant change in this version is that the default
concurrency has been lowered from 25 to 10 (sidekiq/sidekiq#3892).
This doesn't affect omnibus-gitlab because the concurrency is controlled via a
setting that defaults to 25 anyway and is passed in via the `-c` command-line
parameter.

However, source installations (including the GDK) will have to either specify
the concurrency in `sidekiq.yml` or use the `-c` option.

Full list of changes: https://github.com/mperham/sidekiq/blob/master/Changes.md
your added a commit to moneyadviceservice/rad that referenced this issue May 24, 2019
By default Sidekiq 3.x spawns 25 threads per process.

This causes sporadic job failures on Heroku in staging
because of the 20-connection limit we currently have there.

Why is it good to lower concurrency _anyway_?

Because higher concurrency doesn't mean higher throughput.
This has been discussed in an issue by Mike Perham here:
sidekiq/sidekiq#3892, and as a result the most recent
versions of Sidekiq have the default concurrency set to 10.