
Consider avoiding Concurrent.processor_count when setting default background_worker_threads #2297

Closed
trevorturk opened this issue Apr 11, 2024 · 11 comments · Fixed by #2305
@trevorturk

Issue Description

Since #1155 we have the ability to configure background_worker_threads, which currently defaults to `Concurrent.processor_count`. However, this default is likely to be too high in a shared hosting environment, leading to unnecessary memory usage. It may make sense to change the default to, say, a static 1 or 3, and add documentation about how best to configure it.

See also: https://github.com/rails/rails/pull/50669/files#diff-e25cab4832f078ca2f9dffa8918f895c8474f8086878e4ced9f5795131fc2de2R32

/cc @st0012

Feel free to disregard, since this is easy to correct, but in my case I was surprised at the impact that lowering from 8 threads to 1 had on my memory use on Heroku.
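For reference, the workaround is to pin the thread count manually in the SDK config; a minimal sketch (`background_worker_threads` is the sentry-ruby option discussed above, the DSN env var name is an assumption):

```ruby
# config/initializers/sentry.rb (sketch)
require "sentry-ruby"

Sentry.init do |config|
  config.dsn = ENV["SENTRY_DSN"]
  # Pin the worker pool instead of inheriting Concurrent.processor_count,
  # which reports the host's cores (8 on a Heroku Standard-1X dyno) rather
  # than what the container can actually use.
  config.background_worker_threads = 1
end
```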

Reproduction Steps

$ heroku run rails c
Running rails c on ⬢ APP_NAME... up, run.4033 (Standard-1X)
Loading production environment (Rails 7.1.3.2)
irb(main):001> Concurrent.processor_count
=> 8

Expected Behavior

Consider defaulting to a static/lower number of background_worker_threads.

Actual Behavior

Defaults to Concurrent.processor_count.

Ruby Version

3.3.3

SDK Version

5.17.3

Integration and Its Version

No response

Sentry Config

No response

@sl0thentr0py
Member

sl0thentr0py commented Apr 12, 2024

I do think 8 is too high, @st0012 opinions?
To add some context, Python just uses one thread for the worker. I think the most important thing is being able to send the HTTP requests asynchronously; I believe anything between 2-4 threads as a default might be ok.

@trevorturk
Author

I was thinking about this a bit more (apologies if this is more complicating than helping), but it seems to me that if we went with 1 thread by default, the only real risk is that you might have more than 30 items (the default limit) in the queue, at which point they'd start being dropped (at least that's my understanding according to the docs). My thinking is that we could try pushing down to 1 thread and send an event (or logs or something) if we see items being dropped, so the customer would know they need to increase the thread count.

@st0012
Collaborator

st0012 commented Apr 29, 2024

I agree that we should probably lower the worker threads, thanks for raising this.

My thinking is that we could try pushing down to 1 thread and sending an event (or logs or something) if we see items being dropped, so the customer would know they need to increase the thread count?

This potentially means that some customers would need to redeploy (potentially multiple times) to adjust the thread count after we push such a change, which would likely be a pretty negative experience. Considering that the current default has existed for more than 3 years and we have only received a report now, IMO the potential benefit doesn't justify the risk.

I believe anything between 2-4 threads as a default might be ok.

How about we start with Concurrent.processor_count / 2?

I think the biggest problem is that concurrent-ruby's executors can't notify the SDK when an event is dropped (available fallbacks), so it's hard for us to make an informed decision on this.

It's also worth noting that while we could check the executor's current event size manually to collect the info ourselves, forcing it to synchronize repeatedly will likely hamper the performance of the executor.
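The drop-counting idea can be sketched without concurrent-ruby at all: a single worker thread over a bounded queue that counts discards instead of blocking. This is a toy illustration of the mechanism being discussed, not sentry-ruby's actual `BackgroundWorker`:

```ruby
# Toy single-thread worker with a bounded queue that counts dropped jobs,
# roughly mirroring concurrent-ruby's :discard fallback policy.
class BoundedWorker
  attr_reader :processed, :discarded

  def initialize(max_queue: 30)
    @max_queue = max_queue
    @queue = Queue.new
    @processed = 0
    @discarded = 0
    @mutex = Mutex.new
    @thread = Thread.new do
      while (job = @queue.pop) # nil acts as a shutdown sentinel
        job.call
        @mutex.synchronize { @processed += 1 }
      end
    end
  end

  # Enqueue a job, or count a drop when the queue is full. Surfacing the
  # drop count (instead of discarding silently) is what would let an SDK
  # emit a queue-overflow report to the user.
  def perform(&job)
    if @queue.size >= @max_queue
      @mutex.synchronize { @discarded += 1 }
      false
    else
      @queue << job
      true
    end
  end

  def shutdown
    @queue << nil
    @thread.join
  end
end
```

With 1 thread and a small queue, a burst of slow jobs overflows quickly, which is exactly the trade-off being weighed here.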

@sl0thentr0py
Member

How about we start with Concurrent.processor_count / 2?

I would be fine with this.

I think the biggest problem is that concurrent-ruby's executors can't notify the SDK when an event is dropped (available fallbacks), so it's hard for us to make an informed decision on this.

We already collect queue_overflow client reports and I think those are sufficient to diagnose this (via contacting support for now). We want to expose these statistics to the end user but sadly no one on the product side prioritizes this. :(

@trevorturk
Author

trevorturk commented Apr 29, 2024

Thanks for the consideration!

FWIW I would say there's a definite improvement in terms of memory savings, so it's really worth considering doing something here, but I'm not sure what's best.

I'm wondering if perhaps we could start with just a generator and/or documentation change. I think I'm sensing a trend that Concurrent.processor_count is generally considered incorrect on shared hosts (but perhaps not on Hetzner or the like), so it may be advisable to prefer setting it manually. In that case, mentioning it in the docs, or in the generator if you have one, could be reasonable without risking any change for existing users. Then maybe a CHANGELOG entry or something to bring a bit more attention.

I think the Concurrent.processor_count / 2 idea plus a CHANGELOG entry could be reasonable, but that still might risk changing things for existing users, so I'm not sure.

Also, let me know if me submitting an issue re: queue_overflow would help prioritize at all etc. I'm happy to help where I can. Thank you!

@sl0thentr0py sl0thentr0py added this to the 6.0.0 milestone Apr 30, 2024
@sl0thentr0py
Member

I think the Concurrent.processor_count / 2 idea plus a CHANGELOG entry could be reasonable

yea I'll do it as part of the 6.0 major

@st0012
Collaborator

st0012 commented May 6, 2024

@sl0thentr0py I think it should be fine to release such a change in a minor or even patch release, so I opened #2305 against master directly.

st0012 added a commit that referenced this issue May 6, 2024
* Decrease the default number of background worker threads by half

See #2297 for the discussion.

* Update changelog
@trevorturk
Author

👋 look what I just came across! ruby-concurrency/concurrent-ruby#1038 (see linked commits showing this making its way into Rails as well...)

@sl0thentr0py
Member

oh nice, should we just use available_processor_count then @st0012?
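For reference, `Concurrent.available_processor_count` (concurrent-ruby 1.3+) consults the container's CPU quota rather than the host's core count. A rough pure-Ruby approximation of the cgroup v2 case — the path, parsing, and fallback here are illustrative assumptions, not exactly what concurrent-ruby does:

```ruby
require "etc"

# Approximate a cgroup v2-aware processor count: read the CPU quota and
# period from cpu.max, and fall back to the raw core count when no quota
# is set ("max") or the file is absent.
def available_processor_count(cpu_max_path: "/sys/fs/cgroup/cpu.max")
  if File.readable?(cpu_max_path)
    quota, period = File.read(cpu_max_path).split
    return Float(quota) / Float(period) if quota && quota != "max"
  end
  Etc.nprocessors.to_f
end
```

On a container limited to 2 CPUs, `cpu.max` might read `200000 100000`, yielding 2.0 where `Etc.nprocessors` (like `Concurrent.processor_count`) would still report the host's 8.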

@trevorturk
Author

Just wanted to bump this again in case it's potentially useful ^

@st0012
Collaborator

st0012 commented Jul 6, 2024

@trevorturk sorry for the delay and thanks for raising this. I've opened #2339 to adopt the new API 😄
