
Allow limiting the number of concurrent processes #5

Closed

Conversation

dhrrgn commented May 1, 2021

Love this package. However, when testing it to speed up some of our tooling, I noticed that performance degraded quickly with a larger number of processes. That makes sense, since forking a process hundreds or thousands of times is an expensive operation.

This feature allows limiting that concurrency, so fewer processes are forked at once. It drastically reduces CPU usage, since the package no longer has to manage so many processes and sockets.

During my testing, running 1000 concurrent tasks (each simply returning foo) took ~20 seconds. Limiting them to 100 concurrent processes reduced that to 14 seconds.

Possible Improvements

As implemented, it waits until every task in a concurrent group has completed before starting the next group. It may be better to kick off a group and, as each task completes, start the next task until all are done. However, that complicates things, and I am not convinced it would be any faster; that's a gut feeling, though, not backed by any tests. A sketch of the batching behavior is below.
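To make the batching behavior concrete, here is a minimal sketch in Python (the package itself is PHP, so this is illustrative only; `commands` is a hypothetical list of argv arrays for `subprocess.Popen`):

```python
import subprocess

def run_in_batches(commands, limit):
    """Start up to `limit` child processes, wait for the whole
    batch to exit, then fork the next batch."""
    exit_codes = []
    for start in range(0, len(commands), limit):
        batch = [subprocess.Popen(cmd) for cmd in commands[start:start + limit]]
        # The entire group must finish before the next group starts,
        # so one slow task holds up every idle slot.
        exit_codes.extend(proc.wait() for proc in batch)
    return exit_codes
```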

Open to suggestions.

PuffyZA commented May 3, 2021

You beat me to it. Was going to play around with adding process limits to the package this week, since I think it's a pretty important feature to have.

I would, however, definitely change it so that it always keeps the maximum number of processes running, starting a new one as each finishes. For instance, in a use case I want to test, we have billing code that bills each client, but clients vary wildly in how long their bills take to generate (some clients have 1 package, others have hundreds or thousands). My gut feel is that waiting for one large client to finish before running more will be much slower than starting new tasks while it's still running, though I'd have to do some benchmarks to confirm.
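For contrast with the batching sketch above, here is a minimal sketch of this refill strategy, under the same illustrative assumptions (Python rather than the package's PHP; `commands` is a hypothetical list of argv arrays):

```python
import subprocess
import time

def run_with_refill(commands, limit):
    """Keep up to `limit` children running; as soon as one exits,
    start the next queued command."""
    queue = list(commands)
    running, exit_codes = [], []
    while queue or running:
        # Top up to the limit whenever a slot is free.
        while queue and len(running) < limit:
            running.append(subprocess.Popen(queue.pop(0)))
        # Reap any finished children to free their slots.
        still_running = []
        for proc in running:
            if proc.poll() is None:
                still_running.append(proc)
            else:
                exit_codes.append(proc.returncode)
        if len(still_running) == len(running):
            time.sleep(0.01)  # nothing finished; avoid a busy loop
        running = still_running
    return exit_codes
```

With this scheme, one long-running bill no longer blocks the remaining slots, which is exactly the variable-duration case described above.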

dhrrgn (author) commented May 3, 2021

Ya, I agree, and I have an idea for a simple way to implement that. I just haven't had time to work on it.

I should have time later today.

brendt (contributor) commented May 3, 2021

> I would however definitely change it so that it always spins up the max processes as each one finishes.

That was my feedback on the discussion thread as well. If you can change your PR, I'm happy to merge it.

brendt (contributor) commented May 4, 2021

@dhrrgn I've got some time right now, so I'm going to implement this myself so we can get it tagged quickly :)

brendt closed this May 4, 2021
brendt (contributor) commented May 4, 2021

This is my PR, feel free to share your feedback: #10

dhrrgn (author) commented May 10, 2021

@brendt Awesome, thank you! Sorry for disappearing on this for a bit. Got slammed at work and haven't been feeling well.
