Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Add std::env::concurrency_hint #985

Closed
wants to merge 1 commit into from

Conversation

alexcrichton
Copy link
Member

This new function will return a hint to the amount of concurrency the
underlying system can support.

Rendered

This new function will return a **hint** to the amount of concurrency the
underlying system can support.
@alexcrichton alexcrichton self-assigned this Mar 17, 2015
@mahkoh
Copy link
Contributor

mahkoh commented Mar 17, 2015

A useless compromise for compromise's sake. It is not as abstract as the C++ function which allows the C++ function to be used in exotic environments, nor is the stdlib as a whole suitable for this, nor is it as specific as the information that it actually available on the supported systems.

@comex
Copy link

comex commented Mar 17, 2015

For the record, I've never liked the approach of testing the number of CPUs and spawning that number of threads. What if you have some 1024-core NUMA CPU (hardly unheard of) and you're running, say, 64 opportunistically concurrent processes? It would be much better for each to spawn 16 threads than for each to overwhelm the kernel with 1024 threads... That's an extreme example, but I'm not sure regular systems are that different if you're running a bunch of CPU intensive applications all making their own thread pools.

It would be better if the kernel globally coordinated the threads, which I think is what Grand Central Dispatch tries to do on OS X. If a higher level thread pool ever makes it into the standard library, it should consider using it where available. Of course, this PR isn't that, and high level libraries always have situations where they're inapplicable. So just take this comment as a measure of general dissatisfaction with the status quo.

@sanxiyn
Copy link
Member

sanxiyn commented Mar 18, 2015

I like this.

@comex, serious users with NUMA and stuffs can use hwloc, which embeds insane amounts of knowledge and is not appropriate for the standard library. This function is not intended for such uses.

@theemathas
Copy link

What if we do not have any information about the amount of concurrency?

I believe that the correct type signature should be this:

pub fn concurrency_hint() -> Option<u32>;

@sanxiyn
Copy link
Member

sanxiyn commented Mar 19, 2015

I think the correct design is to return 1 if there is no information, and allowing runtime override, for example, using environment variable, with name like RUST_CONCURRENCY.

Someone could even write a program setting RUST_CONCURRENCY using information from hwloc.

@reem
Copy link

reem commented Mar 19, 2015

This sounds not very useful, in my opinion. I understand the need for stabilizing only conservative behavior, but for something like this it might as well just live in a cargo package and grow independently.

@Ericson2314
Copy link
Contributor

-1 for reasons given. Too simple to be useful for really low level code, and too crude for a good high-level abstraction, it's an "anti-Goldilocks". Such abstractions just tempt people to do the wrong thing.

@tshepang
Copy link
Member

People are going to expect this from libstd, and the functionality is too simple to have to create a whole external crate for it.

@alexcrichton
Copy link
Member Author

@theemathas

What if we do not have any information about the amount of concurrency?

I feel that all users of this function would use .unwrap_or(1) in the instance that None was returned, however. For example this function is largely intended to serve as simply a hint, not a rigorous statistic, so if None is returned there's nothing you can really do other than assume 1 anyway.


@Ericson2314

Too simple to be useful for really low level code, and too crude for a good high-level abstraction, it's an "anti-Goldilocks".

I agree that this is not useful for low level code, but the documentation is intended to strongly word that. The idea is that the return value is very strongly just a hint, no concrete statistic about the underlying system (hence the inappropriateness for low-level code which wants this sort of precise knowledge).

I'm not sure I agree that this is too crude, however. For example cargo wants to spin off some number of builds in parallel, but how does it determine what number of builds to spin off? Is using the number returned from this function wrong?

@Ericson2314
Copy link
Contributor

@alexcrichton To come clean, I had previous thought about peril of weak and leaky abstractions, but not include get_num_cpus among them. But the comments convinced me.

Looking at your cargo example, I am inclined to say it is indeed wrong. Realistically, better methods will take a good deal of developer effort, so no hard feelings if Cargo keeps on using it :). Consider that it doesn't reflect the current load of the system, and precludes the OS from scheduling more cargo-spawned processes when others are blocking. Fundamentally there is a tension between exposing the potential parallelism to the OS scheduler, and avoiding the overhead of many of processes---A classic granularity tradeoff. Num CPUs is but a crude heuristic, and a more high level way to expresses the dependency DAG, and somehow let the OS/third-party decide the best way to traverse it based on the dynamic characteristics of the system would be ideal, and work for distributed builds too.

Granted we do have convenience methods in e.g. read/writer that have pitfalls too, but at least there exist situations where those convenience methods are exactly what one wants. I can't think of a situation where Num CPUs is truly the ideal method, and not just a quick fix in liu of something harder to make but better.

@carllerche
Copy link
Member

My vote is to leave this API in a dedicated crate and have it evolve as needed. There are many arguments in this thread and I agree with them.

@alexcrichton
Copy link
Member Author

It doesn't look like there's enough consensus to add this at this time, so I'm going to close this for now. Thanks for the discussion everyone!

@tshepang
Copy link
Member

tshepang commented Apr 9, 2015

I find concurrency_hint a great name even if overlong. It's certainly better than the ugly num_cpus which is now quite a popular crate. My favorite alternative is cpu_count, even though the cpu part is controversial.

@Glitchy-Tozier
Copy link

Does anyone know whether something like this has been implemented as of yet? I can't find a lot of information, except "use the num_cpus crate".

@kennytm
Copy link
Member

kennytm commented Nov 30, 2021

@Glitchy-Tozier
Copy link

@kennytm Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.