jl_effective_threads not aware of CFS cpu limit (inside Docker container) #46226

Closed
Moelf opened this issue Jul 31, 2022 · 13 comments · Fixed by #55592
Labels: multithreading (Base.Threads and related functionality), upstream (The issue is with an upstream dependency, e.g. LLVM)

Comments

Moelf (Contributor) commented Jul 31, 2022

cf. https://stackoverflow.com/questions/65551215/get-docker-cpu-memory-limit-inside-container

Should "auto" be aware of the fact that the container has a limited CPU quota? For example, if we see

/ # cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us
400000
/ # cat /sys/fs/cgroup/cpu/cpu.cfs_period_us
100000

we should set the number of threads to 4 instead of, in my case, 128.
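
For illustration, a minimal sketch of what "auto" could do under these assumptions (cgroup v1 paths as shown above; the function name and fallback are hypothetical):

# Hypothetical sketch: derive a thread count from the cgroup v1 CFS files
# shown above; a quota of -1 means "no limit" in cgroup v1.
function cfs_thread_limit(dir = "/sys/fs/cgroup/cpu")
    quota  = parse(Int, readchomp(joinpath(dir, "cpu.cfs_quota_us")))
    period = parse(Int, readchomp(joinpath(dir, "cpu.cfs_period_us")))
    quota < 0 && return Sys.CPU_THREADS   # no quota set: fall back to CPU count
    return max(1, cld(quota, period))     # ceil(quota / period), at least 1
end

cfs_thread_limit()  # with the values above: cld(400000, 100000) == 4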

Moelf changed the title from "jl_effective_threads aware of CFS cpu limit (useful for inside Docker container)" to "jl_effective_threads not aware of CFS cpu limit (useful for inside Docker container)" on Jul 31, 2022
Moelf changed the title from "jl_effective_threads not aware of CFS cpu limit (useful for inside Docker container)" to "jl_effective_threads not aware of CFS cpu limit (inside Docker container)" on Jul 31, 2022
Moelf (Contributor, author) commented Oct 23, 2022

bump

Moelf closed this as completed Jan 9, 2023
Moelf (Contributor, author) commented Jan 9, 2023

Never mind. @gbaraldi tried checking against rust-lang/rust#92697, using this Rust snippet:

fn main() {
    println!("{:?}", std::thread::available_parallelism()); // prints Ok(32) here
}

and I get 32, as opposed to the 48 that Julia reports, given:

[18:57] jiling-notebook-1:~/research_notebooks $ cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us
3200000
[18:57] jiling-notebook-1:~/research_notebooks $ cat /sys/fs/cgroup/cpu/cpu.cfs_period_us
100000

Moelf reopened this Jan 9, 2023
Moelf (Contributor, author) commented Jan 9, 2023

Maybe it's better to follow .NET: dotnet/runtime#11933 (comment)

In that case, Rust is doing something wrong, so closing the issue again...

Moelf closed this as completed Jan 9, 2023
gbaraldi (Member) commented Jan 9, 2023

For reference, Java also follows the quota/period variables. So there might be some discussion warranted, but I don't think it's a big priority.

Seelengrab (Contributor) commented:
For posterity's sake, this is what man 5 systemd.resource-control has to say on my machine:

    CPUQuota=
       Assign the specified CPU time quota to the processes executed. Takes a percentage value, suffixed
       with "%". The percentage specifies how much CPU time the unit shall get at maximum, relative to the
       total CPU time available on one CPU. Use values > 100% for allotting CPU time on more than one CPU.
       This controls the "cpu.max" attribute on the unified control group hierarchy and "cpu.cfs_quota_us"
       on legacy. For details about these control group attributes, see Control Groups v2[2] and CFS
       Bandwidth Control[4]. Setting CPUQuota= to an empty value unsets the quota.

       Example: CPUQuota=20% ensures that the executed processes will never get more than 20% CPU time on
       one CPU.

   CPUQuotaPeriodSec=
       Assign the duration over which the CPU time quota specified by CPUQuota= is measured. Takes a time
       duration value in seconds, with an optional suffix such as "ms" for milliseconds (or "s" for
       seconds.) The default setting is 100ms. The period is clamped to the range supported by the kernel,
       which is [1ms, 1000ms]. Additionally, the period is adjusted up so that the quota interval is also at
       least 1ms. Setting CPUQuotaPeriodSec= to an empty value resets it to the default.

       This controls the second field of "cpu.max" attribute on the unified control group hierarchy and
       "cpu.cfs_period_us" on legacy. For details about these control group attributes, see Control Groups
       v2[2] and CFS Scheduler[3].

       Example: CPUQuotaPeriodSec=10ms to request that the CPU quota is measured in periods of 10ms.

So if we calculate an example with a period of 100ms and a quota of 3200ms (the values above), on a machine with 48 physical cores, we can choose to run

  • 32 threads, resulting in 32 cores running at 100% utilization, with 16 cores running at 0% (or whatever the background task takes up, assuming they're scheduled on a different core by the kernel)
  • 48 threads, resulting in 48 cores running at 66% utilization, with 0 running at 0%

The reason for these numbers is simple: since the quotas are measured in CPU time and the period is 100ms of wall time, in 100ms of wall time a 48-core CPU has a budget of 4800ms of CPU time. With the above setting, we get up to 3200ms of that budget, which we can share across cores as we see fit. Hence, utilizing 32 cores completely already uses the 3200ms of the budget, and similarly, utilizing 48 cores at 66.6% also uses 3200ms.
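
In numbers (a quick, hypothetical Julia check of the arithmetic above):

# CPU-time budget per 100 ms wall-time period on a 48-core machine,
# with a CFS quota of 3200 ms per period.
cores, period_ms, quota_ms = 48, 100, 3200
budget_ms = cores * period_ms       # 4800 ms of CPU time per 100 ms of wall time
quota_ms ÷ period_ms                # 32: enough for 32 cores at 100%
round(100 * quota_ms / budget_ms)   # 67: or 48 cores at ~66.7%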

Ultimately, whether the extra threads are beneficial for performance depends on the workload. Compute-heavy numerical work will likely benefit from running fewer threads at 100% utilization, due to caching effects. A web server, on the other hand, will likely benefit from running at 66% utilization: with a lot of I/O waiting per thread/task, running more threads that block can mean more work getting done in the same amount of time, since more physical cores can be put to use.

So if there is something to be done here, I'd say a switch between "try to utilize cores fully" and "give me the number of physical threads available" would be the most appropriate, although I question the usefulness: we can already set the number of threads explicitly, and if you're able to set the CPU quota, you should also be in a position to set Julia's startup threads. Given a consistent ability to query quota and period, the "safe" default is to follow the formula ceil(Int, quota/period) (a ratio of e.g. 1.7 would mean that you need more than one thread for full utilization).
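
A sketch of that formula against the unified (cgroup v2) hierarchy, where cpu.max holds both fields described in the man page excerpt above (the path and function name are assumptions; real code would resolve the cgroup via /proc/self/cgroup):

# Hypothetical sketch: ceil(quota/period) from cgroup v2's "cpu.max", whose
# two fields are "<quota_us> <period_us>", or "max <period_us>" when unlimited.
function quota_threads(path = "/sys/fs/cgroup/cpu.max")
    quota, period = split(readchomp(path))
    quota == "max" && return Sys.CPU_THREADS   # no quota configured
    return max(1, cld(parse(Int, quota), parse(Int, period)))
end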

Moelf reopened this Jan 9, 2023
vtjnash (Member) commented Feb 7, 2024

libuv/libuv#4278

inkydragon added the "upstream" label Feb 7, 2024
Moelf (Contributor, author) commented May 7, 2024

Looks like upstream has merged it; what should we do here?

Seelengrab (Contributor) commented:
Looks like libuv now reports the correct available parallelism through uv_available_parallelism, but this doesn't seem to be used by us (yet?). With that in mind, I'd still just default to ceil(Int, uv_available_parallelism()).
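
For anyone who wants to try it, a hypothetical one-liner calling into the libuv bundled with Julia (whether the symbol resolves depends on the build, so this is an experiment, not an API):

# uv_available_parallelism() returns an unsigned int and accounts for the
# cgroup CPU quota as of libuv/libuv#4278 (if the bundled libuv has it).
n = ccall(:uv_available_parallelism, Cuint, ())
println(Int(n))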

Moelf (Contributor, author) commented May 8, 2024

Would a PR switching to using that be potentially accepted?

giordano (Contributor) commented:
Note that our fork of libuv doesn't currently include libuv/libuv#4278, which was merged a couple of weeks after our fork was last synced with upstream. We will have to wait for the next sync of our libuv to benefit from that change.

giordano (Contributor) commented:
> Would a PR switching to using that be potentially accepted?

Based on this comment, probably yes (but note again that our libuv doesn't contain the fix for cgroups).

giordano added the "multithreading" label Aug 26, 2024
giordano added a commit that referenced this issue Aug 31, 2024
Corresponding PR to Yggdrasil:
JuliaPackaging/Yggdrasil#9337.

This build includes backports of
libuv/libuv#4278 (useful for #46226) and
libuv/libuv#4521 (useful for #55592)
Moelf (Contributor, author) commented Sep 7, 2024

amazing, thanks:

[16:03] jiling-notebook-1:~/research_notebooks $ julia --startup-file=no -t auto -e "println(Threads.nthreads())"
128
[16:23] jiling-notebook-1:~/research_notebooks $ julia +nightly --startup-file=no -t auto -e "println(Threads.nthreads())"
19

KristofferC pushed a commit that referenced this issue Sep 12, 2024
kshyatt pushed a commit that referenced this issue Sep 12, 2024