Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correctly detect CPU cores when cpuset cgroup is used #5373

Merged
merged 1 commit into from
Sep 28, 2023

Conversation

CallMeFoxie
Copy link
Contributor

Description of the Change
CPU cores are not correctly detected when eg running within a container that uses cgroup cpusets. This change should improve the detection. I am not 100% sure about other platforms and may need some ifdefing, but I do not have many platforms to test it on.

Alternate Designs
this code was checked against nproc which uses the same method

Release Notes
Detect CPU cores when using cpuset to limit core count

@AenBleidd
Copy link
Member

@davidpanderson, could you please review this improvement?
@CharlieFenton, could you please verify that this works on MacOS without any issues?
Thank you both in advance.

@AenBleidd AenBleidd self-assigned this Sep 22, 2023
@CallMeFoxie
Copy link
Contributor Author

CallMeFoxie commented Sep 22, 2023

For the record the origin is from nproc code here https://github.com/coreutils/gnulib/blob/724096f6e860a107d66f7a20cd3a9538a8b7a418/lib/nproc.c#L127 - skipping the version checks as glibc 2.6 is something like 15 years old now, I presume we do not have to keep that old code :)

@CharlieFenton
Copy link
Contributor

@AenBleidd: __GNU_LIBRARY__ is not defined on the Mac, so the added code has no effect. The Mac has no sched_getaffinity() function, but a quick Google search came up with this as a possible workaround.

I am not familiar with the concept of cgroup cpusets or CPU affinity, but I am puzzled by the implementation in this PR because the file client/hostinfor_unix.cpp is used only in the client, and not in any BOINC libraries. So I suspect the code guarded by #elif __GNU_LIBRARY__ might never be compiled. Also, the Linux man page for sched_getaffinity() says:

If pid is zero, then the mask of the calling process is returned.

So this code will only return information for the client process itself. It's unclear to me whether this is what @CallMeFoxie intended.

@AenBleidd
Copy link
Member

@CharlieFenton,

I was mostly worried that this code might break some functionality on MacOS. If it will have no effect - that is perfectly fine.

GNU_LIBRARY is not defined on the Mac, so the added code has no effect. The Mac has no sched_getaffinity() function

This is completely fine. I have doubts anyone uses cgroup on MacOS.

So this code will only return information for the client process itself. It's unclear to me whether this is what @CallMeFoxie intended.

I believe, the intention is to limit CPU usage of BOINC using native linux tools (for whatever reasons).
@CallMeFoxie, please correct me if I'm wrong.

@CallMeFoxie
Copy link
Contributor Author

@AenBleidd: GNU_LIBRARY is not defined on the Mac, so the added code has no effect. The Mac has no sched_getaffinity() function, but a quick Google search came up with this as a possible workaround.

I am not familiar with the concept of cgroup cpusets or CPU affinity, but I am puzzled by the implementation in this PR because the file client/hostinfor_unix.cpp is used only in the client, and not in any BOINC libraries. So I suspect the code guarded by #elif __GNU_LIBRARY__ might never be compiled. Also, the Linux man page for sched_getaffinity() says:

If pid is zero, then the mask of the calling process is returned.

So this code will only return information for the client process itself. It's unclear to me whether this is what @CallMeFoxie intended.

I think it is working correctly? Because after the patch I can do
docker update --cpuset-cpus="0-9" boinc_test
and within the container I get

root@db4289a2920b:/src/boinc# nproc
10
root@db4289a2920b:/src/boinc# boinc
22-Sep-2023 08:39:14 [---] Starting BOINC client version 7.25.0 for x86_64-pc-linux-gnu
22-Sep-2023 08:39:14 [---] This a development version of BOINC and may not function properly
...
22-Sep-2023 08:39:15 [---] Processor: 10 GenuineIntel Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz [Family 6 Model 79 Stepping 1]

yet the original method was returning all 56 cores as lscpu does:

root@db4289a2920b:/src/boinc# lscpu
...
CPU(s):                  56

@CallMeFoxie
Copy link
Contributor Author

Basically the whole idea is that BOINC sees the actual assigned cpu core count, not the whole machine, because if it runs in some container it may have some limits imposed (be it docker or just general cgroups limits outside of any container)

@davidpanderson
Copy link
Contributor

Sorry to pick nits, but please use 4-space indentation and space in 'if ('.
Otherwise fine with me.

@AenBleidd
Copy link
Member

@CallMeFoxie, thank you for this fix!

@AenBleidd AenBleidd merged commit 8b5334d into BOINC:master Sep 28, 2023
49 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants