SDR: cannot put two runs into the same hwloc group #1556
Comments
I have my EPYC MILAN 7413 dual-CPU test rig running multicore SDR. Each CPU has 4 CCXs, which gives a total of 8 CCXs across the whole platform. Logs are set to DEBUG. I see great thread allocation of new PC1 jobs for each CCX, up to 8 PC1 jobs. So for PC1 jobs 1-8, the logs generally look like the examples below:
and
But as mentioned in the linked issue, we get a problem/error when trying to allocate threads for PC1 job 9. This is what the logs show when the allocation is attempted and fails:
I find the last couple of lines interesting. We see the message regarding
But it didn't, and now the CPUs are just balancing the 9th PC1 job across free cores on both CPUs - sealing time goes from 3h to 9h for those jobs :( |
Yes, this is what I tried to describe in my bug report. Currently you can only assign one SDR to each CCX; once each CCX has one SDR assigned, random cores are used. |
@vmx any idea if this will be addressed any time soon? |
@benjaminh83 there's no timeline for this yet, though it's something I'd surely like to look into. |
For multicore SDR it is important that the producer and consumers share the same (L3) cache, hence we bind specific cores to threads. Prior to this change there was one multicore SDR job per group, even if the group could accommodate multiple such jobs. If more jobs were scheduled than groups available, those additional jobs wouldn't use specific cores, but whatever the operating system decided. With this change, additional jobs are now put into the groups if there is enough space to accommodate them. Fixes #1556.
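To illustrate the intended behaviour, here is a minimal sketch of the resulting allocation order, assuming jobs spread across groups before doubling up within one (illustrative code only, not the actual cores.rs implementation):

```rust
/// Hand out core "units" (one unit = the cores for one multicore SDR job):
/// first the first unit of every group, then every group's second unit,
/// and so on, so jobs spread across L3 groups before sharing one.
fn allocation_order(units_per_group: usize, groups: usize) -> Vec<(usize, usize)> {
    let mut order = Vec::new();
    for round in 0..units_per_group {
        for group in 0..groups {
            // (group index, unit index within that group)
            order.push((group, round));
        }
    }
    order
}

fn main() {
    // 8 CCXs that each fit 2 jobs: the 9th job lands in group 0's second
    // slot instead of on arbitrary OS-scheduled cores.
    let order = allocation_order(2, 8);
    assert_eq!(order[8], (0, 1));
}
```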
@vmx is there a way forward for this to get into a release soon? Just to give you an impression of the impact: when planning PC1 servers, this issue is the difference between a single server with two 32-core $3000 CPUs (if the scheduling works) and two single-CPU servers with a 64-core CPU at $7k each. |
Sweet! Quick question: if we have a 4-core CCX and run multicore SDR PC1s with 1 producer + 1 consumer each, will it fit 2 PC1s (2x 1+1) in that group, or 3 (3x 1+1)? One producer core could be shared between 3 PC1s (in theory). |
@cryptowhizzard this is worth applying to the dual-CPU Milans you've got for PC1! It might solve the problems. |
It will fit 2 multicore SDRs into the group. The multicore SDRs run independently; they don't share producers. |
They could share a core for their individual producers - that's what I meant (as a producer hardly maxes out the core). But acknowledged: [core count per group] / (1 + producers) = max PC1s per group. |
Correct: each consumer and producer runs in its own thread, and each thread is bound to one specific core. |
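As a worked example of that rule of thumb, a minimal sketch (the function name and figures are illustrative, not from rust-fil-proofs):

```rust
/// Maximum number of multicore SDR (PC1) jobs that fit into one L3 group,
/// assuming each job needs 1 consumer thread plus `producers` producer
/// threads, and each thread is pinned to its own core.
fn max_pc1_per_group(cores_per_group: usize, producers: usize) -> usize {
    cores_per_group / (1 + producers)
}

fn main() {
    // A 4-core CCX with 1 producer + 1 consumer per job fits 2 jobs.
    assert_eq!(max_pc1_per_group(4, 1), 2);
    // A 6-core CCX (as on the EPYC 7413) fits 3 such jobs.
    assert_eq!(max_pc1_per_group(6, 1), 3);
}
```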
@vmx Sounds like we have something for testing on ZEN3/MILAN. I would love to get those tests going on my dual 7413, where I have 6 cores in each CCX, and basically want to run 2 (or maybe even 3) in each CCX. I hope this will work. Any idea if this will work on a dual Intel Xeon 3rd gen? In theory, this will just look like two big CCXs. Would you expect this to work? I could test it on my dual 8380, but I wouldn't start the testing if you know the fix does not cover this scenario. |
@benjaminh83 it would be great if someone would try it out!
Yes, that would also work. It would then split those two big CCXs into smaller units. The test might make things clearer. I'd hope that it can even be understood by people not fluent in Rust. It shows how cores are assigned. It returns a list of those "units" and the multicore SDR will use those units one after another. Your case sounds similar to this test case, where there are 16 cores and two CCXs, while having 2 producers (hence, due to the consumer, 3 threads running). |
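For the intuition without digging into the codebase, here is a rough, self-contained sketch of that kind of unit splitting (hypothetical function; the real logic and test live in cores.rs):

```rust
/// Split a flat list of core IDs into per-job "units": the cores are first
/// grouped by shared L3 cache, then each group is subdivided into chunks of
/// `threads_per_job` cores (1 consumer + N producers per job).
fn core_units(cores: usize, groups: usize, threads_per_job: usize) -> Vec<Vec<usize>> {
    let cores_per_group = cores / groups;
    let jobs_per_group = cores_per_group / threads_per_job;
    let mut units = Vec::new();
    for g in 0..groups {
        let base = g * cores_per_group;
        for j in 0..jobs_per_group {
            let start = base + j * threads_per_job;
            units.push((start..start + threads_per_job).collect());
        }
    }
    units
}

fn main() {
    // 16 cores, two CCXs, 2 producers + 1 consumer = 3 threads per job:
    // each 8-core CCX holds two 3-core units (2 cores per CCX stay unused).
    let units = core_units(16, 2, 3);
    assert_eq!(units[0], vec![0, 1, 2]);
    assert_eq!(units[1], vec![3, 4, 5]);
    assert_eq!(units[2], vec![8, 9, 10]);
    assert_eq!(units[3], vec![11, 12, 13]);
}
```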
I should also note to everyone trying out: if you run it with |
I would love to test this on the latest 5.18 Linux kernels, since they have more support for Milan/Zen CPUs and NUMA. Is it possible to make a special tag for this so we can compile it in? |
@cryptowhizzard what do you mean with "special tag"? |
@vmx to have it in lotus, to compile lotus-worker. |
@cryptowhizzard oh, you mean having a released version of it? |
@vmx @cryptowhizzard no problem! I will take it up in SPX - @jennijuju and see if we can get a "jenni-tagged" lotus with this package inside. |
Is this currently stable enough to be included in the lotus release? @vmx |
@benjaminh83 Could you request another jenni-tag since the code has been updated? |
Yes, the plan is to have it in the next Lotus release. |
Great! I hope you meant v1.17, not v1.18. |
v1.18. |
That's most likely September :/ |
so... when do you schedule v1.18_rc1? :) |
Thanks @vmx, now it works very well on 7302*2, great work~~ |
Description
We are using hwloc for the multicore SDR. We want to make sure that the workers and the producer share the same L3 cache. As reported in filecoin-project/lotus#7981, it looks like we assign at most one multicore SDR per group. If more are requested than there are groups, they get arbitrarily assigned. Ideally, if all groups already have one multicore SDR assigned, we should check if there is still enough room to put another one into a group. Example: you have 24 cores and they form 4 groups, where each group has its own L3 cache. You'd have 6 cores per group for SDR. If you now only have 2 producers (and one consumer), you only use 3 cores. You could put another SDR run in there. They will battle for the L3 cache, but that might be fine if the lookahead is small enough.
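A minimal sketch of that "is there still room" check under the stated assumptions (names are hypothetical, not the cores.rs API):

```rust
/// Number of threads one multicore SDR run needs: 1 consumer + N producers.
fn threads_per_run(producers: usize) -> usize {
    1 + producers
}

/// Whether a group with `group_size` cores, of which `used` are already
/// pinned to SDR threads, can still accommodate another SDR run.
fn fits_another_run(group_size: usize, used: usize, producers: usize) -> bool {
    group_size.saturating_sub(used) >= threads_per_run(producers)
}

fn main() {
    // 24 cores in 4 groups of 6; 2 producers + 1 consumer = 3 threads/run.
    // After one run per group (3 cores used), a second run still fits.
    assert!(fits_another_run(6, 3, 2));
    // A third one does not.
    assert!(!fits_another_run(6, 6, 2));
}
```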
Acceptance criteria
You can have several SDRs assigned to the same group if there are enough cores. Ideally there is a test that checks the assignment that will be used. Such a test would likely need to be run manually as it's highly hardware-specific, but that's fine.
Risks + pitfalls
Make sure that the current assignment of one SDR per group (until there are no more free groups left) is kept intact.
Where to begin
https://github.com/filecoin-project/rust-fil-proofs/blob/307e7af34732eed5d3a6885e4da33489dbcb1ccf/storage-proofs-porep/src/stacked/vanilla/cores.rs contains the relevant code.
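The allocation there roughly boils down to checking out a free core unit and releasing it when the job finishes; a simplified, std-only sketch of that pattern (types and names are hypothetical, not the real cores.rs API):

```rust
use std::sync::{Mutex, MutexGuard};

/// One unit = the set of core indices a single multicore SDR job pins
/// its consumer/producer threads to.
struct CoreUnit(Vec<usize>);

/// Try to check out the first free unit; jobs beyond the available units
/// get `None` and run on whatever cores the OS picks.
fn checkout_core_unit(units: &[Mutex<CoreUnit>]) -> Option<MutexGuard<'_, CoreUnit>> {
    units.iter().find_map(|unit| unit.try_lock().ok())
}

fn main() {
    let units: Vec<Mutex<CoreUnit>> = vec![
        Mutex::new(CoreUnit(vec![0, 1, 2])),
        Mutex::new(CoreUnit(vec![3, 4, 5])),
    ];
    let first = checkout_core_unit(&units).expect("unit 0 is free");
    let second = checkout_core_unit(&units).expect("unit 1 is free");
    assert_eq!(second.0, vec![3, 4, 5]);
    drop(first); // releasing the guard frees the unit for the next job
    assert!(checkout_core_unit(&units).is_some());
    drop(second);
}
```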