There are no problems with SDR enabled in a single-CPU setup. However, it causes problems in dual-CPU setups where two lotus-workers are running (each bound to its own CPU and NUMA node).
It looks like lotus-worker always spawns its sealing tasks starting from core 0, even if it is bound to another NUMA node, for example one with cores 16-31.
Typical picture of what happens (with 2 cores per CCX, for example):
lotus-worker-1 is bound to CPUs 0-15 and NUMA node 1.
lotus-worker-2 is bound to CPUs 16-31 and NUMA node 2.
lotus-worker-1 spawns its first sealing task on cores 0+1.
lotus-worker-2 spawns its first sealing task on cores 0+1 too.
Both sealing tasks run on cores 0+1 and affect each other.
The same situation is observed when running under numactl with cpunodebind, numactl with physcpubind, or taskset with a list of cores.
The current working workaround is to manually create two different cgroups with different cpusets+mems and run the workers inside them.
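To illustrate the expected behaviour, here is a minimal Python sketch of picking core groups from the process's own affinity mask instead of counting from core 0. It is not Lotus or proofs code: the function names are hypothetical, and it assumes that cores sharing a CCX have adjacent ids, which does not hold on every CPU.

```python
import os


def group_allowed_cores(allowed, group_size=2):
    """Split the allowed CPU ids into consecutive groups of `group_size`.

    Assumption: cores that share a CCX have adjacent ids. A real
    implementation would query the hardware topology instead.
    """
    cores = sorted(allowed)
    usable = len(cores) - len(cores) % group_size  # drop an incomplete tail group
    return [tuple(cores[i:i + group_size]) for i in range(0, usable, group_size)]


def current_affinity():
    """Return the set of CPUs this process is allowed to run on."""
    if hasattr(os, "sched_getaffinity"):  # available on Linux
        return os.sched_getaffinity(0)
    # Fallback for platforms without affinity support: assume all CPUs.
    return set(range(os.cpu_count() or 1))


if __name__ == "__main__":
    groups = group_allowed_cores(current_affinity())
    print("first core group for a sealing task:", groups[0] if groups else None)
```

With this approach, a worker started under taskset with cores 16-31 would take (16, 17) as its first group, so two workers bound to disjoint core ranges would not collide on cores 0+1.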
P.S. There are many processors with different CCX configurations and L3 cache sharing.
It would be better to give the miner operator some low-level config options,
so that the miner can specify the list of cores a worker may use and the order in which those cores are used.
For example, if I work without SDR enabled on multicore CPUs, I also don't want to run tasks on neighboring cores that share an L3 cache.
I want my miner to use only cores with an untouched L3 cache. But that is another issue...
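As a rough sketch of the core-list option proposed above, the following Python function parses a hypothetical setting such as `16-19,24,25` into an ordered list of cores. The syntax and the name `parse_core_list` are assumptions for illustration only; no such option exists in lotus-worker as far as this issue describes. The order of the entries is preserved, so it can double as the launch sequence.

```python
def parse_core_list(spec):
    """Parse a core list like "16-19,24,25" into an ordered list of ints.

    Hypothetical format: comma-separated core ids or inclusive ranges.
    The order given by the operator is kept, since it is meant to be
    the order in which cores are handed to tasks.
    """
    cores = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate stray commas
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if end < start:
                raise ValueError(f"invalid range: {part!r}")
            new_cores = range(start, end + 1)
        else:
            new_cores = [int(part)]
        for core in new_cores:
            if core in cores:
                raise ValueError(f"core {core} listed twice")
            cores.append(core)
    return cores


if __name__ == "__main__":
    # Example: use every second core first, then the rest.
    print(parse_core_list("16,18,20,22,17,19,21,23"))
```

An operator who wants to avoid neighboring cores that share an L3 cache could then list only one core per cache group, in whatever order suits the CPU.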
Description
Some background is detailed here:
filecoin-project/lotus#5848
Acceptance criteria
Risks + pitfalls
Where to begin