Leader process apparently running on wrong NUMA node #1133
Conversation
Backing up a bit... the point I had in my mind about binding the leader process to a NUMA node is so that the shared-memory objects it creates for the workers would be local to the NUMA node that the worker is running on. However, there's also the question of the shared objects made by the worker (e.g. counters), which should remain NUMA-local to the worker. Reading them from a remote NUMA node would force them out of the Modified MESI state and into Shared; see https://en.wikipedia.org/wiki/MESI_protocol. So I agree that we should try to get the manager running on a CPU of the right NUMA node. I think we should probably just change lib.numa's "bind_to_numa_node" to also (try to) bind the CPU. WDYT?
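To make the idea concrete, here is a minimal sketch of what "also (try to) bind CPU" could look like, assuming glibc on Linux/x86-64 and using the LuaJIT FFI; parse_cpulist and bind_to_node_cpus are illustrative names, not Snabb's actual lib.numa API:

-- A minimal sketch, assuming glibc on Linux/x86-64; parse_cpulist and
-- bind_to_node_cpus are illustrative names, not Snabb's lib.numa API.
local ffi = require("ffi")
local bit = require("bit")

ffi.cdef[[
/* glibc's cpu_set_t is 1024 bits; 32-bit words keep the bit ops simple. */
typedef struct { uint32_t bits[32]; } cpu_set_t;
int sched_setaffinity(int pid, size_t cpusetsize, const cpu_set_t *mask);
]]

-- Parse a Linux cpulist such as "0-5,12,14-17" into an array of CPU numbers.
local function parse_cpulist (s)
   local cpus = {}
   for range in s:gmatch("[^,%s]+") do
      local lo, hi = range:match("^(%d+)-(%d+)$")
      if not lo then lo = range:match("^(%d+)$") end
      for cpu = tonumber(lo), tonumber(hi or lo) do table.insert(cpus, cpu) end
   end
   return cpus
end

-- Restrict the calling process to every CPU of the given NUMA node.
local function bind_to_node_cpus (node)
   local path = "/sys/devices/system/node/node"..node.."/cpulist"
   local f = assert(io.open(path))
   local cpus = parse_cpulist(f:read("*l")); f:close()
   local mask = ffi.new("cpu_set_t")
   for _, cpu in ipairs(cpus) do
      local word, pos = math.floor(cpu / 32), cpu % 32
      mask.bits[word] = bit.bor(mask.bits[word], bit.lshift(1, pos))
   end
   -- pid 0 means "the calling process".
   assert(ffi.C.sched_setaffinity(0, ffi.sizeof(mask), mask) == 0,
          "sched_setaffinity failed")
end

bind_to_node_cpus(1)   -- e.g. keep the manager on NUMA node 1's CPUs

Binding to the whole node's CPU set rather than to one core leaves the scheduler room to balance the non-performance-critical manager among NUMA-local CPUs.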
d2de90d to 8ac7f0b
OK, I simplified the code. All the logic for figuring out which CPUs the OS makes available remains, as it's necessary to avoid scheduling the manager process on an isolated CPU. But now:

Example on correct NUMA node:

$ sudo ./snabb lwaftr run --cpu 11 --conf lwaftr.conf --on-a-stick 83:00.0
lwaftr.conf: loading compiled configuration from lwaftr.o
lwaftr.conf: compiled configuration is up to date.
Migrating instance '0000:83:00.0' to '83:00.0'
Binding data-plane PID 15760 to CPU 11.
Bound main process to NUMA node: 1 (CPU 6)

Example on wrong NUMA node:

$ sudo ./snabb lwaftr run --cpu 2 --conf lwaftr.conf --on-a-stick 83:00.0
lwaftr.conf: loading compiled configuration from lwaftr.o
lwaftr.conf: compiled configuration is up to date.
Migrating instance '0000:83:00.0' to '83:00.0'
Warning: No CPU available on local NUMA node 1
Warning: Assigning CPU 2 from remote node 0
Binding data-plane PID 15753 to CPU 2.
Bound main process to NUMA node: 0 (CPU 0)
src/lib/cpuset.lua
Outdated
   return parse_cpulist_from_file(node_path..'/cpulist')
end
local function isolated_cpus ()
   return parse_cpulist_from_file('/sys/devices/system/cpu/isolated')
Clever! I had no idea this file was a thing.
Two requests:
(1) Can we make a method to subtract cpulists? That way avail = subtract(cpus_in_node, isolated_cpus). (A rough sketch follows below.)
(2) Can we set affinity in bind_to_numa_node to the intersection of the current CPU affinity with all non-isolated CPUs on a node, not just the first one? That will give Linux a bit more freedom to optimize these non-performance-critical processes, while also preserving the user's ability to do taskset -c N.
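In case it's useful, a rough sketch of request (1), assuming the cpulists have already been parsed into arrays of CPU numbers; the names set and subtract here are illustrative, not necessarily the helpers that ended up in src/lib/cpuset.lua:

-- Sketch of request (1); names are illustrative.
local function set (list)
   -- Turn an array of CPU numbers into a set keyed by CPU number.
   local s = {}
   for _, cpu in ipairs(list) do s[cpu] = true end
   return s
end

local function subtract (s, t)
   -- CPUs present in set s but not in set t, as a sorted array.
   local ret = {}
   for cpu in pairs(s) do
      if not t[cpu] then table.insert(ret, cpu) end
   end
   table.sort(ret)
   return ret
end

-- Usage, roughly: avail = subtract(set(cpus_in_node(node)), set(isolated_cpus()))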
thanks for the patch and thanks for bearing with my feedback; lgtm!
src/lib/cpuset.lua
Outdated
local function isolated_cpus ()
   return set(parse_cpulist_from_file('/sys/devices/system/cpu/isolated'))
end
local function substract (s, t)
nit: "subtract"
src/lib/cpuset.lua
Outdated
   table.sort(ret)
   return ret
end
return substract(cpus_in_node(node), isolated_cpus())
As a FIXME I think this should be intersected with the current CPU affinity (as returned by sched_getaffinity), to allow users to run the manager with "taskset". However, not a blocker!
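For reference, a sketch (not part of this patch) of one way to read the process's current allowed-CPU list without going through the FFI: the kernel exposes it in cpulist format in /proc/self/status, and it reflects any earlier taskset restriction.

-- Sketch: read the current process's allowed-CPU list (cpulist format).
local function allowed_cpus ()
   local f = assert(io.open("/proc/self/status"))
   local cpulist
   for line in f:lines() do
      cpulist = cpulist or line:match("^Cpus_allowed_list:%s*(.+)$")
   end
   f:close()
   return cpulist   -- e.g. "0-5,12"
end

-- The manager's candidate CPUs could then be derived from this list,
-- the node's cpulist, and the isolated-CPU list.
print(allowed_cpus())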
Fixes #731.
I triaged #731 and sometimes the manager process was bound to CPU 0 (NUMA 0) and other times to CPU 6 (NUMA 1). I think the reason the manager process didn't get bound to a core on the same NUMA node as the lwAFTR process is that it was not actually bound to any CPU core at all. The manager simply called numa.bind_to_numa_node, but that doesn't bind a process to a core. Thus, the OS scheduled the process onto any of the cores available on either NUMA node (sometimes 0, sometimes 6).
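As an aside, a tiny hypothetical check (not part of the patch) that makes this symptom easy to observe: glibc's sched_getcpu(3), called through the LuaJIT FFI, reports which CPU the calling thread is currently on, and without an explicit affinity it can differ from run to run.

-- Sketch: report which CPU the scheduler currently has us on.
local ffi = require("ffi")
ffi.cdef[[ int sched_getcpu(void); ]]

print(("running on CPU %d"):format(ffi.C.sched_getcpu()))
-- Without sched_setaffinity(), repeated runs can print different CPUs
-- (e.g. 0 one time, 6 another), matching the behaviour described above.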