Update benchmarks #46
Makefile to create benchmark csv files:

```makefile
LWAFTR=./snabb-lwaftr
IPV4_PCAP:=/home/dpino/snabbswitch/tests/apps/lwaftr/benchdata/ipv4-0550.pcap
IPV6_PCAP:=/home/dpino/snabbswitch/tests/apps/lwaftr/benchdata/ipv6-0550.pcap
BINDING_TABLE=/home/dpino/snabbswitch/tests/apps/lwaftr/data/binding.table
LWAFTR_CONF=/home/dpino/snabbswitch/tests/apps/lwaftr/data/icmp_on_fail.conf
LWAFTR_IPV4_PCIADDR=0000:81:00.1
LWAFTR_IPV6_PCIADDR=0000:82:00.1
TRANSIENT_IPV4_PCIADDR=0000:81:00.0
TRANSIENT_IPV6_PCIADDR=0000:82:00.0
TRANSIENT_SELF_TEST_NIC1_PCIADDR=0000:81:00.0
TRANSIENT_SELF_TEST_NIC2_PCIADDR=0000:81:00.1
SHELL=/bin/bash

transient-self-test.csv: $(LWAFTR)
	./snabb-lwaftr transient -s 0.25e9 -D 2 $(IPV4_PCAP) NIC1 $(TRANSIENT_SELF_TEST_NIC1_PCIADDR) $(IPV4_PCAP) NIC2 $(TRANSIENT_SELF_TEST_NIC2_PCIADDR) > $@

lwaftr-%.csv:: $(LWAFTR)
	( set -m; set -o pipefail; \
	./snabb-lwaftr run -D 170 --bt $(BINDING_TABLE) --conf $(LWAFTR_CONF) --v4-pci $(LWAFTR_IPV4_PCIADDR) --v6-pci $(LWAFTR_IPV6_PCIADDR) & \
	./snabb-lwaftr transient -s 0.25e9 -D 2 $(IPV4_PCAP) IPv4 $(TRANSIENT_IPV4_PCIADDR) $(IPV6_PCAP) IPv6 $(TRANSIENT_IPV6_PCIADDR) | tee $@.tmp && \
	mv $@.tmp $@ && \
	wait )
```
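As an aside on how the `lwaftr-%.csv` recipe works: `set -o pipefail` makes the `transient ... | tee` pipeline report the transient program's exit status (not `tee`'s), the backgrounded `run` process exits on its own thanks to `-D 170`, and the `.tmp`-then-`mv` step only publishes the CSV if the measurement succeeded. A minimal sketch of that pattern, with `sleep` and `echo` standing in for the Snabb binaries (those stand-ins and the `out.csv` filename are assumptions, not the real commands):

```shell
( set -m; set -o pipefail
  # stands in for the time-limited `snabb-lwaftr run -D ...`
  sleep 1 &
  # stands in for `snabb-lwaftr transient ... | tee $@.tmp`
  echo "dummy,benchmark,row" | tee out.csv.tmp && \
  # only publish the CSV if the measurement pipeline succeeded
  mv out.csv.tmp out.csv && \
  # wait for the backgrounded "run" process to finish
  wait )
cat out.csv
```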
So this makefile resulted in poor perf numbers: even with taskset to ensure that the transient and run executables ran on different sockets, I strangely peaked at 1.5 MPPS. Very odd. But when I ran the programs manually from separate login shells, again with taskset, I reliably reached 2 MPPS. So in the end our perf numbers come from this second, more manual way of running things: on one shell:

and on another:
I finally realized what is going on, I think. Since on interlaken the .0 ports are cabled to the .1 ports, it's not possible to run the load generator on a different socket from the lwaftr and be optimal: there are two NUMA nodes (one for each socket), and the .0 and .1 ports of a given card will probably be on the same NUMA node. Getting the NUMA node wrong is worse than the cache interference, so we should be running both on the same socket.
snabbco#628 (intel10g txdesc prefetch) may improve load generator performance to compensate for NUMA effects. (Seems to in quick testing now.)
@lukego interesting! Will follow that as we see how we do on smaller packets. For now, though, what a relief to finally identify the source of our observed perf variance (NUMA), and to be able to fix it!
Update the lwAFTR benchmarks, and make sure the documentation for generating them is up to date with the transient program from #43. No external nic_ui / snsh script invocation should be necessary.
The results will be:
Andy has the graphing scripts; Diego to do the benchmarking once we have a final snabb-lwaftr binary.