
Update benchmarks #46

Closed
wingo opened this issue Sep 30, 2015 · 6 comments

wingo commented Sep 30, 2015

Update the lwAFTR benchmarks, also making sure that the documentation describing how to generate the benchmarks is up to date with the transient program from #43. No external nic_ui / snsh script invocation should be necessary.

The results will be:

  • A self-test of the "transient" benchmarking harness (is it able to generate the desired load?)
    • as a CSV file and as a graph of RX/TX MPPS/Gbps over time
  • A transient load over the lwAFTR showing processed MPPS and Gbps as a function of incoming MPPS/Gbps on a 550-byte packet full-duplex workload
    • as a raw CSV file, as a graph of RX/TX Gbps over time, and as TX vs RX graphs
    • 10 runs on the same graph to show variance

Andy has the graphing scripts; Diego to do the benchmarking once we have a final snabb-lwaftr binary.
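
Andy's scripts are the canonical ones; purely as a sketch, a gnuplot snippet along these lines could render RX/TX Gbps over time for a single run. The file name lwaftr-1.csv matches the targets below, but the column layout assumed here (comma-separated, header row, then time / RX Gbps / TX Gbps) is a guess, not the transient program's documented output format:

# Sketch only -- adjust the "using" column indices to the real CSV layout.
gnuplot <<'EOF'
set datafile separator ","
set key autotitle columnhead    # assumes the first CSV row is a header
set terminal pngcairo size 800,480
set output "lwaftr-1.png"
set xlabel "time (s)"
set ylabel "Gbps"
plot "lwaftr-1.csv" using 1:2 with lines title "RX Gbps", \
     "lwaftr-1.csv" using 1:3 with lines title "TX Gbps"
EOF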

wingo added the lwaftr label Sep 30, 2015
wingo added this to the Proof-of-Concept milestone Sep 30, 2015

wingo commented Sep 30, 2015

Makefile to create the benchmark CSV files:

LWAFTR=./snabb-lwaftr

# 550-byte test traffic and lwAFTR configuration used by the benchmark runs.
IPV4_PCAP:=/home/dpino/snabbswitch/tests/apps/lwaftr/benchdata/ipv4-0550.pcap
IPV6_PCAP:=/home/dpino/snabbswitch/tests/apps/lwaftr/benchdata/ipv6-0550.pcap
BINDING_TABLE=/home/dpino/snabbswitch/tests/apps/lwaftr/data/binding.table
LWAFTR_CONF=/home/dpino/snabbswitch/tests/apps/lwaftr/data/icmp_on_fail.conf

# NICs driven by the lwAFTR under test.
LWAFTR_IPV4_PCIADDR=0000:81:00.1
LWAFTR_IPV6_PCIADDR=0000:82:00.1

# NICs driven by the transient load generator.
TRANSIENT_IPV4_PCIADDR=0000:81:00.0
TRANSIENT_IPV6_PCIADDR=0000:82:00.0

# NIC pair used for the load-generator self-test.
TRANSIENT_SELF_TEST_NIC1_PCIADDR=0000:81:00.0
TRANSIENT_SELF_TEST_NIC2_PCIADDR=0000:81:00.1

SHELL=/bin/bash

# Self-test of the transient harness: drive one NIC against the other.
transient-self-test.csv: $(LWAFTR)
    $(LWAFTR) transient -s 0.25e9 -D 2 $(IPV4_PCAP) NIC1 $(TRANSIENT_SELF_TEST_NIC1_PCIADDR) $(IPV4_PCAP) NIC2 $(TRANSIENT_SELF_TEST_NIC2_PCIADDR) > $@

# Full benchmark: start the lwAFTR in the background, drive it with the
# transient load generator, and capture the generator's CSV output.
lwaftr-%.csv:: $(LWAFTR)
    ( set -m; set -o pipefail; \
      $(LWAFTR) run -D 170 --bt $(BINDING_TABLE) --conf $(LWAFTR_CONF) --v4-pci $(LWAFTR_IPV4_PCIADDR) --v6-pci $(LWAFTR_IPV6_PCIADDR) & \
      $(LWAFTR) transient -s 0.25e9 -D 2 $(IPV4_PCAP) IPv4 $(TRANSIENT_IPV4_PCIADDR) $(IPV6_PCAP) IPv6 $(TRANSIENT_IPV6_PCIADDR) | tee $@.tmp && \
      mv $@.tmp $@ && \
      wait )
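
Assuming the snabb processes need root to open the NICs (as in the manual runs below), the self-test and the ten benchmark runs could then be produced with something like:

sudo make transient-self-test.csv
for i in `seq 1 10`; do sudo make lwaftr-$i.csv; done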

wingo self-assigned this Sep 30, 2015

wingo commented Oct 1, 2015

So this Makefile resulted in poor perf numbers: even with taskset to ensure that the transient and run executables ran on different sockets, I was strangely peaking at 1.5 MPPS. Very odd. But when I ran the programs manually from separate login shells, again with taskset, I reliably reached 2 MPPS. So in the end our perf numbers are from this second, more manual way of running things. In one shell:

for i in `seq 1 10`; do
  sudo taskset -c 0 ~/snabb-lwaftr-v0.2/snabb-lwaftr transient -D 1 -s 0.25e9 \
    ~/snabb-lwaftr-v0.2/snabbswitch/tests/apps/lwaftr/benchdata/ipv4-0550.pcap IPv4 :81:00.0 \
    ~/snabb-lwaftr-v0.2/snabbswitch/tests/apps/lwaftr/benchdata/ipv6-0550.pcap IPv6 :82:00.0 \
    > lwaftr-$i.csv
done

and in another:

for i in `seq 1 10`; do
  sudo taskset -c 6 ./snabb-lwaftr run -D 80 \
    --bt snabbswitch/tests/apps/lwaftr/data/binding.table \
    --conf snabbswitch/tests/apps/lwaftr/data/icmp_on_fail.conf \
    --v4-pci 0000:81:00.1 --v6-pci 0000:82:00.1
done

wingo commented Oct 1, 2015

I think I finally figured out what is going on. Since on interlaken the .0 ports are cabled to the .1 ports, it's not possible to run the load generator on a different socket from the lwAFTR and still be optimal: there are two NUMA nodes (one per socket), and the .0 and .1 ports of a given card will probably be on the same NUMA node.

Getting the NUMA node wrong costs more than the cache interference from sharing a socket, so we should run both processes on the same socket.
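
One way to check this (standard sysfs and lscpu queries, not something from this issue) is to ask the kernel which NUMA node each port belongs to, and then pick taskset -c cores from that node:

# NUMA node of each port (prints -1 if the platform doesn't report one)
cat /sys/bus/pci/devices/0000:81:00.0/numa_node
cat /sys/bus/pci/devices/0000:81:00.1/numa_node

# CPU ranges per NUMA node, to choose matching taskset -c values
lscpu | grep 'NUMA node'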

wingo commented Oct 1, 2015

wingo closed this as completed Oct 1, 2015

lukego commented Oct 1, 2015

snabbco#628 (intel10g txdesc prefetch) may improve load generator performance to compensate for NUMA effects. (Seems to in quick testing now.)

wingo commented Oct 2, 2015

@lukego Interesting! Will follow that as we see how we do on smaller packets. For now though, what a relief to finally identify the source of our observed perf variance (NUMA), and to be able to fix it!
