Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fraggle rock #195

Closed
wants to merge 6 commits into from
Closed

Fraggle rock #195

wants to merge 6 commits into from

Conversation

wingo
Copy link

@wingo wingo commented Jan 6, 2016

A beginning to a solution to #180. The perf seems to be 3.49+3.49 MPPS, which is a little worse than what I was hoping for. There is still an end-to-end.sh bug regarding DF in the tunneled IPv4 packet; the code I made punts, but the current code in the lwaftr sends an ICMPv4 error message. Perhaps @aperez or @kbara knows if the current code is the right thign.

Because this change leads to more complicated graphs, this commit also
adds a helper to configure graphs.  In the future this will let us use
hardware capabilities to detag vlans.
Adapt expectation for ICMP response for don't-fragment packet to include
more bytes to be 576 bytes total, not counting the L2 headers.
In the common case where the NIC can strip the VLAN tag for you, that's
what we do.
@dpino
Copy link
Member

dpino commented Jan 8, 2016

LGTM.

@kbara
Copy link

kbara commented Jan 8, 2016

The new ICMP packets violate the documentation we provide on our RFC compliance:

https://github.com/Igalia/snabbswitch/blob/lwaftr_olive/src/program/snabb_lwaftr/doc/README.rfccompliance.md

ICMPv4 packets are [RFC 1812](https://tools.ietf.org/html/rfc1812) compliant:
that is, they contain up to 576 bytes, rather than having an
[RFC 792](https://tools.ietf.org/html/rfc792) style IPv4 header + 8 octets
payload.

As RFC 1222 says:

In general, no host is required to accept an IP
 datagram larger than 576 bytes (including header and
 data), so a host must not send a larger datagram
 without explicit knowledge or prior arrangement with
 the destination host. 

The differences between icmpv4-fromlwaftr-replyto-tcp-frominet-bound1494-DF.pcap in lwaftr_olive and in this branch, in order, are:

  • Two pcap header bytes; this is (probably) a consequence of the content.
  • Length; this is a genuine problem
  • IP and ICMP checksums (3 bytes differ): this is a side-effect of length

Given that end-to-end.sh passes and this branch has two new pcap files, I'm presuming that those are to avoid the test failures. What did you mean in the first comment about your code punting?

if full_pkt_size > max_size then
full_pkt_size = max_size
if full_pkt_size > max_size + l2_size then
full_pkt_size = max_size + l2_size
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this perhaps the root cause of the too-large-ICMP-packet bug I mentioned in the 'conversation' tab?

@kbara
Copy link

kbara commented Jan 8, 2016

Aside from that, nice work. I really like separating out the fragmentation handling - that should be pretty cheap. I'm less convinced about factoring out the vlan handling, or the overall performance hit. What does performance (for purely-unfragmented and purely-fragmented workloads) look like if fragmentation is factored out, but vlan handling is spread out like now rather than factored into an app?

@kbara
Copy link

kbara commented Jan 8, 2016

Re: out-of-band discussion (thanks Andy), I'm now convinced that this branch's length handling is superior, because the 576 byte limit should be on the IP datagram itself, not the whole packet. In fact, it could reasonably go farther, as https://tools.ietf.org/html/rfc1812 says:

Therefore, the ICMP
   datagram SHOULD contain as much of the original datagram as possible
   without the length of the ICMP datagram exceeding 576 bytes. 

This mentions the length of the ICMP datagram rather than IP datagram - though following it requires ignoring RFC 1222's warning (previous post). In practice, either interpretation is unlikely to cause trouble on the types of hosts that realistically exist on today's internet, as far as I know.

@wingo
Copy link
Author

wingo commented Jan 8, 2016

Regarding punting, my first branch failed tests. I force-pushed to update it before review started. Now it passes at each commit. 68feb75 changes the expectations without switching to the new vlan elements; the last patch switches to separate soft vlan taggers/untaggers, but only if the hardware can't do it for us. In practice I expect that in most cases we are interested in, hardware can untag (it's certainly the case for intel hardware running non-virtualized). I see no speed difference between using the hardware to untag and doing it inside the lwaftr itself (i.e., the code before this patch); though there are some maintenance benefits towards separating the vlan and lwaftr concerns, and it will possibly make performance improvements easier to make. For testing we obviously use the soft vlan taggers/untaggers though.

@kbara
Copy link

kbara commented Jan 8, 2016

Ahh... I wondered if it was an up to date comment, thanks!

I'm a bit reluctant to rely on the hardware, especially if it's not giving us a speed increase. I think that's potentially asking for trouble when someone runs it on a different card. (I'm quite biased towards (my understanding of) the Snabb philosophy of "do it in software if it's not too expensive", rather than relying on offloading as much as possible to hardware).

@wingo
Copy link
Author

wingo commented Jan 8, 2016

Yeah I know what you mean! I think though that "the snabb way" to handle vlans is with separate apps. With the current packet structure though there's a not insignificant perf overhead, see #184 or snabbco#648 for discussion.

In short, VLAN handling in apps > VLAN handling in hardware > VLAN handling in unrelated elements like the lwAFTR. WDYT?

@kbara
Copy link

kbara commented Jan 8, 2016

I strongly agree that VLAN handling in apps > VLAN handling in unrelated elements, though I'm less convinced that the latter is < VLAN handling in hardware. I had Luke's comment in #184 in mind when I said that. I'd really like to see if the offset-packet approach made the first feasible; I wish I had the hack time to find out.

@wingo
Copy link
Author

wingo commented Jan 8, 2016

Given that you think that vlan in apps >> vlan in lwaftr, I take that as an OK. Whether to use the NIC to handle VLANs is then orthogonal -- the current patch takes advantage of the hardware if present but doesn't require its use. Thanks for the feedback! Porting this patch to rambutan.

@wingo
Copy link
Author

wingo commented Jan 8, 2016

Closing in favor of #197. Olive is dead!

@wingo wingo closed this Jan 8, 2016
wingo added a commit that referenced this pull request Jan 8, 2016
@kbara
Copy link

kbara commented Jan 11, 2016

It's an ok, with the caveat that that's ideology, and performance may trump it. :-)

@wingo wingo deleted the fraggle-rock branch October 19, 2016 09:59
eugeneia added a commit that referenced this pull request Jan 13, 2022
e587f8c55 Merge pull request #225 from vavrusa/master
5ea3a881e bpf: add missing constants for linux 4.10 - 4.15
2c691e5a7 Merge pull request #224 from wingo/lseek-syscall-tweak
ae38bdbd7 Make "offset" arg to lseek a signed integer
5cb3b6950 Merge pull request #221 from wingo/util-ls-fix
8e0874609 Promptly close util.ls() dir fd; fix bug with deleted entries
277517436 Merge pull request #220 from sileht/master
8e48fd094 linux/nl.lua: Use ndmsg struct instead of rtmsg for neighbors
3e482bc4e Merge pull request #215 from qsn/gettid
57520cce3 expose gettid
db1a88e94 Merge pull request #214 from jsitnicki/sof-flags-linux
270a6e611 Add missing type for struct scm_timestamping for Linux
e49232047 Add missing SCM_* constants for Linux timestamping API
60fcc6b48 Add constants for SOF_* flags for Linux
26ac34851 Merge pull request #210 from alexandergall/linux-if-ioctls
3e6d3e27c Merge pull request #211 from fperrad/deb
d425a22b2 dummy changelog
b27eca538 update .gitignore
92292aa4b debian files
be257a7e1 debian files generated by lua-create-gitbuildpackage-layout
50a02b94b Add some SIOC{G,S}IF* ioctls for Linux
ee90324 Merge pull request #209 from justincormack/osx_clock
ee17863 Add CLOCK_ constants for OSX
178d244 Merge pull request #208 from justincormack/holes
61450f5 Add SEEK_DATA and SEEK_HOLE constants
9a7b584 Add memfd fnctl sealing support
99beaf5 more test fixes for memfd
cc221e4 fix ctest for new fcntl changes, typowq
56c4c76 fix ctest for new fcntl changes
96073cc fix typo
7a73e8a Add more constants for fcntl, memfd
8d3034c Fix ppc64le syscall numbers for newer calls
62828f6 Merge pull request #206 from johnae/master
c4002b6 The spook project is using ljsyscall.
0b266e8 Merge pull request #205 from lukego/close-fd-safely
ad91aa9 Add more protection against fd double-close
c8baf9e Merge pull request #202 from justincormack/dockerignore
66843c5 Use dockerignore to simplify Dockerfile
7b7211d Merge pull request #200 from justincormack/redo-dockerfile
a779caa Docker Cloud does not start processes at priority 0, remove from test
b85382d Rework Dockerfile and tests
24f7789 Merge pull request #199 from vavrusa/master
2ecf486 linux/constants: added new BPF map and prog types
00c1949 use Alpine 3.4 for docker build
0bcafc6 Merge pull request #196 from kbara/removecunused
a4217a8 Merge pull request #195 from kbara/fixgetcpu
ee67430 Remove unused variables from c.lua
4b2e0b2 Fix getcpu: the cpu argument was incorrectly given the node variable previously
214550a update changelog and rockspec for 0.12 release
21f3fd8 Merge pull request #192 from vavrusa/master
0da437f linux/bpf_prog_load: support custom kernel builds
9981190 fix missing vhangup
f245114 Merge pull request #191 from vavrusa/linux-perf-open
93558c1 linux: added support for tracing/performance counters
0511fb8 Merge pull request #190 from vavrusa/master
1f141ca linux: added new constants (e.g. attach BPF to socket)
36274f3 linux/bpf: added strflag support
4fd3bd6 Merge pull request #189 from vavrusa/master
aaa89cb linux: added support for eBPF
e095295 linux: added new syscall numbers (up to __nr_bpf)
1e079c4 test calls container so just needs file to run
d92625e define a docker compose test
5f14711 Merge pull request #188 from aperezdc/fix-if-nameindex
8915a83 Close socket immediately after error in if_nametoindex()
96c0286 use addons for travis, as learned at fosdem
71241e0 update changelog for unreleased changes
559b499 ignore audit arch constant
5d867d3 new architectures do not have open, will use openat
49d9ff9 test issues with new constants
7cd460e aarch64 audit constants for seccomp
6249e99 more docker examples
dd00af9 rockspec fix
d20033b rockspec for new release
fb17244 update copyright years
9d87597 more Changelog tweaks before release
53856b0 typo
1881526 sometimes winsz is 0, eg if terminal not set up
ab0c08d add error message
a1c207e update Changelog for forthcoming 0.11 release
ce12fb2 addDocker hub to README
602a2b3 fix osx fstatat to use 32 bit stat type, as cannot find how to call to get 64 bit one
8a0a6ad Now have arm machines with working seccomp
d14bd38 Appears that setting maac address on lo often works
6e878a1 remove debug print from test
30e9b5b allow skip on EPERM for adjtimex
bff3e90 Add strace in Docker image for convenient debugging
e67fa31 Use alpine 3.3 for Docker
5148bc3 fix ipv6 tests
185c1a6 more failures with no ipv6
98cc9a2 more fixes for ipv4 only environments for netlink tests
248a935 fix bind errors in environments that do not support ipv6
0eacd64 clean up travis file
5fb71b6 switch to newer Ubuntu in travis
8235724 fix more constant checks not in headers
8dff4ef constants missing on travis
db51b08 update Changelog
7540b04 add new rtnetlink values, so tests work under docker
22604b8 remove test that fails in some environments
d431693 fix waitid test under docker
1595b7d fix swap test under docker
321fdd2 Now an alpine package available
984b533 Add Dockerfile, fix some of the tests that made unreasonable assumptions
9aeff88 recent osx has *at functions
18cd829 better handling for xattr errors
b6bb892 freebsd 11 now has utimensat
7065b0d on freebsd/zfs chflags will fail, skip

git-subtree-dir: lib/ljsyscall
git-subtree-split: e587f8c55aad3955dddab3a4fa6c1968037b5c6e
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants