Define Transmit interface #1
The proposed "pcap"-like layout you have suggested, where EOF is indicated by length=0, is not very PCIe-efficient, because the hardware would need to read one packet at a time. It would be more efficient if the block size were known up front.
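To make the trade-off concrete, here is a minimal C sketch of the two layouts under discussion. All names and field widths are illustrative assumptions, not anything from the EasyNIC spec:

```c
#include <stdint.h>

/* Pcap-like layout: packets are packed back-to-back and the end of the
 * block is marked in-band by a zero-length header. The NIC cannot know
 * where packet N+1 starts until it has parsed packet N, so it must walk
 * the block serially. */
struct pkt_header {
    uint16_t length;   /* 0 = end of block */
    uint8_t  data[];   /* 'length' bytes of packet follow */
};

/* Alternative with sizes known up front: a descriptor table at a fixed
 * offset gives every packet's position, so the NIC can issue the PCIe
 * reads for all packets in a block in parallel. */
struct block_descriptor {
    uint16_t count;        /* number of packets in this block */
    uint16_t offset[255];  /* byte offset of each packet in the block */
    uint16_t length[255];  /* length of each packet */
};
```

With the second layout the hardware's read parallelism is limited only by PCIe, not by in-band parsing.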
@mlilja01 Good points! Having the hardware de-block is interesting. I would really like to support operating at line rate 100G on a single queue. This would mean the hardware needs to extract a packet from the block at ~145 MHz, which is probably close to the clock speed of the circuit. Is this reasonable? I want to avoid the situation we saw on the ConnectX-4 (snabbco/snabb#1007 (comment)), where even on a 100G NIC the per-queue performance maxed out at around 15 Mpps (10% of line rate). In that situation the application needs to shard traffic across many queues, and it can then become complicated to preserve ordering (reassemble based on timestamps?) and to shard the traffic in an application-appropriate way (need an eBPF VM to hash the headers?). So presumably it is very important that the DMA layout does not constrain per-queue parallelism on the device, and allows it to extract a packet on more or less every cycle. Yes?
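For reference, the ~145 MHz figure follows from the minimum-size-frame packet rate: a 64 B Ethernet frame plus the standard 20 B of preamble and inter-frame gap occupies 84 B = 672 bits on the wire, so 100 Gbps / 672 bits ≈ 148.8 Mpps, i.e. roughly one packet per clock cycle at ~150 MHz.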
A single queue running 100G is possible; we do that today on our NICs. Actually the NICs can handle 200G, but we don't have PCIe 4 in any x86 servers yet. The issue we mostly see is that software cannot keep up with a single queue. The drawback of a block of packets is that it is not very protocol-stack-friendly: normal networking apps like to have a buffer per packet, which is very handy but very bad PCIe-wise.
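To illustrate what a block layout costs the software side, here is a hypothetical transmit path that gathers the per-packet buffers most stacks use into one contiguous DMA block terminated by a zero length, as in the pcap-like layout sketched above. Everything here (names, the 64 KiB block size) is an assumption for illustration:

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 65536

/* The per-packet representation typical networking code prefers. */
struct packet {
    uint16_t       length;
    const uint8_t *data;
};

/* Pack as many packets as fit into 'block', host-endian length headers,
 * terminated by a zero length; returns the number of packets consumed. */
static int pack_block(uint8_t *block, const struct packet *pkts, int n)
{
    size_t off = 0;
    int i;
    for (i = 0; i < n; i++) {
        size_t need = sizeof(uint16_t) + pkts[i].length;
        /* Reserve 2 bytes for the terminating zero-length header. */
        if (off + need + sizeof(uint16_t) > BLOCK_SIZE)
            break;
        memcpy(block + off, &pkts[i].length, sizeof(uint16_t));
        off += sizeof(uint16_t);
        memcpy(block + off, pkts[i].data, pkts[i].length);
        off += pkts[i].length;
    }
    /* Zero length marks the end of the block. */
    memset(block + off, 0, sizeof(uint16_t));
    return i;
}
```

The extra copy per packet is exactly the "not protocol-stack-friendly" cost: the block is efficient over PCIe but forces software (or hardware) to block and de-block.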
Yes. I see this as a "with great power comes great responsibility" situation. The EasyNIC design will concentrate all of the complexity in one place, i.e. on the host CPU. This is different from mainstream ASIC NICs, which seem eager to divide functionality between hardware and software using more elaborate interfaces (scatter-gather, offloads, multiqueue, etc.)
It has to be really easy to interface with from a driver, really easy to implement in silicon, and really efficient with PCIe bandwidth.