
Define Transmit interface #1

Open
lukego opened this issue May 24, 2018 · 4 comments

Comments

@lukego
Owner

lukego commented May 24, 2018

It has to be really easy to interface with from a driver, really easy to implement in silicon, and really efficient with PCIe bandwidth.

@mlilja01

The proposed "pcap"-like layout you have suggested, where EOF is indicated by length=0, is not that PCIe-efficient because the HW would need to read one packet at a time. It would be more efficient if the block size is known up front.
One very efficient way is to have multiple packets back to back in contiguous memory and DMA based on a start address and a block size. The HW would then need to de-block, though.
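
For concreteness, a minimal sketch of what such a packed block could look like, assuming C structs; the names and fields are illustrative, not from any EasyNIC spec:

```c
#include <stdint.h>

/* Hypothetical layout: packets packed back to back in one contiguous DMA
 * buffer. The NIC reads `block_len` bytes starting at `block_addr` in one
 * large burst and de-blocks the individual packets on chip. */
struct tx_packet {
    uint16_t length;      /* bytes of packet data that follow            */
    uint8_t  data[];      /* packet bytes, padded to a fixed alignment   */
};

struct tx_block_descriptor {
    uint64_t block_addr;  /* physical address of the packed packet block */
    uint32_t block_len;   /* total size of the block, known up front     */
    uint32_t npackets;    /* number of packets packed into the block     */
};
```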

@lukego
Owner Author

lukego commented May 25, 2018

@mlilja01 Good points!

Having the hardware de-block is interesting. I would really like to support operating at 100G line rate on a single queue. This would mean the hardware needs to extract a packet from the block at ~145 MHz, which is probably close to the clock speed of the circuit. Is this reasonable?
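
For reference, a back-of-the-envelope check of that figure, assuming minimum-size 64-byte Ethernet frames plus 8 bytes of preamble and a 12-byte inter-frame gap on the wire (roughly in line with the ~145 MHz quoted above):

```c
#include <stdio.h>

int main(void) {
    double line_rate_bps = 100e9;          /* 100 Gbps                     */
    double bytes_on_wire = 64 + 8 + 12;    /* min frame + preamble + IFG   */
    double pps = line_rate_bps / (bytes_on_wire * 8.0);
    printf("%.1f Mpps\n", pps / 1e6);      /* ~148.8 Mpps, i.e. ~150 MHz   */
    return 0;
}
```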

I want to avoid the situation that we saw on the ConnectX-4 (snabbco/snabb#1007 (comment)) where, even on a 100G NIC, the per-queue performance maxed out at around 15 Mpps (10% of line rate). If that is the situation, then the application needs to shard traffic across many queues, and it can become complicated to preserve ordering (reassemble based on timestamps???) and to shard the traffic in an application-appropriate way (need an eBPF VM to hash the headers???).

So presumably it is very important that the DMA layout does not constrain per-queue parallelism on the device and allows it to extract a packet on more-or-less every cycle. Yes?

EDIT: s/GHz/MHz/

@mlilja01

A single queue running 100G is possible; we do that today on our NICs. Actually, the NICs can handle 200G, but we don't have PCIe 4.0 in any x86 servers yet. The issue we mostly see is that the SW cannot keep up with a single queue.

The drawback of a block of packets is that it is not very protocol-stack friendly. Normal networking apps like to have a buffer per packet, which is very handy but very bad PCIe-wise.
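
For contrast, a minimal sketch of the conventional buffer-per-packet descriptor style (illustrative field names, not any vendor's actual format): each descriptor points at its own buffer, so the NIC ends up issuing a separate PCIe read per packet.

```c
#include <stdint.h>

/* Conventional per-packet descriptor: one descriptor, one buffer, one
 * packet. Handy for the software stack, but each packet costs the NIC a
 * descriptor fetch plus a separate buffer read over PCIe. */
struct per_packet_tx_descriptor {
    uint64_t buf_addr;    /* physical address of this packet's buffer     */
    uint16_t buf_len;     /* length of this one packet                    */
    uint16_t flags;       /* e.g. end-of-packet, request completion       */
};
```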

@lukego
Owner Author

lukego commented May 25, 2018

> The drawback of a block of packets is that it is not very protocol-stack friendly.

Yes. I see this as a "with great power comes great responsibility" situation. The EasyNIC design will concentrate all of the complexity in one place, i.e. on the host CPU. This is different from mainstream ASIC NICs, which seem eager to divide functionality between hardware and software using more elaborate interfaces (scatter-gather, offloads, multiqueue, etc.).
