
Duplicated flush in onvm_pkt_process_rx_batch #281

Closed
JackKuo-tw opened this issue Mar 23, 2021 · 5 comments

JackKuo-tw commented Mar 23, 2021

Bug Report

Current Behavior

The function onvm_pkt_process_rx_batch calls two functions, onvm_pkt_enqueue_nf and onvm_pkt_flush_all_nfs, and both of them eventually call onvm_pkt_flush_nf_queue, which could lower performance.
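
A minimal, self-contained model of how I understand the call path (the *_model names are stand-ins I made up, not the real onvm_pkt.c code; the real onvm_pkt_flush_all_nfs walks every NF's buffer, here there is only one):

#include <stdio.h>

#define PACKET_READ_SIZE 32

static int queued;  /* stand-in for one NF's rx buffer fill level in the manager */

static void
flush_nf_queue_model(void) {
        printf("flush %d packets\n", queued);
        queued = 0;
}

static void
enqueue_nf_model(void) {
        if (++queued == PACKET_READ_SIZE)       /* flush path #1: buffer is full */
                flush_nf_queue_model();
}

static void
flush_all_nfs_model(void) {
        if (queued > 0)                         /* flush path #2: once per RX batch */
                flush_nf_queue_model();
}

static void
process_rx_batch_model(int rx_count) {
        int i;
        for (i = 0; i < rx_count; i++)
                enqueue_nf_model();
        flush_all_nfs_model();                  /* the second flush this issue is about */
}

int
main(void) {
        process_rx_batch_model(32);     /* flushed inside enqueue when the buffer fills */
        process_rx_batch_model(20);     /* flushed again by flush_all at the end of the batch */
        return 0;
}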

Expected behavior/code

Flush each NF queue only once per RX batch.

Steps to reproduce

Environment

  • OS: Ubuntu 20.04 LTS
  • onvm version: 20.10 (but the latest develop branch still has this issue)
  • dpdk version: 20.05

Possible Solution

Because onvm_pkt_process_tx_batch also calls onvm_pkt_enqueue_nf and does NOT call any flush function, I think there are 3 possible solutions:

  1. Remove onvm_pkt_flush_all_nfs in onvm_pkt_process_rx_batch
  2. Add a control parameter to onvm_pkt_enqueue_nf that controls whether to call onvm_pkt_flush_nf_queue (see the sketch after this list)
  3. Remove onvm_pkt_flush_nf_queue from onvm_pkt_enqueue_nf, and add onvm_pkt_flush_nf_queue to onvm_pkt_process_tx_batch
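
To make option 2 concrete, here is a rough sketch with stand-in types; the real onvm_pkt_enqueue_nf works on the manager's rx_mgr queue structures, so this only illustrates the flag idea:

#include <stdint.h>

#define PACKET_READ_SIZE 32

/* Stand-in for the per-NF packet buffer kept by the manager. */
struct nf_buf_sketch {
        void *pkts[PACKET_READ_SIZE];
        uint16_t count;
};

void
flush_nf_queue_sketch(struct nf_buf_sketch *buf) {
        /* here the buffered packets would be enqueued onto the NF's rx ring */
        buf->count = 0;
}

/* Option 2: the caller decides whether a full buffer is flushed immediately,
 * so the RX path could pass 0 and rely on a single flush elsewhere. */
void
enqueue_nf_sketch(struct nf_buf_sketch *buf, void *pkt, int flush_when_full) {
        if (buf->count == PACKET_READ_SIZE)     /* defensive: never overflow the buffer */
                flush_nf_queue_sketch(buf);
        buf->pkts[buf->count++] = pkt;
        if (flush_when_full && buf->count == PACKET_READ_SIZE)
                flush_nf_queue_sketch(buf);
}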

BTW

I traced this code because I found that when I run multiple simple_forward NFs, the first one in the chain consumes much more CPU than the others, which is strange.

But I suspect fixing this issue alone will not solve that problem; I haven't verified this guess yet.

@JackKuo-tw

I have tried solution 1 (commenting out onvm_pkt_flush_all_nfs).
The simple_forward CPU usage decreases from 62% to 50%!

Env

VNF Chain: fwd -> fwd -> fwd -> firewall
ONVM: enable shared cores
pktgen: 5.7 M pkt/s, UDP, 64 bytes

Performance

==CPU Usage==

  • current version
    • fwd1: 62%
    • fwd2: 45%
    • fwd3: 45%
    • firewall: 57%
  • solution 1
    • fwd1: 50%
    • fwd2: 45%
    • fwd3: 45%
    • firewall: 57%


twood02 commented Mar 23, 2021

Thanks @JackKuo-tw.

I believe the reason we have onvm_pkt_flush_all_nfs in onvm_pkt_process_rx_batch is that otherwise if there is very low traffic you might never correctly flush out the packets. The other flushes only happen if a buffer is full, but what if that condition isn't met?

Can you test to see how your solution behaves under a low traffic rate, for example just using a ping command to generate 1 packet a second?

In any case, it is interesting that you find this consuming a large amount of CPU. We have not done much profiling to find the main sources of overhead.


JackKuo-tw commented Mar 24, 2021

@twood02 You're right, solution 1 holds packets back indefinitely whenever the buffered count stays below PACKET_READ_SIZE (default 32).

It's a trade-off between low latency and high throughput.

AFAIK, the following modification, which uses a timeout concept, should achieve our goal:

  1. Move onvm_pkt_flush_all_nfs() from onvm_pkt.c to main.c
  2. Add delayed-flush code
    • If only a few packets come in, the counter reaches flush_threshold faster than under heavy traffic, so they still get flushed promptly
    • If lots of packets come in, the buffer reaches PACKET_READ_SIZE soon and is flushed in onvm_pkt_enqueue_nf
rx_thread_main(void *arg) {
        ...
        int flush_threshold = 0;        /* added: counts outer-loop iterations toward the next forced flush */
        int enable_flush_count = 0;     /* added: only start counting once packets have been enqueued */
        ...
        for (; worker_keep_running;) {
                /* Read ports */
                for (i = 0; i < ports->num_ports; i++) {
                        rx_count = rte_eth_rx_burst(ports->id[i], rx_mgr->id, pkts, PACKET_READ_SIZE);
                        ports->rx_stats.rx[ports->id[i]] += rx_count;

                        /* Now process the NIC packets read */
                        if (likely(rx_count > 0)) {
                                // If there is no running NF, we drop all the packets of the batch.
                                if (!num_nfs) {
                                        onvm_pkt_drop_batch(pkts, rx_count);
                                } else {
                                        onvm_pkt_process_rx_batch(rx_mgr, pkts, rx_count);
                                        enable_flush_count = 1;         /* added */
                                }
                        }
                }
                /* added: delayed flush - push out any leftover packets after 1000 iterations */
                if (enable_flush_count && flush_threshold++ == 1000) {
                        onvm_pkt_flush_all_nfs(rx_mgr, NULL);
                        flush_threshold = 0;
                        enable_flush_count = 0;
                }
        }
        ...
}

The threshold value (1000 here) can be defined as a macro so that users can tune it.

In my test, CPU usage is as low as with solution 1 (my CPU is an i7-7920X).

Regarding the additional latency: I inserted gettimeofday() into the code above and changed the threshold to 100000000; the elapsed time for the counter to reach that threshold was 200~950 ms.

So if we adopt 1000 as the threshold, the added delay is < 0.01 ms, a very small overhead that still solves this problem.
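
For reference, the shape of the measurement was roughly the following stand-alone loop (a sketch only, not the exact patch; in the real rx_thread_main the loop body also polls the NIC, so per-iteration time is higher than an empty loop suggests; FLUSH_THRESHOLD is just an illustrative macro name):

#include <stdio.h>
#include <sys/time.h>

#define FLUSH_THRESHOLD 100000000       /* 1000 in the actual proposal above */

static double
elapsed_ms(struct timeval a, struct timeval b) {
        return (b.tv_sec - a.tv_sec) * 1000.0 + (b.tv_usec - a.tv_usec) / 1000.0;
}

int
main(void) {
        struct timeval start, end;
        volatile int flush_counter = 0;         /* volatile so the empty loop is not optimized away */

        gettimeofday(&start, NULL);
        while (flush_counter++ < FLUSH_THRESHOLD)
                ;                               /* stands in for the (mostly idle) RX iterations */
        gettimeofday(&end, NULL);

        printf("%.3f ms for %d iterations\n", elapsed_ms(start, end), FLUSH_THRESHOLD);
        return 0;
}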


JackKuo-tw commented Mar 30, 2021

@twood02
Even though the NFs use the same flush-twice strategy as the manager, their CPU usage doesn't rise.
I sampled the batch size of each NF once every 100K processed batches for a few seconds to figure out why.
Here are the sampled batch sizes:

FWD1: 32, 24,  8, 24,  8, 24,  8, 24, 32,  8, 32
FWD2: 32, 32, 32,  8, 32, 32, 32, 32
FWD3: 32, 32, 32, 24, 32, 32, 32, 24
FWD4: 32, 32, 32, 24, 32,  8, 32, 32
FW:   32, 24, 32, 32, 32, 32, 32, 32

We can see that the batch sizes are uneven when the NF is at the beginning of the chain, and that they become much more even further along the chain.

I think the key reason is the longer processing time, which lets the buffer accumulate more packets. The idea is similar to the "timespan" in the SIGCOMM paper "Microscope", i.e. the time between the first packet and the last packet leaving the NF.
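
For reference, the sampling was roughly like this (a stand-alone sketch, not part of onvm_nflib; record_batch_size() is a hypothetical hook called once per processed batch):

#include <stdio.h>
#include <stdint.h>

#define SAMPLE_EVERY 100000     /* print one batch size per 100K processed batches */

static uint64_t batches;

/* Hypothetical hook: call once per processed batch to produce sample lines
 * like the "32, 24, 8, ..." sequences above. */
void
record_batch_size(uint16_t batch_size) {
        if (++batches % SAMPLE_EVERY == 0)
                printf("%u, ", batch_size);
}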

@JackKuo-tw

I think this is a necessary evil, so I will not open a PR with this code.

The modification in this issue can be applied and tuned by those who need it.
