[Scale 10k arp] Traffic drop while arp is being learned #21645
Comments
@prabhataravind, can you help take a look?
@DavidZagury Why are we updating the default CoPP policer limits? Do you see issues with the default SONiC policer config as well? Also, do you notice any kernel errors in syslog related to RPS? Could you share the output of cat /proc/net/softnet_stat in the good and bad cases? And do you see any ksoftirqd kernel threads consuming a lot of CPU during ARP packet processing?
Will check with the test owner.
After drops have been seen:
I haven't been able to see such an increase in CPU consumption. However, the ARP learning process is quite fast, so I'm not sure regular tools would catch a spike that disappears as soon as all the needed ARPs have been learned.
Hi @prabhataravind, regarding the question you raised: by default, CoPP limits traffic to the CPU to 1000 fps. We are increasing the default limit to measure the real ARP learning rate, targeting 10K ARPs per second. Without the CoPP update we would be limited to 1K ARP/sec. The test is about ARP learning rate (performance).
@DavidZagury I don't see drops in any cores based on the output you shared above (the second column indicates the per-core drop).
@dprital OK, but in production use cases we will never encounter this situation unless someone changes the default policing rate for ARP. Could you please repeat the test with the default rate and check whether there are drops when learning 1K ARP/sec?
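For anyone comparing the good and bad cases, here is a minimal sketch for decoding softnet_stat. The counters are hexadecimal, one row per CPU; the first three columns are packets processed, packets dropped, and time_squeeze. The helper name softnet_drops is hypothetical, and the sketch assumes the rows appear in CPU order.

```shell
# Decode the first three columns of a softnet_stat-format file.
# Column 1 = processed, column 2 = dropped, column 3 = time_squeeze (all hex).
# Assumption: row N corresponds to CPU N (row order, not a CPU id field).
softnet_drops() {
    file="${1:-/proc/net/softnet_stat}"
    cpu=0
    while read -r processed dropped squeezed _rest; do
        echo "cpu$cpu processed=$((0x$processed)) dropped=$((0x$dropped)) time_squeeze=$((0x$squeezed))"
        cpu=$((cpu + 1))
    done < "$file"
}

# Example (on the switch): snapshot before and after the traffic run, then diff.
#   softnet_drops > /tmp/softnet_before.txt
#   ... run the ARP learning test ...
#   softnet_drops > /tmp/softnet_after.txt
#   diff /tmp/softnet_before.txt /tmp/softnet_after.txt
```

Because the counters are cumulative since boot, only the difference between the two snapshots says anything about the test run itself.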
A recent PR enabled Receive Packet Steering (RPS) by default: #20211
It caused a degradation that can be seen between the 202405 and 202411 branches.
When this PR is reverted, or RPS is disabled by removing the Linux configuration the PR set (there is no user-friendly way to do so), the test passes.
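As a rough illustration of what disabling RPS by hand involves: the kernel exposes the RPS CPU mask per RX queue in /sys/class/net/<dev>/queues/rx-<n>/rps_cpus, and a mask of 0 turns RPS off for that queue. The sketch below assumes that this mask is the setting in question; the exact configuration applied by #20211 is not shown in this issue, and the helper names rps_show and rps_disable are hypothetical.

```shell
# List the RPS CPU mask of every RX queue under a sysfs-style net directory.
rps_show() {
    root="${1:-/sys/class/net}"
    for f in "$root"/*/queues/rx-*/rps_cpus; do
        [ -e "$f" ] || continue
        echo "$f $(cat "$f")"
    done
}

# Disable RPS on one interface by writing a zero mask to each of its RX queues.
# Needs root on a real system; the change is not persistent, and anything that
# reapplies the mask (for example at boot) would undo it.
rps_disable() {
    dev="$1"
    root="${2:-/sys/class/net}"
    for f in "$root/$dev"/queues/rx-*/rps_cpus; do
        [ -e "$f" ] || continue
        echo 0 > "$f"
    done
}

# Example (on the switch, as root):
#   rps_show | grep -v ' 0*$'      # queues with a non-zero mask
#   rps_disable Ethernet160
```

The optional second argument only exists so the helpers can be tried against a scratch directory before touching the real sysfs tree.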
Steps to reproduce
Configure L2 and L3 interfaces
config vlan add 363
config vlan member add -u 363 Ethernet176
config interface ip add Vlan363 101.1.0.1/24
config interface ip add Ethernet160 100.1.0.1/16
config save -y
Modify the CoPP policy to raise rate limits
sed -i '/"default": {/,/}/{s/"cir": "[0-9]*"/"cir": "60000"/; s/"cbs": "[0-9]*"/"cbs": "60000"/}' /usr/share/sonic/templates/copp_cfg.j2
sed -i '/"queue4_group2": {/,/}/{s/"cir": "[0-9]*"/"cir": "60000"/; s/"cbs": "[0-9]*"/"cbs": "60000"/}' /usr/share/sonic/templates/copp_cfg.j2
reboot
Raise the neighbor table gc_thresh values
sysctl -w net.ipv4.neigh.default.gc_thresh1=65535
sysctl -w net.ipv4.neigh.default.gc_thresh2=65535
sysctl -w net.ipv4.neigh.default.gc_thresh3=65535
Clear the ARP table, then start traffic at 10000 pps with 10000 end hosts
sonic-clear arp
Send bidirectional L3 traffic (simulating a single host pinging 10k hosts on the other side)
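Before starting traffic, the setup steps above can be sanity-checked with something like the following sketch. It prints the cir/cbs values of one CoPP group from the template and confirms the three gc_thresh values are large enough for the host count. The helper names copp_rates and neigh_gc_check are hypothetical; the default paths are the ones used in the steps, and the group matching assumes the same "name": { layout that the sed commands rely on.

```shell
# Print the cir/cbs entries of one CoPP group from the template file.
copp_rates() {
    group="$1"
    file="${2:-/usr/share/sonic/templates/copp_cfg.j2}"
    # Select from the group's opening line to the first closing brace,
    # then keep only the cir and cbs lines, stripped of spaces and commas.
    sed -n "/\"$group\": {/,/}/p" "$file" | grep -E '"(cir|cbs)"' | tr -d ' ,'
}

# Check that gc_thresh1..3 are all at least the wanted number of neighbors.
neigh_gc_check() {
    want="$1"
    dir="${2:-/proc/sys/net/ipv4/neigh/default}"
    for n in 1 2 3; do
        got=$(cat "$dir/gc_thresh$n")
        if [ "$got" -lt "$want" ]; then
            echo "gc_thresh$n=$got is below $want"
            return 1
        fi
    done
    echo "gc_thresh1..3 are all >= $want"
}

# Example (on the switch):
#   copp_rates default
#   copp_rates queue4_group2
#   neigh_gc_check 10000
```

Values set with sysctl -w do not survive a reboot, so the gc_thresh check is worth repeating if the switch is restarted between the setup and the traffic run.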
Observed behavior
A small percentage of traffic (<1%) is lost.
Expected behavior
There should not be any losses.