CLI / redis DB Hangs Upon ARP Cache's Hitting the Default Maximum Limit, 1024 #2189
Comments
When the issue hit, the CPU was tied up by the syncd, python3.6 and redis-serv processes, and the kernel logged messages such as: Nov 3 18:20:21.521369 sonic WARNING kernel: [ 3828.702948] net_ratelimit: 2 callbacks suppressed
This is a kernel setting issue: the kernel invokes the neighbor garbage-collection (gc) process constantly once there are more neighbor entries than gc_thresh3. If the system needs to hold more than 1024 neighbor entries, set the kernel parameter to a suitable value, e.g. 4096: sysctl -w net.ipv4.neigh.default.gc_thresh3=4096. You may also want to tune gc_thresh1 and gc_thresh2 accordingly to get settings optimized for your case. See https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt
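As a rough illustration of tuning all three thresholds together, the sketch below prints sysctl-style settings for a chosen gc_thresh3. The function name `neigh_gc_conf` is hypothetical. The 1/8 and 1/2 ratios are an assumption copied from the default values reported in this issue (128/512/1024), not a recommendation from the kernel documentation.

```shell
#!/bin/sh
# Sketch: print sysctl.conf-style lines for the IPv4 neighbor-table GC thresholds.
# Assumption (not from the thread): gc_thresh1 and gc_thresh2 keep the same
# 1/8 and 1/2 ratios to gc_thresh3 as the defaults shown in this issue.
neigh_gc_conf() {
    thresh3=$1
    echo "net.ipv4.neigh.default.gc_thresh1 = $((thresh3 / 8))"
    echo "net.ipv4.neigh.default.gc_thresh2 = $((thresh3 / 2))"
    echo "net.ipv4.neigh.default.gc_thresh3 = ${thresh3}"
}

# Preview the settings for a 4096-entry limit; nothing is changed on the system.
neigh_gc_conf 4096
```

To make such settings persistent, the output could be written as root to a file under /etc/sysctl.d/ (the file name is up to you) and loaded with `sysctl --system`. Whether these ratios suit a given deployment would need testing.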
The ARP table is managed directly by the kernel; SONiC is a downstream consumer and imposes no obvious constraint of its own (other than memory, etc.). To support your ARP table size requirement, you would need to tune the kernel settings.
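Since the limit lives in the kernel, one way to see how close a system is to it is to compare the current neighbor count with gc_thresh3. The helper below is a minimal sketch. The name `neigh_usage` and the 80% warning level are illustrative assumptions, and the count is only an approximation of what the kernel tracks internally.

```shell
#!/bin/sh
# neigh_usage COUNT LIMIT: report how full the neighbor table is.
# The 80% warning level is an arbitrary, illustrative choice.
neigh_usage() {
    count=$1
    limit=$2
    pct=$((count * 100 / limit))
    if [ "$count" -ge "$limit" ]; then
        echo "FULL: ${count}/${limit} entries (${pct}%)"
    elif [ "$pct" -ge 80 ]; then
        echo "WARN: ${count}/${limit} entries (${pct}%)"
    else
        echo "OK: ${count}/${limit} entries (${pct}%)"
    fi
}

# Values taken from this issue: 1024 entries against the default gc_thresh3.
neigh_usage 1024 1024

# On a live system (assumes iproute2 and procfs), the inputs could come from:
#   neigh_usage $(ip -4 neigh show | wc -l) \
#               $(cat /proc/sys/net/ipv4/neigh/default/gc_thresh3)
```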
Description
The CLI and redis DB hang when the ARP cache reaches the default maximum limit of 1024 entries on the latest Jenkins-built image, n750 (Tue Oct 23 13:30:19 UTC 2018).
Steps to reproduce the issue:
show platform summary
show ver
show CLI commands
Describe the results you received:
root@sonic:/home/admin# show platform summary
Platform: x86_64-accton_as5712_54x-r0
HwSKU: Accton-AS5712-54X
ASIC: broadcom
root@sonic:/home/admin# show ver
SONiC Software Version: SONiC.HEAD.750-dirty-20181023.092041
Distribution: Debian 9.5
Kernel: 4.9.0-7-amd64
Build commit: 709cd5a
Build date: Tue Oct 23 13:30:19 UTC 2018
Built by: johnar@jenkins-worker-4
root@sonic:/home/admin# sysctl net.ipv4.neigh.default.gc_thresh1
net.ipv4.neigh.default.gc_thresh1 = 128
root@sonic:/home/admin# sysctl net.ipv4.neigh.default.gc_thresh2
net.ipv4.neigh.default.gc_thresh2 = 512
root@sonic:/home/admin# sysctl net.ipv4.neigh.default.gc_thresh3
net.ipv4.neigh.default.gc_thresh3 = 1024
root@sonic:/home/admin# show interfaces status
Ethernet48 97 N/A 9100 tenGigE48 up up
root@sonic:/home/admin# arp -v -i Ethernet48 -s 192.168.1.1 00:2E:00:FF:0:0
[SKIPed]
ADD_COUNT=1024
arp -v -i Ethernet48 -s 192.168.5.5 00:2E:00:FF:4:0
arp: SIOCSARP()
SIOCSARP: No buffer space available
[SKIPed]
root@sonic:/home/admin# arp -a | grep "Ethernet48" -c
1024
root@sonic:/home/admin# show mac
^C
Traceback (most recent call last):
File "/usr/bin/show", line 9, in <module>
load_entry_point('sonic-utilities==1.2', 'console_scripts', 'show')()
File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 561, in load_entry_point
return get_distribution(dist).load_entry_point(group, name)
File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2631, in load_entry_point
return ep.load()
File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2291, in load
return self.resolve()
File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 2297, in resolve
module = __import__(self.module_name, fromlist=['__name__'], level=0)
File "/usr/lib/python2.7/dist-packages/show/main.py", line 193, in <module>
iface_alias_converter = InterfaceAliasConverter()
File "/usr/lib/python2.7/dist-packages/show/main.py", line 54, in __init__
self.port_dict = json.loads(p.stdout.read())
KeyboardInterrupt
Traceback (most recent call last):
File "/usr/local/bin/sonic-cfggen", line 263, in <module>
main()
File "/usr/local/bin/sonic-cfggen", line 217, in main
configdb.connect()
File "/usr/local/lib/python2.7/dist-packages/swsssdk/configdb.py", line 57, in connect
SonicV2Connector.connect(self, self.CONFIG_DB, retry_on)
File "/usr/local/lib/python2.7/dist-packages/swsssdk/interface.py", line 191, in connect
self._onetime_connect(db_name)
File "/usr/local/lib/python2.7/dist-packages/swsssdk/interface.py", line 204, in _onetime_connect
client.config_set('notify-keyspace-events', self.KEYSPACE_EVENTS)
File "/usr/local/lib/python2.7/dist-packages/redis/client.py", line 719, in config_set
return self.execute_command('CONFIG SET', name, value)
File "/usr/local/lib/python2.7/dist-packages/redis/client.py", line 667, in execute_command
connection.send_command(*args)
File "/usr/local/lib/python2.7/dist-packages/redis/connection.py", line 610, in send_command
self.send_packed_command(self.pack_command(*args))
File "/usr/local/lib/python2.7/dist-packages/redis/connection.py", line 585, in send_packed_command
self.connect()
File "/usr/local/lib/python2.7/dist-packages/redis/connection.py", line 484, in connect
sock = self._connect()
File "/usr/local/lib/python2.7/dist-packages/redis/connection.py", line 529, in _connect
sock.connect(socket_address)
File "/usr/lib/python2.7/socket.py", line 228, in meth
return getattr(self._sock,name)(*args)
KeyboardInterrupt
root@sonic:/home/admin# generate_dump
^C^C^C^C^C^C^C
root@sonic:/home/admin# echo "no response for 10 minutes"
no response for 10 minutes
root@sonic:/home/admin# redis-cli -n 0
^C^C
root@sonic:/home/admin# redis-cli -n 1
^C
root@sonic:/home/admin# redis-cli -n 3
^C
root@sonic:/home/admin# redis-cli -n 4
^C
root@sonic:/home/admin# redis-cli -n 5
^C
root@sonic:/home/admin# redis-cli -n 6
^C
root@sonic:/home/admin# echo " no response for 30 seconds of waiting"
no response for 30 seconds of waiting
Describe the results you expected:
Although the ARP cache maximum limit is hit, the CLI and redis DB should NOT be hung.
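A check for this expectation could bound each probe with a time limit instead of waiting for a manual Ctrl-C. The sketch below is one possible shape for that. The function name `probe` is hypothetical, and it assumes GNU coreutils `timeout`, which exits with status 124 when the limit expires.

```shell
#!/bin/sh
# probe SECONDS CMD...: run CMD under a time limit and report the outcome.
# Assumes GNU coreutils `timeout`, which exits with status 124 on expiry.
probe() {
    limit=$1
    shift
    timeout "$limit" "$@" >/dev/null 2>&1
    status=$?
    case $status in
        0)   echo "ok: $*" ;;
        124) echo "timed out after ${limit}s: $*" ;;
        *)   echo "failed (exit $status): $*" ;;
    esac
}

# Self-contained demonstration, with sleep standing in for a hung command.
probe 1 sleep 3
probe 1 true

# Against the databases probed in the log above, this might be used as:
#   for db in 0 1 3 4 5 6; do probe 5 redis-cli -n "$db" ping; done
```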
Additional information you deem important (e.g. issue happens only occasionally):
root@sonic:/home/admin# show ver
SONiC Software Version: SONiC.HEAD.750-dirty-20181023.092041
Distribution: Debian 9.5
Kernel: 4.9.0-7-amd64
Build commit: 709cd5a
Build date: Tue Oct 23 13:30:19 UTC 2018
Built by: johnar@jenkins-worker-4
No generate_dump debug file is available because the system was hung, but we have uploaded our debug console log as a ZIP file here:
100_SONic-V2-AS5712-j_n750_20181023_709cd5a-ARP_1024-Debug-002.zip