[tun_pkt]: Wait for AsyncSniffer to init fully #10346

Merged
merged 1 commit into from
Mar 30, 2022

@theasianpianist theasianpianist commented Mar 24, 2022

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>

Why I did it

Tunnel packet handler can crash at system startup:

Mar 19 13:11:07.240465 str2-8102-03 INFO swss#/supervisord: tunnel_packet_handler Traceback (most recent call last):
Mar 19 13:11:07.240465 str2-8102-03 INFO swss#/supervisord: tunnel_packet_handler   File "/usr/local/lib/python3.7/dist-packages/scapy/sendrecv.py", line 1017, in stop
Mar 19 13:11:07.240465 str2-8102-03 INFO swss#/supervisord: tunnel_packet_handler     self.stop_cb()
Mar 19 13:11:07.240529 str2-8102-03 INFO swss#/supervisord: tunnel_packet_handler AttributeError: 'AsyncSniffer' object has no attribute 'stop_cb'
Mar 19 13:11:07.240529 str2-8102-03 INFO swss#/supervisord: tunnel_packet_handler 
Mar 19 13:11:07.240529 str2-8102-03 INFO swss#/supervisord: tunnel_packet_handler During handling of the above exception, another exception occurred:
Mar 19 13:11:07.240529 str2-8102-03 INFO swss#/supervisord: tunnel_packet_handler 
Mar 19 13:11:07.240560 str2-8102-03 INFO swss#/supervisord: tunnel_packet_handler Traceback (most recent call last):
Mar 19 13:11:07.240560 str2-8102-03 INFO swss#/supervisord: tunnel_packet_handler   File "/usr/bin/tunnel_packet_handler.py", line 349, in <module>
Mar 19 13:11:07.240560 str2-8102-03 INFO swss#/supervisord: tunnel_packet_handler     main()
Mar 19 13:11:07.240587 str2-8102-03 INFO swss#/supervisord: tunnel_packet_handler   File "/usr/bin/tunnel_packet_handler.py", line 345, in main
Mar 19 13:11:07.240587 str2-8102-03 INFO swss#/supervisord: tunnel_packet_handler     handler.run()
Mar 19 13:11:07.240632 str2-8102-03 INFO swss#/supervisord: tunnel_packet_handler   File "/usr/bin/tunnel_packet_handler.py", line 339, in run
Mar 19 13:11:07.240632 str2-8102-03 INFO swss#/supervisord: tunnel_packet_handler     self.listen_for_tunnel_pkts()
Mar 19 13:11:07.240652 str2-8102-03 INFO swss#/supervisord: tunnel_packet_handler   File "/usr/bin/tunnel_packet_handler.py", line 322, in listen_for_tunnel_pkts
Mar 19 13:11:07.240667 str2-8102-03 INFO swss#/supervisord: tunnel_packet_handler     sniffer.stop()
Mar 19 13:11:07.240680 str2-8102-03 INFO swss#/supervisord: tunnel_packet_handler   File "/usr/local/lib/python3.7/dist-packages/scapy/sendrecv.py", line 1020, in stop
Mar 19 13:11:07.240680 str2-8102-03 INFO swss#/supervisord: tunnel_packet_handler     "Unsupported (offline or unsupported socket)"
Mar 19 13:11:07.240718 str2-8102-03 INFO swss#/supervisord: tunnel_packet_handler scapy.error.Scapy_Exception: Unsupported (offline or unsupported socket)
Mar 19 13:11:07.310724 str2-8102-03 INFO swss#supervisord 2022-03-19 13:11:07,310 INFO exited: tunnel_packet_handler (exit status 1; not expected)

This is due to a race condition between netlink messages sent by the kernel and the AsyncSniffer object finishing its initialization. A netlink message can arrive and trigger a sniffer restart before the sniffer has set its self.stop_cb attribute, because that attribute is created during sniffer startup rather than when the sniffer object is constructed. When this happens, tunnel_packet_handler attempts to stop the sniffer, and the operation fails because self.stop_cb does not exist yet.
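
For illustration only, the race can be reproduced without scapy. The sketch below uses a hypothetical StandInSniffer class (not scapy's AsyncSniffer, and not code from this PR) that, like the real sniffer, creates stop_cb inside its worker thread rather than in the constructor. The 0.2 second startup delay is an assumed value chosen to make the window easy to hit.

```python
import threading
import time


class StandInSniffer:
    """Hypothetical stand-in for a sniffer; this is NOT scapy's AsyncSniffer."""

    def __init__(self, init_delay=0.2):
        self.init_delay = init_delay  # assumed startup cost, in seconds
        self.running = False
        self.thread = None
        # stop_cb is deliberately not created here, mirroring the bug.

    def _run(self):
        time.sleep(self.init_delay)  # simulated socket setup
        self._stop_event = threading.Event()
        self.stop_cb = self._stop_event.set  # the attribute appears only now
        self._stop_event.wait()  # "sniff" until asked to stop

    def start(self):
        self.running = True
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def stop(self):
        try:
            self.stop_cb()
        except AttributeError:
            raise RuntimeError("sniffer not fully initialized")
        self.thread.join()
        self.running = False


def demo_race():
    """Call stop() immediately after start(), before stop_cb exists."""
    sniffer = StandInSniffer()
    sniffer.start()
    try:
        sniffer.stop()
        return "stopped cleanly"
    except RuntimeError as exc:
        return "crashed: {}".format(exc)
```

Calling demo_race() takes the failing path because stop() runs well inside the simulated startup window, which is the same ordering a netlink-triggered restart can produce at boot.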

How I did it

After starting the sniffer, block until its stop_cb attribute has been created.
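
A possible variation on this wait, not what this PR implements, is to bound it with a timeout so that a sniffer that never initializes cannot hang the handler forever. In the sketch below, the helper name wait_for_attr and the 5 second default timeout are assumptions made for illustration; only the 0.1 second polling interval comes from this change.

```python
import time


def wait_for_attr(obj, name, interval=0.1, timeout=5.0):
    """Poll until obj has attribute `name`.

    Returns True once the attribute exists, or False if `timeout`
    seconds pass first. Hypothetical helper, not part of scapy.
    """
    deadline = time.monotonic() + timeout
    while not hasattr(obj, name):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True
```

With a helper like this, the caller could log an error or retry when wait_for_attr(sniffer, 'stop_cb') returns False instead of blocking indefinitely. The trade-off is one more failure path to handle.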

How to verify it

Run sudo systemctl restart swss and verify that the tunnel packet handler does not crash.

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

Signed-off-by: Lawrence Lee <lawlee@microsoft.com>
self.sniffer.start()

while not hasattr(self.sniffer, 'stop_cb'):
    time.sleep(0.1)
Contributor

I think we can use a 1-second wait here, as 0.1 may be too aggressive.

Contributor Author

Did some testing; it looks like it takes about 0.2 seconds to initialize. Might it be OK to keep 0.1 seconds so we can start the service ASAP?

Contributor

ok

@theasianpianist
Contributor Author

/Azp run sonic.buildimage

@azure-pipelines

No pipelines are associated with this pull request.

@theasianpianist
Contributor Author

/Azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@theasianpianist
Contributor Author

/Azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@prsunny prsunny merged commit b31df59 into sonic-net:master Mar 30, 2022
qiluo-msft pushed a commit that referenced this pull request Mar 30, 2022
Fix for Tunnel packet handler can crash at system startup 
Signed-off-by: Lawrence Lee <lawlee@microsoft.com>