-
Notifications
You must be signed in to change notification settings - Fork 543
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[teamsyncd][teammgrd] Graceful exit after receiving SIGTERM #1407
Conversation
Signed-off-by: Sabareesh Kumar Anandan <sanandan@marvell.com>
Found this comment explaining why it wasn't done Since we need to have Signal Handlers in both the teammgrd and teamsyncd and both have resources to be cleaned up which are "interdependent", I found it is right to let the process continue and get killed by SIGKILL later. If we add explicit process exits, since they are exiting on their on pace, on testing I found some of the PortChannel interfaces remaining in the kernel and not getting cleaned. @Sabareesh-Kumar-Anandan Can you please check that after your changes all PortChannel interfaces are being removed? |
Also as suggestion
I'd better to write it as
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please check my comments?
Thanks @pavel-shirshov -- yes that was what my observations during testing earlier. @Sabareesh-Kumar-Anandan You mention you still get the errors mentioned in PR comment -- after config reload ? |
Yes. The errors mentioned in the PR are still seen after config reload. |
@Sabareesh-Kumar-Anandan Can you please change your code |
Hi @Sabareesh-Kumar-Anandan, I still don't get when you tell "all portchannel interfaces in kernel are cleaned up correctly" -- as we are calling TeamMgr::cleanTeamProcesses() and sync.cleanTeamSync() in respective signal handlers. It should be cleaned, unless even while doing a config reload you are creating portchannels ! Share your logs and steps you did....and the image you use We would need test with scale scenario as well. After #1159, we stopped seeing all issues you mentioned with teamsyncd/teammgrd . or any logs in our production environment. Need to make sure we don't introduce any issues again ! |
I did a quick check in one of the devices .. I see the interfaces getting cleaned in kernel and recreated on config reload. Need more info before proceeding !
|
@judyjoseph I am using below commit Branch - 202006 Steps:
Logs:Feb 14 10:21:00.670764 sonic INFO systemd[1]: Stopping TEAMD container... In the above logs, teammgrd doesnt get SIGTERM. After 10sec teammgrd is killed by SIGKILL. so teamgrd cleanup is not happening. Logs with this PR:Feb 14 10:18:34.675285 sonic INFO systemd[1]: Stopping TEAMD container... |
@Sabareesh-Kumar-Anandan , I too find this behavior with the master branch. Looks like there is a change in behavior with the commit sonic-net/sonic-buildimage@7158ccd, where in the process start/exit of teammgrd is sequenced after teamsyncd. So please go ahead with this fix after you take care of Pavel's comments. The branches 201811 and 201911 ( till 2 weeks back ) behaves correctly ( no behavior change as seen in master branch) with the teammgrd also getting the SIGTERM signal and cleaning up the Portchannel kernel resources correctly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
* [teamsyncd][teammgrd] Graceful exit after receiving SIGTERM Signed-off-by: Sabareesh Kumar Anandan <sanandan@marvell.com> * Update teammgrd.cpp * Update teamsyncd.cpp Co-authored-by: pavel-shirshov <pavelsh@microsoft.com>
Test make sure cleanup happens of Port-channel Kernel devices. This test case track the fixes done by PR: sonic-net/sonic-swss#1407 sonic-net/sonic-swss#1159 Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
* Added the test case for Port Channel cleanup. Test make sure cleanup happens of Port-channel Kernel devices. This test case track the fixes done by PR: sonic-net/sonic-swss#1407 sonic-net/sonic-swss#1159 Signed-off-by: Abhishek Dosi <abdosi@microsoft.com> * Address Review Comments Signed-off-by: Abhishek Dosi <abdosi@microsoft.com>
…nic-net#1407) This reverts commit b10622e. **What I did** revert changes to call sdkdump and replace with old call to mstdump **How I did it** reverting a previous commit [Mellanox] Add FW dump with new SAI implementation and remove mst dump sonic-net#1338 **How to verify it** run techsupport
Signed-off-by: Sabareesh Kumar Anandan sanandan@marvell.com
What I did
When SIGTERM is received, gracefully exist teamsyncd and teammgrd after cleaning up teamd processes and resources.
Why I did it
Below errors are after config reload
How I verified it
I did multiple config reloads and docker stop teamd.
Verified all portchannel intf are cleaned up in kernel and all teamd processes exists cleanly.
Details if related