Fix syncd autorestart sequence #12460

Open
wants to merge 1 commit into
base: master

oleksandrx-kolomeiets (Contributor) commented Oct 20, 2022

Why I did it

On startup, the Swss service flushes the Redis DB, then starts the Swss container and the Syncd service.
The service then waits until either the Swss or the Syncd container stops.
During teardown, it stops the Swss container (if it is running) and the Syncd service.

When autorestart is enabled and the Syncd process is killed (for example, with docker exec -it syncd pkill syncd), the following sequence of events occurs:

  1. The Syncd container and service stop
  2. The Swss service stops the Swss container
  3. Systemd restarts the Syncd service; Syncd fails because the Redis DB has not been flushed
  4. The Swss service stops the Syncd service, flushes the Redis DB, then starts the Swss container and the Syncd service

Event 3 should not happen.

How I did it

Increased the time to sleep before restarting the Syncd service to 30 seconds, following the example of these services:

  • database
  • teamd
  • macsec
  • swss
  • lldp
  • bgp
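
In systemd terms, the "time to sleep before restarting" is the RestartSec= setting, which systemctl exposes as the RestartUSec property. A minimal sketch for comparing the configured delay across services is shown here. It assumes the unit names match the service names listed above (for example syncd.service); on multi-instance systems the actual unit names may differ.

```shell
# Print the configured restart delay for each given service.
# Assumption: each argument is a unit name without the ".service" suffix.
show_restart_delay() {
    for unit in "$@"; do
        # RestartUSec is the systemd property that backs RestartSec=
        printf '%s: %s\n' "$unit" \
            "$(systemctl show "$unit.service" -p RestartUSec --value)"
    done
}
```

For example, show_restart_delay syncd swss database prints one line per service, which makes it easy to see whether Syncd uses the same delay as the others.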

How to verify it

Kill the Syncd process with docker exec -it syncd pkill syncd.
Observe the containers' statuses with watch docker ps -a.
Ensure the Syncd container restarts only once.
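
As an alternative to watching docker ps by eye, the number of container starts can be counted from the Docker event stream. This is only a sketch: it assumes the container is named syncd, and the helper name count_starts and the 120-second observation window are illustrative choices, not part of this PR.

```shell
# Count container start events read from stdin (one action name per line).
count_starts() {
    # grep -c exits non-zero when the count is 0, so mask the exit status
    grep -c '^start$' || true
}

# Usage sketch (run before killing the Syncd process in another terminal):
#   timeout 120 docker events --filter container=syncd \
#       --filter event=start --format '{{.Action}}' | count_starts
```

With the fix applied, the expected count after a single pkill is 1; a count of 2 would indicate the extra restart described in event 3.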

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205

Description for the changelog

Fix syncd autorestart sequence.

Ensure you add a label/tag for the feature raised. Example: PR#2174 in the sonic-utilities repo, where the Generic Config and Update feature has been labelled GCU.

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

Signed-off-by: Oleksandr Kolomeiets <oleksandrx.kolomeiets@intel.com>

@oleksandrx-kolomeiets (Contributor, Author) commented:

@lguohan, please review the changes.
Thanks!

@oleksandrx-kolomeiets (Contributor, Author) commented:

@lguohan,
This sequence diagram helps visualize the issue.
Systemd restarts Syncd one extra time in the middle of the Swss teardown.
[Image: sequence diagram]

@oleksandrx-kolomeiets (Contributor, Author) commented:

@stepanblyschak, @lguohan,
this PR prevents Syncd from failing during autorestart; please review.
Thanks.

lguohan (Collaborator) commented Feb 21, 2023

@prsunny , can you review this?
