[logrotate] Check orchagent status before sending SIGHUP #13924

Open: wants to merge 1 commit into master

chiourung
Contributor

Sending SIGHUP before orchagent has registered its SIGHUP handler kills orchagent, because the default action of SIGHUP is to terminate the process. Before sending SIGHUP, logrotate must therefore wait until orchagent has been running for 10 seconds.
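The failure mode rests on standard POSIX behaviour: until a process installs a handler, SIGHUP keeps its default action, which terminates the process. The short Python demo below is an illustration only, not SONiC code; the child script and the function name `send_sighup_to_child` are hypothetical stand-ins for orchagent before and after it registers its handler.

```python
import signal
import subprocess
import sys

# Child program; {setup} decides whether a SIGHUP handler exists yet.
CHILD_TEMPLATE = """
import signal, sys, time
{setup}
print("ready", flush=True)  # tell the parent that setup is finished
time.sleep(30)              # stand-in for the process's main loop
"""


def send_sighup_to_child(handler_installed: bool) -> int:
    """Start a child, send it SIGHUP, and return its exit status."""
    if handler_installed:
        # Stand-in for a real SIGHUP handler; exit 0 so the demo ends quickly.
        setup = "signal.signal(signal.SIGHUP, lambda signum, frame: sys.exit(0))"
    else:
        # Before a handler is registered, SIGHUP has its default action: terminate.
        setup = "signal.signal(signal.SIGHUP, signal.SIG_DFL)"
    with subprocess.Popen(
        [sys.executable, "-c", CHILD_TEMPLATE.format(setup=setup)],
        stdout=subprocess.PIPE,
        text=True,
    ) as proc:
        proc.stdout.readline()  # wait until the child has finished its setup
        proc.send_signal(signal.SIGHUP)
        # A negative return code means "killed by that signal number".
        return proc.wait(timeout=10)
```

On Linux, `send_sighup_to_child(False)` returns `-signal.SIGHUP` (killed by the signal, like "terminated by SIGHUP" in the supervisord log), while `send_sighup_to_child(True)` returns 0.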

Why I did it

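Fixes an issue where logrotate kills orchagent. In the log below, logrotate sends SIGHUP to orchagent shortly after supervisord reports it RUNNING; orchagent is terminated by SIGHUP, and supervisor-proc-exit-listener then shuts down the swss supervisor.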

Jan 26 04:10:05.340355 as5835-54x-3 INFO swss#supervisord 2023-01-26 04:10:05,339 INFO success: orchagent entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
Jan 26 04:10:05.665174 as5835-54x-3 INFO logrotate: Sending SIGHUP to OA log_file_name: /var/log/swss/swss.rec
Jan 26 04:10:05.760347 as5835-54x-3 INFO swss#supervisord 2023-01-26 04:10:05,753 INFO exited: orchagent (terminated by SIGHUP; not expected)
Jan 26 04:10:05.791343 as5835-54x-3 INFO swss#/supervisor-proc-exit-listener: Process 'orchagent' exited unexpectedly. Terminating supervisor 'swss'
Jan 26 04:10:05.827385 as5835-54x-3 INFO swss#supervisord 2023-01-26 04:10:05,818 WARN received SIGTERM indicating exit request
Jan 26 04:10:05.827971 as5835-54x-3 INFO swss#supervisord 2023-01-26 04:10:05,827 INFO waiting for dependent-startup, supervisor-proc-exit-listener, rsyslogd, portsyncd to die
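The syslog timestamps make the race concrete. As an illustration only, the snippet below copies three timestamps from the log lines above and computes the gaps between them. Note that, per the supervisord line itself, RUNNING only means orchagent has stayed up for more than startsecs (1 second), so the SIGHUP arrived well inside the 10-second window this PR proposes.

```python
from datetime import datetime

# Syslog timestamps copied from the log lines above (date part omitted).
RUNNING = "04:10:05.340355"      # supervisord: orchagent entered RUNNING state
SIGHUP_SENT = "04:10:05.665174"  # logrotate: Sending SIGHUP to OA
EXITED = "04:10:05.760347"       # supervisord: orchagent terminated by SIGHUP


def seconds_between(earlier: str, later: str) -> float:
    """Difference in seconds between two HH:MM:SS.ffffff timestamps."""
    fmt = "%H:%M:%S.%f"
    return (datetime.strptime(later, fmt) - datetime.strptime(earlier, fmt)).total_seconds()


print(f"{seconds_between(RUNNING, SIGHUP_SENT):.3f} s")  # 0.325 s: RUNNING -> SIGHUP
print(f"{seconds_between(SIGHUP_SENT, EXITED):.3f} s")   # 0.095 s: SIGHUP -> exit
```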

How I did it

Before sending SIGHUP, logrotate now waits until orchagent has been running for 10 seconds.
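The diff itself is not shown on this page. As a rough sketch of the idea only, assuming Linux procfs, an uptime check could read the process start time from `/proc/<pid>/stat` (field 22, in clock ticks since boot) and compare it with `/proc/uptime`. All helper names are hypothetical, and the real change presumably lives in the logrotate configuration's shell logic rather than in Python.

```python
import os
import signal
import time

# Threshold taken from the PR description (a heuristic, not a guarantee).
MIN_UPTIME_SECONDS = 10.0


def start_ticks_from_stat(stat_line: str) -> int:
    """Return field 22 (starttime, clock ticks since boot) of a /proc/<pid>/stat line."""
    # Field 2 (comm) is wrapped in parentheses and may contain spaces or ')',
    # so split after the LAST ')' and index the remaining fields from there.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field 22 sits at index 22 - 3 = 19.
    return int(rest[19])


def uptime_of(pid: int) -> float:
    """Seconds that process `pid` has been running (Linux procfs only)."""
    with open(f"/proc/{pid}/stat") as f:
        start_ticks = start_ticks_from_stat(f.read())
    with open("/proc/uptime") as f:
        system_uptime = float(f.read().split()[0])
    return system_uptime - start_ticks / os.sysconf("SC_CLK_TCK")


def seconds_to_wait(process_uptime: float, minimum: float = MIN_UPTIME_SECONDS) -> float:
    """How much longer to wait before SIGHUP; 0.0 once the threshold is reached."""
    return max(0.0, minimum - process_uptime)


def sighup_when_ready(pid: int) -> None:
    """Hypothetical caller: sleep out the remaining window, then send SIGHUP."""
    time.sleep(seconds_to_wait(uptime_of(pid)))
    os.kill(pid, signal.SIGHUP)
```

A fixed threshold only makes the race unlikely; it does not prove that the handler has been registered, which is the concern raised in review.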

How to verify it

  1. Increase the size of /var/log/swss/swss.rec beyond the rotation size.
  2. Run "systemctl restart swss".
  3. Use "pgrep -x orchagent" to check whether orchagent has started.
  4. After orchagent has started, run "/usr/sbin/logrotate /etc/logrotate.conf" and check that orchagent is still running.
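The verification steps can be scripted roughly as follows. This is a hypothetical helper, not part of the PR: it assumes `pgrep`, `systemctl` and `/usr/sbin/logrotate` are available on the device and that step 1 (growing swss.rec past the rotation size) has already been done. It takes an injectable command runner so the logic can be exercised without a switch, and it treats "the same orchagent PID is still present after rotation" as success.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence

# A runner takes an argv list and returns a CompletedProcess (injectable for testing).
Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def default_runner(argv: Sequence[str]) -> subprocess.CompletedProcess:
    return subprocess.run(list(argv), capture_output=True, text=True)


def orchagent_pid(run: Runner) -> Optional[int]:
    """PID reported by `pgrep -x orchagent`, or None if it is not running."""
    result = run(["pgrep", "-x", "orchagent"])
    if result.returncode != 0 or not result.stdout.strip():
        return None
    return int(result.stdout.split()[0])


def verify(run: Runner = default_runner,
           max_polls: int = 120,
           poll_interval: float = 1.0,
           settle: float = 5.0,
           sleep: Callable[[float], None] = time.sleep) -> bool:
    """Steps 2-4; returns True if the same orchagent process survives logrotate."""
    run(["systemctl", "restart", "swss"])                 # step 2
    pid = orchagent_pid(run)                              # step 3: poll until started
    polls = 0
    while pid is None:
        if polls >= max_polls:
            raise TimeoutError("orchagent did not start")
        sleep(poll_interval)
        polls += 1
        pid = orchagent_pid(run)
    run(["/usr/sbin/logrotate", "/etc/logrotate.conf"])   # step 4
    sleep(settle)  # give supervisord time to react if orchagent was killed
    return orchagent_pid(run) == pid
```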

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211

Description for the changelog

Ensure a label/tag is added for the feature raised. For example, in PR#2174 in the sonic-utilities repo, the Generic Config and Update feature was labelled GCU.

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

Signed-off-by: chiourung_huang <chiourung_huang@edge-core.com>
@chiourung chiourung requested a review from lguohan as a code owner February 22, 2023 07:26
@lguohan
Collaborator

lguohan commented Jun 26, 2023

@prsunny, can you check this? It looks like a race condition in orchagent log rotation.

@lguohan lguohan requested a review from prsunny June 26, 2023 18:01
@prsunny prsunny requested a review from theasianpianist June 26, 2023 18:45
@theasianpianist
Contributor

How did we decide on 10s as the wait time? Is orchagent always guaranteed to have registered the SIGHUP handler by then? What if there is high CPU usage on a device and the handler registration is delayed?

@copyandrun

@prsunny, can you check this? I think I have hit the same issue.

4 participants