-
Notifications
You must be signed in to change notification settings - Fork 667
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[unplanned-warm-reboot] Add crash-monitor to handle process crash #959
base: master
Are you sure you want to change the base?
Conversation
This pull request introduces 4 alerts when merging bab00b7cb27a68bddbc380f669efc4046cfb9ab3 into f99f7e1 - view on LGTM.com new alerts:
|
scripts/crash-monitor
Outdated
log_file = "/var/log/monitor-crash.log" | ||
log = logging.getLogger('MONITOR_CRASH') | ||
log.setLevel(logging.DEBUG) | ||
# Create file log handler and set level to debug | ||
fh = logging.handlers.RotatingFileHandler(log_file, maxBytes=(MAX_LOG_BYTES / 2), backupCount=1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use logger
and a dedicated file which you rotate manually rather than just using the syslog
library?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Joe,
thanks for help reviewing the change.
This logger is for our internal use purpose. Will change it to syslog.
Thanks,
Haiyang
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why use
logger
and a dedicated file which you rotate manually rather than just using thesyslog
library?
Have updated the change by using syslog
lib instead of logger
.
Thanks,
Haiyang
crash-monitor monitors the critical processes inside dockers, and warm restart docker if the process crashes Signed-off-by: Haiyang Zheng <haiyang.z@alibaba-inc.com>
bab00b7
to
f39e5db
Compare
This pull request introduces 4 alerts when merging f39e5db into 9715244 - view on LGTM.com new alerts:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feature seems as though it may conflict with our current container auto-restart feature. Would it be possible to review the existing solution and see if you can integrate as warm-restart option?
Hi Joe, Thanks for the info. Sure, will check the existing auto-restart feature first. Thanks, |
These changes are included in this PR: 07e1f79 [syncd] Add workaround for warm boot new objects (sonic-net#959) 50fd353 Fix the option missing in kernel config issue (sonic-net#956) e77503c [syncd] Comparison logic workaround for empty buffer profile (sonic-net#906) (sonic-net#941)
crash-monitor monitors the critical processes inside dockers,
and warm restart docker if the process crashes
Signed-off-by: Haiyang Zheng haiyang.z@alibaba-inc.com
- What I did
use crash-monitor to monitor critical processes inside docker, and warm-restart docker once the process crashes.
- How I did it
(TODO: there is another PR which will write the PROC_EVENT_TABLE once a process is crashed.)
then try to warm restart the docker if the monitored process is dead if the retry limit has not been reached yet
- How to verify it
killed bgpd process inside bgp docker, and verify it's been warm-restarted without affecting traffic
- Previous command output (if the output of a command-line utility has changed)
- New command output (if the output of a command-line utility has changed)