Add health check probe for k8s upgrade containers. #15223
LGTM
#### exit code contract: k8s only cares whether the exit code is zero or non-zero, but we want to use non-zero codes to indicate different errors
# 0: ready
# 1: python script crash exit code
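The contract above can be read back as a small lookup. This is only a sketch: `describe_probe_exit` is a hypothetical helper, not part of the PR. Codes 0, 1 and 2 follow the comments quoted in this review, and every other code is assumed to come from the optional post-check script.

```shell
# Hypothetical helper: translate a probe exit code into a readable reason.
# 0, 1 and 2 follow the contract discussed in this PR; any other value is
# assumed to have been returned by the post-check script.
describe_probe_exit() {
    case "$1" in
        0) echo "ready" ;;
        1) echo "python script crashed" ;;
        2) echo "start service did not exit normally" ;;
        *) echo "post-check script reported error $1" ;;
    esac
}
```

k8s itself still only distinguishes zero from non-zero; a mapping like this is useful only to someone reading the probe result by hand.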
Fixed
# if the start service exists, check if it exits normally
# if the start service doesn't exit normally, exit with code 2
pre_check_service_name="start"
supervisorctl status | awk '{print $1}' | grep -w "$pre_check_service_name" > /dev/null
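One caveat with `grep -w`: it treats `-` as a word boundary, so a program with a name such as `pre-start` (a made-up name, used here only for illustration) would also match `start`. A possible alternative, sketched below with a hypothetical `service_listed` helper, compares the first column exactly.

```shell
# service_listed NAME: read `supervisorctl status` output on stdin and
# succeed only if a program called exactly NAME appears in the first column.
# Hypothetical helper, not taken from the PR.
service_listed() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit !found }'
}
```

Inside the container this would be used as `supervisorctl status | service_listed start`.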
If we only run "supervisorctl status start", we can't decide by its exit code, because a missing start service and some failed states return the same exit code. We would have to judge by the output instead, e.g. "start: ERROR (no such process)" or "start EXITED Jun 21 05:28 PM". I check whether start exists in advance because I think that makes the code logic easier to understand.
This is an example
root@sonic:/# supervisorctl status start
start EXITED Jul 04 12:38 AM
root@sonic:/# supervisorctl status
dependent-startup EXITED Jul 04 12:38 AM
lldp-syncd RUNNING pid 26, uptime 0:03:54
lldpd RUNNING pid 20, uptime 0:03:57
lldpmgrd RUNNING pid 30, uptime 0:03:52
rsyslogd RUNNING pid 11, uptime 0:04:02
start EXITED Jul 04 12:38 AM
supervisor-proc-exit-listener RUNNING pid 10, uptime 0:04:04
waitfor_lldp_ready EXITED Jul 04 12:38 AM
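Based on the two output shapes quoted in this thread, the decision could be sketched as follows. `start_check` is a hypothetical name, and treating `EXITED` as a normal exit is an assumption: the supervisor state alone does not prove that the process returned an expected exit code.

```shell
# start_check: read the output of `supervisorctl status start` on stdin.
# Returns 0 when start is not configured or is in the EXITED state, and 2
# otherwise. Assumption: EXITED is treated as "exited normally".
start_check() {
    local line state
    line=$(cat)
    case "$line" in
        *"no such process"*) return 0 ;;
    esac
    state=$(printf '%s\n' "$line" | awk '{print $2}')
    [ "$state" = "EXITED" ] && return 0
    return 2
}
```

Inside the container it would be fed with `supervisorctl status start | start_check`.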
Fixed
# check if the post_check_script exists
# if the post_check_script exists, run it
# if the post_check_script exits with non-zero code, exit with the code
post_check_script="/usr/bin/readiness_probe.py"
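The three comment lines above describe an optional hook whose exit code is passed through. A minimal sketch, assuming the hook is marked executable (the quoted snippet only says it "exists") and using a hypothetical `run_post_check` function:

```shell
# run_post_check SCRIPT: run SCRIPT if it is present and executable and
# return its exit code unchanged; return 0 when there is no such script.
# Hypothetical helper; the executable-bit requirement is an assumption.
run_post_check() {
    local script="$1"
    if [ -x "$script" ]; then
        "$script"
        return $?
    fi
    return 0
}
```

A probe built this way would end with something like `run_post_check "$post_check_script" || exit $?` followed by `exit 0`.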
Fixed
# check if the start service exists
# if the start service exists, check if it exits normally
# if the start service doesn't exit normally, exit with code 2
pre_check_service_name="start"
Unexpected critical-process events are handled by the supervisord exit listener for now; the listener kills the container, so I don't think we need to check them here. Is this correct?
Where is the public design doc for this health check probe?
We have a OneNote page; I put the link into the ADO discussion related to this PR earlier. ADO number: 22453004.
#### Why I did it
After k8s upgrades a container, k8s only knows that the container is running; it doesn't know the status of the services inside the container. So we need a probe inside the container that k8s can call to check whether the container is really ready.

##### Work item tracking
- Microsoft ADO **(number only)**: 22453004

#### How I did it
Add a health check probe inside the config engine container. If the start service exists, the probe checks whether it exited normally; if the python script is present, the probe calls it to run container-specific checks. The python script should be implemented by the feature owner if it's needed.
More details: [design doc](https://github.com/sonic-net/SONiC/blob/master/doc/kubernetes/health-check.md)

#### How to verify it
Check path /usr/bin/readiness_probe.sh inside container.

#### Which release branch to backport (provide reason below if selected)
- [ ] 201811
- [ ] 201911
- [ ] 202006
- [ ] 202012
- [ ] 202106
- [ ] 202111
- [x] 202205
- [x] 202211

#### Tested branch (Please provide the tested image version)
- [x] 20220531.28
Cherry-pick PR to 202205: #15823
Cherry-pick PR to 202211: #15824
Cherry-pick PR to 202305: #15867
Co-authored-by: lixiaoyuner <35456895+lixiaoyuner@users.noreply.github.com>
Why I did it
After k8s upgrades a container, k8s only knows that the container is running; it doesn't know the status of the services inside the container. So we need a probe inside the container that k8s can call to check whether the container is really ready.
Work item tracking
22453004
How I did it
Add a health check probe inside the config engine container. If the start service exists, the probe checks whether it exited normally; if the python script is present, the probe calls it to run container-specific checks. The hook script should be implemented by the feature owner if it's needed.
More details: design doc (https://github.com/sonic-net/SONiC/blob/master/doc/kubernetes/health-check.md)
How to verify it
Check path /usr/bin/readiness_probe.sh inside container.
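The verification step can be scripted roughly like this. `probe_installed` is a hypothetical helper; its optional `ROOT` argument exists only so that the check can be pointed at an unpacked image directory, and requiring the executable bit goes slightly beyond "check path" and is an assumption.

```shell
# probe_installed [ROOT]: succeed if ROOT/usr/bin/readiness_probe.sh is a
# regular, executable file. ROOT defaults to empty, i.e. the filesystem of
# the container the check runs in. Hypothetical helper, not part of the PR.
probe_installed() {
    local root="${1:-}"
    [ -f "$root/usr/bin/readiness_probe.sh" ] && [ -x "$root/usr/bin/readiness_probe.sh" ]
}
```

Inside the container, running `probe_installed` and then `/usr/bin/readiness_probe.sh` followed by `echo $?` shows both that the file is there and which exit code it currently reports.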
Which release branch to backport (provide reason below if selected)
202205, 202211
Tested branch (Please provide the tested image version)
20220531.28