Close rsyslog plugin when rsyslog SIGTERM and EOF is sent to stream #18835
Can you explain why?
@zbud-msft could you please check why the checker is skipped? Would you please re-trigger it?
@lguohan This change is to address 2 issues
Fixes #18771
…o stream (sonic-net#18835)
Fix sonic-net#18771
Microsoft ADO (number only): 27882794
How I did it
Add signalOnClose for omprog, and close rsyslog_plugin when it receives an EOF.
How to verify it
Verify rsyslog_plugin is running inside the bgp or swss container.
Run docker exec -it bgp supervisorctl restart rsyslogd
Before change: This does not kill the current rsyslog_plugin process. Instead, rsyslogd closes its write end of the pipe feeding the plugin's cin, so rsyslog_plugin sees EOF, but rsyslogd sends it neither SIGTERM nor SIGKILL. rsyslog_plugin therefore loops forever, constantly calling getline and driving CPU to 100% inside the container.
After change: With signalOnClose="on" added to the omprog action in the conf file, rsyslogd now sends SIGTERM to the rsyslog_plugin process running inside the container, and rsyslog_plugin exits.
? ( ): rsyslog_plugin/578637 ... [continued]: read()) = -1 (unknown) (INTERNAL ERROR: strerror_r(512, [buf], 128)=22)
UT (will add sonic-mgmt testcase for storming events with logs)
RCA:
1. When rsyslogd is terminated, no signal is sent to its rsyslog_plugin child process, so rsyslog_plugin keeps trying to read from cin with no writer on the other end of the pipe. As a result, the rsyslog_plugin process calls getline in an endless loop.
2. Because rsyslogd is terminated while the spawned rsyslog_plugin is still alive, when rsyslogd starts back up and a log is triggered, a new rsyslog_plugin is spawned for the new rsyslogd process. This can leave many "ghost" rsyslog_plugin processes running at high CPU usage.
Cherry-pick PR to 202311: #18968
…o stream (#18835) (#19035)
Hi @zbud-msft, the change only handles
Discussed with Bing offline. From the code, we are handling EOF: the loop condition is getline(cin, line), and once EOF is reached getline(cin, line) evaluates to false, so the loop exits. The UT also shows this: it replaces the cin buffer with an empty input stream, which is treated as EOF when plugin->run calls getline().
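To make the EOF behaviour described in this comment concrete, here is a minimal C++ sketch. It is not the actual rsyslog_plugin code: run_until_eof and run_on_text are hypothetical names, and the in-memory stream only stands in for the UT's trick of swapping the cin buffer for an empty input stream.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the plugin's read loop (not the real
// rsyslog_plugin implementation): consume lines until the stream ends.
std::size_t run_until_eof(std::istream& in) {
    std::size_t handled = 0;
    std::string line;
    // std::getline returns the stream itself. The stream converts to
    // false once a read hits EOF (or an error) without extracting any
    // characters, so the loop ends here instead of spinning.
    while (std::getline(in, line)) {
        ++handled;  // a real plugin would parse and publish the line here
    }
    return handled;
}

// Same idea as the unit test mentioned above: run the loop on a finite,
// possibly empty, in-memory stream instead of std::cin.
std::size_t run_on_text(const std::string& text) {
    std::istringstream in(text);
    return run_until_eof(in);
}
```

Calling run_on_text("") returns immediately with zero lines handled, which is the case where the writer closes the pipe without sending anything. A loop that ignored the stream state and called getline unconditionally would instead keep returning at once with nothing read, which matches the 100% CPU symptom reported in the issue.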
Why I did it
Fix #18771
Work item tracking
How I did it
Add signalOnClose for omprog, and close rsyslog_plugin when it receives an EOF.
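For readers unfamiliar with omprog, the config side of the change can be sketched as below. This is an illustrative fragment only: signalOnClose is a documented omprog action parameter, but the binary path shown is a placeholder, and the real SONiC conf template passes additional arguments and parameters not reproduced here.

```
# Sketch only: the binary path is a placeholder, not the value used
# in the SONiC conf template.
module(load="omprog")

action(type="omprog"
       binary="/usr/bin/rsyslog_plugin"
       signalOnClose="on")
```

With signalOnClose="on", rsyslogd sends SIGTERM to the external program before closing it, so the plugin no longer depends on noticing EOF on its input to exit.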
How to verify it
Verify rsyslog_plugin is running inside the bgp or swss container.
Run
docker exec -it bgp supervisorctl restart rsyslogd
Before change:
This does not kill the current rsyslog_plugin process. Instead, rsyslogd closes its write end of the pipe feeding the plugin's cin, so rsyslog_plugin sees EOF, but rsyslogd sends it neither SIGTERM nor SIGKILL. rsyslog_plugin therefore loops forever, constantly calling getline and driving CPU to 100% inside the container.
After change: with signalOnClose="on" added to the omprog action in the conf file, rsyslogd now sends SIGTERM to the rsyslog_plugin process running inside the container, and rsyslog_plugin exits.
? ( ): rsyslog_plugin/578637 ... [continued]: read()) = -1 (unknown) (INTERNAL ERROR: strerror_r(512, [buf], 128)=22)
UT (will add sonic-mgmt testcase for storming events with logs)