[eventd]: Close rsyslog plugin when rsyslog SIGTERM and EOF is sent #19035
Backport (#18835)
Fix #18771, #18335
Microsoft ADO (number only):27882794
How I did it
Add signalOnClose="on" to the omprog action so that rsyslogd sends SIGTERM to the plugin when it closes, and make rsyslog_plugin exit when it receives EOF.
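For illustration, the omprog side of the change could look roughly like the fragment below. This is a sketch, not the exact conf file from this PR: the binary path is a placeholder, and any other parameters the real action carries are omitted. Only `signalOnClose="on"` is taken from the description above.

```
# Hypothetical omprog action; the binary path is a placeholder.
module(load="omprog")

action(type="omprog"
       binary="/path/to/rsyslog_plugin"
       signalOnClose="on")   # rsyslogd sends SIGTERM to the program when the action closes
```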
How to verify it
Verify rsyslog_plugin is running inside bgp or swss container
Run docker exec -it bgp supervisorctl restart rsyslogd
Before change:
Restarting rsyslogd does not kill the current rsyslog_plugin process. rsyslogd closes its write end of the pipe feeding the plugin's cin, so rsyslog_plugin sees EOF, but rsyslogd sends it neither SIGTERM nor SIGKILL. rsyslog_plugin therefore loops forever, calling getline repeatedly and driving CPU usage to 100% inside the container.
After change: with signalOnClose="on" added to the omprog action in the conf file, rsyslogd sends SIGTERM to the rsyslog_plugin process running inside the container, and rsyslog_plugin exits.
? ( ): rsyslog_plugin/578637 ... [continued]: read()) = -1 (unknown) (INTERNAL ERROR: strerror_r(512, [buf], 128)=22)
UT (will add sonic-mgmt testcase for storming events with logs)
RCA:
When rsyslogd is terminated, no signal is sent to its rsyslog_plugin child process, so rsyslog_plugin keeps trying to read from cin with no writer on the other end of the pipe. Each read returns immediately with EOF, so the process calls getline in an endless loop.
Because rsyslogd has terminated while the rsyslog_plugin it spawned is still alive, when rsyslogd starts back up and a log is triggered, a new rsyslog_plugin is spawned for the new rsyslogd process. Repeated restarts can leave many "ghost" rsyslog_plugin processes, each at high CPU usage.
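The pipe behavior behind this RCA can be reproduced in isolation. The sketch below assumes a POSIX system; `readAfterWriterClosed` is a hypothetical name and is not part of rsyslog_plugin. It shows that once the only writer has closed its end, `read()` does not block and reports EOF by returning 0.

```cpp
#include <unistd.h>

// Hypothetical demo: returns what read() reports on a pipe whose only
// writer has already closed its end. Returns -2 if the pipe cannot be created.
ssize_t readAfterWriterClosed() {
    int fds[2];
    if (pipe(fds) != 0) {
        return -2;
    }
    close(fds[1]);  // the writer goes away, as when rsyslogd stops

    char buf[16];
    // With no writer left, read() returns 0 (EOF) right away instead of blocking.
    ssize_t n = read(fds[0], buf, sizeof buf);
    close(fds[0]);
    return n;
}
```

Since every subsequent read returns just as quickly, a reader that ignores EOF consumes a full CPU core.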