Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Close rsyslog plugin when rsyslog SIGTERM and EOF is sent to stream #18835

Merged
merged 4 commits into from
May 15, 2024

Conversation

zbud-msft
Copy link
Contributor

@zbud-msft zbud-msft commented Apr 30, 2024

Why I did it

Fix #18771

Work item tracking
  • Microsoft ADO (number only):27882794

How I did it

Add signalOnClose for omprog as well as close rsyslog plugin when receives an EOF.

How to verify it

Verify rsyslog_plugin is running inside bgp or swss container

Run docker exec -it bgp supervisorctl restart rsyslogd

Before change:

This will not kill current rsyslog_plugin process but instead rsyslogd will now break off its end of writing to cin and send EOF to rsyslog_plugin, however will not send a signal SIGTERM or SIGKILL to rsyslog_plugin. Therefore, rsyslog plugin will run in an infinite loop forever, constantly calling getline raising CPU to 100% inside docker.

After change of adding signalOnClose="on" to conf file inside omprog, rsyslogd will now send SIGTERM to rsyslog_plugin process running inside container, and rsyslog_plugin will die.

? ( ): rsyslog_plugin/578637 ... [continued]: read()) = -1 (unknown) (INTERNAL ERROR: strerror_r(512, [buf], 128)=22)

UT (will add sonic-mgmt testcase for storming events with logs)

Which release branch to backport (provide reason below if selected)

  • 201811
  • 201911
  • 202006
  • 202012
  • 202106
  • 202111
  • 202205
  • 202211
  • 202305

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

@zbud-msft zbud-msft requested a review from lguohan as a code owner April 30, 2024 16:29
@lguohan
Copy link
Collaborator

lguohan commented May 2, 2024

can you explain why?

@keboliu
Copy link
Collaborator

keboliu commented May 9, 2024

@zbud-msft could you please check why the checker is skipped? would you please re-trigger it?

@zbud-msft zbud-msft requested a review from vivekrnv May 14, 2024 00:59
@zbud-msft
Copy link
Contributor Author

@lguohan This change is to address 2 issues

  1. When rsyslogd is terminated, no signal is sent to child process of rsyslog_plugin meaning that rsyslog_plugin will be constantly trying to read from cin with no writer on the other end of the pipe. This leads to rsyslog_plugin process will constantly be reading via getline infinitely.

  2. Because rsyslog is terminated and the spawned rsyslog_plugin is still alive, when rsyslog starts backup again, and log is triggered, a new rsyslog_plugin will be spawned for that rsyslog process, which can lead to many "ghost" rsyslog_plugin processes that will be at high CPU usage.

@vivekrnv
Copy link
Contributor

Fixes #18771

@lguohan lguohan merged commit 1627c5c into sonic-net:master May 15, 2024
19 checks passed
mssonicbld pushed a commit to mssonicbld/sonic-buildimage that referenced this pull request May 15, 2024
…o stream (sonic-net#18835)

Fix sonic-net#18771

Microsoft ADO (number only):27882794

How I did it

Add signalOnClose for omprog as well as close rsyslog plugin when receives an EOF.

How to verify it

Verify rsyslog_plugin is running inside bgp or swss container

Run docker exec -it bgp supervisorctl restart rsyslogd

Before change:

This will not kill current rsyslog_plugin process but instead rsyslogd will now break off its end of writing to cin and send EOF to rsyslog_plugin, however will not send a signal SIGTERM or SIGKILL to rsyslog_plugin. Therefore, rsyslog plugin will run in an infinite loop forever, constantly calling getline raising CPU to 100% inside docker.

After change of adding signalOnClose="on" to conf file inside omprog, rsyslogd will now send SIGTERM to rsyslog_plugin process running inside container, and rsyslog_plugin will die.

? ( ): rsyslog_plugin/578637 ... [continued]: read()) = -1 (unknown) (INTERNAL ERROR: strerror_r(512, [buf], 128)=22)

UT (will add sonic-mgmt testcase for storming events with logs)

RCA:

1. When rsyslogd is terminated, no signal is sent to child process of rsyslog_plugin meaning that rsyslog_plugin will be constantly trying to read from cin with no writer on the other end of the pipe. This leads to rsyslog_plugin process will constantly be reading via getline infinitely.

2. Because rsyslog is terminated and the spawned rsyslog_plugin is still alive, when rsyslog starts backup again, and log is triggered, a new rsyslog_plugin will be spawned for that rsyslog process, which can lead to many "ghost" rsyslog_plugin processes that will be at high CPU usage.
@mssonicbld
Copy link
Collaborator

Cherry-pick PR to 202311: #18968

mssonicbld pushed a commit that referenced this pull request May 15, 2024
…o stream (#18835)

Fix #18771

Microsoft ADO (number only):27882794

How I did it

Add signalOnClose for omprog as well as close rsyslog plugin when receives an EOF.

How to verify it

Verify rsyslog_plugin is running inside bgp or swss container

Run docker exec -it bgp supervisorctl restart rsyslogd

Before change:

This will not kill current rsyslog_plugin process but instead rsyslogd will now break off its end of writing to cin and send EOF to rsyslog_plugin, however will not send a signal SIGTERM or SIGKILL to rsyslog_plugin. Therefore, rsyslog plugin will run in an infinite loop forever, constantly calling getline raising CPU to 100% inside docker.

After change of adding signalOnClose="on" to conf file inside omprog, rsyslogd will now send SIGTERM to rsyslog_plugin process running inside container, and rsyslog_plugin will die.

? ( ): rsyslog_plugin/578637 ... [continued]: read()) = -1 (unknown) (INTERNAL ERROR: strerror_r(512, [buf], 128)=22)

UT (will add sonic-mgmt testcase for storming events with logs)

RCA:

1. When rsyslogd is terminated, no signal is sent to child process of rsyslog_plugin meaning that rsyslog_plugin will be constantly trying to read from cin with no writer on the other end of the pipe. This leads to rsyslog_plugin process will constantly be reading via getline infinitely.

2. Because rsyslog is terminated and the spawned rsyslog_plugin is still alive, when rsyslog starts backup again, and log is triggered, a new rsyslog_plugin will be spawned for that rsyslog process, which can lead to many "ghost" rsyslog_plugin processes that will be at high CPU usage.
zbud-msft added a commit to zbud-msft/sonic-buildimage that referenced this pull request May 22, 2024
…o stream (sonic-net#18835)

Fix sonic-net#18771

Microsoft ADO (number only):27882794

How I did it

Add signalOnClose for omprog as well as close rsyslog plugin when receives an EOF.

How to verify it

Verify rsyslog_plugin is running inside bgp or swss container

Run docker exec -it bgp supervisorctl restart rsyslogd

Before change:

This will not kill current rsyslog_plugin process but instead rsyslogd will now break off its end of writing to cin and send EOF to rsyslog_plugin, however will not send a signal SIGTERM or SIGKILL to rsyslog_plugin. Therefore, rsyslog plugin will run in an infinite loop forever, constantly calling getline raising CPU to 100% inside docker.

After change of adding signalOnClose="on" to conf file inside omprog, rsyslogd will now send SIGTERM to rsyslog_plugin process running inside container, and rsyslog_plugin will die.

? ( ): rsyslog_plugin/578637 ... [continued]: read()) = -1 (unknown) (INTERNAL ERROR: strerror_r(512, [buf], 128)=22)

UT (will add sonic-mgmt testcase for storming events with logs)

RCA:

1. When rsyslogd is terminated, no signal is sent to child process of rsyslog_plugin meaning that rsyslog_plugin will be constantly trying to read from cin with no writer on the other end of the pipe. This leads to rsyslog_plugin process will constantly be reading via getline infinitely.

2. Because rsyslog is terminated and the spawned rsyslog_plugin is still alive, when rsyslog starts backup again, and log is triggered, a new rsyslog_plugin will be spawned for that rsyslog process, which can lead to many "ghost" rsyslog_plugin processes that will be at high CPU usage.
lguohan pushed a commit that referenced this pull request May 24, 2024
…o stream (#18835) (#19035)

Fix #18771

Microsoft ADO (number only):27882794

How I did it

Add signalOnClose for omprog as well as close rsyslog plugin when receives an EOF.

How to verify it

Verify rsyslog_plugin is running inside bgp or swss container

Run docker exec -it bgp supervisorctl restart rsyslogd

Before change:

This will not kill current rsyslog_plugin process but instead rsyslogd will now break off its end of writing to cin and send EOF to rsyslog_plugin, however will not send a signal SIGTERM or SIGKILL to rsyslog_plugin. Therefore, rsyslog plugin will run in an infinite loop forever, constantly calling getline raising CPU to 100% inside docker.

After change of adding signalOnClose="on" to conf file inside omprog, rsyslogd will now send SIGTERM to rsyslog_plugin process running inside container, and rsyslog_plugin will die.

? ( ): rsyslog_plugin/578637 ... [continued]: read()) = -1 (unknown) (INTERNAL ERROR: strerror_r(512, [buf], 128)=22)

UT (will add sonic-mgmt testcase for storming events with logs)

RCA:

1. When rsyslogd is terminated, no signal is sent to child process of rsyslog_plugin meaning that rsyslog_plugin will be constantly trying to read from cin with no writer on the other end of the pipe. This leads to rsyslog_plugin process will constantly be reading via getline infinitely.

2. Because rsyslog is terminated and the spawned rsyslog_plugin is still alive, when rsyslog starts backup again, and log is triggered, a new rsyslog_plugin will be spawned for that rsyslog process, which can lead to many "ghost" rsyslog_plugin processes that will be at high CPU usage.
@bingwang-ms
Copy link
Contributor

Hi @zbud-msft, the change only handles SIG_TERM, but EOF of pipe is not handled yet. If rsyslog is killed by SIG_KILL or crash unexpectedly, the busy look is still there.

@zbud-msft
Copy link
Contributor Author

Discussed with Bing offline. From the code we are handling EOF. In loop we are checking getline(cin, line). If EOF is sent, getline(cin, line) will evaluate to false. UT also shows that we are replacing cin buffer with empty input stream, which will be treated as EOF when getline() is called by plugin->run.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rsyslog_plugin is hogging the CPU when high load is seen on rsyslog
7 participants