[rsyslog]: Use RELP instead of UDP for forwarding from container to host #18113
base: master
When the host's rsyslog is restarted (for example, to regenerate the config after some changes, or as part of some automated script), there is a chance that some syslog messages from the containers are lost. Most of the time, this isn't an issue. However, if there are test cases that expect all syslogs to be present (such as the advanced-reboot test case), then this can cause a problem. Additionally, this could affect debuggability of issues where a rsyslog restart happens in the middle.

There are two options for reliable message transport in rsyslog: TCP and RELP. With TCP, while the protocol knows whether a syslog message has been delivered or not, the application doesn't know, because there is no feedback from the remote side saying the message was received. This means that there is still a chance that messages could be lost when the connection is broken (if, for example, the host rsyslog gets restarted), because after the connection is established, the sender rsyslog (in the container) doesn't know if the message has been received or not.

RELP instead adds a feedback mechanism where the remote side notifies the sender whether the message has actually been received or not. This makes it much less likely to lose a message. There is one known possible case where a message (or messages) could be lost: the network is down, and rsyslog gets restarted. This at least requires both the network and rsyslog to have an issue, rather than just one. There is also a slim possibility where a message could get duplicated; this should be mostly fine (hopefully).

RELP does require that both sides are using a recent version of rsyslogd (at least 7.3.16, which looks like it was released more than 10 years ago), but since we use Debian on both the container and the host, it should be fine.

Therefore, switch to using RELP when sending syslog messages from the container to the host.

Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
…urst not being defined $SystemLogRateLimitInterval and $SystemLogRateLimitBurst both come from the imuxsock module. Specify them as module parameters (and also remove the legacy syntax). Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
By default, omrelp alone doesn't hold log messages if the server happens to be unavailable; this needs to be configured manually. Configure an in-memory queue (a linked list) that by default stores up to 1000 messages (this appears to be a default value that can be bumped up) if the server is unavailable. I'm assuming this will be sufficient for most cases. Assuming each message is 512 bytes (many of our messages will be smaller than this), this will take up an additional 512 kB of memory if 1000 messages are queued. If there are no messages queued, then no additional space is taken up. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
If rsyslogd on the host goes down, and rsyslogd on the containers is configured to use librelp to forward messages to the host rsyslogd (instead of UDP), then there will be error messages from the container rsyslogd about not being able to forward messages. Ignore these error messages as they are expected when running tests which may restart rsyslogd. This is in preparation for sonic-net/sonic-buildimage#18113 Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
/azpw run Azure.sonic-buildimage
/AzurePipelines run Azure.sonic-buildimage
Azure Pipelines successfully started running 1 pipeline(s).
* Ignore errors about rsyslogd w/ librelp not being able to send syslogs If rsyslogd on the host goes down, and rsyslogd on the containers is configured to use librelp to forward messages to the host rsyslogd (instead of UDP), then there will be error messages from the container rsyslogd about not being able to forward messages. Ignore these error messages as they are expected when running tests which may restart rsyslogd. This is in preparation for sonic-net/sonic-buildimage#18113 Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
/azpw ms_checker
module(load="imuxsock" SysSock.RateLimit.Interval="300" SysSock.RateLimit.Burst="20000") # provides support for local system logging
#module(load="imklog") # provides kernel logging support
#module(load="immark") # provides --MARK-- message capability
@saiarcot895 can you mention this syntax change in PR description
Done.
@@ -37,7 +33,8 @@ set $.CONTAINER_NAME=getenv("CONTAINER_NAME");
 
 # Set remote syslog server
 template (name="ForwardFormatInContainer" type="string" string="<%PRI%>%TIMESTAMP:::date-rfc3339% %HOSTNAME% %$.CONTAINER_NAME%#%syslogtag%%msg:::sp-if-no-1st-sp%%msg%")
-*.* action(type="omfwd" target=`echo $SYSLOG_TARGET_IP` port="514" protocol="udp" Template="ForwardFormatInContainer")
+module(load="omrelp")
+*.* action(type="omrelp" target=`echo $SYSLOG_TARGET_IP` port="2514" action.resumeRetryCount="-1" queue.type="LinkedList" Template="ForwardFormatInContainer")
@saiarcot895 2514 is the port used by relp?
There's no standard port used by RELP. The default port that rsyslog uses is 514, but that can conflict with regular syslog forwarding over TCP. A couple of the examples in the documentation for this feature use 2514.
@saiarcot895 are we still using bullseye?
Yes, bullseye is still being used for a couple containers.
What about platform/vs/docker-sonic-vs/etc/rsyslog.conf? Doesn't it need this change?
#$ModLoad immark # provides --MARK-- message capability
module(load="imuxsock" {% if rate_limit_interval is not none %}SysSock.RateLimit.Interval="{{ rate_limit_interval }}"{% endif %} {% if rate_limit_burst is not none %}SysSock.RateLimit.Burst="{{ rate_limit_burst }}"{% endif %}) # provides support for local system logging
module(load="imklog") # provides kernel logging support
#module(load="immark") # provides --MARK-- message capability

# provides UDP syslog reception
@saiarcot895 This UDP syslog is for remote server?
Yes, in the case of a remote syslog server sending over UDP.
Strictly speaking, it doesn't need this change, because the logs aren't actually being forwarded anywhere. It'll forward them to localhost port 514, but there likely won't be anything listening on that port. That container doesn't end up on the device. It would be nice to update that file to the new syntax as well, but I'll keep that separate for now.
In case rsyslog can't forward messages to the host's rsyslog server, messages will be queued so that they can be sent out later. For this queue, set a limit of 20000 messages so that rsyslog doesn't take too much memory. Assuming each message is 512 bytes, the approximate maximum additional memory usage is 10MB. Signed-off-by: Saikrishna Arcot <sarcot@microsoft.com>
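The memory figures quoted in these commit messages can be checked with some quick arithmetic. The sketch below is only a back-of-the-envelope estimate: it assumes the 512-byte average message size stated above and ignores any per-message bookkeeping overhead inside rsyslog, which is not measured here. The function name is made up for illustration.

```python
def queue_memory_bytes(max_messages: int, avg_message_bytes: int = 512) -> int:
    """Rough upper bound on payload bytes held by a completely full queue.

    Assumption: every queued message is avg_message_bytes long and rsyslog's
    own per-message overhead is ignored.
    """
    return max_messages * avg_message_bytes


# 1000 is the default queue size mentioned earlier; 20000 is the limit set here.
for limit in (1000, 20000):
    total = queue_memory_bytes(limit)
    print(f"{limit} messages x 512 B = {total} B (~{total / 1e6:.2f} MB)")
```

This gives 512,000 bytes for the 1000-message default and 10,240,000 bytes for the 20000-message limit, which matches the "512 kB" and "approximately 10 MB" estimates.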
This is kind of a new feature. Is it possible to make it configurable, so that either the new or the old behavior can be selected?
Why I did it
When the host's rsyslog is restarted (for example, to regenerate the config after some changes, or as part of some automated script), there is a chance that some syslog messages from the containers are lost. Most of the time, this isn't an issue. However, if there are test cases that expect all syslogs to be present (such as the advanced-reboot test case), then this can cause a problem. Additionally, this could affect debuggability of issues where a rsyslog restart happens in the middle.
There are two options for reliable message transport in rsyslog: TCP and RELP. With TCP, while the protocol knows whether a syslog message has been delivered or not, the application doesn't know, because there is no feedback from the remote side saying the message was received. This means that there is still a chance that messages could be lost when the connection is broken (if, for example, the host rsyslog gets restarted), because after the connection is established, the sender rsyslog (in the container) doesn't know if the message has been received or not.
RELP builds on top of TCP, and adds a feedback mechanism where the remote side notifies the sender whether the message has actually been received or not. This makes it much less likely to lose a message. There is one known possible case where a message (or messages) could be lost: the network is down, and rsyslog gets restarted. This at least requires both the network and rsyslog to have an issue, rather than just one. There is also a slim possibility where a message could get duplicated; this should be mostly fine (hopefully).
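To illustrate the difference, here is a deliberately simplified toy model in Python. It is not how librelp or rsyslog are implemented, and all class names are invented for this sketch. It only shows the idea: a sender that keeps each message until the receiver acknowledges it can resend after an outage, while a sender with no application-level feedback cannot.

```python
from collections import OrderedDict


class Receiver:
    """Toy stand-in for the host rsyslog; `up` models whether it is running."""

    def __init__(self):
        self.up = True
        self.log = []

    def deliver(self, msg):
        """Return True (an ack) only if the message was actually stored."""
        if not self.up:
            return False
        self.log.append(msg)
        return True


class FireAndForgetSender:
    """No application-level ack: the result is ignored and nothing is kept."""

    def __init__(self, receiver):
        self.receiver = receiver

    def send(self, msg):
        self.receiver.deliver(msg)


class AckingSender:
    """Keeps every message until it is acknowledged, then resends in order."""

    def __init__(self, receiver):
        self.receiver = receiver
        self.next_id = 1
        self.unacked = OrderedDict()

    def send(self, msg):
        self.unacked[self.next_id] = msg
        self.next_id += 1
        self.flush()

    def flush(self):
        for txn_id in list(self.unacked):
            if not self.receiver.deliver(self.unacked[txn_id]):
                break  # receiver is down; keep this and later messages queued
            del self.unacked[txn_id]


def run(sender_cls):
    """Send m1..m4 while the receiver is restarted between m1 and m4."""
    rx = Receiver()
    tx = sender_cls(rx)
    tx.send("m1")
    rx.up = False  # host rsyslog goes down
    tx.send("m2")
    tx.send("m3")
    rx.up = True   # host rsyslog comes back
    tx.send("m4")
    return rx.log


print(run(FireAndForgetSender))  # ['m1', 'm4']
print(run(AckingSender))         # ['m1', 'm2', 'm3', 'm4']
```

In this model, if an ack were lost after the receiver had already stored the message, the sender would resend it and the receiver would store it twice, which corresponds to the duplication case mentioned above.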
RELP does require that both sides are using a recent version of rsyslogd (at least 7.3.16, which looks like it was released more than 10 years ago), but since we use Debian on both the container and the host, it should be fine.
Therefore, switch to using RELP when sending syslog messages from the container to the host.
Fixes #17792.
How I did it
Modify the rsyslog.conf file on the host and the container to use RELP instead of UDP.
In addition, update the syntax used for the config files to the (newer) RainierScript format, which, among other things, makes it easier to set settings for specific outputs.
How to verify it
Stop rsyslogd on the host, make sure that the containers generate some syslogs, restart rsyslogd on the host, and verify no logs were lost.
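One way to make the "verify no logs were lost" step less manual is to have a container log numbered messages while rsyslogd on the host is stopped, and then check the host's log for gaps afterwards. A minimal sketch, assuming a hypothetical marker string `relp-check` and a `seq=N` message format (neither is part of this PR, and the sample lines are made up):

```python
import re

MARKER = "relp-check"  # hypothetical tag, chosen only for this sketch


def missing_sequence_numbers(log_lines, expected_count, marker=MARKER):
    """Return the sequence numbers in 1..expected_count not found in log_lines."""
    pattern = re.compile(rf"{re.escape(marker)} seq=(\d+)\b")
    seen = set()
    for line in log_lines:
        match = pattern.search(line)
        if match:
            seen.add(int(match.group(1)))
    return [n for n in range(1, expected_count + 1) if n not in seen]


# Made-up sample of host log lines after the restart; seq=3 is absent.
sample = [
    "2024-02-20T10:00:01 sonic swss#test: relp-check seq=1",
    "2024-02-20T10:00:02 sonic swss#test: relp-check seq=2",
    "2024-02-20T10:00:04 sonic swss#test: relp-check seq=4",
]
print(missing_sequence_numbers(sample, 4))  # [3]
```

An empty list means every numbered message reached the host log. Duplicates are tolerated by this check, since a resent message only adds a sequence number that was already seen.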