Rev | Date | Author | Change Description |
---|---|---|---|
0.1 | | Junchao Chen | Initial version |
This document is the design document for syslog message rate limit configuration per container.
N/A
Logging in SONiC is organized with rsyslogd. Each container runs its own rsyslogd instance, and another rsyslogd daemon runs on the host. The rsyslogd instance on the host collects the messages from the containers and stores them at a configured path (e.g. /var/log/syslog). Rsyslog config files are generated from templates:
- Container scope:
- Host scope: https://github.com/Azure/sonic-buildimage/blob/master/files/image_config/rsyslog/rsyslog.conf.j2
Currently, each container has hardcoded message rate limiting to avoid receiving flooded log messages:
```
$SystemLogRateLimitInterval 300
$SystemLogRateLimitBurst 20000
```
There is no rate limiting configured on host side for now.
SystemLogRateLimitInterval determines the length of the time window, in seconds, over which messages are counted for rate limiting. SystemLogRateLimitBurst defines the number of messages that must occur within that window to trigger rate limiting. For example, with SystemLogRateLimitInterval=300 and SystemLogRateLimitBurst=20000, if one daemon generates more than 20000 messages in 300 seconds, rsyslogd starts to drop the messages that follow (FIFO).
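The interval/burst semantics can be illustrated with a small simulation. This is a simplified fixed-window model written for this document, not rsyslogd's actual implementation; the class name `WindowRateLimiter` and the helper `count_kept` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class WindowRateLimiter:
    """Simplified fixed-window model of interval/burst rate limiting."""

    interval: int  # seconds, plays the role of SystemLogRateLimitInterval
    burst: int     # messages, plays the role of SystemLogRateLimitBurst
    window_start: Optional[float] = None
    count: int = 0

    def allow(self, now: float) -> bool:
        """Return True if a message arriving at time `now` is kept."""
        # An interval or burst of 0 disables rate limiting.
        if self.interval == 0 or self.burst == 0:
            return True
        # Open a new window on the first message or once the old one elapsed.
        if self.window_start is None or now - self.window_start >= self.interval:
            self.window_start = now
            self.count = 0
        self.count += 1
        # Messages beyond the burst within the window are dropped.
        return self.count <= self.burst


def count_kept(limiter: WindowRateLimiter, timestamps: Iterable[float]) -> int:
    """Count how many messages with the given arrival times are kept."""
    return sum(1 for t in timestamps if limiter.allow(t))
```

With `interval=300` and `burst=20000`, feeding 25000 messages into one window keeps the first 20000 and drops the remaining 5000 in this model.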
This feature allows users to configure SystemLogRateLimitInterval and SystemLogRateLimitBurst for the host and for containers.
- Support syslog message rate limit configuration for host and containers
- New CLI for rate limit configuration
- Rate limit configuration shall be persistent
- Default rate limit shall be applied if no configuration provided
- Both multi ASIC and single ASIC platform shall use rsyslog-container.conf.j2 as template
- APP extension shall be able to declare its capability of this feature
- APP extension is responsible for reading/listening to rate limit configuration from CONFIG DB and setting rsyslog configuration accordingly.
- CLI is only responsible for putting rsyslog configuration to CONFIG DB
- hostcfgd shall be extended to handle host side rsyslog configuration by listening to CONFIG DB changes
- A new daemon containercfgd shall be added to each container to handle container side rsyslog configuration
- rsyslog.conf.j2 and rsyslog-container.conf.j2 shall be extended to accept template variable for SystemLogRateLimitInterval and SystemLogRateLimitBurst.
- rsyslog.conf for single ASIC platform will be removed and replaced by rsyslog-container.conf.j2
- App extension shall handle rsyslog configuration itself
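The listening behavior described for hostcfgd and containercfgd can be sketched as a small handler. This is only an illustration under assumptions: the class `RateLimitConfigHandler` and the `apply_config` hook (standing in for "render the rsyslog template and restart rsyslogd") are hypothetical names, and skipping unchanged values is a design choice of this sketch, not something the design mandates.

```python
from typing import Callable, Dict, Optional

RateLimitConfig = Dict[str, Optional[str]]


class RateLimitConfigHandler:
    """Reacts to CONFIG DB updates of a syslog rate limit entry (sketch)."""

    def __init__(self, apply_config: Callable[[RateLimitConfig], None]):
        # Hypothetical hook: would render rsyslog.conf and restart rsyslogd.
        self._apply_config = apply_config
        self._current: RateLimitConfig = {}

    def on_change(self, data: Dict[str, str]) -> bool:
        """Handle one change notification; return True if it was applied."""
        new_config: RateLimitConfig = {
            "rate_limit_interval": data.get("rate_limit_interval"),
            "rate_limit_burst": data.get("rate_limit_burst"),
        }
        # Skip the apply step when nothing relevant changed.
        if new_config == self._current:
            return False
        self._current = new_config
        self._apply_config(new_config)
        return True
```

A caller would pass in whatever actually rewrites the rsyslog configuration, for example `RateLimitConfigHandler(apply_config=print)` while experimenting.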
There is an existing service rsyslog-config.service. On switch start up, this service renders rsyslog.conf.j2 and restarts rsyslog service. No extra changes for this feature.
In SONiC, container start up script is generated by rendering docker_image_ctl.j2. In each container start up script, there are 3 phases: `preStartAction`, `start container`, `postStartAction`. In phase `preStartAction`, there is an existing function `updateSyslogConf`. `updateSyslogConf` shall be extended to render rsyslog-container.conf.j2 for both single ASIC and multi ASIC platforms. Flow for `updateSyslogConf`:
Note: Docker supports copying files to a stopped container.
Note: According to testing, the syslog rate limit configuration on the host side does not affect the container side.
The syslog rate limit configuration shall be stored/listened per namespace.
Changes shall be made to sonic-utilities and sonic-buildimage. CLI changes in sonic-utilities are covered by chapter "Configuration and management".
Note: The code in this design document only demonstrates the design idea; it is not production code.
New tables shall be added to CONFIG DB to store the rate limit configuration. init_cfg.json.j2 shall be extended to define the default values for each built-in container.
```
...
"SYSLOG_CONFIG": {
    "GLOBAL": {
        "rate_limit_interval": "0",
        "rate_limit_burst": "0"
    }
},
"SYSLOG_CONFIG_FEATURE": {
{\%- for feature, _, _, _ in features \%}
    "{{feature}}": {
        "rate_limit_interval": "300",
        "rate_limit_burst": "20000"
    }{\%if not loop.last \%},{\% endif -\%}
{\% endfor \%}
}
```
NOTE: An extra backslash is added in front of % in the above code snippet. Remove the backslash while using the actual code in SONiC.
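For readers less familiar with Jinja2, the tables this snippet is meant to generate can be sketched in plain Python. The function name `build_default_syslog_tables` is hypothetical, and it takes a plain list of feature names as an assumption (the template iterates over 4-tuples and uses only the first element).

```python
import json
from typing import Dict, List


def build_default_syslog_tables(features: List[str]) -> Dict[str, dict]:
    """Build the default SYSLOG_CONFIG and SYSLOG_CONFIG_FEATURE tables."""
    return {
        # Host side: 0/0 means rate limiting is disabled by default.
        "SYSLOG_CONFIG": {
            "GLOBAL": {"rate_limit_interval": "0", "rate_limit_burst": "0"}
        },
        # Container side: one entry per built-in feature with the
        # previously hardcoded defaults.
        "SYSLOG_CONFIG_FEATURE": {
            name: {"rate_limit_interval": "300", "rate_limit_burst": "20000"}
            for name in features
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_default_syslog_tables(["bgp", "swss"]), indent=4))
```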
```
...
{\% if SYSLOG_CONFIG is defined \%}
{\% if 'GLOBAL' in SYSLOG_CONFIG \%}
{\% if 'rate_limit_interval' in SYSLOG_CONFIG['GLOBAL']\%}
{\% set rate_limit_interval = SYSLOG_CONFIG['GLOBAL']['rate_limit_interval'] \%}
{\% endif \%}
{\% if 'rate_limit_burst' in SYSLOG_CONFIG['GLOBAL']\%}
{\% set rate_limit_burst = SYSLOG_CONFIG['GLOBAL']['rate_limit_burst'] \%}
{\% endif \%}
{\% endif \%}
{\% endif \%}
{\% if rate_limit_interval is defined \%}
$SystemLogRateLimitInterval {{ rate_limit_interval }}
{\% endif \%}
{\% if rate_limit_burst is defined \%}
$SystemLogRateLimitBurst {{ rate_limit_burst }}
{\% endif \%}
```
NOTE: An extra backslash is added in front of % in the above code snippet. Remove the backslash while using the actual code in SONiC.
```
{\% if SYSLOG_CONFIG_FEATURE is defined \%}
{\% if container_name in SYSLOG_CONFIG_FEATURE \%}
{\% if 'rate_limit_interval' in SYSLOG_CONFIG_FEATURE[container_name]\%}
{\% set rate_limit_interval = SYSLOG_CONFIG_FEATURE[container_name]['rate_limit_interval'] \%}
{\% endif \%}
{\% if 'rate_limit_burst' in SYSLOG_CONFIG_FEATURE[container_name]\%}
{\% set rate_limit_burst = SYSLOG_CONFIG_FEATURE[container_name]['rate_limit_burst'] \%}
{\% endif \%}
{\% endif \%}
{\% endif\%}
{\% if rate_limit_interval is defined \%}
$SystemLogRateLimitInterval {{ rate_limit_interval }}
{\% endif \%}
{\% if rate_limit_burst is defined \%}
$SystemLogRateLimitBurst {{ rate_limit_burst }}
{\% endif \%}
```
NOTE: An extra backslash is added in front of % in the above code snippet. Remove the backslash while using the actual code in SONiC.
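The lookup logic of the container template can be restated in plain Python to make the fallback behavior explicit: a directive is emitted only when the corresponding field exists for the container. This is an equivalent sketch, not the rendering code itself; `render_rate_limit_lines` is a hypothetical name, and `config_db` is assumed to be a dictionary shaped like config_db.json.

```python
from typing import Dict, List


def render_rate_limit_lines(config_db: Dict[str, dict], container_name: str) -> List[str]:
    """Return the rsyslog rate limit directives for one container."""
    # A missing table or a missing container entry yields no directives.
    feature_cfg = config_db.get("SYSLOG_CONFIG_FEATURE", {}).get(container_name, {})
    lines = []
    if "rate_limit_interval" in feature_cfg:
        lines.append(f"$SystemLogRateLimitInterval {feature_cfg['rate_limit_interval']}")
    if "rate_limit_burst" in feature_cfg:
        lines.append(f"$SystemLogRateLimitBurst {feature_cfg['rate_limit_burst']}")
    return lines
```

When a field is absent, no directive is written for it, so in this sketch rsyslogd would fall back to its own built-in behavior for that setting.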
Function `updateSyslogConf` in the template currently only works for multi ASIC platforms; it shall be extended to work for both multi ASIC and single ASIC platforms. The only difference between multi ASIC and single ASIC platforms is the target IP of rsyslog.
- Multi ASIC: target IP of rsyslog is obtained from the docker0 IP, which is already implemented in docker_image_ctl.j2
- Single ASIC: target IP of rsyslog is always 127.0.0.1.
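The target IP selection above amounts to a single branch. The sketch below is illustrative only: `syslog_target_ip` is a hypothetical helper, and how the docker0 IP is actually obtained is left to the existing logic in docker_image_ctl.j2.

```python
from typing import Optional


def syslog_target_ip(is_multi_asic: bool, docker0_ip: Optional[str] = None) -> str:
    """Pick the rsyslog target IP for a container."""
    if not is_multi_asic:
        # Single ASIC: the target is always the loopback address.
        return "127.0.0.1"
    if not docker0_ip:
        raise ValueError("docker0 IP is required on multi ASIC platforms")
    # Multi ASIC: the target is the docker0 bridge IP supplied by the caller.
    return docker0_ip
```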
Meanwhile, the file https://github.com/Azure/sonic-buildimage/blob/master/dockers/docker-base/etc/rsyslog.conf as well as any code processing it shall be removed.
APP extension shall be able to expose its constant syslog capability by adding new fields to manifest:
```
root@sonic:/home/admin# spm show package manifest what-just-happened
{
    ...
    "service": {
        ...
        "syslog": {
            "support-rate-limit": "true"
        }
        ...
    }
    ...
}
```
- support-rate-limit: indicates whether this APP extension supports configuring the rate limit. This field affects CLI behavior: if true, the CLI shall put the rate limit configuration to CONFIG DB; otherwise, the CLI shall reject the configuration.
The capability shall also be saved to CONFIG DB to allow easy access by management plane:
```
root@sonic:/home/admin# cat /etc/sonic/config_db.json
{
    "FEATURE": {
        ...
        "what-just-happened": {
            ...
            "support_syslog_rate_limit": "True",
            ...
        }
        ...
    }
}
```
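The accept/reject behavior of the CLI can be sketched against an in-memory stand-in for CONFIG DB. This is a minimal illustration, not the sonic-utilities implementation: `set_container_rate_limit` is a hypothetical function, and a plain dictionary replaces the real database connector.

```python
from typing import Dict, Optional


def set_container_rate_limit(
    config_db: Dict[str, dict],
    service: str,
    interval: Optional[int] = None,
    burst: Optional[int] = None,
) -> Dict[str, str]:
    """Write a container's rate limit to CONFIG DB if the feature allows it."""
    feature = config_db.get("FEATURE", {}).get(service)
    if feature is None:
        # Containers not registered in the FEATURE table are not supported.
        raise ValueError(f"{service} is not registered in the FEATURE table")
    # The capability defaults to false when the field is absent.
    if feature.get("support_syslog_rate_limit", "false").lower() != "true":
        raise ValueError(f"{service} does not support syslog rate limit configuration")
    entry = config_db.setdefault("SYSLOG_CONFIG_FEATURE", {}).setdefault(service, {})
    # Only the options given on the command line are updated.
    if interval is not None:
        entry["rate_limit_interval"] = str(interval)
    if burst is not None:
        entry["rate_limit_burst"] = str(burst)
    return entry
```

Values are stored as strings here to match the string values shown in the config_db.json examples.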
Also, default value shall be provided to https://github.com/Azure/sonic-utilities/blob/master/sonic_package_manager/service_creator/feature.py:
```
DEFAULT_SYSLOG_FEATURE_CONFIG = {
    'rate_limit_interval': '300',
    'rate_limit_burst': '20000'
}
```
N/A
Config rate limit:
```
config syslog rate-limit-host --interval <interval> --burst <burst>
config syslog rate-limit-container <service_name> --interval <interval> --burst <burst> -n <namespace>
```
Example:
```
config syslog rate-limit-host --interval 300 --burst 20000
config syslog rate-limit-host --interval 300
config syslog rate-limit-host --burst 20000

# Config bgp for all namespaces. For multi ASIC platforms, bgp service in all namespaces will be affected.
# For single ASIC platform, bgp service in global namespace will be affected.
config syslog rate-limit-container bgp --interval 300 --burst 20000

# Config bgp for global namespace only.
config syslog rate-limit-container bgp --interval 300 --burst 20000 -n default

# Config bgp for asic0 namespace only.
config syslog rate-limit-container bgp --interval 300 --burst 20000 -n asic0
```
On multi ASIC platforms, there can be 3 different cases. All 3 cases shall be handled by the syslog rate limit CLIs.
- A service exists in the global namespace only. For multi ASIC platforms, `-n default` shall be used; for single ASIC platforms, `-n default` or no `-n` option shall handle this case.
- A service exists in per ASIC namespaces only. Only supported on multi ASIC platforms; `-n <namespace>` shall be used.
- A service exists in both the global and per ASIC namespaces. Only supported on multi ASIC platforms; no `-n` option shall handle this case.
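One way to picture how the `-n` option maps to target namespaces is the sketch below. It is an assumption-laden illustration: `resolve_namespaces` is a hypothetical helper, and the list of namespaces a service runs in is assumed to be known to the caller.

```python
from typing import List, Optional


def resolve_namespaces(service_namespaces: List[str], namespace_option: Optional[str] = None) -> List[str]:
    """Return the namespaces a rate limit command should act on."""
    if namespace_option is None:
        # No -n option: act on every namespace the service runs in.
        return list(service_namespaces)
    if namespace_option not in service_namespaces:
        raise ValueError(f"service does not run in namespace {namespace_option}")
    # -n <namespace>: act on that namespace only.
    return [namespace_option]
```

For a service running in `default`, `asic0`, and `asic1`, omitting `-n` selects all three, while `-n asic0` selects only `asic0`.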
Note: Setting interval or burst to 0 disables rate limiting.
Show rate limit:
```
show syslog rate-limit-host
show syslog rate-limit-container [<service_name>] -n <namespace>
```
Example:
```
show syslog rate-limit-host
INTERVAL    BURST
----------  --------
500         50000

# Single ASIC
show syslog rate-limit-container
SERVICE   INTERVAL    BURST
--------  ----------  --------
bgp       500         N/A
snmp      300         20000
swss      2000        12000

# Single ASIC
show syslog rate-limit-container bgp
SERVICE   INTERVAL    BURST
--------  ----------  --------
bgp       500         5000

# Multi ASIC
show syslog rate-limit-container
SERVICE   INTERVAL    BURST
--------  ----------  --------
bgp       500         N/A
snmp      300         20000
swss      2000        12000
Namespace asic0:
SERVICE   INTERVAL    BURST
--------  ----------  --------
bgp       500         N/A
snmp      300         20000
swss      2000        12000

# Multi ASIC
show syslog rate-limit-container bgp
SERVICE   INTERVAL    BURST
--------  ----------  --------
bgp       500         5000
Namespace asic0:
SERVICE   INTERVAL    BURST
--------  ----------  --------
bgp       500         5000

# Multi ASIC
show syslog rate-limit-container bgp -n asic1
Namespace asic1:
SERVICE   INTERVAL    BURST
--------  ----------  --------
bgp       500         5000
```
Enable/disable rate limit feature:
```
config syslog rate-limit-feature enable [<service_name>] -n <namespace>
config syslog rate-limit-feature disable [<service_name>] -n <namespace>

# Enable/disable syslog rate limit for all services in all namespaces
config syslog rate-limit-feature enable
config syslog rate-limit-feature disable

# Enable/disable syslog rate limit for all services in global namespace
config syslog rate-limit-feature enable -n default
config syslog rate-limit-feature disable -n default

# Enable/disable syslog rate limit for all services in asic0 namespace
config syslog rate-limit-feature enable -n asic0
config syslog rate-limit-feature disable -n asic0

# Enable/disable syslog rate limit for database in all namespaces
config syslog rate-limit-feature enable database
config syslog rate-limit-feature disable database

# Enable/disable syslog rate limit for database in default namespace
config syslog rate-limit-feature enable database -n default
config syslog rate-limit-feature disable database -n default

# Enable/disable syslog rate limit for database in asic0 namespace
config syslog rate-limit-feature enable database -n asic0
config syslog rate-limit-feature disable database -n asic0
```
```
...
/* table for host side syslog configuration */
container SYSLOG_CONFIG {
    description "SYSLOG_CONFIG part of config_db.json";
    container GLOBAL {
        leaf rate_limit_interval {
            description "Message rate limit interval";
            type uint32 {
                range 0..2147483647;
            }
        }
        leaf rate_limit_burst {
            description "Message rate limit burst";
            type uint32 {
                range 0..2147483647;
            }
        }
    }
    /* end of container GLOBAL */
}
/* end of container SYSLOG_CONFIG */

/* table for container side syslog configuration */
container SYSLOG_CONFIG_FEATURE {
    description "SYSLOG_CONFIG_FEATURE part of config_db.json";
    list SYSLOG_CONFIG_FEATURE_LIST {
        key "service";
        leaf service {
            description "Service name";
            type leafref {
                path "/feature:sonic-feature/feature:FEATURE/feature:FEATURE_LIST/feature:name";
            }
        }
        leaf rate_limit_interval {
            description "Message rate limit interval";
            type uint32 {
                range 0..2147483647;
            }
        }
        leaf rate_limit_burst {
            description "Message rate limit burst";
            type uint32 {
                range 0..2147483647;
            }
        }
    }
    /* end of list SYSLOG_CONFIG_FEATURE_LIST */
}
/* end of container SYSLOG_CONFIG_FEATURE */
```
```
...
container FEATURE {
    description "feature table in config_db.json";
    list FEATURE_LIST {
        ...
        leaf support_syslog_rate_limit {
            description "This configuration indicates if the feature supports configuring syslog rate limit";
            type stypes:boolean_type;
            default "false";
        }
        ...
    }
}
```
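The range constraint in the YANG model (0..2147483647 for both interval and burst) could also be checked early, for example in the CLI before writing to CONFIG DB. The helper below is a hypothetical sketch of such a check, not existing SONiC code.

```python
MAX_RATE_LIMIT_VALUE = 2147483647  # upper bound taken from the YANG range


def validate_rate_limit_value(value: str) -> int:
    """Parse a rate limit value and enforce the YANG range 0..2147483647."""
    number = int(value)  # raises ValueError for non-numeric input
    if not 0 <= number <= MAX_RATE_LIMIT_VALUE:
        raise ValueError(
            f"rate limit value {number} is outside 0..{MAX_RATE_LIMIT_VALUE}"
        )
    return number
```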
containercfgd shall be delayed when warmboot/fastboot is in progress.
Note: according to test, containercfgd does not introduce extra delay for warmboot/fastboot within a proper start delay.
- Containers that are not registered in the FEATURE table are not supported.
- Persisted syslog configuration for the database container is not loaded on container startup; containercfgd syncs the configuration later.
- Configuring the rate limit restarts rsyslogd, which may cause rsyslogd to drop some log messages.
- Verify command "config syslog rate-limit-host"
- Verify command "config syslog rate-limit-container"
- Verify command "show syslog rate-limit-host"
- Verify command "show syslog rate-limit-container"
Two new test cases shall be added:
- Loop each container
- Change the syslog rate limit of current container
- Use a generated script to print log from current container which is fast enough to hit the limit
- Check syslog that some logs are dropped
- Change the syslog rate limit of host
- Use a generated script to print log from host which is fast enough to hit the limit
- Check syslog that some logs are dropped
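The "generated script" used in both test cases could look like the sketch below, which produces the source of a small Python program that logs a burst of messages through the standard `syslog` module. The function name `generate_flood_script`, the tag, and the message text are assumptions for illustration; the real test may generate the script differently.

```python
def generate_flood_script(message_count: int, tag: str = "rate-limit-test") -> str:
    """Return the source of a script that logs `message_count` messages."""
    return (
        "import syslog\n"
        f"syslog.openlog({tag!r})\n"
        f"for i in range({message_count}):\n"
        "    syslog.syslog(syslog.LOG_NOTICE, 'rate limit test message %d' % i)\n"
        "syslog.closelog()\n"
    )


if __name__ == "__main__":
    # Print a script that sends one message more than a burst of 20000.
    print(generate_flood_script(20001))
```

A test would copy the generated script into the container (or run it on the host) with a message count above the configured burst, then check syslog for dropped messages.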