[telemetry] Rotate streaming telemetry secrets. #9600
base: master
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,214 @@ | ||
#!/usr/bin/env python3 | ||
|
||
""" | ||
certificate_rotation_checker | ||
|
||
This script will be leveraged to periodically check whether the certificate file | ||
of streaming telemetry was rotated or not. The configuration of streaming telemetry | ||
server process will be reloaded if the certificate file was rotated. | ||
""" | ||
|
||
import os | ||
import signal | ||
import subprocess | ||
import sys | ||
import syslog | ||
import time | ||
|
||
import inotify.adapters | ||
|
||
from swsscommon import swsscommon | ||
|
||
MAX_RETRY_TIMES = 10 | ||
CERTIFICATE_CHECKING_INTERVAL_SECS = 3600 | ||
|
||
CREDENTIALS_DIR_PATH = "/etc/sonic/credentials/" | ||
|
||
|
||
def get_command_result(command): | ||
"""Executes the command and returns the exit code and the resulting output. | ||
|
||
Args: | ||
command: A string containing the command to be executed. | ||
|
||
Returns: | ||
An integer indicating the exit code. | ||
A string containing the output of the command. | ||
""" | ||
command_stdout = "" | ||
command_stderr = "" | ||
|
||
try: | ||
proc_instance = subprocess.Popen(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, | ||
shell=True, universal_newlines=True) | ||
command_stdout, command_stderr = proc_instance.communicate() | ||
except (OSError, ValueError) as err: | ||
syslog.syslog(syslog.LOG_ERR, "Failed to execute the command '{}'. Error: '{}'" | ||
.format(command, err)) | ||
return 2, command_stderr | ||
|
||
return proc_instance.returncode, command_stdout.strip() | ||
|
||
|
||
def get_telemetry_server_info(): | ||
"""Gets telemetry server process information. | ||
|
||
Args: | ||
None. | ||
|
||
Returns: | ||
If telemetry server process is running, returns True and process id; | ||
Otherwise returns False and -1. | ||
""" | ||
processes_status_cmd = "supervisorctl status" | ||
retry_times = 0 | ||
|
||
while retry_times <= MAX_RETRY_TIMES: | ||
retry_times += 1 | ||
exit_code, command_stdout = get_command_result(processes_status_cmd) | ||
if exit_code != 3: | ||
syslog.syslog(syslog.LOG_INFO, | ||
"Failed to get the telemetry server process information and retry after 60 seconds ...") | ||
time.sleep(60) | ||
else: | ||
for line in command_stdout.splitlines(): | ||
if "telemetry" in line and "RUNNING" in line: | ||
return True, line.split()[3].strip(",") | ||
|
||
return False, -1 | ||
|
||
|
||
def reload_telemetry_server_configuration(): | ||
"""Reloads the telemetry server configuration by sending signal 'SIGHUP' | ||
to telemetry server process and checks it is actually running after doing the reload. | ||
|
||
Args: | ||
None | ||
|
||
Returns: | ||
Returns True if the configuration was reloaded successfully; Otherwise, return False. | ||
""" | ||
telemetry_server_pid = -1 | ||
is_running = False | ||
|
||
is_running, telemetry_server_pid = get_telemetry_server_info() | ||
if not is_running: | ||
syslog.syslog(syslog.LOG_ERR, | ||
"Telemetry server process is not running before reloading configuration!") | ||
return False | ||
|
||
syslog.syslog(syslog.LOG_INFO, | ||
"Telemetry server process is running with PID: {}".format(telemetry_server_pid)) | ||
syslog.syslog(syslog.LOG_INFO, "Sending 'SIGHUP' signal to telemetry server process ...") | ||
|
||
os.kill(int(telemetry_server_pid), signal.SIGHUP) | ||
|
||
syslog.syslog(syslog.LOG_INFO, "'SIGHUP' signal was sent out.") | ||
|
||
# Wait for 120 seconds to check whether telemetry server process comes back | ||
time.sleep(120) | ||
|
||
is_running, telemetry_server_pid = get_telemetry_server_info() | ||
if not is_running: | ||
syslog.syslog(syslog.LOG_ERR, | ||
"Telemetry server process is not running after reloading configuration!") | ||
return False | ||
|
||
syslog.syslog(syslog.LOG_INFO, "Telemetry server process is running after reloading configuration!") | ||
return True | ||
|
||
|
||
def check_certificate_rotated(certificate_file_name): | ||
"""Leverages the 'inotify' module to monitor the file system events under the | ||
directory which stores the SONiC credentials and reloads telemetry server | ||
configuration if its certificate was rotated. | ||
|
||
|
||
Args: | ||
certificate_file_name: A string indicates the telemetry certificate file name. | ||
|
||
Returns: | ||
None. | ||
""" | ||
certificate_file_rotated = False | ||
|
||
inotify_instance = inotify.adapters.Inotify() | ||
inotify_instance.add_watch(CREDENTIALS_DIR_PATH) | ||
for event in inotify_instance.event_gen(yield_nones=False): | ||
header, event_type, monitoring_path, file_name = event | ||
if (file_name == certificate_file_name | ||
The FSM logic is complex and may be broken by some input sequences. Could you use one file as the main indicator and always rotate when that file changes?
Updated to use the rotation of the certificate file as the main indicator.
Please make sure to describe the main file in the design document. This is a critical design assumption, and the cert rotator should treat it as a contract.
Updated the design document.
and ("IN_CREATE" in event_type or "IN_MOVED_TO" in event_type)): | ||
certificate_file_rotated = True | ||
|
||
if certificate_file_rotated: | ||
certificate_file_rotated = False | ||
syslog.syslog(syslog.LOG_INFO, | ||
"Certificate was rotated and reloading telemetry server configuration ...") | ||
|
||
# Log success only when the reload actually succeeded. | ||
if reload_telemetry_server_configuration(): | ||
syslog.syslog(syslog.LOG_INFO, "Telemetry server configuration was reloaded successfully!") | ||
else: | ||
syslog.syslog(syslog.LOG_ERR, | ||
"Failed to reload the telemetry server configuration!") | ||
|
||
|
||
def certificate_rotated_checker(): | ||
"""Checks rotation of certificate file and then reloads streaming telemetry server configuration. | ||
|
||
Leverages 'inotify' module to check whether the certificate file of streaming telemetry was | ||
rotated or not. The configuration of telemetry server process will be reloaded if it was rotated. | ||
|
||
Args: | ||
None | ||
|
||
Returns: | ||
None | ||
""" | ||
certificate_file_path = "" | ||
private_key_file_path = "" | ||
certificate_file_name = "" | ||
|
||
config_db = swsscommon.DBConnector("CONFIG_DB", 0) | ||
telemetry_table = swsscommon.Table(config_db, "TELEMETRY") | ||
telemetry_table_keys = telemetry_table.getKeys() | ||
if "certs" in telemetry_table_keys: | ||
certs_info = dict(telemetry_table.get("certs")[1]) | ||
if "server_crt" in certs_info and "server_key" in certs_info: | ||
certificate_file_path = certs_info["server_crt"] | ||
private_key_file_path = certs_info["server_key"] | ||
syslog.syslog(syslog.LOG_INFO, "Path of certificate file is '{}'".format(certificate_file_path)) | ||
syslog.syslog(syslog.LOG_INFO, "Path of private key file is '{}'".format(private_key_file_path)) | ||
else: | ||
syslog.syslog(syslog.LOG_ERR, | ||
"Failed to retrieve the path of certificate or private key file from 'TELEMETRY' table!") | ||
sys.exit(1) | ||
else: | ||
syslog.syslog(syslog.LOG_ERR, | ||
"Failed to retrieve the certificate information from 'TELEMETRY' table!") | ||
sys.exit(2) | ||
|
||
while True: | ||
Updated to check that both files exist.
if not os.path.exists(certificate_file_path) or not os.path.exists(private_key_file_path): | ||
syslog.syslog(syslog.LOG_ERR, | ||
"Certificate or private key file did not exist on device and checks again after '{}' seconds ..." | ||
.format(CERTIFICATE_CHECKING_INTERVAL_SECS)) | ||
time.sleep(CERTIFICATE_CHECKING_INTERVAL_SECS) | ||
else: | ||
break | ||
|
||
certificate_file_name = certificate_file_path.strip().split("/")[-1] | ||
if not certificate_file_name: | ||
syslog.syslog(syslog.LOG_ERR, | ||
"Failed to retrieve the file name of certificate!") | ||
sys.exit(3) | ||
|
||
check_certificate_rotated(certificate_file_name) | ||
|
||
|
||
def main(): | ||
certificate_rotated_checker() | ||
|
||
|
||
if __name__ == "__main__": | ||
main() | ||
sys.exit(0) |
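
A note on the PID lookup in `get_telemetry_server_info()`: it relies on the usual `supervisorctl status` line layout (`name  RUNNING  pid N, uptime ...`) and matches any line containing the substring "telemetry". The following is only a minimal sketch of a stricter parse, assuming that layout; `find_running_pid` and the sample output are hypothetical and are not part of this PR.

```python
import re

# Hypothetical helper, not part of the PR: extracts the PID of a RUNNING
# program from `supervisorctl status` output. Assumes the usual line layout
# "<name>  RUNNING  pid <N>, uptime <H:MM:SS>".
_STATUS_LINE = re.compile(r"^(?P<name>\S+)\s+RUNNING\s+pid\s+(?P<pid>\d+),")


def find_running_pid(status_output, program_name):
    """Return the PID (int) of `program_name` if it is RUNNING, else None."""
    for line in status_output.splitlines():
        match = _STATUS_LINE.match(line.strip())
        # Compare the whole program name instead of a substring.
        if match and match.group("name") == program_name:
            return int(match.group("pid"))
    return None


# Made-up sample output, for illustration only.
sample = (
    "dialout                          RUNNING   pid 41, uptime 0:10:02\n"
    "start                            EXITED    Jan 01 00:00 AM\n"
    "telemetry                        RUNNING   pid 57, uptime 0:10:01\n"
)
print(find_running_pid(sample, "telemetry"))
```

Returning an integer or `None` would also let the caller skip the `int(...)` conversion before `os.kill`, though whether that is worth changing here is a judgment call.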
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,3 @@ | ||
program:telemetry | ||
program:dialout | ||
program:certificate_rollover_checker |
In an extreme case where the file is deleted by a malicious user, will the inotify_instance still work? I think it is linked to the inode, and deleting the file will destroy the inode.
If this is true, a crash is better than a dead loop.

`inotify` is inode based, and it monitors the credentials directory `/etc/sonic/credentials/` to see whether the telemetry certificate file was rotated. If the certificate file is deleted accidentally, the `inotify_instance` will not be affected.
I updated the PR to log an error message if the certificate is deleted. My thinking is that if the certificate is restored later, that can be treated as a kind of `rotation` operation, and the telemetry server will be restarted by this script.

If the certificate file is deleted accidentally, what is the expected behavior?
I think that in this case we could kill the telemetry daemon.
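
The deletion and restoration cases discussed in this thread could be separated by classifying each directory-watch event before acting on it. This is a rough sketch only: `classify_event`, the two event sets, and the `server.crt` file name are hypothetical, and the event-type strings are assumed to be the names the `inotify` package reports (for example `IN_CREATE`). It does not decide what should happen on deletion, which is still open above.

```python
# Hypothetical helper, not part of the PR. Event-type names are assumed to be
# the strings reported by the `inotify` package for a directory watch.
ROTATION_EVENTS = {"IN_CREATE", "IN_MOVED_TO"}
DELETION_EVENTS = {"IN_DELETE", "IN_MOVED_FROM"}


def classify_event(event_types, file_name, certificate_file_name):
    """Return "rotated", "deleted", or None for one directory-watch event."""
    # Only the certificate file is treated as the main indicator.
    if file_name != certificate_file_name:
        return None
    if ROTATION_EVENTS.intersection(event_types):
        return "rotated"
    if DELETION_EVENTS.intersection(event_types):
        return "deleted"
    return None


# Made-up event sequence: the certificate is deleted, an unrelated file
# changes, and then the certificate is moved back into place.
events = [
    (["IN_DELETE"], "server.crt"),
    (["IN_MODIFY"], "other.txt"),
    (["IN_MOVED_TO"], "server.crt"),
]
for types, name in events:
    print(name, classify_event(types, name, "server.crt"))
```

Because the watch is on the directory rather than on the certificate file, a later restore still arrives as a creation or move-in event, which matches the "treat restore as rotation" idea in the reply above.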