-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reload certificate periodically #280
Conversation
apiserver/src/api/mod.rs
Outdated
"Certificate has been renewed, restarting server to reload new certificate" | ||
); | ||
server.stop(true).await; | ||
std::process::exit(0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like to see a process exit buried so deeply in a program. Ideally you would return a bool (or enum) that would say whether or not an exit is needed. Bubble that all the way up to main if possible. Ideally only main should have process exits IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After testing, I think we don't need process exit
since server.stop(true).await;
will restart apiserver. I'll just remove this std::process::exit(0)
. thanks
rebase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! 🕺🏼
models/src/node/mod.rs
Outdated
pub mod mod_error { | ||
use snafu::Snafu; | ||
|
||
#[derive(Debug, Snafu)] | ||
#[snafu(visibility(pub))] | ||
pub enum Error { | ||
#[snafu(display( | ||
"IO error occurred while attempting to use APIServerClient: '{}'", | ||
source | ||
))] | ||
IOError { source: Box<dyn std::error::Error> }, | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this error be better placed in the node/error.rs
file with the rest of this modules errors? (also see comment above where we seem to already have a Result type for this module
models/src/node/mod.rs
Outdated
use std::io::Read; | ||
|
||
/// The module-wide result type. | ||
type Result<T> = std::result::Result<T, mod_error::Error>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this already defined here?
pub type Result<T> = std::result::Result<T, Error>; |
apiserver/src/api/mod.rs
Outdated
@@ -101,6 +104,9 @@ pub async fn run_server<T: 'static + BottlerocketShadowClient>( | |||
k8s_client: kube::Client, | |||
prometheus_exporter: Option<opentelemetry_prometheus::PrometheusExporter>, | |||
) -> Result<()> { | |||
let public_key_path = format!("{}/{}", TLS_KEY_MOUNT_PATH, PUBLIC_KEY_NAME); | |||
let certificate_cache = read_certificate(&public_key_path).unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's replace this unwrap
with a structured error in the API module (reference: bottlerocket-os/bottlerocket@5a58d4b)
apiserver/src/api/mod.rs
Outdated
// Certificate is refreshed periodically (default 60 days). Once certificate is renewed, apiserver | ||
// needs to stop the server for reloading new certificate. | ||
// We cache the certificate initially when brupop starts server, and compare to update-to-date certificate periodically. | ||
// If they aren't match, we recognize it as new certificate is coming out, so the server needs to be restarted. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few nits here:
// Certificate is refreshed periodically (default 60 days). Once certificate is renewed, apiserver | |
// needs to stop the server for reloading new certificate. | |
// We cache the certificate initially when brupop starts server, and compare to update-to-date certificate periodically. | |
// If they aren't match, we recognize it as new certificate is coming out, so the server needs to be restarted. | |
// The certificate is refreshed periodically (default 60 days). Once the certificate is renewed, the apiserver | |
// needs to stop in order to reload the new certificate. | |
// We cache the certificate initially when brupop starts the server, and compare it to the update-to-date certificate periodically. | |
// If they don't match, we recognize it as a new certificate, so the server needs to be restarted. |
push above address @jpmcb comments |
when cert-manager rotates the certificate brupop-apiserver-certificate (which by default will be after 60 days), brupop apiserver needs to reload certificate to bounce all the agent and APIServer pods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🥇
// We cache the certificate initially when brupop starts the server, and compare it to the update-to-date certificate periodically. | ||
// If they don't match, we recognize it as a new certificate, so the server needs to be restarted. | ||
async fn reload_certificate( | ||
server: Server, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to comment on an old PR, but I was catching up on some of the stuff that's changed in brupop!
I wonder if it makes more sense to make this a mutable reference? It seems unusual to clone it. Although, I suppose if it works, Actix must have done some work to make sure that the interior state of the Server is thread-safe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually find out that using https://github.com/notify-rs/notify to watch the fs change and reload the certificate is what other people like to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cbgbt - This has actually changed recently: 561e6c0#diff-5cad36d03cb8802a9c5a0a20ae2e841d683e1cd564146c441e2940fb1ecf652eR255
A few APIs changed in Actix so cloning the server and using it directly wasn't a great option. Instead, now, we can use a actix_web::dev::ServerHandle
as a "server handler" to call into it and stop it like:
server_handler.stop(true).await;
when cert-manager rotates the certificate brupop-apiserver-certificate (which by default will be after 60 days), brupop apiserver needs to reload certificate to bounce all the agent and APIServer pods.
Issue number:
Closes: #233
Description of changes:
Add a mechanism to monitor if certificate has been renewed. If yes, stop servers and apiserver will be restarted to reload new certificate. Agent doesn't need do anything since the logic in brupop that agent will always use current certificate on volume to talk with apiserver.
Testing done:
Terms of contribution:
By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.