Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

9.3.0: Add missing check for SenderStatusCertMiss and more robust cert fetch #3031

Merged

Conversation

rouming
Copy link
Contributor

@rouming rouming commented Feb 9, 2023

Backport of the #3020

cc: @eriknordmark
cc: @dautovri

FYI: branch probably has to be tagged as 9.3.1, Ruslan, you know better.


Need a careful review around this to look for any other holes where we can miss updating the controllers certificates in EVE, e.g., if there is a network outage. If helpful we can set up a call for that.

The refactoring in
985b142 missed the check for SenderStatusCertMiss.

Given that we didn't detect that in testing, the second commit runs a the fetch periodically (default every 24 hours) as extra safety.

Third commit: Turns out that the first fetch of the certs in zedagent can fail if the network isn't up yet (and the fetch on boot in client.go can be skipped if the device has already been onboarded), thus it makes sense retrying more frequently if that happens. (Note that all of this retry is a second layer of protection should the SenderStatusCertMiss checks be broken again as in the above commit.)
We also add some more logging to be able to observe at least the failures, without logging every periodic fetch.

The forth commit makes sure we clear the cache inside the zedcloud.authen.go code when we update the certs.

The fifth commit is unrelated, but when debugging and testing the above it looked odd that a device was marked as booting when it was running fine using the checkpointed config.

The refactoring in
lf-edge@985b142
missed the check for SenderStatusCertMiss.

Signed-off-by: eriknordmark <erik@zededa.com>
Run a periodic fetch of the controller certs (default 24 hours) for
additional safety should we experience multiple failures. Setable using
timer.cert.interval. Add/increase logging for the controller cert
fetches.

Signed-off-by: eriknordmark <erik@zededa.com>
It can fail if there is a network outage when zedagent starts this task.
We run with a short timer when there is a failure, and revert to the
longer timer once we have fetched the initial, and after a trigger or
periodic, succeeded in the fetch.

Signed-off-by: eriknordmark <erik@zededa.com>
Signed-off-by: eriknordmark <erik@zededa.com>
Signed-off-by: eriknordmark <erik@zededa.com>
Signed-off-by: eriknordmark <erik@zededa.com>
Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
@rouming rouming force-pushed the 9.3.0-missing-check-for-SenderStatusCertMiss branch from 60ebce8 to 0f659cc Compare February 10, 2023 11:31
parseControllerCerts already checks if there is any change in newly
fetched controller certificates. We can therefore reuse that code
instead of reloading saved signing certificate.
Plus I'm not sure if we can simply skip parseControllerCerts
when we detect that signing certificate has not changed - what if some
other certificate of a different type has changed? Any cert update
should be published to pubControllerCert.
It is better to do some redundant save of the signing cert (when some
other cert have changed, not the one for signing), then to forget
to publish a cert update.

Signed-off-by: Milan Lenco <milan@zededa.com>
Signed-off-by: Roman Penyaev <r.peniaev@gmail.com>
Copy link
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@eriknordmark eriknordmark merged commit d9278b4 into lf-edge:9.3 Feb 14, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants