Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve certificate renewal #324

Merged
merged 8 commits into from
May 4, 2022
Merged

Conversation

mz-pdm
Copy link
Member

@mz-pdm mz-pdm commented May 2, 2022

This should fix some of the problems that users not paying proper attention to certificate expiration warnings and alerts may experience:

  • Host certificate lifetimes are much longer now.
  • Host certificates are renewed much earlier on host upgrade now.
  • Enrolling host certificates in host upgrade actually works.
  • It is possible to enroll certificates now when a host is non-responsive.
  • ovn-controller is restarted after enrolling certificates.

Read the corresponding commit messages for more details.

Not renewing some of the vmconsole certificates will be handled in a separate PR later.

Related bugs:

Copy link
Member

@michalskrivanek michalskrivanek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, @almusil @erav please approve

mz-pdm added 7 commits May 4, 2022 09:54
Non-web certificates are not subjects to the lifetime limitations
required by web browsers.  VDS certificates must be renewed when the
host is in maintenance, which makes frequent upgrades difficult.
Let’s extend the lifetime of non-web certificates to 5 years (a value
selected by Michal) to make the certificate updates much less
frequent.

Bug-Url: https://bugzilla.redhat.com/2079835
…host-upgrade

Otherwise the whole playbook will stop immediately and host upgrade
fails.  We want to proceed instead and update the certificates.

Bug-Url: https://bugzilla.redhat.com/2079890
Currently, we warn about host certificate expiration 120 days in
advance and alert and perform renewal on host upgrades 30 days in
advance.  These proved to be too late, many users don’t pay
appropriate attention to warnings and alerts and when there is no host
upgrade during the relatively short period, the users may end up with
expired certificates and non-responsive hosts.

Now, when we have extended the lifetime of host certificates to 3650
days, we can warn about host certificate expiration earlier and force
renewal on host upgrades already during the warning period.  This
increases the chances there will be a host upgrade (with certificate
renewal) during the period.  Value of CertExpirationWarnPeriodInDays
is increased to 365.

This change also extends the warning period for all other certificates
except for web certificates.

Bug-Url: https://bugzilla.redhat.com/2079890
When a host certificate expires the host becomes non-responsive.
There is no longer a way to connect to the host via the Vdsm API.
But it would be still possible to renew the certificate if the
operation wasn’t blocked in the web UI.  Let’s permit enrolling
certificates from the menu also for hosts in non-responsive state, to
make life easier for users who ignore warning about upcoming
certificate expiration until it actually happens.

Of course, a host can become non-responsive also for other reasons
than its certificate expiration.  If the users decide to renew
certificates in such states, it’s their own decision and risk.
A host in such a state is not in a good shape anyway and restarting
the services running on it, as needed by the certificate renewal, is
not that much an issue.

Bug-Url: https://bugzilla.redhat.com/2079901
OVN services are not restarted after OVN certificate update.
Unlike other services, which are restarted after enrolling
certificates, these ones are not restarted together with libvirtd
restart and need their own restart action.

Bug-Url: https://bugzilla.redhat.com/2079901
After updating Vdsm certificates, the corresponding services are not
restarted.  Or, actually, they are, but only as a part of
ovirt-host-deploy-vnc-certificates role if those certificates happened
to be updated as well (they usually do), which is not entirely
sufficient.  And ovirt-imageio is not restarted at all.

Let’s make sure the services are always restarted after Vdsm
certificates update.  It’s sufficient to restart libvirtd (it restarts
vdsmd etc. too) and ovirt-imageio (vdsmd wants it started but doesn’t
ensure its restart).

We check for both ‘enabled’ status and ‘running’ states.  Some
services may be stopped before updates, so checking for ‘running’
wouldn’t be sufficient.  And ovirt-imageio may be running but not
enabled, which is probably a bug but it doesn’t harm to check for
both.

This change also means that some services may be restarted more than
once during a certificate update.  Fixing this would need introducing
inter-role service handling, which would be too complicated at the
moment.  Multiple restarts should be usually harmless because this is
normally done in the maintenance mode.

Bug-Url: https://bugzilla.redhat.com/2079901
@mz-pdm
Copy link
Member Author

mz-pdm commented May 4, 2022

Besides the things discussed above, I added another commit ensuring that services get restarted after Vdsm certificates update, including ovirt-imageio, which wasn't restarted in such a case at all, even not by side-effect (like the other services).

Copy link
Member

@mwperina mwperina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, the small comment can be handled in followup patch

Expose VdsCertificateValidityInDays in engine-config to allow to set
custom validity period of hypervisor certificates.

Bug-Url: https://bugzilla.redhat.com/2079890
@mwperina mwperina merged commit dad0d3a into oVirt:master May 4, 2022
@mz-pdm mz-pdm deleted the certificates-all branch May 24, 2022 11:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants