[discuss] Support for rolling upgrades #41795

Closed
tylersmalley opened this issue Jul 23, 2019 · 6 comments
Labels
discuss Team:Operations Team label for Operations Team

@tylersmalley
Contributor

Much like with Elasticsearch, users should be able to perform a rolling upgrade of Kibana. Support will be limited because Kibana has no true cluster state, but we should be able to greatly improve the experience.

When Kibana is upgraded, it performs any pending migrations as described here. This creates a new index and, once the migrations complete, points the .kibana alias to the new index. At this point, it's important that previous versions of Kibana not make any mutations to this index.
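The migrate-then-swap step can be sketched with a hypothetical in-memory model. `Cluster`, `migrate`, and the `.kibana_1`/`.kibana_2` index names are illustrative assumptions, not Kibana's actual migration code; in Elasticsearch the final step would be an alias update rather than a dictionary assignment. The last lines show why old versions must stop writing: anything they write to the previous index after the swap is invisible through the alias.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Cluster:
    """Hypothetical in-memory stand-in for Elasticsearch indices and aliases."""
    indices: dict[str, list[dict]] = field(default_factory=dict)
    aliases: dict[str, str] = field(default_factory=dict)  # alias -> index


def migrate(cluster: Cluster, alias: str, new_index: str,
            transform: Callable[[dict], dict]) -> None:
    """Copy and transform every document into `new_index`, then repoint `alias`."""
    old_index = cluster.aliases[alias]
    cluster.indices[new_index] = [transform(doc) for doc in cluster.indices[old_index]]
    # The alias moves only after every document has been migrated.
    cluster.aliases[alias] = new_index


cluster = Cluster(indices={".kibana_1": [{"id": 1}]}, aliases={".kibana": ".kibana_1"})
migrate(cluster, ".kibana", ".kibana_2", lambda doc: {**doc, "migrated": True})

# An old instance that keeps writing to .kibana_1 now writes to an index
# nobody reads through the alias, so the change is silently lost.
cluster.indices[".kibana_1"].append({"id": 2})
print(cluster.aliases[".kibana"])  # .kibana_2
print(len(cluster.indices[".kibana_2"]))  # 1
```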

Here is the change I am proposing:

On startup, the Kibana server will resolve the alias to its underlying index and read from and write to that index directly, rather than going through the alias. The first step of the migration process is to put this index into a read-only state, preventing writes. In the UI, we can poll and notify the user in two possible scenarios: if the current index no longer matches the alias's underlying index, or if the index is read-only. This allows a newer version of Kibana to be stood up while the existing instance remains functional in a read-only state. We can extend the health check to include these additional checks to assist with automation on a load balancer.
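A minimal sketch of this proposal, assuming a hypothetical in-memory `Cluster` in place of Elasticsearch (where a write block would correspond to the `index.blocks.write` index setting). `KibanaInstance`, `start_migration`, and the `health()` fields are illustrative names, not an existing Kibana API. The instance pins itself to the concrete index at startup, and `health()` reports the two conditions the UI would poll for.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical in-memory stand-in for Elasticsearch indices and aliases."""
    indices: dict[str, list[dict]] = field(default_factory=dict)
    aliases: dict[str, str] = field(default_factory=dict)  # alias -> index
    read_only: set[str] = field(default_factory=set)  # indices with writes blocked


class KibanaInstance:
    """Hypothetical Kibana server that pins itself to one concrete index."""

    def __init__(self, cluster: Cluster, alias: str = ".kibana") -> None:
        self.cluster = cluster
        self.alias = alias
        # Resolve the alias once at startup; later reads/writes use this index.
        self.index = cluster.aliases[alias]

    def read(self) -> list[dict]:
        return list(self.cluster.indices[self.index])

    def write(self, doc: dict) -> None:
        if self.index in self.cluster.read_only:
            raise PermissionError(f"{self.index} is read-only; a newer Kibana is migrating")
        self.cluster.indices[self.index].append(doc)

    def health(self) -> dict:
        # The two conditions the UI would poll for, also usable by a load balancer.
        outdated = self.cluster.aliases[self.alias] != self.index
        read_only = self.index in self.cluster.read_only
        return {
            "outdated": outdated,
            "read_only": read_only,
            "status": "degraded" if outdated or read_only else "green",
        }


def start_migration(cluster: Cluster, alias: str, new_index: str) -> None:
    old = cluster.aliases[alias]
    cluster.read_only.add(old)  # first step: block writes to the current index
    cluster.indices[new_index] = [dict(doc) for doc in cluster.indices[old]]
    cluster.aliases[alias] = new_index
```

After `start_migration`, the old instance can still `read()`, its `write()` raises, and its `health()` reports `"degraded"`, which a load balancer could use to drain it while a newly started instance (pinned to the new index) reports `"green"`.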

Down the road, when we have a cluster state, it should be possible to re-route requests to the most recent version of Kibana.

@tylersmalley tylersmalley added discuss Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc Team:Operations Team label for Operations Team labels Jul 23, 2019
@elasticmachine
Contributor

Pinging @elastic/kibana-platform

@elasticmachine
Contributor

Pinging @elastic/kibana-operations

@tylersmalley tylersmalley removed the Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc label Jul 23, 2019
@epixa
Contributor

epixa commented Jul 23, 2019

Security permissions are another critical area that would need to be addressed to support this. Today, when Kibana starts up, it pushes into Elasticsearch all the privileges necessary to support that exact version of Kibana, which can cause two problems in a rolling upgrade scenario:

  1. An older instance might behave unexpectedly if its underlying permission model is wiped out.
  2. An older instance restarting after a newer instance is brought up will override the newer instance's permission model.

cc @elastic/kibana-security
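To make the two failure modes concrete, here is a toy model of "startup replaces the whole privilege set". `SecurityStore`, `KibanaNode`, the version strings, and the action names are all hypothetical; this does not call or imitate any Elasticsearch security API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SecurityStore:
    """Hypothetical stand-in for the privilege definitions stored in Elasticsearch."""
    registered_by: str | None = None
    actions: set[str] = field(default_factory=set)


@dataclass
class KibanaNode:
    version: str
    actions: set[str]  # actions this version needs in order to authorize users

    def start(self, store: SecurityStore) -> None:
        # Behavior as described above: startup replaces the whole privilege set.
        store.registered_by = self.version
        store.actions = set(self.actions)

    def can_authorize(self, store: SecurityStore, action: str) -> bool:
        return action in self.actions and action in store.actions


store = SecurityStore()
old = KibanaNode("7.3", {"dashboard:read", "legacy:read"})
new = KibanaNode("7.4", {"dashboard:read", "canvas:read"})

old.start(store)
new.start(store)  # rolling upgrade: the new version comes up
print(old.can_authorize(store, "legacy:read"))  # False (problem 1)

old.start(store)  # the old instance restarts
print(new.can_authorize(store, "canvas:read"))  # False (problem 2)
```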

@kobelb
Contributor

kobelb commented Jul 24, 2019

The way it works presently, the last version of Kibana to start up wins and essentially locks out all other versions from being able to authorize users. Are we only concerned with supporting "rolling upgrades", or should we be concerned with supporting potential roll-backs as well?

@LeeDr

LeeDr commented Dec 3, 2019

When the new version of Kibana starts up, it should go through the migration process (writing directly to the new index and not changing the alias yet).

If something fails, abort and log a detailed message. The existing version keeps running.

If the migration succeeds, it compares the timestamp of the most recent change to the old index (this might be something new we need?) to the timestamp when the migration started.

  • If there were no changes to the old index since the migration started, we know we can swap the alias to point to the new migrated index and shut down the old Kibana version (and change the proxy redirect on Cloud).

  • If there were changes to the old index since the migration started, the new version should log how long the migration took and try again. If new writes keep arriving on the old index after some number of attempts, at some point we (a manual administrator action, maybe?) need to switch it to read-only.
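The procedure above can be sketched as a retry loop. This is a minimal sketch under stated assumptions: the `Cluster` model, `migrate_until_quiet`, and the `during_copy` hook (which only simulates the old version writing mid-copy) are hypothetical, and a logical write counter stands in for the per-index "most recent change" timestamp the comment notes may not exist yet, so the comparison is deterministic. After `max_attempts` busy passes it makes the old index read-only and runs one final pass, which is one possible reading of the "at some point" step.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass, field
from typing import Callable

log = logging.getLogger("kibana.migration")  # hypothetical logger name


@dataclass
class Cluster:
    """Hypothetical in-memory stand-in for the Kibana indices in Elasticsearch."""
    indices: dict[str, list[dict]] = field(default_factory=dict)
    aliases: dict[str, str] = field(default_factory=dict)
    read_only: set[str] = field(default_factory=set)
    clock: int = 0  # logical clock, bumped on every write
    last_write: dict[str, int] = field(default_factory=dict)  # index -> clock of latest write

    def write(self, index: str, doc: dict) -> None:
        if index in self.read_only:
            raise PermissionError(f"{index} is read-only")
        self.clock += 1
        self.indices[index].append(doc)
        self.last_write[index] = self.clock


def migrate_until_quiet(cluster: Cluster, alias: str, new_index: str,
                        transform: Callable[[dict], dict], max_attempts: int = 3,
                        during_copy: Callable[[int], None] = lambda attempt: None) -> int:
    """Migrate into `new_index` without touching `alias`; swap the alias only if
    the old index saw no writes since the pass began. Returns the pass count."""
    old = cluster.aliases[alias]
    for attempt in range(1, max_attempts + 2):
        if attempt > max_attempts:
            # Writes kept arriving: block them so the final pass is consistent.
            log.warning("old index %s still changing; making it read-only", old)
            cluster.read_only.add(old)
        started = cluster.clock
        try:
            migrated = [transform(doc) for doc in cluster.indices[old]]
        except Exception:
            # Abort: the alias still points at the old index, so the old version keeps running.
            log.exception("migration into %s failed on pass %d", new_index, attempt)
            raise
        during_copy(attempt)  # demo hook: the old version writing while the copy runs
        cluster.indices[new_index] = migrated
        if cluster.last_write.get(old, 0) <= started:
            cluster.aliases[alias] = new_index  # nothing changed since the pass began
            return attempt
        log.info("pass %d saw new writes to %s; retrying", attempt, old)
    raise RuntimeError("old index kept changing even while read-only")


c = Cluster(indices={".kibana_1": []}, aliases={".kibana": ".kibana_1"})
c.write(".kibana_1", {"id": 1})


def old_version_keeps_writing(attempt: int) -> None:
    try:
        c.write(".kibana_1", {"id": 100 + attempt})
    except PermissionError:
        pass  # the old index has been made read-only


passes = migrate_until_quiet(c, ".kibana", ".kibana_2", lambda d: {**d, "v": 2},
                             max_attempts=2, during_copy=old_version_keeps_writing)
print(passes, c.aliases[".kibana"])  # 3 .kibana_2
```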

@rudolf
Contributor

rudolf commented Dec 14, 2020

From #52202:

Note: Rolling upgrades introduce significant complexity for plugins and risk of bugs. We assume that as long as the downtime window is predictable, downtime as such is not a problem for our users. Since this allows us to have a dramatically simpler system we won't aim to implement rolling upgrades unless this assumption is proven wrong.

@rudolf rudolf closed this as completed Dec 14, 2020
6 participants