This repository has been archived by the owner on Apr 4, 2023. It is now read-only.

Stalled ElasticSearch Upgrade #368

Open
cehoffman opened this issue Jul 2, 2018 · 0 comments

Is this a BUG REPORT or FEATURE REQUEST?:

/kind bug

What happened:

An Elasticsearch upgrade from 6.2.4 to 6.3.0 stalled with the last two data and ingest pods not upgraded. The 3 masters upgraded, then pods 4, 3, and 2 of the data and ingest pool upgraded. Pods 1 and 0 did not upgrade, and the UpdateVersion loop in the navigator controller stopped.

What you expected to happen:

All pods upgraded.

How to reproduce it (as minimally and precisely as possible):

Create a 5-member data and ingest pool and a 3-member master pool at 6.2.4 with navigator 0.1.0, then update the cluster version to 6.3.0.

Anything else we need to know?:

It appears there was a mix-up in how the pilots updated the Elasticsearch version. See https://gist.github.com/34927d24d0056967aba99c2f5a29ba7e

The d-0 and d-1 pilots indicate they are running Elasticsearch 6.3.0, but their pods never changed images. Seconds before this gist was captured (while the upgrade was already stalled), pilots d-0 and d-3 indicated they had version 6.2.4. That would be correct for d-0, but d-3 was using the 6.3.0 image.

It appears there is a mismatch in how the Elasticsearch version on the pilot record is updated or detected.
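
To make the mismatch concrete, here is a minimal sketch of the check I was doing by hand: compare the version of the image each pod is actually running against the version its pilot record reports. The `PodVersion` type, the `find_mismatches` helper, and the snapshot values are all hypothetical. The snapshot is modeled on the description above and is not copied from the gist, and the d-2 row is assumed to be consistent.

```python
from typing import List, NamedTuple


class PodVersion(NamedTuple):
    name: str           # pod name, e.g. "es-logging-d-0"
    image_version: str  # version tag of the image the pod is actually running
    pilot_version: str  # version reported on the matching pilot record


def find_mismatches(pods: List[PodVersion]) -> List[PodVersion]:
    """Return pods whose pilot-reported version differs from the image version."""
    return [p for p in pods if p.image_version != p.pilot_version]


# Hypothetical snapshot modeled on the stalled state described in this issue.
snapshot = [
    PodVersion("es-logging-d-0", "6.2.4", "6.3.0"),
    PodVersion("es-logging-d-1", "6.2.4", "6.3.0"),
    PodVersion("es-logging-d-2", "6.3.0", "6.3.0"),  # assumed consistent
]

for pod in find_mismatches(snapshot):
    print(f"{pod.name}: image {pod.image_version}, pilot reports {pod.pilot_version}")
```

In this sketch d-0 and d-1 are flagged, which matches the two pods that never changed images.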

The events in the describe summary for the cluster are:

Events:
  Type     Reason            Age                From                  Message
  ----     ------            ----               ----                  -------
  Normal   UpdateVersion     40m (x2 over 1h)   navigator-controller  Updating replica es-logging-master-1 to version 6.3.0
  Warning  ErrUpdateVersion  37m (x8 over 37m)  navigator-controller  Pilot "es-logging-master-1" has not finished updating to version "6.3.0"
  Normal   UpdateVersion     31m (x3 over 1h)   navigator-controller  Updating replica es-logging-master-2 to version 6.3.0
  Normal   UpdateVersion     29m                navigator-controller  Updated node pool "master" to version "6.3.0"
  Normal   UpdateVersion     24m                navigator-controller  Updating replica es-logging-d-2 to version 6.3.0

The repeated failures on the master upgrade happened because ES_JAVA_OPTS does not work as an override in the container image with the 0.1.0 pilot. I had to set the min/max heap in the jvm.options file instead.
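
For reference, a heap setting in jvm.options takes the form below. The 2g value is only a placeholder and not the size used in this cluster. `-Xms` and `-Xmx` are the standard JVM flags for minimum and maximum heap size.

```
# jvm.options fragment (heap size is an illustrative placeholder)
-Xms2g
-Xmx2g
```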

Environment:

  • Kubernetes version (use kubectl version): 1.9.6
  • Cloud provider or hardware configuration: Azure
  • Install tools: Helm
  • Others: