QA: Switch xpack rolling upgrades to three nodes #31112

nik9000 · 2018-06-05T18:47:48Z

This is much more realistic and can find more issues. This causes the
"mixed cluster" tests to be run twice so I had to fix the tests to work
in that case. In most cases I did as little as possible to get them
working but in a few cases I went a little beyond that to make them
easier for me to debug while getting them to work. My test changes:

1. Remove the "basic indexing" tests and replace them with a copy of the
   tests used in the OSS. We have no way of sharing code between these two
   projects so for now I copy.
2. Skip a few tests in the "one third" upgraded scenario:
   * creating a scroll to be reused when the cluster is fully upgraded
   * creating some ml data to be used when the cluster is fully upgraded
3. Drop many "assert yellow and that the cluster has two nodes"
   assertions. These assertions duplicate those made by the wait condition
   and they fail now that we have three nodes.
4. Switch many "assert green and that the cluster has two nodes" to 3
   nodes. These assertions are distinct from the wait condition and, while
   I imagine they aren't required in all cases, now is not the time to
   find that out. Thus, I made them work.
5. Rework the index audit trail test so it is more obvious that it is
   the same test expecting different numbers based on the shape of the
   cluster. The conditions for which number is expected are fairly
   complex because the index audit trail is shut down until the template
   for it is upgraded and the template is upgraded when a master node is
   elected that has the new version of the software.
6. Add some more information to debug the index audit trail test because
   it helped me figure out what was going on.
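
To make the effect of the third node concrete, here is a minimal sketch of the test rounds and checks described in the list above. It is not the actual test code: the function names, the phase names, and the rule "run setup steps only in the last mixed round" are assumptions chosen for illustration (the list only says those steps are skipped in the "one third" round). The `status` and `number_of_nodes` keys mirror the fields of a cluster health response.

```python
# Illustrative sketch only; names and rules here are hypothetical.

def rolling_upgrade_phases(node_count):
    """List (phase_name, upgraded_nodes) for each test round of a rolling upgrade."""
    phases = [("old_cluster", 0)]
    # One "mixed cluster" round per partially upgraded state.
    for upgraded in range(1, node_count):
        phases.append(("mixed_cluster", upgraded))
    phases.append(("upgraded_cluster", node_count))
    return phases


def should_run_setup_step(upgraded_nodes, node_count):
    """Assumed rule: steps that create state for the fully upgraded cluster
    (a scroll, some ml data) run only in the last mixed round."""
    return upgraded_nodes == node_count - 1


STATUS_ORDER = {"red": 0, "yellow": 1, "green": 2}


def health_ok(health, expected_nodes, min_status):
    """Check a cluster-health-like dict for node count and minimum status."""
    return (
        health["number_of_nodes"] == expected_nodes
        and STATUS_ORDER[health["status"]] >= STATUS_ORDER[min_status]
    )


if __name__ == "__main__":
    for name, upgraded in rolling_upgrade_phases(3):
        print(name, upgraded)
```

With two nodes there is a single mixed round; with three there are two, which is why the "mixed cluster" tests now run twice and why any assertion hard-coded to two nodes fails.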

Closes #25336

I also dropped the `waitCondition` from the `rolling-upgrade-basic` tests because it wasn't needed.

elasticmachine · 2018-06-05T18:47:49Z

Pinging @elastic/es-core-infra

ywelsch

LGTM. The index audit trail functionality not being available during a rolling restart does not sound great. Looking at the code, it looks like it's silently dropping events on the floor. Should we raise a separate issue for this?

nik9000 · 2018-06-06T13:41:48Z

> The index audit trail functionality not being available during a rolling restart does not sound great. Looking at the code, it looks like it's silently dropping events on the floor. Should we raise a separate issue for this?

Yeah. #31139.

nik9000 · 2018-06-06T16:01:08Z

There are half a dozen conflicts on the backport cherry pick so I'm marking this backport pending because I expect it'll take some time.

nik9000 · 2018-06-07T13:55:07Z

OK! So I was able to backport to 6.x without much trouble. After I fixed the conflicts everything ran fine so I pushed. 6.3 is not running fine. And this is starting to fail on the master branch in CI but I'm not sure why yet.

nik9000 · 2018-06-07T14:40:46Z

One ray of sunshine, sort of: it looks like the tests were failing in 6.x just like in 6.3 but I didn't catch it locally. That removes one mystery.

I'm skipping those tests in 6.x for now while I figure out what is up with them.

So the state of this backport now is:

* ~~Merged to 6.x but with some tests disabled~~ fixed
* Not merged to 6.3 at all

nik9000 · 2018-06-07T21:03:29Z

The backport to 6.3 was clean! But new and interesting tests fail!

nik9000 · 2018-06-08T16:02:30Z

Backported.

nik9000 added the >test, blocker, review, :Core/Infra/Core, v7.0.0, v6.3.0, and v6.4.0 labels Jun 5, 2018

nik9000 mentioned this pull request Jun 5, 2018

Enable rolling upgrades from default distribution prior to 6.3.0 to default distribution post 6.3.0 #30731

Closed

9 tasks

Explain copy and paste

5f4c90e

ywelsch approved these changes Jun 6, 2018

Merge branch 'master' into xpack_rolling_3

55d5784

nik9000 merged commit 7c59e76 into elastic:master Jun 6, 2018

nik9000 added the backport pending label Jun 6, 2018

nik9000 mentioned this pull request Jun 6, 2018

[CI] IndexAuditUpgradeIT.testDocsAuditedInMixedCluster #30562

Closed

nik9000 removed the backport pending label Jun 8, 2018

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
