QA: Switch xpack rolling upgrades to three nodes #31112

Merged
merged 3 commits into from
Jun 6, 2018

nik9000
Member

@nik9000 nik9000 commented Jun 5, 2018

This is much more realistic and can find more issues. This causes the
"mixed cluster" tests to be run twice so I had to fix the tests to work
in that case. In most cases I did as little as possible to get them
working but in a few cases I went a little beyond that to make them
easier for me to debug while getting them to work. My test changes:

  1. Remove the "basic indexing" tests and replace them with a copy of the
    tests used in the OSS project. We have no way of sharing code between
    these two projects so for now I copy.
  2. Skip a few tests in the "one third" upgraded scenario:
  • creating a scroll to be reused when the cluster is fully upgraded
  • creating some ml data to be used when the cluster is fully upgraded
  3. Drop many "assert yellow and that the cluster has two nodes"
    assertions. These assertions duplicate those made by the wait condition
    and they fail now that we have three nodes.
  4. Switch many "assert green and that the cluster has two nodes"
    assertions to three nodes. These assertions are distinct from the wait
    condition and, while I imagine they aren't required in all cases, now
    is not the time to find that out. Thus, I made them work.
  5. Rework the index audit trail test so it is more obvious that it is
    the same test expecting different numbers based on the shape of the
    cluster. The conditions for which number is expected are fairly
    complex because the index audit trail is shut down until the template
    for it is upgraded and the template is upgraded when a master node is
    elected that has the new version of the software.
  6. Add some more information to debug the index audit trail test because
    it helped me figure out what was going on.
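
To illustrate why going to three nodes makes the "mixed cluster" tests run twice, here is a rough sketch of the upgrade stages. It is not the build's actual code: `Stage`, `rolling_upgrade_stages`, and `runs_setup_for_upgraded_cluster` are hypothetical names, and the skip rule only mirrors item 2 above under the assumption that the setup steps should run in the last mixed round.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Stage:
    """One round of the test suite, identified by how many nodes run the new version."""
    name: str
    upgraded_nodes: int
    total_nodes: int

    @property
    def is_mixed(self) -> bool:
        # Mixed means some, but not all, nodes have been upgraded.
        return 0 < self.upgraded_nodes < self.total_nodes


def rolling_upgrade_stages(total_nodes: int) -> List[Stage]:
    """Return one stage per upgraded-node count, from all-old to all-new."""
    stages = []
    for upgraded in range(total_nodes + 1):
        if upgraded == 0:
            name = "old_cluster"
        elif upgraded == total_nodes:
            name = "upgraded_cluster"
        else:
            name = f"mixed_cluster_{upgraded}_of_{total_nodes}"
        stages.append(Stage(name, upgraded, total_nodes))
    return stages


def runs_setup_for_upgraded_cluster(stage: Stage) -> bool:
    """Hypothetical skip rule: create the scroll and ml data only in the last
    mixed round, so they are created exactly once before the final stage."""
    return stage.is_mixed and stage.upgraded_nodes == stage.total_nodes - 1
```

With two nodes there is a single mixed stage; with three there are two ("one third" and "two thirds" upgraded), which is why tests that create state for the fully upgraded cluster have to be skipped in one of them.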

Closes #25336

I also dropped the `waitCondition` from the `rolling-upgrade-basic`
tests because it wasn't needed.
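
For readers unfamiliar with wait conditions, the sketch below shows the kind of check one might perform against the cluster health API. It is an assumption about the general shape, not the build's actual `waitCondition`: the helper names, the base URL, and the `70s` timeout are made up for illustration. Only the `_cluster/health` endpoint, its `wait_for_nodes`, `wait_for_status`, and `timeout` parameters, and the `status`, `number_of_nodes`, and `timed_out` response fields come from Elasticsearch itself.

```python
from urllib.parse import urlencode

# Health statuses ordered from worst to best.
_STATUS_RANK = {"red": 0, "yellow": 1, "green": 2}


def health_wait_url(base_url, nodes, status, timeout="70s"):
    """Build a cluster health URL that blocks until the cluster has the given
    node count and at least the given status, or until the timeout expires."""
    query = urlencode({
        "wait_for_nodes": nodes,
        "wait_for_status": status,
        "timeout": timeout,
    })
    return f"{base_url}/_cluster/health?{query}"


def health_satisfies(response, nodes, status):
    """Check a parsed health response against the expected node count and
    minimum status. A timed-out response never satisfies the condition."""
    return (
        not response.get("timed_out", False)
        and response["number_of_nodes"] == nodes
        and _STATUS_RANK[response["status"]] >= _STATUS_RANK[status]
    )
```

A test that then asserts "yellow and N nodes" repeats what such a wait already guarantees, whereas an "assert green" check is stricter than a wait for yellow.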

@nik9000 added the >test, blocker, review, :Core/Infra/Core, v7.0.0, v6.3.0, and v6.4.0 labels on Jun 5, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-core-infra

Contributor

@ywelsch ywelsch left a comment

LGTM. The index audit trail functionality not being available during a rolling restart does not sound great. Looking at the code, it looks like it's silently dropping events on the floor. Should we raise a separate issue for this?

@nik9000
Member Author

nik9000 commented Jun 6, 2018

> The index audit trail functionality not being available during a rolling restart does not sound great. Looking at the code, it looks like it's silently dropping events on the floor. Should we raise a separate issue for this?

Yeah. #31139.
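
As a way to picture the behavior discussed here, the toy model below follows the PR description: events are only indexed once the audit trail template has been upgraded, and the template is upgraded when a master running the new version is elected. It is a simplification seen from an upgraded node and not the actual index audit trail code; `AuditTrailModel` and its methods are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AuditTrailModel:
    """Toy model of an upgraded node's index audit trail during a rolling upgrade."""
    template_upgraded: bool = False
    indexed: List[str] = field(default_factory=list)
    dropped: int = 0

    def on_master_elected(self, master_is_new_version: bool) -> None:
        # Assumption taken from the PR description: the template is upgraded
        # when a master node with the new version is elected.
        if master_is_new_version:
            self.template_upgraded = True

    def record(self, event: str) -> None:
        # Until the template is upgraded the trail is shut down, so the
        # event is counted as dropped rather than indexed.
        if self.template_upgraded:
            self.indexed.append(event)
        else:
            self.dropped += 1
```

In this model the number of indexed events depends on when a new-version master is elected, which is the reason the test expects different numbers for different cluster shapes.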

@nik9000 nik9000 merged commit 7c59e76 into elastic:master Jun 6, 2018
@nik9000
Member Author

nik9000 commented Jun 6, 2018

There are half a dozen conflicts on the backport cherry pick so I'm marking this backport pending because I expect it'll take some time.

nik9000 added a commit that referenced this pull request Jun 6, 2018
@nik9000
Member Author

nik9000 commented Jun 7, 2018

OK! So I was able to backport to 6.x without much trouble. After I fixed the conflicts everything ran fine so I pushed. 6.3 is not running fine. And this is starting to fail on the master branch in CI but I'm not sure why yet.

@nik9000
Member Author

nik9000 commented Jun 7, 2018

One ray of sunshine, sort of: it looks like the tests were failing in 6.x just like in 6.3 but I didn't catch it locally. That removes one mystery.

I'm skipping those tests in 6.x for now while I figure out what is up with them.

So the state of this backport now is:

  • Merged to 6.x, but with some tests disabled (since fixed)
  • Not merged to 6.3 at all

@nik9000
Member Author

nik9000 commented Jun 7, 2018

The backport to 6.3 was clean! But new and interesting tests fail!

nik9000 added a commit that referenced this pull request Jun 8, 2018
@nik9000
Member Author

nik9000 commented Jun 8, 2018

Backported.

Labels
blocker, :Core/Infra/Core, >test, v6.3.0, v6.4.0, v7.0.0-beta1
Development

Successfully merging this pull request may close these issues.

Rolling upgrade tests with 3 nodes
4 participants