-
Notifications
You must be signed in to change notification settings - Fork 24.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
QA: Switch xpack rolling upgrades to three nodes #31112
Conversation
This is much more realistic and can find more issues. This causes the "mixed cluster" tests to be run twice so I had to fix the tests to work in that case. In most cases I did as little as possible to get them working but in a few cases I went a little beyond that to make them easier for me to debug while getting them to work. My test changes: 1. Remove the "basic indexing" tests and replace them with a copy of the tests used in the OSS. We have no way of sharing code between these two projects so for now I copy. 2. Skip the a few tests in the "one third" upgraded scenario: * creating a scroll to be reused when the cluster is fully upgraded * creating some ml data to be used when the cluster is fully ugpraded 3. Drop many "assert yellow and that the cluster has two nodes" assertions. These assertions duplicate those made by the wait condition and they fail now that we have three nodes. 4. Switch many "assert green and that the cluster has two nodes" to 3 nodes. These assertions are unique from the wait condition and, while I imagine they aren't required in all cases, now is not the time to find that out. Thus, I made them work. 5. Rework the index audit trail test so it is more obvious that it is the same test expecting different numbers based on the shape of the cluster. The conditions for which number are expected are fairly complex because the index audit trail is shut down until the template for it is upgraded and the template is upgraded when a master node is elected that has the new version of the software. 6. Add some more information to debug the index audit trail test because it helped me figure out what was going on. I also dropped the `waitCondition` from the `rolling-upgrade-basic` tests because it wasn't needed. Closes elastic#25336
Pinging @elastic/es-core-infra |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. The index audit trail functionality not being available during a rolling restart does not sound great. Looking at the code, it looks like it's silently dropping events on the floor. Should we raise a separate issue for this?
Yeah. #31139. |
There are half a dozen conflicts on the backport cherry pick so I'm marking this |
This is much more realistic and can find more issues. This causes the "mixed cluster" tests to be run twice so I had to fix the tests to work in that case. In most cases I did as little as possible to get them working but in a few cases I went a little beyond that to make them easier for me to debug while getting them to work. My test changes: 1. Remove the "basic indexing" tests and replace them with a copy of the tests used in the OSS. We have no way of sharing code between these two projects so for now I copy. 2. Skip the a few tests in the "one third" upgraded scenario: * creating a scroll to be reused when the cluster is fully upgraded * creating some ml data to be used when the cluster is fully ugpraded 3. Drop many "assert yellow and that the cluster has two nodes" assertions. These assertions duplicate those made by the wait condition and they fail now that we have three nodes. 4. Switch many "assert green and that the cluster has two nodes" to 3 nodes. These assertions are unique from the wait condition and, while I imagine they aren't required in all cases, now is not the time to find that out. Thus, I made them work. 5. Rework the index audit trail test so it is more obvious that it is the same test expecting different numbers based on the shape of the cluster. The conditions for which number are expected are fairly complex because the index audit trail is shut down until the template for it is upgraded and the template is upgraded when a master node is elected that has the new version of the software. 6. Add some more information to debug the index audit trail test because it helped me figure out what was going on. I also dropped the `waitCondition` from the `rolling-upgrade-basic` tests because it wasn't needed. Closes #25336
OK! So I was able to backport to 6.x without much trouble. After I fixed the conflicts everything ran fine so I pushed. 6.3 is not running fine. And this is starting to fail on the master branch in CI but I'm not sure why yet. |
One ray of sunshine, sort of: it looks like the tests were failing in 6.x just like in 6.3 but I didn't catch it locally. That removes one mystery. I'm skipping those tests in 6.x for now while I figure out what is up with them. So the state of this backport now is:
|
The backport to 6.3 was clean! But new and interesting tests fail! |
This is much more realistic and can find more issues. This causes the "mixed cluster" tests to be run twice so I had to fix the tests to work in that case. In most cases I did as little as possible to get them working but in a few cases I went a little beyond that to make them easier for me to debug while getting them to work. My test changes: 1. Remove the "basic indexing" tests and replace them with a copy of the tests used in the OSS. We have no way of sharing code between these two projects so for now I copy. 2. Skip the a few tests in the "one third" upgraded scenario: * creating a scroll to be reused when the cluster is fully upgraded * creating some ml data to be used when the cluster is fully ugpraded 3. Drop many "assert yellow and that the cluster has two nodes" assertions. These assertions duplicate those made by the wait condition and they fail now that we have three nodes. 4. Switch many "assert green and that the cluster has two nodes" to 3 nodes. These assertions are unique from the wait condition and, while I imagine they aren't required in all cases, now is not the time to find that out. Thus, I made them work. 5. Rework the index audit trail test so it is more obvious that it is the same test expecting different numbers based on the shape of the cluster. The conditions for which number are expected are fairly complex because the index audit trail is shut down until the template for it is upgraded and the template is upgraded when a master node is elected that has the new version of the software. 6. Add some more information to debug the index audit trail test because it helped me figure out what was going on. I also dropped the `waitCondition` from the `rolling-upgrade-basic` tests because it wasn't needed. Closes #25336
Backported. |
This is much more realistic and can find more issues. This causes the
"mixed cluster" tests to be run twice so I had to fix the tests to work
in that case. In most cases I did as little as possible to get them
working but in a few cases I went a little beyond that to make them
easier for me to debug while getting them to work. My test changes:
tests used in the OSS. We have no way of sharing code between these two
projects so for now I copy.
assertions. These assertions duplicate those made by the wait condition
and they fail now that we have three nodes.
nodes. These assertions are unique from the wait condition and, while
I imagine they aren't required in all cases, now is not the time to
find that out. Thus, I made them work.
the same test expecting different numbers based on the shape of the
cluster. The conditions for which number are expected are fairly
complex because the index audit trail is shut down until the template
for it is upgraded and the template is upgraded when a master node is
elected that has the new version of the software.
it helped me figure out what was going on.
Closes #25336