-
Notifications
You must be signed in to change notification settings - Fork 146
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding upgrade test for unhealthy upgraded Agent #3274
Adding upgrade test for unhealthy upgraded Agent #3274
Conversation
This pull request does not have a backport label. Could you fix it @ycombinator? 🙏
NOTE: |
🌐 Coverage report
|
4c9a949
to
f386baf
Compare
Pinging @elastic/elastic-agent (Team:Elastic-Agent) |
a72ba62
to
d4350a3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a couple of comments but overall it seems fine as it is a variation on other upgrade tests we already have and run...
// input when the Agent version matches the upgraded Agent version. This way | ||
// the pre-upgrade version of the Agent runs healthy, but the post-upgrade | ||
// version doesn't. | ||
invalidInputPolicy := fastWatcherCfg + fmt.Sprintf(` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do we guarantee that the fastWatcherCfg is supported? is it even needed here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't guarantee it but the settings in fastWatcherCfg
should just be ignored by any versions of Agent that don't support it. Also, checkUpgradeWatcherRan
will just skip checking if the upgrade watcher finished running for Agents that don't support the fastWatcherCfg
settings:
elastic-agent/testing/integration/upgrade_test.go
Lines 558 to 565 in 22c276d
func checkUpgradeWatcherRan(t *testing.T, agentFixture *atesting.Fixture, fromVersion *version.ParsedSemVer) { | |
t.Helper() | |
if fromVersion.Less(*version_8_9_0_SNAPSHOT) { | |
t.Logf("Version %q is too old for a quick update marker check, skipping...", fromVersion) | |
return | |
} | |
So I think we are okay here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the checkUpgradeWatcherRan skips here, doesn't that make the test run pointless?
If I understood correctly, the point of the test is to verify that the watcher rollback the upgrade because of the unhealthy state of the new agent (please correct me if this is not the correct assumption). In such a case wouldn't the upgrade watcher terminate before the full 10 mins default grace period because of the amount of errors ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently on the main
branch, the fromVersion
is 8.11.0
. Which means checkUpgradeWatcherRan
won't be skipped. If we backport this test to 8.10
and even 8.9
, checkUpgradeWatcherRan
won't be skipped.
If we were to backport this test to versions before 8.9
, then checkUpgradeWatcherRan
would be skipped and the rest of the test will fail because the assertion for checking if Agent has rolled back to the pre-upgrade version will timeout. But we don't plan to backport this test to any versions before 8.9
so I think we're good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On an unrelated note, I just noticed that checkUpgradeWatcherRan
skips if the fromVersion
is < 8.9.0
. That made sense when Agent was spawning the Upgrade Watcher from the old (pre-upgrade) Agent's binary. However, starting from 8.10.0
onwards, Agent will spawn the Upgrade Watcher from the new (post-upgrade) Agent's binary. So I think we need an additional check here to skip if the fromVersion
is >= 8.10.0 && toVersion
< 8.9.0. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the check is correct as it refers to the earliest version that understand settings for a shorter watch period... The fact that the watcher is spawned from the old or the new shouldn't matter as the outcome in the case of an unhealthy upgraded agent should stay the same...
It actually provides an interesting additional test: that even if the upgraded agent process itself is unhealthy, the agent process stays alive and take the correct action ;)
c33192b
to
789f69b
Compare
789f69b
to
fcb09a4
Compare
fcb09a4
to
9aeade4
Compare
This test is failing on Ubuntu arm64 but not on Ubuntu amd64. The failure is bizzare too, in that it happens in the very first step of the test, where we try to install the Agent fixture and check that it's healthy, and the failure is evidently due to
If I run just just this test locally on an ARM64 Ubuntu machine, it passes. I'm going to re-run the Buildkite build and see if the test fails again in the same way or in a different way, to perhaps give us some ideas into what might be going on. |
@pazone any reason why in this sonar notification #3274 (comment) we don't get any coverage metric? |
This time |
buildkite test this |
Re-running caused This PR here is, therefore, blocked on #3371 for the moment. |
dbfc886
to
3df1098
Compare
SonarQube Quality Gate 0 Bugs No Coverage information |
* [WIP] Adding upgrade test for unhealthy upgraded Agent * Fixing up logging * WIP: fixing some issues with fake package creation * Use fast watcher config * [WIP] Different approach to making upgraded Agent unhealthy * WIP: Adding TODO * Flesh out rest of the test * Adding logging * Make invalid input try to run in upgraded Agent version * Removing OS restriction * Tuning status check frequency * Use artifacts API * Clarify comment (cherry picked from commit 5531ad3)
* [WIP] Adding upgrade test for unhealthy upgraded Agent * Fixing up logging * WIP: fixing some issues with fake package creation * Use fast watcher config * [WIP] Different approach to making upgraded Agent unhealthy * WIP: Adding TODO * Flesh out rest of the test * Adding logging * Make invalid input try to run in upgraded Agent version * Removing OS restriction * Tuning status check frequency * Use artifacts API * Clarify comment (cherry picked from commit 5531ad3)
* [WIP] Adding upgrade test for unhealthy upgraded Agent * Fixing up logging * WIP: fixing some issues with fake package creation * Use fast watcher config * [WIP] Different approach to making upgraded Agent unhealthy * WIP: Adding TODO * Flesh out rest of the test * Adding logging * Make invalid input try to run in upgraded Agent version * Removing OS restriction * Tuning status check frequency * Use artifacts API * Clarify comment (cherry picked from commit 5531ad3)
* [WIP] Adding upgrade test for unhealthy upgraded Agent * Fixing up logging * WIP: fixing some issues with fake package creation * Use fast watcher config * [WIP] Different approach to making upgraded Agent unhealthy * WIP: Adding TODO * Flesh out rest of the test * Adding logging * Make invalid input try to run in upgraded Agent version * Removing OS restriction * Tuning status check frequency * Use artifacts API * Clarify comment (cherry picked from commit 5531ad3) Co-authored-by: Shaunak Kashyap <ycombinator@gmail.com>
What does this PR do?
This PR adds an integration test that ensures that an unhealthy Agent is rolled back.
Why is it important?
To ensure upgrade rollbacks work as expected.
Checklist
I have made corresponding changes to the documentationI have made corresponding change to the default configuration filesI have added tests that prove my fix is effective or that my feature worksI have added an entry in./changelog/fragments
using the changelog toolHow to test this PR locally
Check out this PR's branch and then:
Related issues