Change the selection of downgrade version in TestUpgradeBrokenPackageVersion #3458

pchila · 2023-09-21T16:03:04Z

What does this PR do?

Fixes selection of the elastic agent version to downgrade to in the test so that we don't try to upgrade the agent on its own version

Why is it important?

The test outcome today depends on the order of versions returned by Artifact API making it flaky when there are some changes.

Checklist

My code follows the style guidelines of this project
~~[ ] I have commented my code, particularly in hard-to-understand areas~~
~~[ ] I have made corresponding changes to the documentation~~
~~[ ] I have made corresponding change to the default configuration files~~
~~[ ] I have added tests that prove my fix is effective or that my feature works~~
~~[ ] I have added an entry in ./changelog/fragments using the changelog tool~~
~~[ ] I have added an integration test or an E2E test~~

Author's Checklist

[ ]

How to test this PR locally

Related issues

Closes Flaky test: TestUpgradeBrokenPackageVersion #3454

Use cases

Screenshots

Logs

Questions to ask yourself

How are we going to support this in production?
How are we going to measure its adoption?
How are we going to debug this?
What are the metrics I should take care of?
...

mergify · 2023-09-21T16:03:59Z

This pull request does not have a backport label. Could you fix it @pchila? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

backport-v./d./d./d is the label to automatically backport to the 8./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

elasticmachine · 2023-09-21T16:31:59Z

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS

Expand to view the summary

Build stats

Start Time: 2023-09-26T13:03:44.985+0000
Duration: 26 min 41 sec

Test stats 🧪

Test	Results
Failed	0
Passed	6313
Skipped	59
Total	6372

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

/test : Re-trigger the build.
/package : Generate the packages.
run integration tests : Run the Elastic Agent Integration tests.
run end-to-end tests : Generate the packages and run the E2E Tests.
run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

elasticmachine · 2023-09-21T16:32:08Z

🌐 Coverage report

Name	Metrics % (`covered/total`)	Diff
Packages	98.78% (`81/82`)	👍
Files	66.102% (`195/295`)	👍
Classes	65.693% (`360/548`)	👍
Methods	52.744% (`1134/2150`)	👍
Lines	38.196% (`12876/33710`)	👎 -0.009
Conditionals	100.0% (`0/0`)	💚

elasticmachine · 2023-09-22T07:28:56Z

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

AndersonQ · 2023-09-22T12:12:51Z

testing/integration/upgrade_test.go

+	// transform and reverse the version list and find the most recent version that is different from the broken version returned
+
+	// look for 1 version with the current major that is lower than the current one


I'm guessing it's 1 thing? AM I right?

Suggested change

// transform and reverse the version list and find the most recent version that is different from the broken version returned

// look for 1 version with the current major that is lower than the current one

// transform and reverse the version list and find the most recent version that is different from the broken version returned

// look for 1 version with the current major that is lower than the current one

blakerouse

This test overall has a problem. Being that this test is downgrading the installed build to a snapshot build, means that when 'elastic-agent uninstall' is ran at the end of the test will be the snapshots uninstall.

The snapshot uninstall might not have the required code to kill the watcher, which then means the watcher is still running and will break a following test.

pchila · 2023-09-22T14:57:14Z

@blakerouse since we start from the latest snapshot for test, we don't have that many choices about versions to move to.
The purpose of the test is to ensure that an agent install that has a broken/missing package version file can still be upgraded as requested when the feature was implemented
If we cannot downgrade here, what do you suggest? Remove the test entirely? Skip it for a few release cycles?
If we don't merge this PR we still have the test just in the flaky version which is probably a worse outcome

michalpristas · 2023-09-25T12:19:38Z

@blakerouse would running it in isolation address your concerns? for the time being. once we have previous major with proper watcher capabilities we would go back to normal runs

blakerouse · 2023-09-25T13:51:59Z

@blakerouse since we start from the latest snapshot for test, we don't have that many choices about versions to move to. The purpose of the test is to ensure that an agent install that has a broken/missing package version file can still be upgraded as requested when the feature was implemented If we cannot downgrade here, what do you suggest? Remove the test entirely? Skip it for a few release cycles? If we don't merge this PR we still have the test just in the flaky version which is probably a worse outcome

@pchila We have an overall problem with all the upgrade tests in the integration suite. They are either not testing the actual agent under test (#3461) or the ending Elastic Agent is an older version of Elastic Agent that might not have required fixes for the testing suite to pass (in this case missing watcher kill).

@blakerouse would running it in isolation address your concerns? for the time being. once we have previous major with proper watcher capabilities we would go back to normal runs

@michalpristas I wish that would work, but uninstall is a core part of the testing framework. Running this test in isolation will still result in it failing on Windows, because it will not be able to kill the watcher.

Only possible solution is to wait until the watcher is completely done running, before performing the uninstall. I think that is the only valid solution. This is the only case that works, that is still assuming that we don't hit an issue with RemovePath on the older version of the Elastic Agent.

cmacknz · 2023-09-25T14:34:43Z

@pchila We have an overall problem with all the upgrade tests in the integration suite. They are either not testing the actual agent under test (#3461) or the ending Elastic Agent is an older version of Elastic Agent that might not have required fixes for the testing suite to pass (in this case missing watcher kill).

Fundamentally to solve this we need the ability to upgrade to the version of the agent that was built in the PR. We need #2780.

We need to move tests between previously released versions and official snapshots onto a schedule because the code in PRs doesn't affect them at all.

Only possible solution is to wait until the watcher is completely done running, before performing the uninstall. I think that is the only valid solution. This is the only case that works, that is still assuming that we don't hit an issue with RemovePath on the older version of the Elastic Agent.

Agreed I think this is the only way to deal with this right now, besides skipping the test or figuring out how to unit test this instead of integration testing it.

pchila · 2023-09-26T06:35:28Z

Fundamentally to solve this we need the ability to upgrade to the version of the agent that was built in the PR. We need #2780.

For standalone we mostly do, except for versions of agent where we cannot specify a file:// source-uri and a skip-verify flag. For Fleet managed tests I would expect the same but I didn't actually verify that we do use the packaged version of agent something can be improved there.

For the scope of this PR, do we want to include all of those changes? Or do we create a separate issues/PRs for those ?

pchila · 2023-09-26T07:43:20Z

@blakerouse @cmacknz Adding the wait means partially revert commit 7e86d24 where the cleanup functions have been removed.
Would that be acceptable for the moment in this PR ?

…ersion

…ackageVersion

elastic-sonarqube · 2023-09-26T13:14:49Z

SonarQube Quality Gate

0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells

No Coverage information
0.0% Duplication

pchila · 2023-09-26T14:03:41Z

@blakerouse could you give another review pass now that we have reintroduced the wait for watcher on the upgrade tests ?

pchila · 2023-09-26T14:22:59Z

@cmacknz

We need to move tests between previously released versions and official snapshots onto a schedule because the code in PRs doesn't affect them at all.

#3475

blakerouse · 2023-09-26T17:50:58Z

@pchila See my PR #3477

I believe I have upgrade testing working correctly, even this one.

pchila · 2023-09-28T12:48:49Z

This is replaced by #3477

pchila added Team:Elastic-Agent Label for the Agent team flaky-test Unstable or unreliable test cases. skip-changelog Testing labels Sep 21, 2023

mergify bot assigned pchila Sep 21, 2023

mergify bot added the backport-skip label Sep 21, 2023

pchila marked this pull request as ready for review September 22, 2023 07:28

pchila requested a review from a team as a code owner September 22, 2023 07:28

pchila requested review from AndersonQ and michalpristas September 22, 2023 07:28

pchila added backport-v8.10.0 Automated backport with mergify and removed backport-skip labels Sep 22, 2023

AndersonQ approved these changes Sep 22, 2023

View reviewed changes

blakerouse requested changes Sep 22, 2023

View reviewed changes

pchila added 3 commits September 26, 2023 10:34

Change the selection of dongrade version in TestUpgradeBrokenPackageV…

9f3559d

…ersion

fixup! Change the selection of dongrade version in TestUpgradeBrokenP…

b2994d1

…ackageVersion

restore watcher wait for upgrade integration tests

e3654cb

pchila force-pushed the fix-TestUpgradeBrokenPackage branch from fe6c51f to e3654cb Compare September 26, 2023 13:03

pchila mentioned this pull request Sep 26, 2023

[Meta] Flaky integration tests #3357

Closed

pchila requested a review from blakerouse September 26, 2023 14:02

pchila mentioned this pull request Sep 27, 2023

Refactor and fix all upgrade integration tests #3477

Merged

4 tasks

pchila closed this Sep 28, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Change the selection of downgrade version in TestUpgradeBrokenPackageVersion #3458

Change the selection of downgrade version in TestUpgradeBrokenPackageVersion #3458

pchila commented Sep 21, 2023 •

edited

Loading

mergify bot commented Sep 21, 2023

elasticmachine commented Sep 21, 2023 •

edited

Loading

Build stats

Test stats 🧪

elasticmachine commented Sep 21, 2023 •

edited

Loading

elasticmachine commented Sep 22, 2023

AndersonQ Sep 22, 2023

blakerouse left a comment

pchila commented Sep 22, 2023

michalpristas commented Sep 25, 2023 •

edited

Loading

blakerouse commented Sep 25, 2023

cmacknz commented Sep 25, 2023

pchila commented Sep 26, 2023

pchila commented Sep 26, 2023

elastic-sonarqube bot commented Sep 26, 2023

pchila commented Sep 26, 2023

pchila commented Sep 26, 2023

blakerouse commented Sep 26, 2023

pchila commented Sep 28, 2023

		// transform and reverse the version list and find the most recent version that is different from the broken version returned

		// look for 1 version with the current major that is lower than the current one

Change the selection of downgrade version in TestUpgradeBrokenPackageVersion #3458

Change the selection of downgrade version in TestUpgradeBrokenPackageVersion #3458

Conversation

pchila commented Sep 21, 2023 • edited Loading

What does this PR do?

Why is it important?

Checklist

Author's Checklist

How to test this PR locally

Related issues

Use cases

Screenshots

Logs

Questions to ask yourself

mergify bot commented Sep 21, 2023

elasticmachine commented Sep 21, 2023 • edited Loading

💚 Build Succeeded

Build stats

Test stats 🧪

💚 Flaky test report

🤖 GitHub comments

elasticmachine commented Sep 21, 2023 • edited Loading

🌐 Coverage report

elasticmachine commented Sep 22, 2023

AndersonQ Sep 22, 2023

Choose a reason for hiding this comment

blakerouse left a comment

Choose a reason for hiding this comment

pchila commented Sep 22, 2023

michalpristas commented Sep 25, 2023 • edited Loading

blakerouse commented Sep 25, 2023

cmacknz commented Sep 25, 2023

pchila commented Sep 26, 2023

pchila commented Sep 26, 2023

elastic-sonarqube bot commented Sep 26, 2023

pchila commented Sep 26, 2023

pchila commented Sep 26, 2023

blakerouse commented Sep 26, 2023

pchila commented Sep 28, 2023

pchila commented Sep 21, 2023 •

edited

Loading

elasticmachine commented Sep 21, 2023 •

edited

Loading

elasticmachine commented Sep 21, 2023 •

edited

Loading

michalpristas commented Sep 25, 2023 •

edited

Loading