Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change the selection of downgrade version in TestUpgradeBrokenPackageVersion #3458

Closed
wants to merge 3 commits into from

Conversation

pchila
Copy link
Member

@pchila pchila commented Sep 21, 2023

What does this PR do?

Fixes selection of the elastic agent version to downgrade to in the test so that we don't try to upgrade the agent on its own version

Why is it important?

The test outcome today depends on the order of versions returned by Artifact API making it flaky when there are some changes.

Checklist

  • My code follows the style guidelines of this project
  • [ ] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • [ ] I have added an entry in ./changelog/fragments using the changelog tool
  • [ ] I have added an integration test or an E2E test

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Use cases

Screenshots

Logs

Questions to ask yourself

  • How are we going to support this in production?
  • How are we going to measure its adoption?
  • How are we going to debug this?
  • What are the metrics I should take care of?
  • ...

@pchila pchila added Team:Elastic-Agent Label for the Agent team flaky-test Unstable or unreliable test cases. skip-changelog Testing labels Sep 21, 2023
@mergify mergify bot assigned pchila Sep 21, 2023
@mergify
Copy link
Contributor

mergify bot commented Sep 21, 2023

This pull request does not have a backport label. Could you fix it @pchila? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 8./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@elasticmachine
Copy link
Contributor

elasticmachine commented Sep 21, 2023

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-09-26T13:03:44.985+0000

  • Duration: 26 min 41 sec

Test stats 🧪

Test Results
Failed 0
Passed 6313
Skipped 59
Total 6372

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages.

  • run integration tests : Run the Elastic Agent Integration tests.

  • run end-to-end tests : Generate the packages and run the E2E Tests.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@elasticmachine
Copy link
Contributor

elasticmachine commented Sep 21, 2023

🌐 Coverage report

Name Metrics % (covered/total) Diff
Packages 98.78% (81/82) 👍
Files 66.102% (195/295) 👍
Classes 65.693% (360/548) 👍
Methods 52.744% (1134/2150) 👍
Lines 38.196% (12876/33710) 👎 -0.009
Conditionals 100.0% (0/0) 💚

@pchila pchila marked this pull request as ready for review September 22, 2023 07:28
@pchila pchila requested a review from a team as a code owner September 22, 2023 07:28
@elasticmachine
Copy link
Contributor

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@pchila pchila added backport-v8.10.0 Automated backport with mergify and removed backport-skip labels Sep 22, 2023
Comment on lines +944 to +925
// transform and reverse the version list and find the most recent version that is different from the broken version returned

// look for 1 version with the current major that is lower than the current one
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing it's 1 thing? AM I right?

Suggested change
// transform and reverse the version list and find the most recent version that is different from the broken version returned
// look for 1 version with the current major that is lower than the current one
// transform and reverse the version list and find the most recent version that is different from the broken version returned
// look for 1 version with the current major that is lower than the current one

Copy link
Contributor

@blakerouse blakerouse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test overall has a problem. Being that this test is downgrading the installed build to a snapshot build, means that when 'elastic-agent uninstall' is ran at the end of the test will be the snapshots uninstall.

The snapshot uninstall might not have the required code to kill the watcher, which then means the watcher is still running and will break a following test.

@pchila
Copy link
Member Author

pchila commented Sep 22, 2023

@blakerouse since we start from the latest snapshot for test, we don't have that many choices about versions to move to.
The purpose of the test is to ensure that an agent install that has a broken/missing package version file can still be upgraded as requested when the feature was implemented
If we cannot downgrade here, what do you suggest? Remove the test entirely? Skip it for a few release cycles?
If we don't merge this PR we still have the test just in the flaky version which is probably a worse outcome

@michalpristas
Copy link
Contributor

michalpristas commented Sep 25, 2023

@blakerouse would running it in isolation address your concerns? for the time being. once we have previous major with proper watcher capabilities we would go back to normal runs

@blakerouse
Copy link
Contributor

@blakerouse since we start from the latest snapshot for test, we don't have that many choices about versions to move to. The purpose of the test is to ensure that an agent install that has a broken/missing package version file can still be upgraded as requested when the feature was implemented If we cannot downgrade here, what do you suggest? Remove the test entirely? Skip it for a few release cycles? If we don't merge this PR we still have the test just in the flaky version which is probably a worse outcome

@pchila We have an overall problem with all the upgrade tests in the integration suite. They are either not testing the actual agent under test (#3461) or the ending Elastic Agent is an older version of Elastic Agent that might not have required fixes for the testing suite to pass (in this case missing watcher kill).

@blakerouse would running it in isolation address your concerns? for the time being. once we have previous major with proper watcher capabilities we would go back to normal runs

@michalpristas I wish that would work, but uninstall is a core part of the testing framework. Running this test in isolation will still result in it failing on Windows, because it will not be able to kill the watcher.

Only possible solution is to wait until the watcher is completely done running, before performing the uninstall. I think that is the only valid solution. This is the only case that works, that is still assuming that we don't hit an issue with RemovePath on the older version of the Elastic Agent.

@cmacknz
Copy link
Member

cmacknz commented Sep 25, 2023

@pchila We have an overall problem with all the upgrade tests in the integration suite. They are either not testing the actual agent under test (#3461) or the ending Elastic Agent is an older version of Elastic Agent that might not have required fixes for the testing suite to pass (in this case missing watcher kill).

Fundamentally to solve this we need the ability to upgrade to the version of the agent that was built in the PR. We need #2780.

We need to move tests between previously released versions and official snapshots onto a schedule because the code in PRs doesn't affect them at all.

Only possible solution is to wait until the watcher is completely done running, before performing the uninstall. I think that is the only valid solution. This is the only case that works, that is still assuming that we don't hit an issue with RemovePath on the older version of the Elastic Agent.

Agreed I think this is the only way to deal with this right now, besides skipping the test or figuring out how to unit test this instead of integration testing it.

@pchila
Copy link
Member Author

pchila commented Sep 26, 2023

Fundamentally to solve this we need the ability to upgrade to the version of the agent that was built in the PR. We need #2780.

For standalone we mostly do, except for versions of agent where we cannot specify a file:// source-uri and a skip-verify flag. For Fleet managed tests I would expect the same but I didn't actually verify that we do use the packaged version of agent something can be improved there.

For the scope of this PR, do we want to include all of those changes? Or do we create a separate issues/PRs for those ?

@pchila
Copy link
Member Author

pchila commented Sep 26, 2023

@blakerouse @cmacknz Adding the wait means partially revert commit 7e86d24 where the cleanup functions have been removed.
Would that be acceptable for the moment in this PR ?

@pchila pchila force-pushed the fix-TestUpgradeBrokenPackage branch from fe6c51f to e3654cb Compare September 26, 2023 13:03
@elastic-sonarqube
Copy link

SonarQube Quality Gate

Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@pchila pchila requested a review from blakerouse September 26, 2023 14:02
@pchila
Copy link
Member Author

pchila commented Sep 26, 2023

@blakerouse could you give another review pass now that we have reintroduced the wait for watcher on the upgrade tests ?

@pchila
Copy link
Member Author

pchila commented Sep 26, 2023

@cmacknz

We need to move tests between previously released versions and official snapshots onto a schedule because the code in PRs doesn't affect them at all.

#3475

@blakerouse
Copy link
Contributor

@pchila See my PR #3477

I believe I have upgrade testing working correctly, even this one.

@pchila
Copy link
Member Author

pchila commented Sep 28, 2023

This is replaced by #3477

@pchila pchila closed this Sep 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-v8.10.0 Automated backport with mergify flaky-test Unstable or unreliable test cases. skip-changelog Team:Elastic-Agent Label for the Agent team Testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky test: TestUpgradeBrokenPackageVersion
6 participants