Refactor and fix all upgrade integration tests #3477

Merged
merged 17 commits into from
Oct 2, 2023

Conversation

blakerouse
Contributor

What does this PR do?

This refactors and fixes all upgrade integration tests, giving them a unified path to follow when testing upgrades. It includes code that makes the upgrades faster and allows downgrades to work. It also fixes upgrade tests that were skipped because they were flaky, and enables the retry upgrade test on Windows.

Why is it important?

Ensures that all upgrade tests are working properly so we can catch real errors instead of fighting the CI.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • [ ] I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

How to test this PR locally

mage integration:test

Related issues

@blakerouse blakerouse added the Team:Elastic-Agent and backport-v8.10.0 labels Sep 26, 2023
@blakerouse blakerouse requested a review from a team as a code owner September 26, 2023 17:41
@blakerouse blakerouse self-assigned this Sep 26, 2023
@blakerouse
Contributor Author

This has a replace present until this PR is merged and new elastic-agent-libs is released - elastic/elastic-agent-libs#152

@elasticmachine
Contributor

elasticmachine commented Sep 26, 2023

💚 Build Succeeded

Build stats

  • Start Time: 2023-10-02T13:46:00.500+0000

  • Duration: 29 min 56 sec

Test stats 🧪

Test Results
Failed 0
Passed 6425
Skipped 59
Total 6484

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages.

  • run integration tests : Run the Elastic Agent Integration tests.

  • run end-to-end tests : Generate the packages and run the E2E Tests.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@elasticmachine
Contributor

elasticmachine commented Sep 26, 2023

🌐 Coverage report

Name Metrics % (covered/total) Diff
Packages 98.78% (81/82)
Files 65.886% (197/299)
Classes 65.343% (362/554)
Methods 52.697% (1143/2169)
Lines 38.328% (13018/33965)
Conditionals 100.0% (0/0) 💚

@cmacknz
Member

cmacknz commented Sep 26, 2023

testing/** needs to be added to the SonarQube exclusion list to silence the coverage failure.

https://github.com/elastic/elastic-agent/blob/main/sonar-project.properties

Member

@cmacknz cmacknz left a comment

Looks good so far, want to see the tests passing before approving.

Tests are much easier to follow like this (at least to me).

pkg/testing/fetcher_artifact.go Show resolved Hide resolved
pkg/testing/fixture.go Outdated Show resolved Hide resolved
testing/upgradetest/upgrader.go Outdated Show resolved Hide resolved
testing/upgradetest/watcher.go Show resolved Hide resolved
testing/upgradetest/watcher.go Show resolved Hide resolved
Member

@pchila pchila left a comment

At least one test has changed meaning in the refactor: TestStandaloneDowngradeToPreviousSnapshotBuild is supposed to test an upgrade of the form elastic-agent upgrade x.y.z-SNAPSHOT+<buildid>, but from what I see it no longer does that; it just performs a downgrade two minors back.

There are a few other comments on other parts of the code, but another thing that stands out is that we now use version.GetPreviousMinor() more, even though we know it has some limitations because it tries to guess what the previous minor would be.
Maybe you want to take a look at #3458 (which I imagine is now replaced by this PR) for a possible alternative way of selecting previous versions.
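
To illustrate the kind of limitation a guessed previous minor can have, here is a minimal sketch. It is not the project's version.GetPreviousMinor(); the semver type and guessPreviousMinor function are hypothetical names made up for this example, and the real helper may behave differently.

```go
package main

import (
	"errors"
	"fmt"
)

// semver is a simplified version triple used only for this illustration.
type semver struct{ Major, Minor, Patch int }

// guessPreviousMinor is a hypothetical helper, not the project's
// version.GetPreviousMinor(). It derives the previous minor arithmetically,
// without checking that such a release was ever published.
func guessPreviousMinor(v semver) (semver, error) {
	if v.Minor == 0 {
		// The last minor of the previous major cannot be computed from the
		// version number alone.
		return semver{}, errors.New("cannot guess previous minor across a major boundary")
	}
	// Assumption: patch 0 of the previous minor exists.
	return semver{Major: v.Major, Minor: v.Minor - 1, Patch: 0}, nil
}

func main() {
	prev, err := guessPreviousMinor(semver{8, 11, 0})
	fmt.Println(prev, err)

	_, err = guessPreviousMinor(semver{9, 0, 0})
	fmt.Println(err)
}
```

A guess like this never consults the list of published artifacts, which is why selecting versions from what is actually available may be more robust.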

testing/integration/upgrade_broken_package_test.go Outdated Show resolved Hide resolved
testing/integration/upgrade_fleet_test.go Outdated Show resolved Hide resolved
testing/integration/upgrade_fleet_test.go Outdated Show resolved Hide resolved
testing/integration/upgrade_downgrade_test.go Outdated Show resolved Hide resolved
testing/upgradetest/upgrader.go Outdated Show resolved Hide resolved
@blakerouse
Contributor Author

The modreplace has been removed but the lint is still failing, not clear as to why.

…s are different and the commits are the same.
@blakerouse blakerouse force-pushed the fix-all-upgrade-tests branch from 800e598 to a264c09 Compare September 28, 2023 20:32
@blakerouse
Contributor Author

@ycombinator Don't worry about it. I have fixed the conflict and updated the test to use the new patterns.

@blakerouse
Contributor Author

@pchila I believe this is ready for a final review. I have fixed all conflicts and issues you pointed out.

@mergify
Contributor

mergify bot commented Sep 29, 2023

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b fix-all-upgrade-tests upstream/fix-all-upgrade-tests
git merge upstream/main
git push upstream fix-all-upgrade-tests

@jlind23 jlind23 requested a review from pchila September 29, 2023 04:47
Member

@pchila pchila left a comment

TestStandaloneDowngradeToSpecificSnapshotBuild looks better, but we need to select the penultimate build so that the result is noticeably different from a normal upgrade to a snapshot version.
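
As a rough sketch of what selecting the penultimate build could look like: the function name penultimateSnapshotTarget is hypothetical, the list of build IDs is assumed to be ordered newest first, and the build metadata after the plus sign is assumed to be just the build hash. None of this is confirmed by the PR.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// penultimateSnapshotTarget is a hypothetical helper. It takes build IDs of
// the form "x.y.z-<hash>", assumed to be ordered newest first, and returns an
// upgrade target of the form "x.y.z-SNAPSHOT+<hash>" for the second entry.
func penultimateSnapshotTarget(builds []string) (string, error) {
	if len(builds) < 2 {
		return "", errors.New("need at least two builds to pick the penultimate one")
	}
	version, hash, ok := strings.Cut(builds[1], "-")
	if !ok {
		return "", fmt.Errorf("unexpected build ID %q", builds[1])
	}
	return fmt.Sprintf("%s-SNAPSHOT+%s", version, hash), nil
}

func main() {
	// The first build ID is made up for illustration.
	builds := []string{"8.11.0-0a1b2c3d", "8.11.0-d55157ca"}
	target, err := penultimateSnapshotTarget(builds)
	fmt.Println(target, err)
}
```

Picking the second entry rather than the first means the target differs from the build that a plain upgrade to the snapshot version would resolve to.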

pkg/testing/fetcher_artifact.go Outdated Show resolved Hide resolved
testing/integration/upgrade_downgrade_test.go Outdated Show resolved Hide resolved
testing/integration/upgrade_downgrade_test.go Outdated Show resolved Hide resolved
@blakerouse
Contributor Author

@pchila I have fixed that test and fixed the conflicts.

@ycombinator I have fixed the two tests in conflict if you want to give them a look over. Both were new tests that you added.

Member

@pchila pchila left a comment

LGTM

@blakerouse blakerouse enabled auto-merge (squash) September 29, 2023 13:46
// before trying to perform another upgrade
err = upgradetest.WaitHealthyAndVersion(ctx, startFixture, endVersionInfo.Binary, 2*time.Minute, 10*time.Second, t)
if err != nil {
// agent never got healthy, but we need to ensure the watcher is stopped before continuing
Contributor

It wasn't immediately obvious to me why the watcher was being killed here. I think it's so it doesn't interfere with other tests (as opposed to having anything to do with continuing on in this test, since this test should now fail due to the require.NoError(t, err) on line 79), right? Could you clarify that in the code comment here please?

Contributor Author

I added more to the comment to make this clear. The reason you gave is correct.

Comment on lines +88 to +89
// killTimeout is greater than timeout as the watcher should have been
// stopped on its own, and we don't want this test to hide that fact
Contributor

👍

Comment on lines +129 to +130
// Start at the build version as we want to test the retry
// logic that is in the build.
Contributor

I think this comment is not relevant to this test?

Contributor Author

It is relevant.

Comment on lines +176 to +177
// agent never got healthy, but we need to ensure the watcher is stopped before continuing
// this kills the watcher instantly and waits for it to be gone before continuing
Contributor

Same comment as https://github.com/elastic/elastic-agent/pull/3477/files#r1341596538 about the "continuing" being unclear.

Contributor Author

Updated.

Comment on lines +186 to +187
// killTimeout is greater than timeout as the watcher should have been
// stopped on its own, and we don't want this test to hide that fact
Contributor

👍

@ycombinator
Contributor

@ycombinator I have fixed the two tests in conflict if you want to give them a look over. Both were new tests that you added.

@blakerouse I reviewed the two tests. The logic in both looks good and even improved towards the end of the tests! I left some feedback about clarifying comments, that's all.

@elastic-sonarqube

SonarQube Quality Gate

Quality Gate passed

Bugs: 0 (rating A)
Vulnerabilities: 0 (rating A)
Security Hotspots: 0 (rating A)
Code Smells: 0 (rating A)

Coverage: no coverage information
Duplication: 0.0%

@pchila
Member

pchila commented Oct 2, 2023

@blakerouse If we have a snapshot with a buildId, we need to prepare the URI in the format of x.y.z-, for example
https://snapshots.elastic.co/8.11.0-d55157ca/downloads/beats/elastic-agent/elastic-agent-8.11.0-SNAPSHOT-linux-x86_64.tar.gz (we can see the build details from https://artifacts-api.elastic.co/v1/versions/8.11.0-SNAPSHOT/builds/8.11.0-d55157ca)

From the error in https://buildkite.com/elastic/elastic-agent/builds/3611, it seems the artifact fetcher is not resolving the URI correctly.
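
A minimal sketch of how such a URI could be assembled, based only on the example URL above. The function snapshotArtifactURL and its parameters are hypothetical and are not the artifact fetcher's actual API.

```go
package main

import "fmt"

// snapshotArtifactURL is a hypothetical helper, not the fetcher's real API.
// version is the bare release version (for example "8.11.0"), buildID is the
// snapshot build hash (for example "d55157ca"), osArch is the platform suffix
// and ext is the archive extension.
func snapshotArtifactURL(version, buildID, osArch, ext string) string {
	return fmt.Sprintf(
		"https://snapshots.elastic.co/%s-%s/downloads/beats/elastic-agent/elastic-agent-%s-SNAPSHOT-%s.%s",
		version, buildID, version, osArch, ext,
	)
}

func main() {
	fmt.Println(snapshotArtifactURL("8.11.0", "d55157ca", "linux-x86_64", "tar.gz"))
}
```

Note that the build hash appears only in the directory part of the path, while the file name keeps the plain SNAPSHOT suffix.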

@blakerouse
Contributor Author

@pchila You are correct, it was not; my most recent commit fixed it. See 967969f.

@blakerouse blakerouse merged commit 6201e19 into elastic:main Oct 2, 2023
11 checks passed
mergify bot pushed a commit that referenced this pull request Oct 2, 2023
* Fix all upgrade tests.

* Fix imports and headers.

* Update notice.

* Exclude testing/** from sonar.

* Fix comments from code review.

* Add extra error information in the artifact fetcher.

* Fixes from code review.

* Add upgrade uninstall kill watcher test.

* Remove go replace. Regenerate notice. Fix lint.

* Import WithSourceURI logic. Fix fleet test to not skip if the versions are different and the commits are the same.

* More test fixes.

* Fix imports.

* Re-add TestStandaloneUpgradeFailsWhenUpgradeIsInProgress. Fix code review.

(cherry picked from commit 6201e19)

# Conflicts:
#	sonar-project.properties
#	testing/integration/upgrade_test.go
blakerouse added a commit that referenced this pull request Oct 5, 2023
…#3495)

* Refactor and fix all upgrade integration tests (#3477)

* Fix all upgrade tests.

* Fix imports and headers.

* Update notice.

* Exclude testing/** from sonar.

* Fix comments from code review.

* Add extra error information in the artifact fetcher.

* Fixes from code review.

* Add upgrade uninstall kill watcher test.

* Remove go replace. Regenerate notice. Fix lint.

* Import WithSourceURI logic. Fix fleet test to not skip if the versions are different and the commits are the same.

* More test fixes.

* Fix imports.

* Re-add TestStandaloneUpgradeFailsWhenUpgradeIsInProgress. Fix code review.

(cherry picked from commit 6201e19)

# Conflicts:
#	sonar-project.properties
#	testing/integration/upgrade_test.go

* Fix merge.

---------

Co-authored-by: Blake Rouse <blake.rouse@elastic.co>
Labels
backport-v8.10.0, skip-changelog, Team:Elastic-Agent
Projects
None yet
5 participants