Integration tests are not upgrading to the actual build being tested #3461

Closed
blakerouse opened this issue Sep 22, 2023 · 4 comments · Fixed by #3477
Labels: bug (Something isn't working), Team:Elastic-Agent (Label for the Agent team)


@blakerouse (Contributor)

I have been digging into why tests are still failing to install on Windows. I did a lot of manual testing of PR #3384 and confirmed that it does stop and kill the watcher, so I was shocked to find similar failures still occurring with my Windows integration runner.

After spending too much time suspecting the code that finds the watcher, and too much time pausing the tests to inspect running processes, I identified the issue.

The issue is that some tests are not upgrading to the build of Elastic Agent that is under test at all. They instead upgrade to the SNAPSHOT build from the artifacts API. This can be seen clearly here:

TestStandaloneUpgrade/Upgrade_7.17.13_to_8.11.0-SNAPSHOT

>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): === FAIL: testing/integration TestStandaloneUpgrade/Upgrade_7.17.13_to_8.11.0-SNAPSHOT (68.48s)
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): fetcher_artifact.go:176: Downloading artifact from https://staging.elastic.co/7.17.13-d6f555d2/downloads/beats/elastic-agent/elastic-agent-7.17.13-windows-x86_64.zip
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): fetcher_artifact.go:255: Downloading artifact progress 9.97%
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): fetcher_artifact.go:255: Downloading artifact progress 17.77%
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): fetcher_artifact.go:255: Downloading artifact progress 24.18%
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): fetcher_artifact.go:255: Downloading artifact progress 30.77%
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): fetcher_artifact.go:255: Downloading artifact progress 38.99%
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): fetcher_artifact.go:255: Downloading artifact progress 51.99%
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): fetcher_artifact.go:255: Downloading artifact progress 64.59%
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): fetcher_artifact.go:255: Downloading artifact progress 75.38%
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): fetcher_artifact.go:255: Downloading artifact progress 83.18%
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): fetcher_artifact.go:255: Downloading artifact progress 93.29%
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): fetcher_artifact.go:255: Downloading artifact progress 100.00%
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): fetcher_artifact.go:222: Completed downloading artifact from https://staging.elastic.co/7.17.13-d6f555d2/downloads/beats/elastic-agent/elastic-agent-7.17.13-windows-x86_64.zip
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): fetcher_artifact.go:176: Downloading artifact from https://staging.elastic.co/7.17.13-d6f555d2/downloads/beats/elastic-agent/elastic-agent-7.17.13-windows-x86_64.zip.sha512
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): fetcher_artifact.go:222: Completed downloading artifact from https://staging.elastic.co/7.17.13-d6f555d2/downloads/beats/elastic-agent/elastic-agent-7.17.13-windows-x86_64.zip.sha512
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): fixture.go:221: Extracting artifact elastic-agent-7.17.13-windows-x86_64.zip to C:\Users\windows\AppData\Local\Temp\TestStandaloneUpgradeUpgrade_7.17.13_to_8.11.0-SNAPSHOT3632557646\001
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): fixture.go:234: Completed extraction of artifact elastic-agent-7.17.13-windows-x86_64.zip to C:\Users\windows\AppData\Local\Temp\TestStandaloneUpgradeUpgrade_7.17.13_to_8.11.0-SNAPSHOT3632557646\001
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): fixture.go:540: Components were not modified from the fetched artifact
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): fixture_install.go:90: [test TestStandaloneUpgrade/Upgrade_7.17.13_to_8.11.0-SNAPSHOT] Inside fixture install function
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): fixture.go:365: >> running agent with: [C:\Users\windows\AppData\Local\Temp\TestStandaloneUpgradeUpgrade_7.17.13_to_8.11.0-SNAPSHOT3632557646\001\elastic-agent-7.17.13-windows-x86_64\elastic-agent.exe install --force]
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): upgrade_test.go:474: Agent installation output: "Elastic Agent has been successfully installed.\n"
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): fixture.go:365: >> running agent with: [C:\Program Files\Elastic\Agent\elastic-agent.exe status --output json]
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): upgrade_test.go:590: current agent state: {Status:V1_HEALTHY Message: Applications:[]}
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): fixture.go:365: >> running agent with: [C:\Program Files\Elastic\Agent\elastic-agent.exe version --yaml]
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): upgrade_test.go:603: current agent version: {Binary:7.17.13 (build: de195afbaa6ed4f7823e91e5544927b84275103b at 2023-08-31 22:03:50 +0000 UTC) Daemon:7.17.13 (build: de195afbaa6ed4f7823e91e5544927b84275103b at 2023-08-31 22:03:50 +0000 UTC)}
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): upgrade_test.go:487: Upgrading from version "7.17.13" to version "8.11.0-SNAPSHOT"
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): fixture.go:365: >> running agent with: [C:\Program Files\Elastic\Agent\elastic-agent.exe upgrade 8.11.0-SNAPSHOT --skip-verify]
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): fixture.go:365: >> running agent with: [C:\Program Files\Elastic\Agent\elastic-agent.exe status --output yaml]
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): upgrade_test.go:530: error getting the agent state: exit status 1
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): fixture.go:365: >> running agent with: [C:\Program Files\Elastic\Agent\elastic-agent.exe status --output yaml]
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): upgrade_test.go:530: error getting the agent state: exit status 1
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): fixture.go:365: >> running agent with: [C:\Program Files\Elastic\Agent\elastic-agent.exe status --output yaml]
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): upgrade_test.go:530: current agent state: {Info:{ID:c2870809-3718-4ccc-b334-49b07f424cd3 Version:8.11.0 Commit:b1d2e6b04062b8572718e583590afe678579bf9d BuildTime:2023-09-21 15:18:03 +0000 UTC Snapshot:true} State:HEALTHY Message:Running Components:[] FleetState:STOPPED FleetMessage:Not enrolled into Fleet}
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): upgrade_test.go:533: Version "7.17.13" is too old for a quick update marker check, skipping...
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): fixture_install.go:114: [test TestStandaloneUpgrade/Upgrade_7.17.13_to_8.11.0-SNAPSHOT] Inside fixture cleanup function
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): fixture.go:365: >> running agent with: [C:\Program Files\Elastic\Agent\elastic-agent.exe uninstall --force]
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): fixture_install.go:174: fixture.Install Cleanup: uninstall failed: process running: PID: 5572 [C:\Program Files\Elastic\Agent\data\elastic-agent-de195a\elastic-agent watch --path.config C:\Program Files\Elastic\Agent --path.home C:\Program Files\Elastic\Agent]
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): fixture_install.go:179:
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): Error Trace:        C:/Users/windows/agent/pkg/testing/fixture_install.go:179
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): C:/Program Files/Go/src/testing/testing.go:1150
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): C:/Program Files/Go/src/testing/testing.go:1328
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): C:/Program Files/Go/src/testing/testing.go:1570
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): Error:              Received unexpected error:
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): error running uninstall command: exit status 1
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): Test:               TestStandaloneUpgrade/Upgrade_7.17.13_to_8.11.0-SNAPSHOT
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): Messages:           uninstalling agent failed. Output: "Error: failed to remove installation directory (C:\\Program Files\\Elastic\\Agent): timed out while removing \"C:\\\\Program Files\\\\Elastic\\\\Agent\". Last error: remove C:\\Program Files\\Elastic\\Agent\\watcher.lock: The process cannot access the file because it is being used by another process.\nFor help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.11/fleet-troubleshooting.html\n"
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): === FAIL: testing/integration TestStandaloneUpgrade (325.47s)
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): upgrade_test.go:189: Skipping version "8.11.0-SNAPSHOT" since it's newer or equal to version after upgrade "8.11.0-SNAPSHOT"
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stdout): DONE 4 tests, 2 failures in 392.055s
>>> (windows-amd64-2022-teststandaloneupgrade) Test output (sudo) (stderr): Error: go test returned a non-zero value: exit status 1
>>> (windows-amd64-2022-teststandaloneupgrade) sudo tests failed: Process exited with status 1

The interesting part is this line (with the prefix removed), which is logged after the upgrade has completed:

upgrade_test.go:530: current agent state: {Info:{ID:c2870809-3718-4ccc-b334-49b07f424cd3 Version:8.11.0 Commit:b1d2e6b04062b8572718e583590afe678579bf9d BuildTime:2023-09-21 15:18:03 +0000 UTC Snapshot:true} State:HEALTHY Message:Running Components:[] FleetState:STOPPED FleetMessage:Not enrolled into Fleet}

The Commit and BuildTime do not match the build that was placed on the host under test (i.e., the instance):

windows@OGC-WINDOWS-AMD C:\Users\windows\elastic-agent-8.11.0-SNAPSHOT-windows-x86_64>.\elastic-agent.exe version --binary-only --yaml       
binary:
  version: 8.11.0
  commit: 14d017359f23293e5a44e6bfa962668198defe9a
  build_time: 2023-09-22T00:46:57Z
  snapshot: true

This means the test is not actually testing the build of Elastic Agent we expect, and that is why the watcher is not being killed. I would not see the benefit of #3384 until a new snapshot is built and published.
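The check done by hand above (comparing the commit the upgraded agent reports against the commit of the local build) could be automated so that a test fails loudly instead of silently exercising a published snapshot. The Go below is a minimal sketch under stated assumptions: `binaryCommit` and `checkUpgradedBuild` are hypothetical helpers, not part of the elastic-agent testing framework, and the parser only handles the exact `version --binary-only --yaml` shape shown above rather than general YAML. The commit values are copied from this issue.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// binaryCommit (hypothetical) extracts the commit hash from output shaped like
// `elastic-agent version --binary-only --yaml`:
//
//	binary:
//	  version: 8.11.0
//	  commit: 14d0...
//
// It is a minimal line-based scan for that one shape, not a YAML parser.
func binaryCommit(out string) (string, error) {
	inBinary := false
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		line := sc.Text()
		trimmed := strings.TrimSpace(line)
		switch {
		case trimmed == "binary:":
			inBinary = true
		case inBinary && strings.HasPrefix(trimmed, "commit:"):
			return strings.TrimSpace(strings.TrimPrefix(trimmed, "commit:")), nil
		case line != "" && !strings.HasPrefix(line, " "):
			// Any other top-level key ends the binary block.
			inBinary = false
		}
	}
	if err := sc.Err(); err != nil {
		return "", err
	}
	return "", fmt.Errorf("no binary commit found in version output")
}

// checkUpgradedBuild (hypothetical) reports an error when the commit of the
// upgraded agent differs from the commit of the build placed on the host.
func checkUpgradedBuild(expectedCommit, reportedCommit string) error {
	if expectedCommit != reportedCommit {
		return fmt.Errorf("upgraded to commit %s, expected build under test %s",
			reportedCommit, expectedCommit)
	}
	return nil
}

func main() {
	// Output of the local build's version command, as shown in this issue.
	localVersion := "binary:\n" +
		"  version: 8.11.0\n" +
		"  commit: 14d017359f23293e5a44e6bfa962668198defe9a\n" +
		"  build_time: 2023-09-22T00:46:57Z\n" +
		"  snapshot: true\n"
	expected, err := binaryCommit(localVersion)
	if err != nil {
		panic(err)
	}
	// Commit reported by the agent state after the upgrade in the log above.
	reported := "b1d2e6b04062b8572718e583590afe678579bf9d"
	if err := checkUpgradedBuild(expected, reported); err != nil {
		fmt.Println("FAIL:", err)
	}
}
```

Comparing commits rather than version strings matters here because both builds report `8.11.0` with `snapshot: true`; only the commit and build time tell them apart.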

blakerouse added the bug (Something isn't working) label on Sep 22, 2023
@blakerouse (Contributor, Author) commented Sep 22, 2023

This affects more than the test I referenced in the issue description; it seems many tests are affected.

TestFleetManagedUpgrade does not send a source_uri in the upgrade request through Fleet. Without it, the integration test keeps upgrading to the same published snapshot until a new snapshot is built and released; it never tests the build that is placed on the host.

https://github.com/elastic/elastic-agent/blob/main/testing/integration/upgrade_test.go#L159

That line calls tools.UpgradeAgent, which makes the API call to Kibana with the request body at https://github.com/elastic/elastic-agent/blob/main/pkg/testing/tools/agents.go#L118. That request body does not set source_uri, so the installed Elastic Agent does not install the build that is local on the host; it instead fetches the snapshot version from the artifacts snapshot API.
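For illustration, a request body that does carry a source URI might look like the hypothetical struct below. `upgradeRequest` and `mustJSON` are not types or helpers from `agents.go`; only the `version` and `source_uri` field names are taken from Fleet's agent upgrade API, and the URI is a placeholder. The sketch shows why the omission is easy to miss: with `omitempty`, an unset `SourceURI` disappears from the JSON entirely, so nothing in the request hints that the agent will fall back to its default artifact source.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// upgradeRequest is a hypothetical body for Kibana's Fleet agent upgrade
// endpoint. Only the JSON field names come from the Fleet API.
type upgradeRequest struct {
	Version string `json:"version"`
	// SourceURI tells the agent where to download the target build from.
	// With omitempty, an empty value is dropped from the JSON entirely.
	SourceURI string `json:"source_uri,omitempty"`
}

// mustJSON marshals v and panics on error; acceptable for a small sketch.
func mustJSON(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// No source URI: per this issue, the agent then fetches the published
	// snapshot instead of the build under test.
	fmt.Println(mustJSON(upgradeRequest{Version: "8.11.0-SNAPSHOT"}))

	// Source URI set: the agent is pointed at a location serving the
	// build under test. This URI is a placeholder, not a real location.
	fmt.Println(mustJSON(upgradeRequest{
		Version:   "8.11.0-SNAPSHOT",
		SourceURI: "https://build-under-test.example.invalid/downloads/",
	}))
}
```

How a test would obtain a URI that actually serves the build under test (for example, by staging the local package somewhere the agent can download from) is left open in this sketch.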

pierrehilbert added the Team:Elastic-Agent (Label for the Agent team) label on Sep 22, 2023
@elasticmachine (Contributor)

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@blakerouse (Contributor, Author)

@pierrehilbert I am going to take this issue, as it is blocking me from continuing with the Windows integration runner.

This is a critical problem for the testing framework that we need to focus on. If we are not testing the actual build, the tests are useless.

With multiple tests not upgrading to the actual build of Elastic Agent, it is impossible to test new code (we are not really testing the code we expect), which prevents me from continuing to work on Windows.

@pierrehilbert (Contributor)

Fine for me! Thx!

4 participants