Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add integration test upgrading latest release to a build from the current PR #5457

Merged
merged 17 commits into from
Sep 12, 2024

Conversation

AndersonQ
Copy link
Member

What does this PR do?

Adds an integration test which upgrades the latest Elastic Agent release to the version built form the current PR being tested.

Why is it important?

It'll allow us to catch any bug preventing the agent from upgrading before merging the PR.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • [ ] I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

Disruptive User Impact

  • N/A

How to test this PR locally

AGENT_KEEP_INSTALLED=true INSTANCE_PROVISIONER="multipass" SNAPSHOT=true TEST_PLATFORMS="linux/amd64" TEST_PACKAGES="tar.gz" mage integration:single TestFleetPRBuildSelfSigned

Related issues

Questions to ask yourself

  • How are we going to support this in production?
  • How are we going to measure its adoption?
  • How are we going to debug this?
  • What are the metrics I should take care of?
  • ...

@AndersonQ AndersonQ self-assigned this Sep 6, 2024
@AndersonQ AndersonQ requested a review from a team as a code owner September 6, 2024 14:58
@elasticmachine
Copy link
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@ycombinator ycombinator requested review from pchila and removed request for swiatekm September 6, 2024 16:53
Copy link
Member

@pchila pchila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may have misunderstood the intent of the PR so please correct me if I am wrong but the purpose here is to have a fleet-managed agent (installed using the latest release) upgrade to the version built from source on the branch, correct?

Why do we need to impersonate the artifact server, messing with /etc/hosts? Wouldn't a simple httptest.Server serve the package we have available locally? We should be able to set that on the policy using Fleet API...

Am I missing something?

@AndersonQ
Copy link
Member Author

Am I missing something?

Yes :)

The issue is the agent will only upgrade to package it can verify the signature. In order to do so it uses 3 keys:

  • one served by fleet-server, which on cloud we cannot changed
  • one that is built-in in the agent
  • one it downloads from the artifacts API, which the address is hard coded

therefore, I needed to serve the key from a HTTPS server that matches the hard-coded address in the agent. I could use a proxy or something like that. However, impersonating API keeps it all in the test itself and does not rely on other tools.

@AndersonQ AndersonQ requested a review from pchila September 10, 2024 06:59
@AndersonQ AndersonQ added the backport-8.15 Automated backport to the 8.15 branch with mergify label Sep 10, 2024
@pchila
Copy link
Member

pchila commented Sep 10, 2024

  • one served by fleet-server, which on cloud we cannot changed

  • one that is built-in in the agent

  • one it downloads from the artifacts API, which the address is hard coded

Ok, so for a limitation on ESS fleet-server side (not being able to send a skip-verify or a setting a custom PGP key for validation) we are impersonating the API server itself by limiting this test to Linux and playing with /etc/hosts, did I get the gist ?

I am not sure this is the best way to get a fleet-managed agent to upgrade to a locally signed package (we already support the test for standalone agents since we can pass the additional options). Maybe it's time to have integration tests run with something other than ESS deployments, using a fleet-server we can configure?

@cmacknz do we still want to persevere in using ESS deployments in integration tests even if it forces us to implement these sort of workarounds ?

@AndersonQ thanks for the explanation, I see now what constraints we have on integration tests involving fleet-managed agents, I will have another look

@AndersonQ
Copy link
Member Author

I am not sure this is the best way to get a fleet-managed agent to upgrade to a locally signed package (we already support the test for standalone agents since we can pass the additional options). Maybe it's time to have integration tests run with something other than ESS deployments, using a fleet-server we can configure?

I agree 100% with you, but it's the current limitations we have.

@cmacknz do we still want to persevere in using ESS deployments in integration tests even if it forces us to implement these sort of workarounds ?

being able to deploy a fleet-server would also help us with the mTLS tests

pkg/testing/fixture.go Show resolved Hide resolved
pkg/testing/fixture.go Outdated Show resolved Hide resolved
Comment on lines 701 to 709
stamp := time.Now().Format(time.RFC3339)
// on Windows a filename cannot contain a ':' as this collides with disk
// labels (aka. C:\)
stamp = strings.ReplaceAll(stamp, ":", "-")

// Subtest names are separated by "/" characters which are not valid
// filenames on Linux.
sanitizedTestName := strings.ReplaceAll(f.t.Name(), "/", "-")
prefix := fmt.Sprintf("%s-%s", sanitizedTestName, stamp)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this could be grouped in a generic utility function SanitizePath or whatever other name...

testing/integration/upgrade_fleet_test.go Outdated Show resolved Hide resolved
@@ -247,6 +393,7 @@ func testUpgradeFleetManagedElasticAgent(
nonInteractiveFlag = true
}
installOpts := atesting.InstallOpts{
Insecure: true,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be passed as parameter so that we use insecure: true only on the new test ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this line wasn't needed after all ? I cannot find it in the latest version of the PR

testing/integration/upgrade_fleet_test.go Outdated Show resolved Hide resolved

// prepareFileServer prepares and returns a started HTTPS file server serving
// files from rootDir and using cert as its TLS certificate.
func prepareFileServer(t *testing.T, rootDir string, cert tls.Certificate) *httptest.Server {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's better if we say what we actually do in this function

Suggested change
func prepareFileServer(t *testing.T, rootDir string, cert tls.Certificate) *httptest.Server {
func startHttpsFileServer(t *testing.T, rootDir string, cert tls.Certificate) *httptest.Server {

return server
}

// prepareTLSCerts generates a CA and a child certificate for the given host and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am pretty sure there's already a function in test code that does this: can we avoid the duplication ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, I searched the repo and I could not find it. There is thought https://github.com/elastic/elastic-agent-libs/blob/v0.10.1/testing/certutil/certutil.go#L196 which almost works, but it doesn't because it does not allow to configure the host

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the other function is only missing the hostname I would rather extend the existing function, generating certificates is tedious and error prone, I would rather avoid the duplication...

Not blocking, anyways

@AndersonQ AndersonQ requested a review from pchila September 10, 2024 12:13
Copy link

Copy link
Contributor

mergify bot commented Sep 11, 2024

backport-v8.x has been added to help with the transition to the new branch 8.x.

@mergify mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Sep 11, 2024
@v1v v1v removed the backport-v8.x label Sep 11, 2024
@AndersonQ AndersonQ merged commit 1a449cd into elastic:main Sep 12, 2024
15 checks passed
@AndersonQ AndersonQ deleted the 4560-integration-test-pr-build branch September 12, 2024 08:49
mergify bot pushed a commit that referenced this pull request Sep 12, 2024
mergify bot pushed a commit that referenced this pull request Sep 12, 2024
AndersonQ added a commit that referenced this pull request Sep 13, 2024
…rent PR (#5457) (#5530)

(cherry picked from commit 1a449cd)

Co-authored-by: Anderson Queiroz <anderson.queiroz@elastic.co>
AndersonQ added a commit that referenced this pull request Sep 23, 2024
…rent PR (#5457) (#5531)

(cherry picked from commit 1a449cd)

Co-authored-by: Anderson Queiroz <anderson.queiroz@elastic.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-8.x Automated backport to the 8.x branch with mergify backport-8.15 Automated backport to the 8.15 branch with mergify skip-changelog Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team Testing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add an upgrade integration test upgrading to the PR build
4 participants