Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/report version from file #2908

Merged
merged 24 commits into from
Jul 13, 2023

Conversation

pchila
Copy link
Member

@pchila pchila commented Jun 20, 2023

What does this PR do?

Package a new file containing the Elastic Agent "package version" passed using AGENT_PACKAGE_VERSION env var and make sure that that's the version reported by the agent. If no version is specified during packaging, the resulting file will be created with the current BEAT_VERSION and the agent will keep reporting version as it did before this change.

The file is read once for the daemon/service process and for every CLI command execution and the content is stored in a package private variable. If there are issues reading the file the InitVersionInformation() function will fallback on BEATS_VERSION and return an error. This initialization error is treated as non-blocking and logged by the running agent or printed to stderr by the CLI

Why is it important?

For our agent early release process we want the ability of repackaging the agent while updating only the components that need to be patched quickly: in other words, we want to take a previously packaged agent, swap in updated versions of some component (maybe beats, maybe endpoint, maybe something else...) without modifying anything else.
This PR gives the possibility to specify a package version while performing mage package by passing the version using AGENT_PACKAGE_VERSION environment variable.
For example a "normal" package command to create a new 8.9.0-SNAPSHOT elastic agent package like:
BEATS_VERSION=8.9.0 SNAPSHOT=true mage package
would become something like this if we want to give it a different package version
AGENT_PACKAGE_VERSION=8.9.0-super_special_sauce_remix BEATS_VERSION=8.9.0 SNAPSHOT=true mage package
and using the second package form from the same agent source code we can have the agent report itself as 8.9.0-super_special_sauce_remix

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

Author's Checklist

  • [ ]

How to test this PR locally

Related issues

Use cases

Screenshots

Logs

Questions to ask yourself

  • How are we going to support this in production?
  • How are we going to measure its adoption?
  • How are we going to debug this?
  • What are the metrics I should take care of?
  • ...

@pchila pchila added enhancement New feature or request Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team Team:Elastic-Agent Label for the Agent team labels Jun 20, 2023
@pchila pchila self-assigned this Jun 20, 2023
@mergify
Copy link
Contributor

mergify bot commented Jun 20, 2023

This pull request does not have a backport label. Could you fix it @pchila? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 8./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@elasticmachine
Copy link
Contributor

elasticmachine commented Jun 20, 2023

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-07-12T19:12:03.375+0000

  • Duration: 50 min 55 sec

Test stats 🧪

Test Results
Failed 0
Passed 5977
Skipped 27
Total 6004

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages.

  • run integration tests : Run the Elastic Agent Integration tests.

  • run end-to-end tests : Generate the packages and run the E2E Tests.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@elasticmachine
Copy link
Contributor

elasticmachine commented Jun 20, 2023

🌐 Coverage report

Name Metrics % (covered/total) Diff
Packages 98.718% (77/78) 👍 0.017
Files 67.669% (180/266) 👎 -0.011
Classes 66.531% (328/493) 👎 -0.135
Methods 53.894% (1038/1926) 👎 -0.185
Lines 40.167% (11891/29604) 👎 -0.072
Conditionals 100.0% (0/0) 💚

@pchila pchila force-pushed the feature/report-version-from-file branch 2 times, most recently from db9ca5d to 3503bf4 Compare June 20, 2023 16:29
@pchila pchila marked this pull request as ready for review June 20, 2023 16:46
@pchila pchila requested a review from a team as a code owner June 20, 2023 16:46
@elasticmachine
Copy link
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

Copy link
Member

@cmacknz cmacknz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments about error handling. I would also like to see the .package.version file included in the diagnostics bundle.

return pkgVersion
}
// TODO how to signal an error ?
// Fallback to previous behavior
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should fall back to the agent binary version, that will be hard to detect and will possibly cause confusion. Ideally we can use some sentinel value or just set the version to the error.

Maybe we set the version to major.minor.patch-$errorMessage or something?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the agent can't determine its version, maybe we should set the version to major.minor.patch-unknown, and set the agent as unhealthy with the version error as the message?

major.minor.patch can be pulled from the agent binary version built into the image, today it should always match the package version.

Copy link
Member Author

@pchila pchila Jun 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting the version to something different from the BEATS_VERSION when packaging without passing the AGENT_PACKAGE_VERSION env var would be a change in the current behavior (the current packaging we do via Jenkins would spit out an agent with a different result for example).
Maybe what we could do is: we set the package version equal to the BEATS_VERSION in packaging if no specific version is passed (current output of version command does not change if no AGENT_PACKAGE_VERSION is passed) and fallback to the error version only if the file is missing/unreadable.
With that approach we can detect errors in packaging/installation by returning something like major.minor.patch[-SNAPSHOT]+unknown_package_version but passing no specific version string during packaging works as it does today.
As for the health status I am not sure we want to flag the agent as unhealthy for a version problem (it seems excessive as the agent keeps working maybe just misreporting its version to fleet maybe) 🤔

Copy link
Member

@cmacknz cmacknz Jun 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe what we could do is: we set the package version equal to the BEATS_VERSION in packaging if no specific version is passed (current output of version command does not change if no AGENT_PACKAGE_VERSION is passed) and fallback to the error version only if the file is missing/unreadable.

This makes sense. If you don't specify AGENT_PACKAGE_VERSION, default to BEAT_VERSION. In both cases we write out the package version file.

As for the health status I am not sure we want to flag the agent as unhealthy for a version problem (it seems excessive as the agent keeps working maybe just misreporting its version to fleet maybe)

Seeing the major.minor.patch[-SNAPSHOT]+unknown_package_version in Fleet is probably going to be obvious enough, the agent is working so I can see why we wouldn't want to set it as unhealthy. I just want this problem to be obvious.

We should explicitly test that upgrading from an agent with major.minor.patch[-SNAPSHOT]+unknown_package_version as the version works to simulate the scenario where there is a bug in parsing the version file and we need to upgrade to fix it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done in 372131a

testing/integration/package_version_test.go Show resolved Hide resolved
@pchila pchila force-pushed the feature/report-version-from-file branch from f4a75ec to 372131a Compare June 23, 2023 10:55
@pchila
Copy link
Member Author

pchila commented Jun 23, 2023

/test

Copy link
Member

@cmacknz cmacknz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running AGENT_PACKAGE_VERSION=8.10.0 SNAPSHOT=true PLATFORMS=darwin/arm64 PACKAGES=targz mage package with the agent version/version.go file set to 8.9.0 the build fails trying to download external dependencies like osquerybeat-8.9.0+unknown_package_version-SNAPSHOT-darwin-aarch64.tar.gz.

It produced a package named elastic-agent-8.9.0-SNAPSHOT-darwin-aarch64.tar.gz with an agent binary having an empty components directory.

In this case I expected BEAT_VERSION to be set to the value in version/version.go which will be the agent binary version. I was expecting it to download the 8.9.0 artifacts for the dependent binaries. This is how the build works by default today, the only thing I expected to change was the overall package version which I expected to be 8.10.0.

Adding BEAT_VERSION fixes the component version download but the package version is still set to 8.9.0 in the file name. BEAT_VERSION=8.9.0 AGENT_PACKAGE_VERSION=8.10.0 SNAPSHOT=true PLATFORMS=darwin/arm64 PACKAGES=targz mage package

internal/pkg/agent/install/install_windows.go Show resolved Hide resolved
@pchila
Copy link
Member Author

pchila commented Jun 23, 2023

@cmacknz Thanks for the first realistic example of how this feature could be used.
The issue seems related to the "feature" of collecting beats from the beats repository that must be side by side of elastic-agent repository (specifying EXTERNAL=true we cannot reproduce the issue).
Somewhere in our magefile we must be calling Version() because before this change it was an alias to the beats version value... I will check where it is used an change it to the appropriate function call...

@pchila
Copy link
Member Author

pchila commented Jun 23, 2023

@cmacknz Fixed the default beats version lookup in 1c892f7

@pchila pchila requested a review from alexsapran June 27, 2023 07:14
@cmacknz
Copy link
Member

cmacknz commented Jun 28, 2023

This still doesn't quite do what I expect it to do.

  1. I changed the local contents of version/version.go to 8.9.0 from 8.10.0 so that the agent binary version and the version of the downloaded dependencies (like Beats) would be 8.9.0.
diff --git a/version/version.go b/version/version.go
index e8717381b..fcc886c4b 100644
--- a/version/version.go
+++ b/version/version.go
@@ -4,4 +4,4 @@

 package version

-const defaultBeatVersion = "8.10.0"
+const defaultBeatVersion = "8.9.0"

Next I ran mage package with AGENT_PACKAGE_VERSION=8.6.0 so that it was a version that the build would otherwise never generate on main but I could still enroll it in Fleet since we can enroll old versions. This produced a file named elastic-agent-8.9.0-SNAPSHOT-darwin-aarch64.tar.gz when I was expecting elastic-agent-8.6.0-SNAPSHOT-darwin-aarch64.tar.gz where the version number should match the value of AGENT_PACKAGE_VERSION instead of the agent binary version.

~/g/s/gi/elastic/elastic-agent feature/report-version-from-file !1 ?7 ··· 10:15:06 AM
❯ AGENT_PACKAGE_VERSION=8.6.0 SNAPSHOT=true PLATFORMS=darwin/arm64 PACKAGES=targz mage -v package
...
Adding -rwxr-xr-x elastic-agent-8.9.0-SNAPSHOT-darwin-aarch64/elastic-agent
Creating output file at build/distributions/elastic-agent-8.9.0-SNAPSHOT-darwin-aarch64.tar.gz
Running dependency: TestPackages
>> Testing package contents
exec: go "test" "-v" "/Users/cmackenzie/go/src/github.com/elastic/elastic-agent/dev-tools/packaging/package_test.go" "-root-owner" "-files" "/Users/cmackenzie/go/src/github.com/elastic/elastic-agent/build/distributions/*"
package ran for 2m5.817733s

If I extract the built .tar.gz I don't see a .package.version file either which I was expecting.

cd build/distributions/
tar xvfz elastic-agent-8.9.0-SNAPSHOT-darwin-aarch64.tar.gz
cd elastic-agent-8.9.0-SNAPSHOT-darwin-aarch64/
~/g/s/gi/e/elastic-agent/b/d/elastic-agent-8.9.0-SNAPSHOT-darwin-aarch64 feature/report-version-from-file !1 ?7
❯ ls -la
total 2024
drwxr-xr-x  11 cmackenzie  staff     352 Jun 28 10:27 .
drwxr-xr-x   6 cmackenzie  staff     192 Jun 28 10:27 ..
-rw-r--r--   1 cmackenzie  staff      41 Jun 28 10:20 .build_hash.txt
-rw-r--r--   1 cmackenzie  staff      41 Jun 28 10:20 .elastic-agent.active.commit
-rw-r--r--   1 cmackenzie  staff   13675 Jun 26 15:30 LICENSE.txt
-rw-r--r--   1 cmackenzie  staff  980302 Jun 27 10:00 NOTICE.txt
-rw-r--r--   1 cmackenzie  staff     309 Jun 28 10:20 README.md
drwxr-xr-x   3 cmackenzie  staff      96 Jun 28 10:27 data
lrwxr-xr-x   1 cmackenzie  staff      72 Jun 28 10:20 elastic-agent -> data/elastic-agent-1c892f/elastic-agent.app/Contents/MacOS/elastic-agent
-rw-r--r--   1 cmackenzie  staff   10797 Jun 28 10:19 elastic-agent.reference.yml
-rw-------   1 cmackenzie  staff   10951 Jun 28 10:19 elastic-agent.yml

❯ ls -la data/elastic-agent-1c892f/
total 8
drwxr-xr-x   5 cmackenzie  staff  160 Jun 28 10:27 .
drwxr-xr-x   3 cmackenzie  staff   96 Jun 28 10:27 ..
drwxr-xr-x  31 cmackenzie  staff  992 Jun 28 10:19 components
-rwxr-xr-x   1 cmackenzie  staff  227 Jun 28 10:20 elastic-agent
drwxr-xr-x   3 cmackenzie  staff   96 Jun 28 10:27 elastic-agent.app

However if I install the agent it reports its version as 8.6.0, but I have no way to know if it is telling me the package version or the agent binary version. We need the version command to start listing both.

❯ sudo ./elastic-agent install -f
Elastic Agent has been successfully installed.
~/g/s/gi/e/elastic-agent/b/d/elastic-agent-8.9.0-SNAPSHOT-darwin-aarch64 feature/report-version-from-file !1 ?7
❯ sudo elastic-agent version
Binary: 8.6.0-SNAPSHOT (build: 1c892f717b558c7c74caed5090b61d01f3508bd8 at 2023-06-28 08:19:16 +0000 UTC)
Daemon: 8.6.0-SNAPSHOT (build: 1c892f717b558c7c74caed5090b61d01f3508bd8 at 2023-06-28 08:19:16 +0000 UTC)

@pchila
Copy link
Member Author

pchila commented Jun 28, 2023

@cmacknz the .package.version file is located side by side with the binary (we need to do this since during upgrade we have multiple agent version and each should report its own version independently (in case of mac you will find the package version file in data/elastic-agent-{{ commit_short }}/elastic-agent.app/Contents/MacOS/.package.version). During upgrade we only extract the contents of data and its subdirectories so if we place the new package version file in the root of the archive it won't be extracted.
As for the package naming I didn't know that we wanted to change this as I thought was part of #2579
We can also do it in this issue (it will be needed anyways), so if I understand the requirement correctly we want all the packages to report the package version instead of the beats version in both the archive name and the top directory inside the archive ?
Would you mind adding this requirement and the expected naming in the issue?

As for the version output: I thought that the format of the version command should not change and we should only change the version string here. If we want to separate the beats version from the package version we should probably think about having an output that looks like a Bill of Materials where all the versions of the components (agent, beats, endpoint etc.) are reported separately. What do you think ?

@cmacknz
Copy link
Member

cmacknz commented Jun 28, 2023

We can also do it in this issue (it will be needed anyways), so if I understand the requirement correctly we want all the packages to report the package version instead of the beats version in both the archive name and the top directory inside the archive ?
Would you mind adding this requirement and the expected naming in the issue?

Where elastic-agent-8.9.0-SNAPSHOT-darwin-aarch64.tar.gz is the agent package, I expect the versioning to be elastic-agent-$AGENT_PACKAGE_VERSION-SNAPSHOT-darwin-aarch64.tar.gz. We now have a separate package for just the agent binary we can create that should have the agent binary version in the name.

As for the version output: I thought that the format of the version command should not change and we should only change the version string here. If we want to separate the beats version from the package version we should probably think about having an output that looks like a Bill of Materials where all the versions of the components (agent, beats, endpoint etc.) are reported separately. What do you think ?

This makes sense, the elastic-agent version default output should be unchanged and display the package version. We can add an option or a new command to show the version of every component in the agent package since they can all be versioned independently. This can be done in separate PR with its own issue.

@pchila
Copy link
Member Author

pchila commented Jun 29, 2023

@cmacknz added a small change in 764cbba and 50c5cea that will set the version used for package names and top level directories to agent_package_version.

The only exception to this rule are the elastic-agent-core artifacts that still maintain the override to the BEATS_VERSION value regardless of whatever AGENT_PACKAGE_VERSION is passed

Copy link
Contributor

@michalpristas michalpristas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i tested few scenarios all looks good to me.
i don't like that when file is missing we say unknown package but if that's what you agreed on i'm fine with that.

when you merge main into this branch you won't get conflicts but some build failures. make sure you resolve that before merging

internal/pkg/agent/install/install_windows.go Show resolved Hide resolved
@pchila pchila force-pushed the feature/report-version-from-file branch from 50c5cea to 11e3036 Compare July 4, 2023 09:41
@pchila pchila force-pushed the feature/report-version-from-file branch from 0ac1816 to 5c54316 Compare July 11, 2023 09:31
@pchila
Copy link
Member Author

pchila commented Jul 12, 2023

buildkite test this

if ver.Prerelease() == "" {
ver = semver.NewParsedSemVer(ver.Major(), ver.Minor(), ver.Patch(), ver.Prerelease()+"SNAPSHOT", ver.BuildMetadata())
} else {
ver = semver.NewParsedSemVer(ver.Major(), ver.Minor(), ver.Patch(), ver.Prerelease()+"-SNAPSHOT", ver.BuildMetadata())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I though the - before SNAPSHOT was always necessary? What am I missing here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not simple string concatenation: we are setting the prerelease string of a semver version.
The String() function of the semver object will format the version as
<major>.<minor>.<patch>[-<prerelease>][+<metadata>]
If we had no prerelease set in the original version, we just need to set the prerelease to 'SNAPSHOT' and the version will have the expected format
If we already had a prerelease string set (like er.1 for example) then we set the prerelease string to the original prerelease string suffixed with '-SNAPSHOT' (in the er.1 example we get er.1-SNAPSHOT as prerelease string)

Copy link
Member

@cmacknz cmacknz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested, looks good. Left a few minor comments.

Installing without a package version logs an obvious error now at install time:

Error initializing version information: reading package version from file "/Users/cmackenzie/Downloads/elastic-agent-8.6.0-SNAPSHOT-darwin-aarch64/data/elastic-agent-cd6d63/elastic-agent.app/Contents/MacOS/.package.version": open /Users/cmackenzie/Downloads/elastic-agent-8.6.0-SNAPSHOT-darwin-aarch64/data/elastic-agent-cd6d63/elastic-agent.app/Contents/MacOS/.package.version: no such file or directory
Error initializing version information: reading package version from file "/Library/Elastic/Agent/data/elastic-agent-cd6d63/elastic-agent.app/Contents/MacOS/.package.version": open /Library/Elastic/Agent/data/elastic-agent-cd6d63/elastic-agent.app/Contents/MacOS/.package.version: no such file or directory
{"log.level":"info","@timestamp":"2023-07-12T11:19:30.481-0400","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":478},"message":"Starting enrollment to URL: https://348ceebe92654238b196c6280658b2f5.fleet.eastus2.staging.azure.foundit.no:443/","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-07-12T11:19:31.511-0400","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":276},"message":"Successfully triggered restart on running Elastic Agent.","ecs.version":"1.6.0"}
Successfully enrolled the Elastic Agent.
Elastic Agent has been successfully installed.

@pchila
Copy link
Member Author

pchila commented Jul 12, 2023

buildkite test this

Co-authored-by: Craig MacKenzie <craig.mackenzie@elastic.co>
@jlind23
Copy link
Contributor

jlind23 commented Jul 13, 2023

Buildkite test this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-skip enhancement New feature or request Team:Elastic-Agent Label for the Agent team Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Report the version from the elastic agent package instead of agent binary
5 participants