Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Host Update API integration #35

Merged
merged 2 commits into from
Jul 23, 2020
Merged

Host Update API integration #35

merged 2 commits into from
Jul 23, 2020

Conversation

etungsten
Copy link
Contributor

@etungsten etungsten commented Jul 10, 2020

Issue number:
N/A

Description of changes:
Adds support for updating via the Bottlerocket Update API.
The new platform is used if the node is labeled bottlerocket.aws/updater-interface-version=2.0.0

Testing done:
Agent successfully scheduled a pod onto a node and finished an update:

    time="2020-07-13T22:11:27Z" level=debug msg=starting component=agent
    time="2020-07-13T22:11:27Z" level=debug msg="posted intent" component=agent intent=",, update:" node=ip-192-168-4-80.us-west-2.compute.internal
    time="2020-07-13T22:11:27Z" level=debug msg=starting component=agent worker=informer
    time="2020-07-13T22:11:27Z" level=debug msg="resource add event" component=agent worker=informer
    time="2020-07-13T22:11:27Z" level=debug msg="skipping emitted intent as event" component=agent intent=",, update:" node=ip-192-168-4-80.us-west-2.compute.internal
    time="2020-07-13T22:11:27Z" level=debug msg="resource update event" component=agent worker=informer
    time="2020-07-13T22:11:27Z" level=debug msg="active intent received" component=agent intent="stabilize,unknown,unknown update:unknown" node=ip-192-168-4-80.us-west-2.compute.internal
    time="2020-07-13T22:11:27Z" level=debug msg="handling intent" component=agent intent="stabilize,unknown,unknown update:unknown" worker=handler
    time="2020-07-13T22:11:27Z" level=debug msg="posted intent" component=agent intent="stabilize,stabilize,busy update:unknown" node=ip-192-168-4-80.us-west-2.compute.internal
    time="2020-07-13T22:11:27Z" level=debug msg=sitrep component=agent intent="stabilize,unknown,unknown update:unknown" worker=handler
    time="2020-07-13T22:11:27Z" level=debug msg="making 'GET' request to '/os' API endpoint" component=api
    time="2020-07-13T22:11:27Z" level=info msg="current running OS version: 0.3.4" component=platform
    time="2020-07-13T22:11:27Z" level=debug msg="fetching list of available updates" component=platform
    time="2020-07-13T22:11:27Z" level=debug msg="making 'POST' request to '/actions/refresh-updates' API endpoint" component=api
    time="2020-07-13T22:11:27Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
    time="2020-07-13T22:11:27Z" level=info msg="API server busy, retrying in 10 seconds ..." component=api
    time="2020-07-13T22:11:37Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
    time="2020-07-13T22:11:37Z" level=info msg="an update is available" component=agent intent="stabilize,unknown,unknown update:unknown" update-available=true worker=handler
    time="2020-07-13T22:11:37Z" level=debug msg="realized intent" component=agent intent="stabilize,unknown,unknown update:unknown" worker=handler
    time="2020-07-13T22:11:37Z" level=debug msg="posted intent" component=agent intent="stabilize,stabilize,ready update:true" node=ip-192-168-4-80.us-west-2.compute.internal
    time="2020-07-13T22:11:37Z" level=debug msg="resource update event" component=agent worker=informer
    time="2020-07-13T22:11:37Z" level=debug msg="skipping emitted intent as event" component=agent intent="stabilize,stabilize,busy update:unknown" node=ip-192-168-4-80.us-west-2.compute.internal
    time="2020-07-13T22:11:37Z" level=debug msg="resource update event" component=agent worker=informer
    time="2020-07-13T22:11:37Z" level=debug msg="skipping emitted intent as event" component=agent intent="stabilize,stabilize,ready update:true" node=ip-192-168-4-80.us-west-2.compute.internal
    time="2020-07-13T22:11:37Z" level=debug msg="resource update event" component=agent worker=informer
    time="2020-07-13T22:11:37Z" level=debug msg="active intent received" component=agent intent="prepare-update,stabilize,ready update:true" node=ip-192-168-4-80.us-west-2.compute.internal
    time="2020-07-13T22:11:37Z" level=debug msg="handling intent" component=agent intent="prepare-update,stabilize,ready update:true" worker=handler
    time="2020-07-13T22:11:37Z" level=debug msg="posted intent" component=agent intent="prepare-update,prepare-update,busy update:true" node=ip-192-168-4-80.us-west-2.compute.internal
    time="2020-07-13T22:11:37Z" level=debug msg="fetching list of available updates" component=platform
    time="2020-07-13T22:11:37Z" level=debug msg="making 'POST' request to '/actions/refresh-updates' API endpoint" component=api
    time="2020-07-13T22:11:37Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
    time="2020-07-13T22:11:37Z" level=info msg="API server busy, retrying in 10 seconds ..." component=api
    time="2020-07-13T22:11:47Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
    time="2020-07-13T22:11:47Z" level=debug msg="preparing update" component=agent intent="prepare-update,stabilize,ready update:true" worker=handler
    time="2020-07-13T22:11:47Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
    time="2020-07-13T22:11:47Z" level=debug msg="making 'POST' request to '/actions/prepare-update' API endpoint" component=api
    time="2020-07-13T22:11:47Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
    time="2020-07-13T22:11:47Z" level=info msg="API server busy, retrying in 10 seconds ..." component=api
    time="2020-07-13T22:11:57Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
    time="2020-07-13T22:11:57Z" level=debug msg="realized intent" component=agent intent="prepare-update,stabilize,ready update:true" worker=handler
    time="2020-07-13T22:11:57Z" level=debug msg="posted intent" component=agent intent="prepare-update,prepare-update,ready update:true" node=ip-192-168-4-80.us-west-2.compute.internal
    time="2020-07-13T22:11:57Z" level=debug msg="resource update event" component=agent worker=informer
    time="2020-07-13T22:11:57Z" level=debug msg="skipping emitted intent as event" component=agent intent="prepare-update,prepare-update,busy update:true" node=ip-192-168-4-80.us-west-2.compute.internal
    time="2020-07-13T22:11:57Z" level=debug msg="resource update event" component=agent worker=informer
    time="2020-07-13T22:11:57Z" level=debug msg="skipping emitted intent as event" component=agent intent="prepare-update,prepare-update,busy update:true" node=ip-192-168-4-80.us-west-2.compute.internal
    time="2020-07-13T22:11:57Z" level=debug msg="resource update event" component=agent worker=informer
    time="2020-07-13T22:11:57Z" level=debug msg="skipping emitted intent as event" component=agent intent="prepare-update,prepare-update,ready update:true" node=ip-192-168-4-80.us-west-2.compute.internal
    time="2020-07-13T22:11:57Z" level=debug msg="resource update event" component=agent worker=informer
    time="2020-07-13T22:11:57Z" level=debug msg="active intent received" component=agent intent="perform-update,prepare-update,ready update:true" node=ip-192-168-4-80.us-west-2.compute.internal
    time="2020-07-13T22:11:57Z" level=debug msg="handling intent" component=agent intent="perform-update,prepare-update,ready update:true" worker=handler
    time="2020-07-13T22:11:57Z" level=debug msg="posted intent" component=agent intent="perform-update,perform-update,busy update:true" node=ip-192-168-4-80.us-west-2.compute.internal
    time="2020-07-13T22:11:57Z" level=debug msg=updating component=agent intent="perform-update,prepare-update,ready update:true" worker=handler
    time="2020-07-13T22:11:57Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
    time="2020-07-13T22:11:57Z" level=debug msg="making 'POST' request to '/actions/activate-update' API endpoint" component=api
    time="2020-07-13T22:11:57Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
    time="2020-07-13T22:11:57Z" level=info msg="API server busy, retrying in 10 seconds ..." component=api
    time="2020-07-13T22:12:07Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
    time="2020-07-13T22:12:07Z" level=debug msg="realized intent" component=agent intent="perform-update,prepare-update,ready update:true" worker=handler
    time="2020-07-13T22:12:07Z" level=debug msg="posted intent" component=agent intent="perform-update,perform-update,ready update:true" node=ip-192-168-4-80.us-west-2.compute.internal
    time="2020-07-13T22:12:07Z" level=debug msg="resource update event" component=agent worker=informer
    time="2020-07-13T22:12:07Z" level=debug msg="skipping emitted intent as event" component=agent intent="perform-update,perform-update,busy update:true" node=ip-192-168-4-80.us-west-2.compute.internal
    time="2020-07-13T22:12:07Z" level=debug msg="resource update event" component=agent worker=informer
    time="2020-07-13T22:12:07Z" level=debug msg="skipping emitted intent as event" component=agent intent="perform-update,perform-update,ready update:true" node=ip-192-168-4-80.us-west-2.compute.internal
    time="2020-07-13T22:12:07Z" level=debug msg="resource update event" component=agent worker=informer
    time="2020-07-13T22:12:07Z" level=debug msg="skipping emitted intent as event" component=agent intent="perform-update,perform-update,ready update:true" node=ip-192-168-4-80.us-west-2.compute.internal
    time="2020-07-13T22:12:07Z" level=debug msg="resource update event" component=agent worker=informer
    time="2020-07-13T22:12:07Z" level=debug msg="skipping emitted intent as event" component=agent intent="perform-update,perform-update,ready update:true" node=ip-192-168-4-80.us-west-2.compute.internal
    time="2020-07-13T22:12:07Z" level=debug msg="resource update event" component=agent worker=informer
    time="2020-07-13T22:12:07Z" level=debug msg="active intent received" component=agent intent="reboot-update,perform-update,ready update:true" node=ip-192-168-4-80.us-west-2.compute.internal
    time="2020-07-13T22:12:07Z" level=debug msg="handling intent" component=agent intent="reboot-update,perform-update,ready update:true" worker=handler
    time="2020-07-13T22:12:07Z" level=debug msg="posted intent" component=agent intent="reboot-update,reboot-update,busy update:true" node=ip-192-168-4-80.us-west-2.compute.internal
    time="2020-07-13T22:12:07Z" level=debug msg=rebooting component=agent intent="reboot-update,perform-update,ready update:true" worker=handler
    time="2020-07-13T22:12:07Z" level=info msg="Rebooting Node to complete update" component=agent intent="reboot-update,perform-update,ready update:true" worker=handler
    time="2020-07-13T22:12:07Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
    time="2020-07-13T22:12:07Z" level=debug msg="making 'POST' request to '/actions/reboot' API endpoint" component=api

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

@etungsten etungsten changed the title Host Update api integration Host Update API integration Jul 10, 2020
pkg/agent/agent.go Outdated Show resolved Hide resolved
pkg/agent/agent.go Outdated Show resolved Hide resolved
// Get the updater interface version from the node label
var platformVersion = node.Labels[marker.UpdaterInterfaceVersionKey]
switch platformVersion {
case "2.0.0":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to use exactly 2.0.0, or a semver-compatible range more akin to 2.x.y?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see why not, but I also don't really see why we should. The platform version here is to differentiate hosts that can support the update API or not. There's really nothing much in between.

I think ideally the label would be more explicit. Instead of a version, it would just be the supported update platform, "updog" or "update-api".

pkg/agent/agent.go Outdated Show resolved Hide resolved
default:
log.Info("unknown platform version specified, defaulting to using updog")
fallthrough
case "1.0.0":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as above, do we want exactly 1.0.0?

pkg/platform/update-api/platform.go Outdated Show resolved Hide resolved
pkg/platform/update-api/update_api.go Outdated Show resolved Hide resolved
pkg/platform/update-api/update_api.go Outdated Show resolved Hide resolved
pkg/platform/update-api/update_api.go Outdated Show resolved Hide resolved
pkg/platform/update-api/update_api.go Outdated Show resolved Hide resolved
pkg/platform/update-api/update_api.go Outdated Show resolved Hide resolved
pkg/agent/agent.go Outdated Show resolved Hide resolved
pkg/agent/agent.go Outdated Show resolved Hide resolved
pkg/agent/agent.go Outdated Show resolved Hide resolved
pkg/platform/update-api/update_api.go Outdated Show resolved Hide resolved
pkg/platform/update-api/update_api_test.go Outdated Show resolved Hide resolved
pkg/platform/update-api/update_api.go Outdated Show resolved Hide resolved
pkg/platform/update-api/update_api.go Outdated Show resolved Hide resolved
pkg/platform/update-api/update_api_test.go Outdated Show resolved Hide resolved
@etungsten etungsten force-pushed the update-api-integration branch 2 times, most recently from a8893ea to 44d3078 Compare July 13, 2020 16:08
@etungsten
Copy link
Contributor Author

Push above addresses @samuelkarp and @jahkeup 's comments:

  • Tries to abstract out the direct interactions with the HTTP client from platform.go
  • Create separate deployments for updog specific platform or api specific platform.

@etungsten
Copy link
Contributor Author

etungsten commented Jul 13, 2020

Testing wise, I got a node to update via the operator using the update API. However, I'm running into an issue where the agent repeatedly tries to prepare the update even after each attempt succeeds. The agent spins like this for a while until it finally realizes the update is ready then finally perform-update and reboots.

See below:

Retrieves the available update fine, ready-update is set to true

time="2020-07-11T23:16:38Z" level=debug msg=sitrep component=agent intent="stabilize,stabilize,busy update:unknown" worker=handler
time="2020-07-11T23:16:38Z" level=debug msg="making 'GET' request to '/os' API endpoint" component=api
time="2020-07-11T23:16:38Z" level=info msg="current running OS version: 0.3.4" component=platform
time="2020-07-11T23:16:38Z" level=debug msg="fetching list of available updates" component=platform
time="2020-07-11T23:16:38Z" level=debug msg="making 'POST' request to '/actions/refresh-updates' API endpoint" component=api
time="2020-07-11T23:16:38Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-11T23:16:38Z" level=info msg="API server busy, retrying in 5 seconds ..." component=api
time="2020-07-11T23:16:43Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-11T23:16:43Z" level=info msg="an update is available" component=agent intent="stabilize,stabilize,busy update:unknown" update-available=true worker=handler
time="2020-07-11T23:16:43Z" level=debug msg="realized intent" component=agent intent="stabilize,stabilize,busy update:unknown" worker=handler
time="2020-07-11T23:16:43Z" level=debug msg="posted intent" component=agent intent="stabilize,stabilize,ready update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:16:43Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-11T23:16:43Z" level=debug msg="inactive intent received" component=agent intent="stabilize,stabilize,ready update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:16:43Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-11T23:16:43Z" level=debug msg="skipping duplicate received event" component=agent intent="stabilize,stabilize,busy update:unknown" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:16:43Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-11T23:16:43Z" level=debug msg="inactive intent received" component=agent intent="stabilize,stabilize,ready update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:16:49Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-11T23:16:49Z" level=debug msg="inactive intent received" component=agent intent="stabilize,stabilize,ready update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:17:49Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-11T23:17:49Z" level=debug msg="inactive intent received" component=agent intent="stabilize,stabilize,ready update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:17:49Z" level=debug msg="resource update event" component=agent worker=informer

Then the agent tries to prepare-update which eventually comes back fine and the intent is realized.

DEBU[2020-07-11T22:04:12Z] active intent received                        component=agent intent="prepare-update,stabilize,ready update:true" node=ip-192-168-30-63.us-west-2.compute.internal
DEBU[2020-07-11T22:04:12Z] handling intent                               component=agent intent="prepare-update,stabilize,ready update:true" worker=handler
DEBU[2020-07-11T22:04:12Z] posted intent                                 component=agent intent="prepare-update,prepare-update,busy update:true" node=ip-192-168-30-63.us-west-2.compute.internal
DEBU[2020-07-11T22:04:12Z] fetching list of available updates            component=platform
DEBU[2020-07-11T22:04:12Z] making 'POST' request to '/actions/refresh-updates' API endpoint  component=api
DEBU[2020-07-11T22:04:12Z] making 'GET' request to '/updates/status' API endpoint  component=api
INFO[2020-07-11T22:04:12Z] API server busy, retrying in 10 seconds ...   component=api
DEBU[2020-07-11T22:04:22Z] making 'GET' request to '/updates/status' API endpoint  component=api
DEBU[2020-07-11T22:04:22Z] preparing update                              component=agent intent="prepare-update,stabilize,ready update:true" worker=handler
DEBU[2020-07-11T22:04:22Z] making 'GET' request to '/updates/status' API endpoint  component=api
DEBU[2020-07-11T22:04:22Z] making 'POST' request to '/actions/prepare-update' API endpoint  component=api
DEBU[2020-07-11T22:04:22Z] making 'GET' request to '/updates/status' API endpoint  component=api
INFO[2020-07-11T22:04:22Z] API server busy, retrying in 10 seconds ...   component=api
DEBU[2020-07-11T22:04:32Z] making 'GET' request to '/updates/status' API endpoint  component=api
DEBU[2020-07-11T22:04:32Z] realized intent                               component=agent intent="prepare-update,stabilize,ready update:true" worker=handler

But then it goes back to posting the same intent again? It tries to re-prepare-update. This happens until it's "unable to realize intent"

DEBU[2020-07-11T22:04:32Z] posted intent                                 component=agent intent="prepare-update,prepare-update,ready update:true" node=ip-192-168-30-63.us-west-2.compute.internal
DEBU[2020-07-11T22:04:32Z] resource update event                         component=agent worker=informer
DEBU[2020-07-11T22:04:32Z] active intent received                        component=agent intent="prepare-update,prepare-update,busy update:true" node=ip-192-168-30-63.us-west-2.compute.internal
DEBU[2020-07-11T22:04:32Z] handling intent                               component=agent intent="prepare-update,prepare-update,busy update:true" worker=handler
DEBU[2020-07-11T22:04:32Z] posted intent                                 component=agent intent="prepare-update,prepare-update,busy update:true" node=ip-192-168-30-63.us-west-2.compute.internal
DEBU[2020-07-11T22:04:32Z] fetching list of available updates            component=platform
DEBU[2020-07-11T22:04:32Z] making 'POST' request to '/actions/refresh-updates' API endpoint  component=api
DEBU[2020-07-11T22:04:32Z] making 'GET' request to '/updates/status' API endpoint  component=api
INFO[2020-07-11T22:04:32Z] API server busy, retrying in 10 seconds ...   component=api
DEBU[2020-07-11T22:04:42Z] making 'GET' request to '/updates/status' API endpoint  component=api
DEBU[2020-07-11T22:04:42Z] preparing update                              component=agent intent="prepare-update,prepare-update,busy update:true" worker=handler
DEBU[2020-07-11T22:04:42Z] making 'GET' request to '/updates/status' API endpoint  component=api
DEBU[2020-07-11T22:04:42Z] making 'POST' request to '/actions/prepare-update' API endpoint  component=api
DEBU[2020-07-11T22:04:42Z] making 'GET' request to '/updates/status' API endpoint  component=api
INFO[2020-07-11T22:04:42Z] API server busy, retrying in 10 seconds ...   component=api
DEBU[2020-07-11T22:04:52Z] making 'GET' request to '/updates/status' API endpoint  component=api
DEBU[2020-07-11T22:04:52Z] realized intent                               component=agent intent="prepare-update,prepare-update,busy update:true" worker=handler
DEBU[2020-07-11T22:04:52Z] posted intent                                 component=agent intent="prepare-update,prepare-update,ready update:true" node=ip-192-168-30-63.us-west-2.compute.internal
DEBU[2020-07-11T22:04:52Z] resource update event                         component=agent worker=informer
DEBU[2020-07-11T22:04:52Z] inactive intent received                      component=agent intent="prepare-update,prepare-update,ready update:true" node=ip-192-168-30-63.us-west-2.compute.internal
DEBU[2020-07-11T22:04:52Z] resource update event                         component=agent worker=informer
DEBU[2020-07-11T22:04:52Z] active intent received                        component=agent intent="prepare-update,prepare-update,busy update:true" node=ip-192-168-30-63.us-west-2.compute.internal
DEBU[2020-07-11T22:04:52Z] handling intent                               component=agent intent="prepare-update,prepare-update,busy update:true" worker=handler
ERRO[2020-07-11T22:04:52Z] unable to realize intent                      component=agent error="unable to update node: Operation cannot be fulfilled on nodes \"ip-192-168-30-63.us-west-2.compute.internal\": the object has been modified; please apply your changes to the latest version and try again" intent="prepare-update,prepare-update,busy update:true" node=ip-192-168-30-63.us-west-2.compute.internal
DEBU[2020-07-11T22:04:52Z] resource update event                         component=agent worker=informer
DEBU[2020-07-11T22:04:52Z] skipping emitted intent as event              component=agent intent="prepare-update,prepare-update,busy update:true" node=ip-192-168-30-63.us-west-2.compute.internal
DEBU[2020-07-11T22:04:52Z] resource update event                         component=agent worker=informer
DEBU[2020-07-11T22:04:52Z] inactive intent received                      component=agent intent="prepare-update,prepare-update,ready update:true" node=ip-192-168-30-63.us-west-2.compute.internal

Then finally the agent decides to perform-update and reboots.

DEBU[2020-07-11T22:04:52Z] resource update event                         component=agent worker=informer
DEBU[2020-07-11T22:04:52Z] active intent received                        component=agent intent="perform-update,prepare-update,ready update:true" node=ip-192-168-30-63.us-west-2.compute.internal
DEBU[2020-07-11T22:04:52Z] handling intent                               component=agent intent="perform-update,prepare-update,ready update:true" worker=handler
DEBU[2020-07-11T22:04:52Z] posted intent                                 component=agent intent="perform-update,perform-update,busy update:true" node=ip-192-168-30-63.us-west-2.compute.internal
DEBU[2020-07-11T22:04:52Z] updating                                      component=agent intent="perform-update,prepare-update,ready update:true" worker=handler
DEBU[2020-07-11T22:04:52Z] making 'GET' request to '/updates/status' API endpoint  component=api
DEBU[2020-07-11T22:04:52Z] making 'POST' request to '/actions/activate-update' API endpoint  component=api
DEBU[2020-07-11T22:04:52Z] making 'GET' request to '/updates/status' API endpoint  component=api
INFO[2020-07-11T22:04:52Z] API server busy, retrying in 10 seconds ...   component=api
DEBU[2020-07-11T22:05:02Z] making 'GET' request to '/updates/status' API endpoint  component=api
DEBU[2020-07-11T22:05:02Z] realized intent                               component=agent intent="perform-update,prepare-update,ready update:true" worker=handler
DEBU[2020-07-11T22:05:02Z] posted intent                                 component=agent intent="perform-update,perform-update,ready update:true" node=ip-192-168-30-63.us-west-2.compute.internal
DEBU[2020-07-11T22:05:02Z] resource update event                         component=agent worker=informer
DEBU[2020-07-11T22:05:02Z] inactive intent received                      component=agent intent="perform-update,perform-update,busy update:true" node=ip-192-168-30-63.us-west-2.compute.internal
DEBU[2020-07-11T22:05:02Z] resource update event                         component=agent worker=informer
DEBU[2020-07-11T22:05:02Z] inactive intent received                      component=agent intent="perform-update,perform-update,ready update:true" node=ip-192-168-30-63.us-west-2.compute.internal
DEBU[2020-07-11T22:05:02Z] resource update event                         component=agent worker=informer
DEBU[2020-07-11T22:05:02Z] inactive intent received                      component=agent intent="perform-update,perform-update,ready update:true" node=ip-192-168-30-63.us-west-2.compute.internal
DEBU[2020-07-11T22:05:02Z] resource update event                         component=agent worker=informer
DEBU[2020-07-11T22:05:02Z] inactive intent received                      component=agent intent="perform-update,perform-update,ready update:true" node=ip-192-168-30-63.us-west-2.compute.internal
DEBU[2020-07-11T22:05:02Z] resource update event                         component=agent worker=informer
DEBU[2020-07-11T22:05:02Z] active intent received                        component=agent intent="reboot-update,perform-update,ready update:true" node=ip-192-168-30-63.us-west-2.compute.internal
DEBU[2020-07-11T22:05:02Z] handling intent                               component=agent intent="reboot-update,perform-update,ready update:true" worker=handler
DEBU[2020-07-11T22:05:02Z] posted intent                                 component=agent intent="reboot-update,reboot-update,busy update:true" node=ip-192-168-30-63.us-west-2.compute.internal
DEBU[2020-07-11T22:05:02Z] rebooting                                     component=agent intent="reboot-update,perform-update,ready update:true" worker=handler
INFO[2020-07-11T22:05:02Z] Rebooting Node to complete update             component=agent intent="reboot-update,perform-update,ready update:true" worker=handler
DEBU[2020-07-11T22:05:02Z] making 'GET' request to '/updates/status' API endpoint  component=api
DEBU[2020-07-11T22:05:02Z] making 'POST' request to '/actions/reboot' API endpoint  component=api
INFO[2020-07-11T22:05:02Z] waiting on workers to finish                  component=agent
DEBU[2020-07-11T22:05:02Z] finished     

Not entirely sure why this happens. I suspect it has something to with me sleeping to wait for the /updates/status to be ready?

Logs from another attempt:

time="2020-07-11T23:16:33Z" level=debug msg=starting component=agent
time="2020-07-11T23:16:33Z" level=debug msg="posted intent" component=agent intent=",, update:" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:16:33Z" level=debug msg=starting component=agent worker=informer
time="2020-07-11T23:16:33Z" level=debug msg="resource add event" component=agent worker=informer
time="2020-07-11T23:16:33Z" level=debug msg="inactive intent received" component=agent intent=",, update:" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:16:33Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-11T23:16:33Z" level=debug msg="active intent received" component=agent intent="stabilize,unknown,unknown update:unknown" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:16:33Z" level=debug msg="handling intent" component=agent intent="stabilize,unknown,unknown update:unknown" worker=handler
time="2020-07-11T23:16:33Z" level=debug msg="posted intent" component=agent intent="stabilize,stabilize,busy update:unknown" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:16:33Z" level=debug msg=sitrep component=agent intent="stabilize,unknown,unknown update:unknown" worker=handler
time="2020-07-11T23:16:33Z" level=debug msg="making 'GET' request to '/os' API endpoint" component=api
time="2020-07-11T23:16:33Z" level=info msg="current running OS version: 0.3.4" component=platform
time="2020-07-11T23:16:33Z" level=debug msg="fetching list of available updates" component=platform
time="2020-07-11T23:16:33Z" level=debug msg="making 'POST' request to '/actions/refresh-updates' API endpoint" component=api
time="2020-07-11T23:16:33Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-11T23:16:33Z" level=info msg="API server busy, retrying in 5 seconds ..." component=api
time="2020-07-11T23:16:38Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-11T23:16:38Z" level=info msg="an update is available" component=agent intent="stabilize,unknown,unknown update:unknown" update-available=true worker=handler
time="2020-07-11T23:16:38Z" level=debug msg="realized intent" component=agent intent="stabilize,unknown,unknown update:unknown" worker=handler
time="2020-07-11T23:16:38Z" level=debug msg="posted intent" component=agent intent="stabilize,stabilize,ready update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:16:38Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-11T23:16:38Z" level=debug msg="active intent received" component=agent intent="stabilize,stabilize,busy update:unknown" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:16:38Z" level=debug msg="handling intent" component=agent intent="stabilize,stabilize,busy update:unknown" worker=handler
time="2020-07-11T23:16:38Z" level=debug msg="posted intent" component=agent intent="stabilize,stabilize,busy update:unknown" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:16:38Z" level=debug msg=sitrep component=agent intent="stabilize,stabilize,busy update:unknown" worker=handler
time="2020-07-11T23:16:38Z" level=debug msg="making 'GET' request to '/os' API endpoint" component=api
time="2020-07-11T23:16:38Z" level=info msg="current running OS version: 0.3.4" component=platform
time="2020-07-11T23:16:38Z" level=debug msg="fetching list of available updates" component=platform
time="2020-07-11T23:16:38Z" level=debug msg="making 'POST' request to '/actions/refresh-updates' API endpoint" component=api
time="2020-07-11T23:16:38Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-11T23:16:38Z" level=info msg="API server busy, retrying in 5 seconds ..." component=api
time="2020-07-11T23:16:43Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-11T23:16:43Z" level=info msg="an update is available" component=agent intent="stabilize,stabilize,busy update:unknown" update-available=true worker=handler
time="2020-07-11T23:16:43Z" level=debug msg="realized intent" component=agent intent="stabilize,stabilize,busy update:unknown" worker=handler
time="2020-07-11T23:16:43Z" level=debug msg="posted intent" component=agent intent="stabilize,stabilize,ready update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:16:43Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-11T23:16:43Z" level=debug msg="inactive intent received" component=agent intent="stabilize,stabilize,ready update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:16:43Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-11T23:16:43Z" level=debug msg="skipping duplicate received event" component=agent intent="stabilize,stabilize,busy update:unknown" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:16:43Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-11T23:16:43Z" level=debug msg="inactive intent received" component=agent intent="stabilize,stabilize,ready update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:16:49Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-11T23:16:49Z" level=debug msg="inactive intent received" component=agent intent="stabilize,stabilize,ready update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:17:49Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-11T23:17:49Z" level=debug msg="inactive intent received" component=agent intent="stabilize,stabilize,ready update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:17:49Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-11T23:17:49Z" level=debug msg="active intent received" component=agent intent="prepare-update,stabilize,ready update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:17:49Z" level=debug msg="handling intent" component=agent intent="prepare-update,stabilize,ready update:true" worker=handler
time="2020-07-11T23:17:49Z" level=debug msg="posted intent" component=agent intent="prepare-update,prepare-update,busy update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:17:49Z" level=debug msg="fetching list of available updates" component=platform
time="2020-07-11T23:17:49Z" level=debug msg="making 'POST' request to '/actions/refresh-updates' API endpoint" component=api
time="2020-07-11T23:17:49Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-11T23:17:49Z" level=info msg="API server busy, retrying in 5 seconds ..." component=api
time="2020-07-11T23:17:54Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-11T23:17:54Z" level=debug msg="preparing update" component=agent intent="prepare-update,stabilize,ready update:true" worker=handler
time="2020-07-11T23:17:54Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-11T23:17:54Z" level=debug msg="making 'POST' request to '/actions/prepare-update' API endpoint" component=api
time="2020-07-11T23:17:54Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-11T23:17:54Z" level=info msg="API server busy, retrying in 5 seconds ..." component=api
time="2020-07-11T23:17:59Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-11T23:17:59Z" level=info msg="API server busy, retrying in 5 seconds ..." component=api
time="2020-07-11T23:18:04Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-11T23:18:04Z" level=info msg="API server busy, retrying in 5 seconds ..." component=api
time="2020-07-11T23:18:09Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-11T23:18:09Z" level=debug msg="realized intent" component=agent intent="prepare-update,stabilize,ready update:true" worker=handler
time="2020-07-11T23:18:09Z" level=debug msg="posted intent" component=agent intent="prepare-update,prepare-update,ready update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:18:09Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-11T23:18:09Z" level=debug msg="active intent received" component=agent intent="prepare-update,prepare-update,busy update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:18:09Z" level=debug msg="handling intent" component=agent intent="prepare-update,prepare-update,busy update:true" worker=handler
time="2020-07-11T23:18:09Z" level=debug msg="posted intent" component=agent intent="prepare-update,prepare-update,busy update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:18:09Z" level=debug msg="fetching list of available updates" component=platform
time="2020-07-11T23:18:09Z" level=debug msg="making 'POST' request to '/actions/refresh-updates' API endpoint" component=api
time="2020-07-11T23:18:09Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-11T23:18:09Z" level=info msg="API server busy, retrying in 5 seconds ..." component=api
time="2020-07-11T23:18:14Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-11T23:18:14Z" level=debug msg="preparing update" component=agent intent="prepare-update,prepare-update,busy update:true" worker=handler
time="2020-07-11T23:18:14Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-11T23:18:14Z" level=debug msg="making 'POST' request to '/actions/prepare-update' API endpoint" component=api
time="2020-07-11T23:18:14Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-11T23:18:14Z" level=info msg="API server busy, retrying in 5 seconds ..." component=api
time="2020-07-11T23:18:19Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-11T23:18:19Z" level=info msg="API server busy, retrying in 5 seconds ..." component=api
time="2020-07-11T23:18:24Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-11T23:18:24Z" level=debug msg="realized intent" component=agent intent="prepare-update,prepare-update,busy update:true" worker=handler
time="2020-07-11T23:18:24Z" level=debug msg="posted intent" component=agent intent="prepare-update,prepare-update,ready update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:18:24Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-11T23:18:24Z" level=debug msg="inactive intent received" component=agent intent="prepare-update,prepare-update,ready update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:18:24Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-11T23:18:24Z" level=debug msg="active intent received" component=agent intent="prepare-update,prepare-update,busy update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:18:24Z" level=debug msg="handling intent" component=agent intent="prepare-update,prepare-update,busy update:true" worker=handler
time="2020-07-11T23:18:24Z" level=debug msg="posted intent" component=agent intent="prepare-update,prepare-update,busy update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:18:24Z" level=debug msg="fetching list of available updates" component=platform
time="2020-07-11T23:18:24Z" level=debug msg="making 'POST' request to '/actions/refresh-updates' API endpoint" component=api
time="2020-07-11T23:18:24Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-11T23:18:24Z" level=info msg="API server busy, retrying in 5 seconds ..." component=api
time="2020-07-11T23:18:29Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-11T23:18:29Z" level=debug msg="preparing update" component=agent intent="prepare-update,prepare-update,busy update:true" worker=handler
time="2020-07-11T23:18:29Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-11T23:18:29Z" level=debug msg="making 'POST' request to '/actions/prepare-update' API endpoint" component=api
time="2020-07-11T23:18:29Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-11T23:18:29Z" level=info msg="API server busy, retrying in 5 seconds ..." component=api
time="2020-07-11T23:18:34Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-11T23:18:34Z" level=info msg="API server busy, retrying in 5 seconds ..." component=api
time="2020-07-11T23:18:39Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-11T23:18:39Z" level=debug msg="realized intent" component=agent intent="prepare-update,prepare-update,busy update:true" worker=handler
...

prepare-update happens several more times here... until it finally breaks out of the loop

time="2020-07-11T23:22:14Z" level=debug msg="realized intent" component=agent intent="prepare-update,prepare-update,busy update:true" worker=handler
time="2020-07-11T23:22:14Z" level=debug msg="posted intent" component=agent intent="prepare-update,prepare-update,ready update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:22:14Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-11T23:22:14Z" level=debug msg="active intent received" component=agent intent="perform-update,prepare-update,ready update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:22:14Z" level=debug msg="handling intent" component=agent intent="perform-update,prepare-update,ready update:true" worker=handler
time="2020-07-11T23:22:14Z" level=debug msg="posted intent" component=agent intent="perform-update,perform-update,busy update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:22:14Z" level=debug msg=updating component=agent intent="perform-update,prepare-update,ready update:true" worker=handler
time="2020-07-11T23:22:14Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-11T23:22:14Z" level=debug msg="making 'POST' request to '/actions/activate-update' API endpoint" component=api
time="2020-07-11T23:22:14Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-11T23:22:14Z" level=info msg="API server busy, retrying in 5 seconds ..." component=api
time="2020-07-11T23:22:19Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-11T23:22:19Z" level=debug msg="realized intent" component=agent intent="perform-update,prepare-update,ready update:true" worker=handler
time="2020-07-11T23:22:19Z" level=debug msg="posted intent" component=agent intent="perform-update,perform-update,ready update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:22:19Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-11T23:22:19Z" level=debug msg="inactive intent received" component=agent intent="prepare-update,prepare-update,ready update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:22:19Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-11T23:22:19Z" level=debug msg="active intent received" component=agent intent="prepare-update,prepare-update,busy update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:22:19Z" level=debug msg="handling intent" component=agent intent="prepare-update,prepare-update,busy update:true" worker=handler
time="2020-07-11T23:22:19Z" level=debug msg="posted intent" component=agent intent="prepare-update,prepare-update,busy update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:22:19Z" level=debug msg="fetching list of available updates" component=platform
time="2020-07-11T23:22:19Z" level=debug msg="making 'POST' request to '/actions/refresh-updates' API endpoint" component=api
time="2020-07-11T23:22:19Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-11T23:22:19Z" level=info msg="API server busy, retrying in 5 seconds ..." component=api
time="2020-07-11T23:22:24Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-11T23:22:24Z" level=debug msg="preparing update" component=agent intent="prepare-update,prepare-update,busy update:true" worker=handler
time="2020-07-11T23:22:24Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-11T23:22:24Z" level=error msg="could not realize intent" component=agent error="unexpected update state: Ready, expecting state to be 'Available' or 'Staged'. update action performed out of band?" intent="prepare-update,prepare-update,busy update:true" worker=handler
time="2020-07-11T23:22:24Z" level=debug msg="posted intent" component=agent intent="prepare-update,prepare-update,error update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:22:24Z" level=error msg="unable to realize intent" component=agent error="unexpected update state: Ready, expecting state to be 'Available' or 'Staged'. update action performed out of band?" intent="prepare-update,prepare-update,busy update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:22:24Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-11T23:22:24Z" level=debug msg="inactive intent received" component=agent intent="prepare-update,prepare-update,ready update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:22:24Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-11T23:22:24Z" level=debug msg="skipping duplicate received event" component=agent intent="prepare-update,prepare-update,busy update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:22:24Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-11T23:22:24Z" level=debug msg="inactive intent received" component=agent intent="prepare-update,prepare-update,ready update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:22:24Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-11T23:22:24Z" level=debug msg="skipping duplicate received event" component=agent intent="prepare-update,prepare-update,busy update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:22:24Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-11T23:22:24Z" level=debug msg="skipping duplicate received event" component=agent intent="prepare-update,prepare-update,busy update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:22:24Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-11T23:22:24Z" level=debug msg="inactive intent received" component=agent intent="prepare-update,prepare-update,ready update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:22:24Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-11T23:22:24Z" level=debug msg="skipping duplicate received event" component=agent intent="prepare-update,prepare-update,busy update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:22:24Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-11T23:22:24Z" level=debug msg="inactive intent received" component=agent intent="prepare-update,prepare-update,ready update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:22:24Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-11T23:22:24Z" level=debug msg="inactive intent received" component=agent intent="perform-update,perform-update,busy update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:22:24Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-11T23:22:24Z" level=debug msg="inactive intent received" component=agent intent="perform-update,perform-update,ready update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:22:24Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-11T23:22:24Z" level=debug msg="skipping duplicate received event" component=agent intent="prepare-update,prepare-update,busy update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:22:24Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-11T23:22:24Z" level=debug msg="skipping duplicate received event" component=agent intent="prepare-update,prepare-update,busy update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:22:24Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-11T23:22:24Z" level=debug msg="skipping duplicate received event" component=agent intent="prepare-update,prepare-update,busy update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:22:24Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-11T23:22:24Z" level=debug msg="active intent received" component=agent intent="reboot-update,perform-update,ready update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:22:24Z" level=debug msg="handling intent" component=agent intent="reboot-update,perform-update,ready update:true" worker=handler
time="2020-07-11T23:22:24Z" level=debug msg="posted intent" component=agent intent="reboot-update,reboot-update,busy update:true" node=ip-192-168-1-1.us-west-2.compute.internal
time="2020-07-11T23:22:24Z" level=debug msg=rebooting component=agent intent="reboot-update,perform-update,ready update:true" worker=handler
time="2020-07-11T23:22:24Z" level=info msg="Rebooting Node to complete update" component=agent intent="reboot-update,perform-update,ready update:true" worker=handler
time="2020-07-11T23:22:24Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-11T23:22:24Z" level=debug msg="making 'POST' request to '/actions/reboot' API endpoint" component=api
time="2020-07-11T23:22:24Z" level=info msg="waiting on workers to finish" component=agent
time="2020-07-11T23:22:24Z" level=debug msg=finished component=agent worker=update-checker

@jahkeup
Copy link
Member

jahkeup commented Jul 13, 2020

Not entirely sure why this happens. I suspect it has something to with me
sleeping to wait for the /updates/status to be ready?

I don't think it is the sleep causing issue. The main event loop shouldn't be
updating the Node's status until the platform returns its result (respecting the
time spent sleeping!). However, during the time between, the Node shouldn't be
responsive nor updating its posted intent's status - my guess is that there's a
conflict between the controller and agent's posting of the resource object on
the Kubernetes control plane.

But then it goes back to posting the same intent again? It tries to
re-prepare-update. This happens until it's "unable to realize intent"

We've seen similiar behavior with similiar messages. This makes me think it
might be due to the conflict handling in this case:

ERRO[2020-07-11T22:04:52Z] unable to realize intent
component=agent
intent="prepare-update,prepare-update,busy update:true"
node=ip-192-168-30-63.us-west-2.compute.internal
error="unable to update node: Operation cannot be fulfilled on nodes "ip-192-168-30-63.us-west-2.compute.internal":
the object has been modified; please apply your changes to the latest version and try again"

When this happens, the Kubernetes control plane is literally rejecting the
update that the agent is trying to make to itself. Typically, this is because
there was an unobserved update to the resource that the agent isn't including in
its request to POST its update (this happens when the controller posts an intent
or by other object edits).

I think agent.postIntent may be one source of concern and issue. The current
logic

records intents that received an error in response.

Reading over the rest of the lastCache and tracker code paths in the agent,
I think the method should read:

func (a *Agent) postIntent(in *intent.Intent) error {
	err := a.poster.Post(in)
	if err == nil {
		// track the successfully updated intent.
		a.tracker.recordPost(in)
	}
	return err
}

It's worth trying again with that method updated to see if the looping subsides.

Also, if you are able to get the controller's logs at the same time that you saw
the above messages then we might see what it thought was going on.

I'll try out updating a cluster myself, though probably not until later today or
tomorrow. If you have time to and are able to quickly reproduce the issue, then
please do and share the logs from both the agent and the controller.

Fixes postIntent logic so it properly records the intent if no error
happened
@etungsten
Copy link
Contributor Author

Fixed the issue described in #35 (comment) thanks to @jahkeup 's suggestion. Now the node properly updates without spinning in place:

time="2020-07-13T22:38:03Z" level=debug msg=starting component=agent
time="2020-07-13T22:38:03Z" level=debug msg="posted intent" component=agent intent=",, update:" node=ip-192-168-25-121.us-west-2.compute.internal
time="2020-07-13T22:38:03Z" level=debug msg=starting component=agent worker=informer
time="2020-07-13T22:38:03Z" level=debug msg="resource add event" component=agent worker=informer
time="2020-07-13T22:38:03Z" level=debug msg="skipping emitted intent as event" component=agent intent=",, update:" node=ip-192-168-25-121.us-west-2.compute.internal
time="2020-07-13T22:38:19Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-13T22:38:19Z" level=debug msg="skipping emitted intent as event" component=agent intent=",, update:" node=ip-192-168-25-121.us-west-2.compute.internal
time="2020-07-13T22:38:19Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-13T22:38:19Z" level=debug msg="active intent received" component=agent intent="stabilize,unknown,unknown update:unknown" node=ip-192-168-25-121.us-west-2.compute.internal
time="2020-07-13T22:38:19Z" level=debug msg="handling intent" component=agent intent="stabilize,unknown,unknown update:unknown" worker=handler
time="2020-07-13T22:38:19Z" level=debug msg="posted intent" component=agent intent="stabilize,stabilize,busy update:unknown" node=ip-192-168-25-121.us-west-2.compute.internal
time="2020-07-13T22:38:19Z" level=debug msg=sitrep component=agent intent="stabilize,unknown,unknown update:unknown" worker=handler
time="2020-07-13T22:38:19Z" level=debug msg="making 'GET' request to '/os' API endpoint" component=api
time="2020-07-13T22:38:19Z" level=info msg="current running OS version: 0.3.4" component=platform
time="2020-07-13T22:38:19Z" level=debug msg="fetching list of available updates" component=platform
time="2020-07-13T22:38:19Z" level=debug msg="making 'POST' request to '/actions/refresh-updates' API endpoint" component=api
time="2020-07-13T22:38:19Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-13T22:38:19Z" level=info msg="API server busy, retrying in 10 seconds ..." component=api
time="2020-07-13T22:38:29Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-13T22:38:29Z" level=info msg="an update is available" component=agent intent="stabilize,unknown,unknown update:unknown" update-available=true worker=handler
time="2020-07-13T22:38:29Z" level=debug msg="realized intent" component=agent intent="stabilize,unknown,unknown update:unknown" worker=handler
time="2020-07-13T22:38:29Z" level=debug msg="posted intent" component=agent intent="stabilize,stabilize,ready update:true" node=ip-192-168-25-121.us-west-2.compute.internal
time="2020-07-13T22:38:29Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-13T22:38:29Z" level=debug msg="skipping emitted intent as event" component=agent intent="stabilize,stabilize,busy update:unknown" node=ip-192-168-25-121.us-west-2.compute.internal
time="2020-07-13T22:38:29Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-13T22:38:29Z" level=debug msg="skipping emitted intent as event" component=agent intent="stabilize,stabilize,ready update:true" node=ip-192-168-25-121.us-west-2.compute.internal
time="2020-07-13T22:38:29Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-13T22:38:29Z" level=debug msg="active intent received" component=agent intent="prepare-update,stabilize,ready update:true" node=ip-192-168-25-121.us-west-2.compute.internal
time="2020-07-13T22:38:29Z" level=debug msg="handling intent" component=agent intent="prepare-update,stabilize,ready update:true" worker=handler
time="2020-07-13T22:38:29Z" level=debug msg="posted intent" component=agent intent="prepare-update,prepare-update,busy update:true" node=ip-192-168-25-121.us-west-2.compute.internal
time="2020-07-13T22:38:29Z" level=debug msg="fetching list of available updates" component=platform
time="2020-07-13T22:38:29Z" level=debug msg="making 'POST' request to '/actions/refresh-updates' API endpoint" component=api
time="2020-07-13T22:38:29Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-13T22:38:29Z" level=info msg="API server busy, retrying in 10 seconds ..." component=api
time="2020-07-13T22:38:39Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-13T22:38:39Z" level=debug msg="preparing update" component=agent intent="prepare-update,stabilize,ready update:true" worker=handler
time="2020-07-13T22:38:39Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-13T22:38:39Z" level=debug msg="making 'POST' request to '/actions/prepare-update' API endpoint" component=api
time="2020-07-13T22:38:39Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-13T22:38:39Z" level=info msg="API server busy, retrying in 10 seconds ..." component=api
time="2020-07-13T22:38:49Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-13T22:38:49Z" level=debug msg="realized intent" component=agent intent="prepare-update,stabilize,ready update:true" worker=handler
time="2020-07-13T22:38:49Z" level=debug msg="posted intent" component=agent intent="prepare-update,prepare-update,ready update:true" node=ip-192-168-25-121.us-west-2.compute.internal
time="2020-07-13T22:38:49Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-13T22:38:49Z" level=debug msg="skipping emitted intent as event" component=agent intent="prepare-update,prepare-update,busy update:true" node=ip-192-168-25-121.us-west-2.compute.internal
time="2020-07-13T22:38:49Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-13T22:38:49Z" level=debug msg="skipping emitted intent as event" component=agent intent="prepare-update,prepare-update,ready update:true" node=ip-192-168-25-121.us-west-2.compute.internal
time="2020-07-13T22:38:49Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-13T22:38:49Z" level=debug msg="active intent received" component=agent intent="perform-update,prepare-update,ready update:true" node=ip-192-168-25-121.us-west-2.compute.internal
time="2020-07-13T22:38:49Z" level=debug msg="handling intent" component=agent intent="perform-update,prepare-update,ready update:true" worker=handler
time="2020-07-13T22:38:49Z" level=debug msg="posted intent" component=agent intent="perform-update,perform-update,busy update:true" node=ip-192-168-25-121.us-west-2.compute.internal
time="2020-07-13T22:38:49Z" level=debug msg=updating component=agent intent="perform-update,prepare-update,ready update:true" worker=handler
time="2020-07-13T22:38:49Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-13T22:38:49Z" level=debug msg="making 'POST' request to '/actions/activate-update' API endpoint" component=api
time="2020-07-13T22:38:49Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-13T22:38:49Z" level=info msg="API server busy, retrying in 10 seconds ..." component=api
time="2020-07-13T22:38:59Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-13T22:38:59Z" level=debug msg="realized intent" component=agent intent="perform-update,prepare-update,ready update:true" worker=handler
time="2020-07-13T22:38:59Z" level=debug msg="posted intent" component=agent intent="perform-update,perform-update,ready update:true" node=ip-192-168-25-121.us-west-2.compute.internal
time="2020-07-13T22:38:59Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-13T22:38:59Z" level=debug msg="skipping emitted intent as event" component=agent intent="perform-update,perform-update,busy update:true" node=ip-192-168-25-121.us-west-2.compute.internal
time="2020-07-13T22:38:59Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-13T22:38:59Z" level=debug msg="skipping emitted intent as event" component=agent intent="perform-update,perform-update,ready update:true" node=ip-192-168-25-121.us-west-2.compute.internal
time="2020-07-13T22:38:59Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-13T22:38:59Z" level=debug msg="skipping emitted intent as event" component=agent intent="perform-update,perform-update,ready update:true" node=ip-192-168-25-121.us-west-2.compute.internal
time="2020-07-13T22:38:59Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-13T22:38:59Z" level=debug msg="skipping emitted intent as event" component=agent intent="perform-update,perform-update,ready update:true" node=ip-192-168-25-121.us-west-2.compute.internal
time="2020-07-13T22:38:59Z" level=debug msg="resource update event" component=agent worker=informer
time="2020-07-13T22:38:59Z" level=debug msg="active intent received" component=agent intent="reboot-update,perform-update,ready update:true" node=ip-192-168-25-121.us-west-2.compute.internal
time="2020-07-13T22:38:59Z" level=debug msg="handling intent" component=agent intent="reboot-update,perform-update,ready update:true" worker=handler
time="2020-07-13T22:38:59Z" level=debug msg="posted intent" component=agent intent="reboot-update,reboot-update,busy update:true" node=ip-192-168-25-121.us-west-2.compute.internal
time="2020-07-13T22:38:59Z" level=debug msg=rebooting component=agent intent="reboot-update,perform-update,ready update:true" worker=handler
time="2020-07-13T22:38:59Z" level=info msg="Rebooting Node to complete update" component=agent intent="reboot-update,perform-update,ready update:true" worker=handler
time="2020-07-13T22:38:59Z" level=debug msg="making 'GET' request to '/updates/status' API endpoint" component=api
time="2020-07-13T22:38:59Z" level=debug msg="making 'POST' request to '/actions/reboot' API endpoint" component=api

@etungsten etungsten marked this pull request as ready for review July 13, 2020 22:53
pkg/agent/agent.go Outdated Show resolved Hide resolved
pkg/platform/api/api.go Outdated Show resolved Hide resolved
pkg/platform/api/api.go Outdated Show resolved Hide resolved
pkg/platform/api/api.go Outdated Show resolved Hide resolved
pkg/platform/api/api.go Outdated Show resolved Hide resolved
pkg/platform/api/api.go Outdated Show resolved Hide resolved
pkg/platform/api/api.go Outdated Show resolved Hide resolved
pkg/platform/api/api.go Outdated Show resolved Hide resolved
pkg/platform/api/platform.go Outdated Show resolved Hide resolved
pkg/platform/api/platform.go Outdated Show resolved Hide resolved
@etungsten
Copy link
Contributor Author

Addresses @samuelkarp 's comments.

@etungsten
Copy link
Contributor Author

Tested the latest changes and they still work.

deploy/update-operator-api.yaml Outdated Show resolved Hide resolved
go.sum Show resolved Hide resolved
pkg/platform/api/api.go Outdated Show resolved Hide resolved
pkg/platform/api/api.go Show resolved Hide resolved
pkg/platform/api/api.go Outdated Show resolved Hide resolved
pkg/platform/api/api.go Outdated Show resolved Hide resolved
pkg/platform/api/api.go Outdated Show resolved Hide resolved
pkg/platform/api/api.go Outdated Show resolved Hide resolved
pkg/platform/api/platform.go Outdated Show resolved Hide resolved
pkg/platform/api/platform.go Outdated Show resolved Hide resolved
@etungsten
Copy link
Contributor Author

etungsten commented Jul 21, 2020

Push above addresses @jahkeup 's comments.

Merged the two config files for deploying the operator. The diff with the original update-operator.yaml can be seen here: https://github.com/bottlerocket-os/bottlerocket-update-operator/pull/35/files#diff-b5e2ccfe33253fc880150faeb1949241

@etungsten etungsten requested a review from jahkeup July 21, 2020 00:33
pkg/agent/agent.go Outdated Show resolved Hide resolved
pkg/agent/agent.go Outdated Show resolved Hide resolved
pkg/agent/agent.go Outdated Show resolved Hide resolved
pkg/platform/api/api.go Outdated Show resolved Hide resolved
pkg/platform/api/api.go Outdated Show resolved Hide resolved
@etungsten
Copy link
Contributor Author

etungsten commented Jul 21, 2020

Addresses @samuelkarp 's comments.

Added a comment above the 10 second timeout when creating the http client (forgot to do this in the previous set of changes, oops!)

Push below fixes a grammar error.

Copy link
Contributor

@webern webern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🍄

pkg/agent/agent.go Show resolved Hide resolved

func TestUnmarshallUpdateStatus(t *testing.T) {
update_string := "Starting update to 0.4.0\n"
cases := []struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, nice work creating these test cases.

@etungsten
Copy link
Contributor Author

Push above addresses #35 (comment)

pkg/platform/api/api.go Outdated Show resolved Hide resolved
pkg/platform/api/platform.go Outdated Show resolved Hide resolved
pkg/agent/agent.go Show resolved Hide resolved
pkg/agent/agent.go Outdated Show resolved Hide resolved
pkg/agent/agent.go Outdated Show resolved Hide resolved
@etungsten
Copy link
Contributor Author

Addresses @jahkeup's comments.

@etungsten etungsten requested a review from jahkeup July 22, 2020 00:33
Copy link
Contributor

@samuelkarp samuelkarp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with everything here except for formatting. I can re-approve when you run gofmt again.

pkg/agent/agent.go Outdated Show resolved Hide resolved
Adds support for updating via the Bottlerocket Update API.
The new platform is used if the node is labeled
bottlerocket.aws/updater-interface-version=2.0.0
@etungsten
Copy link
Contributor Author

Push above fixes formatting.

Copy link
Member

@jahkeup jahkeup left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done! 👏 👍

I'm excited to see this roll out with the update API 😃

@etungsten etungsten merged commit 2a5c4a4 into develop Jul 23, 2020
@etungsten etungsten deleted the update-api-integration branch July 23, 2020 17:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants