
Updog: Building a better dog #186

Merged
merged 9 commits on Sep 20, 2019

Conversation

sam-aws
Contributor

@sam-aws sam-aws commented Aug 22, 2019

Issue #, if available:
Related to #103, #185, #91

Description of changes:
WIP changes to enable Updog to understand and handle update waves, versions, migrations, etc., as a reflection of #103. Larger points that need to be decided on, aside from the metadata itself being finalised:

  • How Updog should behave if not in a wave or if in the final wave, or if all waves are in the past
  • How and under what conditions Updog should add jitter to its update time
  • Random seed range
  • Whether Updog should try to source migrations from an update root image
  • Where Updog finds its "flavor" (see #204: "release: Set version and default flavor")

Bonus points from #184:

  • Split updating of image and boot flags
  • Report update status in a machine-readable form

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@sam-aws sam-aws changed the title from "Updog" to "Updog: Building a better dog" on Aug 22, 2019
@@ -0,0 +1,12 @@
[Unit]
Description=When's Updog?
Contributor


👍 🐶

@sam-aws
Contributor Author

sam-aws commented Sep 3, 2019

Something that still needs to be firmed up is the behaviour in two wave scenarios:

  • If a client misses their wave: Do nothing and warn? Wait until all waves have passed?
  • If a client is in the final wave: The current form of metadata doesn't specify an "end" of the final wave (unless the final "wave" is from 2048). Should the final wave be open-ended, or should we specify that there is always a closing wave at 2048? (I lean a bit towards the former.)

Jitter is something that depends a bit on how Updog is called. If called from a service file then Updog can just spin until it's ready, but if called by some other helper it should probably report the jitter amount back and exit, to be called again later. Perhaps some --max-wait option that tells Updog to exit if its jitter is above some amount could cover both those cases?
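
A rough sketch of that --max-wait idea, assuming jitter is computed in whole seconds; handle_jitter and the Err-means-retry convention are hypothetical illustrations, not an interface from this PR:

use std::thread;
use std::time::Duration;

/// Hypothetical: if the jitter fits within the caller's --max-wait
/// budget, spin until it elapses; otherwise hand the jitter back so
/// the caller can re-invoke Updog later.
fn handle_jitter(jitter_secs: u64, max_wait_secs: Option<u64>) -> Result<(), u64> {
    match max_wait_secs {
        Some(max) if jitter_secs > max => Err(jitter_secs), // report and exit
        _ => {
            thread::sleep(Duration::from_secs(jitter_secs));
            Ok(())
        }
    }
}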

@iliana
Contributor

iliana commented Sep 3, 2019

If a client misses their wave: Do nothing and warn? Wait until all waves have passed?

If a client is in the final wave: The current form of metadata doesn't specify an "end" of the final wave (unless the final "wave" is from 2048). Should the final wave be open ended or should we specify that there is always a closing wave at 2048? (I lean a bit towards the former).

There should be a closing entry, to say "at this timestamp all hosts should be on at least this version".

The question updog should be asking is not "is it my turn, wait crap did I miss it", but instead "am I allowed to update yet". A host determines its update time by finding its seed along the curve provided (e.g. given two points along the wave, 10 at 3:00 and 30 at 5:00, a host with seed 15's update time is 3:30). Once updog runs any time after the update time, it prepares to update. It doesn't matter whether it's 3:35 or 9:30 or two months from then.
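
A minimal sketch of that interpolation, assuming chrono for timestamps; the function names and the (bound, time) tuple shape are illustrative, not the actual metadata types:

use chrono::{DateTime, Duration, Utc};

/// Linearly interpolate a host's update time from its seed and the two
/// wave points that bracket it, assuming start_bound <= seed < end_bound.
/// E.g. with points (10, 3:00) and (30, 5:00), seed 15 lands at 3:30.
fn update_time(
    seed: u64,
    (start_bound, start_time): (u64, DateTime<Utc>),
    (end_bound, end_time): (u64, DateTime<Utc>),
) -> DateTime<Utc> {
    let span_secs = end_time.timestamp() - start_time.timestamp();
    let frac = (seed - start_bound) as f64 / (end_bound - start_bound) as f64;
    start_time + Duration::seconds((span_secs as f64 * frac) as i64)
}

/// "Am I allowed to update yet": true any time at or after the update
/// time, whether that's 3:35 or two months later.
fn may_update(now: DateTime<Utc>, scheduled: DateTime<Utc>) -> bool {
    now >= scheduled
}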

@sam-aws
Contributor Author

sam-aws commented Sep 3, 2019

That makes more sense - I was thinking of the waves as being stricter, rather than just specifying starting times. I'll rejig the code a bit, but that should make it simpler.

@sam-aws
Contributor Author

sam-aws commented Sep 10, 2019

Taking this out of draft mode: there are a few TODO items lying around still but there shouldn't be anything that would prevent updating out of a problem (famous last words).

@sam-aws
Contributor Author

sam-aws commented Sep 10, 2019

Speaking of which, we do need the flavor to be set (e.g. via #204, which will need to be rebased on #216).

    .next();
match (prev, next) {
    // Jitter window: the span in seconds between the start of the
    // host's wave and the start of the next wave.
    (Some((_, start)), Some((_, end))) => {
        return Some((end.timestamp() - start.timestamp()) as u64);
Contributor Author


Right now (and with a fixup to come) Updog just chooses a point in time between the current and next wave, or zero jitter if all waves are in the past. We probably still want some amount of jitter in that case - any opinions on a default value for Updog's jitter, e.g. 30 minutes?
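
A minimal sketch of that choice, assuming a recent rand crate; the 30-minute default is just the value floated above, not a decided one:

use rand::Rng;

/// Hypothetical: jitter uniformly within the window to the next wave,
/// or fall back to a fixed default once all waves are in the past.
fn choose_jitter(window_secs: Option<u64>) -> u64 {
    const DEFAULT_JITTER_SECS: u64 = 30 * 60; // the 30 minutes suggested above
    match window_secs {
        Some(secs) if secs > 0 => rand::thread_rng().gen_range(0..secs),
        _ => DEFAULT_JITTER_SECS,
    }
}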

Contributor Author


For jitter we're probably looking at returning this information to the caller rather than handling it in Updog - pushing this to a later PR.

workspaces/updater/updog/src/de.rs (review thread resolved)
use std::str::FromStr;

/// Converts the bound key to an integer before insertion and catches duplicates
pub(crate) fn deserialize_bound<'de, D>(
Contributor


I feel like these three functions are all relatively similar. Do you think there's some code deduplication that can be done?

Contributor Author


I suspect there is - this fell off my radar because... serde deserialize. I'll see if I can simplify this a bit.
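
One possible shape for that deduplication, as a sketch: a single generic function that parses string keys with FromStr and rejects duplicates, which each of the three call sites could delegate to (deserialize_keyed_map is a hypothetical name, not code from this PR):

use std::collections::BTreeMap;
use std::fmt;
use std::marker::PhantomData;
use std::str::FromStr;

use serde::de::{Deserialize, Deserializer, Error, MapAccess, Visitor};

/// Parses each map key with FromStr before insertion and errors on
/// duplicate keys, for any ordered key and deserializable value type.
pub(crate) fn deserialize_keyed_map<'de, D, K, V>(
    deserializer: D,
) -> Result<BTreeMap<K, V>, D::Error>
where
    D: Deserializer<'de>,
    K: FromStr + Ord,
    K::Err: fmt::Display,
    V: Deserialize<'de>,
{
    struct KeyedMap<K, V>(PhantomData<(K, V)>);

    impl<'de, K, V> Visitor<'de> for KeyedMap<K, V>
    where
        K: FromStr + Ord,
        K::Err: fmt::Display,
        V: Deserialize<'de>,
    {
        type Value = BTreeMap<K, V>;

        fn expecting(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
            f.write_str("a map with string keys parseable as the target type")
        }

        fn visit_map<M: MapAccess<'de>>(self, mut access: M) -> Result<Self::Value, M::Error> {
            let mut map = BTreeMap::new();
            while let Some((key, value)) = access.next_entry::<String, V>()? {
                // Convert the key before insertion, as deserialize_bound does.
                let key = key.parse().map_err(M::Error::custom)?;
                if map.insert(key, value).is_some() {
                    return Err(M::Error::custom("duplicate key"));
                }
            }
            Ok(map)
        }
    }

    deserializer.deserialize_map(KeyedMap(PhantomData))
}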

Contributor


(I'll offer to help too, just ask ^_^)

workspaces/updater/updog/src/error.rs (outdated, review thread resolved)
workspaces/updater/updog/src/error.rs (outdated, review thread resolved)
workspaces/updater/updog/src/main.rs (outdated, review thread resolved)
These changes move Updog towards handling updates and update metadata as
described in the Thar documentation. In particular this starts to handle
update waves, update timing, and proper versioning based on the current
understanding of the metadata format.

Updates will be staged into "waves" which will be specified in the
update manifest. A wave is described by a "bound" number and a UTC time.
A series of bound-time points defines a simple update curve. The Updog
client will generate a random seed value which from then on is used to
locate itself in one of these waves, and a random amount of jitter is
added to the exact update time. When every wave has passed, all clients
should be at the new image version.
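
As a sketch, that bound-to-time schedule might look like the following Rust type, assuming chrono for timestamps (the struct and field names are illustrative, not the finalised metadata):

use std::collections::BTreeMap;

use chrono::{DateTime, Utc};

/// Hypothetical shape: each bound maps to the UTC time at which hosts
/// whose seed falls below that bound may begin updating; the ordered
/// (bound, time) pairs trace out the update curve. (Bounds arrive as
/// string keys in the manifest, hence helpers like deserialize_bound.)
struct Update {
    waves: BTreeMap<u64, DateTime<Utc>>,
}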

Applicable updates are filtered by version, flavor, and state. Version is
assumed to be of the form "vx.y.z", flavor is some string sourced
from the system, and states should reflect those described by TUF.

Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
The update manifest holds two new pieces of information; a map of Thar
image versions to datastore versions, and a structure describing the
migration helpers required for moving between two datastore versions.

Before updating to a new image Updog will prepare by retrieving and
storing all migrations required by the datastore version change implied
by the image version change. In the common case these will be present in
the update image and will be copied from there. For scenarios such as a
downgrade, or additional migrations added after the fact, the migration
targets are downloaded from TUF.
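
A sketch of those two pieces of metadata as Rust types; the names and string-typed versions are illustrative assumptions:

use std::collections::BTreeMap;

/// Hypothetical: the migration helpers needed to move between two
/// datastore versions, in the order they should run.
struct MigrationSet {
    from: String,            // e.g. datastore "v1"
    to: String,              // e.g. datastore "v2"
    migrations: Vec<String>, // helper names, in order
}

/// Hypothetical slice of the manifest: image versions map to datastore
/// versions, and migration sets describe moving between the latter.
struct Manifest {
    datastore_versions: BTreeMap<String, String>, // image version -> datastore version
    migration_sets: Vec<MigrationSet>,
}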

Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
Add a simple timer that calls Updog every 30 minutes to check for an
update. This would primarily be applicable if Updog were running in
'Automatic' mode as opposed to being managed by some orchestrator.

Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
Add a few commands to give more fine-grained control to a user or some
update orchestrator.

UpdateImage: Perform an update if available but do not make the update
the default boot option.
UpdateFlags: Update the flags on the inactive partition to be the
default boot option.
--now: Ignore waves and update immediately
--json: Output update information in JSON
--image: Specify a specific image version to update to
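
A sketch of how those commands and flags might hang together; the plain Update variant and all names here are assumptions for illustration:

/// Hypothetical subcommand set matching the description above.
enum Command {
    Update,      // full update: image plus boot flags
    UpdateImage, // write the new image, but leave the boot flags alone
    UpdateFlags, // make the inactive partition the default boot option
}

/// Hypothetical parsed arguments.
struct Arguments {
    subcommand: Command,
    ignore_waves: bool,    // --now: skip wave scheduling, update immediately
    json: bool,            // --json: machine-readable output
    image: Option<String>, // --image: a specific version to update to
}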

Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
Thar fails to build with the following error:
16 35.74 error[E0583]: file not found for module `target`
16 35.74   --> /home/builder/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/platforms-0.2.0/src/lib.rs:29:9
16 35.74    |
16 35.74 29 | pub mod target;
16 35.74    |         ^^^^^^
16 35.74    |
16 35.74    = help: name the file either target.rs or target/mod.rs inside the directory "/home/builder/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/platforms-0.2.0/src"
16 35.74
16 35.74 error: aborting due to previous error
16 35.74
16 35.74 For more information about this error, try `rustc --explain E0583`.
16 35.74        Fresh synstructure v0.10.2
16 35.74    Compiling snafu-derive v0.5.0
16 35.74 error: Could not compile `platforms`.

For now set TARGET_ARCH manually until the above error can be resolved.

Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
The seed value should be initialised by one of the other dogs and
written by thar-be-settings, so don't do so in Updog.

Signed-off-by: Samuel Mendoza-Jonas <samjonas@amazon.com>
@sam-aws
Contributor Author

sam-aws commented Sep 19, 2019

Rebased on develop (after accidentally rebasing on bork)
