Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update-agent: rework steady/polling state #233

Merged
merged 2 commits into from
Mar 2, 2020

Conversation

lucab
Copy link
Contributor

@lucab lucab commented Feb 27, 2020

This splits the "reported steady" and "checked, but no updates
available" states, making it easier to track and report upgrade
progress.

Closes: #229

@lucab
Copy link
Contributor Author

lucab commented Feb 28, 2020

/cc @jlebon for review (#229 has the whole rationale)

@@ -135,7 +145,7 @@ impl UpdateAgent {
}

/// Initialize the update agent.
fn initialize(&mut self) -> ResponseActFuture<Self, Result<(), ()>> {
fn tick_initialize(&mut self) -> ResponseActFuture<Self, Result<(), ()>> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the naming changes? Is it to represent that these functions are directly called by the tick handler?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes (but mostly: a drive-by change to make code easier to skim/grep).

Comment on lines +34 to +37
/// Node steady, agent allowed to check for updates.
ReportedSteady,
/// No further updates available yet.
NoNewUpdate,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bikeshed: WDYT about naming the NoNewUpdate one with the steady keyword since it's likely going to be the stable/steady state in which the agent will mostly sit in? And maybe instead of ReportedSteady, something like ReadyToCheck? Makes it a little more obvious too that it hasn't checked for updates yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately I'm a dummy so I have already coupled the meaning of "steady" with

static V1_STEADY_STATE: &str = "v1/steady-state";

So, to avoid confusion I'd rather stick with the definition of "steady" as "we released the lock, marking the current release green".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, is coreos/airlock#1 still open for change? Would it make sense to change the specs here so that we only actually release the lock once we've also been able to check for an update? It intuitively makes sense to me to do that, but I might be missing something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, is coreos/airlock#1 still open for change?

Not really anymore.

Would it make sense to change the specs here so that we only actually release the lock once we've also been able to check for an update?

I don't think so. We don't want to couple the "did the node reboot" question with the "is Cincinnati-service ok" question inside Zincati. Plus, users can enforce stronger pre-conditions via https://github.com/coreos/zincati/blob/master/dist/systemd/system/zincati.service#L9-L10

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh OK right. I'd like to chat about this some more, but I don't think it blocks this PR as is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forwarded to #239.

Comment on lines 58 to 59
#[allow(dead_code)]
fn discriminant(&self) -> u8 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually need this function? The only place that was using it is now gone.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, dropped. I was planning to use it for checking the "from" state on transition, but I realized I cannot do it that way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Self-note: one approach to explore here is to statically enforce allowed transitions via generics, like https://hoverbear.org/blog/rust-state-machine-pattern/#generically-sophistication.

@jlebon
Copy link
Member

jlebon commented Feb 28, 2020

Rationale overall makes sense to me otherwise!

This moves metrics logic for the agent, unifying it into a single
place.
This splits the "reported steady" and "checked, but no updates
available" states, making it easier to track and report upgrade
progress.
@lucab
Copy link
Contributor Author

lucab commented Feb 28, 2020

Rebased.

Copy link
Member

@jlebon jlebon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@lucab lucab merged commit b1c0936 into coreos:master Mar 2, 2020
@lucab lucab deleted the ups/split-steady-state branch June 8, 2020 13:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

update-agent: rework steady/polling state
2 participants