Skip to content
This repository has been archived by the owner on Jan 30, 2020. It is now read-only.

Scope unit state by machine #638

Merged
merged 1 commit into from
Jul 20, 2014
Merged

Scope unit state by machine #638

merged 1 commit into from
Jul 20, 2014

Conversation

jonboulle
Copy link
Contributor

fleet does a fine job at reporting unit state in the simple cases, but adding any complexity to the unit lifecycle causes mis-publishing of unit state into etcd.

For example, starting and destroying a single unit will result in all states being published properly. Now imagine seting an ExecStop option that takes a long time to finish (i.e. /usr/bin/sleep 10s). Start it, unload it and immediately start it again. If that unit is scheduled to a different machine the second time it is started, an active state will be published, but 10s later an inactive state will overwrite it.

We need to stop treating units like only one agent can possibly report state for it at a given time. It is incredibly important to know if a unit is still running in some capacity on a node, when it shouldn't be.

Taking systemd's lead, it allows you to visualize processes across a bunch of systemd-nspawn containers like so:

$ systemctl --recursive list-units fleet.service
UNIT                 LOAD   ACTIVE SUB     DESCRIPTION
fleet.service        loaded active running fleet
smoke0:fleet.service loaded active running fleet.service
smoke1:fleet.service loaded active running fleet.service
smoke2:fleet.service loaded active running fleet.service
smoke3:fleet.service loaded active running fleet.service

@bcwaldon
Copy link
Contributor Author

bcwaldon commented Jul 9, 2014

Related: #628

@jonboulle
Copy link
Contributor

Are you suggesting that unit state is (effectively) stored with a key that's unitname+machineid instead of just unitname?

@bcwaldon
Copy link
Contributor Author

bcwaldon commented Jul 9, 2014

@jonboulle you got it

@jonboulle
Copy link
Contributor

Capturing our discussion, my proposal is to just use the existing MACHINE field for this duplication, rather than the prefix, to ensure that a unit foo:bar.service is unambiguous from unit bar.service running on machine foo

@jonboulle
Copy link
Contributor

@bcwaldon sanity check on this approach?
(first effort at getting Registry test coverage above measly single digits!)

if isKeyNotFound(err) {
err = nil
if err != nil && isKeyNotFound(err) {
return err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we now returning an error on KeyNotFound?

@jonboulle
Copy link
Contributor

Any feedback?

//TODO: Handle the error generated by unmarshal
unmarshal(resp.Node.Value, &usm)
if err := unmarshal(resp.Node.Value, &usm); err != nil {
log.Errorf("Error unmarshalling UnitState: %v", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

UnitState(%s)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

@bcwaldon
Copy link
Contributor Author

LGTM

@bcwaldon
Copy link
Contributor Author

shipit

jonboulle added a commit that referenced this pull request Jul 20, 2014
Scope unit state by machine
@jonboulle jonboulle merged commit 0bd72d8 into coreos:master Jul 20, 2014
@jonboulle jonboulle deleted the 638 branch July 20, 2014 17:23
@bcwaldon
Copy link
Contributor Author

This used to be an issue but turned into a PR and now I can't reopen. @jonboulle maybe you should stop using the fancy 'convert issue to PR' thing

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants