Skip to content
This repository has been archived by the owner on Jan 21, 2020. It is now read-only.

Manager: a stateful group plugin with leader detection #283

Merged
merged 11 commits into from
Nov 10, 2016

Conversation

chungers
Copy link
Contributor

@chungers chungers commented Nov 9, 2016

This PR combines the earlier PRs for state storage and leader detection to create
a manager (entity + cmd binary). The manager offers a Group interface and can record
state in a swarm or local file.

There's a new executable infrakit-manager that can be started as a point of contact for users. This binary will handle leader detection. See cmd/manager/README.md for details.

Signed-off-by: David Chung david.chung@docker.com

David Chung added 2 commits November 8, 2016 15:08
Signed-off-by: David Chung <david.chung@docker.com>
Signed-off-by: David Chung <david.chung@docker.com>
Signed-off-by: David Chung <david.chung@docker.com>
@codecov-io
Copy link

codecov-io commented Nov 9, 2016

Current coverage is 70.24% (diff: 33.33%)

Merging #283 into master will decrease coverage by 0.10%

@@             master       #283   diff @@
==========================================
  Files            25         25          
  Lines          1366       1368     +2   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
  Hits            961        961          
- Misses          326        327     +1   
- Partials         79         80     +1   

Powered by Codecov. Last update 89a65c0...3939697

@GordonTheTurtle
Copy link

Please sign your commits following these rules:
https://github.com/docker/docker/blob/master/CONTRIBUTING.md#sign-your-work
The easiest way to do this is to amend the last commit:

$ git clone -b "stateful-group" git@github.com:chungers/infrakit.git somewhere
$ cd somewhere
$ git rebase -i HEAD~842354164800
editor opens
change each 'pick' to 'edit'
save the file and quit
$ git commit --amend -s --no-edit
$ git rebase --continue # and repeat the amend for each commit
$ git push -f

Amending updates the existing PR. You DO NOT need to open a new one.

@@ -0,0 +1,67 @@
package docker
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems redundant with infrakit/example/flavor/swarm/docker.go. Any reason you're not using that code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is meant to be reused (hence in a util package). I am planning to change the example swarm flavor to use this instead of including a client in its own example package.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me. Can you do it in this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated with versioned package for Docker client and changed the example to use the same client as a library.

## Running

```shell
$ infrakit-manager -h
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider starting with $ make binaries

Should infrakit-manager be prefixed with build/?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


const (
// LeaderFileEnvVar is the environment variable that may be used to customize the plugin leader detection
LeaderFileEnvVar = "INFRAKIT_LEADER_FILE"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move these env variables to command line args? They will be easier to discover if presented in help text.

log.Infoln("Connect to docker", host, "err=", err)
if err != nil {
log.Error(err)
os.Exit(1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In a recent patch, i made some tweaks so returning errors from RunE CLI functions should work as expected. So from now on it should be fine to replace

log.Error(err)
os.Exit(1)

with

return err

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed


pollInterval := 5 * time.Second
filename := defaultLeaderFile()
storeDir := defaultStoreDir()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: how would you feel about adopting this pattern for var defaults:

var storeDir string
...
cmd.Flags.StringVar(&storeDir, "store-dir", defaultStoreDir(), ...)

Personally, i find that easier to follow as the variable is not both the input and the output. This has the added benefit of making it easier to share variable definitions with common defaults.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally don't like uninitialized variables and having to look for flag bindings to know what default values are -- if the value were even bound as a flag.

In any case I will make the change assuming it's just some convention.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hear you, and I don't think the design of the cobra API makes it easy to structure well.

Another problem resulting from the initializing approach here is you have two places to check for the default - the variable and flag definition to ensure that it was wired into the default parameter. I've had to correct a few places already where the latter was missing, leading to confusion.

func (m *manager) Start() (<-chan struct{}, error) {
m.lock.Lock()
if m.running != nil {
m.lock.Unlock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit uneasy with the ad hoc locking starting to show up.

Possibly only worth dealing with later, but it would be nice to establish a straightforward convention for lock management. It would be nice if we could strongly favor associating lock acquisition with function calls, like the pattern used elsewhere in this file:

func ... {
    m.lock.Lock()
    defer m.lock.Unlock()
}

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to use an init function that returns true only when the first caller caused initialization of the channel variable. Another concurrent caller of Start won't proceed.

// unwatch without first watching. If the manager crashes and comes back up again,
// there will be no possibility to unwatch existing groups.
m.lock.Lock()
config := m.currentConfig
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some functions have access to a function-scoped GlobalSpec as well as m.currentConfig, which creates potential for bugs due to not using the right spec.

Given that this is the only read of m.currentConfig, and the fact that you could ask the plugin for the list of groups being watched, it seems like you can get away without having currentConfig.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that querying is better than an in-memory copy since I noted elsewhere in a comment that using the in-memory copy to notify the downstream plugins won't work if the manager exited and comes back up. So I am changing to querying for now the proxied group plugin. Note that this isn't the general case where a global config has multiple groups with different group plugins implementations running (instead of just the single one proxied here). To construct the global spec requires querying all the group plugins (imagine one that's group-default and one that's backed by some autoscaling group) to construct this global collection of groups and their specs.

Anyway, added a TODO / comment and changing to querying the backend plugin for now.

}

if err != nil {
log.Warningln("Error watching group:", spec.ID, "Err=", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest making this fatal, and i'd argue the same for the 'Already watching' case above to be conservative against corner cases for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing to warning. I really don't like to panic or have a fatal which will cause the process to exit. In this case, if it's already watching there's no adverse effect. Watch should be idempotent, I'd imagine. I think logging or a warn should be enough.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the problem - watch is not idempotent (it does not initiate a rolling update), so in the current behavior this signals a potential problem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking more on this, while this condition does indicate a state mismatch that the user must reconcile, making this fatal could result in a state that's difficult to reconcile. This is because the next attempt to restart this daemon would try to watch groups, and likely fail again indefinitely.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My personal view on this... if the Group's Watch operation isn't idempotent, then it's the proxied group plugin that should panic, not this layer. This manager's concerns are only

  • persist user's spec for multiple groups
  • determine if it's the leader
  • activate plugins (including install -- TBD later)
  • update / watch on the groups

if usr, err := user.Current(); err == nil {
home = usr.HomeDir
}
return home
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: restructure for brevity and greater readability:

if usr, err := user.Current(); err == nil {
        return usr.HomeDir
}
return os.Getenv("HOME")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


backend := &backend{
id: "group",
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the shared mutable state here makes the code harder to read than it should be. Can you instead convert the PersistentPostRunE to a function you call and pass a backend?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

David Chung added 5 commits November 9, 2016 13:52
…example to use the same client code

Signed-off-by: David Chung <david.chung@docker.com>
Signed-off-by: David Chung <david.chung@docker.com>
Signed-off-by: David Chung <david.chung@docker.com>
Signed-off-by: David Chung <david.chung@docker.com>
cli.SetLogLevel(logLevel)
},
PersistentPostRunE: func(c *cobra.Command, args []string) error {
return runMain(backend)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incremental readability improvement on my end - eliminate the PersistentPostRunE and have the subcommands call runMain().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

group_rpc.PluginServer(manager),
func() error {
log.Infoln("Stopping manager")
manager.Stop()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider moving this to after <-stopped below for better readability.

Copy link
Contributor

@wfarner wfarner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once outstanding comments are addressed. Let's make sure to not slip on tests - there should be a quick follow-up.

Signed-off-by: David Chung <david.chung@docker.com>
@chungers chungers merged commit e6c2deb into docker-archive:master Nov 10, 2016
chungers pushed a commit to chungers/infrakit that referenced this pull request Sep 30, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants