This repository has been archived by the owner on Jan 21, 2020. It is now read-only.

Manager: a stateful group plugin with leader detection #283

Merged: 11 commits, Nov 10, 2016
1 change: 1 addition & 0 deletions Makefile
@@ -82,6 +82,7 @@ ifneq (,$(findstring .m,$(VERSION)))
endif

$(call build_binary,infrakit,github.com/docker/infrakit/cmd/cli)
$(call build_binary,infrakit-manager,github.com/docker/infrakit/cmd/manager)
$(call build_binary,infrakit-group-default,github.com/docker/infrakit/cmd/group)
$(call build_binary,infrakit-flavor-combo,github.com/docker/infrakit/example/flavor/combo)
$(call build_binary,infrakit-flavor-swarm,github.com/docker/infrakit/example/flavor/swarm)
130 changes: 130 additions & 0 deletions cmd/manager/README.md
@@ -0,0 +1,130 @@
InfraKit Manager
================

The Manager is a binary that offers a Group interface while providing the following:

+ Leadership detection - for coordinating multiple sets (replicas) of InfraKit plugins
+ State storage - persists user configuration in some backend

Both file-based and Docker Swarm (Swarm Mode) based leadership detection and state storage are
available.

## Group Interface

Currently the manager exposes the same Group plugin interface as `infrakit-group-default`.
This means the `infrakit group ...` commands work as usual. The manager expects a group plugin
to be running before it starts, and it functions as a proxy for that group plugin:

+ When a user runs `infrakit group watch` or `infrakit group update`, the manager persists
the input configuration in the data store it was configured with at startup.
+ If the data store is configured with a backend that is shared or replicated across multiple
instances of the InfraKit ensemble (all the collaborating plugins), high availability can be
achieved via leader detection and global availability of state (the stored config).
+ Multiple replicas of the manager can do leader detection so that only one is active. As
soon as leadership changes, the responsibility for maintaining infrastructure state is transferred
to the manager that became active.
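
A minimal sketch of the persist-then-forward behavior described above. The `Group` and `Store` interfaces, `memStore`, `logGroup`, and `Proxy` are hypothetical names chosen for illustration; they are not InfraKit's actual types:

```go
package main

import (
	"errors"
	"fmt"
)

// Group is a hypothetical, pared-down stand-in for the group plugin interface.
type Group interface {
	Update(id, config string) error
}

// Store is a hypothetical stand-in for the data store configured at startup.
type Store interface {
	Save(id, config string) error
}

// memStore keeps configs in memory; a real backend would be a file or Swarm.
type memStore map[string]string

func (m memStore) Save(id, config string) error {
	m[id] = config
	return nil
}

// logGroup records the calls it receives, standing in for the stateless group plugin.
type logGroup struct{ calls []string }

func (g *logGroup) Update(id, config string) error {
	g.calls = append(g.calls, id)
	return nil
}

// Proxy persists each update before forwarding it to the downstream group plugin.
type Proxy struct {
	store Store
	next  Group
}

func (p *Proxy) Update(id, config string) error {
	if id == "" {
		return errors.New("missing group id")
	}
	if err := p.store.Save(id, config); err != nil {
		return err // do not forward what could not be persisted
	}
	return p.next.Update(id, config)
}

func main() {
	store := memStore{}
	group := &logGroup{}
	p := &Proxy{store: store, next: group}
	err := p.Update("workers", `{"size": 3}`)
	fmt.Println(err, store["workers"], group.calls)
}
```

Persisting before forwarding is a design choice in this sketch: if a replica takes over later, it can only replay what reached the store.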

## Leadership

The manager can use either `os` or `swarm` for leadership detection:

### OS mode (via the `os` subcommand)

1. Assumes multiple instances of managers can access a shared file (e.g. over NFS or FUSE on S3).
2. Each manager starts up with a name (the `--name` flag).
3. The manager instance with the name that matches the content of the shared file is the leader.
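
The file-based check in step 3 can be sketched as follows. `matchesLeader` and `isLeader` are hypothetical helpers, not the functions in `leader/file`, and trimming surrounding whitespace is an assumption made here so that a trailing newline in the file does not defeat the comparison:

```go
package main

import (
	"bytes"
	"fmt"
	"os"
)

// matchesLeader reports whether the leader file content names this manager.
// Assumption: surrounding whitespace (e.g. a trailing newline) is ignored.
func matchesLeader(content []byte, name string) bool {
	return string(bytes.TrimSpace(content)) == name
}

// isLeader reads the shared leader file and compares it with the manager name.
func isLeader(leaderFile, name string) (bool, error) {
	content, err := os.ReadFile(leaderFile)
	if err != nil {
		return false, err
	}
	return matchesLeader(content, name), nil
}

func main() {
	// A temporary file stands in for the shared leader file.
	f, err := os.CreateTemp("", "leader")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer os.Remove(f.Name())
	fmt.Fprintln(f, "manager-1")
	f.Close()

	for _, name := range []string{"manager-1", "manager-2"} {
		ok, err := isLeader(f.Name(), name)
		fmt.Println(name, ok, err)
	}
}
```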

### Swarm mode (via the `swarm` subcommand)

1. Assumes one InfraKit manager instance runs on each Docker Swarm manager node.
2. Leadership depends on the status of the Swarm manager node. If the Swarm manager node is the
leader, then the InfraKit manager instance running on that node is the leader.
3. When leadership changes in the Swarm, InfraKit leadership follows.
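
Both modes rely on polling (the subcommands take a `--poll-interval` flag). A rough sketch of a poller that reports only leadership changes is shown here; `Leadership`, `poll`, and the `check` callback are hypothetical, and in Swarm mode `check` would ask Docker whether the local node is the Swarm leader:

```go
package main

import (
	"fmt"
	"time"
)

// Leadership is a hypothetical event type emitted by the poller.
type Leadership int

const (
	NotLeader Leadership = iota
	Leader
)

func (l Leadership) String() string {
	if l == Leader {
		return "leader"
	}
	return "not leader"
}

// poll calls check once per interval and sends an event for the first
// observation and for every later change. Closing stop ends the poller.
func poll(interval time.Duration, check func() bool, stop <-chan struct{}) <-chan Leadership {
	out := make(chan Leadership)
	go func() {
		defer close(out)
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		var last Leadership
		first := true
		for {
			select {
			case <-stop:
				return
			case <-ticker.C:
				now := NotLeader
				if check() {
					now = Leader
				}
				if first || now != last {
					select {
					case out <- now:
					case <-stop:
						return
					}
				}
				first, last = false, now
			}
		}
	}()
	return out
}

func main() {
	// Scripted answers stand in for a real leadership check.
	answers := []bool{false, true, true, false}
	i := 0
	check := func() bool {
		a := answers[i]
		if i < len(answers)-1 {
			i++
		}
		return a
	}
	stop := make(chan struct{})
	events := poll(10*time.Millisecond, check, stop)
	for n := 0; n < 3; n++ {
		fmt.Println(<-events)
	}
	close(stop)
}
```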

When an instance assumes leadership:

+ State is retrieved from shared storage (see below) and, for each group in the config, a group
`watch` is invoked so that the new leader can begin watching the groups.
+ Since the manager is the frontend for the stateless group, it records any input the user provides when the
user performs an update. The new config is then written to the shared store and `update` is forwarded
to the actual group plugin to do the real work.

When an instance loses leadership:

+ The manager uses the previous configuration and 'deactivates' the local group plugin by calling `unwatch`
on the downstream group plugin.
+ It rejects the user's attempts to `update`, since it is no longer the leader.
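
The two transitions can be sketched as a small state holder. `GroupPlugin`, `Manager`, `AssumeLeadership`, `LoseLeadership`, and `recorder` are hypothetical names, the in-memory `configs` map stands in for the shared store, and persisting the new config on update is omitted for brevity:

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// GroupPlugin is a hypothetical, pared-down view of the downstream group plugin.
type GroupPlugin interface {
	Watch(id, config string) error
	Unwatch(id string) error
	Update(id, config string) error
}

var errNotLeader = errors.New("not the leader")

// Manager tracks leadership and drives the downstream group plugin.
type Manager struct {
	isLeader bool
	configs  map[string]string // stands in for the shared store
	group    GroupPlugin
}

// sortedIDs returns group IDs in a stable order so the sketch is deterministic.
func (m *Manager) sortedIDs() []string {
	ids := make([]string, 0, len(m.configs))
	for id := range m.configs {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

// AssumeLeadership watches every group found in the stored config.
func (m *Manager) AssumeLeadership() error {
	m.isLeader = true
	for _, id := range m.sortedIDs() {
		if err := m.group.Watch(id, m.configs[id]); err != nil {
			return err
		}
	}
	return nil
}

// LoseLeadership unwatches every group known from the previous config.
func (m *Manager) LoseLeadership() error {
	m.isLeader = false
	for _, id := range m.sortedIDs() {
		if err := m.group.Unwatch(id); err != nil {
			return err
		}
	}
	return nil
}

// Update is rejected unless this instance is currently the leader.
func (m *Manager) Update(id, config string) error {
	if !m.isLeader {
		return errNotLeader
	}
	return m.group.Update(id, config)
}

// recorder logs the calls it receives in place of a real group plugin.
type recorder struct{ calls []string }

func (r *recorder) Watch(id, config string) error {
	r.calls = append(r.calls, "watch "+id)
	return nil
}

func (r *recorder) Unwatch(id string) error {
	r.calls = append(r.calls, "unwatch "+id)
	return nil
}

func (r *recorder) Update(id, config string) error {
	r.calls = append(r.calls, "update "+id)
	return nil
}

func main() {
	rec := &recorder{}
	m := &Manager{configs: map[string]string{"workers": "{}", "managers": "{}"}, group: rec}
	fmt.Println(m.Update("workers", "{}")) // rejected while not the leader
	m.AssumeLeadership()
	m.Update("workers", "{}")
	m.LoseLeadership()
	fmt.Println(rec.calls)
}
```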


## State Storage

The manager can use either `os` or `swarm` for state storage:

### OS mode (via the `os` subcommand)

1. State is stored in a local file that is well-known and defined at startup of the manager.
2. This file is a global config that can include multiple groups.
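
A sketch of one file holding the config for several groups. The README does not specify the on-disk format, so the JSON encoding, the `GlobalConfig` type, and the `save`, `load`, and `roundTrip` helpers are all assumptions; only the `global.config` file name comes from `cmd/manager/os.go`:

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// GlobalConfig maps a group ID to that group's raw config (assumed to be JSON).
type GlobalConfig map[string]json.RawMessage

// save writes the whole config to dir/name. It writes a temporary file first
// and renames it, so readers are unlikely to observe a partially written file.
func save(dir, name string, cfg GlobalConfig) error {
	if err := os.MkdirAll(dir, 0o700); err != nil {
		return err
	}
	data, err := json.Marshal(cfg)
	if err != nil {
		return err
	}
	tmp := filepath.Join(dir, name+".tmp")
	if err := os.WriteFile(tmp, data, 0o600); err != nil {
		return err
	}
	return os.Rename(tmp, filepath.Join(dir, name))
}

// load reads the whole config back from dir/name.
func load(dir, name string) (GlobalConfig, error) {
	data, err := os.ReadFile(filepath.Join(dir, name))
	if err != nil {
		return nil, err
	}
	cfg := GlobalConfig{}
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, err
	}
	return cfg, nil
}

// roundTrip saves cfg into a throwaway directory and reads it back.
func roundTrip(cfg GlobalConfig) (GlobalConfig, error) {
	dir, err := os.MkdirTemp("", "infrakit-configs")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	if err := save(dir, "global.config", cfg); err != nil {
		return nil, err
	}
	return load(dir, "global.config")
}

func main() {
	cfg := GlobalConfig{
		"workers":  json.RawMessage(`{"size":3}`),
		"managers": json.RawMessage(`{"size":1}`),
	}
	got, err := roundTrip(cfg)
	fmt.Println(len(got), err)
}
```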

### Swarm mode (via the `swarm` subcommand)

1. State is stored in the Swarm via annotations
2. A single global state is stored in a single annotation. The data is compressed and encoded.
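
The README does not say which compression or encoding is used. As an illustration only, the sketch below assumes gzip and base64, a common way to fit a config into a string-valued annotation; `encode` and `decode` are hypothetical helpers:

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/base64"
	"fmt"
	"io"
)

// encode gzips the config and base64-encodes the result so it can be stored
// as a plain string value.
func encode(config []byte) (string, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(config); err != nil {
		return "", err
	}
	if err := zw.Close(); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(buf.Bytes()), nil
}

// decode reverses encode: base64-decode, then gunzip.
func decode(value string) ([]byte, error) {
	raw, err := base64.StdEncoding.DecodeString(value)
	if err != nil {
		return nil, err
	}
	zr, err := gzip.NewReader(bytes.NewReader(raw))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	value, err := encode([]byte(`{"workers":{"size":3}}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	config, err := decode(value)
	fmt.Println(string(config), err)
}
```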


## Frontend (Proxy) for Group

The manager requires a group plugin to be running so that it can forward calls to it to actually
perform the work of watching and updating:

+ When you intend to use the manager, you should start your default group plugin with a name like
`group-stateless`
+ Then when starting the manager, set the `--proxy-for-group` flag to the name of the group plugin
(e.g. `group-stateless`). By default, the manager starts up with the name of `group`. This matches
the default name that the CLI (`infrakit group ...`) uses.


## Running

```shell
$ make binaries
$ build/infrakit-manager -h
Manager

Usage:
  infrakit-manager [command]

Available Commands:
  os          os
  swarm       swarm mode for leader detection and storage
  version     print build version information

Flags:
      --log int                  Logging level. 0 is least verbose. Max is 5 (default 4)
      --name string              Name of the manager (default "group")
      --proxy-for-group string   Name of the group plugin to proxy for. (default "group-stateless")

Use "infrakit-manager [command] --help" for more information about a command.
```

### Running in OS Mode

Useful for local testing:

```shell
$ infrakit-manager os --log 5
```

### Running in Swarm Mode

First enable Swarm mode:

```shell
docker swarm init
```

On each Swarm manager node:

```shell
$ infrakit-manager swarm --log 5
```

This connects to Docker using the default Docker socket (`unix:///var/run/docker.sock`); use the `--host` flag to override it.
84 changes: 84 additions & 0 deletions cmd/manager/main.go
@@ -0,0 +1,84 @@
package main

import (
	"os"
	"path/filepath"

	log "github.com/Sirupsen/logrus"
	"github.com/docker/infrakit/cli"
	"github.com/docker/infrakit/discovery"
	"github.com/docker/infrakit/leader"
	"github.com/docker/infrakit/manager"
	"github.com/docker/infrakit/rpc"
	group_rpc "github.com/docker/infrakit/rpc/group"
	"github.com/docker/infrakit/store"
	"github.com/spf13/cobra"
)

type backend struct {
	id         string
	plugins    discovery.Plugins
	leader     leader.Detector
	snapshot   store.Snapshot
	pluginName string // This is the name of the stateless group plugin that the manager will proxy for.
}

func main() {

	logLevel := cli.DefaultLogLevel
	backend := &backend{}

	cmd := &cobra.Command{
		Use:   filepath.Base(os.Args[0]),
		Short: "Manager",
		PersistentPreRun: func(c *cobra.Command, args []string) {
			cli.SetLogLevel(logLevel)
		},
		PersistentPostRunE: func(c *cobra.Command, args []string) error {
			return runMain(backend)

[Review comment, Contributor] Incremental readability improvement on my end: eliminate the PersistentPostRunE and have the subcommands call runMain().
[Reply, Contributor Author] Done.

		},
	}
	cmd.PersistentFlags().IntVar(&logLevel, "log", logLevel, "Logging level. 0 is least verbose. Max is 5")
	cmd.PersistentFlags().StringVar(&backend.id, "name", "group", "Name of the manager")
	cmd.PersistentFlags().StringVar(&backend.pluginName, "proxy-for-group", "group-stateless", "Name of the group plugin to proxy for.")

	cmd.AddCommand(cli.VersionCommand(), osEnvironment(backend), swarmEnvironment(backend))

	err := cmd.Execute()
	if err != nil {
		log.Error(err)
		os.Exit(1)
	}
}

func runMain(backend *backend) error {

	log.Infoln("Starting up manager:", backend)

	manager, err := manager.NewManager(backend.plugins,
		backend.leader, backend.snapshot, backend.pluginName)
	if err != nil {
		return err
	}

	_, err = manager.Start()
	if err != nil {
		return err
	}

	_, stopped, err := rpc.StartPluginAtPath(
		filepath.Join(discovery.Dir(), backend.id),
		group_rpc.PluginServer(manager),
		func() error {
			log.Infoln("Stopping manager")
			manager.Stop()

[Review comment, Contributor] Consider moving this to after `<-stopped` below for better readability.

			return nil
		},
	)
	if err != nil {
		return err
	}

	<-stopped // block until done
	return err
}
79 changes: 79 additions & 0 deletions cmd/manager/os.go
@@ -0,0 +1,79 @@
package main

import (
	"os"
	"os/user"
	"path/filepath"
	"time"

	"github.com/docker/infrakit/discovery"
	file_leader "github.com/docker/infrakit/leader/file"
	file_store "github.com/docker/infrakit/store/file"
	"github.com/spf13/cobra"
)

const (
	// LeaderFileEnvVar is the environment variable that may be used to customize the plugin leader detection
	LeaderFileEnvVar = "INFRAKIT_LEADER_FILE"

[Review comment, Contributor] Can you move these env variables to command line args? They will be easier to discover if presented in help text.


	// StoreDirEnvVar is the environment variable that sets the directory where the configs are stored
	StoreDirEnvVar = "INFRAKIT_STORE_DIR"
)

func getHome() string {
	if usr, err := user.Current(); err == nil {
		return usr.HomeDir
	}
	return os.Getenv("HOME")
}

func defaultLeaderFile() string {
	if leaderFile := os.Getenv(LeaderFileEnvVar); leaderFile != "" {
		return leaderFile
	}
	return filepath.Join(getHome(), ".infrakit/leader")
}

func defaultStoreDir() string {
	if storeDir := os.Getenv(StoreDirEnvVar); storeDir != "" {
		return storeDir
	}
	return filepath.Join(getHome(), ".infrakit/configs")
}

func osEnvironment(backend *backend) *cobra.Command {

	var pollInterval time.Duration
	var filename, storeDir string

	cmd := &cobra.Command{
		Use:   "os",
		Short: "os",
		RunE: func(c *cobra.Command, args []string) error {

			plugins, err := discovery.NewPluginDiscovery()
			if err != nil {
				return err
			}

			leader, err := file_leader.NewDetector(pollInterval, filename, backend.id)
			if err != nil {
				return err
			}

			snapshot, err := file_store.NewSnapshot(storeDir, "global.config")
			if err != nil {
				return err
			}

			backend.plugins = plugins
			backend.leader = leader
			backend.snapshot = snapshot
			return nil
		},
	}
	cmd.Flags().StringVar(&filename, "leader-file", defaultLeaderFile(), "File used for leader election/detection")
	cmd.Flags().StringVar(&storeDir, "store-dir", defaultStoreDir(), "Dir to store the config")
	cmd.Flags().DurationVar(&pollInterval, "poll-interval", 5*time.Second, "Leader polling interval")
	return cmd
}
59 changes: 59 additions & 0 deletions cmd/manager/swarm.go
@@ -0,0 +1,59 @@
package main

import (
	"time"

	log "github.com/Sirupsen/logrus"
	"github.com/docker/go-connections/tlsconfig"
	"github.com/docker/infrakit/discovery"
	swarm_leader "github.com/docker/infrakit/leader/swarm"
	swarm_store "github.com/docker/infrakit/store/swarm"
	"github.com/docker/infrakit/util/docker/1.24"
	"github.com/spf13/cobra"
)

func swarmEnvironment(backend *backend) *cobra.Command {

	tlsOptions := tlsconfig.Options{}
	host := "unix:///var/run/docker.sock"

	var pollInterval time.Duration

	cmd := &cobra.Command{
		Use:   "swarm",
		Short: "swarm mode for leader detection and storage",
		RunE: func(c *cobra.Command, args []string) error {

			dockerClient, err := docker.NewDockerClient(host, &tlsOptions)
			log.Infoln("Connect to docker", host, "err=", err)
			if err != nil {
				return err
			}

			leader := swarm_leader.NewDetector(pollInterval, dockerClient)
			snapshot, err := swarm_store.NewSnapshot(dockerClient)
			if err != nil {
				return err
			}

			plugins, err := discovery.NewPluginDiscovery()
			if err != nil {
				return err
			}

			backend.plugins = plugins
			backend.leader = leader
			backend.snapshot = snapshot
			return nil
		},
	}

	cmd.Flags().DurationVar(&pollInterval, "poll-interval", 5*time.Second, "Leader polling interval")
	cmd.Flags().StringVar(&host, "host", host, "Docker host")
	cmd.Flags().StringVar(&tlsOptions.CAFile, "tlscacert", "", "TLS CA cert file path")
	cmd.Flags().StringVar(&tlsOptions.CertFile, "tlscert", "", "TLS cert file path")
	cmd.Flags().StringVar(&tlsOptions.KeyFile, "tlskey", "", "TLS key file path")
	// Note: despite its name, this flag sets InsecureSkipVerify, so true (the default) skips verification.
	cmd.Flags().BoolVar(&tlsOptions.InsecureSkipVerify, "tlsverify", true, "True to skip TLS verification")

	return cmd
}
3 changes: 2 additions & 1 deletion example/flavor/swarm/main.go
@@ -8,6 +8,7 @@ import (
"github.com/docker/infrakit/cli"
"github.com/docker/infrakit/plugin/flavor/swarm"
flavor_plugin "github.com/docker/infrakit/rpc/flavor"
"github.com/docker/infrakit/util/docker/1.24"
"github.com/spf13/cobra"
)

@@ -26,7 +27,7 @@ func main() {

cli.SetLogLevel(logLevel)

- dockerClient, err := NewDockerClient(host, &tlsOptions)
+ dockerClient, err := docker.NewDockerClient(host, &tlsOptions)
log.Infoln("Connect to docker", host, "err=", err)
if err != nil {
log.Error(err)