This repository has been archived by the owner on Jan 21, 2020. It is now read-only.
Manager: a stateful group plugin with leader detection #283
Merged
Commits
f294faf
Manager: a stateful group plugin with leader detection
955322b
Add README
9d070eb
Fix lint; remove poc functionality
2391fd4
Merge branch 'master' into stateful-group
4e327f3
Merge branch 'master' into stateful-group
865b3c1
Use versioned package name for Docker client; changed client code in …
e9f4a30
merge master
2668c23
Merge remote-tracking branch 'origin/stateful-group' into stateful-group
5be8873
Incorporating feedback
3939697
Fix wrong logic
b6b0b81
Addressing more feedback
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,130 @@ | ||
InfraKit Manager | ||
================ | ||
|
||
The Manager is a binary that offers a Group interface while providing the following: | ||
|
||
+ Leadership detection - for coordinating multiple sets (replicas) of InfraKit plugins | ||
+ State storage - persists user configuration in some backend | ||
|
||
Both file-based and Docker Swarm (Swarm Mode) based leadership detection and state storage are | ||
available. | ||
|
||
## Group Interface | ||
|
||
Currently the manager exposes the same Group plugin interface as `infrakit-group-default`. | ||
This means the `infrakit group ...` commands work as usual. The manager expects a group plugin | ||
to be running before it starts, and it functions as a proxy for that group plugin: | ||
|
||
+ When a user runs `infrakit group watch` or `infrakit group update`, the manager | ||
  persists the input configuration in the data store it was configured with at startup. | ||
+ If the data store is configured with a backend that is shared or replicated across multiple | ||
  instances of the InfraKit ensemble (all the collaborating plugins), high availability can be | ||
  achieved via leader detection and global availability of state (the stored config). | ||
+ Multiple replicas of the manager can do leader detection so that only one is active. As | ||
  soon as leadership changes, the responsibility for maintaining infrastructure state is transferred | ||
  to the manager that became active. | ||
|
||
## Leadership | ||
|
||
The manager can use either `os` or `swarm` for leadership detection: | ||
|
||
### OS mode (via the `os` subcommand) | ||
|
||
1. Assumes multiple instances of managers can access a shared file (e.g. over NFS or FUSE on S3). | ||
2. Each manager starts up with a name (the `--name` flag). | ||
3. The manager instance with the name that matches the content of the shared file is the leader. | ||
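The check in step 3 can be sketched roughly as follows. This is a minimal illustration, not the code in `leader/file`: the names `matchesLeader` and `isLeader`, the example path, and the trimming of surrounding whitespace are all assumptions made for the sketch.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// matchesLeader reports whether the shared file's content names this manager.
// Trimming surrounding whitespace is an assumption made for this sketch.
func matchesLeader(content, name string) bool {
	return strings.TrimSpace(content) == name
}

// isLeader reads the shared leader file and compares it with the manager name.
func isLeader(leaderFile, name string) (bool, error) {
	data, err := os.ReadFile(leaderFile)
	if err != nil {
		return false, err
	}
	return matchesLeader(string(data), name), nil
}

func main() {
	// Hypothetical path and manager name, for illustration only.
	leader, err := isLeader("/tmp/infrakit-leader", "group")
	if err != nil {
		fmt.Println("cannot read leader file:", err)
		return
	}
	fmt.Println("is leader:", leader)
}
```

A real detector would presumably repeat this check on every poll interval and report only the transitions.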
|
||
### Swarm mode (via the `swarm` subcommand) | ||
|
||
1. Assumes there's a manager instance per Docker Swarm manager instance | ||
2. Leadership depends on the status of the Swarm manager node. If the Swarm manager node is the | ||
leader, then the InfraKit manager instance running on that node is the leader. | ||
3. When leadership changes in the Swarm, InfraKit leadership follows. | ||
|
||
When an instance assumes leadership: | ||
|
||
+ State is retrieved from shared storage (see below) and, for each group in the config, a group | ||
  `watch` is invoked so that the new leader can begin watching the groups. | ||
+ Since this is the frontend for the stateless group, it records any input the user provides when the | ||
  user performs an update. The new config is then written to the shared store and `update` is forwarded | ||
  to the actual group plugin to do the real work. | ||
|
||
When an instance loses leadership: | ||
|
||
+ The manager uses the previous configuration and 'deactivates' the local group plugin by calling `unwatch` | ||
  on the downstream group plugin. | ||
+ It rejects the user's attempts to `update`, since it is not the leader. | ||
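The two transitions described above can be sketched as a single handler. This is a simplified illustration under stated assumptions: `groupPlugin`, `recorder`, and `onLeadershipChange` are hypothetical names, and the real Group plugin interface takes richer arguments than a group name.

```go
package main

import "fmt"

// groupPlugin is a hypothetical stand-in for the downstream group plugin.
type groupPlugin interface {
	Watch(group string) error
	Unwatch(group string) error
}

// recorder is a fake plugin that records the calls it receives.
type recorder struct{ calls []string }

func (r *recorder) Watch(g string) error {
	r.calls = append(r.calls, "watch "+g)
	return nil
}

func (r *recorder) Unwatch(g string) error {
	r.calls = append(r.calls, "unwatch "+g)
	return nil
}

// onLeadershipChange watches every stored group when leadership is gained
// and unwatches every group when leadership is lost.
func onLeadershipChange(isLeader bool, groups []string, p groupPlugin) error {
	for _, g := range groups {
		var err error
		if isLeader {
			err = p.Watch(g)
		} else {
			err = p.Unwatch(g)
		}
		if err != nil {
			return err
		}
	}
	return nil
}

func main() {
	r := &recorder{}
	groups := []string{"workers", "managers"} // hypothetical group names
	_ = onLeadershipChange(true, groups, r)   // this instance became leader
	_ = onLeadershipChange(false, groups, r)  // this instance lost leadership
	fmt.Println(r.calls)
}
```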
|
||
|
||
## State Storage | ||
|
||
The manager can use either `os` or `swarm` for state storage: | ||
|
||
### OS mode (via the `os` subcommand) | ||
|
||
1. State is stored in a local file that is well-known and defined at startup of the manager. | ||
2. This file is a global config that can include multiple groups. | ||
|
||
### Swarm mode (via the `swarm` subcommand) | ||
|
||
1. State is stored in the Swarm via annotations | ||
2. A single global state is stored in a single annotation. The data is compressed and encoded. | ||
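The README does not say which compression or encoding is used. As a sketch only, assuming gzip compression followed by base64 encoding (both are assumptions, as are the function names `encode` and `decode`), a config could be packed into a single string value like this:

```go
package main

import (
	"bytes"
	"compress/gzip"
	"encoding/base64"
	"fmt"
	"io"
)

// encode gzips the config and base64-encodes the compressed bytes so the
// result can be stored as a plain string value.
func encode(config []byte) (string, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(config); err != nil {
		return "", err
	}
	if err := zw.Close(); err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(buf.Bytes()), nil
}

// decode reverses encode: base64-decode, then gunzip.
func decode(s string) ([]byte, error) {
	raw, err := base64.StdEncoding.DecodeString(s)
	if err != nil {
		return nil, err
	}
	zr, err := gzip.NewReader(bytes.NewReader(raw))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	// Hypothetical config document, for illustration only.
	enc, err := encode([]byte(`{"groups":{}}`))
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	dec, err := decode(enc)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(string(dec))
}
```

Keeping the whole state in one value means every update rewrites the full config, which is simple but bounded by whatever size limit the backing store imposes.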
|
||
|
||
## Frontend (Proxy) for Group | ||
|
||
The manager requires a group plugin to be running so that it can forward calls to it to actually | ||
perform the work of watching and updating: | ||
|
||
+ When you intend to use the manager, you should start your default group plugin with a name like | ||
`group-stateless` | ||
+ Then when starting the manager, set the `--proxy-for-group` flag to the name of the group plugin | ||
(e.g. `group-stateless`). By default, the manager starts up with the name of `group`. This matches | ||
the default name that the CLI (`infrakit group ...`) uses. | ||
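The proxy behaviour can be sketched as a wrapper that persists the config and then forwards the call, rejecting updates when it is not the leader. This is an illustration under stated assumptions, not the manager's implementation: `proxy`, `store`, `forwarder`, and `errNotLeader` are hypothetical names, and a map stands in for the shared store.

```go
package main

import (
	"errors"
	"fmt"
)

// store maps a group ID to its config; a stand-in for the shared store.
type store map[string]string

// forwarder stands in for the call to the downstream group plugin.
type forwarder func(id, config string) error

// errNotLeader is returned when a non-leader instance receives an update.
var errNotLeader = errors.New("not the leader")

// proxy fronts the stateless group plugin.
type proxy struct {
	isLeader bool
	saved    store
	forward  forwarder
}

// Update persists the new config and then forwards it downstream.
// A non-leader rejects the update without touching the store.
func (p *proxy) Update(id, config string) error {
	if !p.isLeader {
		return errNotLeader
	}
	p.saved[id] = config
	return p.forward(id, config)
}

func main() {
	p := &proxy{
		isLeader: true,
		saved:    store{},
		forward: func(id, config string) error {
			fmt.Println("forwarding update for", id)
			return nil
		},
	}
	fmt.Println(p.Update("workers", `{"size":3}`)) // hypothetical group and config
	p.isLeader = false
	fmt.Println(p.Update("workers", `{"size":5}`))
}
```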
|
||
|
||
## Running | ||
|
||
```shell | ||
$ make binaries | ||
$ build/infrakit-manager -h | ||
Manager | ||
|
||
Usage: | ||
infrakit-manager [command] | ||
|
||
Available Commands: | ||
os os | ||
swarm swarm mode for leader detection and storage | ||
version print build version information | ||
|
||
Flags: | ||
--log int Logging level. 0 is least verbose. Max is 5 (default 4) | ||
--name string Name of the manager (default "group") | ||
--proxy-for-group string Name of the group plugin to proxy for. (default "group-stateless") | ||
|
||
Use "infrakit-manager [command] --help" for more information about a command. | ||
``` | ||
|
||
### Running in OS Mode | ||
|
||
Useful for local testing: | ||
|
||
```shell | ||
$ infrakit-manager os --log 5 | ||
``` | ||
|
||
### Running in Swarm Mode | ||
|
||
First enable Swarm mode: | ||
|
||
```shell | ||
docker swarm init | ||
``` | ||
|
||
On each Swarm manager node: | ||
|
||
```shell | ||
$ infrakit-manager swarm --log 5 | ||
``` | ||
This connects to Docker using the default Docker socket (`unix:///var/run/docker.sock`); use the `--host` flag to override it. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,84 @@ | ||
package main | ||
|
||
import ( | ||
"os" | ||
"path/filepath" | ||
|
||
log "github.com/Sirupsen/logrus" | ||
"github.com/docker/infrakit/cli" | ||
"github.com/docker/infrakit/discovery" | ||
"github.com/docker/infrakit/leader" | ||
"github.com/docker/infrakit/manager" | ||
"github.com/docker/infrakit/rpc" | ||
group_rpc "github.com/docker/infrakit/rpc/group" | ||
"github.com/docker/infrakit/store" | ||
"github.com/spf13/cobra" | ||
) | ||
|
||
type backend struct { | ||
id string | ||
plugins discovery.Plugins | ||
leader leader.Detector | ||
snapshot store.Snapshot | ||
pluginName string //This is the name of the stateless group plugin that the manager will proxy for. | ||
} | ||
|
||
func main() { | ||
|
||
logLevel := cli.DefaultLogLevel | ||
backend := &backend{} | ||
|
||
cmd := &cobra.Command{ | ||
Use: filepath.Base(os.Args[0]), | ||
Short: "Manager", | ||
PersistentPreRun: func(c *cobra.Command, args []string) { | ||
cli.SetLogLevel(logLevel) | ||
}, | ||
PersistentPostRunE: func(c *cobra.Command, args []string) error { | ||
return runMain(backend) | ||
}, | ||
} | ||
cmd.PersistentFlags().IntVar(&logLevel, "log", logLevel, "Logging level. 0 is least verbose. Max is 5") | ||
cmd.PersistentFlags().StringVar(&backend.id, "name", "group", "Name of the manager") | ||
cmd.PersistentFlags().StringVar(&backend.pluginName, "proxy-for-group", "group-stateless", "Name of the group plugin to proxy for.") | ||
|
||
cmd.AddCommand(cli.VersionCommand(), osEnvironment(backend), swarmEnvironment(backend)) | ||
|
||
err := cmd.Execute() | ||
if err != nil { | ||
log.Error(err) | ||
os.Exit(1) | ||
} | ||
} | ||
|
||
func runMain(backend *backend) error { | ||
|
||
log.Infoln("Starting up manager:", backend) | ||
|
||
manager, err := manager.NewManager(backend.plugins, | ||
backend.leader, backend.snapshot, backend.pluginName) | ||
if err != nil { | ||
return err | ||
} | ||
|
||
_, err = manager.Start() | ||
if err != nil { | ||
return err | ||
} | ||
|
||
_, stopped, err := rpc.StartPluginAtPath( | ||
filepath.Join(discovery.Dir(), backend.id), | ||
group_rpc.PluginServer(manager), | ||
func() error { | ||
log.Infoln("Stopping manager") | ||
manager.Stop() | ||
Review comment: Consider moving this to after |
||
return nil | ||
}, | ||
) | ||
if err != nil { | ||
return err | ||
} | ||
|
||
<-stopped // block until done | ||
return err | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,79 @@ | ||
package main | ||
|
||
import ( | ||
"os" | ||
"os/user" | ||
"path/filepath" | ||
"time" | ||
|
||
"github.com/docker/infrakit/discovery" | ||
file_leader "github.com/docker/infrakit/leader/file" | ||
file_store "github.com/docker/infrakit/store/file" | ||
"github.com/spf13/cobra" | ||
) | ||
|
||
const ( | ||
// LeaderFileEnvVar is the environment variable that may be used to customize the plugin leader detection | ||
LeaderFileEnvVar = "INFRAKIT_LEADER_FILE" | ||
Review comment: Can you move these env variables to command line args? They will be easier to discover if presented in help text. |
||
|
||
// StoreDirEnvVar is the directory where the configs are stored | ||
StoreDirEnvVar = "INFRAKIT_STORE_DIR" | ||
) | ||
|
||
func getHome() string { | ||
if usr, err := user.Current(); err == nil { | ||
return usr.HomeDir | ||
} | ||
return os.Getenv("HOME") | ||
} | ||
|
||
func defaultLeaderFile() string { | ||
if leaderFile := os.Getenv(LeaderFileEnvVar); leaderFile != "" { | ||
return leaderFile | ||
} | ||
return filepath.Join(getHome(), ".infrakit/leader") | ||
} | ||
|
||
func defaultStoreDir() string { | ||
if storeDir := os.Getenv(StoreDirEnvVar); storeDir != "" { | ||
return storeDir | ||
} | ||
return filepath.Join(getHome(), ".infrakit/configs") | ||
} | ||
|
||
func osEnvironment(backend *backend) *cobra.Command { | ||
|
||
var pollInterval time.Duration | ||
var filename, storeDir string | ||
|
||
cmd := &cobra.Command{ | ||
Use: "os", | ||
Short: "os", | ||
RunE: func(c *cobra.Command, args []string) error { | ||
|
||
plugins, err := discovery.NewPluginDiscovery() | ||
if err != nil { | ||
return err | ||
} | ||
|
||
leader, err := file_leader.NewDetector(pollInterval, filename, backend.id) | ||
if err != nil { | ||
return err | ||
} | ||
|
||
snapshot, err := file_store.NewSnapshot(storeDir, "global.config") | ||
if err != nil { | ||
return err | ||
} | ||
|
||
backend.plugins = plugins | ||
backend.leader = leader | ||
backend.snapshot = snapshot | ||
return nil | ||
}, | ||
} | ||
cmd.Flags().StringVar(&filename, "leader-file", defaultLeaderFile(), "File used for leader election/detection") | ||
cmd.Flags().StringVar(&storeDir, "store-dir", defaultStoreDir(), "Dir to store the config") | ||
cmd.Flags().DurationVar(&pollInterval, "poll-interval", 5*time.Second, "Leader polling interval") | ||
return cmd | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,59 @@ | ||
package main | ||
|
||
import ( | ||
"time" | ||
|
||
log "github.com/Sirupsen/logrus" | ||
"github.com/docker/go-connections/tlsconfig" | ||
"github.com/docker/infrakit/discovery" | ||
swarm_leader "github.com/docker/infrakit/leader/swarm" | ||
swarm_store "github.com/docker/infrakit/store/swarm" | ||
"github.com/docker/infrakit/util/docker/1.24" | ||
"github.com/spf13/cobra" | ||
) | ||
|
||
func swarmEnvironment(backend *backend) *cobra.Command { | ||
|
||
tlsOptions := tlsconfig.Options{} | ||
host := "unix:///var/run/docker.sock" | ||
|
||
var pollInterval time.Duration | ||
|
||
cmd := &cobra.Command{ | ||
Use: "swarm", | ||
Short: "swarm mode for leader detection and storage", | ||
RunE: func(c *cobra.Command, args []string) error { | ||
|
||
dockerClient, err := docker.NewDockerClient(host, &tlsOptions) | ||
log.Infoln("Connect to docker", host, "err=", err) | ||
if err != nil { | ||
return err | ||
} | ||
|
||
leader := swarm_leader.NewDetector(pollInterval, dockerClient) | ||
snapshot, err := swarm_store.NewSnapshot(dockerClient) | ||
if err != nil { | ||
return err | ||
} | ||
|
||
plugins, err := discovery.NewPluginDiscovery() | ||
if err != nil { | ||
return err | ||
} | ||
|
||
backend.plugins = plugins | ||
backend.leader = leader | ||
backend.snapshot = snapshot | ||
return nil | ||
}, | ||
} | ||
|
||
cmd.Flags().DurationVar(&pollInterval, "poll-interval", 5*time.Second, "Leader polling interval") | ||
cmd.Flags().StringVar(&host, "host", host, "Docker host") | ||
cmd.Flags().StringVar(&tlsOptions.CAFile, "tlscacert", "", "TLS CA cert file path") | ||
cmd.Flags().StringVar(&tlsOptions.CertFile, "tlscert", "", "TLS cert file path") | ||
cmd.Flags().StringVar(&tlsOptions.KeyFile, "tlskey", "", "TLS key file path") | ||
	cmd.Flags().BoolVar(&tlsOptions.InsecureSkipVerify, "tlsverify", true, "True to skip TLS certificate verification") | ||
|
||
return cmd | ||
} |
Incremental readability improvement on my end - eliminate the `PersistentPostRunE` and have the subcommands call `runMain()`.
Done.