This repository has been archived by the owner on Jan 30, 2020. It is now read-only.

Implement follow_only flag #1263

Merged 1 commit on Sep 29, 2015
6 changes: 5 additions & 1 deletion Documentation/architecture.md
@@ -11,6 +11,10 @@ Every system in the fleet cluster runs a single `fleetd` daemon. Each daemon enc
- The engine uses a _lease model_ to enforce that only one engine is running at a time. Every time a reconciliation is due, an engine will attempt to take a lease on etcd. If the lease succeeds, the reconciliation proceeds; otherwise, that engine will remain idle until the next reconciliation period begins.
- The engine uses a simplistic "least-loaded" scheduling algorithm: when considering where to schedule a given unit, preference is given to agents running the smallest number of units.

+The reconciliation loop of the engine can be disabled with the `--disable-engine` flag. This means that
+this `fleetd` daemon will *never* become a cluster leader. If all running daemons have this setting,
+your cluster is dead; i.e. no jobs will be scheduled. Use with care.
+
### Agent

- The agent is responsible for actually executing Units on systems. It communicates with the local systemd instance over D-Bus.
@@ -19,7 +23,7 @@ Every system in the fleet cluster runs a single `fleetd` daemon. Each daemon enc

## etcd

-etcd is the sole datastore in a fleet cluster. All persistent and ephemeral data is stored in etcd: unit files, cluster presence, unit state, etc.
+etcd is the sole datastore in a fleet cluster. All persistent and ephemeral data is stored in etcd: unit files, cluster presence, unit state, etc.

etcd is also used for all internal communication between fleet engines and agents.

8 changes: 4 additions & 4 deletions Documentation/fleet-scaling.md
@@ -5,6 +5,7 @@ minimizing the load it puts on etcd. This is true for reads, writes, and
watches.

## Known issues
+
Currently when fleet schedules a job *all* `fleetd`s are woken up (via a watch)
and then do a recursive GET on the Unit file in etcd to figure out if it should
schedule a job. This is a very expensive operation.
@@ -33,9 +34,8 @@ wins:
this is an expensive operation. The fewer nodes that are engaged in this
election, the better. Possible downside is that if there isn't a leader at
all, the cluster is inoperable. However the (usually) 5 machines running
-etcd are also a single point of failure. Proposal:
-https://github.com/coreos/fleet/pull/1263
+etcd are also a single point of failure. See the `--disable-engine` flag.

* Making some defaults exported and allow them to be overridden. For instance
-fleet's tokenLimit controls how many Units are listed per "page". Proposal:
-https://github.com/coreos/fleet/pull/1265
+fleet's tokenLimit controls how many Units are listed per "page". See the
+`--token-limit` flag.
1 change: 1 addition & 0 deletions config/config.go
@@ -31,6 +31,7 @@ type Config struct {
RawMetadata string
AgentTTL string
TokenLimit int
+DisableEngine bool
VerifyUnits bool
AuthorizedKeysFile string
}
2 changes: 2 additions & 0 deletions fleetd/fleetd.go
@@ -76,6 +76,7 @@ func main() {
cfgset.String("metadata", "", "List of key-value metadata to assign to the fleet machine")
cfgset.String("agent_ttl", agent.DefaultTTL, "TTL in seconds of fleet machine state in etcd")
cfgset.Int("token_limit", 100, "Maximum number of entries per page returned from API requests")
+cfgset.Bool("disable_engine", false, "Disable the engine entirely, use with care")
cfgset.Bool("verify_units", false, "DEPRECATED - This option is ignored")
cfgset.String("authorized_keys_file", "", "DEPRECATED - This option is ignored")

@@ -188,6 +189,7 @@ func getConfig(flagset *flag.FlagSet, userCfgFile string) (*config.Config, error
PublicIP: (*flagset.Lookup("public_ip")).Value.(flag.Getter).Get().(string),
RawMetadata: (*flagset.Lookup("metadata")).Value.(flag.Getter).Get().(string),
AgentTTL: (*flagset.Lookup("agent_ttl")).Value.(flag.Getter).Get().(string),
+DisableEngine: (*flagset.Lookup("disable_engine")).Value.(flag.Getter).Get().(bool),
VerifyUnits: (*flagset.Lookup("verify_units")).Value.(flag.Getter).Get().(bool),
TokenLimit: (*flagset.Lookup("token_limit")).Value.(flag.Getter).Get().(int),
AuthorizedKeysFile: (*flagset.Lookup("authorized_keys_file")).Value.(flag.Getter).Get().(string),
26 changes: 16 additions & 10 deletions server/server.go
@@ -45,15 +45,16 @@ const (
)

type Server struct {
-	agent       *agent.Agent
-	aReconciler *agent.AgentReconciler
-	usPub       *agent.UnitStatePublisher
-	usGen       *unit.UnitStateGenerator
-	engine      *engine.Engine
-	mach        *machine.CoreOSMachine
-	hrt         heart.Heart
-	mon         *heart.Monitor
-	api         *api.Server
+	agent         *agent.Agent
+	aReconciler   *agent.AgentReconciler
+	usPub         *agent.UnitStatePublisher
+	usGen         *unit.UnitStateGenerator
+	engine        *engine.Engine
+	mach          *machine.CoreOSMachine
+	hrt           heart.Heart
+	mon           *heart.Monitor
+	api           *api.Server
+	disableEngine bool

engineReconcileInterval time.Duration

@@ -131,6 +132,7 @@ func New(cfg config.Config) (*Server, error) {
api: apiServer,
stop: nil,
engineReconcileInterval: eIval,
+disableEngine: cfg.DisableEngine,
}

return &srv, nil
@@ -174,7 +176,11 @@ func (s *Server) Run() {
go s.mach.PeriodicRefresh(machineStateRefreshInterval, s.stop)
go s.agent.Heartbeat(s.stop)
go s.aReconciler.Run(s.agent, s.stop)
-go s.engine.Run(s.engineReconcileInterval, s.stop)
+if s.disableEngine {
+	log.Info("Not starting engine; disable-engine is set")
+} else {
+	go s.engine.Run(s.engineReconcileInterval, s.stop)
+}

beatchan := make(chan *unit.UnitStateHeartbeat)
go s.usGen.Run(beatchan, s.stop)