Skip to content
This repository has been archived by the owner on Jan 30, 2020. It is now read-only.

Implement follow_only flag #1263

Merged
merged 1 commit into from
Sep 29, 2015
Merged

Implement follow_only flag #1263

merged 1 commit into from
Sep 29, 2015

Conversation

miekg
Copy link
Contributor

@miekg miekg commented Jun 24, 2015

If set this fleetd will never assume mastership in a fleet cluster.
This will cut down on the interactions with Etcd.

Note: this PR is rather hackish, it would be nice to know if something like will be considered for merging (if cleaned up, etc .etc.). IOW: comments welcome!

Fix #1166

@bcwaldon
Copy link
Contributor

bcwaldon commented Jul 9, 2015

@miekg In what scenario has this helped you so far? I worry that a cluster will degrade to only those nodes that are --follow-only and the cluster will mysteriously stop making any progress.

@miekg
Copy link
Contributor Author

miekg commented Jul 9, 2015

[ Quoting notifications@github.com in "Re: [fleet] Implement follow_only f..." ]

@miekg In what scenario has this helped you so far?

We basically have two type of nodes in our clusters: worker and admin nodes. The
worker nodes come and go (in large numbers) and these should just be used to
schedule jobs. We never want them to become leaders. The admin nodes control
the entire cluster and we just want one of those to become the fleet leader.

I worry that a cluster will
degrade to only those nodes that are --follow-only and the cluster will
mysteriously stop making any progress.

So, this is why I'm also adding metrics to fleet. Without admin nodes our
cluster is useless anyway (because no etcd - among other things), so fleet will
then be also broken, regardless of where the fleet leader actually is.

/Miek

Miek Gieben

@bcwaldon
Copy link
Contributor

bcwaldon commented Jul 9, 2015

@miekg ok, that makes sense. Could you start a "scaling" document that helps people understand how to use these features to scale out fleet?

@@ -58,7 +58,7 @@ func New(reg *registry.EtcdRegistry, lManager lease.Manager, rStream pkg.EventSt
}
}

func (e *Engine) Run(ival time.Duration, stop chan bool) {
func (e *Engine) Run(ival time.Duration, follower bool, stop chan bool) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the Engine simply not run if follow-only == true?

@miekg
Copy link
Contributor Author

miekg commented Jul 9, 2015

[ Quoting notifications@github.com in "Re: [fleet] Implement follow_only f..." ]

@miekg ok, that makes sense. Could you start a "scaling" document that helps people understand how to use these features to scale out fleet?

Ok, will do.

@miekg
Copy link
Contributor Author

miekg commented Aug 10, 2015

Ok, reworked PR to not start the scheduler at all and renamed the flag to no_scheduler. PTAL.

@miekg
Copy link
Contributor Author

miekg commented Aug 24, 2015

Friendly ping @bcwaldon

@mwitkow
Copy link
Contributor

mwitkow commented Sep 5, 2015

@mischief, can you take a look at this? This change is crucial in making fleet scale to ~1000s of machines.

@miekg
Copy link
Contributor Author

miekg commented Sep 22, 2015

Dear fleet devs :) @jonboulle @mischief ... can we merge this as well?

@jonboulle
Copy link
Contributor

What made you choose "scheduler" rather than "engine"?

@miekg
Copy link
Contributor Author

miekg commented Sep 23, 2015

wrt the naming in the PR? Wasn't I conscious choice, felt more natural to use 'scheduler', but 'engine' might be better describing what it does code wise. Or maybe 'no_master' . Happy to make rename it to whatever you like best.

@jonboulle
Copy link
Contributor

Oh, I see, @bcwaldon originally suggested it @ #1263 (comment) . Personally I would find --disable-engine much more intuitive and consistent with the documentation/code (e.g.)

@miekg
Copy link
Contributor Author

miekg commented Sep 23, 2015

I have no problem making that change.

@jonboulle
Copy link
Contributor

I think I'd like to see
a) change to --disable-engine
b) an explanation in the architecture document
c) some big bold warnings in all of the relevant documentation (to try mitigate this concern)

Other than that, the implementation seems fine, fairly straightforward. Thanks!

If set this fleetd will never assume mastership in a fleet cluster.
This will cut down on the interactions with etcd.

Useful when when you have two type of nodes in our clusters: worker and
admin nodes. The worker nodes come and go (in large numbers) and these
should just be used to schedule jobs. We never want them to become
leaders. The admin nodes control the entire cluster and we just want one
of those to become the fleet leader.

Add documentation and detail the consequences this can have.
@miekg
Copy link
Contributor Author

miekg commented Sep 25, 2015

Ok. Change to --disable-engine. Added docs, squashed the commits.
PTAL.

@jonboulle
Copy link
Contributor

Thanks! LGTM.

jonboulle added a commit that referenced this pull request Sep 29, 2015
@jonboulle jonboulle merged commit 3178aed into coreos:master Sep 29, 2015
@miekg
Copy link
Contributor Author

miekg commented Sep 29, 2015

Cool! Thank you.
On 29 Sep 2015 01:33, "Jonathan Boulle" notifications@github.com wrote:

Thanks! LGTM.


Reply to this email directly or view it on GitHub
#1263 (comment).

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants