This repository has been archived by the owner on Jan 30, 2020. It is now read-only.

Use gRPC to communicate the engine and agents #1426

Merged
merged 34 commits into from
Aug 12, 2016

Conversation

hectorj2f
Contributor

This PR provides a new communication mechanism that improves performance, data transmission, and unit-state sharing between the fleet engine and the agents in a fleet cluster.

Motivation: In our infrastructure, we have experienced some issues with fleet in terms of scalability, performance and fault-tolerance. Therefore we'd like to present our ideas to help improve those areas.

We use gRPC/HTTP2 as the framework to expose all the required operations (schedule unit, destroy unit, save state, ...) needed to coordinate the engine (the fleet node elected as leader) with the agents. In this implementation, we provide a new registry that stores all the information in memory. Nevertheless, it is also possible to keep using the etcd registry.
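For illustration only, here is a rough sketch of what such a set of operations could look like as a Go interface. None of these names are taken from this PR; they only hint at the kind of calls the agents would make on the engine over gRPC:

```go
// Hypothetical sketch of the operations an elected engine could expose to the
// agents over gRPC. All type and method names here are assumptions for
// illustration, not the PR's actual API.
package registry

// UnitState mirrors the transient state an agent reports for a unit.
type UnitState struct {
	UnitName    string
	ActiveState string
	MachineID   string
}

// EngineRegistry is what agents would call, via gRPC, on the elected engine.
type EngineRegistry interface {
	ScheduleUnit(unitName, machineID string) error        // engine assigns a unit to a machine
	DestroyUnit(unitName string) error                    // remove a unit from the cluster
	SaveUnitState(state UnitState) error                  // agent reports a unit's current state
	UnitsScheduledTo(machineID string) ([]string, error)  // agent asks for its workload
}
```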

Generally, this implementation provides the two options mentioned above. You can use etcd if that fits your architecture or requirements better, or you can use the in-memory registry to reduce the dependency on etcd (though not to avoid etcd altogether). In that direction, we found in our infrastructure that a high workload on etcd leads to poor or incorrect behavior of fleet. Besides, we believe that using etcd to provide inter-process communication for the agents could turn into a bottleneck, and it also affects fleet's fault tolerance.

Additional information and plots about the motivation of this PR can be found below: #1426 (comment)

This PR has been labeled as WIP; we are still working on improvements, fault tolerance, bug fixes, etc. :)

NOTE: If you want to try this PR, you need to rebuild the Go dependencies, preferably using Go v1.5.1. This PR was so big that we were forced to exclude our new Go dependencies from it.

@kayrus
Contributor

kayrus commented Feb 5, 2016

@hectorj2f could you provide more info on the use cases of this PR? Are you trying to implement a new protocol for fleet daemon communication and avoid etcd because it is slow? Is that correct?

@hectorj2f
Contributor Author

could you provide more info on the use cases of this PR?

@kayrus Yes, we'll add more information to this PR description such as motivation, use cases, etc...

Are you trying to implement a new protocol for fleet daemon communication and avoid etcd because it is slow? Is that correct?

Generally, this implementation provides the two options mentioned above. You can use etcd if that fits your architecture or requirements better, or you can use the in-memory registry to reduce the dependency on etcd (though not to avoid etcd altogether). In that direction, we found in our infrastructure that a high workload on etcd leads to poor or incorrect behavior of fleet.

@mischief
Contributor

mischief commented Feb 5, 2016

@hectorj2f perhaps it would be worth your time to investigate using fleet with the etcd v3 api, which uses gRPC.

@hectorj2f
Contributor Author

@mischief yes, that wouldn't be a bad idea, although we believe that using etcd to provide inter-process communication for the agents could turn into a bottleneck, and it also affects fleet's fault tolerance. Perhaps @htr can shed more light on this to clarify our intention with this PR.

On the other hand, and related to your suggestion: do you have benchmarking results comparing the benefits of etcd v2 vs. v3? We also use etcd in other projects :).

@kayrus
Contributor

kayrus commented Feb 8, 2016

@htr
Contributor

htr commented Feb 9, 2016

This PR addresses issues related to our fleet usage pattern. Usually our fleet cluster isn't composed of many nodes (<50), but it is expected to handle a non-trivial (at this time) number of units (>3000).

Currently, fleet's only registry implementation uses etcd to store both transient and persistent data. More units mean more unitStates being updated per second, which after some time causes instability (an update can be very quick (<50 ms) but also quite slow (~1 s)).

This PR implements a multiplexing layer on top of the former EtcdRegistry (a rough sketch follows the list below). This means:

  • from the agent perspective:
    • each time an engine change is detected, a gRPC connection to the new engine is established
    • any Registry operation is forwarded (via gRPC) to the engine
  • from the engine perspective:
    • every time a new engine gets "elected", it starts a gRPC server
    • when an engine loses its leadership, it stops the gRPC server, terminating existing connections
    • all the transient data is stored in memory
    • the persistent data (units, desired state and schedule) is stored in etcd
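As a rough sketch of that lifecycle (not the code in this PR; the generated service registration and the registry types are assumptions, marked in comments):

```go
// Sketch only: an engine starts a gRPC server while it holds the lease and
// stops it when leadership is lost; an agent re-dials whenever a new engine
// is elected and forwards Registry operations over that connection.
package main

import (
	"log"
	"net"

	"google.golang.org/grpc"
)

// Engine side: one gRPC server per leadership term.
type rpcEngine struct {
	server *grpc.Server
}

func (e *rpcEngine) onLeadershipAcquired(addr string) error {
	lis, err := net.Listen("tcp", addr)
	if err != nil {
		return err
	}
	e.server = grpc.NewServer()
	// pb.RegisterRegistryServer(e.server, newInMemoryRegistry()) // hypothetical generated API
	go func() {
		if err := e.server.Serve(lis); err != nil {
			log.Printf("registry server stopped: %v", err)
		}
	}()
	return nil
}

func (e *rpcEngine) onLeadershipLost() {
	if e.server != nil {
		e.server.Stop() // terminates existing agent connections
		e.server = nil
	}
}

// Agent side: re-dial whenever a new engine is elected.
type rpcAgent struct {
	conn *grpc.ClientConn
}

func (a *rpcAgent) onEngineChanged(engineAddr string) error {
	if a.conn != nil {
		a.conn.Close()
	}
	conn, err := grpc.Dial(engineAddr, grpc.WithInsecure())
	if err != nil {
		return err
	}
	a.conn = conn
	// a.registry = pb.NewRegistryClient(conn) // hypothetical generated client stub
	return nil
}

func main() {} // placeholder so this sketch compiles standalone
```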

Another issue that we encountered was the lack of cleanup (or GC) in fleet's etcd directory. This PR doesn't address that issue, but having a single writer makes the development of a solution much easier.

This implementation can coexist with the traditional etcd-centric implementation.

We have a micro benchmark (note the emphasis on micro benchmark) that we use to compare tweaks, configuration parameters, etc:

  • start NN units spaced by 100ms and wait until NN-10 units are actually started

The time between the start API call (or desired-state change) and the actual start (each unit curls an endpoint with a unique identifier) is measured and stored. This delay is expected to increase with the number of units running in the system.
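For illustration, a tiny collector along these lines could record those delays. This is an assumed sketch, not the benchmark code that produced the plots below; the port, path, and parameter names are made up:

```go
// Hypothetical benchmark collector: the driver calls markStarted right after
// asking fleet to start a unit, and each unit curls
// http://collector:8080/started?unit=<name> once it is actually running.
package main

import (
	"fmt"
	"log"
	"net/http"
	"sync"
	"time"
)

var (
	mu        sync.Mutex
	startedAt = map[string]time.Time{} // unit name -> time the start API call was issued
)

func markStarted(unit string) {
	mu.Lock()
	defer mu.Unlock()
	startedAt[unit] = time.Now()
}

func main() {
	http.HandleFunc("/started", func(w http.ResponseWriter, r *http.Request) {
		unit := r.URL.Query().Get("unit")
		mu.Lock()
		t0, ok := startedAt[unit]
		mu.Unlock()
		if ok {
			fmt.Printf("%s started after %v\n", unit, time.Since(t0))
		}
		w.WriteHeader(http.StatusOK)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```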

[benchmark plot: fleet 0.11.3 with some caching ("current fleet")]

[benchmark plot: this branch ("github com-htr-grpc-wip")]

@htr
Contributor

htr commented Feb 9, 2016

perhaps it would be worth your time to investigate using fleet with the etcd v3 api, which uses gRPC.

@mischief we might, but I still think separating the storage of transient and persistent data is important.

"github.com/coreos/fleet/registry"
)

func (e *Engine) rpcLeadership(leaseTTL time.Duration, machID string) lease.Lease {
@hectorj2f
Contributor Author
@xiang90 Yes, this fleet version doesn't use the etcd clientv3 yet. But I always keep one eye on your changes.

@hectorj2f hectorj2f changed the title [WIP] Use gRPC to communicate the engine and agents Use gRPC to communicate the engine and agents Apr 4, 2016
@hectorj2f
Contributor Author

@kayrus @tixxdz @jonboulle I will update the PR to the latest version. But do you want me to include the grpc dependencies in the Godeps folder? That could make this PR harder to review due to the number of files changed.

@kayrus
Contributor

kayrus commented Apr 4, 2016

@hectorj2f Just create an additional PR for the godeps.

@hectorj2f
Contributor Author

@kayrus Alright, thanks 👍

@jonboulle
Contributor

hmm, an additional commit should be fine..?

On Mon, Apr 4, 2016 at 5:02 PM, kayrus notifications@github.com wrote:

@hectorj2f https://github.com/hectorj2f Just create additional PR for
godeps.



@kayrus
Contributor

kayrus commented Apr 4, 2016

@jonboulle it will be hard to review the whole PR

@jonboulle
Contributor

Eh, having it in a separate commit seems sufficient to me, but whatever

On Mon, Apr 4, 2016 at 5:24 PM kayrus notifications@github.com wrote:

@jonboulle https://github.com/jonboulle it will be hard to review whole
PR



@hectorj2f
Contributor Author

@jonboulle If we do so, there would be approx. 230 files changed, and GitHub doesn't work well with that kind of PR :/.

@dongsupark
Contributor

Recently I have been looking into this PR.
First of all, I totally agree with the original problem analysis and with the approach of using gRPC. This seems to be the best way to avoid the bottleneck on the etcd side. Given that etcd v3 is being implemented on top of gRPC, I think this is the way to go.

On the other hand, it was not easy for me to follow each commit in this PR, from the older to the newer ones. Is there any chance to reorganize the PR, to reduce the number of commits? (Though I can already see that it would not be trivial...)

And today I rebased this PR on top of master, fixing a minor thing to make it compile (see the branch endocode/dongsu/grpc_engine_communication), and I ran the functional tests.
The good news is that it runs without errors with enable_grpc disabled.
The bad news is that 16 of 20 tests failed with enable_grpc enabled. I suppose that could be the first issue for us to investigate.

@hectorj2f
Contributor Author

The bad news is that 16 of 20 tests failed with enable_grpc enabled. I suppose that could be the first issue for us to investigate.

@dongsupark I will update this branch against the latest version and fix the tests ASAP. I'll let you know if I find any blocking error. The goal is to have all the tests passing, or to adapt them if needed.

Is there any chance to reorganize the PR, to reduce the number of commits? (Though I can already see that it would not be trivial...)

I feel your pain. I can squash our commits, but let's do that once the tests are happy and the code is LGTMed.

@antrik
Contributor

antrik commented Apr 12, 2016

@hectorj2f I think the point is that having a messy history actually makes the review harder... While GitHub favours reviewing all commits as one big diff, that's not the traditional way to deal with pull requests, and it can be quite daunting for complex changes. Thus it's better to clean up the history before submitting upstream, to help the review.

Note that "clean up" doesn't typically mean to squash everything into a single commit -- that wouldn't really help. Such complex changes can usually be made as a logical series of individual commits, each representing some meaningful change that can be explained on its own, and doesn't result in anything breaking...

@hectorj2f
Contributor Author

@antrik Alright! It makes sense; I'll group the commits and add helpful messages to them.

@dongsupark
Contributor

@hectorj2f

16 of 20 tests failed with enable_grpc enabled

Ah, sorry. This was due to my mistake. For some reason I set ExecStart=/opt/fleet/fleetd --enable_grpc=1 in util/config.go, which is not supported. I should have configured enable_grpc=1 in /opt/fleet/fleet.conf. Doing that, the functional tests work correctly. Sorry for the noise. (Fundamentally I don't like nc.Fleetctl failing silently without printing stderr, but that's another issue...)

As for the logical structure of the commits, I think it was not that bad in the beginning. The first 10 commits were relatively easy to understand and well-structured. It's just that at some point it started to grow like that. :-)

@hectorj2f
Contributor Author

@dongsupark I rebased against master, but I am getting some errors in the fleetctl tests: https://gist.github.com/hectorj2f/c53edf850f67f6279559749a30727862

Does it ring any bells? Where should I look?

@hectorj2f
Contributor Author

@dongsupark Should I add the LICENSE header to the new files? Or would you do that once it is merged?

@dongsupark
Contributor

@hectorj2f That looks like an error from the unit tests. I haven't seen such an error.
Sorry, but I'm on vacation this week. I'll probably be able to look into it next week.

@hectorj2f
Contributor Author

Alright @dongsupark! I'll see what I can do.

@dongsupark
Contributor

FTR, the error in the unit tests was already solved earlier this week.
The merge conflict between #1426 and #1657 was also resolved yesterday.

LGTM.
I'll create another PR just for running the functional tests on semaphoreci.
If that passes, I'd like to merge #1426, #1657, and #1656 tomorrow.

@hectorj2f
Contributor Author

Great @dongsupark
