
Add Kubernetes topo implementation #5703

Merged: 2 commits merged into vitessio:master on Mar 18, 2020

Conversation

carsonoid
Contributor

@carsonoid carsonoid commented Jan 13, 2020

I wanted to get some initial feedback on my implementation of the Kubernetes topology service. This resolves #5701.

The code currently works to bring up a basic cluster on a Linux host using:

TOPO=k8s ./101_initial_cluster.sh

Known Issues that warrant discussion:

  • The current deployment uses k3s running locally to provide a minimal Kubernetes API. This means that the topology is not available to anyone but Linux users.
  • The binaries are much larger now thanks to the size of client-go.
  • Scale concerns:
    • All components currently maintain a cache of all the ConfigMaps because they all start the topo server. If there is too much topology data, this may not scale, and it could slow down vtctl or other short-lived processes that operate against the topology (a minimal sketch of this informer-style cache follows this list).
      • This may actually be fine. I'm not sure how much data is actually in the topology service of a large Vitess installation.
    • Client-go watching does not support objects > 1MB in size. How big can individual documents get in the topology?
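
To make the caching concern above concrete, here is a minimal, hypothetical sketch (not the PR's code; the namespace, kubeconfig path, and names are illustrative) of the client-go shared-informer pattern being described: each process that opens the topo builds an in-memory cache of every ConfigMap in the namespace and serves reads from that cache.

```go
// Hypothetical sketch of a per-process ConfigMap cache via client-go informers.
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from a kubeconfig (the k3s default path, for illustration).
	config, err := clientcmd.BuildConfigFromFlags("", "/etc/rancher/k3s/k3s.yaml")
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// The shared informer factory lists and watches every ConfigMap in the
	// namespace; this is the per-process cache the comment is concerned about.
	factory := informers.NewSharedInformerFactoryWithOptions(
		clientset, 30*time.Second, informers.WithNamespace("vitess"))
	cmInformer := factory.Core().V1().ConfigMaps()

	stop := make(chan struct{})
	defer close(stop)
	factory.Start(stop)
	if !cache.WaitForCacheSync(stop, cmInformer.Informer().HasSynced) {
		panic("ConfigMap cache never synced")
	}

	// Reads are now served from the local cache, not the API server.
	cms, err := cmInformer.Lister().ConfigMaps("vitess").List(labels.Everything())
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d ConfigMaps cached locally\n", len(cms))
}
```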

@carsonoid
Contributor Author

@derekperkins feedback and discussion appreciated.

Contributor

@morgo morgo left a comment


A couple of small comments.

Review comments (resolved): bootstrap.sh, examples/local/k3s-up.sh
@derekperkins
Member

I think this is awesome, and all the getting-started docs should be updated after this merges to remove the etcd-operator dependency.

I'm very unfamiliar with the topo, so I'm not the best person to comment on the actual implementation, but I've got a couple of questions:

  1. How much larger are the binaries? Is it worth the effort to use a smaller library like https://github.com/ericchiang/k8s?
  2. Should this and the other topo implementations be put behind build tags? I think we would have to include the existing clients by default to avoid breaking backwards compatibility, but it would be nice to be able to exclude them from a custom build (a hypothetical sketch of a build-tag-gated plugin file follows this list).
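
As an illustration of the build-tag idea in question 2: a Vitess topo plugin file is just a blank import that registers the implementation, so gating it could be as small as a build constraint on that file. This is a hypothetical sketch; the file name, tag name, and import path are assumptions, not part of this PR.

```go
// Hypothetical build-tag-gated plugin file (e.g. go/cmd/vtctl/plugin_kubernetestopo.go).
// With this constraint, the k8s topo (and its client-go dependency) is only
// compiled in when building with: go build -tags k8stopo

// +build k8stopo

package main

import (
	// Blank import for side effects: the package's init() registers the
	// Kubernetes topo factory with the topo registry.
	_ "vitess.io/vitess/go/vt/topo/kubernetestopo"
)
```

Leaving the constraint off the existing plugin files would keep today's default builds unchanged, which addresses the backwards-compatibility concern above.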

Contributor Author

@carsonoid carsonoid left a comment


To answer @derekperkins' question: it adds ~20MB to each binary that includes client-go. Here are the size increases in detail.

| Component | Before | After | Increase |
| --- | --- | --- | --- |
| bin/vtcombo | 42M | 63M | 67% |
| bin/vtctl | 41M | 61M | 67% |
| bin/vtctld | 47M | 67M | 70% |
| bin/vtgate | 30M | 50M | 60% |
| bin/vttablet | 41M | 62M | 66% |
| bin/vtworker | 33M | 53M | 62% |

node.value = string(contents)

// Create "file" object
return &corev1.ConfigMap{


Rather than using ConfigMaps, I think you should probably define a CustomResourceDefinition for these. Doing it via a custom resource definition means that:

  • You can easily define separate ACLs for ConfigMaps and the Vitess Topology objects
  • You could create a schema to define your data if it is well-defined
  • You could make kubectl display useful default columns for vitess topology objects
  • Cluster administrators can shard each object type into separate etcd clusters behind the scenes (ex: massive kube clusters may actually put events objects in a different etcd cluster due to the update volume)
  • You don't pollute the output of kubectl get configmaps with thousands of topology-related objects

Member


For context, there's a separate Vitess operator that uses CRDs to manage the whole cluster, so not using them is a choice, not a lack of understanding. @carsonoid actually spoke about operators at Kubecon in December.

The purpose of this is to have a dead simple topo available for free, especially for quick onboarding and smaller Vitess installations. If you care to shard etcd or do anything else, I think you would then just use the direct etcd topo implementation and manage it yourself, which is what the current helm implementation uses via etcd-operator.

vtctlclient is the Vitess way to connect to the topo, so managing resources via kubectl seems unnecessary here. Maybe I'm incorrect, but I believe that using CRDs would require us to deploy a separate controller to manage them, which is what this PR is specifically trying to avoid.

All that being said, maybe there is a good enough reason to use CRDs or change how we're using ConfigMaps, I just wanted to step back to make sure we're looking at the overall reasons for adding a new topo implementation.

Contributor Author


Rather than using ConfigMaps, I think you should probably define a CustomResourceDefinition for these. ...

I 100% agree that there are a lot of benefits that come with using a CRD for the resource instead of a ConfigMap. In fact, I have gone back and forth on using them here. As @derekperkins stated: the goal is to allow for a dead-simple topology server for small Kubernetes-based Vitess clusters. But we do need to balance that with scale and stability concerns.

There is added overhead when using a CRD for the objects:

  • We would have to use https://github.com/kubernetes/code-generator to create the API and client code, and fit that into the current build system (a rough types.go sketch follows this comment).
  • We would need to figure out how to create the CRD definition when deploying Vitess to a new cluster. This may be added work for the end-user or for the code maintainers, depending on who ends up owning this process, i.e. do we have a Vitess component create and maintain the CRD definition automatically, or do we require that it be done as part of the first-time work of running Vitess in a cluster?

TLDR: The CM/CRD decision is still up in the air
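
For reference on the first bullet above, the input to kubernetes/code-generator is a hand-written types.go carrying generator markers; the deepcopy functions, typed clientset, informers, and listers are generated from it. A rough, hypothetical sketch for a topo node type (package path, field names, and schema are illustrative, not the PR's final design):

```go
// Hypothetical apis/topo/v1beta1/types.go sketch used as code-generator input.
package v1beta1

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// +genclient
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object

// VitessTopoNode stores a single topology path and its serialized value.
type VitessTopoNode struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Data VitessTopoNodeData `json:"data,omitempty"`
}

// VitessTopoNodeData is an illustrative payload: the topo key plus its contents.
type VitessTopoNodeData struct {
	Key   string `json:"key,omitempty"`
	Value string `json:"value,omitempty"`
}

// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object

// VitessTopoNodeList is a list of VitessTopoNode objects.
type VitessTopoNodeList struct {
	metav1.TypeMeta `json:",inline"`
	metav1.ListMeta `json:"metadata,omitempty"`

	Items []VitessTopoNode `json:"items"`
}
```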


For context, there's a separate Vitess operator that uses CRDs to manage the whole cluster, so not using them is a choice, not a lack of understanding. @carsonoid actually spoke about operators at Kubecon in December.

Yeah, for background on my end, I lead the team that owns the Kubernetes clusters over at HubSpot so I definitely keep up with vitess development a bit.

The purpose of this is to have a dead simple topo available for free, especially for quick onboarding and smaller Vitess installations. If you care to shard etcd or do anything else, I think you would then just use the direct etcd topo implementation and manage it yourself, which is what the current helm implementation uses via etcd-operator.

There's no real reason this implementation cannot scale, however, and even for larger clusters there is a lot of benefit of doing this via the kubernetes API versus etcd. The etcd-operator alone isn't suitable to give larger orgs production-grade etcd clusters with appropriate auditing, backups, and authentication. The SQL team must then either own their own topology server, or depend on an etcd, zookeeper, or consul team. Removing that requirement and using the kubernetes API directly eliminates that burden.

vtctlclient is the Vitess way to connect to the topo, so managing resources via kubectl seems unnecessary here.

It's not unprecedented for us to go into the underlying topology store to debug or fix issues by hand, though. But yeah, the fields to display as kubectl columns would be pretty limited (maybe just the keyspace and tablet id).

Maybe I'm incorrect, but I believe that using CRDs would require us to deploy a separate controller to manage them, which is what this PR is specifically trying to avoid.

Creating a new CRD definition in the cluster is just a kubectl apply away, and then it's just another object you can put data in, so no additional controller code is necessary.
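
To illustrate the "no controller needed" point: once the CRD exists, any client can read and write the custom objects directly, exactly as it would with ConfigMaps. A minimal, hypothetical sketch using client-go's dynamic client (the group, version, resource, and field names are assumptions, and a recent client-go where these calls take a context is assumed):

```go
// Hypothetical sketch: storing topo data in a custom resource with the
// dynamic client. Nothing watches these objects; they are just stored data.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", "/etc/rancher/k3s/k3s.yaml")
	if err != nil {
		panic(err)
	}
	dyn, err := dynamic.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Illustrative guesses at the CRD's identity.
	gvr := schema.GroupVersionResource{
		Group: "topo.vitess.io", Version: "v1beta1", Resource: "vitesstoponodes",
	}

	node := &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "topo.vitess.io/v1beta1",
		"kind":       "VitessTopoNode",
		"metadata":   map[string]interface{}{"name": "example-topo-node"},
		"data": map[string]interface{}{
			"key":   "/keyspaces/commerce/Keyspace",
			"value": "serialized topo contents go here",
		},
	}}

	created, err := dyn.Resource(gvr).Namespace("vitess").
		Create(context.Background(), node, metav1.CreateOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("stored", created.GetName())
}
```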

We would need to figure out how to create the CRD definition when deploying Vitess to a new cluster. This may be added work for the end-user or for the code maintainers, depending on who ends up owning this process, i.e. do we have a Vitess component create and maintain the CRD definition automatically, or do we require that it be done as part of the first-time work of running Vitess in a cluster?

For the case of creating the CRD definitions in the API, it seems reasonable enough to make someone kubectl apply it, but it also seems pretty reasonable to automatically create it if it doesn't exist and tell the user to do it manually if they hit a permission error. They already have to apply other things, so it shouldn't be too much of a burden on the user's end (and far, far less work than setting up etcd or zk) as long as the error messaging is clear when it hasn't been created yet.
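
That "create it if it doesn't exist, and give a clear message on a permission error" flow could look roughly like the sketch below. This is not this PR's code: the CRD name, group, and schema are illustrative, and a recent client-go/apiextensions clientset (where Create takes a context) is assumed.

```go
// Hypothetical sketch: ensure the topo CRD exists, tolerating "already exists"
// and turning an RBAC denial into an actionable message for the user.
package main

import (
	"context"
	"fmt"
	"log"

	apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1"
	apiextensionsclient "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/rest"
)

func ensureTopoCRD(ctx context.Context, config *rest.Config) error {
	client, err := apiextensionsclient.NewForConfig(config)
	if err != nil {
		return err
	}

	preserve := true
	crd := &apiextensionsv1.CustomResourceDefinition{
		ObjectMeta: metav1.ObjectMeta{Name: "vitesstoponodes.topo.vitess.io"},
		Spec: apiextensionsv1.CustomResourceDefinitionSpec{
			Group: "topo.vitess.io",
			Scope: apiextensionsv1.NamespaceScoped,
			Names: apiextensionsv1.CustomResourceDefinitionNames{
				Plural: "vitesstoponodes", Singular: "vitesstoponode", Kind: "VitessTopoNode",
			},
			Versions: []apiextensionsv1.CustomResourceDefinitionVersion{{
				Name: "v1beta1", Served: true, Storage: true,
				Schema: &apiextensionsv1.CustomResourceValidation{
					// Accept arbitrary content; a real CRD could use a stricter schema.
					OpenAPIV3Schema: &apiextensionsv1.JSONSchemaProps{
						Type: "object", XPreserveUnknownFields: &preserve,
					},
				},
			}},
		},
	}

	_, err = client.ApiextensionsV1().CustomResourceDefinitions().Create(ctx, crd, metav1.CreateOptions{})
	switch {
	case err == nil, apierrors.IsAlreadyExists(err):
		return nil // created just now, or someone already applied it
	case apierrors.IsForbidden(err):
		// Clear message instead of a raw RBAC error, as suggested above.
		return fmt.Errorf("not allowed to create the topo CRD; ask a cluster admin to kubectl apply it: %w", err)
	default:
		return err
	}
}

func main() {
	config, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	if err := ensureTopoCRD(context.Background(), config); err != nil {
		log.Fatal(err)
	}
}
```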

Anyway, I think we probably agree that CRDs are the correct technical solution. I do understand that ConfigMaps are slightly easier to use, since they don't require the code generation or applying the CRD definition into the cluster, but if you can nail those two pieces down, those are really the only cons.

Member


There's no real reason this implementation cannot scale, however, and even for larger clusters there is a lot of benefit of doing this via the kubernetes API versus etcd. The etcd-operator alone isn't suitable to give larger orgs production-grade etcd clusters with appropriate auditing, backups, and authentication. The SQL team must then either own their own topology server, or depend on an etcd, zookeeper, or consul team. Removing that requirement and using the kubernetes API directly eliminates that burden.

That's great feedback. I hadn't expected someone at HubSpot's scale to be interested in putting the topo into k8s, so we can definitely revisit the expectations for this topo implementation. Questions I have around that:

  • Curious how that scales when using GKE or another hosted system that controls the masters
  • What about global vs separate zone topos? Unless you did something to shard those under the hood, you'd have a SPOF. (to be fair, that's how helm does it with everything in a single etcd-operator cluster)
  • Where do we draw the line between this topo implementation and the actual operator?

Member


Another important question - what happens when you're upgrading a cluster? How does that impact topo availability?


Curious how that scales when using GKE or another hosted system that controls the masters

What is your update frequency on these objects? Our kubernetes clusters push infinitely more QPS through etcd than vitess does to its topology backend. I doubt we'd even perceive a load increase. Consider that every pod or node status change or event incurs a write to the kubernetes API. We don't even run particularly large etcd servers and often push 100-400MiB/sec through them from kubernetes without breaking a sweat. A given pod is likely more load on the kubernetes API than your topology data for that pod. Best of all, in this case someone else owns your production grade etcd cluster and the ops aren't your problem.

What about global vs separate zone topos? Unless you did something to shard those under the hood, you'd have a SPOF. (to be fair, that's how helm does it with everything in a single etcd-operator cluster)

As it currently stands, we do not make use of global vs local because our current intention for multi-region is that every region is a fully independent shard of our entire stack. However, we actually do operate a "global" kubernetes API which is distributed across regions - it's etcd plus a kubernetes apiserver with no kubernetes nodes, which we use to stash custom resources that we then replicate into region-local kubernetes clusters. So in our case we actually would be capable of using a global kubernetes API for topology if we wanted to, but as long as the global topology can use a different backend configuration than the local topology, I think you'd cover all use cases.

Where do we draw the line between this topo implementation and the actual operator?

The topology backend should definitely be independent from the operator. You wouldn't want to force people onto the operator in order to use it. In general, I perceive the kubernetes API as a CRUD service on typed objects that all other components then use to stash their states in, so it's as good of a topology store as any other. When you create a pod (or anything) against the kubernetes API, nothing occurs synchronously - the scheduler, controller, and nodes all just watch the API to see if there are any declarative objects they are supposed to take action on.

Another important question - what happens when you're upgrading a cluster? How does that impact topo availability?

Kubernetes cluster upgrades are zero API downtime. The controller-manager and scheduler may have a blip in activity while all masters aren't on the same version, but the API is up the whole time.

Member

@derekperkins derekperkins Jan 22, 2020


as long as the global topology can use a different backend configuration than local topology, I think you'd cover all use cases.

If someone wanted to use different topology types, I'm pretty sure that would require an upstream change since there's only the one topo flag I'm aware of that controls the global and local plugin: -topo_implementation. In your case where you'd still be using the k8s topo for both global and local, that shouldn't require any extra work.

This all sounds great. I think I'm on team CRD now. Whether someone is getting started with helm or with the operator when it's more available, we should be able to register the CRDs silently so that the user doesn't have to know about the topo configuration. More advanced users can just apply them directly.

Member


@enisoc any feedback on this discussion? Anything that we're not considering?

Contributor Author


FWIW I'm also convinced about the CRD change (I was already leaning that way anyway). I will work on moving the current implementation over. Using ConfigMaps worked as a quick test of the concept, but the CRD path will be much better now that this is getting more traction.

@carsonoid carsonoid force-pushed the kubernetestopo branch 6 times, most recently from b6dcc42 to c238e28, on February 5, 2020 at 00:15
@carsonoid
Contributor Author

I've got the ConfigMap-based implementation working. I'm going to start work on converting it to a CustomResource.

@carsonoid
Contributor Author

I've got it completely moved over to CRDs now. I just want to run through the local example and make sure everything works and then we can talk about next steps.

@carsonoid carsonoid changed the title from "[WIP] Add Kubernetes topo implementation" to "Add Kubernetes topo implementation" on Feb 29, 2020
@carsonoid
Contributor Author

The full test is done! I've been able to run through the entire local example using the k8s api as a topology store and all of the examples worked!

@deepthi
Member

deepthi commented Mar 11, 2020

@carsonoid can you update the copyright on all the files to 2020, and also fix the DCO (--signoff)? That will kick the build so that we can get test status (it doesn't seem to have run properly the last time).

@carsonoid carsonoid force-pushed the kubernetestopo branch 2 times, most recently from 86c592e to 7b272ea, on March 11, 2020 at 15:54
@carsonoid
Contributor Author

@deepthi updated as requested. It looks like some of the tests are currently failing due to an unrelated maven repo timeout though. I'll retry them in a bit.

@carsonoid carsonoid force-pushed the kubernetestopo branch 6 times, most recently from 3385ff2 to 29a0fe1, on March 11, 2020 at 22:36
Member

@deepthi deepthi left a comment


Very nice work.
I understand that we need to run some kind of k8s server to act as the topo, and k3s is the most lightweight choice, though it is limited to Linux.
It's great that you have included all the topo tests that are available for the other flavors.
My comments are mostly nits and/or naming except for the incorrect copyright notices.

Review comments (all resolved) on: go/cmd/topo2topo/plugin_kubernetestopo.go, go/vt/topo/kubernetestopo/VitessTopoNodes-crd.yaml, go/vt/topo/kubernetestopo/config.go, go/vt/topo/kubernetestopo/directory.go, vendor/vendor.json
@carsonoid carsonoid force-pushed the kubernetestopo branch 3 times, most recently from 4cb91aa to eec1c82, on March 13, 2020 at 22:18
@carsonoid
Contributor Author

@deepthi thanks for all the nits :) I have updated nearly all of them as requested. I have also explicitly set up the tests to report as skipped when running on a non-Linux machine.
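
For reference, the usual Go pattern for that kind of platform skip is just a GOOS check at the top of the test; a generic sketch (not the PR's exact test code):

```go
// Generic sketch: skip on non-Linux platforms, since the k3s server used by
// these topo tests only runs on Linux.
package kubernetestopo

import (
	"runtime"
	"testing"
)

func TestKubernetesTopo(t *testing.T) {
	if runtime.GOOS != "linux" {
		t.Skipf("k3s-backed topo tests require Linux, running on %s", runtime.GOOS)
	}
	// ... start k3s and run the shared topo test suite here ...
}
```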

Member

@deepthi deepthi left a comment


Looks good with the changes (except for one copyright notice).
I can approve once tests are all passing.

Review comment (resolved): go/vt/vtctl/plugin_kubernetestopo.go
Signed-off-by: Carson Anderson <ca@carsonoid.net>
Signed-off-by: Carson Anderson <ca@carsonoid.net>
@carsonoid
Contributor Author

@deepthi updated as requested and the check page shows all green.

@morgo
Contributor

morgo commented Mar 16, 2020

Note to all: there was one check showing as yellow, the "local example on Ubuntu" job. This is because that check is marked as required and was renamed in this PR to "local example using etcd on ubuntu".

I have temporarily disabled the required check. We can re-enable it a couple of days after this PR merges, to give in-flight PRs a chance to merge without having to rebase on master (and pick up the rename).

Member

@derekperkins derekperkins left a comment


LGTM. I'm excited to have this merged!

@derekperkins
Member

@PaulFurtado Any last feedback before we merge this?

@derekperkins derekperkins merged commit f97eef3 into vitessio:master Mar 18, 2020
ajm188 added a commit to tinyspeck/vitess that referenced this pull request Jun 22, 2020
This reverts commit f97eef3, reversing changes made to 9e74b07.

Signed-off-by: Andrew Mason <amason@slack-corp.com>