Clarify status of bootkube experimental self-hosted etcd #738
This is not true. The major problem is not self-hosted etcd itself, but the etcd HA setup for Kubernetes in general. It is challenging for self-hosted etcd because users would expect etcd and the API server to recover from failures automatically. Right now, even in non-self-hosted mode, that recovery requires manually restarting the API server or etcd. The failing test simply shows that the API server fails to reconnect to etcd after failure injection. We are working with Kubernetes upstream to get the etcd HA setup stable so that we can continue the work. Simply put, it is still progressing.
I agree with the rest of the suggestions.
I've tried to draw a distinction between general self-hosted etcd work (which is progressing towards stability) and bootkube's self-hosted etcd feature. The latter is the harder challenge of running self-hosted etcd on Kubernetes while that same etcd also serves as the cluster's own etcd, and keeping the two interacting well. That has complexities related to Kubernetes itself; I'm not blaming etcd. That's what's being discussed here, and I think we're all on the same page. Until now we've used the term "self-hosted etcd" for both. I didn't mean to imply your operator work was in question.
What I am saying is that the work being done on Kubernetes helps both self-hosted and non-self-hosted setups. That work is currently blocking progress on self-hosted more, for the reason I just gave. Once things get unblocked, most of the issues we see here (in bootkube tests) will resolve themselves and require almost no action. But yes, we are not actively changing anything in bootkube for the self-hosted etcd case.
Any interest in looking into the flakes we're seeing on the self-hosted etcd setup?
@xiang90 thanks for the background info, much appreciated. Do you have a link to the upstream discussion?
@dghubble @xiang90 What are some of the edge cases that prevent the etcd-operator from being used to manage the cluster etcd? Does this mean the operator is not recommended for cluster etcd, or that you're cautious about its effect on SLAs, etc.? If it's a lack of HA functionality, my preference would be to work on that upstream rather than pivoting towards other form factors. I think there's a lot of value in using the operator (declarative upgrades, rollbacks, backups, etc.).
We recently disabled the self-hosted etcd tests, but opened an issue tracking that we should consider re-enabling them in the future when we can spend more time on the feature: #748 |
I just experienced something weird with our k8s cluster yesterday (bootkube v0.7.0) and self-hosted etcd. The After adding But this does not inspire much confidence in self-hosted etcd. :/
I've dropped the self-hosted etcd option from projects matchbox#655 and typhoon#13 if that clarifies things for anyone. Updated the issue to reflect that self-hosted etcd is no longer tested as well. |
Self-hosted etcd has been removed from bootkube for the foreseeable future #828 |
Naïve question: why was this removed? I've been asked to build a similar setup, so I have to wonder: what's the case for on-host etcd rather than self-contained etcd?
@carlos-licea this entire thread is dedicated to your question.
@redbaron not trying to be dismissive, but this thread is a bit vague, and I can't use it as-is to argue that we should use on-host etcd, other than to say "weird things might happen".
I'd like to track the status of --experimental-self-hosted-etcd to address some recurring questions. bootkube self-hosted etcd (that is, self-hosting the cluster's own etcd on the cluster itself) is a challenging balancing act to get right. It hasn't been done before, and keeping self-hosted etcd working involves complex edge cases in how the pieces interact.
A while back, bootkube's experimental self-hosted etcd efforts were paused, while the etcd team's general etcd-operator work continues (i.e. self-hosting etcd, but not as the cluster's own etcd). The bootkube --experimental-self-hosted-etcd feature is not currently progressing toward stability or production readiness, but neither is it disappearing (don't panic). Whether efforts will resume in the future as etcd-operator and Kubernetes mature is unknown at this time.
For now, nothing changes:
bootkube will still run self-hosted etcd e2e tests and try to keep them working, on a best-effort basis.