selfHosted: fix backup unable to talk to etcd pods #1108
Conversation
Can we add a test for self hosted etcd + backup?
Right. I am going to add one if the solution is right.
@rimusz it would be great if you could give this patch a try.
I can try it.
@hongchaodeng the fix is working
ok, it only works on a Vagrant cluster. Tested again on a StackPointCloud cluster, still the same.
Could it be related to the operator update from 0.2.4 -> 0.2.6 -> 0.3.1 (with the fix)?
This potential fix as shown here isn't backwards compatible.
Change the method to be upgrade compatible.
@rimusz
sure, will give it a go tomorrow morning
@xiang90
test/e2e/self_hosted_test.go
Outdated
```go
t.Run("create self hosted cluster from scratch", testCreateSelfHostedCluster)
t.Run("migrate boot member to self hosted cluster", testCreateSelfHostedClusterWithBootMember)
t.Run("backup for self hosted cluster", func(t *testing.T) {
```
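The structure under discussion can be sketched as follows (a simplified stand-in, not the actual test file; `runSelfHostedTests` and the subtest callback are illustrative): one outer function groups the self hosted subtests so a single expensive cleanup runs once, after all of them.

```go
package main

import "fmt"

// runSelfHostedTests groups the self hosted subtests so one deferred
// cleanup runs after all of them. In the real suite that cleanup
// schedules pods on each node, runs commands, and waits for them to
// complete, which is why it is not duplicated per subtest.
func runSelfHostedTests(run func(name string, f func())) {
	defer fmt.Println("cleanup: remove self-hosted etcd artifacts")

	run("create self hosted cluster from scratch", func() {})
	run("migrate boot member to self hosted cluster", func() {})
	run("backup for self hosted cluster", func() {})
}

func main() {
	// In the real code `run` is t.Run; here a plain callback keeps the
	// sketch self-contained.
	runSelfHostedTests(func(name string, f func()) {
		fmt.Println("RUN", name)
		f()
	})
}
```

Splitting "backup for self hosted cluster" into a top-level test would mean either duplicating that cleanup or extracting it, which is the cost weighed in the thread above.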
make this a separate test?
We need to clean up after all self hosted test
any reason we cannot separate it out now? i do not want to make it worse.
We have a cleanup function at the end of the outer layer test.
just duplicate it for now.
The cleanup function is not simple.
It sets up pods on each node, runs commands, and waits for them to complete. It's not worth doing that again just for code structure.
ok, tested it with different backup/restore scenarios for an in-cluster etcd cluster; everything works just fine. Also noticed some logs when I update etcd-operator:
last remaining etcd pod logs:
@rimusz For bootkube, if you deleted or killed a majority of nodes (or etcd pods), you have to follow bootkube's instructions to recover a downed cluster: https://github.com/kubernetes-incubator/bootkube/#recover-a-downed-cluster . If you are interested, please ask in the bootkube repo.
@hongchaodeng I did not kill nodes, I killed etcd pods for testing.
Yeah. We usually refer to nodes because we spread etcd pods across the nodes.
lgtm
cool, let's get it merged then :)
Currently the backup sidecar is not able to talk to etcd pods because it uses the FQDN, which isn't supported in self hosted clusters yet. For the self-hosted case, we are going to use the PodIP instead.
fix #1107
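The FQDN-vs-PodIP choice the description refers to can be illustrated with a small sketch. The type and function names below (`member`, `memberClientURL`) and the FQDN shape are illustrative assumptions, not the actual etcd-operator API; the real operator derives the FQDN from the cluster's headless service.

```go
package main

import (
	"fmt"
	"net"
)

// member is a minimal stand-in for the operator's member metadata.
type member struct {
	Name       string
	Namespace  string
	PodIP      string
	SelfHosted bool
}

// memberClientURL builds the URL the backup sidecar would dial. For a
// self-hosted cluster the pod IP is used, because the per-member FQDN
// is not resolvable there yet; other clusters keep the FQDN.
func memberClientURL(m member) string {
	// Hypothetical FQDN shape for illustration only.
	host := m.Name + "." + m.Namespace + ".svc.cluster.local"
	if m.SelfHosted {
		host = m.PodIP
	}
	return "http://" + net.JoinHostPort(host, "2379")
}

func main() {
	m := member{Name: "etcd-0000", Namespace: "kube-system", PodIP: "10.2.3.4", SelfHosted: true}
	fmt.Println(memberClientURL(m)) // http://10.2.3.4:2379
}
```

With `SelfHosted: false` the same member would yield the DNS-based URL, which is why the patch only changes behavior for self hosted clusters.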