Add PersistVolumne Support #1861

rjtsdl · 2018-01-17T00:31:44Z

Successor of #1695

Copying the plan:

- add PVC for etcd pod (Add PersistVolumne Support #1861)
- handle node restart case
  Behavior: first, the pods are evicted. Later, when the nodes come back, the Unknown pod(s) disappear from the pod list.
- handle node partition case
- handle node eviction case
  Behavior: the pod(s) scheduled on the evicted node become Unknown after a certain period. Then replacement pod(s) are scheduled on the healthy nodes to reach the desired replica count. The Unknown pod(s) do not disappear, though.

etcd-bot · 2018-01-17T00:32:01Z

Can one of the admins verify this patch?

xiang90 · 2018-01-18T18:55:14Z

/cc @msohn @georgekuruvillak

Can you also help with this PR?

As a side note, I noticed the SAP fork of etcd operator. etcd operator can already work with Swift through its S3-compatible API, and now we want to get PV support in.

We want to put some love into the project to move everyone back to upstream.

hongchaodeng · 2018-01-18T18:56:32Z

@etcd-bot ok to test

rjtsdl · 2018-01-18T18:58:52Z

@hongchaodeng, can you explain a bit what the node partition case is?
I have tested the other cases and documented the behavior in the description. Check whether it makes sense.

brendandburns · 2018-01-18T19:12:19Z

LGTM.

xiang90 · 2018-01-18T19:30:37Z

@rjtsdl

I have not really looked into the code, but I want to mention that if PV is enabled, etcd operator should:

- always use write-once (ReadWriteOnce) mode to avoid two pods using the same etcd metadata. This can happen if k8s thinks a node is dead (when it is not) and creates a pod on some other node.
- disable disaster detection. Even if only 1 out of 3 pods is left (or even 0/3 pods), the data is on the PV. We can just wait for the other two members to come back with their data.
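A minimal sketch of the second point, assuming the disaster check fires when fewer than a quorum of members are running. `shouldRunDisasterRecovery` is a hypothetical helper, not an etcd-operator function:

```go
package main

import "fmt"

// shouldRunDisasterRecovery is a hypothetical helper, not part of etcd-operator.
// It assumes disaster detection triggers when fewer than a quorum of members
// are running.
func shouldRunDisasterRecovery(pvEnabled bool, runningMembers, size int) bool {
	if pvEnabled {
		// Member data lives on the PVs, so wait for the pods to come back
		// instead of treating the cluster as lost.
		return false
	}
	quorum := size/2 + 1
	return runningMembers < quorum
}

func main() {
	fmt.Println(shouldRunDisasterRecovery(false, 1, 3)) // emptyDir, quorum lost
	fmt.Println(shouldRunDisasterRecovery(true, 0, 3))  // PV-backed, all pods gone
}
```

With emptyDir, losing quorum means the data is gone; with PV it is only unavailable, which is why the check is skipped entirely in that mode.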

hongchaodeng · 2018-01-18T19:46:46Z

pkg/apis/etcd/v1beta2/cluster.go

+
+	// By default, kubernetes will mount a service account token into the etcd pods.
+	// AutomountServiceAccountToken indicates whether pods running with the service account should have an API token automatically mounted.
+	AutomountServiceAccountToken *bool `json:"automountServiceAccountToken,omitempty"`


Can you remove the changes related to AutomountServiceAccountToken?
This PR should be about PersistentVolumeClaimSpec.

hongchaodeng · 2018-01-18T19:47:22Z

pkg/controller/restore-operator/sync.go

@@ -219,6 +219,7 @@ func (r *Restore) createSeedMember(ec *api.EtcdCluster, svcAddr, clusterName str
 	}
 	ms := etcdutil.NewMemberSet(m)
 	backupURL := backupapi.BackupURLForRestore("http", svcAddr, clusterName)
+


remove blank line

hongchaodeng · 2018-01-18T19:48:57Z

pkg/util/k8sutil/k8sutil.go

@@ -264,10 +295,12 @@ func newEtcdPod(m *etcdutil.Member, initialCluster []string, clusterName, state,
 		livenessProbe,
 		readinessProbe)

-	volumes := []v1.Volume{
-		{Name: "etcd-data", VolumeSource: v1.VolumeSource{EmptyDir: &v1.EmptyDirVolumeSource{}}},
+	if cs.Pod != nil {


These changes aren't needed?

It is taken care of by the function AddEtcdVolumeToPod in the same file.

Sorry, I don't understand your comment.
Why add this change here?

Here it is:

func AddEtcdVolumeToPod(pod *v1.Pod, pvc *v1.PersistentVolumeClaim) {
	vol := v1.Volume{Name: etcdVolumeName}
	if pvc != nil {
		vol.VolumeSource = v1.VolumeSource{
			PersistentVolumeClaim: &v1.PersistentVolumeClaimVolumeSource{ClaimName: pvc.Name},
		}
	} else {
		vol.VolumeSource = v1.VolumeSource{EmptyDir: &v1.EmptyDirVolumeSource{}}
	}
	pod.Spec.Volumes = append(pod.Spec.Volumes, vol)
}

If it is not PV, it will just append the EmptyDir.

I still don't understand it.
Why is this related to PV or emptyDir? This is what I saw on this PR on github:

if cs.Pod != nil {
	container = containerWithRequirements(container, cs.Pod.Resources)
}

It only deals with resources.

Ah, I thought you were asking about lines 267 & 268.

I don't think this line has anything to do with this PR's scope. It is there mostly because of my cherry-pick from the old PR.

I do think it is a nice thing to have, but it doesn't belong in this PR's scope. I can remove it, though.

Please remove this.
It's duplicate. Resource requirements/pod policy are applied outside this func:

etcd-operator/pkg/util/k8sutil/k8sutil.go

Line 338 in 7880638

applyPodPolicy(clusterName, pod, cs.Pod)

xiang90 · 2018-01-18T19:59:25Z

@rjtsdl

It seems that this PR only adds PV as non-stable storage (same as emptyDir) for the etcd pod, rather than using PV as stable storage?

If we consider PV as stable storage, we should not add/remove members if the size field has not changed. If one pod dies, we should create another pod with the same FQDN + PV; no member changes need to be involved. When the size changes, we need to add/remove members as before.
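To make the contrast concrete, here is a hedged sketch of the reconcile decision in both modes. `Plan` and `plan` are hypothetical names for illustration, not etcd-operator types, and the sketch computes the whole change at once rather than one member at a time:

```go
package main

import "fmt"

// Plan is a hypothetical reconcile result, not an etcd-operator type.
type Plan struct {
	RecreatePods  []string // stable PV: recreate the pod with the same name/FQDN and PVC
	RemoveMembers []string // ephemeral storage: dead members are removed and replaced
	AddMembers    int      // members to add to reach spec.size
	ShrinkBy      int      // members to remove because spec.size decreased
}

// plan decides what to do given the current membership, which members have a
// running pod, and the desired size. pvStable selects PV-as-stable-storage.
func plan(members []string, running map[string]bool, size int, pvStable bool) Plan {
	var p Plan
	var dead []string
	for _, m := range members {
		if !running[m] {
			dead = append(dead, m)
		}
	}
	current := len(members)
	if pvStable {
		// Same FQDN + same PV: the member keeps its identity and data,
		// so no membership change is needed.
		p.RecreatePods = dead
	} else {
		// emptyDir-like storage: the dead member's data is gone, so the
		// member is removed and a new one added (today's behavior).
		p.RemoveMembers = dead
		current -= len(dead)
	}
	// Only the size field (or replaced members) drives add/remove.
	if diff := size - current; diff > 0 {
		p.AddMembers = diff
	} else if diff < 0 {
		p.ShrinkBy = -diff
	}
	return p
}

func main() {
	members := []string{"etcd-0000", "etcd-0001", "etcd-0002"}
	running := map[string]bool{"etcd-0000": true, "etcd-0001": true}
	fmt.Printf("stable:    %+v\n", plan(members, running, 3, true))
	fmt.Printf("ephemeral: %+v\n", plan(members, running, 3, false))
}
```

In stable mode a dead pod only yields a pod recreation; membership changes come solely from the size field.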

rjtsdl · 2018-01-18T20:01:48Z

@xiang90 you are right. It is treated as non-stable storage. We should make it as a stable storage.

xiang90 · 2018-01-18T20:03:53Z

@rjtsdl

I am fine marking this as alpha, which uses PV as temp storage. Then we can iterate on it.

hongchaodeng · 2018-01-18T20:07:31Z

@rjtsdl
Let me clarify the four points in the plan:

1. Add PVC for the etcd pod. This is basically what this PR achieves.
2. When a node restart happens, the current etcd pods fail and won't come up anymore. This PR DOES NOT handle it. Besides changing the restart policy to "Always", the following cases need to be addressed:
   - 2.1. The kubelet can't report to the master, and the node becomes "unready". etcd-operator would need to tolerate this case for some period of time. If the node restarts, the etcd pods restart, and everything returns to a healthy state, then etcd-operator doesn't need to do anything. Otherwise, etcd-operator needs to treat the node, and thus the etcd pods on it, as unhealthy and heal them.
   - 2.2. The previous case assumes the etcd pods still exist. If not (e.g., they were evicted), we handle it like case 4.
3. If a node partition happens, we could encounter both case 2.1, where the node becomes "unready", and case 2.2, where the etcd pods get deleted. The same solution applies, but we need to think through and test this case.
4. Node eviction happens, and the etcd pods get deleted. Let's assume the PVs are still there. In this case, etcd-operator should know the current membership, reschedule the "failed" members onto healthy nodes, and mount their corresponding PVs back. Everything should then be fine because network and storage are still the same.

hongchaodeng · 2018-01-18T20:08:47Z

Gopkg.lock

@@ -328,6 +328,6 @@
 [solve-meta]
  analyzer-name = "dep"
  analyzer-version = 1
-  inputs-digest = "ff4a4b53ef454edf0ae2ec3bc497baf74d561af328faa123eb472d87954a906c"
+  inputs-digest = "b19769641bf117041022e3de0c490313047f3b7aaef54e00a7354573fff9b738"


No need to change this?

It was updated automatically after I ran update_vendor.sh. Do I need to revert it?

xiang90 · 2018-01-18T20:11:06Z

> When node restart happens, current etcd pods would fail and won't come up anymore.

It is unfair to say this PR does not handle it. This PR adds PV as non-stable storage.

If the node goes down, the operator will still remove the member and add a new one, as in the emptyDir case, no?

hongchaodeng · 2018-01-18T20:25:35Z

I'm fine to use it to replace ephemeral storage initially.

xiang90 · 2018-01-18T20:27:47Z

> I'm fine to use it to replace ephemeral storage initially.

Just to be clear: not to replace the current ephemeral storage, but as another ephemeral storage option besides the current emptyDir, right? Eventually we should make PV a stable storage option.

hongchaodeng · 2018-01-18T20:29:00Z

@xiang90
Yes. That's the plan in #1323 (comment) which is explained above. Those items are expected to be solved one by one.

rjtsdl · 2018-01-18T20:36:29Z

That would be awesome. I will address comments, then we can merge it as initial step.

This patch implements part 1 of the etcd data on persistent volumes design. When the pod pvsource is defined in the spec, it creates a PVC for every etcd member and uses it as the volume for etcd data. PVCs without a member are removed during reconciliation.
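A minimal sketch of the PVC bookkeeping described above, using plain strings instead of Kubernetes objects. `reconcilePVCs` is hypothetical, and `pvcNameFromMember` assumes the PVC is simply named after the member; the PR's actual PVCNameFromMember may use a different scheme:

```go
package main

import (
	"fmt"
	"sort"
)

// pvcNameFromMember is a hypothetical naming scheme for this sketch: it
// assumes the PVC reuses the member name.
func pvcNameFromMember(member string) string {
	return member
}

// reconcilePVCs returns the PVCs to create (members without one) and the PVCs
// to delete (PVCs without a member). Results are sorted for stable output.
func reconcilePVCs(members, pvcs []string) (toCreate, toDelete []string) {
	existing := make(map[string]bool, len(pvcs))
	for _, p := range pvcs {
		existing[p] = true
	}
	wanted := make(map[string]bool, len(members))
	for _, m := range members {
		name := pvcNameFromMember(m)
		wanted[name] = true
		if !existing[name] {
			toCreate = append(toCreate, name)
		}
	}
	for _, p := range pvcs {
		if !wanted[p] {
			toDelete = append(toDelete, p)
		}
	}
	sort.Strings(toCreate)
	sort.Strings(toDelete)
	return toCreate, toDelete
}

func main() {
	members := []string{"etcd-0000", "etcd-0001"}
	pvcs := []string{"etcd-0000", "etcd-0002"}
	toCreate, toDelete := reconcilePVCs(members, pvcs)
	fmt.Println("create:", toCreate, "delete:", toDelete)
}
```

Deriving the PVC name deterministically from the member name is what lets the reconcile loop match the two sets without extra bookkeeping.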

xiang90 · 2018-01-19T02:11:26Z

pkg/cluster/cluster.go

@@ -375,9 +375,26 @@ func (c *Cluster) setupServices() error {
 	return k8sutil.CreatePeerService(c.config.KubeCli, c.cluster.Name, c.cluster.Namespace, c.cluster.AsOwner())
 }

+func (c *Cluster) IsPodPVEnabled() bool {


please add doc string on public method.

xiang90 · 2018-01-19T02:11:37Z

pkg/util/k8sutil/k8sutil.go

@@ -78,6 +78,10 @@ func GetPodNames(pods []*v1.Pod) []string {
 	return res
 }

+func PVCNameFromMemberName(memberName string) string {


please add doc string on public method.

xiang90 · 2018-01-19T02:11:43Z

pkg/util/k8sutil/k8sutil.go

@@ -209,6 +213,18 @@ func newEtcdServiceManifest(svcName, clusterName, clusterIP string, ports []v1.S
 	return svc
 }

+func AddEtcdVolumeToPod(pod *v1.Pod, pvc *v1.PersistentVolumeClaim) {


please add doc string on public method.

xiang90 · 2018-01-19T02:11:49Z

pkg/util/k8sutil/k8sutil.go

@@ -231,6 +249,19 @@ func NewSeedMemberPod(clusterName string, ms etcdutil.MemberSet, m *etcdutil.Mem
 	return pod
 }

+func NewEtcdPodPVC(m *etcdutil.Member, pvcSpec v1.PersistentVolumeClaimSpec, clusterName, namespace string, owner metav1.OwnerReference) *v1.PersistentVolumeClaim {


please add doc string on public method.

xiang90 · 2018-01-19T02:12:13Z

pkg/apis/etcd/v1beta2/cluster.go

@@ -138,6 +138,9 @@ type PodPolicy struct {
 	// bootstrap the cluster (for example `--initial-cluster` flag).
 	// This field cannot be updated.
 	EtcdEnv []v1.EnvVar `json:"etcdEnv,omitempty"`
+
+	// PersistentVolumeClaimSpec ...


complete the comment?

xiang90 · 2018-01-19T23:47:10Z

lgtm defer to @hongchaodeng

xiang90 · 2018-01-19T23:47:42Z

pkg/apis/etcd/v1beta2/cluster.go

+
+	// PersistentVolumeClaimSpec is the spec to describe PVC for the etcd container
+	// This field is optional. If no PVC spec, etcd container will use emptyDir as volume
+	PersistentVolumeClaimSpec *v1.PersistentVolumeClaimSpec `json:"persistentVolumeClaimSpec,omitempty"`


Please mention that this field is alpha and does not yet use PV as stable storage, as it eventually should.

hongchaodeng · 2018-01-19T23:50:11Z

pkg/util/k8sutil/k8sutil.go

@@ -78,6 +78,11 @@ func GetPodNames(pods []*v1.Pod) []string {
 	return res
 }

+// PVCNameFromMemberName the way we get PVC name from the member name
+func PVCNameFromMemberName(memberName string) string {


PVCNameFromMemberName -> PVCNameFromMember

hongchaodeng · 2018-01-19T23:53:20Z

pkg/util/k8sutil/k8sutil.go

@@ -264,10 +295,12 @@ func newEtcdPod(m *etcdutil.Member, initialCluster []string, clusterName, state,
 		livenessProbe,
 		readinessProbe)

-	volumes := []v1.Volume{
-		{Name: "etcd-data", VolumeSource: v1.VolumeSource{EmptyDir: &v1.EmptyDirVolumeSource{}}},
+	if cs.Pod != nil {


I still don't understand it.
Why is this related to PV or emptyDir? This is what I saw on this PR on github:

if cs.Pod != nil {
	container = containerWithRequirements(container, cs.Pod.Resources)
}

It only deals with resources.

hongchaodeng · 2018-01-19T23:54:24Z

pkg/util/k8sutil/self_hosted.go

-			DNSPolicy:                    v1.DNSClusterFirstWithHostNet,
-			Hostname:                     m.Name,
-			Subdomain:                    clusterName,
-			AutomountServiceAccountToken: func(b bool) *bool { return &b }(false),


revert this. It is not related to this PR.

I need to delete AutomountServiceAccountToken. That is basically the whole change here.

> I need to delete AutomountServiceAccountToken

What's the reason for it?

Because we want to

> Can you remove the changes related to AutomountServiceAccountToken?
> This PR should be about PersistentVolumeClaimSpec.

The comment from your previous review.

Can you explain why AutomountServiceAccountToken is related to PV?

I just brought it back.

I deleted it while I was trying to remove changes to AutomountServiceAccountToken. Apparently, it shouldn't be removed here.

hongchaodeng

lgtm after nit

hongchaodeng · 2018-01-20T03:55:20Z

pkg/util/k8sutil/k8sutil.go

@@ -264,10 +295,12 @@ func newEtcdPod(m *etcdutil.Member, initialCluster []string, clusterName, state,
 		livenessProbe,
 		readinessProbe)

-	volumes := []v1.Volume{
-		{Name: "etcd-data", VolumeSource: v1.VolumeSource{EmptyDir: &v1.EmptyDirVolumeSource{}}},
+	if cs.Pod != nil {


Please remove this.
It's duplicate. Resource requirements/pod policy are applied outside this func:

etcd-operator/pkg/util/k8sutil/k8sutil.go

Line 338 in 7880638

applyPodPolicy(clusterName, pod, cs.Pod)

rjtsdl · 2018-01-20T19:09:33Z

@hongchaodeng thx for explaining. Removed the line.

hongchaodeng · 2018-01-20T21:05:27Z

Thanks everyone!
Especially @sgotti for initiating this and @rjtsdl for finishing it!

rjtsdl changed the title ~~[WIP] add PersistVolumne Support~~ Add PersistVolumne Support Jan 18, 2018

This was referenced Jan 18, 2018

*: PersistentVolume for etcd data #1695

Closed

*: implement PersistentVolume for etcd data design part 1 #1434

Closed

hongchaodeng mentioned this pull request Jan 18, 2018

Persistent/Durable etcd cluster #1323

Open

hongchaodeng reviewed Jan 18, 2018

sgotti and others added 5 commits January 18, 2018 13:45

*: PVC for etcd datadir

9f39fa1

update generated and dependency

3e42083

remove unnecessary changes

1b8aef4

revert Gopkg.lock digest change

2f8dc0d

rjtsdl force-pushed the jiren-pvsupport branch from f2852d0 to 2f8dc0d January 19, 2018 00:43

xiang90 reviewed Jan 19, 2018

add comment docs and make funcs private if possible

296b27c

rjtsdl force-pushed the jiren-pvsupport branch from 34c3ac4 to 296b27c January 19, 2018 18:40

xiang90 reviewed Jan 19, 2018

hongchaodeng reviewed Jan 19, 2018

Jingtao Ren added 2 commits January 19, 2018 16:06

address comments

a5a0cb2

bring AutomountServiceAccountToken back in self_hosted.go

458bc6a

rjtsdl force-pushed the jiren-pvsupport branch from 7801efc to 458bc6a January 20, 2018 00:53

hongchaodeng reviewed Jan 20, 2018

remove adding resources to container, it taken care by applyPodPolicy

910c502

hongchaodeng merged commit ffbd359 into coreos:master Jan 20, 2018

shrutir25 mentioned this pull request Feb 7, 2018

Backport PVC support to v0.5.2 Azure/etcd-operator#17

Merged
