This repository has been archived by the owner on Mar 28, 2020. It is now read-only.

How can I specify volumes spec? #957

Closed
junghoahnsc opened this issue Apr 11, 2017 · 25 comments

Comments

@junghoahnsc

Hello,

I'm trying to deploy an etcd cluster in GKE. I created a node pool with SSD, and I think
I need to specify a volumes spec to use it, like:

volumes:
  - name: etcd-data
    hostPath:
      path: /mnt/disks/ssd0

But I couldn't find any way to specify this in the examples. How can I do that?

Thanks,

@hongchaodeng
Member

Duplicate of #873.

We can provide such options in PodPolicy.

@hongchaodeng
Member

hongchaodeng commented Apr 12, 2017

One problem with enabling hostPath is that it could reuse data that already exists.
For example, if you previously ran etcd-0 on node-0 using hostPath, the pod was then deleted, and a new etcd-0 is later run on node-0, the previous etcd-0's data is still there.

@junghoahnsc
Author

I see. Is there any workaround for now?

@hongchaodeng
Member

hongchaodeng commented Apr 12, 2017

There isn't any workaround at the moment.

But in the future, I envision the solution would be: like a human operator, the etcd operator will mount the data dir on a hostPath so it survives restarts, and clean up/prepare the data dir before starting the etcd process, or clean it up after stopping the etcd process.
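
To sketch the "clean up before starting" part as a plain pod spec with an init container (the hostPath path, image tags, and mount paths below are only example values for illustration, not what the operator does today):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: etcd-0
spec:
  volumes:
    - name: etcd-data
      hostPath:
        path: /mnt/disks/ssd0            # example local-SSD mount on the node
  initContainers:
    # wipe whatever a previous incarnation of this member left behind
    - name: clean-datadir
      image: busybox
      command: ["sh", "-c", "rm -rf /var/etcd/data && mkdir -p /var/etcd/data"]
      volumeMounts:
        - name: etcd-data
          mountPath: /var/etcd
  containers:
    - name: etcd
      image: quay.io/coreos/etcd:v3.1.8  # example etcd image
      command: ["/usr/local/bin/etcd", "--data-dir=/var/etcd/data"]
      volumeMounts:
        - name: etcd-data
          mountPath: /var/etcd
```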

@junghoahnsc
Author

@hongchaodeng That would be great. Thanks!

For now, I'm trying to hack PodPolicy for this, but I failed to build it with glide: `Failed to install: hg is not installed`. Would it be better to open a new issue for that?

@hongchaodeng
Member

That's not an etcd operator issue; it's a glide issue.

@hongchaodeng
Member

A long term solution for this would be to use persistent local storage: kubernetes/community#306

@xiang90
Collaborator

xiang90 commented Apr 13, 2017

hostPath can be a quick hack. However, you have to pre-configure the hostPath as far as I can tell (set permissions, for example).

We can use emptyDir now, since we never restart a failed etcd server for non-self-hosted etcd anyway. In theory, tmpfs should work just fine...

I do not really know how to work around this unless kubernetes/community#306 lands. We also want to be able to specify stable storage on a local node, so that we can try to restart a failed pod on the same node to save replication cost.
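
For reference, an emptyDir backed by tmpfs is just the standard volume type with `medium: Memory`; a minimal sketch (pod name, image, and paths are example values):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: etcd-0
spec:
  containers:
    - name: etcd
      image: quay.io/coreos/etcd:v3.1.8  # example etcd image
      command: ["/usr/local/bin/etcd", "--data-dir=/var/etcd/data"]
      volumeMounts:
        - name: etcd-data
          mountPath: /var/etcd/data
  volumes:
    - name: etcd-data
      emptyDir:
        medium: Memory    # tmpfs: data lives in RAM and disappears with the pod
```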

@xiang90
Collaborator

xiang90 commented Apr 13, 2017

/cc @junghoahnsc

@xiang90
Collaborator

xiang90 commented Apr 13, 2017

hg is not installed.

You need to install Mercurial (hg), which is a version control tool similar to git; some of the dependencies are hg repositories. That is not really an etcd operator issue, so please do not create an issue for it.

@GrapeBaBa

GrapeBaBa commented Apr 13, 2017

Forgive my naive question, but why can't the previous data be used? Wouldn't it catch up with the other peers after restarting? And what is the problem with restarting? I feel it would need less effort for replication.

@junghoahnsc
Author

I could run a cluster with a hostPath hack for now, but I hope k8s supports this soon :)

@hongchaodeng
Member

Forgive my naive question, but why can't the previous data be used? Wouldn't it catch up with the other peers after restarting? And what is the problem with restarting?

@GrapeBaBa Yes and no.

Let's say I have etcd-0, etcd-1, etcd-2, and etcd-0 went down; then restarting etcd-0 and reusing the previous data should be fine. However, this makes assumptions about specific nodes and specific failure scenarios. First, how do we know the node for etcd-0 is still there? Second, how do we know whether etcd-0 was removed? Third, how do we know the data was not created by another member like etcd-3? And so on.

We want generic abstractions over hostnames and storage. hostPath is just a hack to use a specific mounted volume for now; it's not a good abstraction for cluster-level storage management. This is a feature where we should work with k8s upstream to get better volume support for local disk (e.g. SSD) mount partitions.

@hongchaodeng
Member

@junghoahnsc
Please help by providing your feedback on kubernetes/community#306.
Thanks!

@GrapeBaBa

@hongchaodeng Besides hostPath, why does the operator use the never-restart strategy when using emptyDir?

@xiang90
Collaborator

xiang90 commented Apr 13, 2017

@GrapeBaBa

When an etcd member crashes and becomes unresponsive, we simply delete the pod instead of relying on it restarting itself on the same node.

This is because of the limited restart policy k8s provides. There is no way to tell a pod that it should not restart on failure type X but should restart on failure type Y; you have to restart in all cases or in none. In quite a few cases, such as disk corruption or a raft panic, the member should not restart or it will go into a restart loop, and there is simply no way to specify that.

Never restarting is just the current compromise, which happens to let you use emptyDir with tmpfs to get maximum throughput... (/cc @junghoahnsc).

We might revisit this if the kubelet provides a better restart policy (or we do better liveness checking). Or once local PV lands, never restarting a pod won't be a problem at all, since if we recreate the pod on the same node at a higher priority, the data stickiness is still there anyway.
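
To be concrete, the only knob at the pod level is `spec.restartPolicy`, and it is all-or-nothing (Always, OnFailure, or Never), which is what forces the compromise above; a minimal sketch (pod name and image are example values):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: etcd-0
spec:
  restartPolicy: Never    # kubelet never restarts the container; the operator
                          # deletes the pod and creates a replacement member instead
  containers:
    - name: etcd
      image: quay.io/coreos/etcd:v3.1.8  # example etcd image
```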

@GrapeBaBa

GrapeBaBa commented Apr 14, 2017

@xiang90 @hongchaodeng Thanks.

@junghoahnsc
Author

My naive hack using hostPath doesn't seem to work when I try to upgrade.
I tried to `rm -rf <etcdVolumeMountDir>` before starting etcd for cleanup.
But somehow, when I upgraded via curl, all etcd pods died.

@xiang90
Collaborator

xiang90 commented Apr 23, 2017

@junghoahnsc If you can afford the memory usage, probably a better approach for you right now is to use an in-memory emptyDir to maximize performance.

@junghoahnsc
Author

@xiang90 thanks for the suggestion, but I don't think we can provide enough memory to hold all the data.
By changing my hack a bit more (creating a unique directory per pod and deleting all the others), I managed to get it working with upgrade and backup/restore.
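
In case it is useful to anyone, a rough sketch of how such a per-pod directory could be wired up with the downward API (paths, names, and image tags are only example values, and this is just an illustration of the idea, not my exact setup or operator behavior):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: etcd-0
spec:
  volumes:
    - name: etcd-data
      hostPath:
        path: /mnt/disks/ssd0              # example local-SSD mount on the node
  initContainers:
    # keep only this pod's directory; delete leftovers from other members
    - name: prepare-datadir
      image: busybox
      env:
        - name: POD_NAME                   # downward API: this pod's own name
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
      command:
        - sh
        - -c
        - |
          cd /mnt/etcd || exit 1
          for d in */; do
            if [ -d "$d" ] && [ "$d" != "$POD_NAME/" ]; then
              rm -rf "$d"
            fi
          done
          mkdir -p "/mnt/etcd/$POD_NAME"
      volumeMounts:
        - name: etcd-data
          mountPath: /mnt/etcd
  containers:
    - name: etcd
      image: quay.io/coreos/etcd:v3.1.8    # example etcd image
      env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
      command:
        - /usr/local/bin/etcd
        - --data-dir=/mnt/etcd/$(POD_NAME)  # $(VAR) is expanded by k8s from env
      volumeMounts:
        - name: etcd-data
          mountPath: /mnt/etcd
```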

BTW, I have two questions on backup.

  1. After a restore happens (triggered by deleting nodes for testing), whenever I try to upgrade, all pods die and are then restored. But when there was no restore before, an upgrade just restarts the pods.
     I tested v0.2.5 (without the hack), and it showed the same behavior. Is this expected?

  2. The backup sidecar pod seems to use the NodeSelector of the cluster's pods. Is there any special reason for that? In my case, the backup sidecar pod doesn't need local SSD since it's using a persistent volume. I think it would be better to allow some flexibility there.

@xiang90
Collaborator

xiang90 commented Apr 26, 2017

After a restore happens (triggered by deleting nodes for testing), whenever I try to upgrade, all pods die and are then restored. But when there was no restore before, an upgrade just restarts the pods.
I tested v0.2.5 (without the hack), and it showed the same behavior. Is this expected?

No, it is not expected. Can you create a new issue for this with steps to reproduce?

The backup sidecar pod seems to use the NodeSelector of the cluster's pods. Is there any special reason for that?

The sidecar might need to save an intermediate backup to disk before it can upload to S3. The sidecar has no anti-affinity with the etcd pods, so it can end up on the same machine as an etcd pod. Does this cause you any issue?

@junghoahnsc
Author

I created a new issue.

For the sidecar, I set anti-affinity with the etcd pods. When I enabled backup, one pod was not scheduled.
I was using 6 nodes and set the number of etcd pods to 6.
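
For reference, the kind of anti-affinity rule I mean on the sidecar pod, sketched as a standalone pod spec (the pod name, labels, and image are only placeholder values from my setup, not operator defaults):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: etcd-backup-sidecar
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: etcd                          # example labels carried by the etcd pods
              etcd_cluster: example-etcd-cluster
          topologyKey: kubernetes.io/hostname    # never co-schedule with an etcd pod
  containers:
    - name: backup
      image: busybox                             # placeholder for the backup sidecar image
      command: ["sleep", "3600"]
```

With 6 nodes and 6 etcd pods, a required rule like this leaves no node the sidecar is allowed to land on, which would explain the unscheduled pod.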

@xiang90
Collaborator

xiang90 commented Apr 26, 2017

@junghoahnsc OK, I understand the issue now. Can you also create a new issue to track the sidecar pod problem? We will resolve it.

@junghoahnsc
Author

Sure, created.

@hongchaodeng
Member

Closing this.
One question is folded into the local PV issue: #1201.
The other question is resolved in #1008.
