Regression - etcd datadir permissions not set on etcd grow #2256
Comments
hm, isn't this what kubeadm is doing by default?
do you mean this line? it is still here https://github.com/kubernetes/kubernetes/blob/master/cmd/kubeadm/app/cmd/phases/init/etcd.go#L90 here is the version mapping between kubeadm(k8s) version and etcd: so i guess this will happen if one tries to use kubeadm <1.14 with etcd server > 3.4.10?
There were two mkdirs - the one at bootstrap time (which is still there) and a second one which ran when additional nodes were joined. The referenced cleanup PR removed the second one. Which makes sense in the context of the observed behavior; additional control-plane nodes have the /var/lib/etcd directory created with mode 0755 (the kubernetes default), and looking at some older clusters I have around, they have been doing that for quite some time. Assuming I identified the right PR, it's been happening since 1.14, and it's still happening on the 1.18 cluster I stood up this morning to test the new etcd. It would've also been happening before 3.4.10, just without symptoms, since older etcd versions didn't complain.
ok, so originally a similar fix was added here in the function that creates the static pods for 1.14-pre; later that code was moved (by ereslibre), and then the refactor that you linked indeed omitted it. so yes, we should not let the kubelet create the path with 755, and we should re-add the directory creation with 0700 at this line. thanks for reporting it.
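to sketch the idea (an illustrative Go snippet only, with a placeholder path and error handling; not the exact kubeadm code):

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	// Placeholder path; kubeadm reads the real data dir from its configuration.
	dataDir := "/var/lib/etcd"

	// Create the directory with 0700 before the kubelet gets a chance to
	// create it for the static pod with the default 0755.
	if err := os.MkdirAll(dataDir, 0700); err != nil {
		fmt.Fprintf(os.Stderr, "failed to create etcd data directory %q: %v\n", dataDir, err)
		os.Exit(1)
	}
}
```

note that os.MkdirAll leaves the mode of an already existing directory untouched, which is why the upgrade case comes up further down.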
/help
i don't see this as a critical bug for 1.19 during code freeze. given there is a chance an older version of kubeadm could be used to run etcd 3.4.10, we might as well backport the change to the support skew (1.19, 1.18, 1.17) after 1.19 releases.
@neolit123: Please ensure the request meets the requirements listed here. If this request no longer meets these requirements, the label can be removed. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
one problem though is if the user is upgrading an existing cluster that has the 755 data dir. EDIT: ... but i guess this also has to be done on upgrade. removing "help" as this requires more changes.
I also just ran into this problem trying to upgrade an existing etcd cluster from 3.4.9 to 3.4.11. The etcd node refuses to start with:
@frittenlab hi, what k8s/kubeadm version are you using? looks like we might be picking the latest etcd in 1.19, which means we must patch kubeadm too: also:
/milestone v1.19
@neolit123 We are currently using kubeadm / k8s 1.18.8. We use an external etcd cluster which we provisioned ourselves on dedicated local-storage VMs. We added the etcd cluster to kubeadm via the kubeadm-config configmap. I was able to upgrade the etcd cluster after changing the directory permissions of the data directory to 700.
@frittenlab
PR for master is here:
@neolit123 We did follow this guide. We created the cluster over 2 years ago. I can't recall having to change the mode of the data dir then. I added the option to our ansible role for the etcd cluster, and everything works fine now. This was not necessary with previous etcd versions.
ok, in the above PR i've added a chmod 700 in kubeadm init even if the directory already exists, just in case.
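roughly along these lines (an illustrative sketch, not the actual PR code; the path is a placeholder):

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	dataDir := "/var/lib/etcd" // placeholder path

	// MkdirAll alone is not enough for the upgrade case: it does not touch
	// the mode of a directory that already exists, so chmod it explicitly.
	if err := os.MkdirAll(dataDir, 0700); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if err := os.Chmod(dataDir, 0700); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```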
Thanks, that is a good idea :)
that's only updating the client library, not the server version
ok, i still think that including the fix in 1.19 is a good idea to avoid the case where 1.19 kubeadm is used to deploy the newer (problematic) etcd.
agree
actually, I think @jingyih is working on resolving the regression in etcd ... not sure if kubeadm should work around it or wait for an etcd fix. I think kubernetes manifests will stay on 3.4.9 until this is resolved.
@jingyih in which etcd version are you planning to resolve this? if a fix in etcd is coming, we can just add a note about this in the kubeadm troubleshooting guide and move the pending PR to 1.20.
The etcd behavior simply illuminated a security regression in kubeadm, which is why #1308 was referenced in the report. There's no fix to etcd which undoes that regression (unless etcd changes the datadir permissions itself, which is even more sketchy than kubeadm updating the old incorrect perms). Admittedly, world-readable perms at the top level of the datadir aren't a huge deal in practice, as etcd creates the member subdir with mode 0700 and thus there shouldn't be any information disclosure risk. But leaving the current behavior is going to leave people following the CIS kubernetes benchmark and similar hardening guides with a bit of heartburn when they have to fix it on about every control plane node. Never mind how many people are doing so, given this has gone unnoticed for years. 🤦
FWIW, staying on etcd 3.4.9 means retaining a security hole fixed in 3.4.10 - CVE-2020-15106. That's why I was updating an etcd package to begin with. :) It's also a difficult-to-exploit security hole, but a published CVE nonetheless.
yes, that is why i think kubernetes/kubernetes#94102 is still viable. but while enforcing 700 on existing folders makes sense, it also overrides customized permissions such as 0770.
with the member sub-directory being protected with 0700, it is indeed not that much of a security issue.
we rarely update the default etcd server version (constant) in prior kubeadm releases, but it has happened before. kubeadm makes it possible to deploy a custom etcd image/version, but in later kubeadm versions this comes with the trade-off that etcd upgrades will be skipped when doing "kubeadm upgrade".
Yeah, I'd either leave that out, or check for root/root:0755 and only change it in that case. Surely no one's consciously choosing that. :)
I discussed this with @spzala. We are thinking about providing a warning message instead of enforcing the file permission:
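A rough sketch of what such a warning-only check could look like (illustrative only, not the actual kubeadm or etcd code; the path is a placeholder):

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	dataDir := "/var/lib/etcd" // placeholder path

	info, err := os.Stat(dataDir)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	// Warn about permissive modes instead of silently overriding a mode
	// the operator may have chosen on purpose (e.g. 0770).
	if mode := info.Mode().Perm(); mode != 0700 {
		fmt.Printf("WARNING: %s has mode %#o, expected 0700\n", dataDir, mode)
	}
}
```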
updated the PR to only create the directory if it does not exist on init/join-control-plane, but not chmod it.
/remove-priority important-soon
kubernetes/kubernetes#94102 merged. closing, thanks.
Is this a BUG REPORT or FEATURE REQUEST?
BUG REPORT
Versions
kubeadm version (use kubeadm version): 1.14+
Environment: N/A
What happened?
With the release of etcd 3.4.10, the datadir permissions now need to be 0700 or etcd won't start. There was an issue (#1308) where perms were set on join before starting the etcd container as a security control, overriding the default behavior of creating a non-existent directory with mode 0755. However, in a cleanup, that necessary os.MkdirAll was removed. This was transparently ignored for several releases since etcd didn't complain, but with etcd-io/etcd#11798 (in 3.4.10), the new etcd cluster on the second node does not start.
I'm pretty sure this will break anyone on k8s 1.14 or newer who upgrades to etcd 3.4.10 or newer without first fixing the /var/lib/etcd perms.
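For context, a simplified approximation of the kind of startup check etcd 3.4.10 introduced (see etcd-io/etcd#11798 for the real implementation; this is not etcd's actual code, and the path is a placeholder):

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	dataDir := "/var/lib/etcd" // placeholder path

	info, err := os.Stat(dataDir)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	// Refuse to start if the data directory grants any access to group or others.
	if perm := info.Mode().Perm(); perm != 0700 {
		fmt.Fprintf(os.Stderr, "cannot use %s: permissions %#o, expected 0700\n", dataDir, perm)
		os.Exit(1)
	}
}
```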
What you expected to happen?
/var/lib/etcd (or whatever the var is set to) should be set to 0700. :)
How to reproduce it (as minimally and precisely as possible)?
Join a second master node, then ls -ld /var/lib/etcd on the node, with an etcd 3.4.10 or newer runtime.
Anything else we need to know?
It's worth explicitly noting that the first control plane node added works fine. It's just the second and subsequent nodes, which were handled in a separate location in the code, that exhibit the problem.