-
Notifications
You must be signed in to change notification settings - Fork 558
feat(msi): enable managed service identity for Kubernetes #479
Conversation
@colemickens, |
@acs-bot test this please |
One more caveat: I've not been able to test the Windows side of things yet because of the other blocking issues affecting Kubernetes 1.6 + Windows (+kubenet, it looks like...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few comments.
"orchestratorProfile": { | ||
"orchestratorType": "Kubernetes", | ||
"kubernetesConfig": { | ||
"useManagedIdentity": true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This really makes me feel better.
hack/rbacgenerator/function.js
Outdated
@@ -0,0 +1,18 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure whether we want to check in the hack/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so. Better than it only existing in a random Azure Functions deployment. It doesn't have to be permanent. Once ARM team implements uniqueGUID
all of this can go away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if its a hack/temp thing, it better to not name the folder as hack. Adding a doc file to explain the choice might be a better option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, maybe, sure. I'm open to suggestions. It's not totally unprecedented, kubernetes has a hack/ dir for temp/dev type things.
hack/rbacgenerator/raw.js
Outdated
@@ -0,0 +1,11 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raw.json ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No idea why it's called that. It's package.json
in reality. Will rename. Thanks.
volumes: | ||
- name: "etc-kubernetes" | ||
hostPath: | ||
path: "/etc/kubernetes" | ||
- name: "var-lib-kubelet" | ||
hostPath: | ||
path: "/var/lib/kubelet" | ||
- name: msi | ||
hostPath: | ||
path: "/var/lib/waagent/ManagedIdentity-Settings" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need indent here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. What's weird is:
- I don't know what apiserver uses cloudprovider for...
- I'm surprised everything worked... I would've expected a problem if it tried to actually get a token for something...
Will fix. Looking into why this didn't surface any sort of failure to me...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- It's used for SSH key propagation (it's a GKE thing I think). apiserver also parses the config file and passes it to admission controllers so they can use it. (Not sure of any that do).
- We don't use SSH key stuff and we don't use any admission controllers that need to talk to ARM so it didn't hit on this.
Still, will fix, of course.
"orchestratorProfile": { | ||
"orchestratorType": "Kubernetes", | ||
"kubernetesConfig": { | ||
"useManagedIdentity": true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you anticipate this to remain a true/false type setting or would there be other kinds of identities that might be available to users in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very interesting question. In the future, there will be explicit identity support, meaning that users can pre-provision AAD SPs as real resources (ie /Microsoft.Identity/explicitIdentities/k8s-master0
).
I think that can be an additive change, so it might look like this in the future:
... {
"orchestratorType": "Kubernetes",
"kubernetesConfig": {
"useManagedIdentity": true,
"masterIdentity": "/.../Microsoft.Identity/master-sp",
"nodeIdentity": "/.../Microsoft.Identity/node-sp",
}
} ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. I was thinking something along the lines of the following to make it extensible. This is because true/false type fields become outdated fairly quickly.
"kubernetesConfig": {
"identityType": "Managed"
}
OR
"kubernetesConfig": {
"identity": {
"type" : "Managed",
"masterIdentity": "/.../Microsoft.Identity/master-sp",
"nodeIdentity": "/.../Microsoft.Identity/node-sp",
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me think on it and chat with you more about it. I'd be fine with either of those approaches and don't have any strong opinions here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@colemickens @amanohar how do we want to proceed on this PR ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shrutir25 I think it's going to stay here a while longer. There's no need to rush it and I have a better solution than the hack/
stuff with the Azure function coming in ~3 weeks via ARM templates.
Based on my experience using MSI, I think we need to also automate adding MSI to the KeyVault. With cross resource group template deployment, we should include KV access policy in data model. |
Sorry @weinong I don't quite follow. This is for the cloudprovider integration in Kubernetes, not for getting assets from MSI necessarily (although it could do both). This change doesn't require that the identity be shared or any such thing (I don't think we can handle those more advanced scenarios well until explicit identities come in). It would be cool to start having ACS-Engine stash the PKI assets, etc into KeyVault and retrieving them, but that's not in scope for this PR. This just does the SP + RBACing so that K8s can call ARM APIs. |
4a8a2d7
to
c52e49e
Compare
73b2d6f
to
ffd3f9f
Compare
This is ready to review and merge hopefully. PR text has also been re-written to be more up-to-date and accurate. cc: @dmitsh for reviewing the test config changes I've tested this manually. I have a Jenkins job that has been running it. From a cluster deployed via Jenkins, we see
LOOK MA! No scary credentials! And in
And in
|
There is one small quirk that make this less than 100% ideal and would keep us from using MSI immediately:
That having been said, I'd strongly prefer if we can review and merge this. I've spent a pretty serious amount of time rebasing this over the past 6.5 months. It'd be great to get it in, get a soak test going, etc. |
Note: The PR is only failing because it runs in a different sub with a different SP than I was testing with. And I don't have access to the subscription, so I can't do anything about this. Hopefully @dmitsh or someone else with access to that subscription can either add me or the relevant SP Application ID to unblock this. In the meantime, here's a build with the same git hash that is live now ( |
CLUSTER_SERVICE_PRINCIPAL_CLIENT_ID=msi | ||
CLUSTER_SERVICE_PRINCIPAL_CLIENT_SECRET=msi | ||
CUSTOM_HYPERKUBE_SPEC=docker.io/colemickens/hyperkube-amd64:531664519485fdbb543e1e83c90bac6fef829d73 | ||
EXPECTED_NODE_COUNT=5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add a newline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, I'll add a couple
}, | ||
"servicePrincipalProfile": { | ||
"servicePrincipalClientID": "", | ||
"servicePrincipalClientSecret": "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this could be removed IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And also, what this file is used for? The hyperkube here is up to date?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an example apimodel that activates MSI. I don't want to remove the SPP yet. It will require refactoring the validation code to be aware of what mode it's running in, plus this section will likely still be used for either ACR scenario or for AAD-kubectl scenarios.
@acs-bot test this please |
For scaling down, do we need to remove principalIDs from role assignment? Reviewed 1 of 32 files at r2, 16 of 51 files at r3, 4 of 4 files at r4, 4 of 4 files at r5, 2 of 2 files at r6. examples/managed-identity/kubernetes-msi.json, line 46 at r6 (raw file):
do we still need spn secret with msi enabled? parts/kubernetesagentresourcesvmas.t, line 133 at r6 (raw file):
should we enable msi on agent pool separately? Comments from Reviewable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Wil let another dev approve.
"name": "vmLoopNode" | ||
}, | ||
"name": "[guid(concat('Microsoft.Compute/virtualMachines/', variables('{{.Name}}VMNamePrefix'), copyIndex(), 'vmidentity'))]", | ||
"type": "Microsoft.Authorization/roleAssignments", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably stupid question, excuse me in advance. Is it at all possible to extract this into another file? Looking for opportunities to make all the parts (literally and figuratively) a bit more visible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dunno why this showed in Reviewable for me to comment on but didn't post back here. Anyway, what I wrote:
I'm not sure actually. I don't know if go tmpl has a generic include function or if we use it anywhere. I'd selfishly say I'd prefer to punt on it given my lack of immediate knowledge.
thanks for clarifying offlien. LGTM |
Review status: 23 of 24 files reviewed at latest revision, 9 unresolved discussions, some commit checks failed. examples/managed-identity/kubernetes-msi.json, line 46 at r6 (raw file): Previously, weinong (Weinong Wang) wrote…
It becomes optional. For now, it happens to still be usable in that the ACR integration will keep working, even in some scenario where user activates MSI and still supplies creds. I've chosen not to omit it because of the refactoring changes required that I mentioned in reply to Jingtao. examples/managed-identity/kubernetes-msi.json, line 46 at r6 (raw file): Previously, colemickens (Cole Mickens) wrote…
Done. examples/managed-identity/kubernetes-msi.json.env, line 4 at r6 (raw file): Previously, colemickens (Cole Mickens) wrote…
Done. parts/kubernetesagentresourcesvmas.t, line 133 at r6 (raw file): Previously, weinong (Weinong Wang) wrote…
IMO, no. The ServicePrincipalProfile applies uniformly today, this seems consistent. parts/kubernetesagentresourcesvmas.t, line 205 at r6 (raw file): Previously, seanknox (Sean Knox) wrote…
I'm not sure actually. I don't know if go tmpl has a generic include function or if we use it anywhere. I'd selfishly say I'd prefer to punt on it given my lack of immediate knowledge. parts/kubernetesmaster-kube-apiserver.yaml, line 51 at r2 (raw file): Previously, colemickens (Cole Mickens) wrote…
Done. examples/managed-identity/kubernetes.json, line 7 at r2 (raw file): Previously, colemickens (Cole Mickens) wrote…
Done. hack/rbacgenerator/function.js, line 1 at r2 (raw file): Previously, colemickens (Cole Mickens) wrote…
This no longer applies since I've removed hack/ and everything in lieu of guid(). hack/rbacgenerator/raw.js, line 1 at r2 (raw file): Previously, colemickens (Cole Mickens) wrote…
No longer applies. It's all been removed. Comments from Reviewable |
Review status: 14 of 24 files reviewed at latest revision, 9 unresolved discussions. examples/managed-identity/kubernetes-msi.json, line 46 at r6 (raw file): Previously, colemickens (Cole Mickens) wrote…
Done. examples/managed-identity/kubernetes-msi.json, line 46 at r6 (raw file): Previously, colemickens (Cole Mickens) wrote…
Done. examples/managed-identity/kubernetes-msi.json.env, line 4 at r6 (raw file): Previously, colemickens (Cole Mickens) wrote…
Done. parts/kubernetesagentresourcesvmas.t, line 133 at r6 (raw file): Previously, colemickens (Cole Mickens) wrote…
Done. parts/kubernetesagentresourcesvmas.t, line 205 at r6 (raw file): Previously, colemickens (Cole Mickens) wrote…
Done. parts/kubernetesmaster-kube-apiserver.yaml, line 51 at r2 (raw file): Previously, colemickens (Cole Mickens) wrote…
Done. examples/managed-identity/kubernetes.json, line 7 at r2 (raw file): Previously, colemickens (Cole Mickens) wrote…
Done. hack/rbacgenerator/function.js, line 1 at r2 (raw file): Previously, colemickens (Cole Mickens) wrote…
Done. hack/rbacgenerator/raw.js, line 1 at r2 (raw file): Previously, colemickens (Cole Mickens) wrote…
Done. Comments from Reviewable |
Any documentation for that? Also, is there support for containers authorization? e.g. granting access only to specific container to a given KeyVault? |
This PR enables the use of "Managed Identity" with Kubernetes. This removes the need for user supplied credentials for Kubernetes to be able to leverage the cloudprovider integration with Azure ARM APIs.
Specifically, this PR:
kubernetesConfig -> useManagedIdentity
useManagedIdentity==true
)It works for all configurations, including hybrid clusters with Windows nodes.
Bonuses:
Here's the caveats:
MSI is not GA'd (though it is globally deployed)
The deployer must be "Owner", not just "Contributor".
This requires a custom build of Kubernetes. Though the PR is out for 1.8: azure: msi: add managed identity field, logic kubernetes/kubernetes#48854 (The custom build of Kubernetes comes from this branch: https://github.com/colemickens/kubernetes/tree/msi)
Testing is most easily done by looking at the example/test/jenkins related commits and invoking the job via Jenkins, or locally.
This change is