This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

feat(msi): enable managed service identity for Kubernetes #479

Merged
merged 11 commits into Azure:master from msi-rebase2
Jul 14, 2017

Conversation

colemickens
Contributor

@colemickens colemickens commented Apr 9, 2017

This PR enables the use of "Managed Identity" with Kubernetes. This removes the need for user-supplied credentials when Kubernetes uses the cloudprovider integration to call the Azure ARM APIs.

Specifically, this PR:

  • adds support for kubernetesConfig -> useManagedIdentity
  • requests a managed implicit identity for the k8s CRP VMs
  • enables the MSI VM Extension (when useManagedIdentity==true)
  • makes the custom script (final provisioner) depend on it

It works for all configurations, including hybrid clusters with Windows nodes.
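The list above can be sketched as a small transformation over an ARM VM resource and its extensions. This is a minimal Python illustration assuming simplified resource dicts; it is not the acs-engine implementation (which renders Go templates under parts/). The resource names, the extension publisher/type/version, and the "systemAssigned" identity value are assumptions, and a Windows node would presumably need the Windows flavor of the MSI extension.

```python
import copy

# Hypothetical resource names, for illustration only; acs-engine generates the
# real ARM JSON from Go templates under parts/.
MSI_EXT_NAME = "managedIdentityExtension"
CSE_NAME = "cse"


def apply_managed_identity(vm, extensions, use_managed_identity):
    """Return copies of (vm, extensions) with MSI wiring added when the flag is set."""
    vm = copy.deepcopy(vm)
    extensions = copy.deepcopy(extensions)
    if not use_managed_identity:
        return vm, extensions

    # Request an implicit (system-assigned) identity for the VM.
    vm["identity"] = {"type": "systemAssigned"}

    # Add the MSI VM extension. Publisher/type/version are assumptions.
    extensions.append({
        "name": MSI_EXT_NAME,
        "type": "Microsoft.Compute/virtualMachines/extensions",
        "properties": {
            "publisher": "Microsoft.ManagedIdentity",
            "type": "ManagedIdentityExtensionForLinux",
            "typeHandlerVersion": "1.0",
        },
    })

    # Make the custom script (final provisioner) wait for the MSI extension.
    # Real ARM dependsOn entries are resource IDs; plain names keep the sketch short.
    for ext in extensions:
        if ext["name"] == CSE_NAME:
            ext.setdefault("dependsOn", []).append(MSI_EXT_NAME)
    return vm, extensions


vm, exts = apply_managed_identity({"name": "k8s-master-0"}, [{"name": CSE_NAME}], True)
print(vm["identity"], [e["name"] for e in exts])
```

Working on copies keeps the flag purely additive: with useManagedIdentity false, the resources come back unchanged.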

Bonuses:

  • No user supplied creds!
  • Easier to assign more limited scopes to worker VMs. They only need Reader permissions.
  • No user supplied creds!
  • No leftover Apps/SPs in the user's AAD tenant (there are probably hundreds or thousands in the MSIT tenant; I just deleted hundreds from my personal test subscription the other day)
  • No user supplied creds!

Here are the caveats:

  1. MSI is not GA yet (though it is deployed globally)

  2. The deployer must be "Owner", not just "Contributor".

  3. This requires a custom build of Kubernetes, built from this branch: https://github.com/colemickens/kubernetes/tree/msi. The upstream PR for 1.8 is already out: "azure: msi: add managed identity field, logic" (kubernetes/kubernetes#48854).
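Caveat 2 follows from the template creating Microsoft.Authorization/roleAssignments resources: the built-in Contributor role grants "*" but lists Microsoft.Authorization/*/Write under NotActions, while Owner has no such exclusion. The sketch below checks an action against trimmed-down copies of those two role definitions using wildcard matching; it is a simplification for illustration, not Azure RBAC's exact evaluation logic.

```python
from fnmatch import fnmatchcase

# Trimmed-down built-in role definitions (only the fields relevant here).
ROLES = {
    "Owner": {"actions": ["*"], "not_actions": []},
    "Contributor": {
        "actions": ["*"],
        "not_actions": [
            "Microsoft.Authorization/*/Delete",
            "Microsoft.Authorization/*/Write",
            "Microsoft.Authorization/elevateAccess/Action",
        ],
    },
}


def _matches(pattern: str, action: str) -> bool:
    # RBAC action strings are compared case-insensitively; '*' is a wildcard.
    return fnmatchcase(action.lower(), pattern.lower())


def role_allows(role: str, action: str) -> bool:
    """Simplified check: allowed if some Action matches and no NotAction matches."""
    definition = ROLES[role]
    granted = any(_matches(p, action) for p in definition["actions"])
    excluded = any(_matches(p, action) for p in definition["not_actions"])
    return granted and not excluded


ROLE_ASSIGNMENT_WRITE = "Microsoft.Authorization/roleAssignments/write"
for name in ROLES:
    print(name, role_allows(name, ROLE_ASSIGNMENT_WRITE))
```

A Contributor can still create the VMs themselves; only the role assignment step that grants each VM identity access to ARM is blocked.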

Testing is most easily done by looking at the example/test/jenkins related commits and invoking the job via Jenkins, or locally.


@msftclas

msftclas commented Apr 9, 2017

@colemickens,
Thanks for your contribution as a Microsoft full-time employee or intern. You do not need to sign a CLA.
Thanks,
Microsoft Pull Request Bot

@colemickens
Contributor Author

@acs-bot test this please

@colemickens
Contributor Author

One more caveat: I've not been able to test the Windows side of things yet because of the other blocking issues affecting Kubernetes 1.6 + Windows (+kubenet, it looks like...)

Contributor

@rjtsdl rjtsdl left a comment

a few comments.

"orchestratorProfile": {
"orchestratorType": "Kubernetes",
"kubernetesConfig": {
"useManagedIdentity": true
Contributor

This really makes me feel better.

@@ -0,0 +1,18 @@
{
Contributor

not sure whether we want to check in the hack/

Contributor Author

I think so. Better than it only existing in a random Azure Functions deployment. It doesn't have to be permanent. Once the ARM team implements uniqueGUID, all of this can go away.

Contributor

Even if it's a hack/temp thing, it's better not to name the folder hack. Adding a doc file to explain the choice might be a better option.

Contributor Author

Hm, maybe, sure. I'm open to suggestions. It's not totally unprecedented, kubernetes has a hack/ dir for temp/dev type things.

@@ -0,0 +1,11 @@
{
Contributor

raw.json?

Contributor Author

No idea why it's called that. It's package.json in reality. Will rename. Thanks.

volumes:
- name: "etc-kubernetes"
hostPath:
path: "/etc/kubernetes"
- name: "var-lib-kubelet"
hostPath:
path: "/var/lib/kubelet"
- name: msi
hostPath:
path: "/var/lib/waagent/ManagedIdentity-Settings"
Contributor

need indent here

Contributor Author

Good catch. What's weird is:

  1. I don't know what apiserver uses cloudprovider for...
  2. I'm surprised everything worked... I would've expected a problem if it tried to actually get a token for something...

Will fix. Looking into why this didn't surface any sort of failure to me...

Contributor Author

  1. It's used for SSH key propagation (it's a GKE thing I think). apiserver also parses the config file and passes it to admission controllers so they can use it. (Not sure of any that do).
  2. We don't use SSH key stuff and we don't use any admission controllers that need to talk to ARM so it didn't hit on this.

Still, will fix, of course.

"orchestratorProfile": {
"orchestratorType": "Kubernetes",
"kubernetesConfig": {
"useManagedIdentity": true
Contributor

Do you anticipate this to remain a true/false type setting or would there be other kinds of identities that might be available to users in the future?

Contributor Author

Very interesting question. In the future, there will be explicit identity support, meaning that users can pre-provision AAD SPs as real resources (i.e. /Microsoft.Identity/explicitIdentities/k8s-master0).

I think that can be an additive change, so it might look like this in the future:

... {
  "orchestratorType": "Kubernetes",
  "kubernetesConfig": {
    "useManagedIdentity": true,
    "masterIdentity": "/.../Microsoft.Identity/master-sp",
    "nodeIdentity": "/.../Microsoft.Identity/node-sp"
  }
} ...

Contributor

@amanohar amanohar Apr 13, 2017

Makes sense. I was thinking something along the lines of the following to make it extensible. This is because true/false type fields become outdated fairly quickly.

 "kubernetesConfig": {
   "identityType": "Managed"
 }

OR

 "kubernetesConfig": {
   "identity": {
     "type": "Managed",
     "masterIdentity": "/.../Microsoft.Identity/master-sp",
     "nodeIdentity": "/.../Microsoft.Identity/node-sp"
   }
 }
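One way to keep the existing boolean working while leaving room for the extensible shapes proposed above is to normalize every form into a single internal value when the apimodel is loaded. This is a hypothetical Python sketch: identityType, identity.type, masterIdentity and nodeIdentity come from the proposals in this thread, while the IdentityConfig type and the "ServicePrincipal" default are made up for illustration and are not a shipped schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class IdentityConfig:
    kind: str                               # "Managed" or "ServicePrincipal" (assumed values)
    master_identity: Optional[str] = None   # explicit identity resource ID, if any
    node_identity: Optional[str] = None


def normalize_identity(kubernetes_config: dict) -> IdentityConfig:
    """Fold the legacy boolean and both proposed forms into one value."""
    identity = kubernetes_config.get("identity")
    if identity is not None:
        # Proposed object form: {"identity": {"type": ..., "masterIdentity": ..., ...}}
        return IdentityConfig(
            kind=identity.get("type", "ServicePrincipal"),
            master_identity=identity.get("masterIdentity"),
            node_identity=identity.get("nodeIdentity"),
        )
    if "identityType" in kubernetes_config:
        # Proposed flat form: {"identityType": "Managed"}
        return IdentityConfig(kind=kubernetes_config["identityType"])
    # Boolean introduced by this PR.
    if kubernetes_config.get("useManagedIdentity", False):
        return IdentityConfig(kind="Managed")
    return IdentityConfig(kind="ServicePrincipal")


print(normalize_identity({"useManagedIdentity": True}))
```

Because every form funnels into one value, template code only needs to check a single field, and the boolean could later be deprecated without touching the templates.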

Contributor Author

Let me think on it and chat with you more about it. I'd be fine with either of those approaches and don't have any strong opinions here.

Contributor

@colemickens @amanohar how do we want to proceed on this PR?

Contributor Author

@shrutir25 I think it's going to stay here a while longer. There's no need to rush it and I have a better solution than the hack/ stuff with the Azure function coming in ~3 weeks via ARM templates.

@weinong
Contributor

weinong commented Jun 13, 2017

Based on my experience using MSI, I think we also need to automate adding the MSI to the KeyVault. With cross-resource-group template deployment, we should include the KV access policy in the data model.
Also, I'd like to have master and agents accessing different KeyVaults, so it seems more sensible to me to put MSI options under the Master and AgentPoolProfiles.
Thoughts?

@colemickens
Contributor Author

Sorry @weinong, I don't quite follow. This is for the cloudprovider integration in Kubernetes, not necessarily for getting assets via MSI (although it could do both). This change doesn't require that the identity be shared or anything like that (I don't think we can handle those more advanced scenarios well until explicit identities arrive).

It would be cool to start having ACS-Engine stash the PKI assets, etc. in KeyVault and retrieve them, but that's not in scope for this PR. This PR just does the SP + RBAC setup so that K8s can call ARM APIs.

@colemickens colemickens force-pushed the msi-rebase2 branch 4 times, most recently from 4a8a2d7 to c52e49e July 13, 2017 06:54
@colemickens colemickens changed the title from "k8s: managed service identity" to "feat(msi): enable managed service identity for Kubernetes" Jul 13, 2017
@colemickens colemickens force-pushed the msi-rebase2 branch 6 times, most recently from 73b2d6f to ffd3f9f July 13, 2017 09:51
@colemickens
Contributor Author

This is ready to review and merge hopefully. PR text has also been re-written to be more up-to-date and accurate.

cc: @dmitsh for reviewing the test config changes
cc: @weinong for reviewing the MSI change in general
cc: @seanknox for release/project management side of things - hopefully this one can be allowed... it's a new feature in isolation and pre-dates our new precedent for having an open issue.

I've tested this manually. I have a Jenkins job that has been running it.

From a cluster deployed via Jenkins, we see /etc/kubernetes/azure.json:

{
  "cloud": "AzurePublicCloud",
  "tenantId": "72f988bf-86f1-41af-91ab-2d7cd011db47",
  "subscriptionId": "8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8",
  "aadClientId": "msi",
  "aadClientSecret": "msi",
  "resourceGroup": "x-managed-identity-kubernetes-msi-westus-27-2-0",
  "location": "westus",
  "subnetName": "k8s-subnet",
  "securityGroupName": "k8s-master-25273340-nsg",
  "vnetName": "k8s-vnet-25273340",
  "routeTableName": "k8s-master-25273340-routetable",
  "primaryAvailabilitySetName": "linuxpool1-availabilitySet-25273340",
  "cloudProviderBackoff": false,
  "cloudProviderBackoffRetries": 6,
  "cloudProviderBackoffExponent": 1.5,
  "cloudProviderBackoffDuration": 5,
  "cloudProviderBackoffJitter": 1,
  "cloudProviderRatelimit": false,
  "cloudProviderRateLimitQPS": 1,
  "cloudProviderRateLimitBucket": 5,
  "useManagedIdentityExtension": true
}

LOOK MA! No scary credentials!

And in sudo journalctl -u kubelet | grep azure | grep "access token":

azure.go:159] azure: using managed identity extension to retrieve access token

And in kubectl get nodes:

2017-07-13 10:42:47.364600 I | proto: duplicate proto type registered: google.protobuf.Any
2017-07-13 10:42:47.364650 I | proto: duplicate proto type registered: google.protobuf.Duration
2017-07-13 10:42:47.364679 I | proto: duplicate proto type registered: google.protobuf.Timestamp
NAME                        STATUS                     AGE       VERSION
25273acs9010                Ready                      2m        v1.6.4-beta.0.147+2b3fc68910b4dd
25273acs9011                Ready                      2m        v1.6.4-beta.0.147+2b3fc68910b4dd
k8s-linuxpool1-25273340-0   Ready                      2m        v1.8.0-alpha.2.85+531664519485fd-dirty
k8s-linuxpool1-25273340-1   Ready                      1m        v1.8.0-alpha.2.85+531664519485fd-dirty
k8s-master-25273340-0       Ready,SchedulingDisabled   3m        v1.8.0-alpha.2.85+531664519485fd-dirty
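The azure.json above still carries aadClientId/aadClientSecret, but only as the placeholder "msi"; the useManagedIdentityExtension flag is what switches the cloud provider onto the path reported in the kubelet log (azure.go:159). The Python sketch below mimics that decision for illustration only: the real logic is Go code in the Kubernetes Azure cloud provider, and the function name and return strings here are hypothetical.

```python
import json


def token_source(azure_json: str) -> str:
    """Decide how a cloud-provider-like client would authenticate (illustrative only)."""
    config = json.loads(azure_json)
    if config.get("useManagedIdentityExtension", False):
        # Tokens come from the local MSI extension; in this sketch the "msi"
        # placeholders in aadClientId/aadClientSecret are simply ignored.
        return "managed identity extension"
    if config.get("aadClientId") and config.get("aadClientSecret"):
        return "service principal client secret"
    raise ValueError("azure.json has neither MSI enabled nor service principal credentials")


sample = """{
  "cloud": "AzurePublicCloud",
  "aadClientId": "msi",
  "aadClientSecret": "msi",
  "useManagedIdentityExtension": true
}"""
print(token_source(sample))
```

Checking the MSI flag first means leftover placeholder values can never be mistaken for real credentials.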

@colemickens
Contributor Author

There is one small quirk that makes this less than 100% ideal and would keep us from using MSI immediately:

  • An MSI-powered cluster is mutually exclusive with automatic ACR integration. As far as I can tell, ACR doesn't offer a way to authenticate via access tokens, only static credentials, so it's sort of fundamentally incompatible with MSI for the time being. As a result, it's also hard to test Windows given that it doesn't use the cloudprovider integration and we can't test with ACR. But I've made the necessary changes such that it should work.

That having been said, I'd strongly prefer if we can review and merge this. I've spent a pretty serious amount of time rebasing this over the past 6.5 months. It'd be great to get it in, get a soak test going, etc.

@colemickens
Contributor Author

Note: The PR is only failing because the check runs in a different subscription with a different SP than the one I tested with. I don't have access to that subscription, so I can't fix this myself. Hopefully @dmitsh or someone else with access to it can add either me or the relevant SP Application ID to unblock this. In the meantime, here's a build with the same git hash that is live now (<jenkins_url>/job/colemick-jobs/job/colemick-acs-engine-msi2/29/)

@rjtsdl rjtsdl requested a review from weinong July 13, 2017 17:20
CLUSTER_SERVICE_PRINCIPAL_CLIENT_ID=msi
CLUSTER_SERVICE_PRINCIPAL_CLIENT_SECRET=msi
CUSTOM_HYPERKUBE_SPEC=docker.io/colemickens/hyperkube-amd64:531664519485fdbb543e1e83c90bac6fef829d73
EXPECTED_NODE_COUNT=5
Contributor

nit: add a newline.

Contributor Author

sure, I'll add a couple

},
"servicePrincipalProfile": {
"servicePrincipalClientID": "",
"servicePrincipalClientSecret": ""
Contributor

nit: this could be removed IMO.

Contributor

Also, what is this file used for? Is the hyperkube here up to date?

Contributor Author

This is an example apimodel that activates MSI. I don't want to remove the SPP yet. It will require refactoring the validation code to be aware of what mode it's running in, plus this section will likely still be used for either ACR scenario or for AAD-kubectl scenarios.
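Mode-aware validation could look roughly like this: the servicePrincipalProfile is required only when MSI is off, and when MSI is on without credentials the ACR integration (which, as noted earlier in the thread, needs static credentials) cannot work. This is a hypothetical Python sketch, not acs-engine's Go validation code; the field names come from the example apimodel, while the function name and messages are made up.

```python
def validate_credentials(apimodel: dict) -> list:
    """Return warnings; raise ValueError for configurations that cannot work."""
    props = apimodel["properties"]
    k8s = props["orchestratorProfile"].get("kubernetesConfig", {})
    spp = props.get("servicePrincipalProfile") or {}
    has_creds = bool(spp.get("servicePrincipalClientID")) and bool(
        spp.get("servicePrincipalClientSecret")
    )

    if not k8s.get("useManagedIdentity", False):
        # Classic mode: Kubernetes needs an SP to call ARM.
        if not has_creds:
            raise ValueError("servicePrincipalProfile is required when useManagedIdentity is false")
        return []

    warnings = []
    if not has_creds:
        # MSI covers the cloud provider, but ACR integration still needs static creds.
        warnings.append("no service principal supplied: ACR integration will be unavailable")
    return warnings
```

Returning warnings rather than failing keeps the empty servicePrincipalProfile in the MSI example valid while still surfacing the ACR trade-off.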

@colemickens
Contributor Author

@acs-bot test this please

@weinong
Contributor

weinong commented Jul 13, 2017

For scaling down, do we need to remove principalIDs from role assignment?


Reviewed 1 of 32 files at r2, 16 of 51 files at r3, 4 of 4 files at r4, 4 of 4 files at r5, 2 of 2 files at r6.
Review status: all files reviewed at latest revision, 8 unresolved discussions, some commit checks failed.


examples/managed-identity/kubernetes-msi.json, line 46 at r6 (raw file):

    "servicePrincipalProfile": {
      "servicePrincipalClientID": "",
      "servicePrincipalClientSecret": ""

do we still need spn secret with msi enabled?


parts/kubernetesagentresourcesvmas.t, line 133 at r6 (raw file):

      "location": "[variables('location')]",
      "name": "[concat(variables('{{.Name}}VMNamePrefix'), copyIndex(variables('{{.Name}}Offset')))]",
      {{if UseManagedIdentity}}

should we enable msi on agent pool separately?


Contributor

@seanknox seanknox left a comment

LGTM. Will let another dev approve.

"name": "vmLoopNode"
},
"name": "[guid(concat('Microsoft.Compute/virtualMachines/', variables('{{.Name}}VMNamePrefix'), copyIndex(), 'vmidentity'))]",
"type": "Microsoft.Authorization/roleAssignments",
Contributor

Probably stupid question, excuse me in advance. Is it at all possible to extract this into another file? Looking for opportunities to make all the parts (literally and figuratively) a bit more visible.

Contributor Author

Dunno why this showed in Reviewable for me to comment on but didn't post back here. Anyway, what I wrote:

I'm not sure actually. I don't know if go tmpl has a generic include function or if we use it anywhere. I'd selfishly say I'd prefer to punt on it given my lack of immediate knowledge.

@weinong
Contributor

weinong commented Jul 13, 2017

thanks for clarifying offline. LGTM

@colemickens
Contributor Author

Review status: 23 of 24 files reviewed at latest revision, 9 unresolved discussions, some commit checks failed.


examples/managed-identity/kubernetes-msi.json, line 46 at r6 (raw file):

Previously, weinong (Weinong Wang) wrote…

do we still need spn secret with msi enabled?

It becomes optional. For now, it still happens to be usable: the ACR integration will keep working in a scenario where the user activates MSI and still supplies creds.

I've chosen not to omit it because of the refactoring it would require, which I mentioned in my reply to Jingtao.


examples/managed-identity/kubernetes-msi.json, line 46 at r6 (raw file):

Previously, colemickens (Cole Mickens) wrote…

This is an example apimodel that activates MSI. I don't want to remove the SPP yet. It will require refactoring the validation code to be aware of what mode it's running in, plus this section will likely still be used for either ACR scenario or for AAD-kubectl scenarios.

Done.


examples/managed-identity/kubernetes-msi.json.env, line 4 at r6 (raw file):

Previously, colemickens (Cole Mickens) wrote…

sure, I'll add a couple

Done.


parts/kubernetesagentresourcesvmas.t, line 133 at r6 (raw file):

Previously, weinong (Weinong Wang) wrote…

should we enable msi on agent pool separately?

IMO, no. The ServicePrincipalProfile applies uniformly today, so this seems consistent.


parts/kubernetesagentresourcesvmas.t, line 205 at r6 (raw file):

Previously, seanknox (Sean Knox) wrote…

Probably stupid question, excuse me in advance. Is it at all possible to extract this into another file? Looking for opportunities to make all the parts (literally and figuratively) a bit more visible.

I'm not sure actually. I don't know if go tmpl has a generic include function or if we use it anywhere. I'd selfishly say I'd prefer to punt on it given my lack of immediate knowledge.


parts/kubernetesmaster-kube-apiserver.yaml, line 51 at r2 (raw file):

Previously, colemickens (Cole Mickens) wrote…
  1. It's used for SSH key propagation (it's a GKE thing I think). apiserver also parses the config file and passes it to admission controllers so they can use it. (Not sure of any that do).
  2. We don't use SSH key stuff and we don't use any admission controllers that need to talk to ARM so it didn't hit on this.

Still, will fix, of course.

Done.


examples/managed-identity/kubernetes.json, line 7 at r2 (raw file):

Previously, colemickens (Cole Mickens) wrote…

@shrutir25 I think it's going to stay here a while longer. There's no need to rush it and I have a better solution than the hack/ stuff with the Azure function coming in ~3 weeks via ARM templates.

Done.


hack/rbacgenerator/function.js, line 1 at r2 (raw file):

Previously, colemickens (Cole Mickens) wrote…

Hm, maybe, sure. I'm open to suggestions. It's not totally unprecedented, kubernetes has a hack/ dir for temp/dev type things.

This no longer applies since I've removed hack/ and everything in it in favor of the ARM template guid() function.
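Replacing the Azure Function with guid() works because guid() is deterministic: the same inputs always produce the same GUID, so each VM's role assignment gets a stable name and redeploying the template updates it instead of creating a duplicate. The sketch below illustrates only that property, using Python's uuid.uuid5 over the same concatenated string the template passes to guid(). ARM computes guid() with its own scheme, so these values will not match what ARM generates, and the namespace choice is arbitrary.

```python
import uuid

# Arbitrary namespace chosen for this illustration; ARM's guid() uses its own scheme.
NAMESPACE = uuid.NAMESPACE_URL


def role_assignment_name(vm_name_prefix: str, index: int) -> str:
    """Stable per-VM name, mirroring the template's
    guid(concat('Microsoft.Compute/virtualMachines/', prefix, copyIndex(), 'vmidentity'))."""
    seed = f"Microsoft.Compute/virtualMachines/{vm_name_prefix}{index}vmidentity"
    return str(uuid.uuid5(NAMESPACE, seed))


names = [role_assignment_name("k8s-linuxpool1-25273340-", i) for i in range(2)]
print(names)
```

Any deterministic, collision-resistant hash of the VM identity would give the same idempotency; no external service needs to remember which GUIDs were already handed out.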


hack/rbacgenerator/raw.js, line 1 at r2 (raw file):

Previously, colemickens (Cole Mickens) wrote…

No idea why it's called that. It's package.json in reality. Will rename. Thanks.

No longer applies. It's all been removed.


@colemickens
Contributor Author

Review status: 14 of 24 files reviewed at latest revision, 9 unresolved discussions.


examples/managed-identity/kubernetes-msi.json, line 46 at r6 (raw file):

Previously, colemickens (Cole Mickens) wrote…

It becomes optional. For now, it still happens to be usable: the ACR integration will keep working in a scenario where the user activates MSI and still supplies creds.

I've chosen not to omit it because of the refactoring it would require, which I mentioned in my reply to Jingtao.

Done.


examples/managed-identity/kubernetes-msi.json, line 46 at r6 (raw file):

Previously, colemickens (Cole Mickens) wrote…

Done.

Done.


examples/managed-identity/kubernetes-msi.json.env, line 4 at r6 (raw file):

Previously, colemickens (Cole Mickens) wrote…

Done.

Done.


parts/kubernetesagentresourcesvmas.t, line 133 at r6 (raw file):

Previously, colemickens (Cole Mickens) wrote…

IMO, no. The ServicePrincipalProfile applies uniformly today, so this seems consistent.

Done.


parts/kubernetesagentresourcesvmas.t, line 205 at r6 (raw file):

Previously, colemickens (Cole Mickens) wrote…

Dunno why this showed in Reviewable for me to comment on but didn't post back here. Anyway, what I wrote:

I'm not sure actually. I don't know if go tmpl has a generic include function or if we use it anywhere. I'd selfishly say I'd prefer to punt on it given my lack of immediate knowledge.

Done.


parts/kubernetesmaster-kube-apiserver.yaml, line 51 at r2 (raw file):

Previously, colemickens (Cole Mickens) wrote…

Done.

Done.


examples/managed-identity/kubernetes.json, line 7 at r2 (raw file):

Previously, colemickens (Cole Mickens) wrote…

Done.

Done.


hack/rbacgenerator/function.js, line 1 at r2 (raw file):

Previously, colemickens (Cole Mickens) wrote…

This no longer applies since I've removed hack/ and everything in it in favor of the ARM template guid() function.

Done.


hack/rbacgenerator/raw.js, line 1 at r2 (raw file):

Previously, colemickens (Cole Mickens) wrote…

No longer applies. It's all been removed.

Done.


@colemickens colemickens merged commit 2d54601 into Azure:master Jul 14, 2017
@omerlh

omerlh commented Dec 20, 2017

Is there any documentation for this? Also, is there support for per-container authorization, e.g. granting only a specific container access to a given KeyVault?
