Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Azure: Add support for Azure Workload Identity authentication using DefaultAzureCredential #82

Merged
merged 1 commit into from
Nov 12, 2023

Conversation

rikhil-s
Copy link
Contributor

@rikhil-s rikhil-s commented Oct 19, 2023

  • I added CHANGELOG entry for this change.
  • Change is not relevant to the end user.

Changes

  • Add support for Azure Workload Identity authentication using DefaultAzureCredential
    • Bump azidentity from 1.2.0 to 1.4.0 to enable Workload Identity authentication

Verification

  • Have tested by building a thanos image with these changes and using it in an:
    • AKS cluster set up to use Workload Identity
    • AKS cluster set up to use VM Managed Identity
  • In both scenarios, thanos components which use objstore successfully authenticated with Azure

@rikhil-s rikhil-s force-pushed the workloadidentity branch 2 times, most recently from 85a8bb6 to 4f2667b Compare October 19, 2023 15:53
@rikhil-s rikhil-s marked this pull request as draft October 19, 2023 15:54
@rikhil-s rikhil-s closed this Oct 19, 2023
@rikhil-s rikhil-s reopened this Oct 19, 2023
@rikhil-s rikhil-s closed this Oct 19, 2023
@rikhil-s rikhil-s reopened this Oct 19, 2023
@rikhil-s rikhil-s force-pushed the workloadidentity branch 4 times, most recently from 93a3646 to 434c07b Compare October 19, 2023 17:15
@rikhil-s rikhil-s marked this pull request as ready for review October 19, 2023 17:16
@rikhil-s
Copy link
Contributor Author

@kakkoyun Please could I get a review of this PR. Thanks!

@rikhil-s rikhil-s force-pushed the workloadidentity branch 2 times, most recently from 44f81a3 to 6422764 Compare October 27, 2023 08:19
@audunsolemdal
Copy link

Just a friendly note - I think you are able to use WorkloadIdentityCredential which is more accurate for an in-place replacement

@rikhil-s
Copy link
Contributor Author

Just a friendly note - I think you are able to use WorkloadIdentityCredential which is more accurate for an in-place replacement

Thanks for the suggestion! I did look into using WorkloadIdentityCredential, but using that would mean users would no longer be able to authenticate using VM Managed Identities - which is something Thanos currently supports.

DefaultAzureCredential has the benefit of supporting all types of authentication (providing the correct inputs exist or are passed to it) and attempts to authenticate in the order described here.

@nbjohnson
Copy link

Great to see workload identity support being added for azure storage connections. Is this going to support sovereign clouds? From what I can see it doesn't look to be an option. Usually there is a cloud client option set for both the auth call and for the actual storage call. It would be great to see additional cloud env support included with the workload identity support

@rikhil-s
Copy link
Contributor Author

rikhil-s commented Nov 1, 2023

Great to see workload identity support being added for azure storage connections. Is this going to support sovereign clouds? From what I can see it doesn't look to be an option. Usually there is a cloud client option set for both the auth call and for the actual storage call. It would be great to see additional cloud env support included with the workload identity support

I'm not entirely sure what sovereign cloud is but doing a quick online search suggests it's something to do with being compliant with local laws ensuring all data stays on sovereign soil. I'm not sure how that ties in with making a request for an authorisation token though!

The DefaultAzureCredential being used here allows authentication using all the methods described here which includes the Azure CLI if that's what you mean by cloud client option.

@rikhil-s
Copy link
Contributor Author

rikhil-s commented Nov 1, 2023

@vglafirov @phillebaba As Azure Storage Account maintainers please could I get a review of this PR. Currently, the only two supported ways to authenticate with Azure in Thanos are either using VM Managed Identities or AAD Pod Identity. AAD Pod Identity is end of life, so there is currently only one safe way to authenticate with Azure in Thanos. This PR will allow for authentication using Azure Workload Identity which is the Azure recommended replacement for AAD Pod Identity. I think this change will be useful for a lot of users of Thanos! Thanks

@nbjohnson
Copy link

@rikhil-s So sovereign clouds are special cloud environments that use different endpoints and any call to them you need to pass it the cloud config for that environment otherwise it will always call the the main public cloud endpoints which won't work. Both the identity call as well as the storage calls need to be updated to pass the correct cloud environment option.

externaldns recently had to fix support for it as well: kubernetes-sigs/external-dns#3942, I can point out some of the key points to set so it can hopefully be replicated here.

This is the line that set the client options to use the correct cloud for the identity call: https://github.com/kubernetes-sigs/external-dns/blob/master/provider/azure/config.go#L74. Looks like NewDefaultAzureCredential takes an options object which can take a client options object which can include a cloud setting: https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/azidentity#DefaultAzureCredential, https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/azcore@v1.8.0/policy#ClientOptions

As for the actual storage calls, I think it would just be setting the cloud option in the already defined options object: https://github.com/thanos-io/objstore/blob/main/providers/azure/helpers.go#L27 and just make sure it gets passed to all calls. I am not sure of the specifics of the storage calls, but here is an example in externaldns of where a resource call passes client options that include the cloud environment: https://github.com/kubernetes-sigs/external-dns/blob/master/provider/azure/azure.go#L78

I know that was a lot, hopefully that information helps, would be awesome if we could get additional cloud environment support directly with workload identity support. Just want to be able to set a variable for which azure environment to use and for it to be able to work.

@rikhil-s
Copy link
Contributor Author

rikhil-s commented Nov 2, 2023

@rikhil-s So sovereign clouds are special cloud environments that use different endpoints and any call to them you need to pass it the cloud config for that environment otherwise it will always call the the main public cloud endpoints which won't work. Both the identity call as well as the storage calls need to be updated to pass the correct cloud environment option.

externaldns recently had to fix support for it as well: kubernetes-sigs/external-dns#3942, I can point out some of the key points to set so it can hopefully be replicated here.

This is the line that set the client options to use the correct cloud for the identity call: https://github.com/kubernetes-sigs/external-dns/blob/master/provider/azure/config.go#L74. Looks like NewDefaultAzureCredential takes an options object which can take a client options object which can include a cloud setting: https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/azidentity#DefaultAzureCredential, https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/azcore@v1.8.0/policy#ClientOptions

As for the actual storage calls, I think it would just be setting the cloud option in the already defined options object: https://github.com/thanos-io/objstore/blob/main/providers/azure/helpers.go#L27 and just make sure it gets passed to all calls. I am not sure of the specifics of the storage calls, but here is an example in externaldns of where a resource call passes client options that include the cloud environment: https://github.com/kubernetes-sigs/external-dns/blob/master/provider/azure/azure.go#L78

I know that was a lot, hopefully that information helps, would be awesome if we could get additional cloud environment support directly with workload identity support. Just want to be able to set a variable for which azure environment to use and for it to be able to work.

Ok that makes sense. I think I would be reluctant to include those changes in this PR as I don't really have the knowledge or expertise to test that. CI doesn't currently test Azure object storage client implementations so there would need to be a lot of local testing involved which I can't commit to at this time.

@jellis18
Copy link

jellis18 commented Nov 8, 2023

I’ve been putting off using thanos for this reason and was about to do this myself. Hopefully this few line PR can be reviewed at some point.

@jellis18
Copy link

jellis18 commented Nov 8, 2023

@rikhil-s can you add reviewers?
image

@rikhil-s
Copy link
Contributor Author

rikhil-s commented Nov 8, 2023

@rikhil-s can you add reviewers? image

It won't let me add reviewers - I think only maintainers can add themselves as reviewers. I have posted in the slack channel for a review but haven't received much engagement apart from @MichaHoffmann who is trying to get in contact with one of the Azure Storage Account maintainers privately to get this reviewed and merged.

@yeya24
Copy link
Contributor

yeya24 commented Nov 10, 2023

I am ok to switch to NewDefaultAzureCredential if it is preferred. But wondering if we have a way to not set the env var in this library? Like pass the ID directly to the credential somehow.
Doesn't feel like a good practice to me.

@rikhil-s rikhil-s force-pushed the workloadidentity branch 3 times, most recently from 85dfb6e to b5278e7 Compare November 10, 2023 15:48
@rikhil-s rikhil-s changed the title Azure: Authenticate using DefaultAzureCredential enabling support for Azure Workload Identity Azure: Add support for Azure Workload Identity authentication using DefaultAzureCredential Nov 10, 2023
@rikhil-s
Copy link
Contributor Author

I am ok to switch to NewDefaultAzureCredential if it is preferred. But wondering if we have a way to not set the env var in this library? Like pass the ID directly to the credential somehow. Doesn't feel like a good practice to me.

See thread above, but we have now worked around using the env var whilst maintaining support for VM Managed Identities.

Signed-off-by: Rikhil Shah <rikhil.shah@metaswitch.com>
Copy link
Contributor

@yeya24 yeya24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@jellis18
Copy link

Thanks

So after this is merged in, what is the protocol for getting this into the main thanos image? After that happens, I'd be happy to update the documentation for this, unless @rikhil-s wants to handle that too.

@yeya24
Copy link
Contributor

yeya24 commented Nov 12, 2023

@MichaHoffmann I will go ahead and merge this pr. Incase you have any other comments we can follow up in next pr.

So after this is merged in, what is the protocol for getting this into the main thanos image? After that happens, I'd be happy to update the documentation for this, unless @rikhil-s wants to handle that too.

@jellis18 We don't have a doc for the upgrade process. But you just need to do go get to pull the latest objstore library in Thanos and update doc wherever it is needed

@yeya24 yeya24 merged commit 37752ee into thanos-io:main Nov 12, 2023
7 checks passed
@MichaHoffmann
Copy link
Contributor

Hey!

No this looks great to me.

@rikhil-s
Copy link
Contributor Author

rikhil-s commented Nov 13, 2023

Thanks

So after this is merged in, what is the protocol for getting this into the main thanos image? After that happens, I'd be happy to update the documentation for this, unless @rikhil-s wants to handle that too.

I can create the PR for the main Thanos image and if you're happy to handle the documentation, that'd be great!

@MichaHoffmann @yeya24 What's the process for releasing a patch release? From reading the release process docs, I can see that major releases are on a 6 weekly cadence but I would like to release a 0.32.6 of Thanos with these changes if possible? Is this something which could be done?

@MichaHoffmann
Copy link
Contributor

0.33-rc0 is in progress; we could add it there maybe

@rikhil-s
Copy link
Contributor Author

0.33-rc0 is in progress; we could add it there maybe

When is that being released? I guess putting these changes in a 0.32.6 wouldn't be semantically correct as this is not a patch / bug fix

@MichaHoffmann
Copy link
Contributor

Sometime early this week probably; if we add this it would be -rc1 i think

@rikhil-s
Copy link
Contributor Author

Sometime early this week probably; if we add this it would be -rc1 i think

Ok, let's aim to get it in there. Will create the PR in Thanos now to pull in these changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants