Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Additional GCP authentication #3541

Merged
merged 4 commits into from
Jan 25, 2023
Merged

Additional GCP authentication #3541

merged 4 commits into from
Jan 25, 2023

Conversation

winding-lines
Copy link
Contributor

Which issue does this PR close?

Closes #3533.

Rationale for this change

What changes are included in this PR?

This is WIP. I may be missing something, however workload identity federation is big. https://google.aip.dev/auth/4117. Definitely more than one of my weekends :) Is this what you had in mind @tustvold ?

I am happy to keep on pushing on this but I am wondering if we could maybe land the Application Default Credentials in the mean time because that would unblock some end-user scenarios.

Are there any user-facing changes?

@github-actions github-actions bot added the object-store Object Store Interface label Jan 17, 2023
@tustvold
Copy link
Contributor

tustvold commented Jan 17, 2023

I'm confused workload identity federation is different from workload identity? The ticket is for the former but this PR references the latter? I agree workload identity federation is highly complex, and may warrant a different approach, but I'm also not sure if this is something we need to support?

Workload identity is for when accessing GCP resources from within GCP, workload identity federation is when accessing GCP resources from a non-GCP environment

@winding-lines
Copy link
Contributor Author

Thanks for your reply. I am not using workload identity in my daily work, I am just trying to help move forward a great project. Could you be more specific about what the library should support?

I also looked in the azure and aws implementations but jt is still not clear to me.

Many thanks!

@tustvold
Copy link
Contributor

I think a good place to start would be just supporting workload identity (#3490) as documented here. This is very similar to the AWS InstanceCredentialProvider.

We can then look to extend this to support application default credentials as a follow up

@winding-lines
Copy link
Contributor Author

Ok, so the intent is just to fetch the token from the local metadata server when running on GKE and similar. Thanks for the clarification 🙏🏻

@winding-lines
Copy link
Contributor Author

@tustvold here is the code to implement

  1. fetching the token from the metadata server on Google Cloud
  2. fetching the token from the Application Default Credentials

I got an instance on Google Cloud and tested this code.

Looking forward to your feedback :)

@tustvold
Copy link
Contributor

I intend to review this first thing tomorrow

@winding-lines
Copy link
Contributor Author

Great! I work on this mostly in the weekends, but of course I don't expect weekend reviewers.

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good, mostly just some minor nits. I will also endeavour to give this a final test before merge

@@ -185,32 +181,6 @@ impl From<Error> for super::Error {
}
}

/// A deserialized `service-account-********.json`-file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is moved into credentials

object_store/src/gcp/credential.rs Outdated Show resolved Hide resolved
object_store/src/gcp/credential.rs Outdated Show resolved Hide resolved
object_store/src/gcp/credential.rs Outdated Show resolved Hide resolved
object_store/src/gcp/credential.rs Show resolved Hide resolved
object_store/src/gcp/credential.rs Outdated Show resolved Hide resolved
object_store/src/gcp/credential.rs Show resolved Hide resolved
object_store/src/gcp/credential.rs Show resolved Hide resolved
object_store/src/gcp/mod.rs Outdated Show resolved Hide resolved
object_store/src/gcp/credential.rs Outdated Show resolved Hide resolved
) -> Result<TemporaryToken<String>> {
println!("fetching token from metadata server");
const TOKEN_URL: &str =
"http://metadata/computeMetadata/v1/instance/service-accounts/default/token";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"http://metadata/computeMetadata/v1/instance/service-accounts/default/token";
"http://metadata.google.internal/computeMetadata/v1/instance/service-accounts/default/token";

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Google's documentation specifically has only the hostname without the google.internal domain. I was able to test with the hostname only. I see the value of using the full domain here but I prefer to stick with the documenation.

Copy link
Contributor

@jychen7 jychen7 Jan 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

understood, I think Google just have inconsistent documents. Both work for me.

ps: I was looking at https://cloud.google.com/kubernetes-engine/docs/concepts/workload-identity#metadata_server and it mentions the full hostname metadata.google.internal and IP 169.254.169.254. It also reference https://cloud.google.com/compute/docs/metadata/overview

I agree that https://cloud.google.com/docs/authentication/get-id-token#metadata-server only show metadata

(sorry, I can not close this comment, maybe no permission. Feel free to ignore)

@winding-lines
Copy link
Contributor Author

I think I addressed the current round of feedback, let me know what you think :)

Also tested with Application Default Credentials and Google cloud.

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some minor nits and I think this is ready to go, thank you

object_store/src/gcp/credential.rs Outdated Show resolved Hide resolved
object_store/src/gcp/credential.rs Outdated Show resolved Hide resolved
@tustvold tustvold merged commit 42b2d55 into apache:master Jan 25, 2023
@tustvold
Copy link
Contributor

Thanks again

@winding-lines
Copy link
Contributor Author

Thanks a lot @tustvold, looking forward to contributing more in the near future.

@ursabot
Copy link

ursabot commented Jan 25, 2023

Benchmark runs are scheduled for baseline = 73ce076 and contender = 42b2d55. 42b2d55 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@yonch
Copy link

yonch commented Jan 26, 2023

I was looking for this functionality just yesterday and saw this commit. Thank you @winding-lines and @tustvold for getting this out!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
object-store Object Store Interface
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement workload identity and application default credentials for GCP object store.
5 participants