Allow to specify the STS endpoint for hive connector on S3 #10169

Conversation

clemensvonschwerin
Contributor

@clemensvonschwerin clemensvonschwerin commented Dec 3, 2021

Description

Allow to specify the STS endpoint for hive connector on S3

Is this change a fix, improvement, new feature, refactoring, or other?

Improvement.

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Hive/Iceberg/Delta connectors

How would you describe this change to a non-technical end user or system administrator?

Allow to specify the STS endpoint for hive connector on S3

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

() No release notes entries required.
(x) Release notes entries required with the following suggested text:

# Hive/Iceberg/Delta connectors
* Allow specifying STS endpoint to be used when connecting to S3. ({issue}`10169`)

@cla-bot

cla-bot bot commented Dec 3, 2021

Thank you for your pull request and welcome to our community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. In order for us to review and merge your code, please submit the signed CLA to cla@trino.io. For more information, see https://github.com/trinodb/cla.

@clemensvonschwerin clemensvonschwerin force-pushed the hive-allow-sts-endpoint-specification branch from 313534c to a998448 Compare December 3, 2021 12:09
@cla-bot cla-bot bot added the cla-signed label Dec 3, 2021
@kokosing
Member

kokosing commented Dec 3, 2021

Can you please explain what is the use case for this? Also how have you tested it?

@cvs-mckinsey

Can you please explain what is the use case for this? Also how have you tested it?

The use case is running a third-party S3 / AWS service provider such as MinIO in your setup while still depending on STS to get temporary credentials.
I tested this manually using S3 and STS provided by MinIO. If trino.sts.endpoint or trino.sts.region are not set, nothing changes; I only took the liberty of replacing the deprecated method of adding credentials with the way the AWS documentation now recommends.
If there is any way to test this automatically, please let me know.

@cla-bot cla-bot bot removed the cla-signed label Dec 3, 2021
@clemensvonschwerin clemensvonschwerin force-pushed the hive-allow-sts-endpoint-specification branch from b769fc6 to 2470411 Compare December 3, 2021 15:05
@cla-bot cla-bot bot added the cla-signed label Dec 3, 2021
@findepi findepi requested a review from alexjo2144 December 3, 2021 21:29
@findepi findepi changed the title from "Allow to specify the sts endpoint for hive connector on s3" to "Allow to specify the STS endpoint for hive connector on S3" Dec 3, 2021
@cla-bot cla-bot bot removed the cla-signed label Dec 6, 2021
@clemensvonschwerin clemensvonschwerin force-pushed the hive-allow-sts-endpoint-specification branch from 1f25ab3 to 05710de Compare December 6, 2021 15:59
@cla-bot cla-bot bot added the cla-signed label Dec 6, 2021
AWSSecurityTokenServiceClientBuilder stsClientBuilder =
        AWSSecurityTokenServiceClientBuilder.standard()
                .withCredentials(provider);
stsClientBuilder.withEndpointConfiguration(
Member

If stsRegionOverwrite is set but stsEndpointOverwrite is not, it looks like we want to call stsClientBuilder.withRegion.

Good point, I will update that

String stsEndpointOverwrite = conf.get(STS_ENDPOINT);
String stsRegionOverwrite = conf.get(STS_REGION);

if (stsEndpointOverwrite != null && stsRegionOverwrite != null) {
Member

Suggested change
- if (stsEndpointOverwrite != null && stsRegionOverwrite != null) {
+ if (!isNullOrEmpty(stsEndpointOverwrite) && !isNullOrEmpty(stsRegionOverwrite)) {

Looks good, I will update that
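
To illustrate why the suggested check differs from a plain null check, here is a minimal sketch. It is not code from the PR: the class name is hypothetical, a plain Map stands in for the Hadoop Configuration, and the local isNullOrEmpty helper stands in for Guava's Strings.isNullOrEmpty. The key names are the ones this thread uses at this point.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: StsOverrideCheck is a hypothetical name, and a plain Map
// stands in for the Hadoop Configuration used in the PR.
class StsOverrideCheck
{
    // Local stand-in for Guava's Strings.isNullOrEmpty.
    static boolean isNullOrEmpty(String value)
    {
        return value == null || value.isEmpty();
    }

    // True only when both overrides carry a non-empty value.
    static boolean bothOverridesSet(Map<String, String> conf)
    {
        return !isNullOrEmpty(conf.get("trino.sts.endpoint"))
                && !isNullOrEmpty(conf.get("trino.sts.region"));
    }

    public static void main(String[] args)
    {
        Map<String, String> conf = new HashMap<>();
        conf.put("trino.sts.endpoint", ""); // present but empty
        conf.put("trino.sts.region", "us-east-1");

        // A bare null check accepts the empty endpoint as an override.
        boolean nullCheckOnly = conf.get("trino.sts.endpoint") != null
                && conf.get("trino.sts.region") != null;

        System.out.println(nullCheckOnly);          // prints "true"
        System.out.println(bothOverridesSet(conf)); // prints "false"
    }
}
```

With the null-only check, an empty endpoint value would be handed on as if it were a real override, which is the case the suggestion guards against.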

.withExternalId(externalId)
.withLongLivedCredentialsProvider(provider)
.build();
String stsEndpointOverwrite = conf.get(STS_ENDPOINT);
Member

@alexjo2144 alexjo2144 Dec 6, 2021

I'd format this block a bit differently, withLongLivedCredentialsProvider is also deprecated so using AWSSecurityTokenServiceClientBuilder is preferred in all cases:

String stsEndpointOverwrite = conf.get(STS_ENDPOINT);
String stsRegionOverwrite = conf.get(STS_REGION);

AWSSecurityTokenServiceClientBuilder stsClient = ...
    .withCredentials(provider);

if (<both overrides are set>) {
    stsClient.withEndpointConfiguration(...);
}
else if (<region is set>) {
    stsClient.withRegion(...);
}

provider = new STSAssumeRoleSessionCredentialsProvider.Builder(iamRole, "trino-session")
    .with(<all the common properties>)
    .withStsClient(stsClient.build())
    .build();
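
The branching in the outline above can be sketched without the AWS SDK. This is only a model of the decision, not the PR's code: the class, enum, and method names are hypothetical, and each enum constant notes which builder call from the outline it stands for.

```java
// Hypothetical model of the branching outlined above; no AWS SDK calls are
// made. Each enum constant notes the builder call it stands for.
class StsClientSettingsSketch
{
    enum Mode
    {
        ENDPOINT_AND_REGION, // stands for stsClient.withEndpointConfiguration(...)
        REGION_ONLY,         // stands for stsClient.withRegion(...)
        SDK_DEFAULTS         // neither branch taken, builder left untouched
    }

    static boolean isSet(String value)
    {
        return value != null && !value.isEmpty();
    }

    static Mode chooseMode(String endpointOverride, String regionOverride)
    {
        if (isSet(endpointOverride) && isSet(regionOverride)) {
            return Mode.ENDPOINT_AND_REGION;
        }
        if (isSet(regionOverride)) {
            return Mode.REGION_ONLY;
        }
        return Mode.SDK_DEFAULTS;
    }

    public static void main(String[] args)
    {
        System.out.println(chooseMode("http://localhost:9000", "us-east-1")); // ENDPOINT_AND_REGION
        System.out.println(chooseMode(null, "eu-central-1"));                 // REGION_ONLY
        System.out.println(chooseMode(null, null));                           // SDK_DEFAULTS
    }
}
```

Note that with exactly this branch structure, an endpoint given without a region matches neither branch and falls through to the defaults.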

I thought the same and wanted to get rid of the deprecated withLongLivedCredentialsProvider as well, but that resulted in errors in TrinoS3FileSystemTest: com.amazonaws.SdkClientException: Unable to find a region via the region provider chain. Must provide an explicit region in the builder or setup environment to supply a region.

So the question is: fix the test setup, stay with the deprecated method, or use an option I have not thought of?

Member

You probably need to use the default region provider chain, and if that comes up empty use us-east-1

Good point, I updated the PR accordingly
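
The fallback suggested above can be sketched roughly like this. It is a stand-alone illustration with no AWS SDK dependency: the Supplier stands in for the SDK's default region provider chain, RuntimeException stands in for the SdkClientException from the failing test, and the class and method names are hypothetical.

```java
import java.util.function.Supplier;

// Hypothetical sketch with no AWS SDK dependency: the Supplier stands in for
// the SDK's default region provider chain, and RuntimeException for the
// SdkClientException raised when no region can be found.
class RegionFallbackSketch
{
    static final String DEFAULT_REGION = "us-east-1";

    static String resolveRegion(Supplier<String> regionChain)
    {
        try {
            String region = regionChain.get();
            if (region != null && !region.isEmpty()) {
                return region;
            }
        }
        catch (RuntimeException e) {
            // The chain found nothing; fall through to the default.
        }
        return DEFAULT_REGION;
    }

    public static void main(String[] args)
    {
        System.out.println(resolveRegion(() -> "eu-central-1")); // prints "eu-central-1"
        System.out.println(resolveRegion(() -> {
            throw new RuntimeException("Unable to find a region via the region provider chain");
        })); // prints "us-east-1"
    }
}
```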

@cvs-mckinsey

@alexjo2144 thank you very much for your review. I will try to update the PR tomorrow based on your suggestions and I am looking forward to your answer to my question.

@clemensvonschwerin clemensvonschwerin force-pushed the hive-allow-sts-endpoint-specification branch from bfce7ca to f079163 Compare December 17, 2021 08:23
@cla-bot cla-bot bot removed the cla-signed label Dec 17, 2021
@clemensvonschwerin clemensvonschwerin force-pushed the hive-allow-sts-endpoint-specification branch from 9bd1c9a to 4f3515d Compare December 17, 2021 12:58
@cla-bot cla-bot bot added the cla-signed label Dec 17, 2021
@cvs-mckinsey

@alexjo2144 @electrum @kokosing hey guys, I hope you had a fantastic Christmas and a good start to 2022. It would be great if you could find the time to check this PR again, since all of the checks are passing now. Thank you!

Comment on lines 968 to 970
final String defaultRegion = "us-east-1";
log.debug("Falling back to default AWS region " + defaultRegion);
stsClientBuilder.withRegion(defaultRegion);
Member

I think we should throw here instead of hard-coding a fallback to us-east-1.

Thank you for looking into the PR. Unfortunately, your review directly contradicts @alexjo2144's earlier review comment. I am fine with either approach; I would just like to know which one to take to get this PR approved.

Member

For what it's worth, we default to us-east-1 when creating the AmazonS3 client object: https://github.com/trinodb/trino/blob/master/plugin/trino-hive/src/main/java/io/trino/plugin/hive/s3/TrinoS3FileSystem.java#L885

Member

fair :)

Comment on lines 202 to 203
public static final String STS_ENDPOINT = "trino.sts.endpoint";
public static final String STS_REGION = "trino.sts.region";
Member

As these are used only for accessing S3 I would use trino.s3.sts.endpoint/region. And S3_STS_* as constant names.

Makes sense to me, will update as soon as I know how to proceed regarding your first comment.

@cla-bot cla-bot bot removed the cla-signed label Jan 25, 2022
@clemensvonschwerin clemensvonschwerin force-pushed the hive-allow-sts-endpoint-specification branch from 3be751b to 203a634 Compare January 25, 2022 14:43
@clemensvonschwerin clemensvonschwerin force-pushed the hive-allow-sts-endpoint-specification branch 3 times, most recently from fbea926 to c8f6b6f Compare March 9, 2022 12:31
Comment on lines 982 to 983
String stsEndpointOverwrite = conf.get(S3_STS_ENDPOINT);
String stsRegionOverwrite = conf.get(S3_STS_REGION);
Member

nit: Overwrite -> Override

    region = regionProviderChain.getRegion();
}
catch (SdkClientException ex) {
    log.debug("Falling back to default AWS region " + US_EAST_1);
Member

log as WARN

@losipiuk
Member

losipiuk commented Mar 9, 2022

LGTM. @clemensvonschwerin did you have a chance to test it? I do not believe it is covered by automated tests.
@alexjo2144 can you please take another look?

Member

@alexjo2144 alexjo2144 left a comment

Mostly style nits, one real question.

@losipiuk This needs a Trino based mirror PR to run the S3 tests right?

Comment on lines 985 to 986
AWSSecurityTokenServiceClientBuilder stsClientBuilder =
        AWSSecurityTokenServiceClientBuilder.standard()
                .withCredentials(provider);
Member

You can inline the first two lines

Suggested change
- AWSSecurityTokenServiceClientBuilder stsClientBuilder =
-         AWSSecurityTokenServiceClientBuilder.standard()
-                 .withCredentials(provider);
+ AWSSecurityTokenServiceClientBuilder stsClientBuilder = AWSSecurityTokenServiceClientBuilder.standard()
+         .withCredentials(provider);

Member

I'd also move this down to right before it's used

Comment on lines 994 to 995
DefaultAwsRegionProviderChain regionProviderChain =
        new DefaultAwsRegionProviderChain();
Member

No need to wrap this line

Suggested change
- DefaultAwsRegionProviderChain regionProviderChain =
-         new DefaultAwsRegionProviderChain();
+ DefaultAwsRegionProviderChain regionProviderChain = new DefaultAwsRegionProviderChain();

Comment on lines 1006 to 1007
stsClientBuilder.withEndpointConfiguration(
        new EndpointConfiguration(stsEndpointOverwrite, region));
Member

Here too

Suggested change
- stsClientBuilder.withEndpointConfiguration(
-         new EndpointConfiguration(stsEndpointOverwrite, region));
+ stsClientBuilder.withEndpointConfiguration(new EndpointConfiguration(stsEndpointOverwrite, region));

@@ -973,9 +979,40 @@ private AWSCredentialsProvider createAwsCredentialsProvider(URI uri, Configurati
.orElseGet(DefaultAWSCredentialsProviderChain::getInstance);

if (iamRole != null) {
String stsEndpointOverwrite = conf.get(S3_STS_ENDPOINT);
Member

I'm not super familiar with the need for STS region/endpoint overrides. Is it only ever useful when an IAM role is used, or could it also be used without a role?

Since the STS service is used to assume a specific IAM role and get temporary credentials for that role, I do not see any use case for these overrides without an IAM role being set.

@cla-bot cla-bot bot removed the cla-signed label Mar 9, 2022
@clemensvonschwerin clemensvonschwerin force-pushed the hive-allow-sts-endpoint-specification branch from ef3a4b7 to 3caac54 Compare March 9, 2022 17:27
@cla-bot cla-bot bot added the cla-signed label Mar 9, 2022
@cvs-mckinsey

LGTM. @clemensvonschwerin did you have a chance to test it? I do not believe it is covered by automated tests. @alexjo2144 can you please take another look?

An older version of this PR has been running on our dev and production Trino instances for a while and works without issues. It is what makes the combination of Trino with S3 and STS on MinIO possible for our dev systems.

@clemensvonschwerin clemensvonschwerin force-pushed the hive-allow-sts-endpoint-specification branch 2 times, most recently from f2be8c9 to 075e504 Compare March 14, 2022 09:01
@losipiuk
Member

@losipiuk This needs a Trino based mirror PR to run the S3 tests right?

#11461

@losipiuk
Member

@losipiuk This needs a Trino based mirror PR to run the S3 tests right?

#11461

passed

Member

@losipiuk losipiuk left a comment

@clemensvonschwerin Please also add documentation to hive-s3.rst

@clemensvonschwerin clemensvonschwerin force-pushed the hive-allow-sts-endpoint-specification branch from 075e504 to 9905c4c Compare March 15, 2022 09:49
@cvs-mckinsey

@clemensvonschwerin Please also add documentation to hive-s3.rst

@losipiuk sure, I updated hive-s3.rst. Is there anything else I would need to do before we can get this merged ?

@github-actions github-actions bot added the docs label Mar 15, 2022
@losipiuk
Member

@clemensvonschwerin Please also add documentation to hive-s3.rst

@losipiuk sure, I updated hive-s3.rst. Is there anything else I would need to do before we can get this merged ?

No, other than that I think it is good to go. But please fix the doc error :)

@losipiuk losipiuk force-pushed the hive-allow-sts-endpoint-specification branch from 9905c4c to 3d2bbc7 Compare March 15, 2022 14:36
@losipiuk
Member

I updated it myself; let's wait for CI.

@losipiuk losipiuk force-pushed the hive-allow-sts-endpoint-specification branch from 3d2bbc7 to f500388 Compare March 15, 2022 14:37
@losipiuk losipiuk force-pushed the hive-allow-sts-endpoint-specification branch from f500388 to 0d66aa4 Compare March 15, 2022 14:39
@losipiuk losipiuk merged commit 010e840 into trinodb:master Mar 16, 2022
@github-actions github-actions bot added this to the 374 milestone Mar 16, 2022
5 participants