Feature/s3 #20

Open · andersleung wants to merge 20 commits into develop
Conversation

andersleung (Collaborator)
No description provided.

@brainstorm left a comment

Overall looks good :) Other than the inner AWS region hardcoding, I'm curious how you'll handle the AWSCredentialProviderChain: within these Feature/s3 classes or somewhere else?

I'm thinking about giving access to S3 objects sitting behind an org that restricts public S3 object access, so either CloudFront, presigned URLs, or another mechanism must be in place... again, perhaps out of scope and the wrong abstraction level for this PR, apologies if so ;)

/cc @ohofmann @victorskl @resingerf

private static S3Client mainClient = S3Client.builder()
        .region(Region.US_EAST_1)
        .build();


Hardcoded region; it should be parametrizable externally, perhaps via the CLI, so that it can easily be passed to a Docker container during deployment (we are in the AWS Sydney region :)).
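As a minimal sketch of one way to externalize that default region (the environment variable name DRS_S3_REGION is hypothetical, not something from this PR):

    import software.amazon.awssdk.regions.Region;
    import software.amazon.awssdk.services.s3.S3Client;

    public class S3ClientFactory {
        // Read the default region from an environment variable (hypothetical
        // name DRS_S3_REGION), falling back to us-east-1 when it is not set.
        static S3Client buildDefaultClient() {
            String regionName = System.getenv().getOrDefault("DRS_S3_REGION", "us-east-1");
            return S3Client.builder()
                    .region(Region.of(regionName))
                    .build();
        }
    }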

URI uri = URI.create(objectPath);
bucket = uri.getHost();
// Strip leading /
key = uri.getPath().substring(1);


This might be a bit clearer/safer?: uri.getPath().replaceAll("^/+", "")
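For reference, a small sketch of the bucket/key parse with that change applied (assuming objectPath holds an s3://bucket/key style URI, as in the surrounding code):

    // Parse s3://bucket/key into its parts, stripping any number of
    // leading slashes from the key instead of assuming exactly one.
    URI uri = URI.create(objectPath);
    String bucket = uri.getHost();
    String key = uri.getPath().replaceAll("^/+", "");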

@jb-adams (Member) commented Feb 3, 2021

thanks @brainstorm!

@andersleung, let's at least allow for multiple AWS regions (which you may have already covered).

Regarding credentials, signed URLs, etc., there are 2 scenarios, and we'd want the server to behave slightly differently in each case.

  1. The DRSObject and bytes are not access controlled, but the S3 bucket is private. Any unauthenticated client can access the data in S3, but must go through the DRS server. This case doesn't require user auth; the credentials to access the S3 bucket can be stored on the server itself (see the sketch after this comment).

  2. The DRSObject and bytes are access controlled; the S3 bucket is private, and only authorized users should be able to access it. This case requires user auth, so credentials need to be passed to the DRS server. In this case, we will use passports somehow to facilitate the authorization evaluation (and subsequent access to the bytes).

Any access control is out of scope for this PR, as we are just getting up and running with completely public data.
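For scenario 1, a rough sketch of handing out short-lived presigned GET URLs using credentials held on the server (AWS SDK v2 S3Presigner; the bucket and key arguments are placeholders, and this is only one of the possible mechanisms mentioned above):

    import java.time.Duration;
    import software.amazon.awssdk.services.s3.model.GetObjectRequest;
    import software.amazon.awssdk.services.s3.presigner.S3Presigner;
    import software.amazon.awssdk.services.s3.presigner.model.GetObjectPresignRequest;

    public class PresignExample {
        // Scenario 1 sketch: the server's own credentials (default provider chain)
        // sign a time-limited GET URL for an otherwise private object.
        static String presignedUrl(String bucket, String key) {
            try (S3Presigner presigner = S3Presigner.create()) {
                GetObjectRequest getRequest = GetObjectRequest.builder()
                        .bucket(bucket)
                        .key(key)
                        .build();
                GetObjectPresignRequest presignRequest = GetObjectPresignRequest.builder()
                        .signatureDuration(Duration.ofMinutes(15))
                        .getObjectRequest(getRequest)
                        .build();
                return presigner.presignGetObject(presignRequest).url().toString();
            }
        }
    }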

@andersleung (Collaborator, Author)

Just to clear one thing up: the default AWS region is just the region the server will try first when accessing an S3 bucket. If access fails, the server inspects the error and retries with the right region, which is more or less how Amazon's own libraries handle "multi-region" support. There isn't really an AWS region associated with the DRS server itself, but if you think being able to configure the default region would be useful, it should be easy to support.
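A rough sketch of that retry idea (assuming AWS SDK v2; on a wrong-region request S3 reports the bucket's actual region in the x-amz-bucket-region response header, which the sketch reads from the exception):

    import software.amazon.awssdk.regions.Region;
    import software.amazon.awssdk.services.s3.S3Client;
    import software.amazon.awssdk.services.s3.model.HeadBucketRequest;
    import software.amazon.awssdk.services.s3.model.S3Exception;

    public class RegionResolver {
        // Probe the bucket with the default-region client; if S3 rejects the
        // request because of the region, build a new client for the region
        // reported in the error response and use that instead.
        static S3Client clientForBucket(S3Client defaultClient, String bucket) {
            try {
                defaultClient.headBucket(HeadBucketRequest.builder().bucket(bucket).build());
                return defaultClient;
            } catch (S3Exception e) {
                String actualRegion = e.awsErrorDetails().sdkHttpResponse()
                        .firstMatchingHeader("x-amz-bucket-region")
                        .orElseThrow(() -> e);
                return S3Client.builder().region(Region.of(actualRegion)).build();
            }
        }
    }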

@jb-adams (Member) commented Feb 4, 2021

@brainstorm this AWS multi-region discussion raises an important issue: we need a more sophisticated means of specifying the various types of data sources (e.g. local files vs S3 buckets). Connected S3 bucket data sources require unique properties to be specified (region, bucket name, etc.). We've started #21 to handle this in the configuration and startup.
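Purely as an illustration of the kind of per-source properties such configuration might carry (the class and field names here are hypothetical, not taken from #21):

    // Hypothetical per-data-source configuration; the fields are illustrative only.
    public class S3DataSourceProperties {
        private String bucket;      // e.g. "example-bucket"
        private String region;      // e.g. "ap-southeast-2" (Sydney)
        private String keyPrefix;   // optional prefix under which DRS objects live

        // getters/setters omitted for brevity
    }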
