Support upload directory in s3 transfer manager #2743
PutObjectRequest putObjectRequest = PutObjectRequest.builder()
                                                    .bucket(uploadDirectoryRequest.bucket())
                                                    .key(key)
                                                    .build();
Currently, we are just passing bucket and key. I'm thinking about introducing a transfer extension (similar to the core execution interceptor) to allow users to modify other parameters in PutObjectRequest.
They could do this with an ExecutionInterceptor on the underlying S3 client, once we support configuring the underlying S3 client, right?
I suppose a putObjectRequestSupplier(Supplier<PutObjectRequest>) on the UploadDirectoryRequest would be more friendly, especially if they want request-level parameter configuration. +1
> They could do this with an ExecutionInterceptor on the underlying S3 client, once we support configuring the underlying S3 client, right?
Yes, that's right.
> I suppose a putObjectRequestSupplier(Supplier<PutObjectRequest>) on the UploadDirectoryRequest would be more friendly, especially if they want request-level parameter configuration. +1
+1. It would be much easier than what I proposed. I suppose we can use Consumer<PutObjectRequest.Builder> to make it clear that this is a modification.
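As a rough illustration of the Consumer-based hook suggested here, the sketch below uses a hypothetical stand-in class, PutRequest, so that it compiles without the SDK. PutRequest, RequestCustomization, and requestFor are assumptions made for this example and are not part of the transfer manager API.

```java
import java.util.function.Consumer;

// Hypothetical stand-in for PutObjectRequest; NOT the SDK class.
final class PutRequest {
    final String bucket;
    final String key;
    final String contentType;

    private PutRequest(Builder b) {
        this.bucket = b.bucket;
        this.key = b.key;
        this.contentType = b.contentType;
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        private String bucket;
        private String key;
        private String contentType;

        Builder bucket(String bucket) { this.bucket = bucket; return this; }
        Builder key(String key) { this.key = key; return this; }
        Builder contentType(String contentType) { this.contentType = contentType; return this; }
        PutRequest build() { return new PutRequest(this); }
    }
}

// Hypothetical helper showing where a user-supplied customizer could be applied.
final class RequestCustomization {
    // Bucket and key are set first, then the caller's Consumer may adjust the builder.
    static PutRequest requestFor(String bucket, String key, Consumer<PutRequest.Builder> customizer) {
        PutRequest.Builder builder = PutRequest.builder().bucket(bucket).key(key);
        customizer.accept(builder);
        return builder.build();
    }
}
```

Because the customizer runs after bucket and key are set, a caller could also overwrite those two fields. Whether that should be allowed is a design question this sketch does not settle.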
I agree that offering a way to hook into the resulting PutObjectRequest is crucial.
There are a few other use cases I can think of:
- I want to be able to supply my own Path-filters. For example, I may want to ensure I exclude sensitive file extensions. E.g., a Predicate<Path>.
- I want to be able to customize a key name. For example, I may want to force everything to lowercase. E.g., a configurable Function<Path, String>, or a UnaryOperator<String>.
- I want to be able to transform file contents before they are uploaded. For example, I may want to compress everything before uploading.
Not all of these are needed now, but it would be good to make sure we have some type of idea of the path forward here.
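The first two use cases can be sketched with plain java.nio and java.util.function types. DirectoryKeys and keysFor are hypothetical names for this example only; nothing here is an existing transfer manager API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: lists the object keys a directory upload would produce.
final class DirectoryKeys {
    // Walks root, keeps regular files accepted by the filter, and maps each one to a key.
    static List<String> keysFor(Path root, Predicate<Path> filter, Function<Path, String> keyMapper)
            throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(filter)
                        .map(keyMapper)
                        .sorted()
                        .collect(Collectors.toList());
        }
    }
}
```

A caller could pass p -> !p.getFileName().toString().endsWith(".pem") as the filter and p -> root.relativize(p).toString().toLowerCase(Locale.ROOT) as the key mapper to exclude a sensitive extension and force lowercase keys.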
Nice summary! I'll create a backlog item for this specific feature because it seems out of scope of this PR, and is additive.
 */
@SdkPublicApi
@SdkPreviewApi
public final class UploadDirectoryConfiguration implements ToCopyableBuilder<UploadDirectoryConfiguration.Builder,
How are transfer-manager-level retries handled?
We don't have any transfer-manager-level retries for now. We can definitely add them in the future, for example by automatically retrying failed files.
@@ -42,6 +42,7 @@
    private final Double targetThroughputInGbps;
    private final Integer maxConcurrency;
    private final ClientAsyncConfiguration asyncConfiguration;
    private final UploadDirectoryConfiguration uploadDirectoryConfiguration;
Since TransferManager is a construct that sits on top of S3Client, I do feel like this configuration belongs better at the TransferManager-level.
Currently we only have UploadDirectoryConfiguration. Once we have a similar Download configuration, how would we wish to handle any overlap between the two configurations?
Yeah, agreed. We will allow customers to pass an S3 client in the future and will likely make s3ClientConfiguration and s3Client mutually exclusive, which would make configuring uploadDirectoryConfiguration awkward if we keep it inside s3ClientConfiguration.
I'll create a TransferManagerConfiguration and move the options there for now. There are other things we need to consider, e.g., do we want to have separate upload directory and download directory configurations? Separating them is more flexible, but will increase verbosity and affect usability.
Do you think this is a blocker for this PR, or is it fine to address it when we have the API surface area review? (I'll still move the UploadDirectoryConfiguration to TransferManagerConfiguration.)
        new TransferConfigurationOption<>("UploadDirectoryFileVisitOption", Boolean.class);

    private static final int DEFAULT_UPLOAD_DIRECTORY_MAX_DEPTH = Integer.MAX_VALUE;
Should we consider a more conservative default to protect against cycles? 2 billion is a pretty large default, but I'm not sure how to settle on the right value. Maybe we could try to estimate the maximum possible depth of a typical file system.
Good point. Added a TODO for this. We will revisit all of the default settings together.
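For context on the depth default, here is a small sketch of a depth-bounded traversal using Files.walk(Path, int, FileVisitOption...). BoundedWalk and filesWithin are hypothetical names for this example. Per the Files.walk documentation, when FOLLOW_LINKS is set the walk tracks visited directories and reports a detected cycle as a FileSystemLoopException, so a depth cap mainly bounds how much of the tree is traversed.

```java
import java.io.IOException;
import java.nio.file.FileVisitOption;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: collects regular files no deeper than maxDepth below root.
final class BoundedWalk {
    static List<Path> filesWithin(Path root, int maxDepth, boolean followSymlinks) throws IOException {
        FileVisitOption[] options = followSymlinks
                ? new FileVisitOption[] {FileVisitOption.FOLLOW_LINKS}
                : new FileVisitOption[0];
        // maxDepth = 1 visits only the direct children of root;
        // Integer.MAX_VALUE visits every level.
        try (Stream<Path> paths = Files.walk(root, maxDepth, options)) {
            return paths.filter(Files::isRegularFile).collect(Collectors.toList());
        }
    }
}
```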
3be1e08 to 0228ea5
CompletableFuture.runAsync(() -> doUploadDirectory(returnFuture, uploadDirectoryRequest),
                           transferConfiguration.option(TransferConfigurationOption.EXECUTOR));
CompletableFuture.runAsync and ignoring the result scares me. If doUploadDirectory fails without completing the returnFuture, we'll never complete the returnFuture.
Should we just do the delegating to the executor ourselves and add the exception handling, or should we rely on the result of CompletableFuture.runAsync instead of managing our own returnFuture?
Yeah, I can add exception handling as well as try-catch logic in doUploadDirectory. I'm not sure how else we can implement cancellation logic without managing our own returnFuture, though.
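One way to address the concern raised in this thread is to observe the future returned by CompletableFuture.runAsync and forward any failure to the separately managed returnFuture. The sketch below is a standalone illustration: AsyncStart and start are hypothetical names, and the task parameter stands in for doUploadDirectory.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

// Hypothetical helper, not the PR's implementation.
final class AsyncStart {
    // Runs the task on the executor. If the task throws before completing returnFuture,
    // the failure is forwarded so the caller is never left waiting on a future that
    // will not complete.
    static <T> CompletableFuture<T> start(Consumer<CompletableFuture<T>> task, Executor executor) {
        CompletableFuture<T> returnFuture = new CompletableFuture<>();
        CompletableFuture.runAsync(() -> task.accept(returnFuture), executor)
                         .whenComplete((ignored, t) -> {
                             if (t != null) {
                                 // runAsync wraps the thrown exception in a CompletionException.
                                 Throwable cause = (t instanceof CompletionException && t.getCause() != null)
                                         ? t.getCause()
                                         : t;
                                 // No effect if returnFuture was already completed by the task.
                                 returnFuture.completeExceptionally(cause);
                             }
                         });
        return returnFuture;
    }
}
```

The caller still receives a future it owns, so cancellation logic can stay attached to returnFuture.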
  */
 @SdkPublicApi
 @SdkPreviewApi
-public interface CompletedUploadDirectory extends CompletedTransfer {
+public final class CompletedUploadDirectory implements CompletedTransfer {
Should this implement CompletedFileTransfer instead?
@@ -20,19 +20,15 @@
 import software.amazon.awssdk.annotations.SdkPublicApi;

 /**
- * A failed single file upload transfer.
+ * Represents a completed file transfer.
Can we specify if this is for concrete files only, or also directories? Both are represented via File/Path, so it's not immediately obvious to me.
import software.amazon.awssdk.annotations.SdkPublicApi;

/**
 * Represents a failed single file transfer in a multi-file transfer operation such as
Could we just have a common interface for any "file based transfer"? That would include:
1. Requests to upload a single file
2. Requests to upload a directory of files
3. The resulting individual single file transfers from item 2
They all point to a Path. I'm not sure if we need to distinguish at the interface-level beyond that?
}).collect(Collectors.toList());
entries.forEach(path -> {
    CompletableFuture<CompletedUpload> future =
        uploadSingleFile(uploadDirectoryRequest, failedUploads, path, phaser);
Nice job extracting helper methods like uploadSingleFile (and in other places). It definitely helps readability.
-        // Replace "\" (Windows FS) with "/"
-        return relativePathName.replace('\\', '/');
+        String separator = fileSystem.getSeparator();
+        return relativePathName.replace(separator, delimiter);
I believe repeated calls to replace(..) can be very expensive (enough to show up on a profiler in a non-trivial way). Can we save this as a Pattern?
Glad you brought it up! 😄 🤝 (microbenchmark enthusiasts handshaking). In Java 9+, String.replace actually doesn't use Pattern under the hood anymore, and it is much faster than String.replaceAll, which uses pattern matching. Per this blog post, in Java 11, String.replace performs much better than String.replaceAll. So I think we should optimize for Java 11. Ideally, we should prefer String.replace(char, char), but this does not work for our case.
I think we can add an optimization for the default case, where the delimiter is not provided, by using String.replace(char, char).
Nice. I was not aware of some of these optimizations. Will help do away with some of the boilerplate needed. But since the SDK depends on Java 8, I think it's reasonable to say we should target performance at the Java 8 level. We can have conditional version checks at runtime if really needed, but we should at least default to supporting Java 8. I also suspect that a large number of users are still running on 8.
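The two positions in this thread can be combined in a small sketch: use String.replace(char, char) when both the separator and the delimiter are single characters, and otherwise fall back to a Pattern compiled once. KeyNormalizer is a hypothetical class written for this example and is not the code in the PR.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: converts a relative path name into an object key.
final class KeyNormalizer {
    private final String separator;
    private final String delimiter;
    private final Pattern separatorPattern; // compiled once and reused for every path

    KeyNormalizer(String separator, String delimiter) {
        this.separator = separator;
        this.delimiter = delimiter;
        // Pattern.quote treats the separator literally, which matters for "\" on Windows.
        this.separatorPattern = Pattern.compile(Pattern.quote(separator));
    }

    String normalize(String relativePathName) {
        if (separator.length() == 1 && delimiter.length() == 1) {
            // Fast path: the char overload does not involve regex on any Java version.
            return relativePathName.replace(separator.charAt(0), delimiter.charAt(0));
        }
        // General path: quoteReplacement keeps "$" and "\" in the delimiter literal.
        return separatorPattern.matcher(relativePathName)
                               .replaceAll(Matcher.quoteReplacement(delimiter));
    }
}
```

Whether the precompiled Pattern is worth keeping on newer runtimes would need a benchmark; this sketch only shows that both approaches can coexist behind one method.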
.handle((ignore, t) -> {
    // should never execute this
    returnFuture.completeExceptionally(t);
    return ignore;
});
If we treat doUploadDirectory(..) as entirely synchronous, we can eliminate its inner try/catch block, and rely on this logic to propagate exceptions.
Similar suggestion for the non-exceptional return case. E.g., if possible, make doUploadDirectory(..) return a value.
I removed the try-catch block in doUploadDirectory. Not sure about the benefit of making doUploadDirectory return a value, since we are managing our own returnFuture.
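To make the trade-off concrete, here is a sketch of the variant the reviewer describes: the work is a synchronous function that returns a value, and both its result and its failure are forwarded to a separately managed future that still carries cancellation. ForwardingFutures and submit are hypothetical names, and work stands in for a value-returning doUploadDirectory.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.function.Supplier;

// Hypothetical helper, not the PR's implementation.
final class ForwardingFutures {
    static <T> CompletableFuture<T> submit(Supplier<T> work, Executor executor) {
        CompletableFuture<T> returnFuture = new CompletableFuture<>();
        CompletableFuture<T> running = CompletableFuture.supplyAsync(work, executor);

        // Forward the outcome, value or failure, to the future handed to the caller.
        running.whenComplete((value, t) -> {
            if (t != null) {
                returnFuture.completeExceptionally(t);
            } else {
                returnFuture.complete(value);
            }
        });

        // Forward cancellation the other way. CompletableFuture.cancel does not interrupt
        // a task that is already running; it only marks the future as cancelled.
        returnFuture.whenComplete((value, t) -> {
            if (returnFuture.isCancelled()) {
                running.cancel(true);
            }
        });
        return returnFuture;
    }
}
```

With this shape the work itself needs no try/catch, at the cost of a second future to keep in sync.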
@@ -120,6 +120,7 @@ public static Builder builder() {
        return DefaultBuilder.class;
    }

    // TODO: consider consolidating maxDepth and recursive
If we agree on this change, I think we should consider doing it now (as part of the initial directory release), to minimize breaking changes.
My concern is that we might change it again before GA. I'd prefer to do it once, when we reach the point where the APIs are close to being finalized.
Understood, but the more I think about it, the more I think we have a pretty strong precedent to say that the boolean isn’t needed. Namely, Files.walk(..), which we are ultimately delegating to, does not accept a boolean either.
Yeah, I'm still on the fence between this and getting rid of maxDepth, because imo most customers probably only care about recursive or not recursive (this is also how AWS CLI s3 cp is designed, I think), but I don't have data. It seems the only use case for maxDepth is to prevent OOM.
I see. Maybe we could just use recursive(false) as a convenience alias for maxDepth(1)?
I wouldn't be against maxDepth being gone. Feels like it's mostly just for dealing with cycles and symlinks being enabled.
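The alias idea floated in this thread could look roughly like the sketch below, where recursive(boolean) only sets maxDepth. WalkOptions is a hypothetical class for illustration and does not reflect the configuration types in the PR.

```java
// Hypothetical configuration holder: recursive is stored only as a depth.
final class WalkOptions {
    private final int maxDepth;

    private WalkOptions(int maxDepth) {
        this.maxDepth = maxDepth;
    }

    int maxDepth() {
        return maxDepth;
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        // Default mirrors DEFAULT_UPLOAD_DIRECTORY_MAX_DEPTH from the snippet under review.
        private int maxDepth = Integer.MAX_VALUE;

        Builder maxDepth(int maxDepth) {
            this.maxDepth = maxDepth;
            return this;
        }

        // Convenience alias: recursive(false) is maxDepth(1), recursive(true) is unbounded.
        Builder recursive(boolean recursive) {
            this.maxDepth = recursive ? Integer.MAX_VALUE : 1;
            return this;
        }

        WalkOptions build() {
            return new WalkOptions(maxDepth);
        }
    }
}
```

With a single stored field, the last call wins when a caller sets both options, which avoids having to define precedence between them.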
SonarCloud Quality Gate failed.
Description
#37
Support upload directory in s3 transfer manager
Sample code
Users can configure upload settings on the request (UploadDirectoryRequest#overrideConfiguration) or on the transfer manager itself. Client-level configuration takes precedence over request-level config.
Testing
Added functional tests against different file systems as well as integ tests.
Types of changes
Checklist
mvn install succeeds
License