Support upload directory in s3 transfer manager #2743

zoewangg · 2021-09-27T23:31:54Z

Description

#37
Support upload directory in s3 transfer manager
Sample code

UploadDirectory uploadDirectory =
            tm.uploadDirectory(UploadDirectoryRequest.builder()
                                                     .sourceDirectory(directory)
                                                     .bucket("bucket")
                                                     .overrideConfiguration(o -> o.recursive(false).followSymbolicLinks(true))
                                                     .build());

CompletedUploadDirectory completedUploadDirectory = uploadDirectory.completionFuture().get(5, TimeUnit.SECONDS);

assertThat(completedUploadDirectory.failedUploads()).isEmpty();
assertThat(completedUploadDirectory.successfulObjects()).hasSize(2).containsOnly(completedUpload, completedUpload2);

Users can configure upload settings either on the request, via UploadDirectoryRequest#overrideConfiguration, or on the transfer manager itself. Request-level configuration takes precedence over client-level configuration.

        tm = S3TransferManager.builder()
                              .s3ClientConfiguration(b -> b.uploadDirectoryConfiguration(u -> u.followSymbolicLinks(true)))
                              .build();
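A minimal sketch of how the two levels could be resolved. It assumes a request-level value overrides the transfer-manager-level one (as the overrideConfiguration name implies) and that both fall back to a built-in default. ConfigResolution and resolve are hypothetical names, not part of the SDK.

```java
import java.util.Optional;

final class ConfigResolution {
    // Hypothetical resolver: the request-level value wins if present,
    // otherwise the transfer-manager-level value, otherwise the default.
    static <T> T resolve(Optional<T> requestLevel, Optional<T> clientLevel, T defaultValue) {
        return requestLevel.orElseGet(() -> clientLevel.orElse(defaultValue));
    }
}
```

For example, a client-level followSymbolicLinks(true) combined with a request that sets followSymbolicLinks(false) would resolve to false under this rule.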

Testing

Added functional tests against different file systems as well as integ tests.

Types of changes

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)

Checklist

License

I confirm that this pull request can be released under the Apache 2 license

zoewangg · 2021-09-27T23:52:13Z

...anager/src/main/java/software/amazon/awssdk/transfer/s3/internal/UploadDirectoryManager.java

+        PutObjectRequest putObjectRequest = PutObjectRequest.builder()
+                                                            .bucket(uploadDirectoryRequest.bucket())
+                                                            .key(key)
+                                                            .build();


Currently, we only pass the bucket and key. I'm thinking about introducing a transfer extension (similar to the core execution interceptor) to allow users to modify other parameters in PutObjectRequest.

They could do this with an ExecutionInterceptor on the underlying S3 client, once we support configuring the underlying S3 client, right?

I suppose a putObjectRequestSupplier(Supplier<PutObjectRequest>) on the UploadDirectoryRequest would be more friendly, especially if they want request-level parameter configuration. +1

> They could do this with an ExecutionInterceptor on the underlying S3 client, once we support configuring the underlying S3 client, right?

Yes, that's right.

> I suppose a putObjectRequestSupplier(Supplier<PutObjectRequest>) on the UploadDirectoryRequest would be more friendly, especially if they want request-level parameter configuration. +1

+1. It would be much easier than what I proposed. I suppose we can use Consumer<PutObjectRequest.Builder> to make it clear that this is a modification.
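A minimal sketch of the Consumer<PutObjectRequest.Builder> idea. To keep it runnable without the AWS SDK, PutRequest is a hypothetical stand-in for PutObjectRequest, contentType stands in for "other parameters", and PutRequestFactory.forFile is also hypothetical. The transfer manager fills in bucket and key, then hands the builder to the user's hook.

```java
import java.util.function.Consumer;

// Hypothetical stand-in for PutObjectRequest so the sketch runs without the AWS SDK.
final class PutRequest {
    final String bucket;
    final String key;
    final String contentType; // example of an "other parameter" a user might set

    private PutRequest(Builder b) {
        this.bucket = b.bucket;
        this.key = b.key;
        this.contentType = b.contentType;
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        private String bucket;
        private String key;
        private String contentType;

        Builder bucket(String bucket) { this.bucket = bucket; return this; }
        Builder key(String key) { this.key = key; return this; }
        Builder contentType(String contentType) { this.contentType = contentType; return this; }
        PutRequest build() { return new PutRequest(this); }
    }
}

final class PutRequestFactory {
    // Builds the per-file request: bucket and key are set first,
    // then the user's hook may modify anything else on the builder.
    static PutRequest forFile(String bucket, String key, Consumer<PutRequest.Builder> userHook) {
        PutRequest.Builder builder = PutRequest.builder().bucket(bucket).key(key);
        userHook.accept(builder);
        return builder.build();
    }
}
```

Because the hook runs last, as written it can also overwrite the bucket or key; whether to allow that is a design choice the real API would need to make.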

I agree that offering a way to hook into the resulting PutObjectRequest is crucial.

There are a few other use cases I can think of:

I want to be able to supply my own Path-filters. For example, I may want to ensure I exclude sensitive file extensions. E.g., a Predicate<Path>.

I want to be able to customize a key name. For example, I may want to force everything to lowercase. E.g., a configurable Function<Path,String>, or a UnaryOperator<String>.

I want to be able to transform file contents before they are uploaded. For example, I may want to compress everything before uploading.

Not all of these are needed now, but it would be good to make sure we have some type of idea of the path forward here.

Nice summary! I'll create a backlog item for this specific feature because it seems out of scope of this PR, and is additive.
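The first two use cases above (a Predicate<Path> filter and a Function<Path, String> key mapper) could be sketched as plain functions applied to each walked file. DirectoryUploadPlanner and its methods are hypothetical, and the sketch leaves out the third use case (transforming file contents).

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;
import java.util.function.Predicate;

final class DirectoryUploadPlanner {
    // Applies a user filter, then derives each object key from the path relative to root.
    static List<String> plannedKeys(Path root, List<Path> files,
                                    Predicate<Path> filter,
                                    Function<Path, String> keyMapper) {
        List<String> keys = new ArrayList<>();
        for (Path file : files) {
            if (!filter.test(file)) {
                continue; // excluded by the user's filter
            }
            keys.add(keyMapper.apply(root.relativize(file)));
        }
        return keys;
    }

    // Example filter: skip files whose names end with any of the given extensions.
    static Predicate<Path> excludeExtensions(String... extensions) {
        return path -> {
            String name = path.getFileName().toString();
            for (String ext : extensions) {
                if (name.endsWith(ext)) {
                    return false;
                }
            }
            return true;
        };
    }

    // Example key mapper: join path segments with "/" and force lowercase.
    static String lowercaseKey(Path relative) {
        StringBuilder sb = new StringBuilder();
        for (Path segment : relative) {
            if (sb.length() > 0) {
                sb.append('/');
            }
            sb.append(segment.toString());
        }
        return sb.toString().toLowerCase(Locale.ROOT);
    }
}
```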

...nsfer-manager/src/main/java/software/amazon/awssdk/transfer/s3/CompletedUploadDirectory.java

...om/s3-transfer-manager/src/main/java/software/amazon/awssdk/transfer/s3/UploadDirectory.java

.../s3-transfer-manager/src/main/java/software/amazon/awssdk/transfer/s3/S3TransferManager.java

millems · 2021-09-28T17:46:38Z

...r-manager/src/main/java/software/amazon/awssdk/transfer/s3/UploadDirectoryConfiguration.java

+ */
+@SdkPublicApi
+@SdkPreviewApi
+public final class UploadDirectoryConfiguration implements ToCopyableBuilder<UploadDirectoryConfiguration.Builder,


How are transfer-manager-level retries handled?

We don't have any transfer-manager-level retries for now. We can definitely add them in the future, e.g., automatically retrying failed files.

...anager/src/main/java/software/amazon/awssdk/transfer/s3/internal/UploadDirectoryManager.java

millems · 2021-09-28T18:31:11Z


...java/software/amazon/awssdk/transfer/s3/S3TransferManagerUploadDirectoryIntegrationTest.java

...nsfer-manager/src/main/java/software/amazon/awssdk/transfer/s3/CompletedUploadDirectory.java

Bennett-Lynch · 2021-09-28T20:11:57Z

...transfer-manager/src/main/java/software/amazon/awssdk/transfer/s3/S3ClientConfiguration.java

@@ -42,6 +42,7 @@
    private final Double targetThroughputInGbps;
    private final Integer maxConcurrency;
    private final ClientAsyncConfiguration asyncConfiguration;
+    private final UploadDirectoryConfiguration uploadDirectoryConfiguration;


Since TransferManager is a construct that sits on top of S3Client, I feel this configuration belongs at the TransferManager level.

Currently we only have UploadDirectoryConfiguration. Once we have a similar Download configuration, how would we handle any overlap between the two?

Yeah, agreed. We will allow customers to pass s3 client in the future and will likely make s3ClientConfiguration and s3Client mutually exclusive, which would make configuring uploadDirectoryConfiguration awkward if we keep it inside s3ClientConfiguration.

I'll create a TransferManagerConfiguration and move the options there for now. There are other things we need to consider, e.g., do we want to have separate upload directory and download directory configuration? Separating them is more flexible, but will increase the verbosity and affect the usability.

Do you think this is a blocker for this PR, or is it fine to address it during the API surface area review? (I'll still move UploadDirectoryConfiguration to TransferManagerConfiguration.)

Bennett-Lynch · 2021-09-29T02:19:46Z

...r/src/main/java/software/amazon/awssdk/transfer/s3/internal/TransferConfigurationOption.java

+        new TransferConfigurationOption<>("UploadDirectoryFileVisitOption", Boolean.class);
+
+
+    private static final int DEFAULT_UPLOAD_DIRECTORY_MAX_DEPTH = Integer.MAX_VALUE;


Should we consider a more conservative default to protect against cycles? 2 billion is a pretty large default, but I'm not sure how to settle on the right value. Maybe we could try to estimate the maximum possible depth of a typical file system.

Good point. Added a TODO for this. We will revisit all default settings altogether.
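Whatever default is chosen, maxDepth maps directly onto Files.walk(start, maxDepth, options). The sketch below (DirectoryWalker.listFiles is hypothetical, not the PR's implementation) lists regular files up to a given depth and optionally follows symbolic links. For context, the JDK documents cycle detection for FOLLOW_LINKS walks (reported as a FileSystemLoopException), and in current OpenJDK Files.walk surfaces that error as an UncheckedIOException from the stream, so a very large maxDepth should not by itself make the walk loop forever.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileVisitOption;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class DirectoryWalker {
    // Lists regular files under root, descending at most maxDepth levels:
    // 0 visits only root itself, 1 its direct children, Integer.MAX_VALUE all levels.
    static List<Path> listFiles(Path root, int maxDepth, boolean followSymbolicLinks) {
        FileVisitOption[] options = followSymbolicLinks
            ? new FileVisitOption[] {FileVisitOption.FOLLOW_LINKS}
            : new FileVisitOption[0];
        // Files.walk returns a lazily populated stream holding open directories,
        // so it is closed with try-with-resources.
        try (Stream<Path> entries = Files.walk(root, maxDepth, options)) {
            return entries.filter(Files::isRegularFile)
                          .sorted()
                          .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```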

...anager/src/main/java/software/amazon/awssdk/transfer/s3/internal/UploadDirectoryManager.java

Bennett-Lynch · 2021-09-29T03:04:14Z


.../s3-transfer-manager/src/main/java/software/amazon/awssdk/transfer/s3/CompletedDownload.java

...nsfer-manager/src/main/java/software/amazon/awssdk/transfer/s3/CompletedUploadDirectory.java

.../s3-transfer-manager/src/main/java/software/amazon/awssdk/transfer/s3/S3TransferManager.java

...src/main/java/software/amazon/awssdk/transfer/s3/S3TransferManagerOverrideConfiguration.java

millems · 2021-10-06T18:45:34Z

...manager/src/main/java/software/amazon/awssdk/transfer/s3/internal/UploadDirectoryHelper.java

+        CompletableFuture.runAsync(() -> doUploadDirectory(returnFuture, uploadDirectoryRequest),
+                                   transferConfiguration.option(TransferConfigurationOption.EXECUTOR));


CompletableFuture.runAsync and ignoring the result scares me. If doUploadDirectory fails without completing the returnFuture, we'll never complete the returnFuture.

Should we just do the delegating to the executor ourselves and add the exception handling, or should we rely on the result of CompletableFuture.runAsync instead of managing our own returnFuture?

Yeah, I can add exception handling as well as try-catch logic in doUploadDirectory. I'm not sure how else we could implement cancellation without managing our own returnFuture, though.
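One way to address the concern about ignoring the runAsync result, sketched under assumptions: keep the caller-facing returnFuture (so cancellation can still be wired to it), but attach a whenComplete to the runAsync future that forwards any exception escaping the work. AsyncLauncher.launch is a hypothetical helper, not the PR's code; completeExceptionally has no effect if the work already completed returnFuture.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

final class AsyncLauncher {
    // Runs work on the executor. The work is expected to complete returnFuture itself;
    // if it throws before doing so, the failure is forwarded so returnFuture never hangs.
    static <T> CompletableFuture<T> launch(Consumer<CompletableFuture<T>> work, Executor executor) {
        CompletableFuture<T> returnFuture = new CompletableFuture<>();
        CompletableFuture.runAsync(() -> work.accept(returnFuture), executor)
                         .whenComplete((ignored, error) -> {
                             if (error != null) {
                                 // runAsync wraps the thrown exception in a CompletionException; unwrap it.
                                 Throwable cause = error instanceof CompletionException && error.getCause() != null
                                                   ? error.getCause() : error;
                                 // No effect if the work already completed returnFuture.
                                 returnFuture.completeExceptionally(cause);
                             }
                         });
        return returnFuture;
    }
}
```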

...manager/src/main/java/software/amazon/awssdk/transfer/s3/internal/UploadDirectoryHelper.java

...ager/src/main/java/software/amazon/awssdk/transfer/s3/internal/DefaultS3TransferManager.java

...manager/src/main/java/software/amazon/awssdk/transfer/s3/internal/UploadDirectoryHelper.java

Bennett-Lynch · 2021-10-06T19:34:05Z

...nsfer-manager/src/main/java/software/amazon/awssdk/transfer/s3/CompletedUploadDirectory.java

 */
 @SdkPublicApi
 @SdkPreviewApi
-public interface CompletedUploadDirectory extends CompletedTransfer {
+public final class CompletedUploadDirectory implements CompletedTransfer {


Should this implement CompletedFileTransfer instead?

...nsfer-manager/src/main/java/software/amazon/awssdk/transfer/s3/CompletedUploadDirectory.java

Bennett-Lynch · 2021-10-06T19:45:25Z

...transfer-manager/src/main/java/software/amazon/awssdk/transfer/s3/CompletedFileTransfer.java

@@ -20,19 +20,15 @@
 import software.amazon.awssdk.annotations.SdkPublicApi;

 /**
- *  A failed single file upload transfer.
+ * Represents a completed file transfer.


Can we specify if this is for concrete files only, or also directories? Both are represented via File/Path, so it's not immediately obvious to me.

Bennett-Lynch · 2021-10-06T21:54:16Z

...nsfer-manager/src/main/java/software/amazon/awssdk/transfer/s3/FailedSingleFileTransfer.java

+import software.amazon.awssdk.annotations.SdkPublicApi;
+
+/**
+ * Represents a failed single file transfer in a multi-file transfer operation such as


Could we just have a common interface for any "file-based transfer"? That would include:

1. Requests to upload a single file
2. Requests to upload a directory of files
3. The resulting individual single-file transfers from (2)

They all point to a Path. I'm not sure we need to distinguish at the interface level beyond that?
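A sketch of that idea: one small interface exposing the Path, implemented by the single-file request (1), the directory request (2), and the per-file transfers derived from a directory (3). All type names here are hypothetical, not the SDK's.

```java
import java.nio.file.Path;
import java.util.Objects;

// Hypothetical common shape for anything that points at a local Path.
interface FileBasedTransfer {
    Path path();
}

// Case 1 (and case 3): a single file.
final class SingleFileUpload implements FileBasedTransfer {
    private final Path file;

    SingleFileUpload(Path file) { this.file = Objects.requireNonNull(file); }

    @Override
    public Path path() { return file; }
}

// Case 2: a directory of files.
final class DirectoryUpload implements FileBasedTransfer {
    private final Path directory;

    DirectoryUpload(Path directory) { this.directory = Objects.requireNonNull(directory); }

    @Override
    public Path path() { return directory; }

    // Case 3: each file found in the directory becomes its own single-file transfer.
    SingleFileUpload childFor(Path relativeFile) {
        return new SingleFileUpload(directory.resolve(relativeFile));
    }
}

final class FileTransfers {
    // Code that only needs "which local path is involved" can accept any of them.
    static String describe(FileBasedTransfer transfer) {
        return transfer.getClass().getSimpleName() + " -> " + transfer.path();
    }
}
```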

Bennett-Lynch · 2021-10-06T22:48:01Z

...manager/src/main/java/software/amazon/awssdk/transfer/s3/internal/UploadDirectoryHelper.java

-                }).collect(Collectors.toList());
+            entries.forEach(path -> {
+                CompletableFuture<CompletedUpload> future =
+                    uploadSingleFile(uploadDirectoryRequest, failedUploads, path, phaser);


Nice job extracting helper methods like uploadSingleFile (and in other places). It definitely helps readability.

...manager/src/main/java/software/amazon/awssdk/transfer/s3/internal/UploadDirectoryHelper.java

Bennett-Lynch · 2021-10-06T23:06:07Z

...manager/src/main/java/software/amazon/awssdk/transfer/s3/internal/UploadDirectoryHelper.java

-        // Replace "\" (Windows FS) with "/"
-        return relativePathName.replace('\\', '/');
+        String separator = fileSystem.getSeparator();
+        return relativePathName.replace(separator, delimiter);


I believe repeated calls to replace(..) can be very expensive (enough to show up on a profiler in a non-trivial way). Can we save this as a Pattern?

Glad you brought it up! 😄 🤝 (microbenchmark enthusiasts handshaking). In Java 9+, String.replace(CharSequence, CharSequence) no longer uses Pattern under the hood, and it is much faster than String.replaceAll, which uses regex matching. Per this blog post, in Java 11, String.replace performs much better than String.replaceAll, so I think we should optimize for Java 11. Ideally, we would use String.replace(char, char), but that doesn't work for our case because the separator and the delimiter are Strings.

I think we can add an optimization for the default case, where no delimiter is provided, by using String.replace(char, char).

Nice. I was not aware of some of these optimizations. Will help do away with some of the boilerplate needed. But since the SDK depends on Java 8, I think it's reasonable to say we should target performance at the Java 8 level. We can have conditional version checks at runtime if really needed, but we should at least default to supporting Java 8. I also suspect that a large number of users are still running on 8.
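A sketch combining the default-case optimization with the Java 8 concern: use String.replace(char, char) when both the separator and the delimiter are single characters (which covers the default "/" delimiter), and fall back to the literal String.replace(CharSequence, CharSequence) otherwise. KeyNormalizer.toKey is a hypothetical helper. On Java 8 the CharSequence overload compiles a literal Pattern on each call, which is the cost raised above; the char overload avoids regex on every Java version.

```java
final class KeyNormalizer {
    // Converts a relative path that uses the file system separator into an
    // object key that uses the configured delimiter.
    static String toKey(String relativePathName, String separator, String delimiter) {
        if (separator.equals(delimiter)) {
            return relativePathName; // nothing to replace
        }
        if (separator.length() == 1 && delimiter.length() == 1) {
            // Fast path: the char overload never involves Pattern.
            return relativePathName.replace(separator.charAt(0), delimiter.charAt(0));
        }
        // General case: literal (non-regex) replacement. On Java 8 this overload
        // builds a Pattern with Pattern.LITERAL internally; Java 9+ scans directly.
        return relativePathName.replace(separator, delimiter);
    }
}
```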

...ager/src/main/java/software/amazon/awssdk/transfer/s3/internal/DefaultS3TransferManager.java

Bennett-Lynch · 2021-10-12T22:57:32Z

...manager/src/main/java/software/amazon/awssdk/transfer/s3/internal/UploadDirectoryHelper.java

+            .handle((ignore, t) -> {
+                // should never execute this
+                returnFuture.completeExceptionally(t);
+                return ignore;
+            });


If we treat doUploadDirectory(..) as entirely synchronous, we can eliminate its inner try/catch block, and rely on this logic to propagate exceptions.

Similar suggestion for the non-exceptional return case. E.g., if possible, make doUploadDirectory(..) return a value.

I removed the try-catch block in doUploadDirectory. I'm not sure about the benefit of making doUploadDirectory return a value, since we are managing our own returnFuture.
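A sketch of the alternative described above, where the work is fully synchronous: it returns a value or throws, and a single whenComplete forwards either outcome to returnFuture, so the work itself needs no try/catch. SyncWorkAdapter.run is a hypothetical name; the sketch keeps returnFuture as the caller-facing future, which is the part relied on for cancellation.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Executor;
import java.util.function.Supplier;

final class SyncWorkAdapter {
    // The work returns its result or throws; both outcomes are forwarded
    // to a caller-owned returnFuture.
    static <T> CompletableFuture<T> run(Supplier<T> work, Executor executor) {
        CompletableFuture<T> returnFuture = new CompletableFuture<>();
        CompletableFuture.supplyAsync(work, executor)
                         .whenComplete((result, error) -> {
                             if (error != null) {
                                 // supplyAsync wraps thrown exceptions in CompletionException; unwrap it.
                                 Throwable cause = error instanceof CompletionException && error.getCause() != null
                                                   ? error.getCause() : error;
                                 returnFuture.completeExceptionally(cause);
                             } else {
                                 returnFuture.complete(result);
                             }
                         });
        return returnFuture;
    }
}
```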

...manager/src/main/java/software/amazon/awssdk/transfer/s3/internal/UploadDirectoryHelper.java

Bennett-Lynch · 2021-10-12T23:13:52Z


...manager/src/main/java/software/amazon/awssdk/transfer/s3/internal/UploadDirectoryHelper.java

Bennett-Lynch · 2021-10-12T23:21:01Z

...r/src/main/java/software/amazon/awssdk/transfer/s3/UploadDirectoryOverrideConfiguration.java

@@ -120,6 +120,7 @@ public static Builder builder() {
        return DefaultBuilder.class;
    }

+    // TODO: consider consolidating maxDepth and recursive


If we agree on this change, I think we should consider doing it now (as part of the initial directory release) to minimize breaking changes.

My concern is that we might change it again before GA. I'd prefer to do it once, when we reach the point where the APIs are close to being finalized.

Understood, but the more I think about it, the more I think we have a pretty strong precedent to say that the boolean isn’t needed. Namely, Files.walk(..) does not accept a boolean either, which we are ultimately delegating to.

https://docs.oracle.com/javase/8/docs/api/java/nio/file/Files.html#walk-java.nio.file.Path-int-java.nio.file.FileVisitOption...-

Yeah, I'm still on the fence between this and getting rid of maxDepth because imo most customers probably only care about recursive or not recursive (this is also how AWS CLI s3 cp is designed I think), but I don't have data. It seems the only use-case for maxDepth is to prevent OOM.

I see. Maybe we could just use recursive(false) as a convenience alias for maxDepth(1)?

I wouldn't be against maxDepth being gone. Feels like it's mostly just for dealing with cycles and symlinks being enabled.
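A sketch of the alias idea: a single maxDepth setting with the same meaning as in Files.walk, plus recursive(boolean) as a convenience that sets it to 1 or Integer.MAX_VALUE. WalkSettings is hypothetical; with this shape the last call wins, so there is no conflicting boolean/depth state to validate.

```java
final class WalkSettings {
    private final int maxDepth;

    private WalkSettings(int maxDepth) {
        this.maxDepth = maxDepth;
    }

    int maxDepth() {
        return maxDepth;
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder {
        private int maxDepth = Integer.MAX_VALUE; // default: walk the whole tree

        // Same semantics as Files.walk(start, maxDepth, ...), which rejects negative values.
        Builder maxDepth(int maxDepth) {
            if (maxDepth < 0) {
                throw new IllegalArgumentException("maxDepth must be >= 0");
            }
            this.maxDepth = maxDepth;
            return this;
        }

        // Convenience alias: recursive(false) == maxDepth(1) (direct children only),
        // recursive(true) == unlimited depth.
        Builder recursive(boolean recursive) {
            return maxDepth(recursive ? Integer.MAX_VALUE : 1);
        }

        WalkSettings build() {
            return new WalkSettings(maxDepth);
        }
    }
}
```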

...m/s3-transfer-manager/src/main/java/software/amazon/awssdk/transfer/s3/FailedFileUpload.java

...manager/src/main/java/software/amazon/awssdk/transfer/s3/internal/UploadDirectoryHelper.java

millems · 2021-10-13T23:20:34Z


sonarqubecloud · 2021-10-14T00:48:28Z

SonarCloud Quality Gate failed.

0 Bugs
0 Vulnerabilities
0 Security Hotspots
15 Code Smells

77.5% Coverage
0.0% Duplication

Support upload directory in s3 transfer manager

c47d69e

zoewangg commented Sep 27, 2021


millems reviewed Sep 28, 2021


aws deleted a comment from millems Sep 28, 2021

Bennett-Lynch reviewed Sep 29, 2021


zoewangg added 3 commits October 3, 2021 19:31

address part of the feedback

144b16d

Address part of the feedback

1b4fa19

Address remaining feedback

0228ea5

zoewangg force-pushed the zoewang/tmUploadDirectory branch from 3be1e08 to 0228ea5 October 5, 2021 17:02

millems reviewed Oct 6, 2021


Bennett-Lynch reviewed Oct 6, 2021


...ager/src/main/java/software/amazon/awssdk/transfer/s3/internal/DefaultS3TransferManager.java

zoewangg added 2 commits October 12, 2021 11:40

Address part of the feedback

afae96b

address feedback

27d2ef6

Bennett-Lynch reviewed Oct 12, 2021


Address feedback

61f0ad3

Bennett-Lynch reviewed Oct 13, 2021


...m/s3-transfer-manager/src/main/java/software/amazon/awssdk/transfer/s3/FailedFileUpload.java

Bennett-Lynch reviewed Oct 13, 2021


...manager/src/main/java/software/amazon/awssdk/transfer/s3/internal/UploadDirectoryHelper.java

zoewangg added 2 commits October 13, 2021 10:02

address feedback

693b6b9

Merge branch 'master' into zoewang/tmUploadDirectory

74f4bc7

Bennett-Lynch approved these changes Oct 13, 2021


millems approved these changes Oct 13, 2021


Merge branch 'master' into zoewang/tmUploadDirectory

0c938c9

zoewangg merged commit 4a6f5c4 into master Oct 14, 2021

zoewangg deleted the zoewang/tmUploadDirectory branch October 14, 2021 16:11
