-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid another extra getObjectMetadata request #983
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -43,24 +43,21 @@ public class DownloadImpl extends AbstractTransfer implements Download { | |
|
||
private final GetObjectRequest getObjectRequest; | ||
private final File file; | ||
private final ObjectMetadata objectMetadata; | ||
private final ProgressListenerChain progressListenerChain; | ||
|
||
@Deprecated | ||
public DownloadImpl(String description, TransferProgress transferProgress, | ||
ProgressListenerChain progressListenerChain, S3Object s3Object, TransferStateChangeListener listener, | ||
GetObjectRequest getObjectRequest, File file) { | ||
this(description, transferProgress, progressListenerChain, s3Object, listener, | ||
getObjectRequest, file, null, false); | ||
getObjectRequest, file, false); | ||
} | ||
|
||
public DownloadImpl(String description, TransferProgress transferProgress, | ||
ProgressListenerChain progressListenerChain, S3Object s3Object, TransferStateChangeListener listener, | ||
GetObjectRequest getObjectRequest, File file, | ||
ObjectMetadata objectMetadata, boolean isDownloadParallel) { | ||
GetObjectRequest getObjectRequest, File file, boolean isDownloadParallel) { | ||
super(description, transferProgress, progressListenerChain, listener); | ||
this.s3Object = s3Object; | ||
this.objectMetadata = objectMetadata; | ||
this.getObjectRequest = getObjectRequest; | ||
this.file = file; | ||
this.progressListenerChain = progressListenerChain; | ||
|
@@ -77,7 +74,7 @@ public synchronized ObjectMetadata getObjectMetadata() { | |
if (s3Object != null) { | ||
return s3Object.getObjectMetadata(); | ||
} | ||
return objectMetadata; | ||
return null; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This will be a breaking change as customers who are getting object metadata before will get null now. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No, they wont. Null will only be returned to the customers who will be using the new API where object metadata is not retrieved before object is downloaded. After that even while using new API, object metadata will be populated properly. It's definitely not a breaking change. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No, the s3Object is only populated for serial downloads. For parallel downloads, the s3Object will always be null. That is the reason we store object metadata and return it for parallel downloads. |
||
} | ||
|
||
/** | ||
|
@@ -190,7 +187,7 @@ private PersistableDownload captureDownloadState( | |
getObjectRequest.getVersionId(), getObjectRequest.getRange(), | ||
getObjectRequest.getResponseHeaders(), getObjectRequest.isRequesterPays(), | ||
file.getAbsolutePath(), getLastFullyDownloadedPartNumber(), | ||
getObjectMetadata().getLastModified().getTime()); | ||
getObjectRequest.getLastModifiedTime()); | ||
} | ||
return null; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We model the request and response classes as per service team (Amazon S3 in this case) specifications. The members in GetObjectRequest are based on the S3 Get Object specification. We cannot add it to the request class if we are not sending it over wire.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, if you can't accept changes in GetObjectRequest, the other approach is to modify doDownload in TransferManager and pass this timestamp as a separate parameter instead of part of GetObjectRequest. Will you accept such approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use object metadata in 3 different places during Download operation.
It is highly unlikely that customers will provide 1 and 2 values. Even then, we need object metadata to use in (3) for parallel downloads. The only solution I see to avoid this call is providing the object metadata as a parameter to download method. This is a rare case as most customers want to download an object from bucket/key values and won't have object metadata. So if we expect them to provide it, they have to make an additional call to get object metadata before calling download method which is not great user experience.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use case might looks like not being very likely, but it's legit use case and even AWS recommends to use external lookup tables for storing file data attributes outside of S3 (such as key, size, timestamp). Based on this and various other use cases, I can't see a reason why metadata needs to be pulled 100% of the time ahead of object itself. Yes, it might be needed sometimes, but it's not always needed and there is no way to avoid this extra roundtrip now (which btw was avoidable using earlier versions of AWS SDK, so it should be considered as a regression). Refer (for example to 1.9.25) to
aws-sdk-java/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/transfer/TransferManager.java
Line 747 in 5907106
First, object metadata is available as soon as object download process is started as it's delivered as part of HEAD. For the purpose of pause/resume, there is no reason to pull the metadata beforehand as last modification time could be obtained from the object itself when pause action will be triggered because it's part of S3Object which is about to be put on pause (which might not even happened). If download not even started, it's not needed, right?
Content length could be already known to the client as part of listAvailableObjects() or by other measures and should be used instead of another round trip to the S3 (if available).
As for the point 3, it wasn't the case before (
aws-sdk-java/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/transfer/internal/DownloadImpl.java
Line 54 in 5907106
Yes, probably the current solution offered by me is a bit straightforward and nicer one is required, but let's decide how to fix it. I'm open to suggestions. It doesn't make sense to sequentially pull the metadata for each object just to know information we already know...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand your concerns. I agree that making HEAD request to get metadata for each call might be excessive. I am worried that if we remove object metadata from DownloadImpl, it might be a breaking change. I have to think some more on how to fix it in a clean way. I don't have bandwidth this week to work on this. I will look at this issue next week and get back to you. Thanks.