Avoid another extra getObjectMetadata request #983

Closed
@@ -108,6 +108,8 @@ public class GetObjectRequest extends AmazonWebServiceRequest implements
*/
private Integer partNumber;

private Long lastModifiedTime;

/**
* Constructs a new {@link GetObjectRequest} with all the required parameters.
*
@@ -121,7 +123,25 @@ public class GetObjectRequest extends AmazonWebServiceRequest implements
* @see GetObjectRequest#GetObjectRequest(String, String, boolean)
*/
public GetObjectRequest(String bucketName, String key) {
- this(bucketName, key, null);
+ this(bucketName, key, (String) null);
}

/**
* Constructs a new {@link GetObjectRequest} with all the required parameters.
*
* @param bucketName
* The name of the bucket containing the desired object.
* @param key
* The key in the specified bucket under which the object is
* stored.
* @param lastModifiedTime
* Last modified time for the object known by the client
*
* @see GetObjectRequest#GetObjectRequest(String, String, String)
* @see GetObjectRequest#GetObjectRequest(String, String, boolean)
*/
public GetObjectRequest(String bucketName, String key, Long lastModifiedTime) {
Contributor

We model the request and response classes on the service team's specifications (Amazon S3 in this case). The members of GetObjectRequest are based on the S3 Get Object specification, so we cannot add a field to the request class if it is not sent over the wire.

Contributor Author

@NikolayAtSony Feb 1, 2017

Well, if you can't accept changes to GetObjectRequest, the other approach is to modify doDownload in TransferManager and pass this timestamp as a separate parameter instead of as part of GetObjectRequest. Would you accept such an approach?

Contributor

We use object metadata in three places during the download operation:

  1. Get the last modification time.
  2. Get the content length.
  3. In the DownloadImpl task, return metadata when the customer requests it. See the other comment.

It is highly unlikely that customers will provide the values for 1 and 2. Even then, we need the object metadata for (3) in parallel downloads. The only solution I see to avoid this call is to accept the object metadata as a parameter to the download method. This is a rare case, as most customers want to download an object by bucket and key and won't have the object metadata. If we expect them to provide it, they have to make an additional call to get the object metadata before calling the download method, which is not a great user experience.

Contributor Author

The use case might not look very likely, but it's a legitimate one, and AWS itself recommends using external lookup tables to store file attributes (such as key, size, timestamp) outside of S3. Based on this and various other use cases, I can't see why metadata needs to be pulled 100% of the time ahead of the object itself. Yes, it might be needed sometimes, but not always, and there is currently no way to avoid this extra round trip (which, by the way, was avoidable in earlier versions of the AWS SDK, so it should be considered a regression). See, for example, 1.9.25:

private Download doDownload(final GetObjectRequest getObjectRequest,
where it was possible to use TransferManager without pulling metadata beforehand.

First, object metadata is available as soon as the object download starts, since it arrives in the response headers. For pause/resume, there is no reason to pull the metadata beforehand: the last modification time can be obtained from the object itself when the pause is triggered, because it's part of the S3Object being paused (and a pause might never even happen). If the download hasn't even started, it's not needed, right?

The content length may already be known to the client from listAvailableObjects() or by other means, and if available it should be used instead of another round trip to S3.
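
As a rough illustration of the point above (not code from this PR), the lookup can be deferred behind a memoized supplier so that it only runs when the client has not supplied a value. All names here (`LazyMetadataSketch`, `ObjectInfo`, `LazyObjectInfo`, `resolveLastModified`, `resolveLastByte`) are hypothetical, and the supplier merely stands in for a call such as `s3.getObjectMetadata(...)`:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class LazyMetadataSketch {

    /** Hypothetical stand-in for the two ObjectMetadata fields that doDownload reads. */
    static final class ObjectInfo {
        final long lastModifiedMillis;
        final long contentLength;

        ObjectInfo(long lastModifiedMillis, long contentLength) {
            this.lastModifiedMillis = lastModifiedMillis;
            this.contentLength = contentLength;
        }
    }

    /** Runs the wrapped lookup at most once, and only when a value is first needed. */
    static final class LazyObjectInfo {
        private final Supplier<ObjectInfo> lookup; // stands in for the metadata request
        private ObjectInfo cached;

        LazyObjectInfo(Supplier<ObjectInfo> lookup) {
            this.lookup = lookup;
        }

        synchronized ObjectInfo get() {
            if (cached == null) {
                cached = lookup.get();
            }
            return cached;
        }
    }

    /** Uses the caller-supplied timestamp when present, otherwise falls back to the lookup. */
    static long resolveLastModified(Long knownLastModifiedMillis, LazyObjectInfo lazy) {
        if (knownLastModifiedMillis != null) {
            return knownLastModifiedMillis;
        }
        return lazy.get().lastModifiedMillis;
    }

    /** Index of the last byte, computed from a caller-supplied length when present. */
    static long resolveLastByte(Long knownContentLength, LazyObjectInfo lazy) {
        long length = (knownContentLength != null) ? knownContentLength : lazy.get().contentLength;
        return length - 1;
    }

    public static void main(String[] args) {
        AtomicInteger lookups = new AtomicInteger();
        LazyObjectInfo lazy = new LazyObjectInfo(() -> {
            lookups.incrementAndGet();
            return new ObjectInfo(1_485_907_200_000L, 2048L);
        });

        // Both values known up front: the lookup is never triggered.
        long lastModified = resolveLastModified(1_485_907_200_000L, lazy);
        long lastByte = resolveLastByte(2048L, lazy);
        System.out.println(lastModified + " " + lastByte + " lookups=" + lookups.get());

        // Nothing known: the lookup runs once and is reused for both values.
        resolveLastModified(null, lazy);
        resolveLastByte(null, lazy);
        System.out.println("lookups=" + lookups.get());
    }
}
```

With this shape, a client that already has the size and timestamp from its own index pays no extra request, while a client that knows nothing still pays exactly one.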

As for point 3, that wasn't the case before, so I fail to see why you think it's imperative that metadata be available before the S3Object download starts (it was never promised). The whole point of introducing this separate instance of object metadata into DownloadImpl was to retrieve the last modified time for persisting the object (https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/transfer/internal/DownloadImpl.java#L193). Is it really necessary? If the download hasn't started, persist it as 0; if it has started, pull it from the S3Object metadata.

Yes, the solution I'm offering is probably a bit crude and a nicer one is needed, but let's decide how to fix it. I'm open to suggestions. It doesn't make sense to sequentially pull the metadata for each object just to learn information we already have.
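
The "persist it as 0 if the download hasn't started" suggestion could be sketched roughly as follows. This is an illustration only: `PauseStateSketch` and its methods are hypothetical names, and the rule that a recorded 0 skips the modification check on resume is an assumption, since the thread does not say how resume should treat that value.

```java
final class PauseStateSketch {

    /** Sentinel persisted when the download never started, as suggested in the comment. */
    static final long UNKNOWN = 0L;

    /**
     * Value to persist at pause time. The argument is null until the response
     * headers of the object download have been read.
     */
    static long lastModifiedToPersist(Long responseLastModifiedMillis) {
        return (responseLastModifiedMillis != null) ? responseLastModifiedMillis : UNKNOWN;
    }

    /**
     * Assumption: if nothing was recorded at pause time, no bytes were written,
     * so there is no partial file that a newer object version could invalidate.
     */
    static boolean isModifiedSincePause(long currentLastModifiedMillis, long recordedAtPauseMillis) {
        if (recordedAtPauseMillis == UNKNOWN) {
            return false;
        }
        return currentLastModifiedMillis != recordedAtPauseMillis;
    }

    public static void main(String[] args) {
        // Paused before the download started: persist the sentinel, skip the check on resume.
        long recorded = lastModifiedToPersist(null);
        System.out.println(isModifiedSincePause(1_485_907_200_000L, recorded));

        // Paused mid-download: a different timestamp on resume means the object changed.
        recorded = lastModifiedToPersist(1_485_907_200_000L);
        System.out.println(isModifiedSincePause(1_485_990_000_000L, recorded));
    }
}
```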

Contributor

I understand your concerns. I agree that making a HEAD request to get metadata for each call might be excessive. I am worried that removing object metadata from DownloadImpl might be a breaking change. I need to think some more about how to fix it cleanly. I don't have the bandwidth to work on this this week; I will look at this issue next week and get back to you. Thanks.

this(bucketName, key, null, lastModifiedTime);
}

/**
@@ -140,9 +160,31 @@ public GetObjectRequest(String bucketName, String key) {
* @see GetObjectRequest#GetObjectRequest(String, String, boolean)
*/
public GetObjectRequest(String bucketName, String key, String versionId) {
this(bucketName, key, versionId, null);
}

/**
* Constructs a new {@link GetObjectRequest} with all the required parameters.
*
* @param bucketName
* The name of the bucket containing the desired object.
* @param key
* The key in the specified bucket under which the object is
* stored.
* @param versionId
* The Amazon S3 version ID specifying a specific version of the
* object to download.
* @param lastModifiedTime
* Last modified time for the object known by the client
*
* @see GetObjectRequest#GetObjectRequest(String, String)
* @see GetObjectRequest#GetObjectRequest(String, String, boolean)
*/
public GetObjectRequest(String bucketName, String key, String versionId, Long lastModifiedTime) {
setBucketName(bucketName);
setKey(key);
setVersionId(versionId);
setLastModifiedTime(lastModifiedTime);
}

public GetObjectRequest(S3ObjectId s3ObjectId) {
@@ -1003,9 +1045,48 @@ public void setS3ObjectId(S3ObjectId s3ObjectId) {

/**
* Fluent API to set the S3 object id for this request.
*
* @return This {@link GetObjectRequest}, enabling additional method
* calls to be chained together.
*/
public GetObjectRequest withS3ObjectId(S3ObjectId s3ObjectId) {
setS3ObjectId(s3ObjectId);
return this;
}

/**
* Sets the last modified time known by the client. Optional; supplying it
* avoids a round trip to S3 to fetch the metadata.
*
* @param lastModifiedTime
* Last modified time (in milliseconds) for this object
*/
public void setLastModifiedTime(Long lastModifiedTime) {
this.lastModifiedTime = lastModifiedTime;
}

/**
* Sets the last modified time known by the client. Optional; supplying it
* avoids a round trip to S3 to fetch the metadata.
*
* @param lastModifiedTime
* Last modified time (in milliseconds) for this object
*
* @return This {@link GetObjectRequest}, enabling additional method
* calls to be chained together.
*/
public GetObjectRequest withLastModifiedTime(Long lastModifiedTime) {
setLastModifiedTime(lastModifiedTime);
return this;
}

/**
* Gets the last modified time for this object (in milliseconds), or null
* if it was not set previously.
*/
public Long getLastModifiedTime() {
return lastModifiedTime;
}


}
@@ -1013,15 +1013,21 @@ private Download doDownload(final GetObjectRequest getObjectRequest,
getObjectRequest
.setGeneralProgressListener(new ProgressListenerChain(new TransferCompletionFilter(), listenerChain));

+ // Defer the actual request until it is required
GetObjectMetadataRequest getObjectMetadataRequest = new GetObjectMetadataRequest(
getObjectRequest.getBucketName(), getObjectRequest.getKey(), getObjectRequest.getVersionId());
if (getObjectRequest.getSSECustomerKey() != null) {
getObjectMetadataRequest.setSSECustomerKey(getObjectRequest.getSSECustomerKey());
}
- final ObjectMetadata objectMetadata = s3.getObjectMetadata(getObjectMetadataRequest);

+ ObjectMetadata objectMetadata = null;
+
// Used to check if the object is modified between pause and resume
- long lastModifiedTime = objectMetadata.getLastModified().getTime();
+ if (getObjectRequest.getLastModifiedTime() == null) {
+ objectMetadata = s3.getObjectMetadata(getObjectMetadataRequest);
+ long lastModifiedTime = objectMetadata.getLastModified().getTime();
+ getObjectRequest.setLastModifiedTime(lastModifiedTime);
+ }

long startingByte = 0;
long lastByte;
@@ -1031,6 +1037,9 @@ private Download doDownload(final GetObjectRequest getObjectRequest,
startingByte = range[0];
lastByte = range[1];
} else {
if (objectMetadata == null) {
objectMetadata = s3.getObjectMetadata(getObjectMetadataRequest);
}
lastByte = objectMetadata.getContentLength() - 1;
}

@@ -1040,7 +1049,7 @@ private Download doDownload(final GetObjectRequest getObjectRequest,

// We still pass the unfiltered listener chain into DownloadImpl
final DownloadImpl download = new DownloadImpl(description, transferProgress, listenerChain, null,
- stateListener, getObjectRequest, file, objectMetadata, isDownloadParallel);
+ stateListener, getObjectRequest, file, isDownloadParallel);

long totalBytesToDownload = lastByte - startingByte + 1;
transferProgress.setTotalBytesToTransfer(totalBytesToDownload);
@@ -1062,7 +1071,7 @@ private Download doDownload(final GetObjectRequest getObjectRequest,
long fileLength = -1;

if (resumeExistingDownload) {
- if (isS3ObjectModifiedSincePause(lastModifiedTime, lastModifiedTimeRecordedDuringPause)) {
+ if (isS3ObjectModifiedSincePause(getObjectRequest.getLastModifiedTime(), lastModifiedTimeRecordedDuringPause)) {
throw new AmazonClientException("The requested object in bucket " + getObjectRequest.getBucketName()
+ " with key " + getObjectRequest.getKey() + " is modified on Amazon S3 since the last pause.");
}
@@ -43,24 +43,21 @@ public class DownloadImpl extends AbstractTransfer implements Download {

private final GetObjectRequest getObjectRequest;
private final File file;
- private final ObjectMetadata objectMetadata;
private final ProgressListenerChain progressListenerChain;

@Deprecated
public DownloadImpl(String description, TransferProgress transferProgress,
ProgressListenerChain progressListenerChain, S3Object s3Object, TransferStateChangeListener listener,
GetObjectRequest getObjectRequest, File file) {
this(description, transferProgress, progressListenerChain, s3Object, listener,
- getObjectRequest, file, null, false);
+ getObjectRequest, file, false);
}

public DownloadImpl(String description, TransferProgress transferProgress,
ProgressListenerChain progressListenerChain, S3Object s3Object, TransferStateChangeListener listener,
- GetObjectRequest getObjectRequest, File file,
- ObjectMetadata objectMetadata, boolean isDownloadParallel) {
+ GetObjectRequest getObjectRequest, File file, boolean isDownloadParallel) {
super(description, transferProgress, progressListenerChain, listener);
this.s3Object = s3Object;
- this.objectMetadata = objectMetadata;
this.getObjectRequest = getObjectRequest;
this.file = file;
this.progressListenerChain = progressListenerChain;
@@ -77,7 +74,7 @@ public synchronized ObjectMetadata getObjectMetadata() {
if (s3Object != null) {
return s3Object.getObjectMetadata();
}
- return objectMetadata;
+ return null;
Contributor

This will be a breaking change, as customers who were getting object metadata before will now get null.

Contributor Author

No, they won't. Null will only be returned to customers who use the new API, where object metadata is not retrieved before the object is downloaded. After that, even with the new API, object metadata will be populated properly. It's definitely not a breaking change.

Contributor

No, the s3Object is only populated for serial downloads. For parallel downloads, the s3Object will always be null. That is why we store the object metadata and return it for parallel downloads.
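
One way to reconcile the two positions, sketched here under stated assumptions rather than taken from the PR, is to keep the accessor non-null for parallel downloads by fetching the metadata on first request instead of up front. `DownloadMetadataSketch` and its members are hypothetical, the type parameter `M` stands in for `ObjectMetadata`, and the supplier stands in for a metadata request:

```java
import java.util.function.Supplier;

final class DownloadMetadataSketch<M> {
    private final Supplier<M> onDemandLookup; // stands in for a metadata request to S3
    private M fromS3Object;                   // set only when a serial download starts
    private M fetched;                        // filled by the first fallback lookup

    DownloadMetadataSketch(Supplier<M> onDemandLookup) {
        this.onDemandLookup = onDemandLookup;
    }

    /** Serial path: the metadata arrives together with the object itself. */
    synchronized void setFromS3Object(M metadata) {
        this.fromS3Object = metadata;
    }

    /**
     * Prefers the metadata that came with the object; otherwise performs the
     * lookup once and caches it. Non-null as long as the lookup returns non-null.
     */
    synchronized M getObjectMetadata() {
        if (fromS3Object != null) {
            return fromS3Object;
        }
        if (fetched == null) {
            fetched = onDemandLookup.get();
        }
        return fetched;
    }

    public static void main(String[] args) {
        int[] lookups = {0};
        DownloadMetadataSketch<String> parallel = new DownloadMetadataSketch<>(() -> {
            lookups[0]++;
            return "metadata-from-lookup";
        });

        // Parallel download: no S3Object, so the first call triggers the lookup.
        System.out.println(parallel.getObjectMetadata());
        // Later calls reuse the cached value.
        System.out.println(parallel.getObjectMetadata());
        System.out.println("lookups=" + lookups[0]);
    }
}
```

The trade-off is that the accessor may then block on a network call the first time it is used, which is itself a behavioral change that would need review.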

}

/**
@@ -190,7 +187,7 @@ private PersistableDownload captureDownloadState(
getObjectRequest.getVersionId(), getObjectRequest.getRange(),
getObjectRequest.getResponseHeaders(), getObjectRequest.isRequesterPays(),
file.getAbsolutePath(), getLastFullyDownloadedPartNumber(),
- getObjectMetadata().getLastModified().getTime());
+ getObjectRequest.getLastModifiedTime());
}
return null;
}