Preserve POSIX metadata #24
Hello @alokito, thank you so much for your contribution! This looks great. Let me test it out and I'll merge it if all goes well.
Thanks @yanko7! Two things
In light of #2, I'll mark this as "draft" for now.
mem_concat.go (outdated)
@@ -25,7 +26,7 @@ func buildInMemoryConcat(ctx context.Context, client *s3.Client, objectList []*S
}

if estimatedSize < fileSizeMin {
data, err := tarGroup(ctx, client, objectList)
data, err := tarGroup(ctx, client, objectList, opts.PreservePOSIXMetadata)
I think it would be beneficial to pass the whole `opts *S3TarS3Options` object to tarGroup, in case we need to pass more options in the future. That way the function signature doesn't keep changing and growing.
As you wish.
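A minimal sketch of the reviewer's suggestion, assuming a stripped-down S3TarS3Options that shows only the PreservePOSIXMetadata field. The real tarGroup also takes a context, an S3 client and `*S3Obj` values; the simplified signature and the string results here are hypothetical and exist only to show the options struct flowing through.

```go
package main

import "fmt"

// S3TarS3Options is a trimmed-down, hypothetical stand-in for the real
// options struct: only the field discussed in this thread is shown.
type S3TarS3Options struct {
	PreservePOSIXMetadata bool
	// Future options become new fields here instead of new parameters.
}

// tarGroup is simplified: it takes object keys instead of *S3Obj values and
// returns a description of the header it would build for each object.
// The important part is that it receives the whole options struct.
func tarGroup(keys []string, opts *S3TarS3Options) []string {
	out := make([]string, 0, len(keys))
	for _, k := range keys {
		if opts.PreservePOSIXMetadata {
			out = append(out, k+": header with POSIX metadata")
		} else {
			out = append(out, k+": plain header")
		}
	}
	return out
}

func main() {
	opts := &S3TarS3Options{PreservePOSIXMetadata: true}
	fmt.Println(tarGroup([]string{"a.txt", "b.txt"}, opts))
}
```

With this shape, a future option is a new struct field read inside tarGroup, and callers such as buildInMemoryConcat keep passing `opts` unchanged.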
@@ -252,6 +288,18 @@ func concatObjAndHeader(ctx context.Context, svc *s3.Client, objectList []*S3Obj
return results, nil
}

func fetchS3ObjectHead(ctx context.Context, svc *s3.Client, nextObject *S3Obj) *s3.HeadObjectOutput {
Debugf(ctx, "fetching head for %s/%s", *&nextObject.Bucket, *nextObject.Key)
head, err := svc.HeadObject(ctx, &s3.HeadObjectInput{
One more thing to consider: every HeadObject call is billed as a GET request, so this essentially doubles the request cost of building the archive. For example, to create an in-memory tarball from 1,000 Amazon S3 objects, you need 1,000 GET requests to pull the data and then another 1,000 requests to fetch the heads.
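The cost argument as plain arithmetic, in a tiny Go sketch. `s3Requests` is a hypothetical helper; it only counts requests and assumes, as stated in the comment above, that HeadObject is billed like a GET.

```go
package main

import "fmt"

// s3Requests counts the billable requests needed to read n objects:
// one GetObject each, plus one HeadObject each when the metadata is
// fetched with a separate call (HeadObject is billed like a GET).
func s3Requests(n int, separateHead bool) int {
	if separateHead {
		return 2 * n
	}
	return n
}

func main() {
	fmt.Println(s3Requests(1000, false)) // 1000
	fmt.Println(s3Requests(1000, true))  // 2000
}
```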
Agreed. I have implemented the suggestion for the in-memory path, so we no longer call HeadObject there. In the other code paths, where we only call ListObjects, I don't think this is avoidable. That is why I added the --preserve-posix-metadata command-line argument: there's no need for these extra requests unless you are using FSx or Storage Gateway. Let me know if you have other suggestions.
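One way to gate the extra requests behind the switch, sketched with Go's standard `flag` package. `parseOptions` and `extraHeadRequests` are hypothetical helpers, and the real CLI may parse its arguments differently. The point is only that the HeadObject cost is paid when the flag is set.

```go
package main

import (
	"flag"
	"fmt"
)

// parseOptions reads a --preserve-posix-metadata switch (hypothetical wiring;
// Go's flag package accepts one or two leading dashes).
func parseOptions(args []string) (bool, error) {
	fs := flag.NewFlagSet("s3tar", flag.ContinueOnError)
	preserve := fs.Bool("preserve-posix-metadata", false,
		"copy POSIX metadata between S3 user metadata and tar headers")
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return *preserve, nil
}

// extraHeadRequests is the number of HeadObject calls added for objects that
// were discovered via ListObjects, whose results carry no user metadata.
func extraHeadRequests(listedObjects int, preserve bool) int {
	if !preserve {
		return 0
	}
	return listedObjects
}

func main() {
	preserve, err := parseOptions([]string{"--preserve-posix-metadata"})
	if err != nil {
		panic(err)
	}
	fmt.Println(preserve, extraHeadRequests(1000, preserve)) // true 1000
}
```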
mem_concat.go (outdated)
@@ -240,6 +241,10 @@ func tarGroup(ctx context.Context, client *s3.Client, objectList []*S3Obj) ([]by
AccessTime: *o.LastModified,
Format: tarFormat,
}
if preservePOSIXMetadata {
head := fetchS3ObjectHead(ctx, client, o)
The GetObject API call returns a Metadata map as well. It might be worth modifying the downloadS3Data function so it also returns the header data you need. Have you checked whether it contains what is needed to preserve the POSIX metadata?
Good catch! I have implemented this.
ctime is not mentioned in the AWS docs, but it is set by AWS Storage Gateway.
@yanko7 I have run several validation tests in different scenarios, and I think this is ready to be merged now. There are two additional changes since my previous update:
Implements issue #23.
When creating tar archives, copy POSIX metadata from S3 headers into TAR headers.
When extracting tar archives, set S3 headers using POSIX metadata from TAR headers.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.