Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

command/storage: add versioning support #475

Merged
merged 64 commits into from
Jun 16, 2023
Merged

command/storage: add versioning support #475

merged 64 commits into from
Jun 16, 2023

Conversation

kucukaslan
Copy link
Contributor

@kucukaslan kucukaslan commented Jul 26, 2022

This commit adds versioning support to the s5cmd.

  • Added --all-versions flag to ls, rm, du and select subcommands
    to apply operation on(/over) all versions of the objects.
  • Added --version-id flag to cat, cp/mv, rm, du and select
    subcommands to apply operation on(/over) a specific versions of the object.
  • Added bucket-version command to configure bucket versioning. Bucket name
    alone returns the bucket versioning status of the bucket. Bucket versioning can
    be configured with set flag which only accepts.
  • Added --raw flag to cat and select subcommands. It disables the wildcard operations.

Note: Google Cloud Storage uses a different approach for versioning. So with current implementation, s5cmd cannot use or retrieve generation numbers . However, bucket-version command and du command with all-versions flag works as expected since they do not use version ids.

Fixes: #218
Fixes: #386
Fixes: #539

…sioning

This commit adds "--all-versions" flag to "ls", "rm", "du" subcommands.

When rm is used with "--all-versions" flag, it succesfully deletes all "versions" of object, but it DOES NOT delete any of the "Delete Markers" associated with the given key(s). For now, I'm not sure about how to handle delete markers. Whether deletion of them should be a default behaviour, or should we require another flag for them or maybe another subcommand.

Similarly, we are yet to decide how or whether s5cmd will support enabling or suspending versioning.

As a last note, for operations with "--all-versions" I'm using same interfaces/methods used by default. So I expect them to have similar high performance.

Updates. #386, #218.
@kucukaslan
Copy link
Contributor Author

kucukaslan commented Jul 26, 2022

Status as of July 26 (Outdated):

  • add all-versions flag to following subcommands:
    • ls
    • rm ( only with wildcards,does not delete delete markers)
    • du
  • add version-id flag to following sub commands:
    • cp/mv
    • cat
    • rm
    • du
  • format outputs
    • ls
      ...

Background

You may refer to #386 (comment) for background of changes

Current problem (I'm trying to solve):

rm uses expandSource method which handle keys differently when wildcards are used (or not). So It doesn't work when all-versions of a particular key was to be deleted, it just put a delete marker, though rm succesfully deletes objects when wildcards are. used.
To fix this, we need to pass value of "all-versions" flag expandSource. Hence I propose to

  • put value of all-version flag as a field to the URL (instead of passing to s3 object and using it in s3.List method).
  • I want to change the type of src & dst fields of commands (Copy, Delete etc.) to URL (from string) accordingly.

Alternatively, we can add new parameters to expandSource method to pass all-versions flag.

Note
You can refer to Kucukaslan@df94602 to see what kind of changes I'm intended to do in code as rm.go and Delete being an example.

Example usage syntax

s5cmd ls --all-versions s3://bucket/

s5cmd rm --all-versions "s3://bucket/*"
s5cmd rm --all-versions s3://bucket/key

s5cmd du --all-versions "s3://bucket/*"


s5cmd cat --version-id smUtf8Thng s3://bucket/key

s5cmd du --version-id smUtf8Thng s3://bucket/key

s5cmd rm --version-id smUtf8Thng s3://bucket/key

add examples
change examples to quote wildcarded strings
add examples
change examples to quote wildcarded strings
add raw flag to Cat
add examples
GetObjectInput.VersionID should be nil not empty string when there is no version.
this change affects both cp and mv operations.
@kucukaslan
Copy link
Contributor Author

kucukaslan commented Aug 3, 2022

Up to date status

I've made the changes to the Command objects and url.URL I mentioned earlier.

Implementation

  • add all-versions flag to following subcommands:
    • ls ( including delete markers)
    • rm ( including delete markers)
    • du
    • select
  • add version-id flag to following subcommands:
    • cp/mv
    • cat
    • rm
    • du
    • select
  • Added bucket-version command to configure bucket versioning.
    • get status
    • set bucket versioning

Output formats

  • cp/mv : Didn't change. It is ambiguous that whether version-id that should be printed belongs to the source or destination.
  • cat: Didn't change. It should print the content of the file
  • du: Didn't change. It is generally used for multiple objects and return their total disk usage. ls can be used with all-versions to see sizes of each version.
  • select: Didn't change. It should print the result of the query.
  • all-versions flag :
    • ls ( including delete markers)
      Example
      2022/08/10 09:53:03 3171 log/log.go 3/60O30C1G60O30C1G60O30C1G60O30C1G60O30C1G60O30C1J0109VB3PCPEL0PO=
      2022/08/10 09:53:28 23 log/log.go 3/60O30C1G60O30C1G60O30C1G60O30C1G60O30C1G60O30C1K01TF8AJ9OV6NF7O=
      {"key":"s3://mcks5cmd/log/log.go","etag":"b96979fea4ce57766596e47d1b6cc5e1","last_modified":"2022-08-10T09:53:03.124Z","type":"file","size":3171,"storage_class":"STANDARD","version_id":"3/60O30C1G60O30C1G60O30C1G60O30C1G60O30C1G60O30C1J0109VB3PCPEL0PO="}
      {"key":"s3://mcks5cmd/log/log.go","etag":"05f2faf2442033698d1aa6778ca70c1b","last_modified":"2022-08-10T09:53:28.325Z","type":"file","size":23,"storage_class":"STANDARD","version_id":"3/60O30C1G60O30C1G60O30C1G60O30C1G60O30C1G60O30C1K01TF8AJ9OV6NF7O="}
    • rm ( including delete markers)
      Example
      rm s3://mcks5cmd/log/log.go 3/60O30C1G60O30C1G60O30C1G60O30C1G60O30C1G60O30C1J0109VB3PCPEL0PO=
      rm s3://mcks5cmd/log/log.go 3/60O30C1G60O30C1G60O30C1G60O30C1G60O30C1G60O30C1N03KIF6K5MOK5168=
      {"operation":"rm","success":true,"source":"s3://mcks5cmd/log/log.go","version_id":"3/60O30C1G60O30C1G60O30C1G60O30C1G60O30C1G60O30C1J0109VB3PCPEL0PO="}
      {"operation":"rm","success":true,"source":"s3://mcks5cmd/log/log.go","version_id":"3/60O30C1G60O30C1G60O30C1G60O30C1G60O30C1G60O30C1N03KIF6K5MOK5168="}
  • version-id:
    • rm
      Example rm s3://mcks5cmd/log/log.go 3/60O30C1G60O30C1G60O30C1G60O30C1G60O30C1G60O30C1K01TF8AJ9OV6NF7O=
      {"operation":"rm","success":true,"source":"s3://mcks5cmd/log/log.go","version_id":"3/60O30C1G60O30C1G60O30C1G60O30C1G60O30C1G60O30C1J0109VB3PCPEL0PO"}

Tests

  • prepare versioning setup for gofakes3
  • testing all-versions flag
    • ls & rm
    • du
  • testing version-id flag
    • cp/mv
    • cat
    • rm
    • du
  • command validations
    • cannot use both of the flags together
    • this flags are only meaningful to remote files
    • make validation checks reusable

Google Cloud

Warning
Google Cloud Storage uses a different approach for versioning. So with current implementation, s5cmd cannot use or retrieve generation numbers . However, with all-versions flag du works as expected since it does not use version ids, ls lists object metadata except the generation numbers etc.

Commentary & Known Issues & Discussion Topics

  • gofakes3 package that we use in our tests supports versioning only with in memory backend, so I've added another method to setup fake server.
  • There is was a bug when I try to delete from gofakes3 server using s5cmd rm. Despite using version-id/all-versions flags, the server does not permanently delete the corresponding objects and just adds delete marker to them. Interestingly:
    • this bug does not happen when I use aws s3api delete-object to connect gofakes3 server.
    • this bug does not happen when I use s5cmd rm to connect real AWS S3 server.
    • other subcommands of s5cmd works as expected.
      I'm currently trying to identify root cause of this bug and to fix it.

Note
It turned out that gofakes3 does not support multidelete for versioned objects. At the moment we've fixed it in igungor's fork with igungor/gofakes3#6. Also we've a PR to fix it in upstream too johannesboyne/gofakes3#69.

  • I do not discern the objects and delete markers when all-version flag is used.
    • Should we show the distinction in outputs?
    • Should we require yet another flag to take delete markers into account (and ignore them otherwise)

Note
We continue not to discern objects and delete markers, in this case. No special flag.

  • Both of the s3 keys and object versions have maximum length of 1024 byte (UTF-8 string). It, potentially, might require a lot of whitespaces to align VersionID and Key columns in output (especially because we don't know, in advance, what their respective maximum lengths will be. Should we apply adaptive alignment? I mean: Each column is aligned according to the longest element so far.)

Note
We will only align key to left with fixed "50" (?) characters width and append the versionID (prefixed with a space) to it
aws s3api prints out json
mc has an example output here

export "storage.isGoogleEndpoint" method
@kucukaslan kucukaslan marked this pull request as draft September 5, 2022 17:13
@kucukaslan kucukaslan marked this pull request as ready for review September 8, 2022 07:49
Copy link
Member

@sonmezonur sonmezonur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some minor comments but LGTM overall. Thank you!

command/util.go Outdated Show resolved Hide resolved
command/util.go Outdated Show resolved Hide resolved
command/flag.go Show resolved Hide resolved
command/bucket_version.go Outdated Show resolved Hide resolved
sonmezonur
sonmezonur previously approved these changes Sep 14, 2022
storage/s3.go Outdated Show resolved Hide resolved
sonmezonur
sonmezonur previously approved these changes Sep 14, 2022
Our use case is very simple then the the problem that text/cases is handling. We only want first rune to be upper case, and rest to be lower case, we don't care the word breaks and we don't care any other language than English, in fact we only care words: "Suspended" and "Enabled".
So it would be overkill to use that package.
sonmezonur
sonmezonur previously approved these changes Sep 15, 2022
If a bucket is not versioned, this delete method should also work as expected.

Also rename S3.listObjectsVersion to S3.listObjectVersions to match the AWS SDK's naming
@igungor igungor merged commit 0993c6d into peak:master Jun 16, 2023
@igungor
Copy link
Member

igungor commented Jun 16, 2023

🥇

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support --raw flag with cat How to delete object versions? du: include versioned objects
3 participants