
Added a --source-region parameter. #872

Merged
3 commits merged into aws:develop on Aug 13, 2014

Conversation

kyleknap
Contributor

This parameter ensures the ability to do
trans-region syncs, moves, and copies. Tests
were expanded as well to better test
handling endpoints.

@kyleknap
Contributor Author

This is a feature addition for cp, mv, and sync. If you want to sync two buckets that exist in different regions and avoid redirect errors, you would now enter aws s3 sync s3://us-west-1-bucket s3://ap-northeast-1-bucket --region ap-northeast-1 --source-region us-west-1 (assuming the default region is not ap-northeast-1). Previously, the region specified by the default region or --region was used for operations on both buckets, which resulted in 301 redirect errors if the bucket names were not DNS compatible. If --source-region is not specified, the previous default behavior is used.
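The region-resolution rule described above can be sketched as a small function. This is an illustrative stand-in, not the actual aws-cli internals; the name resolve_regions is hypothetical.

```python
def resolve_regions(default_region, region=None, source_region=None):
    """Return (source_region, dest_region) for a transfer.

    The destination uses --region (or the default region); the source
    falls back to that same region unless --source-region is given,
    which preserves the previous single-region behavior.
    """
    dest = region or default_region
    src = source_region or dest
    return src, dest

# The sync example from the comment above:
src, dest = resolve_regions('us-east-1',
                            region='ap-northeast-1',
                            source_region='us-west-1')
assert (src, dest) == ('us-west-1', 'ap-northeast-1')

# Without --source-region, the old behavior applies to both buckets:
assert resolve_regions('us-east-1') == ('us-east-1', 'us-east-1')
```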

Tests were improved in two ways.
First, I added the ability to create buckets in different regions and the ability to name the bucket. The test remembers each bucket's region in a dictionary so it can operate on objects in the bucket without having to specify the region.
Second, whenever local files are created, I subtract a few years from the last modified time. This ensures local files are always older than uploaded S3 files.
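The two test helpers described above might look roughly like this. Names here are illustrative, not the actual test_plugin.py helpers.

```python
import os
import tempfile
import time

# 1) Remember each test bucket's region so later operations can look it up
#    instead of having to pass a region explicitly.
bucket_regions = {}

def register_bucket(name, region):
    bucket_regions[name] = region

register_bucket('us-west-1-bucket', 'us-west-1')
assert bucket_regions['us-west-1-bucket'] == 'us-west-1'

# 2) Backdate a local file's mtime by a few years so sync always sees
#    the local copy as older than the uploaded S3 object.
def create_old_file(path, contents, years=3):
    with open(path, 'w') as f:
        f.write(contents)
    offset = years * 365 * 24 * 3600
    old = time.time() - offset
    os.utime(path, (old, old))
    return path

path = create_old_file(os.path.join(tempfile.gettempdir(), 'foo.txt'),
                       'this is foo.txt')
assert os.path.getmtime(path) < time.time() - 2 * 365 * 24 * 3600
```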

@kyleknap
Contributor Author

@danielgtaylor @jamesls

@jamesls
Member

jamesls commented Aug 11, 2014

Can you clarify why the FileGenerator now needs both an endpoint and source_endpoint? I was expecting that for sync, one file generator gets the source_endpoint, and the other generator gets the destination endpoint, but the FileGenerator itself only needs a single endpoint argument.

I realize that this may be a slightly larger refactoring to do this, but I think it makes the most sense going forward.

@kyleknap
Contributor Author

FileGenerator for cp needs both the endpoint and the source endpoint because the source_endpoint is needed to list the objects in the source bucket, while the endpoint refers to the destination and is needed for the actual CopyObject command. While figuring out how to approach this, I found there were two common redirect errors when copying objects across regions: one for ListObjects (thrown if I only used endpoint) and one for CopyObject (thrown if I only used source_endpoint). So I had to use both endpoints.
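The rule behind those two redirect errors can be sketched as a tiny dispatch: each S3 call must go to the region of the bucket it reads or writes, or S3 answers with a 301. This is an illustrative sketch; endpoint_for is a hypothetical name.

```python
def endpoint_for(operation, source_region, dest_region):
    """Pick the region an S3 request must be sent to."""
    # ListObjects enumerates the *source* bucket, so it must hit the
    # source bucket's region (the redirect seen with only `endpoint`).
    if operation == 'ListObjects':
        return source_region
    # CopyObject writes into the *destination* bucket, so it must hit
    # the destination's region (the redirect seen with only
    # `source_endpoint`).
    if operation == 'CopyObject':
        return dest_region
    raise ValueError('unknown operation: %s' % operation)

assert endpoint_for('ListObjects', 'us-west-1', 'ap-northeast-1') == 'us-west-1'
assert endpoint_for('CopyObject', 'us-west-1', 'ap-northeast-1') == 'ap-northeast-1'
```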

@kyleknap
Contributor Author

Also, this scenario is not limited to the cp command. The same applies to sync and mv. In the case of mv, the source endpoint is needed for the DeleteObject operation; if the source endpoint is not used, you get a redirect error for DeleteObject.
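Extending the same per-operation rule to mv: DeleteObject removes the object from the source bucket, so it must be sent to the source bucket's region. A minimal sketch with hypothetical names:

```python
# Which side of the transfer each S3 operation touches (illustrative).
OPERATION_SIDE = {
    'ListObjects': 'source',   # enumerate the source bucket
    'CopyObject': 'dest',      # write into the destination bucket
    'DeleteObject': 'source',  # mv deletes from the source bucket
}

def region_for(operation, source_region, dest_region):
    side = OPERATION_SIDE[operation]
    return source_region if side == 'source' else dest_region

# Without the source endpoint, mv's delete would be sent to the wrong
# region and come back as a 301 redirect.
assert region_for('DeleteObject', 'us-west-1', 'ap-northeast-1') == 'us-west-1'
```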

@jamesls
Member

jamesls commented Aug 11, 2014

What I was alluding to was that the FileGenerator class ideally shouldn't know about the source/dest region. It's really just about listing local or remote files, for which the BucketLister only needs to know which region it's supposed to talk to. The only reason it needs both a source and a destination region is that it has to pass this info along to FileInfo, which is where the actual logic matters (e.g. a move operation requires copying to the dest and deleting from the source, a copy operation requires sending the request to the destination bucket's region, etc.).

I think I might have underestimated the work by calling this a "slightly larger refactoring". Looking into this, it's actually quite a bit of work to decouple fileinfo from filegenerator. However, long term, I think this is the right direction.

This refactoring removes the necessity of passing
arguments through the ``FileGenerator`` class
in order for the ``FileInfo`` class to obtain
the arguments it requires to perform an operation.
@kyleknap
Contributor Author

I agree it was about time to break it up. Here is how I refactored the code. There are two new classes: a FileBase class and an InfoSetter class. The FileBase class is much like a FileInfo but far simpler (i.e. it only has attributes like src_path, dest_path, etc.). These are generated by the FileGenerator class. The FileBase objects are passed to objects like the IncludeExcludeFilter and the Comparator. Then, right before a FileBase hits the S3Handler, it is passed to the new InfoSetter class. The InfoSetter creates the FileInfo object by taking the attributes from the FileBase and injecting the attributes relevant to the operation (e.g. endpoint, source_endpoint, and parameters). The InfoSetter then passes the FileInfo to the S3Handler to perform the operation.
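The pipeline described above can be sketched compactly. The class shapes here are simplified stand-ins for the real aws-cli classes of the same names, with endpoints modeled as plain strings.

```python
from dataclasses import dataclass

@dataclass
class FileBase:
    # Lightweight record emitted by FileGenerator; knows nothing
    # about endpoints or operation parameters.
    src_path: str
    dest_path: str
    operation_name: str

@dataclass
class FileInfo:
    # Full record consumed by S3Handler to perform the operation.
    src_path: str
    dest_path: str
    operation_name: str
    endpoint: str = None
    source_endpoint: str = None

class InfoSetter:
    """Injects operation-specific details (endpoints, parameters)
    into each FileBase just before it reaches the S3Handler."""
    def __init__(self, endpoint, source_endpoint):
        self.endpoint = endpoint
        self.source_endpoint = source_endpoint

    def call(self, file_bases):
        for fb in file_bases:
            yield FileInfo(fb.src_path, fb.dest_path, fb.operation_name,
                           endpoint=self.endpoint,
                           source_endpoint=self.source_endpoint)

# FileGenerator -> filters/Comparator -> InfoSetter -> S3Handler
bases = [FileBase('s3://src/key', 's3://dst/key', 'copy')]
infos = list(InfoSetter('ap-northeast-1', 'us-west-1').call(bases))
assert infos[0].endpoint == 'ap-northeast-1'
assert infos[0].source_endpoint == 'us-west-1'
```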

@coveralls

Coverage Status

Changes unknown when pulling c931e54 on kyleknap:source-region into aws:develop.

@danielgtaylor
Contributor

LGTM, but I'll defer to James for the final review.

@@ -579,6 +594,74 @@ def test_sync_with_delete_option_with_same_prefix(self):
self.assertEqual('', p.stdout)


class TestSourceRegion(BaseS3CLICommand):
def extra_setup(self):
name_comp = []
Member

Is this used anywhere?

Member

Sorry, misread, ignore above comment. Is there any reason you couldn't use the randomly generated names?

Contributor Author

According to the issue, this redirect error happens whenever there is a dot in the name or the bucket name is otherwise not DNS compatible. So I decided to generate random names that satisfy those conditions. When I tested it, the error does not occur if the bucket name lacks those properties, but it does happen with the names I generate, as shown in the testFailWithoutRegion test.
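For illustration, a rough approximation of the "DNS compatible" property being discussed (this is not the actual botocore check): virtual-hosted-style addressing wants a lowercase 3-63 character label with no dots and no leading or trailing hyphen.

```python
import re

def is_dns_compatible(bucket_name):
    """Approximate check for virtual-hosted-style-safe bucket names."""
    if not (3 <= len(bucket_name) <= 63):
        return False
    # Lowercase letters, digits, and interior hyphens only; no dots.
    return re.fullmatch(r'[a-z0-9](?:[a-z0-9-]*[a-z0-9])?',
                        bucket_name) is not None

assert is_dns_compatible('my-test-bucket')
# A dot in the name is exactly the case the randomized test names target:
assert not is_dns_compatible('my.test.bucket')
assert not is_dns_compatible('MyBucket')
```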

Member

Oh that's right, makes perfect sense. Would you mind adding a comment somewhere about the fact that we're specifically testing non-DNS-compatible bucket names?

@jamesls
Member

jamesls commented Aug 12, 2014

Overall I think this looks good. I think the refactoring to reduce the coupling between the file generator and the actual S3 operations is great. I just have two small things:

  • Let's switch all the s3 tests in test_plugin.py to use us-west-2 by default.
  • I like the abstractions you created, but I think the names are pretty vague. For example, FileBase makes it seem like it's a base class for file-like objects, and InfoSetter is very generic (a class that sets info). Are there maybe more specific terms we can use?

@kyleknap
Contributor Author

Sounds good. How about for InfoSetter make it FileInfoBuilder? Then for FileBase how about FileDetails? I really cannot think of any good names for FileBase. Any ideas?

@coveralls

Coverage Status

Changes unknown when pulling b4c932d on kyleknap:source-region into aws:develop.

@coveralls

Coverage Status

Coverage increased (+0.09%) when pulling baa8684 on kyleknap:source-region into 2643155 on aws:develop.

@@ -150,7 +172,7 @@ class TestMoveCommand(BaseS3CLICommand):
def test_mv_local_to_s3(self):
bucket_name = self.create_bucket()
full_path = self.files.create_file('foo.txt', 'this is foo.txt')
p = aws('s3 mv %s s3://%s/foo.txt' % (full_path,
p = _aws('s3 mv %s s3://%s/foo.txt' % (full_path,
Member

These small changes make it hard to review what's actually functionally changed. Another way to accomplish the same thing is to import aws as _aws, and then define a def aws in this module that delegates to _aws.
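The suggested wrapper pattern might look like this, with a stand-in for the imported helper (the real import path and the --region handling are assumptions for illustration):

```python
# from tests.integration.customizations.s3 import aws as _aws  # real import
def _aws(command):
    # Stand-in for the imported test helper that shells out to the CLI.
    return 'ran: ' + command

def aws(command, region='us-west-2'):
    """Module-local shim: every call site keeps using aws(), and the
    region default is applied in one place instead of touching every
    test line in the diff."""
    return _aws(command + ' --region ' + region)

assert aws('s3 ls') == 'ran: s3 ls --region us-west-2'
```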

@jamesls
Member

jamesls commented Aug 13, 2014

:shipit: looks good. Just a few small changes, but feel free to merge once feedback's been incorporated.

Also moved the ``test_plugin.py`` tests out of
``us-east-1`` and into ``us-west-2``.
@coveralls

Coverage Status

Coverage increased (+0.09%) when pulling 8e832a6 on kyleknap:source-region into a6a3c2d on aws:develop.

kyleknap added a commit that referenced this pull request Aug 13, 2014
Added a ``--source-region`` parameter.
@kyleknap kyleknap merged commit 29a797b into aws:develop Aug 13, 2014
@kyleknap kyleknap deleted the source-region branch August 13, 2014 21:05