Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New S3 Boto3 backend (closes #57) #111

Closed
wants to merge 3 commits into from

Conversation

mbarrien
Copy link
Contributor

This pull request implements #57 by adding a Boto 3 backend that tries to be a close-to-drop-in replacement for Boto 2. Due to the slight differences, I have kept it a separate backend, but with a lot of copy and paste code. Given that Boto 2 is heading towards maintenance mode according to boto/boto@e3dd996, I don't think it's worth trying to have 2 backends sharing code when the Boto 2 implementation looks on the way to being deprecated.

Note that this isn't just me blithely throwing away the Boto 2 implementation; the fundamental underlying operations are VERY different and worthy of a separate backend. Boto 2 operates on the assumption that you can set arbitrary headers by passing in a dictionary; Boto 3 restricts you to specific named parameters and as such the 2 approaches are very incompatible with one another. You can try to do minor mappings here and there to try to map some of the headers in the AWS_HEADERS setting, but trying to map every possible header value to the right argument name in the right method is pretty tedious and error prone. Instead, this pull request embraces Boto 3's use of parameters as its way of taking in these extra arguments, leaving the remapping up to the django-storages user who wants to switch backends. For the limited number of extra headers and parameters they'll use, this mapping is easier to do, and looking up in Boto 3's documentation for the parameter name is straightforward.

This pull request replaces #66, adding unit tests and incorporating changes due to pull requests accepted into Boto3/botocore. A substantially similar version of this (minus the recent merges from the past few days) has been run in production with Django 1.6.11 for several months without problems.

Also note that while I was at it, I made the necessary change to support #95 if you switch to this backend.

Because this is based on s3boto.py, you should be able to manually perform a diff of s3boto.py and the new s3boto3.py to understand the changes.

Changes:

  • The AWS_HEADERS/storage.headers setting is replaced with AWS_S3_OBJECT_PARAMETERS/storage.object_parameters. The keys/values to this are intended to be the arguments to the http://boto3.readthedocs.org/en/latest/reference/services/s3.html#S3.Object.put boto3.Resource('s3').Object.put() method, which has all the arguments that would be used to generate the correct request headers. Note that this gives a little more access than just headers, since a user could also provide ACL and Content arguments in this dict.
  • Unlike the s3boto implementation, s3boto3 backend does not currently support proxies (Proxy support? boto/botocore#541), or alternate host/ports (Can't use s3 signature version with private S3 URL. boto/botocore#601) because the underlying Boto3/Botocore library does not currently support it. It only kind of supports the endpoint URL
  • If using s3v4, since botocore does not automatically redirect you to the correct region's endpoint nor sign properly unless it knows the region, there is an AWS_S3_REGION_NAME/storage.region_name setting to force the region.
  • There's some behavior that was being done in the boto2 library that boto3 does not do, so equivalent code has been added to s3boto3 to do this. These include things like rewinding the file pointer, automatically checking for bucket and object existence, or directly writing the S3 contents to a file/file pointer. There's cases where Boto3 does not allow previous operations, like locally updating the last_modified attribute, which this new version performs by just reloading the object from the server.
  • No need to parse timestamps, since boto3 already performs this conversion for you.

Known issues:

  • For unsigned URLs (e.g. querystring_auth=False), Boto3 does not support this in its API (Need way to get pre-constructed, non-signed S3 signature boto/boto3#169). This implements compatibility by stripping off any signature parameters by parsing the querystring. Note that these parameter names will be different for s3v1/s3v4 URLs, but this implements it by stripping them all away.
  • To support s3v4 URLs for endpoints that aren't v4 by default (most of them), there are 2 ways. The one I've used in production is to have a ~/.aws/config file as generated by running "aws configure set default.s3.signature_version s3v4". If deploying in a configuration where there is no user home directory, the location of this config can be set with AWS_CONFIG_FILE environment variable. Another way involves setting the S3Boto3Storage.config variable with a Boto3 client config.

This has been tested using s3v4 signatures with both signed and unsigned URLs, along with response content disposition and KMS server-side encryption key arguments.

@mbarrien mbarrien changed the title New S3 Boto3 backend. New S3 Boto3 backend (closes #57) Jan 13, 2016
@mbarrien mbarrien force-pushed the boto3-new branch 3 times, most recently from de60f98 to cc4ae56 Compare January 13, 2016 20:09
This was referenced Jan 13, 2016
@mbarrien
Copy link
Contributor Author

What would it take for this to be reviewed and/or merged in. Manually running "diff s3boto.py s3boto3.py" should be able to show the changes and why the 2 backends are so incompatible (unfortunately GitHub doesn't make this easy).

A refactor to try to have the 2 backends share code is liable to make both unmaintainable, while the approach of replace s3boto with the new s3boto3 is dangerous too because it is not completely a drop-in replacement, so I intentionally made it sit side by side with the existing s3boto.

@mbarrien mbarrien closed this Jan 26, 2016
@mbarrien mbarrien reopened this Jan 26, 2016
@jschneier
Copy link
Owner

I was just thinking about this as I was falling asleep last night. I'm going to push out a 1.3.2 later today and then review this. I'd like to get it into a 1.4 by the end of the week.

The way the code is architected is completely correct, I am the only holdup.

self.name = name[len(self._storage.location):].lstrip('/')
self._mode = mode
self.obj = storage.bucket.Object(storage._encode_name(name))
# TODO: Is this explicitly necessary? Done to emulate old
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to remove this sort of questionable backwards compat code for a new backend. If there is an issue we can deal with it.

@jschneier
Copy link
Owner

There have been a few fixes that have landed in master since this was first opened. If we can get those moved over as well I will merge.

I'm going to release a 1.4 that is just changing the PyPI package name back to django-storages (!!) but will do a fast 1.5 following with this code.

* Based on existing s3boto module
* Replace Boto2 headers settings with parameters
* Does not support proxies, alternate host/port
@laurrentt
Copy link

If I were to specify a KMS Key Id, should I put it in AWS_S3_OBJECT_PARAMETERS ?

@mbarrien
Copy link
Contributor Author

    AWS_S3_OBJECT_PARAMETERS = {
        'ServerSideEncryption': 'aws:kms',
        'SSEKMSKeyId': 'INSERT_YOUR_KMS_ID_HERE'
    }

For the use case that I wrote this port for, we happen to use an alias for the KMS ID (e.g. 'alias/foobar'), but I think it'll support any of alias, ARN, or key ID for SSEKMSKeyId.

@jplaza
Copy link

jplaza commented Mar 22, 2016

Any timeline for this to be released? Happy to test it out and provide some feedback if necessary 😄

@alejandrodnm
Copy link

I'm using @mbarrien fork in a project that it's currently in development and I haven't encounter any issues so far.

I'm just waiting for this to be merged in order to update the dependencies.

@jplaza
Copy link

jplaza commented Mar 22, 2016

@alejandrodnm how would you install the fork?
pip install -e 'git+https://github.com/syapse/django-storages.git#egg=django-storages' ?
Would this work when freezing deps in a requirements.txt file to deploy to the server?

@alejandrodnm
Copy link

@jplaza I added to my requirements.txt file git+https://github.com/syapse/django-storages.git@boto3-new#egg=django-storages then with a pip install -r requirements.txt it downloads the repo at the boto3-new branch and installs it at your virtualenv.

The thing is that if you do a pip freeze what you'll see is django-storages==1.4 so you cannot rely on doing a pip freeze > requirements.txt (if that's how your freezing your deps), you have to manually add the complete git url with the protocol and branch to the file.

@jplaza
Copy link

jplaza commented Mar 22, 2016

Cool! Thank you @alejandrodnm

@mbarrien
Copy link
Contributor Author

Any update on this? I left a comment on b95ec91 about 1 1/2 months back noting my doubts about a specific change you were asking me to merge in, but otherwise it is up to date with what was checked in at that time.

Non-existent file raises IOError in _open for backwards compatibility with s3boto

Don't let the ClientError bubble up
@akoumjian
Copy link

What is the current status of boto3 support? I'm guessing currently unsupported in latest pypi release? The RTD documentation has instructions for it, but an attempt to use it results in an import failure for original boto.

@coredumperror
Copy link

I, too, would like to see this PR get merged. My company's upcoming website offering is going to be relying on django-storages, and we'd definitely rather not be left in the dust with old-boto as it becomes deprecated.

@slepa
Copy link

slepa commented May 20, 2016

+1

# Preserve the trailing slash after normalizing the path.
# TODO: Handle force_http=not self.secure_urls like in s3boto
name = self._normalize_name(self._clean_name(name))
if self.custom_domain:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mbarrien This skips generate_presigned_url for custom domains, which means that for those we lose the AWS_QUERYSTRING_AUTH parameter. I don't think that's intended.

@thomasf
Copy link

thomasf commented Jun 23, 2016

@mbarrien Why not create a separate django-s3boto3-storage package instead? The bundling of all these different storage back ends in one package doesn't contribute to any use case I can think of.

@adamn
Copy link

adamn commented Jun 23, 2016

The backends only work for django-storages so there's very little point to having a package per-backend. The real work for this is done in boto3 which is it's own package.

This backend just takes 30KB and has no overhead aside from the space on the drive.

@adamn
Copy link

adamn commented Jun 23, 2016

With that said, I do think somebody should go through the new backend and devolve as much as possible up to boto3. Code like:

if err.response['ResponseMetadata']['HTTPStatusCode'] == 301:
    raise ImproperlyConfigured("Bucket %s exists, but in a different "
                                               "region than we are connecting to. Set "
                                               "the region to connect to by setting "
                                               "AWS_S3_REGION_NAME to the correct region." % name)

Seems very low level - I have to imagine boto3 abstracts all the HTTP responses into status codes and messages that can simply be passed to the user directly rather than running through custom logic. Where custom logic is necessarily, there should at least be a TODO added that links to a ticket in boto3 trying to get that logic implemented upstream as appropriate.

@mbarrien
Copy link
Contributor Author

@adamn Believe me I went looking for something higher level for that and could find nothing. Perhaps things have changed since then; I haven't touched the code in a few months and perhaps Boto3 has added it.

@thomasf
Copy link

thomasf commented Jun 24, 2016

@adamn I disagree

if this would be it's own package it can directly depend on a correct minimum version of boto3.
I don't think that almost anyone installing django-storages packages does it for more than one storage backend so I don't see why multiple projects has any downside at all.

I ONLY want the s3boto3 backend so I copied it into my project from this pull request and made some changes that I needed but I rather depend on a single purpose clean package with tests instead of the full django-storages suite which does't even have tests written for several of the back ends.

@adamn
Copy link

adamn commented Jun 24, 2016

The problem with multiple backend projects is that they would all have to be coordinated for almost no gain. What would make sense might be multiple requirements.txt files. But since this project doesn't even have a requirements.txt file, that seems like a whole other issue to be addressed.

I wholeheartedly agree though that any future requirements.txt files should not install drivers for the backends automatically - since that would be burdensome.

@thomasf
Copy link

thomasf commented Jun 25, 2016

Why would they have to be coordinated? The storage system is a part of Django, not local to django-storages. Also, package dependencies are specified in setup.py for a package, not a requirements file.

@adamn
Copy link

adamn commented Jun 27, 2016

Sorry, should have said setup.py.

Nonetheless, there are no required packages in the current setup.py file so
I'm not sure what we're talking about. If it's really just the backends in
the backends/ folder, those are just Python modules that never run if you
don't use those given backends. They are just a few KB each. The actual
work is done by boto, or ftplib, or libcloud which each developer can
install or not at his/her discretion.

Coordination would be a problem if Django changes it's storage interface,
and if that were to happen, then each underlying bakend repo would have to
be coordinated with django-storages which would have to support both
interfaces (the new and the deprecated), and then stop supporting the old
interface at some point.

But anyway, this isn't really my decision to make - just an opinion.

On Sat, Jun 25, 2016 at 3:46 AM, Thomas Frössman notifications@github.com
wrote:

Why would they have to be coordinated? The storage system is a part of
Django, not local to django-storages. Also, package dependencies are
specified in setup.py for a package, not a requirements file.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#111 (comment),
or mute the thread
https://github.com/notifications/unsubscribe/AADyyukDw09Xu6_KU2Roj63l860KVN0bks5qPNzvgaJpZM4HESOT
.

@jleclanche
Copy link
Contributor

@jschneier I've been using this in production for a few months now and I can confirm it works great.

You said back in January that you'd like to land this soon but it's blocked on you. Do you need help maintaining the library? I can't promise much of my time but as long as we're using the library in our app, I can help review PRs and merge some stuff in.

@jleclanche
Copy link
Contributor

BTW, I do think the PR needs some improvements (cf. my comment on AWS_QUERYSTRING_AUTH and @adamn's comment about delegating more functionality to boto) but these are things which can land later. As it stands, I see there's a consequent amount of people (myself included) using the s3boto3 fork just because of this and that's not healthy for the project. django-storages has suffered through one fork already.

@adamn
Copy link

adamn commented Jul 19, 2016

+1 to just merging it and putting other work in future PRs.

@selevit
Copy link

selevit commented Jul 29, 2016

I love this PR :) Wait for merging.

@jschneier
Copy link
Owner

Thanks so much @mbarrien for the work and everyone else for the comments. I agree with the iterative improvement approach and have merged the rebased version of this in #179.

Going to release a new version now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.