
slim_handler reduce /tmp use and enable significantly larger deployments #1022

Closed · wants to merge 3 commits
Conversation

@olirice (Contributor) commented Jul 28, 2017

Description

Updated slim_handler behavior.

Previously:

  • Download project.zip from s3 to /tmp
  • Unzip project.zip to /tmp/project
  • Do not delete project.zip

Now:

  • Download project.zip into a SpooledTemporaryFile
    • The SpooledTemporaryFile is held in memory
    • Its max_size is based on the Lambda function's memory, read from an environment variable
    • If max_size is exceeded, it transparently spills over to /tmp and continues (safe)
    • It is deleted automatically once it falls out of scope
  • Unzip project.zip to /tmp

GitHub Issues

#1020
#961
#881

@coveralls commented Jul 28, 2017


Coverage decreased (-0.04%) to 74.019% when pulling 4475b7a on olirice:master into a21c973 on Miserlou:master.

@Miserlou (Owner)

This is really great! Is there any chance you can also update the README file or write a blog post about this for the release?

Poking @mcrowson for review as well.

@coveralls commented Jul 30, 2017


Coverage decreased (-0.04%) to 74.019% when pulling c07ab31 on olirice:master into a21c973 on Miserlou:master.

@olirice (Contributor, Author) commented Jul 30, 2017

Great, README updated.

The existing blog post on Large Applications already says that Zappa supports projects up to 500 MB.

Zappa deployments now support up to 500M of zipped up Python projects. Simply set “slim_handler”: true in zappa_settings.json and your large projects can now serve up requests from Lambda without a server.

What are you looking for in a blog post?

I could write a walkthrough with code snippets to create and deploy a 500 MB project, explaining the memory considerations throughout: "Let's Deploy a 500 MB Project on Lambda" or something similar.
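For reference, the settings change the blog post describes is a one-liner in zappa_settings.json. The stage name, app path, and bucket below are illustrative; only "slim_handler": true is the setting under discussion:

```json
{
    "production": {
        "app_function": "my_app.app",
        "s3_bucket": "my-zappa-bucket",
        "slim_handler": true
    }
}
```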

@mcrowson (Collaborator)

Love the idea, I just don't think it makes a tremendous difference. It still ends up spilling onto disk, and you have both the zip size and the unzipped folder to worry about. The only change here is that we now have /tmp space plus RAM space in which to put the zip file and the unzipped contents.

Setting the RAM size to 650M to account for all of it is a huge cost increase for the whole application just to get everything to fit. I still think a streaming approach is the right long-term solution, but this might help projects that just needed an extra 100M or so. For project zips over 300M, though, there still might not be enough room for the fully unzipped contents in addition to the zip file.

@olirice (Contributor, Author) commented Jul 31, 2017

@mcrowson
Agreed. Unfortunately, Python's zipfile expects random access to the archive (the central directory sits at the end of the file), so a streaming unzip isn't possible.

@Miserlou If I can change the project upload format to .tar.gz for slim_handler projects, then a streaming unzip is entirely feasible.

Objections?

@mcrowson (Collaborator)

Totally on the same page. I say give it a go: package with the gzipped tarball and stream-unzip that way.

@olirice (Contributor, Author) commented Aug 1, 2017

@mcrowson streaming a .tar.gz from S3 works great:

import tarfile

# Resources
# parse_s3_url is a Zappa helper that splits an s3://bucket/key URL
remote_bucket, remote_file = parse_s3_url(project_zip_path)
s3 = boto_session.resource('s3')

# S3 file object
remote_project = s3.Object(remote_bucket, remote_file)

# remote_project byte stream (note: _raw_stream is a private attribute
# of botocore's StreamingBody)
raw_stream = remote_project.get()['Body']._raw_stream

# Create a tarfile from the byte stream. Note mode='r|gz' (sequential
# streaming), not 'r:gz' (random access).
# For the mode syntax see https://docs.python.org/2/library/tarfile.html
remote_archive = tarfile.open(None, 'r|gz', fileobj=raw_stream)

# Extract as usual
remote_archive.extractall(path=project_folder)

It looks like both the CLI and the core need updating to implement this feature. That's a little more than I intended to bite off.

Do you know of any contributors who might consider updating the client-side tools?

If not, how about merging the SpooledTemporaryFile PR so those of us who insist on abusing Lambda have a simple (albeit costly) solution for 500 MB deployments until I have time to loop back around and implement the streaming solution?

@mcrowson (Collaborator) commented Aug 1, 2017

I think @mbeacom offered to do the whole of it as well over on #881

@dswah commented Aug 2, 2017

I've already found this code to be super useful!

I'd like to deploy my app with this, but I don't want to manually copy over handler.py every time I reinstall Zappa...

What's the status of this PR?

@olirice (Contributor, Author) commented Aug 2, 2017

@dswah I believe @mcrowson is waiting on an update regarding @mbeacom's streaming gzip implementation over on #881 before recommending that this (not as good) solution be merged.

@mbeacom, if you had any trouble wrestling that S3 file object into a stream (I did), there's a code snippet above that might be helpful.

@olirice (Contributor, Author) commented Aug 7, 2017

Closing in favor of the better solution at #1037.
