
gzippin #1037

Merged
Miserlou merged 3 commits into Miserlou:master from mcrowson:gzipped_project on Aug 14, 2017

Conversation

mcrowson
Copy link
Collaborator

@mcrowson mcrowson commented Aug 4, 2017

Description

Using gzipped tarballs for the slim_handler's package. This allows the project to be downloaded and unzipped into /tmp on the fly.
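
On the packaging side, the idea is roughly this (a hypothetical sketch, not the actual core.py code from this PR): tar the project up with gzip compression and push it to S3, so the handler can stream it back into /tmp at cold start.

import boto3
import tarfile

def package_and_upload(project_dir, bucket, key="package.tar.gz"):
    # Sketch: build a gzipped tarball of the project and upload it to S3.
    # project_dir, bucket and key are hypothetical names for illustration.
    archive_path = "/tmp/" + key
    with tarfile.open(archive_path, "w:gz") as archive:
        # arcname="." keeps files at the archive root instead of nesting
        # them under the local directory name.
        archive.add(project_dir, arcname=".")
    boto3.client("s3").upload_file(archive_path, bucket, key)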

Would love help testing, especially on Windows, as I attempted to address
https://github.com/Miserlou/Zappa/blob/master/zappa/core.py#L568

from PR #716 with the tarball approach.

GitHub Issues

#961
#881
#1020

mcrowson added 2 commits August 4, 2017 12:02
@coveralls
Copy link

coveralls commented Aug 4, 2017

Coverage Status

Coverage increased (+0.1%) to 74.165% when pulling 2696d9b on mcrowson:gzipped_project into a21c973 on Miserlou:master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-15.9%) to 58.137% when pulling 2696d9b on mcrowson:gzipped_project into a21c973 on Miserlou:master.

@coveralls
Copy link

coveralls commented Aug 4, 2017

Coverage Status

Coverage increased (+0.2%) to 74.236% when pulling 6551e34 on mcrowson:gzipped_project into a21c973 on Miserlou:master.

@mbeacom
Copy link
Contributor

mbeacom commented Aug 4, 2017

Beat me to it! Thanks for getting it done, though!

@dswah
Copy link

dswah commented Aug 5, 2017

so stoked!

@olirice
Copy link
Contributor

olirice commented Aug 5, 2017

Looks great, nice work!

I just tested deploying a bare bones project with python 2.7.12 and am getting an error from handler.py

[1501892918179] 'StreamingBody' object has no attribute 'tell': AttributeError
Traceback (most recent call last):
  File "/var/task/handler.py", line 505, in lambda_handler
    return LambdaHandler.lambda_handler(event, context)
  File "/var/task/handler.py", line 239, in lambda_handler
    handler = cls()
  File "/var/task/handler.py", line 104, in __init__
    self.load_remote_project_archive(project_archive_path)
  File "/var/task/handler.py", line 167, in load_remote_project_archive
    with tarfile.open(fileobj=archive_on_s3['Body'], mode="r:gz") as t:
  File "/usr/lib64/python2.7/tarfile.py", line 1691, in open
    return func(name, filemode, fileobj, **kwargs)
  File "/usr/lib64/python2.7/tarfile.py", line 1745, in gzopen
    t = cls.taropen(name, mode, fileobj, **kwargs)
  File "/usr/lib64/python2.7/tarfile.py", line 1721, in taropen
    return cls(name, mode, fileobj, **kwargs)
  File "/usr/lib64/python2.7/tarfile.py", line 1587, in __init__
    self.firstmember = self.next()
  File "/usr/lib64/python2.7/tarfile.py", line 2356, in next
    tarinfo = self.tarinfo.fromtarfile(self)
  File "/usr/lib64/python2.7/tarfile.py", line 1251, in fromtarfile
    buf = tarfile.fileobj.read(BLOCKSIZE)
  File "/usr/lib64/python2.7/gzip.py", line 268, in read
    self._read(readsize)
  File "/usr/lib64/python2.7/gzip.py", line 295, in _read
    pos = self.fileobj.tell() # Save current position
AttributeError: 'StreamingBody' object has no attribute 'tell'

If you put the tarfile in streaming mode 'r|gz' (instead of 'r:gz'), it quiets that error.

with tarfile.open(fileobj=archive_on_s3['Body'], mode="r|gz") as t:
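
For context, a minimal end-to-end sketch with hypothetical bucket/key names: the pipe modes only ever call read() on the file object, so boto3's StreamingBody works, while 'r:gz' goes through GzipFile, which needs seek()/tell().

import boto3
import tarfile

# Hypothetical bucket/key. "r|gz" treats the body as a forward-only stream,
# so no seek()/tell() is ever called on the StreamingBody.
body = boto3.client("s3").get_object(Bucket="my-bucket", Key="package.tar.gz")["Body"]
with tarfile.open(fileobj=body, mode="r|gz") as t:
    t.extractall(path="/tmp/project")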

After that update, handler.py errors out saying the module has no attribute your_entrypoint.

[1501896837626] 'module' object has no attribute 'app': AttributeError
Traceback (most recent call last):
  File "/var/task/handler.py", line 513, in lambda_handler
    return LambdaHandler.lambda_handler(event, context)
  File "/var/task/handler.py", line 247, in lambda_handler
    handler = cls()
  File "/var/task/handler.py", line 134, in __init__
    wsgi_app_function = getattr(self.app_module, self.settings.APP_FUNCTION)
AttributeError: 'module' object has no attribute 'app'

If you check the .tar.gz on S3, the files are in the right places but they all have a size of 0 bytes.

I haven't had time to look into that yet. It might be related to tarfile.TarFile.addfile expecting a file object, rather than a file path.
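
A minimal illustration of that difference (hypothetical file names, not the PR's packaging code): addfile() needs an open file object alongside the TarInfo, otherwise only the header gets written and the member ends up 0 bytes.

import tarfile

# Hypothetical example of why addfile() can produce 0-byte members.
with tarfile.open("package.tar.gz", "w:gz") as archive:
    path = "zappa_settings.json"          # any existing file, for illustration
    info = archive.gettarinfo(path)

    # archive.addfile(info)               # wrong: header only, 0-byte member
    with open(path, "rb") as f:
        archive.addfile(info, fileobj=f)  # right: contents are copied from f

    # archive.add(path)                   # or just let add() handle the path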

@coveralls
Copy link

coveralls commented Aug 6, 2017

Coverage Status

Coverage increased (+0.2%) to 74.272% when pulling c466a0b on mcrowson:gzipped_project into a21c973 on Miserlou:master.

@mcrowson
Copy link
Collaborator Author

mcrowson commented Aug 6, 2017

OK, try it now. Works for me with a 200M project on a 128M RAM Lambda.

Test environments were OS X py2.7 and OS X py3.6.

@olirice
Copy link
Contributor

olirice commented Aug 6, 2017

Working now, 2.7 and 3.6 on Ubuntu 16.04.

Here are a couple of cold start times on different memory size instances.

414 MB project
Memory Size | Gzip Extract Time (s)
1536 MB | 3.05
512 MB | 12.49
256 MB | 24.79
128 MB | Error (timeout)

For the cases I could check (when the project + zip is less than 500 MB), cold start performance is at least as fast as zip on disk, and usually better.

225 MB project
Memory Size | Gzip Extract Time (s) | Zip Extract Time (s)
1536 MB | 1.88 | 1.86
512 MB | 4.10 | 5.99
256 MB | 9.56 | 10.01
128 MB | 16.89 | 20.74

Given the slow cold start times on small instances, the win here will be huge deployments on large instances versus big-ish deployments on tiny instances.

Notes:

  • Precompiled packages were turned off to make sure extracted sizes matched the local size.
  • All times are the slowest of 3 attempts.

@mcrowson
Copy link
Collaborator Author

mcrowson commented Aug 6, 2017 via email

@olirice
Copy link
Contributor

olirice commented Aug 6, 2017

Yes, all tests went through API Gateway and timed out after 30 seconds. I'm sure it would have completed over a direct invocation if you cranked the Lambda timeout up to 5 minutes.

@mcrowson
Copy link
Collaborator Author

mcrowson commented Aug 6, 2017 via email

@olirice
Copy link
Contributor

olirice commented Aug 7, 2017

Completely agree. I'm not clear where the bottleneck is yet though. I tried removing gzip compression and stream extracting an uncompressed tarball to see if CPU usage during extraction was the issue. Speeds were pretty similar to gzip so it wasn't a helpful test.

We need to get a better understanding of disk/network/CPU performance at each Lambda memory size to figure out if there's more we can do to improve cold starts.

I think disk is most likely to be the problem. I'll run some tests this week and get back to you, but (as you say) it's outside the scope of this PR.

@olirice
Copy link
Contributor

olirice commented Aug 7, 2017

Any thoughts on a failover strategy to zip or an uncompressed tarfile when zlib isn't available?

Zlib is technically an optional component, so using gzip without a fallback breaks slim_handler for certain valid Python builds.

Too niche to worry about?
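
One possible guard, as a rough sketch (hypothetical, not code from this PR): probe for zlib when building the package and fall back to an uncompressed tarball if it's missing, since a plain tarball still streams fine from S3.

import tarfile

# gzip support in tarfile requires zlib, which is technically optional in CPython.
try:
    import zlib  # noqa: F401
    mode, suffix = "w:gz", ".tar.gz"
except ImportError:
    mode, suffix = "w", ".tar"  # uncompressed tarball fallback

with tarfile.open("package" + suffix, mode) as archive:
    archive.add("my_project", arcname=".")  # hypothetical project directory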

@GeorgianaPetria
Copy link

Hi all,

Is it already possible to use this fix?

I am getting the following error:

[Errno 28] No space left on device: IOError
Traceback (most recent call last):
  File "/var/task/handler.py", line 491, in lambda_handler
    return LambdaHandler.lambda_handler(event, context)
  File "/var/task/handler.py", line 240, in lambda_handler
    handler = cls()
  File "/var/task/handler.py", line 102, in __init__
    self.load_remote_project_zip(project_zip_path)
  File "/var/task/handler.py", line 169, in load_remote_project_zip
    z.extractall(path=project_folder)
  File "/usr/lib64/python2.7/zipfile.py", line 1040, in extractall
    self.extract(zipinfo, path, pwd)
  File "/usr/lib64/python2.7/zipfile.py", line 1028, in extract
    return self._extract_member(member, path, pwd)
  File "/usr/lib64/python2.7/zipfile.py", line 1084, in _extract_member
    shutil.copyfileobj(source, target)
  File "/usr/lib64/python2.7/shutil.py", line 52, in copyfileobj
    fdst.write(buf)
IOError: [Errno 28] No space left on device

@mcrowson
Copy link
Collaborator Author

mcrowson commented Aug 7, 2017

Got any details about the project? Size of zip? RAM size on Lambda? Size of the project unzipped? etc.

Oh, just reading your trace: you're using the current code and want this new code, not saying that this PR is broken.

@GeorgianaPetria
Copy link

GeorgianaPetria commented Aug 7, 2017

Sure:
Size of the zip: 107M
Project unzipped: 487M
Max memory used on Lambda: 512M

@mcrowson
Copy link
Collaborator Author

mcrowson commented Aug 7, 2017

Check out this PR and give it a shot. I'd love to see how the 487M project works. Also, if you're on Windows, we'd love to see it succeed there as well.

@GeorgianaPetria
Copy link

I'm on Ubuntu.
Sorry, how can I check out the PR? I'm using zappa version 0.43.1.

It's actually working for me now, not sure if I already have your modified version though.
I'm doing find . | grep -E "(__pycache__|\.pyc|\.pyo$)" | xargs rm -rf before deploying.

I believe the reason it wasn't working is that the unzipped project was sometimes larger than 525M when I still had all the compiled Python files. But after deleting them (the project is 487M) it works.

@Miserlou Miserlou merged commit ec39035 into Miserlou:master Aug 14, 2017
@dswah
Copy link

dswah commented Aug 15, 2017

@Miserlou @mcrowson I saw that this is merged! Awesome! But has it been released?
If so, how do I pass archive_format to Zappa to tell it to use a tarball?

@dswah
Copy link

dswah commented Aug 15, 2017

pip install -U zappa did it :)
