This repository has been archived by the owner on Aug 27, 2023. It is now read-only.

Add documentation for deploying to AWS Lambda #123

Open
dhui opened this issue May 11, 2017 · 19 comments

Comments

@dhui

dhui commented May 11, 2017

No description provided.

@brianrower

brianrower commented Sep 17, 2018

Have others got this running on Lambda?
I've got it working with a couple of modifications, using Zappa (https://github.com/Miserlou/Zappa).
My branch is here: https://github.com/brianrower/pypicloud/tree/feature/zappa
I'm happy to get this cleaned up for a PR, but if there's another preferred way of running it on Lambda, I'll happily switch to that method and document it as well.
Most of my changes came from ideas in this thread: Miserlou/Zappa#278

@stevearc
Owner

I've certainly never tried to get it running on Lambda. I looked over the branch and it seems pretty simple. Would be happy to take a look at a PR!

@brianrower

@stevearc after playing with it a bit more, I realized I was getting TERRIBLE performance. Installing some of my standard requirements files was taking 2 minutes instead of 10 seconds (and all packages were still cached; it was just doing index checks). I debugged it a bit and determined that pyramid_tm was causing the slowdown. Removing the dependency confirmed that theory. I'm not really familiar with Pyramid or the transaction library. I understand the concept of what it's trying to do, but I don't know the internals of how it does it. Any thoughts on why pyramid_tm would be so much slower?

@stevearc
Owner

What cache backend were you using? If you just remove pyramid_tm then the database operations would...I guess each one would be wrapped in a transaction? Really all that pyramid_tm is doing is opening a transaction at the beginning of a request and committing (or rolling back) at the completion of the request. If it's slow, then it's probably because of some contention on the DB, or that something in pypicloud is performing the operation improperly, or both. The next step would probably be to inspect the DB during operation to see if anything is amiss.
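A simplified model of the transaction-per-request pattern described above, offered as a sketch only: this is not pyramid_tm's actual code, and the begin/commit/abort manager is a hypothetical stand-in. Pyramid's real tween factories are called as factory(handler, registry); here the manager is passed in directly so the example runs on its own. If no SQL data manager joins the transaction, the begin and commit steps should be close to free, which is why a large slowdown is surprising.

```python
class FakeTransactionManager:
    """Hypothetical manager that just records calls, so the flow is visible."""

    def __init__(self):
        self.log = []

    def begin(self):
        self.log.append("begin")

    def commit(self):
        self.log.append("commit")

    def abort(self):
        self.log.append("abort")


def tm_tween_factory(handler, manager):
    """Wrap a request handler so each request runs inside one transaction."""

    def tween(request):
        manager.begin()  # open the transaction when the request starts
        try:
            response = handler(request)
        except Exception:
            manager.abort()  # roll back if the view raises
            raise
        manager.commit()  # commit once, when the request completes
        return response

    return tween
```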

@brianrower

@stevearc I'm using DynamoDB and S3. I'll try to dig into the DynamoDB interaction when I have some time.

@stevearc
Owner

That's very strange then, because nothing should be hooking into the transaction library. It's only used for SQL. The main thing that pyramid_tm is doing is adding a tween to the request, which you can see here. If you want to figure out where that time is going, I recommend adding some profiling around these sections, or maybe commenting out some of the functions at the bottom.
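One way to add that profiling, sketched under assumptions: a small timing tween that logs each request's duration. The factory name is hypothetical. Pyramid calls tween factories with (handler, registry) and registers them with config.add_tween(...), which also accepts over= and under= arguments, so one timer placed directly above pyramid_tm's tween and another directly below it would isolate the time spent inside it.

```python
import logging
import time

log = logging.getLogger(__name__)


def timing_tween_factory(handler, registry):
    """Pyramid tween factory that logs how long each request takes."""

    def timing_tween(request):
        start = time.perf_counter()
        try:
            return handler(request)
        finally:
            # Runs whether the handler returned or raised.
            elapsed_ms = (time.perf_counter() - start) * 1000
            log.info("%s %s took %.1f ms", request.method, request.path, elapsed_ms)

    return timing_tween
```

It would be registered with something like config.add_tween("myapp.tweens.timing_tween_factory"), where myapp.tweens is a hypothetical module path.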

I'm going to be off the grid for a week or so. If you hit other roadblocks let me know and I'll look into it when I get back.

@fchorney

fchorney commented Apr 9, 2019

Has there been any progress on this? I'm currently looking to deploy this to an all-AWS configuration, and this is possibly the last step.

@stevearc
Owner

I haven't looked into this since then, but I'd be curious to know if the Lambda deployment worked out. Another method possibly worth investigating is deploying as a container with Lightsail.

@brianrower

We ended up moving away from this solution for various reasons. So I never spent enough time to figure out where the slowness discussed above came from.

@brianrower

brianrower commented Apr 11, 2019

I just went through the steps to reproduce what I did back in September. Here are some notes if anyone would like to reproduce this and debug the timing issue further (or confirm/deny that they also get the timing issue):

requirements.txt:

botocore==1.12.5
flywheel==0.5.3
#pypicloud==1.0.7
-e git+https://github.com/brianrower/pypicloud.git@53e81c34d8848bf9ab5926cfc0ad9d754ad4be52#egg=pypicloud
python-dateutil==2.6.1
zappa==0.46.2

Note that this is using a branch that has modifications to pypicloud and is not the mainline pypicloud code.

zappa_settings.json
(random S3 bucket suffix removed; make sure to populate it with a unique value)

{
    "prod": {
        "app_function": "pypicloud.generate_wsgi_app",
        "aws_region": "us-west-2",
        "profile_name": "default",
        "project_name": "pypicloud",
        "runtime": "python3.6",
        "s3_bucket": "pypicloud-<some-randomness-to-make-a-unique-bucket>",
        "debug": false
    }
}
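A quick sanity check for this file, as a sketch: Python's json module (like many JSON parsers) silently keeps the last value when an object repeats a key, so a doubled entry such as a second "project_name" is easy to miss. The helper name is hypothetical; it relies on the standard object_pairs_hook parameter, which sees every key/value pair before they are merged into a dict.

```python
import json


def load_json_strict(text):
    """Parse JSON, raising ValueError if any object repeats a key."""

    def check_pairs(pairs):
        seen = {}
        for key, value in pairs:
            if key in seen:
                raise ValueError(f"duplicate key: {key!r}")
            seen[key] = value
        return seen

    return json.loads(text, object_pairs_hook=check_pairs)
```

Running load_json_strict(open("zappa_settings.json").read()) before zappa deploy would flag the problem early.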

server.ini
(password hashes, keys, and the random S3 bucket suffix removed; make sure to populate them with real values)

[app:main]
use = egg:pypicloud

pyramid.reload_templates = False
pyramid.debug_authorization = false
pyramid.debug_notfound = false
pyramid.debug_routematch = false
pyramid.default_locale_name = en

pypi.default_read =
  authenticated

# Redirect to the fallback index if the package is not found on this server
pypi.fallback = redirect

pypi.storage = s3
storage.bucket = pypicloud-<the-same-randomness-as-in-zappa_settings>
storage.region_name = us-west-2
storage.prepend_hash = True

pypi.db = dynamo
db.region_name = us-west-2
db.namespace = pypicloud-db-

auth.admins =
  admin
  brower

user.admin = <password hash>
user.brower = <password hash>

# For beaker
session.encrypt_key = <key here>
session.validate_key = <key here>
session.secure = False
session.invalidate_corrupt = true

###
# wsgi server configuration
###

[server:main]
use = egg:waitress#main
host = 0.0.0.0
port = 6543

###
# logging configuration
# http://docs.pylonsproject.org/projects/pyramid/en/latest/narr/logging.html
###

[loggers]
keys = root, botocore, pypicloud

[handlers]
keys = console

[formatters]
keys = generic

[logger_root]
level = INFO
handlers = console

[logger_pypicloud]
level = DEBUG
qualname = pypicloud
handlers =

[logger_botocore]
level = WARN
qualname = botocore
handlers =

[handler_console]
class = StreamHandler
args = (sys.stderr,)
level = NOTSET
formatter = generic

[formatter_generic]
format = %(levelname)s %(asctime)s [%(name)s] %(message)s

  1. Create a directory containing the above files.
  2. virtualenv -p python3 .python
  3. source .python/bin/activate
  4. pip install -r requirements.txt
  5. zappa deploy prod
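The <key here> placeholders in server.ini need real values before deploying. A minimal sketch, assuming Beaker only needs each key to be a long random string; the function name is hypothetical, and pypicloud's own ppc-make-config command may be the more direct route since it writes out a complete config file.

```python
import secrets


def make_session_keys():
    """Return random values for the two Beaker session keys in server.ini."""
    return {
        "session.encrypt_key": secrets.token_urlsafe(32),
        "session.validate_key": secrets.token_urlsafe(32),
    }


# Print lines that can be pasted into the [app:main] section.
for name, value in make_session_keys().items():
    print(f"{name} = {value}")
```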

Zappa will create an API Gateway and a Lambda function. When this step is done, you should be able to reach the home page at the API Gateway URL.
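To point pip at the deployment, the index URL has to include the API Gateway stage path (prod here) and, because the config above sets pypi.default_read = authenticated, credentials as well. A sketch with a hypothetical helper name and an example hostname; the credentials are percent-encoded so that characters such as @ or : in a password don't break the URL.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def pip_index_url(api_url, username, password):
    """Build a pip --index-url for pypicloud's /simple/ endpoint.

    api_url is the API Gateway URL printed by the deploy, including the
    stage path (e.g. .../prod).
    """
    parts = urlsplit(api_url)
    # Percent-encode everything in the credentials, including '@' and ':'.
    netloc = f"{quote(username, safe='')}:{quote(password, safe='')}@{parts.netloc}"
    path = parts.path.rstrip("/") + "/simple/"
    return urlunsplit((parts.scheme, netloc, path, "", ""))
```

The result can then be passed to pip install --index-url.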

If you make changes to anything and want to update the existing deployment, run zappa update prod

To remove the deployment, run zappa undeploy prod

If anyone wants to take my hacked together steps here and put them into something more formal, please by all means do so, you have my blessing.

@vjm

vjm commented Oct 24, 2019

I have a working example that uses the Serverless Framework and Lambda. The only thing I can't figure out is that my package uploads seem to be corrupted. I don't think it's related to the serverless deployment, but I'm having trouble pinpointing the exact issue (see details in #221). I can make a copy of my code publicly available if people are interested.

@vjm

vjm commented Oct 24, 2019

Update: I found the issue. API Gateway was rejecting those binary media types (application/zip, application/gzip). What is the correct media type for .whl files?
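As a sketch of one possible fix, assuming a REST API (not an HTTP API) and a boto3 apigateway client: binary media types can be added with update_rest_api, whose patch paths are JSON Pointers, so the / inside a media type is escaped as ~1. Which types to register is a guess. A wheel is a ZIP archive, but API Gateway matches on the request's Content-Type (and the Accept header for responses), and twine sends uploads as multipart/form-data, so that type may matter more than a wheel-specific one; */* is the blunt catch-all. The helper names are hypothetical, and the stage has to be redeployed (for example with the client's create_deployment) before the change takes effect.

```python
def binary_media_type_patch_ops(media_types):
    """Build API Gateway patchOperations that add binary media types.

    JSON Pointer escaping: '~' becomes '~0' and '/' becomes '~1',
    so application/zip -> /binaryMediaTypes/application~1zip.
    """
    ops = []
    for media_type in media_types:
        escaped = media_type.replace("~", "~0").replace("/", "~1")
        ops.append({"op": "add", "path": f"/binaryMediaTypes/{escaped}"})
    return ops


def add_binary_media_types(client, rest_api_id, media_types):
    """Apply the patch using a boto3 'apigateway' client (REST APIs only)."""
    return client.update_rest_api(
        restApiId=rest_api_id,
        patchOperations=binary_media_type_patch_ops(media_types),
    )
```

Usage might look like add_binary_media_types(boto3.client("apigateway"), "<rest-api-id>", ["application/zip", "application/gzip", "multipart/form-data"]).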

@aperuru

aperuru commented Jul 2, 2020

@stevearc @brianrower Thank you for the valuable information. It seems like I'm running into a similar issue: I'm using the same storage backend (S3) and cache (DynamoDB) that Brian described. Can we expect a fix anytime soon?

If you want me to create a separate issue, I can do that as well. Please let me know.

@stevearc
Owner

stevearc commented Jul 6, 2020

Still no idea why pyramid_tm would tank performance. I made a branch here with a change that should completely exclude pyramid_tm if it's not needed (i.e. no SQL backends). Could you try it and see if that solves your problem?

@aperuru

aperuru commented Jul 6, 2020

Hey @stevearc, thank you for making the necessary changes in a new branch. I'm going to try it out now. Before that, I think you'd still need to remove "pyramid_tm" from setup.py, correct?

@stevearc
Owner

stevearc commented Jul 6, 2020

Well, I'm not exactly sure what changes @brianrower made when he tested it, but even if pyramid_tm is installed as a dependency it shouldn't do anything if it's not included. It should function the same as any code on disk not being run and have no impact.

That said, it's already a mystery to me why using pyramid_tm would slow anything down that much, so it's possible that there's something else I don't understand causing performance to tank even when it's just installed and not used. Even if that's the case, I think it's worth trying this patch to narrow down where the problem is occurring.
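To make the "installed vs. included" distinction concrete: Pyramid only wires pyramid_tm's tween into the request pipeline when the package is included (via config.include("pyramid_tm") or the pyramid.includes setting). A sketch of a conditional include might look like this; the function name and the cache_cls parameter are hypothetical, wrap_transactions mirrors the attribute discussed in this thread, and the actual branch may be structured differently.

```python
def include_tm_if_needed(config, cache_cls):
    """Activate pyramid_tm only when the cache backend asks for transactions.

    Hypothetical helper: with wrap_transactions False (or missing), pyramid_tm
    may still be installed, but its tween is never registered.
    """
    if getattr(cache_cls, "wrap_transactions", False):
        config.include("pyramid_tm")
```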

@aperuru

aperuru commented Jul 7, 2020

@stevearc Thank you for creating the branch and making the necessary changes. I also made the following changes:

  • Added wrap_transactions = True to the DynamoCache class in pypicloud/cache/dynamo.py.
  • Built the packages and added them to the directory containing the Dockerfile (https://github.com/stevearc/pypicloud-docker/tree/master/py3-baseimage).
  • Added the packages to the container and installed them explicitly, along with the dynamo3 and flywheel packages, since I ran into multiple errors otherwise:
ADD pypicloud-1.1.1* /tmp/
RUN pip3 install /tmp/pypicloud-1.1.1* dynamo3 flywheel

I uploaded the required package to both pypicloud instances to compare the master branch against the pyramid_tm branch.
Here are the outputs:

  • Using master branch
$ time pipenv install --clear
Pipfile.lock not found, creating…
Locking [dev-packages] dependencies…
Locking [packages] dependencies…
Building requirements...
Resolving dependencies...
✔ Success! 
Updated Pipfile.lock (1166be)!
Installing dependencies from Pipfile.lock (1166be)…
  🐍   ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 51/51 — 00:00:30
To activate this project's virtualenv, run pipenv shell.
Alternatively, run a command inside the virtualenv with pipenv run.
pipenv install --clear  122.11s user 14.36s system 95% cpu 2:22.16 total
  • Using pyramid_tm branch
$ time pipenv install --clear
Pipfile.lock not found, creating…
Locking [dev-packages] dependencies…
Locking [packages] dependencies…
Building requirements...
Resolving dependencies...
✔ Success! 
Updated Pipfile.lock (bc1570)!
Installing dependencies from Pipfile.lock (bc1570)…
  🐍   ▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉▉ 51/51 — 00:00:27
To activate this project's virtualenv, run pipenv shell.
Alternatively, run a command inside the virtualenv with pipenv run.
pipenv install --clear  126.14s user 14.62s system 128% cpu 1:49.46 total

So there is a difference of at least 40 seconds on average.
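Working only from the two zsh time totals shown above, the single runs differ by roughly 33 seconds of wall-clock time (the 40-second figure is the poster's reported average). User CPU time is about the same in both runs (122.11s vs 126.14s), so the saved time is spent waiting rather than computing, which is consistent with network round trips. The parsing helper is hypothetical and only handles the m:ss.xx form.

```python
def parse_elapsed(text):
    """Convert a zsh `time` total like '2:22.16' (m:ss.xx) to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + float(seconds)


master = parse_elapsed("2:22.16")      # master branch run
pyramid_tm = parse_elapsed("1:49.46")  # pyramid_tm branch run
print(f"wall-clock difference: {master - pyramid_tm:.2f} s")  # 32.70 s
```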
Besides that, for every package GET request I constantly see POST requests to DynamoDB and Secrets Manager, like the ones below, and each of them takes 200-300 ms when running in debug mode:

DEBUG 2020-07-06 21:16:58,317 [urllib3.connectionpool] https://secretsmanager.us-east-1.amazonaws.com:443 "POST / HTTP/1.1" 200 484
DEBUG 2020-07-06 21:16:58,423 [urllib3.connectionpool] https://dynamodb.us-east-1.amazonaws.com:443 "POST / HTTP/1.1" 200 39

I'm wondering if there is a way to make those requests happen only once while performing the package download/installation. That could save a lot of time and would be a great performance improvement overall. I'd like to hear your thoughts and ideas on this.

@aperuru

aperuru commented Jul 7, 2020

Also, any idea why it logs warnings like this at regular intervals (roughly every 20 minutes) when idle?

Jul  7 13:13:12 60135663c66e syslog-ng[11]: WARNING: you are using the pipe driver, underlying file is not a FIFO, it should be used by file(); filename='/dev/stdout'
Jul  7 13:33:11 60135663c66e syslog-ng[11]: WARNING: you are using the pipe driver, underlying file is not a FIFO, it should be used by file(); filename='/dev/stdout'
Jul  7 13:53:09 60135663c66e syslog-ng[11]: WARNING: you are using the pipe driver, underlying file is not a FIFO, it should be used by file(); filename='/dev/stdout'
Jul  7 14:13:08 60135663c66e syslog-ng[11]: WARNING: you are using the pipe driver, underlying file is not a FIFO, it should be used by file(); filename='/dev/stdout'
Jul  7 14:33:07 60135663c66e syslog-ng[11]: WARNING: you are using the pipe driver, underlying file is not a FIFO, it should be used by file(); filename='/dev/stdout'
Jul  7 14:53:05 60135663c66e syslog-ng[11]: WARNING: you are using the pipe driver, underlying file is not a FIFO, it should be used by file(); filename='/dev/stdout'
Jul  7 15:13:04 60135663c66e syslog-ng[11]: WARNING: you are using the pipe driver, underlying file is not a FIFO, it should be used by file(); filename='/dev/stdout'
Jul  7 15:33:03 60135663c66e syslog-ng[11]: WARNING: you are using the pipe driver, underlying file is not a FIFO, it should be used by file(); filename='/dev/stdout'
Jul  7 15:53:01 60135663c66e syslog-ng[11]: WARNING: you are using the pipe driver, underlying file is not a FIFO, it should be used by file(); filename='/dev/stdout'

@stevearc
Owner

Couple of thoughts:

  1. You said you added wrap_transactions = True to the DynamoCache class. This is my fault: the initial commit I added to the pyramid_tm branch had a typo. What we actually wanted to test was what happens with wrap_transactions = False, because in that case we ignore the pyramid_tm package entirely.
  2. For the POST requests, it sounds like you're seeing 2 per package, correct? That's what we expect and the only way around that would be to cache data more aggressively. I don't think it would make sense to cache the DynamoDB calls, but the auth call to Secrets Manager could probably be cached for 10m or so without much ill effect.
  3. The warning you're seeing could be related to phusion/baseimage-docker#468 (syslog-ng WARNING with --tty).
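A sketch of the roughly 10-minute caching suggested in point 2, assuming a plain in-process cache is acceptable: a small time-to-live decorator keyed on the function's arguments. The names are hypothetical, the cache is not thread-safe, and on Lambda it only lives as long as a warm container. The lookup it wraps could be, for example, a call to a boto3 secretsmanager client's get_secret_value(SecretId=...).

```python
import functools
import time


def ttl_cache(seconds, clock=time.monotonic):
    """Cache a function's results per argument tuple for `seconds` seconds."""

    def decorator(fn):
        entries = {}  # args tuple -> (expires_at, value)

        @functools.wraps(fn)
        def wrapper(*args):
            now = clock()
            cached = entries.get(args)
            if cached is not None and now < cached[0]:
                return cached[1]  # still fresh: skip the remote call
            value = fn(*args)
            entries[args] = (now + seconds, value)
            return value

        return wrapper

    return decorator


# Hypothetical placeholder for whatever code performs the Secrets Manager lookup.
@ttl_cache(600)
def load_secret(secret_id):
    raise NotImplementedError("replace with the real Secrets Manager call")
```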

Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants