-
Notifications
You must be signed in to change notification settings - Fork 204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Run the app as ASGI #3011
Run the app as ASGI #3011
Conversation
5112061
to
337f84e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change looks good to me, and I'm onboard with the idea of switching to ASGI first to avoid the overhead of building async-to-sync wrappers that can be buggy, time-consuming and will end up being deleted anyway.
e8a415c
to
2d9485f
Compare
Based on the high urgency of this PR, the following reviewers are being gently reminded to review this PR: @krysal Excluding weekend1 days, this PR was ready for review 2 day(s) ago. PRs labelled with high urgency are expected to be reviewed within 2 weekday(s)2. @sarayourfriend, if this PR is not ready for a review, please draft it to prevent reviewers from getting further unnecessary pings. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I really wanted to make sure I wrapped my head around this 😅 Having gone back through the previous PRs and discussions, this makes sense and looks like a good (and exciting!) path forward.
Everything worked well in my testing, including the comparison of logs to main
. I also tested the static file handling by changing the ENVIRONMENT
variable locally. Thanks for the additional explanation in the commit messages 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🚀 I left just a few non-blocking questions. Thanks for explaining the move in so much detail. I surfed a bit the endpoints and everything seems to be working normally.
api/Pipfile
Outdated
limit = "~=0.2" | ||
Pillow = "~=10.0" | ||
psycopg2 = "~=2.9" | ||
python-decouple = "~=3.8" | ||
python-xmp-toolkit = "~=2.0" | ||
sentry-sdk = "~=1.30" | ||
django-split-settings = "*" | ||
uvicorn = {extras = ["standard"], version = "*"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you want to specify the version here?
uvicorn = {extras = ["standard"], version = "*"} | |
uvicorn = {extras = ["standard"], version = "~=0.23"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure! After working with pipenv locally a bit more and seeing how it updates dependencies, I don't think these versions do very much for us. It will update any dependency that has any newer version that matches the constraint any time it runs install
😢
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 0b00b43 (which does indeed include updates to boto 🤷).
I'm going to hold off merging this until after the Pillow update is released. It'll be better to orchestrate the necessary infrastructures changes anyway. Adding DO NOT MERGE to the title for posterity. |
This PR depends on two related infrastructure changes: https://github.com/WordPress/openverse-infrastructure/pull/618 to enable a canary service, to prevent increasing the task count from making our deployments run for hours. https://github.com/WordPress/openverse-infrastructure/pull/616 to resize the tasks and desired counts to accommodate a one-worker-per-task configuration we will end up with after deploying this PR. I'll wait to rebase this after #3029 because that PR will cause another rebase to be required anyway and is higher priority than this one. |
14d064c
to
f956f4e
Compare
I've rebased this branch and re-written the history a bit to squash a few commits that didn't need to be separated from other ones and were complicating rebases (especially changes to This PR should be ready to go, as long as it passes CI, and tried out in staging, after staging is increased to 4 tasks by https://github.com/WordPress/openverse-infrastructure/pull/616. |
Note: this does not apply to production because we serve static files from Nginx there: those static file requests never make it to the Django application and, indeed, it is not configured to serve static files. This change uses the ASGI static file handler that the Django `runserver` management command uses and correctly handles streaming responses. The only consequence of not doing this is that warnings will appear locally and, if for some reason local interactions are bypassing the static file cache on the browser, you could get a memory leak. Again, that only applies to local environments. Python code never interacts with, considers, or is configured for static files in production, so this is not an issue for production. The correct behaviour for production, which you can test by setting ENVIRONMENT to something other than `local` in `api/.env`, is to 404 on static files.
f956f4e
to
17802db
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ASGI build runs great for me locally, code looks good and makes sense!
We'll remove the do not merge label once the changes to staging task count is ready. |
Fixes
Fixes #2790 by @sarayourfriend
Description
In #3000, I've tried to implement the async client sharing necessary for #2788. The idea of that issue is to do async client instantiation "the right way". However, in that PR, I've discovered the deep complexities of trying to do things this way. There is, in fact, no way to do things the right way in an "async under WSGI" approach that we wouldn't immediately remove after we switched the app to run under ASGI.
On the other hand, switching the app to run under ASGI, without making other changes, does not create any such issues. Django automatically wraps sync views in
sync_to_async
and our application runs perfectly (the integration tests show that).It does, however, require making some relatively significant changes to the way we deploy the application. Namely, that is to switch to uvicorn and remove gunicorn in favour of scaling our ECS tasks with smaller individual task resources. In other words, rather than having tasks with more resources split their resources with four gunicorn workers, we'll have tasks with fewer individual resources, and scale the number of "workers" by increasing the number of tasks ECS provisions. This removes the gunicorn worker-manager middleman and relies on the orchestrator (ECS in our case) to create the correct number of "workers". Indeed, FastAPI's deployment advice recommends this approach for containerised applications outside of special cases (which I don't think we are): https://fastapi.tiangolo.com/deployment/docker/#one-process-per-container
So, with all of that in mind, this PR does the following:
This PR will require some changes to the API infrastructure deployment. Specifically, we'll want to do the following before deploying this PR:
For scaling the resources down, we need to consider the existing settings:
5 tasks with 1 vCPU and 4 GB RAM each.
Each task runs 4 workers. To get the exact same number of workers, we'd need 20 tasks. If we scale each part by a quarter, that is 0.25 vCPU and 1 GB RAM each. Surprising, even with the high number of individual tasks, that puts us at 1k USD less per 30-day period than we spend now.
That being said, I don't think this mathematical approach is going to be reliable. Instead, we should consider that we do not fully exercise our API instances to the maximum capacity. That means there could be room to deploy fewer tasks with more resources or just fewer tasks overall. So long as we match roughly the same worker-to-resource allocation ratio we have now, the price differences are either negligible or offer significant savings.
Testing Instructions
Checkout this branch and run
just build web
to get the new dependencies. Then runjust api/up
and make requests to the app. Check the logs withjust logs web
and confirm they look good and match the expected output patterns for our logging configuration (it should just matchmain
).Checklist
Update index.md
).main
) or a parent feature branch.Developer Certificate of Origin
Developer Certificate of Origin