Run the app as ASGI #3011

sarayourfriend · 2023-09-11T08:40:11Z

Fixes

Description

In #3000, I've tried to implement the async client sharing necessary for #2788. The idea of that issue is to do async client instantiation "the right way". However, in that PR, I've discovered the deep complexities of trying to do things this way. There is, in fact, no way to do things the right way in an "async under WSGI" approach that we wouldn't immediately remove after we switched the app to run under ASGI.

On the other hand, switching the app to run under ASGI, without making other changes, does not create any such issues. Django automatically wraps sync views in sync_to_async and our application runs perfectly (the integration tests show that).
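To illustrate what that wrapping amounts to, here is a minimal standard-library sketch. It is not Django's implementation: `asyncio.to_thread` stands in for asgiref's `sync_to_async`, and `sync_view`, `wrap_sync_view`, `main` and the request path are hypothetical names chosen for the example.

```python
import asyncio
import threading


def sync_view(request: dict) -> dict:
    # A blocking, WSGI-style view: a plain function with no awaits.
    return {
        "status": 200,
        "path": request["path"],
        "thread": threading.current_thread().name,
    }


def wrap_sync_view(view):
    # Stand-in for the sync_to_async wrapping: run the blocking view in a
    # worker thread so the event loop stays free to serve other requests.
    async def async_view(request: dict) -> dict:
        return await asyncio.to_thread(view, request)

    return async_view


async def main() -> dict:
    async_view = wrap_sync_view(sync_view)
    return await async_view({"path": "/v1/images/"})


response = asyncio.run(main())
print(response["status"], response["path"])
```

The view itself is unchanged; only the caller becomes async, which is why the existing sync views can keep working when the server switches to ASGI.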

It does, however, require some relatively significant changes to the way we deploy the application. Namely, we switch to uvicorn and remove gunicorn in favour of scaling our ECS tasks with smaller individual task resources. In other words, rather than having tasks with more resources split those resources between four gunicorn workers, we'll have tasks with fewer individual resources, and scale the number of "workers" by increasing the number of tasks ECS provisions. This removes the gunicorn worker-manager middleman and relies on the orchestrator (ECS in our case) to create the correct number of "workers". Indeed, FastAPI's deployment advice recommends this approach for containerised applications outside of special cases (which I don't think apply to us): https://fastapi.tiangolo.com/deployment/docker/#one-process-per-container

So, with all of that in mind, this PR does the following:

- Run the application using uvicorn and convert gunicorn configuration to uvicorn
- Implement an ASGI handler class that adds lifespan support to Django (compensates for https://code.djangoproject.com/ticket/31508)
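For readers unfamiliar with the ASGI lifespan protocol, the following is a rough sketch of what such a handler has to do. It is not the code in this PR: `LifespanASGIHandler`, its hook arguments and the `demo` driver are hypothetical, and the inner app is a stand-in for Django's ASGI application. Only the message types (`lifespan.startup`, `lifespan.shutdown` and their `.complete` replies) come from the ASGI specification.

```python
import asyncio


class LifespanASGIHandler:
    """Hypothetical wrapper: answers lifespan events itself and
    delegates every other scope type to the wrapped application."""

    def __init__(self, app, on_startup=(), on_shutdown=()):
        self.app = app
        self.on_startup = list(on_startup)
        self.on_shutdown = list(on_shutdown)

    async def __call__(self, scope, receive, send):
        if scope["type"] != "lifespan":
            return await self.app(scope, receive, send)
        while True:
            message = await receive()
            if message["type"] == "lifespan.startup":
                for hook in self.on_startup:
                    await hook()
                await send({"type": "lifespan.startup.complete"})
            elif message["type"] == "lifespan.shutdown":
                for hook in self.on_shutdown:
                    await hook()
                await send({"type": "lifespan.shutdown.complete"})
                return


async def demo():
    events = []

    async def open_client():
        events.append("client opened")

    async def close_client():
        events.append("client closed")

    async def inner_app(scope, receive, send):
        events.append(f"inner app got {scope['type']}")

    handler = LifespanASGIHandler(inner_app, [open_client], [close_client])

    # Simulate the messages an ASGI server sends on the lifespan channel.
    inbox = asyncio.Queue()
    await inbox.put({"type": "lifespan.startup"})
    await inbox.put({"type": "lifespan.shutdown"})
    sent = []

    async def send(message):
        sent.append(message["type"])

    await handler({"type": "lifespan"}, inbox.get, send)
    await handler({"type": "http"}, inbox.get, send)
    return events, sent


events, sent = asyncio.run(demo())
print(events)
print(sent)
```

Startup and shutdown hooks of this kind are where a shared async client could be created and closed once per process, which is the motivation given in the description.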

This PR will require some changes to the API infrastructure deployment. Specifically, we'll want to do the following before deploying this PR:

1. Increase the number of tasks for the API
2. Then deploy this PR
3. Then scale the task resources down and redeploy

For scaling the resources down, we need to consider the existing settings:

5 tasks with 1 vCPU and 4 GB RAM each.

Each task runs 4 workers. To get the exact same number of workers, we'd need 20 tasks. If we scale each task's resources down to a quarter, that is 0.25 vCPU and 1 GB RAM each. Surprisingly, even with the high number of individual tasks, that puts us at 1k USD less per 30-day period than we spend now.
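The sizing arithmetic above can be checked with a few lines. This sketch only reproduces the task and resource figures quoted in the description; it does not model ECS pricing, and the variable names are made up for the example.

```python
# Figures quoted in the description above.
current_tasks = 5
workers_per_task = 4
vcpu_per_task = 1.0
ram_gb_per_task = 4.0

# One worker per task means one task per existing worker.
new_tasks = current_tasks * workers_per_task
new_vcpu = vcpu_per_task / workers_per_task
new_ram_gb = ram_gb_per_task / workers_per_task

print(f"{new_tasks} tasks with {new_vcpu} vCPU and {new_ram_gb} GB RAM each")

# Aggregate resources before and after the split.
total_before = (current_tasks * vcpu_per_task, current_tasks * ram_gb_per_task)
total_after = (new_tasks * new_vcpu, new_tasks * new_ram_gb)
print(total_before, total_after)
```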

That being said, I don't think this mathematical approach is going to be reliable. Instead, we should consider that we do not fully exercise our API instances to the maximum capacity. That means there could be room to deploy fewer tasks with more resources or just fewer tasks overall. So long as we match roughly the same worker-to-resource allocation ratio we have now, the price differences are either negligible or offer significant savings.

Testing Instructions

Check out this branch and run `just build web` to get the new dependencies. Then run `just api/up` and make requests to the app. Check the logs with `just logs web` and confirm they look good and match the expected output patterns for our logging configuration (they should simply match main).

Checklist

My pull request has a descriptive title (not a vague title like `Update index.md`).
My pull request targets the default branch of the repository (main) or a parent feature branch.
My commit messages follow best practices.
My code follows the established code style of the repository.
[N/A] I added or updated tests for the changes I made (if applicable).
I added or updated documentation (if applicable).
I tried running the project locally and verified that there are no visible errors.
[N/A] I ran the DAG documentation generator (if applicable).

Developer Certificate of Origin

Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

sarayourfriend · 2023-09-11T08:41:19Z

Adding @obulat and @dhruvkb to this PR because we discussed the issue in the team meeting. But please, @krysal and @stacimc give y'all's input on whether you think this approach is appropriate and if it makes sense or needs further testing or clarification.

dhruvkb

The change looks good to me, and I'm on board with the idea of switching to ASGI first to avoid the overhead of building async-to-sync wrappers that can be buggy and time-consuming, and that will end up being deleted anyway.

api/conf/settings/base.py

api/run.py

openverse-bot · 2023-09-14T00:00:13Z

Based on the high urgency of this PR, the following reviewers are being gently reminded to review this PR:

@krysal
@obulat
@stacimc
This reminder is being automatically generated due to the urgency configuration.

Excluding weekend¹ days, this PR was ready for review 2 day(s) ago. PRs labelled with high urgency are expected to be reviewed within 2 weekday(s)².

@sarayourfriend, if this PR is not ready for a review, please draft it to prevent reviewers from getting further unnecessary pings.

¹ Specifically, Saturday and Sunday.
² For the purpose of these reminders we treat Monday - Friday as weekdays. Please note that the operation that generates these reminders runs at midnight UTC on Monday - Friday. This means that depending on your timezone, you may be pinged outside of the expected range.

stacimc

LGTM! I really wanted to make sure I wrapped my head around this 😅 Having gone back through the previous PRs and discussions, this makes sense and looks like a good (and exciting!) path forward.

Everything worked well in my testing, including the comparison of logs to main. I also tested the static file handling by changing the ENVIRONMENT variable locally. Thanks for the additional explanation in the commit messages 👍

api/conf/urls/__init__.py

krysal

LGTM 🚀 I left just a few non-blocking questions. Thanks for explaining the move in so much detail. I browsed the endpoints a bit and everything seems to be working normally.

krysal · 2023-09-14T18:54:52Z

api/Pipfile

 limit = "~=0.2"
 Pillow = "~=10.0"
 psycopg2 = "~=2.9"
 python-decouple = "~=3.8"
 python-xmp-toolkit = "~=2.0"
 sentry-sdk = "~=1.30"
 django-split-settings = "*"
+uvicorn = {extras = ["standard"], version = "*"}


Don't you want to specify the version here?

Suggested change

-uvicorn = {extras = ["standard"], version = "*"}
+uvicorn = {extras = ["standard"], version = "~=0.23"}

Sure! After working with pipenv locally a bit more and seeing how it updates dependencies, I don't think these versions do very much for us. It will update any dependency that has any newer version that matches the constraint any time it runs install 😢

Done in 0b00b43 (which does indeed include updates to boto 🤷).
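As a rough illustration of why a `~=0.23` constraint still allows newer releases to be picked up on install, here is a simplified check of the compatible-release operator. It is a hand-rolled approximation for purely numeric versions, not pipenv's resolver and not a full PEP 440 implementation; `parse` and `matches_compatible_release` are hypothetical helpers, and the candidate version numbers are arbitrary examples.

```python
def parse(version: str) -> tuple[int, ...]:
    # Only handles dotted integers such as "0.23.2".
    return tuple(int(part) for part in version.split("."))


def matches_compatible_release(candidate: str, spec_version: str) -> bool:
    """Simplified `~=` check: `~=X.Y` means `>= X.Y` and `== X.*`."""
    spec = parse(spec_version)
    if len(spec) < 2:
        raise ValueError("~= needs at least two version components")
    found = parse(candidate)
    # Pad short candidates with zeros so "0.23" compares equal to "0.23.0".
    found = found + (0,) * (len(spec) - len(found))
    prefix = spec[:-1]
    return found >= spec and found[: len(prefix)] == prefix


for candidate in ["0.22.0", "0.23.0", "0.23.2", "0.24.0", "1.0.0"]:
    print(candidate, matches_compatible_release(candidate, "0.23"))
```

Under this reading, any later `0.x` release satisfies the constraint, so the pin narrows the range without freezing the installed version.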

api/conf/urls/__init__.py

sarayourfriend · 2023-09-14T23:59:44Z

I'm going to hold off merging this until after the Pillow update is released. It'll be better to orchestrate the necessary infrastructure changes anyway. Adding DO NOT MERGE to the title for posterity.

sarayourfriend · 2023-09-18T05:32:42Z

This PR depends on two related infrastructure changes:

https://github.com/WordPress/openverse-infrastructure/pull/618 to enable a canary service, to prevent increasing the task count from making our deployments run for hours.

https://github.com/WordPress/openverse-infrastructure/pull/616 to resize the tasks and desired counts to accommodate a one-worker-per-task configuration we will end up with after deploying this PR.

I'll wait to rebase this after #3029 because that PR will cause another rebase to be required anyway and is higher priority than this one.

sarayourfriend · 2023-09-21T06:01:51Z

I've rebased this branch and re-written the history a bit to squash a few commits that didn't need to be separated from other ones and were complicating rebases (especially changes to Pipfile et co).

This PR should be ready to go as long as it passes CI. It can then be tried out in staging once staging is increased to 4 tasks by https://github.com/WordPress/openverse-infrastructure/pull/616.

api/Pipfile

Note: this does not apply to production because we serve static files from Nginx there: those static file requests never make it to the Django application and, indeed, it is not configured to serve static files. This change uses the ASGI static file handler that the Django `runserver` management command uses and correctly handles streaming responses. The only consequence of not doing this is that warnings will appear locally and, if for some reason local interactions are bypassing the static file cache on the browser, you could get a memory leak. Again, that only applies to local environments. Python code never interacts with, considers, or is configured for static files in production, so this is not an issue for production. The correct behaviour for production, which you can test by setting ENVIRONMENT to something other than `local` in `api/.env`, is to 404 on static files.

AetherUnbound

ASGI build runs great for me locally, code looks good and makes sense!

sarayourfriend · 2023-11-07T03:13:45Z

We'll remove the do not merge label once the change to the staging task count is ready.

sarayourfriend requested a review from a team as a code owner September 11, 2023 08:40

sarayourfriend requested review from krysal, stacimc, obulat and dhruvkb September 11, 2023 08:40

sarayourfriend force-pushed the add/asgi branch from 5112061 to 337f84e Compare September 11, 2023 08:42

github-actions bot added the 🧱 stack: catalog and 🧱 stack: ingestion server labels Sep 11, 2023

sarayourfriend mentioned this pull request Sep 11, 2023

Switch to httpx and reuse async client when possible #3000

Closed

7 tasks

dhruvkb approved these changes Sep 11, 2023

sarayourfriend commented Sep 11, 2023

sarayourfriend mentioned this pull request Sep 13, 2023

Add ADRF and make the thumbnail view async #3020

Merged

7 tasks

sarayourfriend force-pushed the add/asgi branch from e8a415c to 2d9485f Compare September 13, 2023 03:01

github-actions bot added the 🧱 stack: catalog and 🧱 stack: ingestion server labels Sep 13, 2023

sarayourfriend mentioned this pull request Sep 14, 2023

Add aiohttp client sharing #3024

Merged

7 tasks

stacimc approved these changes Sep 14, 2023

krysal approved these changes Sep 14, 2023

sarayourfriend commented Sep 14, 2023

sarayourfriend changed the title ~~Run the app as ASGI~~ [DO NOT MERGE] Run the app as ASGI Sep 14, 2023

sarayourfriend mentioned this pull request Sep 15, 2023

Upgrade ES dependencies to match cluster version #3029

Merged

7 tasks

sarayourfriend mentioned this pull request Sep 20, 2023

Replace gevent with uvloop #3048

Merged

5 tasks

sarayourfriend force-pushed the add/asgi branch from 14d064c to f956f4e Compare September 21, 2023 05:59

krysal reviewed Sep 21, 2023

sarayourfriend mentioned this pull request Nov 2, 2023

Django ASGI #2843

Closed

2 tasks

sarayourfriend removed the 🧱 stack: ingestion server and 🧱 stack: catalog labels Nov 2, 2023

sarayourfriend changed the title ~~[DO NOT MERGE] Run the app as ASGI~~ Run the app as ASGI Nov 2, 2023

sarayourfriend added 3 commits November 3, 2023 10:25

Run the app as ASGI

9e0e893

Set explicit worker count

d4a7956

sarayourfriend force-pushed the add/asgi branch from f956f4e to 17802db Compare November 2, 2023 23:32

github-actions bot added the 🧱 stack: catalog and 🧱 stack: ingestion server labels Nov 2, 2023

Delete WSGI configuration (no longer needed)

0b2ff04

sarayourfriend requested review from krysal, dhruvkb, stacimc and AetherUnbound November 2, 2023 23:44

sarayourfriend changed the title ~~Run the app as ASGI~~ [DO NOT MERGE] Run the app as ASGI Nov 2, 2023

AetherUnbound approved these changes Nov 7, 2023

sarayourfriend changed the title ~~[DO NOT MERGE] Run the app as ASGI~~ Run the app as ASGI Nov 8, 2023

sarayourfriend merged commit 8ce37fa into main Nov 8, 2023
45 checks passed

sarayourfriend deleted the add/asgi branch November 8, 2023 22:23
