Run the app as ASGI #3011

Merged
merged 4 commits into from
Nov 8, 2023

sarayourfriend
Collaborator

Fixes

Fixes #2790 by @sarayourfriend

Description

In #3000, I've tried to implement the async client sharing necessary for #2788. The idea of that issue is to do async client instantiation "the right way". However, in that PR, I've discovered the deep complexities of trying to do things this way. There is, in fact, no way to do things the right way in an "async under WSGI" approach that we wouldn't immediately remove after we switched the app to run under ASGI.

On the other hand, switching the app to run under ASGI, without making other changes, does not create any such issues. Django automatically wraps sync views in sync_to_async and our application runs perfectly (the integration tests show that).
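As a rough illustration of what that wrapping amounts to, the sketch below calls a blocking, synchronous view from async code by handing it off to a worker thread. It uses the standard library's `asyncio.to_thread` as a stand-in for `sync_to_async`; the `sync_view` and `handle_request` names and the request path are hypothetical, and this is not Django's actual implementation.

```python
import asyncio
import threading


def sync_view(request: dict) -> dict:
    """A blocking, synchronous "view" (hypothetical stand-in for a Django view)."""
    return {
        "status": 200,
        "path": request["path"],
        # Record which thread ran the view so the hand-off is observable.
        "thread": threading.current_thread().name,
    }


async def handle_request(request: dict) -> dict:
    """Call the sync view from async code without blocking the event loop."""
    # asyncio.to_thread runs the callable in a worker thread and awaits the result.
    return await asyncio.to_thread(sync_view, request)


# The path is an illustrative value only.
response = asyncio.run(handle_request({"path": "/v1/images/"}))
print(response["status"], response["thread"])
```

The view itself is unchanged; only the caller is async, which is why the existing sync views can keep working once the server speaks ASGI.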

It does, however, require making some relatively significant changes to the way we deploy the application. Namely, that is to switch to uvicorn and remove gunicorn in favour of scaling our ECS tasks with smaller individual task resources. In other words, rather than having tasks with more resources split their resources with four gunicorn workers, we'll have tasks with fewer individual resources, and scale the number of "workers" by increasing the number of tasks ECS provisions. This removes the gunicorn worker-manager middleman and relies on the orchestrator (ECS in our case) to create the correct number of "workers". Indeed, FastAPI's deployment advice recommends this approach for containerised applications outside of special cases (which I don't think we are): https://fastapi.tiangolo.com/deployment/docker/#one-process-per-container

So, with all of that in mind, this PR does the following:

  1. Run the application using uvicorn and convert gunicorn configuration to uvicorn
  2. Implement an ASGI handler class that adds lifespan support to Django (compensates for https://code.djangoproject.com/ticket/31508)
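To make the second item more concrete, here is a minimal sketch of how an ASGI wrapper can answer lifespan events on behalf of an application that does not handle them. The message types follow the ASGI lifespan protocol; the `LifespanSketch` class, its callback lists, and the demo are hypothetical and are not the handler implemented in this PR.

```python
import asyncio


class LifespanSketch:
    """Wrap an ASGI app and answer lifespan events the inner app does not handle."""

    def __init__(self, app, on_startup=None, on_shutdown=None):
        self.app = app
        self.on_startup = on_startup or []
        self.on_shutdown = on_shutdown or []

    async def __call__(self, scope, receive, send):
        if scope["type"] != "lifespan":
            # HTTP and other scopes pass straight through to the wrapped app.
            await self.app(scope, receive, send)
            return

        while True:
            message = await receive()
            if message["type"] == "lifespan.startup":
                for callback in self.on_startup:
                    await callback()
                await send({"type": "lifespan.startup.complete"})
            elif message["type"] == "lifespan.shutdown":
                for callback in self.on_shutdown:
                    await callback()
                await send({"type": "lifespan.shutdown.complete"})
                return


async def demo():
    events = []

    async def open_client():
        events.append("client opened")

    async def close_client():
        events.append("client closed")

    async def inner_app(scope, receive, send):
        events.append("inner app called")

    app = LifespanSketch(inner_app, [open_client], [close_client])

    # Simulate the messages an ASGI server would deliver for the lifespan scope.
    incoming = asyncio.Queue()
    await incoming.put({"type": "lifespan.startup"})
    await incoming.put({"type": "lifespan.shutdown"})
    sent = []

    async def send(message):
        sent.append(message["type"])

    await app({"type": "lifespan"}, incoming.get, send)
    return events, sent


events, sent = asyncio.run(demo())
print(events, sent)
```

Startup and shutdown callbacks like these are where a shared async client could be created and closed once per process, which is the use case that motivated the lifespan support.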

This PR will require some changes to the API infrastructure deployment. Specifically, we'll want to do the following before deploying this PR:

  1. Increase the number of tasks for the API
  2. Then deploy this PR
  3. Then scale the task resources down and redeploy

For scaling the resources down, we need to consider the existing settings:

5 tasks with 1 vCPU and 4 GB RAM each.

Each task runs 4 workers. To get the exact same number of workers, we'd need 20 tasks. If we scale each task down to a quarter of its current resources, that is 0.25 vCPU and 1 GB RAM each. Surprisingly, even with the high number of individual tasks, that puts us at 1k USD less per 30-day period than we spend now.
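The arithmetic behind those numbers can be checked with a short sketch. The variable names are made up for illustration, and the sketch only covers task counts and per-task resources; it does not model pricing.

```python
from fractions import Fraction

# Current settings quoted above.
current_tasks = 5
workers_per_task = 4
vcpu_per_task = Fraction(1)
ram_gb_per_task = Fraction(4)

# One worker per task means as many tasks as there are workers today.
target_tasks = current_tasks * workers_per_task

# Each task gets a quarter of the resources, since it runs one worker instead of four.
target_vcpu = vcpu_per_task / workers_per_task
target_ram_gb = ram_gb_per_task / workers_per_task

# Totals across all tasks, before and after the change.
total_vcpu_before = current_tasks * vcpu_per_task
total_vcpu_after = target_tasks * target_vcpu
total_ram_before = current_tasks * ram_gb_per_task
total_ram_after = target_tasks * target_ram_gb

print(f"{target_tasks} tasks at {float(target_vcpu)} vCPU and {float(target_ram_gb)} GB RAM each")
print(f"total vCPU: {float(total_vcpu_before)} -> {float(total_vcpu_after)}")
print(f"total RAM (GB): {float(total_ram_before)} -> {float(total_ram_after)}")
```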

That being said, I don't think this mathematical approach is going to be reliable. Instead, we should consider that we do not fully exercise our API instances to the maximum capacity. That means there could be room to deploy fewer tasks with more resources or just fewer tasks overall. So long as we match roughly the same worker-to-resource allocation ratio we have now, the price differences are either negligible or offer significant savings.

Testing Instructions

Check out this branch and run `just build web` to get the new dependencies. Then run `just api/up` and make requests to the app. Check the logs with `just logs web` and confirm they look good and match the expected output patterns for our logging configuration (they should simply match main).

Checklist

  • My pull request has a descriptive title (not a vague title like Update index.md).
  • My pull request targets the default branch of the repository (main) or a parent feature branch.
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • [N/A] I added or updated tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no visible errors.
  • [N/A] I ran the DAG documentation generator (if applicable).

Developer Certificate of Origin

Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@sarayourfriend
Collaborator Author

Adding @obulat and @dhruvkb to this PR because we discussed the issue in the team meeting. But please, @krysal and @stacimc give y'all's input on whether you think this approach is appropriate and if it makes sense or needs further testing or clarification.

@openverse-bot openverse-bot added 🟧 priority: high Stalls work on the project or its dependents 💻 aspect: code Concerns the software code in the repository 🧰 goal: internal improvement Improvement that benefits maintainers, not users ⛔ status: blocked Blocked & therefore, not ready for work 🧱 stack: api Related to the Django API labels Sep 11, 2023
@github-actions github-actions bot added 🧱 stack: catalog Related to the catalog and Airflow DAGs 🧱 stack: ingestion server Related to the ingestion/data refresh server labels Sep 11, 2023
Member

@dhruvkb dhruvkb left a comment

The change looks good to me, and I'm onboard with the idea of switching to ASGI first to avoid the overhead of building async-to-sync wrappers that can be buggy, time-consuming and will end up being deleted anyway.

api/conf/settings/base.py (resolved)
api/run.py (resolved)
api/run.py (resolved)
@sarayourfriend sarayourfriend added dependencies Pull requests that update a dependency file and removed ⛔ status: blocked Blocked & therefore, not ready for work 🧱 stack: ingestion server Related to the ingestion/data refresh server 🧱 stack: catalog Related to the catalog and Airflow DAGs labels Sep 13, 2023
@github-actions github-actions bot added 🧱 stack: catalog Related to the catalog and Airflow DAGs 🧱 stack: ingestion server Related to the ingestion/data refresh server labels Sep 13, 2023
@openverse-bot
Collaborator

Based on the high urgency of this PR, the following reviewers are being gently reminded to review this PR:

@krysal
@obulat
@stacimc
This reminder is being automatically generated due to the urgency configuration.

Excluding weekend days [1], this PR was ready for review 2 day(s) ago. PRs labelled with high urgency are expected to be reviewed within 2 weekday(s) [2].

@sarayourfriend, if this PR is not ready for a review, please draft it to prevent reviewers from getting further unnecessary pings.

Footnotes

  1. Specifically, Saturday and Sunday.

  2. For the purpose of these reminders we treat Monday - Friday as weekdays. Please note that the operation that generates these reminders runs at midnight UTC on Monday - Friday. This means that depending on your timezone, you may be pinged outside of the expected range.

@sarayourfriend sarayourfriend mentioned this pull request Sep 14, 2023
7 tasks
Collaborator

@stacimc stacimc left a comment

LGTM! I really wanted to make sure I wrapped my head around this 😅 Having gone back through the previous PRs and discussions, this makes sense and looks like a good (and exciting!) path forward.

Everything worked well in my testing, including the comparison of logs to main. I also tested the static file handling by changing the ENVIRONMENT variable locally. Thanks for the additional explanation in the commit messages 👍

api/conf/urls/__init__.py (outdated, resolved)
Member

@krysal krysal left a comment

LGTM 🚀 I left just a few non-blocking questions. Thanks for explaining the move in so much detail. I browsed the endpoints a bit and everything seems to be working normally.

api/Pipfile Outdated
limit = "~=0.2"
Pillow = "~=10.0"
psycopg2 = "~=2.9"
python-decouple = "~=3.8"
python-xmp-toolkit = "~=2.0"
sentry-sdk = "~=1.30"
django-split-settings = "*"
uvicorn = {extras = ["standard"], version = "*"}
Member

@krysal krysal Sep 14, 2023

Don't you want to specify the version here?

Suggested change
- uvicorn = {extras = ["standard"], version = "*"}
+ uvicorn = {extras = ["standard"], version = "~=0.23"}

Collaborator Author

Sure! After working with pipenv locally a bit more and seeing how it updates dependencies, I don't think these versions do very much for us. It will update any dependency that has any newer version that matches the constraint any time it runs install 😢
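For reference, a rough sketch of what a compatible-release constraint such as `~=0.23` permits, following the PEP 440 rule that `~=X.Y` is equivalent to `>=X.Y, ==X.*`. The `parse` and `compatible` helpers are hypothetical, only handle plain dotted numeric versions, and the version strings are illustrative rather than a statement about real releases; this is not how pipenv resolves dependencies internally.

```python
def parse(version: str) -> tuple:
    """Parse a plain dotted numeric version such as "0.23.2" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def compatible(candidate: str, constraint: str) -> bool:
    """Return True if `candidate` satisfies the compatible-release clause `~=constraint`.

    `~=X.Y` means `>=X.Y` and `==X.*`; `~=X.Y.Z` means `>=X.Y.Z` and `==X.Y.*`.
    """
    wanted = parse(constraint)
    if len(wanted) < 2:
        raise ValueError("a compatible-release clause needs at least two segments")
    found = parse(candidate)

    # Pad with zeros so that "0.23" and "0.23.0" compare as equal.
    width = max(len(wanted), len(found))
    wanted_padded = wanted + (0,) * (width - len(wanted))
    found_padded = found + (0,) * (width - len(found))

    # Every segment of the constraint except the last must match exactly.
    prefix = wanted[:-1]
    return found_padded[: len(prefix)] == prefix and found_padded >= wanted_padded


for candidate in ("0.22.0", "0.23.2", "0.24.0", "1.0.0"):
    print(candidate, compatible(candidate, "0.23"))
```

So a two-segment constraint still admits every later release within the same leading segment, which is consistent with the observation that any newer matching version gets picked up on install.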

Collaborator Author

Done in 0b00b43 (which does indeed include updates to boto 🤷).

api/conf/urls/__init__.py (outdated, resolved)
@sarayourfriend
Collaborator Author

I'm going to hold off merging this until after the Pillow update is released. It'll be better to orchestrate the necessary infrastructure changes anyway. Adding DO NOT MERGE to the title for posterity.

@sarayourfriend sarayourfriend changed the title Run the app as ASGI [DO NOT MERGE] Run the app as ASGI Sep 14, 2023
@sarayourfriend
Collaborator Author

sarayourfriend commented Sep 18, 2023

This PR depends on two related infrastructure changes:

https://github.com/WordPress/openverse-infrastructure/pull/618 to enable a canary service, to prevent increasing the task count from making our deployments run for hours.

https://github.com/WordPress/openverse-infrastructure/pull/616 to resize the tasks and desired counts to accommodate a one-worker-per-task configuration we will end up with after deploying this PR.

I'll wait to rebase this after #3029 because that PR will cause another rebase to be required anyway and is higher priority than this one.

@sarayourfriend
Collaborator Author

I've rebased this branch and re-written the history a bit to squash a few commits that didn't need to be separated from other ones and were complicating rebases (especially changes to Pipfile et co).

This PR should be ready to go as long as it passes CI and has been tried out in staging, once staging is increased to 4 tasks by https://github.com/WordPress/openverse-infrastructure/pull/616.

api/Pipfile (resolved)
@sarayourfriend sarayourfriend mentioned this pull request Nov 2, 2023
2 tasks
@sarayourfriend sarayourfriend removed 🧱 stack: ingestion server Related to the ingestion/data refresh server 🧱 stack: catalog Related to the catalog and Airflow DAGs labels Nov 2, 2023
@sarayourfriend sarayourfriend changed the title [DO NOT MERGE] Run the app as ASGI Run the app as ASGI Nov 2, 2023
Note: this does not apply to production because we serve static files from Nginx there: those static file requests never make it to the Django application and, indeed, it is not configured to serve static files. This change uses the ASGI static file handler that the Django `runserver` management command uses and correctly handles streaming responses. The only consequence of not doing this is that warnings will appear locally and, if for some reason local interactions are bypassing the static file cache on the browser, you could get a memory leak. Again, that only applies to local environments. Python code never interacts with, considers, or is configured for static files in production, so this is not an issue for production. The correct behaviour for production, which you can test by setting ENVIRONMENT to something other than `local` in `api/.env`, is to 404 on static files.
@github-actions github-actions bot added 🧱 stack: catalog Related to the catalog and Airflow DAGs 🧱 stack: ingestion server Related to the ingestion/data refresh server labels Nov 2, 2023
@sarayourfriend sarayourfriend changed the title Run the app as ASGI [DO NOT MERGE] Run the app as ASGI Nov 2, 2023
Collaborator

@AetherUnbound AetherUnbound left a comment

ASGI build runs great for me locally, code looks good and makes sense!

@sarayourfriend
Collaborator Author

We'll remove the do not merge label once the changes to the staging task count are ready.

@sarayourfriend sarayourfriend changed the title [DO NOT MERGE] Run the app as ASGI Run the app as ASGI Nov 8, 2023
@sarayourfriend sarayourfriend merged commit 8ce37fa into main Nov 8, 2023
45 checks passed
@sarayourfriend sarayourfriend deleted the add/asgi branch November 8, 2023 22:23
Labels
💻 aspect: code Concerns the software code in the repository dependencies Pull requests that update a dependency file 🧰 goal: internal improvement Improvement that benefits maintainers, not users 🟧 priority: high Stalls work on the project or its dependents 🧱 stack: api Related to the Django API 🧱 stack: catalog Related to the catalog and Airflow DAGs 🧱 stack: ingestion server Related to the ingestion/data refresh server
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Convert the API to ASGI
6 participants