Async Django app #2663

sarayourfriend · 2023-07-17T21:16:53Z

Fixes

Description

This is still a WIP. I haven't done anything to unit tests at all, so I'm sure many/all of them will fail. Wanted to get this up early though, I'm excited that it was possible at all to get it working!

Adds adrf along with uvicorn and django-async-redis. Also removes the synchronous wsgi-basic-auth which we weren't using in production at all.

I'll leave notes on other specific changes inline.

I decided to take the approach of separating the async routes from the sync routes. We can move routes one-by-one this way without needing to wrap entire routes in sync_to_async. It duplicates a small amount of code this way and exposes how bare-bones adrf is, but I don't think it's a big detriment. Future work to re-implement routes in the async view can happen at leisure. They'll coexist side-by-side without issue for now.

The other big change is to convert the image_proxy modules to async. This was not as straight-forward as I had hoped due to some strange event loop behaviour when using async Redis for the tallies DB in image_proxy.get. If you change that to use django_async_redis.get_redis_connect (and await it), you'll get a strange hanging event loop and subsequent closure issue on every other thumbnail request caused by redis-py. I dug around for a solution but couldn't find anything for now. Luckily, we do not need to solve this right away. For now, using sync_to_async is a safe way to adapt the synchronous redis client for usage in our async views. It might be worth switching to a different approach for redis connection management interactions than django's cache module. Django channels (https://channels.readthedocs.io/en/stable/introduction.html) could be a much better way to manage these connections. It's redis support already has full async and it comes with heaps of other cool features we could use like background workers (for things like letting thumbnail generation run beyond the timeout of a single request and be cached in Photon for subsequent requesters).

If we deploy this (or something based on it), it would be in multiple stages:

Merge this change and continue using the WSGI app with an "async" thumbnails route (still handled by a single worker)
Update the deployment template to use uvicorn and the ASGI app for true async thumbnails route. Other routes automatically get wrapped with sync_to_async and handled in the loop.

We can use gunicorn with uvicorn workers. That way we still have multiple workers but they can each handle asynchronous request lifecycles: https://www.uvicorn.org/deployment/#deployment

Even if we have sync routes handled by those workers, we won't be any worse off for those routes than we are with the current sync handlers. As we convert more routes to async, we'll get bigger improvements.

There might be some way to tell gunicorn to direct thumbnail requests to a subset of workers with sync requests to another. I'll need to dig into it more. Ultimately it shouldn't be an issue though because the sync routes are all very fast for the most part already.

Testing Instructions

just api/up should work after just build web. The dockerfile is configured to run uvicorn locally as a replacement for runserver, which doesn't properly support a full ASGI app. That means that we'll be able to fully test the ASGI app when running the app locally.

Additional testing is needed to confirm that running the app with gunicorn and uvicorn workers works correctly. This is the recommended approach from uvicorn for production deployments.

Reviewers: For now, please focus first on the overall approach rather than implementation details. In particular, splitting the views vs wrapping existing views in sync_to_async etc. If you agree with the overall approach, then feel free to dive into the implementation details. Please do test this locally, even though our test suite is pretty comprehensive.

Checklist

My pull request has a descriptive title (not a vague title likeUpdate index.md).
My pull request targets the default branch of the repository (main) or a parent feature branch.
My commit messages follow best practices.
My code follows the established code style of the repository.
I added or updated tests for the changes I made (if applicable).
I added or updated documentation (if applicable).
I tried running the project locally and verified that there are no visible errors.
[N/A] I ran the DAG documentation generator (if applicable).

Developer Certificate of Origin

Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

sarayourfriend · 2023-07-21T03:41:19Z

api/conf/wsgi.py

-from wsgi_basic_auth import BasicAuth
-


This middleware was never used.

sarayourfriend · 2023-07-21T03:41:40Z

api/conf/wsgi.py

-
-from gevent import monkey
-
-
-monkey.patch_all()


We don't use geventlet anymore, so this patching was not necessary and conflicts with asyncio.

sarayourfriend · 2023-07-21T03:42:29Z

api/Dockerfile

@@ -101,7 +101,6 @@ COPY --chown=opener . /api/
 # Collect static assets, these are used by the next stage, `nginx`
 RUN env \
  SETUP_ES="False" \
-  STATIC_ROOT="/static" \


This is now correctly configured in settings for local and production deployments (which are intercepted by Nginx anyway) so there's no need to repeat the configuration here.

sarayourfriend · 2023-07-21T03:43:26Z

api/api/utils/aiohttp.py

+    if _SESSION is None or _SESSION.loop.is_closed():
+        _SESSION = aiohttp.ClientSession()


Checking the event loop makes unit testing easier and is extra security for cases where the event loop closed unexpectedly. If the loop associated with the session is closed, we need a new session or it'll try to use the closed event loop.

sarayourfriend · 2023-07-21T03:44:19Z

api/api/utils/check_dead_links/__init__.py

@@ -32,9 +33,11 @@ def _get_expiry(status, default):
    return config(f"LINK_VALIDATION_CACHE_EXPIRY__{status}", default=default, cast=int)


-async def _head(url: str, session: aiohttp.ClientSession) -> tuple[str, int]:
+async def _head(url: str, **kwargs) -> tuple[str, int]:


Changes in this module are just to support the reusable aiohttp session. We should have been doing that already anyway.

sarayourfriend · 2023-07-21T03:51:10Z

api/env.docker

@@ -5,8 +5,8 @@ DJANGO_SECRET_KEY="ny#b__$$f6ry4wy8oxre97&-68u_0lk3gw(z=d40_dxey3zw0v1"
 DJANGO_DEBUG_ENABLED=True

 BASE_URL=http://localhost:50280/
-ENVIRONMENT=development
-ALLOWED_HOSTS=api.openverse.engineering,api-dev.openverse.engineering,host.docker.internal
+ENVIRONMENT=local


We use local elsewhere and it's a bit clearer than development, which has been historically muddled with staging.

sarayourfriend · 2023-07-21T03:51:56Z

api/env.template

-BASE_URL=http://localhost:50280/
-ENVIRONMENT=development
-# List of comma-separated hosts/domain names, e.g., 127.17.0.1,local.app
-ALLOWED_HOSTS=localhost,172.17.0.1,host.docker.internal


No need to duplicate these between env.docker. These environment files are now exclusively used for local development, so we don't need to do anything to make sure it's easy to handle live vs local deployments of it.

sarayourfriend · 2023-07-21T03:52:20Z

api/test/unit/utils/test_image_proxy.py

+@pytest.fixture
+def photon_url():
+    return f"{settings.PHOTON_ENDPOINT}subdomain.example.com/path_part1/part2/{uuid4()}.jpg"
+
+
+@pytest.fixture
+def image_url(photon_url):
+    return photon_url.replace(settings.PHOTON_ENDPOINT, "http://")


Prevent duplicated image URLs.

sarayourfriend · 2023-07-21T03:52:56Z

api/test/unit/utils/test_image_proxy.py

+def setup_http_get_exception(monkeypatch):
    def do(exc):
-        def raise_exc(*args, **kwargs):
-            raise exc
+        mock_session = MagicMock(spec=aiohttp.ClientSession)
+        mock_session.get.side_effect = exc

-        monkeypatch.setattr("requests.get", raise_exc)
+        monkeypatch.setattr(
+            "api.utils.image_proxy.get_aiohttp_session", lambda: mock_session
+        )


We don't use requests anymore for this, so we need to mock out the aiohttp session instead.

sarayourfriend · 2023-07-21T03:53:48Z

api/test/unit/utils/test_image_proxy.py

+async def test_photon_get_saves_image_type_to_cache(
+    pook_off, image_url, redis, headers, expected_cache_val
+):
+    image_url = image_url.replace(".jpg", "")
+    image = await sync_to_async(ImageFactory.create)(url=image_url)
+
+    pook.on()
+    mock_head = pook.head(image_url).reply(200)
+    if headers:
+        mock_head = mock_head.headers(headers)

-    with mock.patch("requests.head") as mock_head, pytest.raises(UnsupportedMediaType):
-        mock_head.return_value.headers = headers
-        photon_get(image_url, image.identifier)
+    with pytest.raises(UnsupportedMediaType):
+        await photon_get(image_url, image.identifier)


Using pook instead of mocking aiohttp is a more reliable way to prevent the head request from going through.

Now that we use pytest-asyncio, it is easier to test the dead links in an async context and not need to worry about weird issues with the aiohttp client session being used outside of an async context. The places where the function was being called before (only one, in search) can now call `sync_check_dead_links` instead without any change to the overall behaviour of the app. It just makes the tests easier to configure now that we actually have real async contexts going on in some places.

sarayourfriend · 2023-07-24T09:02:38Z

Something is happening in my implementation that causes the event loop associated with the aiohttp ClientSession to close after each thumbnail request. I can't figure out what is causing this. It can be mitigated by checking if the loop on the session is closed before reusing it (and creating a new one if needed), but the .loop property on the session is deprecated, so this raises a warning, and in future versions of aiohttp the property will be removed entirely.

I've tried to figure out what it is I did that is causing this behaviour. There's virtually nothing available online that describes the issue. It is only reproducible when using the production runtime configuration (gunicorn with uvicorn workers). When running directly under uvicorn, there's no issue at all.

Using httpx instead of aiohttp doesn't solve the error. I get PoolTimeouts instead of event loop closure errors. This means it isn't something to do with aiohttp, but the implementation of the thumbnail route. I suspect it might have something to do with sync_to_async wrappers around Redis, but I just don't know.

If I switch the implementation back to using django-async-redis, I only get event loop closure errors on every other request (as opposed to being able to serve a single request followed by only getting event loop closure errors on every subsequent request).

Anyway, I'm not able to continue working on this due to AFK. I just wanted to share where I got with this project and the feasibility.

I'm of the opinion that it is absolutely feasible and that we just need to identify the underlying issue that is causing the event loop closures. All routes other than the thumbnail continues to work perfectly fine, which leads me to believe it really is something caused by the implementation of the image proxy and small asyncio usage details that I'm not aware of. The requirement of needing to share the HTTP library session is a pain, but the implementation for it does work, theoretically!

Maybe it's worth just doing everything in FastAPI, maybe even re-writing our entire app in FastAPI. Who knows. I think the Django approach would be the best one, but getting stuck on this event loop closed issue has left me confounded. Chances are that the same implementation would have all the same problems if it was running on FastAPI instead of Django... I really don't know what the effective difference would be there!

Anyway, if someone else wants to try to debug the issue, just override the command in the docker-compose for web with gunicorn --reload and the gunicorn configuration will pick up the rest. It is only reproducible when running multiple workers, so maybe there's something thread-unsafe being done in the code that is causing this? 🤷

It is worth reverting the httpx addition back to the aiohttp version. Use the same implementation of `get_aiohttp_session` but actually use `aiohttp.ClientSession` instead. Pass `session.close` instead of `aclose` to the application shutdown callback. Everything else is basically the same. You will see the issues I have noted in the PR comment of the event loop closing if you replace the `web` service command in docker-compose with `gunicorn --reload`. The rest of the relevant configuration will get picked up from the gunicorn configuration file.

sarayourfriend · 2023-07-24T09:11:44Z

Also, just a note, I don't think the issue is caused by splitting the views, because all other views work perfectly fine. It's just the thumbnails route that has issues. I really think it's down to the implementation of the thumbnails route, something funky going on there with the event loops.

sarayourfriend · 2023-07-25T02:23:53Z

Okay, the core maintainers have agreed that feasibility has been demonstrated by this PR, so I'm going to close both the PR and the issue. I'll ping on the issue to get clarification on the precise next steps.

openverse-bot added 🟨 priority: medium Not blocking but should be addressed soon 🤖 aspect: dx Concerns developers' experience with the codebase 🧰 goal: internal improvement Improvement that benefits maintainers, not users 🧱 stack: api Related to the Django API labels Jul 17, 2023

sarayourfriend mentioned this pull request Jul 18, 2023

Add explicit gunicorn configuration for the API so logging is consistent #2648

Merged

8 tasks

sarayourfriend added 5 commits July 21, 2023 11:44

Add ASGI Django app

84c81d7

Use uvicorn locally for full asgi testing

36d16fc

Use reusable aiohttp session and prevent async redis event loop closures

cd6e699

Fix local static files when using uvicorn

b1bbc1d

Update unit tests for async

8d5dd60

sarayourfriend force-pushed the add/django-asgi-exploration branch from 7788fa0 to 8d5dd60 Compare July 21, 2023 03:34

Remove unnecessary closure

dd68ca2

sarayourfriend commented Jul 21, 2023

View reviewed changes

sarayourfriend marked this pull request as ready for review July 21, 2023 03:53

sarayourfriend requested a review from a team as a code owner July 21, 2023 03:54

sarayourfriend requested review from krysal and stacimc July 21, 2023 03:54

krysal mentioned this pull request Jul 21, 2023

Timeout related errors in /v1/images/ endpoint #2701

Closed

github-actions bot added 🧱 stack: catalog Related to the catalog and Airflow DAGs 🧱 stack: ingestion server Related to the ingestion/data refresh server labels Jul 24, 2023

sarayourfriend closed this Jul 25, 2023

sarayourfriend mentioned this pull request Jul 25, 2023

Evaluate feasibility of Django ASGI and ADRF conversion #2566

Closed

sarayourfriend mentioned this pull request Aug 7, 2023

Convert the image proxy route to async #2789

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Async Django app #2663

Async Django app #2663

sarayourfriend commented Jul 17, 2023 •

edited

Loading

sarayourfriend Jul 21, 2023

sarayourfriend Jul 21, 2023

sarayourfriend Jul 21, 2023

sarayourfriend Jul 21, 2023

sarayourfriend Jul 21, 2023

sarayourfriend Jul 21, 2023

sarayourfriend Jul 21, 2023

sarayourfriend Jul 21, 2023

sarayourfriend Jul 21, 2023

sarayourfriend Jul 21, 2023

sarayourfriend commented Jul 24, 2023

sarayourfriend commented Jul 24, 2023

sarayourfriend commented Jul 25, 2023

		if _SESSION is None or _SESSION.loop.is_closed():
		_SESSION = aiohttp.ClientSession()

Async Django app #2663

Async Django app #2663

Conversation

sarayourfriend commented Jul 17, 2023 • edited Loading

Fixes

Description

Testing Instructions

Checklist

Developer Certificate of Origin

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sarayourfriend commented Jul 24, 2023

sarayourfriend commented Jul 24, 2023

sarayourfriend commented Jul 25, 2023

sarayourfriend commented Jul 17, 2023 •

edited

Loading