-
Notifications
You must be signed in to change notification settings - Fork 204
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Async Django app #2663
Async Django app #2663
Conversation
7788fa0
to
8d5dd60
Compare
from wsgi_basic_auth import BasicAuth | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This middleware was never used.
|
||
from gevent import monkey | ||
|
||
|
||
monkey.patch_all() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't use geventlet anymore, so this patching was not necessary and conflicts with asyncio
.
@@ -101,7 +101,6 @@ COPY --chown=opener . /api/ | |||
# Collect static assets, these are used by the next stage, `nginx` | |||
RUN env \ | |||
SETUP_ES="False" \ | |||
STATIC_ROOT="/static" \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now correctly configured in settings for local and production deployments (which are intercepted by Nginx anyway) so there's no need to repeat the configuration here.
api/api/utils/aiohttp.py
Outdated
if _SESSION is None or _SESSION.loop.is_closed(): | ||
_SESSION = aiohttp.ClientSession() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking the event loop makes unit testing easier and is extra security for cases where the event loop closed unexpectedly. If the loop associated with the session is closed, we need a new session or it'll try to use the closed event loop.
@@ -32,9 +33,11 @@ def _get_expiry(status, default): | |||
return config(f"LINK_VALIDATION_CACHE_EXPIRY__{status}", default=default, cast=int) | |||
|
|||
|
|||
async def _head(url: str, session: aiohttp.ClientSession) -> tuple[str, int]: | |||
async def _head(url: str, **kwargs) -> tuple[str, int]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes in this module are just to support the reusable aiohttp session. We should have been doing that already anyway.
@@ -5,8 +5,8 @@ DJANGO_SECRET_KEY="ny#b__$$f6ry4wy8oxre97&-68u_0lk3gw(z=d40_dxey3zw0v1" | |||
DJANGO_DEBUG_ENABLED=True | |||
|
|||
BASE_URL=http://localhost:50280/ | |||
ENVIRONMENT=development | |||
ALLOWED_HOSTS=api.openverse.engineering,api-dev.openverse.engineering,host.docker.internal | |||
ENVIRONMENT=local |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use local
elsewhere and it's a bit clearer than development
, which has been historically muddled with staging
.
BASE_URL=http://localhost:50280/ | ||
ENVIRONMENT=development | ||
# List of comma-separated hosts/domain names, e.g., 127.17.0.1,local.app | ||
ALLOWED_HOSTS=localhost,172.17.0.1,host.docker.internal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to duplicate these between env.docker
. These environment files are now exclusively used for local development, so we don't need to do anything to make sure it's easy to handle live vs local deployments of it.
@pytest.fixture | ||
def photon_url(): | ||
return f"{settings.PHOTON_ENDPOINT}subdomain.example.com/path_part1/part2/{uuid4()}.jpg" | ||
|
||
|
||
@pytest.fixture | ||
def image_url(photon_url): | ||
return photon_url.replace(settings.PHOTON_ENDPOINT, "http://") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prevent duplicated image URLs.
def setup_http_get_exception(monkeypatch): | ||
def do(exc): | ||
def raise_exc(*args, **kwargs): | ||
raise exc | ||
mock_session = MagicMock(spec=aiohttp.ClientSession) | ||
mock_session.get.side_effect = exc | ||
|
||
monkeypatch.setattr("requests.get", raise_exc) | ||
monkeypatch.setattr( | ||
"api.utils.image_proxy.get_aiohttp_session", lambda: mock_session | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't use requests
anymore for this, so we need to mock out the aiohttp session instead.
async def test_photon_get_saves_image_type_to_cache( | ||
pook_off, image_url, redis, headers, expected_cache_val | ||
): | ||
image_url = image_url.replace(".jpg", "") | ||
image = await sync_to_async(ImageFactory.create)(url=image_url) | ||
|
||
pook.on() | ||
mock_head = pook.head(image_url).reply(200) | ||
if headers: | ||
mock_head = mock_head.headers(headers) | ||
|
||
with mock.patch("requests.head") as mock_head, pytest.raises(UnsupportedMediaType): | ||
mock_head.return_value.headers = headers | ||
photon_get(image_url, image.identifier) | ||
with pytest.raises(UnsupportedMediaType): | ||
await photon_get(image_url, image.identifier) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using pook instead of mocking aiohttp is a more reliable way to prevent the head request from going through.
Now that we use pytest-asyncio, it is easier to test the dead links in an async context and not need to worry about weird issues with the aiohttp client session being used outside of an async context. The places where the function was being called before (only one, in search) can now call `sync_check_dead_links` instead without any change to the overall behaviour of the app. It just makes the tests easier to configure now that we actually have real async contexts going on in some places.
Something is happening in my implementation that causes the event loop associated with the aiohttp ClientSession to close after each thumbnail request. I can't figure out what is causing this. It can be mitigated by checking if the loop on the session is closed before reusing it (and creating a new one if needed), but the I've tried to figure out what it is I did that is causing this behaviour. There's virtually nothing available online that describes the issue. It is only reproducible when using the production runtime configuration (gunicorn with uvicorn workers). When running directly under uvicorn, there's no issue at all. Using httpx instead of aiohttp doesn't solve the error. I get PoolTimeouts instead of event loop closure errors. This means it isn't something to do with aiohttp, but the implementation of the thumbnail route. I suspect it might have something to do with If I switch the implementation back to using Anyway, I'm not able to continue working on this due to AFK. I just wanted to share where I got with this project and the feasibility. I'm of the opinion that it is absolutely feasible and that we just need to identify the underlying issue that is causing the event loop closures. All routes other than the thumbnail continues to work perfectly fine, which leads me to believe it really is something caused by the implementation of the image proxy and small asyncio usage details that I'm not aware of. The requirement of needing to share the HTTP library session is a pain, but the implementation for it does work, theoretically! Maybe it's worth just doing everything in FastAPI, maybe even re-writing our entire app in FastAPI. Who knows. I think the Django approach would be the best one, but getting stuck on this event loop closed issue has left me confounded. Chances are that the same implementation would have all the same problems if it was running on FastAPI instead of Django... I really don't know what the effective difference would be there! Anyway, if someone else wants to try to debug the issue, just override the |
It is worth reverting the httpx addition back to the aiohttp version. Use the same implementation of `get_aiohttp_session` but actually use `aiohttp.ClientSession` instead. Pass `session.close` instead of `aclose` to the application shutdown callback. Everything else is basically the same. You will see the issues I have noted in the PR comment of the event loop closing if you replace the `web` service command in docker-compose with `gunicorn --reload`. The rest of the relevant configuration will get picked up from the gunicorn configuration file.
Also, just a note, I don't think the issue is caused by splitting the views, because all other views work perfectly fine. It's just the thumbnails route that has issues. I really think it's down to the implementation of the thumbnails route, something funky going on there with the event loops. |
Okay, the core maintainers have agreed that feasibility has been demonstrated by this PR, so I'm going to close both the PR and the issue. I'll ping on the issue to get clarification on the precise next steps. |
Fixes
Fixes #2566 by @zackkrida
Description
This is still a WIP. I haven't done anything to unit tests at all, so I'm sure many/all of them will fail. Wanted to get this up early though, I'm excited that it was possible at all to get it working!
Adds
adrf
along withuvicorn
anddjango-async-redis
. Also removes the synchronouswsgi-basic-auth
which we weren't using in production at all.I'll leave notes on other specific changes inline.
I decided to take the approach of separating the async routes from the sync routes. We can move routes one-by-one this way without needing to wrap entire routes in
sync_to_async
. It duplicates a small amount of code this way and exposes how bare-bonesadrf
is, but I don't think it's a big detriment. Future work to re-implement routes in the async view can happen at leisure. They'll coexist side-by-side without issue for now.The other big change is to convert the
image_proxy
modules to async. This was not as straight-forward as I had hoped due to some strange event loop behaviour when using async Redis for the tallies DB inimage_proxy.get
. If you change that to usedjango_async_redis.get_redis_connect
(andawait
it), you'll get a strange hanging event loop and subsequent closure issue on every other thumbnail request caused by redis-py. I dug around for a solution but couldn't find anything for now. Luckily, we do not need to solve this right away. For now, usingsync_to_async
is a safe way to adapt the synchronous redis client for usage in our async views. It might be worth switching to a different approach for redis connection management interactions than django's cache module. Django channels (https://channels.readthedocs.io/en/stable/introduction.html) could be a much better way to manage these connections. It's redis support already has full async and it comes with heaps of other cool features we could use like background workers (for things like letting thumbnail generation run beyond the timeout of a single request and be cached in Photon for subsequent requesters).If we deploy this (or something based on it), it would be in multiple stages:
sync_to_async
and handled in the loop.We can use gunicorn with uvicorn workers. That way we still have multiple workers but they can each handle asynchronous request lifecycles: https://www.uvicorn.org/deployment/#deployment
Even if we have sync routes handled by those workers, we won't be any worse off for those routes than we are with the current sync handlers. As we convert more routes to async, we'll get bigger improvements.
There might be some way to tell gunicorn to direct thumbnail requests to a subset of workers with sync requests to another. I'll need to dig into it more. Ultimately it shouldn't be an issue though because the sync routes are all very fast for the most part already.
Testing Instructions
just api/up
should work afterjust build web
. The dockerfile is configured to run uvicorn locally as a replacement forrunserver
, which doesn't properly support a full ASGI app. That means that we'll be able to fully test the ASGI app when running the app locally.Additional testing is needed to confirm that running the app with gunicorn and uvicorn workers works correctly. This is the recommended approach from uvicorn for production deployments.
Reviewers: For now, please focus first on the overall approach rather than implementation details. In particular, splitting the views vs wrapping existing views in
sync_to_async
etc. If you agree with the overall approach, then feel free to dive into the implementation details. Please do test this locally, even though our test suite is pretty comprehensive.Checklist
Update index.md
).main
) or a parent feature branch.Developer Certificate of Origin
Developer Certificate of Origin