-
Notifications
You must be signed in to change notification settings - Fork 218
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce permissions of default authentication scope #4372
Conversation
47a42a8
to
53b2423
Compare
Something (presumably caused by my deduplication of the Very frustrating! I can't see at all how the test configuration is causing that to happen, and it only happens on a full test run, so it's purely down to some luck of ordering. Very annoying. I'm trying to find a way to completely squash the redis data on each subsequent test run so there's no chance of either the real redis being used nor of the data leaking between tests. |
@@ -41,18 +43,20 @@ def get_redis_connection(*args, **kwargs): | |||
|
|||
|
|||
@pytest.fixture(autouse=True) | |||
def django_cache(redis, monkeypatch) -> RedisCache: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it! Indeed, something was not working about the cache fixtures, and they were sometimes persisting between tests. Adding new tests (and moving fixtures around) in this PR exposed the issue, but it ends up being easy to solve by actually completely squashing the possibility of a cache being reused by literally replacing it. We can access the real data structure underneath (and it isn't even a hack to do so), so we don't need to monkeypatch in this case.
This explains the changes in this fixture module.
@@ -1,140 +1,22 @@ | |||
from django.conf import settings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to reorganise this module and the admin in general. We've been putting things pretty haphazardly, and it was becoming a mess. The forms
module was too generically named, and encouraged needlessly separating a form definition from the admin it was meant for. That doesn't make sense.
I've split it up into basically the conceptual model of our data, roughly matching the serializers, models, and so forth, rather than the implementation detail (form this, admin that).
The only new things in the admin are the throttled admin changes and the new form for it in admin.oauth
.
@@ -50,19 +50,17 @@ | |||
"name": "auth", | |||
"description": dedent( | |||
""" | |||
The API has rate-limiting and throttling in place to prevent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I chose not to list out all the details of what the limits are for throttling or privileged pagination. The throttle limits were wrong, sort of, in that those are the limits for some views, but thumbnails and some others have different limits. I don't think we actually need to publish the configured limits explicitly, and doing it like this (when we know they are configured via environment variables elsewhere anyway) inevitably leads to inaccurate information and drifting documentation.
Instead, I've changed this to describe the general idea of Openverse's API having intentional limits to requesters, how we communicate those limits (kept from the existing docs), and letting the reader know they can request higher limits if they wish.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, this is a great improvement IMO.
@@ -0,0 +1,161 @@ | |||
from dataclasses import dataclass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just moved from api/test/unit/conftest.py
and deduplicates the much more recent and less complete media_type
fixture removed from api/test/integration
.
"/", HTTP_AUTHORIZATION=f"Bearer {access_token.token}" | ||
) | ||
|
||
return APIView().initialize_request(request) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixtures in this module are just moved around, but I've also fixed an issue in this authed_request
fixture where it was causing request.auth
to be a token string, rather than the actual authenticated access token object that it would be in normal circumstances. It was using force_authenticate
which causes auth
to be just a string, and to bypass the authentication class. We don't need to do that, we have a real token, so we can just rely on the normal authentication configuration. That's a good thing! It means we're integrating the whole authentication flow into these tests and ensuring it works in all scenarios that we have tests for.
@pytest.mark.django_db | ||
def test_page_size_limit_unauthed(api_client): | ||
query_params = {"page_size": 20} | ||
res = api_client.get("/v1/images/", query_params) | ||
assert res.status_code == 200 | ||
query_params["page_size"] = 21 | ||
res = api_client.get("/v1/images/", query_params) | ||
assert res.status_code == 401 | ||
|
||
|
||
@pytest.mark.django_db | ||
def test_page_size_limit_authed(api_client, test_auth_token_exchange): | ||
time.sleep(1) | ||
token = test_auth_token_exchange["access_token"] | ||
query_params = {"page_size": 21} | ||
res = api_client.get( | ||
"/v1/images/", query_params, HTTP_AUTHORIZATION=f"Bearer {token}" | ||
) | ||
|
||
assert res.status_code == 200 | ||
|
||
query_params = {"page_size": 500} | ||
res = api_client.get( | ||
"/v1/images/", query_params, HTTP_AUTHORIZATION=f"Bearer {token}" | ||
) | ||
assert res.status_code == 200 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests are superseded by the new ones at the end of this module.
@pytest.mark.django_db | ||
def test_max_page_count(api_client): | ||
response = api_client.get( | ||
"/v1/images/", {"page": settings.MAX_PAGINATION_DEPTH + 1} | ||
) | ||
assert response.status_code == 400 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is replaced by the new test_auth
tests.
@@ -1,40 +1,6 @@ | |||
from dataclasses import dataclass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of these removed lines are just moved to api/test/fixtures/media_type_config.py
I tried testing this PR locally, but I get
error in tests and when I try to open the ThrottledApplication admin page. I assumed that I needed to run
What should I do? It would also be easier to review this PR if the admin splitting was extracted to a separate PR. |
@obulat you need to run |
Based on the high urgency of this PR, the following reviewers are being gently reminded to review this PR: @obulat Excluding weekend1 days, this PR was ready for review 3 day(s) ago. PRs labelled with high urgency are expected to be reviewed within 2 weekday(s)2. @sarayourfriend, if this PR is not ready for a review, please draft it to prevent reviewers from getting further unnecessary pings. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code, including tests, looks great to me @sarayourfriend. I really like the refactoring here, and the effort in comments and the commit history to make that easier to review 😅
I have been having trouble actually testing but just confirmed it's an issue with my local env and not this branch. Submitting my comments so far, but I'm going to continue looking at this now!
api/api/constants/privilege.py
Outdated
""" | ||
|
||
slug: str | ||
anonymous: typing.Any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why typing.Any
for these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In practice, they're currently int
, but I resume they could be anything. I'll try and see if Python will support a Generic parameter here 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A generic appears to work! 🎉 Well, whether it would actually pass mypy, I have no idea, but the code runs still and I don't get any errors with my informal type information from my IDE.
"""This serializer passes pagination parameters from the query string.""" | ||
|
||
_SUBJECT_TO_PAGINATION_LIMITS = ( | ||
"This parameter is subject to limitations based on authentication " | ||
"and special privileges. For details, refer to [the authentication " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
special privileges
-> access level
here might also be more clear.
|
||
if real_page_count > max_possible_page_count: | ||
return floor(max_possible_page_count) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The math here reads confusingly to me for the same reason commented above related to the name PAGINATION_DEPTH
. That being said -- very cool to clamp the actual result count independent of page size!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if the changes to names fixes this for you. Otherwise, I can try to think of a different way to express this.
@@ -50,19 +50,17 @@ | |||
"name": "auth", | |||
"description": dedent( | |||
""" | |||
The API has rate-limiting and throttling in place to prevent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, this is a great improvement IMO.
c66cbe2
to
f8e26dc
Compare
This PR has migrations. Please rebase it before merging to ensure that conflicting migrations are not introduced. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your patience @sarayourfriend! Finally got it working and this tested well for me. I tried authenticated & unauthenticated, with setting the privileges individually and both at once. Everything worked beautifully :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks great and works well locally (just recreate
worked, thank you!).
My main change request is for form clarification. It can be done in this PR, or in a follow-up one.
As a maintainer trying to restrict a violating app's access level, what do I need to do? When I look at the form, without the knowledge about the underlying code, I'm not sure if I should be changing the rate_limit_model
or the privileges
, and which privileges should I restrict to fix the problem.
We could add a documentation page about it, but I think for the maintainer trying to fix an API traffic problem, it would be nice to have an instruction inside the form itself. Something like "Unselect max_page_size for an app that is violating Openverse TOS by scraping content" in the privileges label.
That's definitely a documentation problem, and I think a misunderstanding of how to apply decisions based on ToS violations. The correct response to an application abusing any level of access it's been granted is to revoke its access. I'll talk with @zackkrida about where best to document this, I'm not entirely sure if it's suitable for public documentation right now. |
Fixes
Fixes #4324 by @sarayourfriend
Description
This PR adds a new
Privilege
concept to the Openverse API, and allows us to "upgrade" access to fine-grained aspects of the API for certain requesters. I've used this to add further gradations to the limits on page size and pagination depth in line with the issue's requirements.I've left comments on the PR to help explain some of the changes. The line count looks big, but it really is mostly a lot of changes to unit tests and reorganisation of the admin site configuration. The unit tests had some severe duplication of the
media_type_config
fixture, which is now fixed, and cleans the test data up a bit. This makes the tests consistent across the integration and unit tests, and easier to reason about and work with (no more context switching which media type fixture is this one?...). I ended up touching these because the changes in the serializer at first assumed therequest
was always available. This works in practice, but not in tests, because we actually do not always instantiate the serializer with a fully and correctly configured request object. That's pretty frustrating, but fixing it would require even more changes outside the scope of this PR, so for now the serializer shamefully allows the request to be missing so that tests will not fail. I've fixed the issue for some tests, but really we should have a helper method for the tests to use that creates the serializer with particular data and an correctly configured authed or anonymous request. Or, even better, move more tests towards the integration level, rather than unit level, and forget about these implementation details of the serializer in the tests (where they really don't matter). Anyway, that's all a lot more work and this is messy enough as it is.Otherwise, the bulk of the real changes are the new api.constants.privileges module and its data, along with the changes to the media serializer.
Testing Instructions
Run the app with
just api/test
and make anonymous requests, and see that limits are enforced. Page size should be limited to 20, and you shouldn't be able to go deeper than 240 results (12 pages of 20 size).Make authenticated requests, and confirm the behaviour matches the configuration and your understanding of it. Do the same with a privileged app, by visiting the Django admin and adding the privileges on the ThrottledApplication. Enable the privilege by checking the checkbox for that privilege, and save the throttled application.
Review the new unit tests in
test_auth
and confirm they cover the requirements.Checklist
Update index.md
).main
) or a parent feature branch.just catalog/generate-docs
for catalogPRs) or the media properties generator (
just catalog/generate-docs media-props
for the catalog or
just api/generate-docs
for the API) where applicable.Developer Certificate of Origin
Developer Certificate of Origin