fix: fk resolver permissions leak #1411
Conversation
@superlevure this seems to be very interesting! Waiting to test on my local environment!

Thank you! If the description was not clear enough: this PR makes sure we go through `get_queryset` when resolving foreign relationships. When one is defined, though, any filtering on the queryset will invalidate the "cache" offered by `select_related`/`prefetch_related`, as explained by the Django documentation here: https://docs.djangoproject.com/en/4.2/ref/models/querysets/#prefetch-related
I hope this MR can reconcile the two worlds: those who are dependent on `select_related`/`prefetch_related` and those who rely on a custom `get_queryset`.
@superlevure Thank you for following up on this!
I definitely appreciate your point around this:

> I hope this MR can reconcile the two worlds: those who are dependent on `select_related`/`prefetch_related` and those who rely on a custom `get_queryset`

It looks like overall this does the trick for that, and your approach of only using the custom resolver when `get_queryset` has been overridden seems fairly reasonable.
I was still mostly unconvinced that relying on `get_queryset` is really the right way to handle authorization, for the reasons around performance that you quoted from the Django docs, in that it tends to mean very expensive N+1 exponential SQL query fan-out. However, your tests around the `FilmType`/`FilmDetailsType` show a use-case where you don't actually do any further queryset filtering inside of `get_queryset` to do your authorization, and that does seem tenable.
👉 Can I suggest that we update the "Global Filtering" graphene-django docs example to be more similar to that, where it doesn't actually change/filter the queryset, but instead just conditionally returns the queryset? I think it would also be worth noting in those docs something like "Warning: if you modify the queryset, such as performing any filtering here, it will result in N+1 queries when this DjangoObjectType is used as a relation from another Django model in graphene." WDYT?
I'm going to try running the tests from `graphene-django-optimizer` (and from my `graphene-django-permissions` project) against the changes in this branch to make sure there aren't any separately-revealed performance regressions there.
graphene_django/converter.py (Outdated)

```python
class CustomField(Field):
    def wrap_resolve(self, parent_resolver):
        """
        Implements a custom resolver which go through the `get_node` method to insure that
```
Minor grammar: `go` -> `goes`, `insure` -> `ensure`. Same in `convert_field_to_djangomodel`.
Thank you for the catch
```python
        return resolver

    def custom_resolver(root, info, **args):
        # Note: this function is used to resolve FK or 1:1 fields
```
Just out of curiosity, if this is used to resolve 1:1 fields, why does `convert_onetoone_field_to_djangomodel` also need this similar `custom_resolver` overriding logic separately?
Good question; for each relation kind, we need to deal with both directions: forward (`OneToOneField`, `ForeignKey`, `ManyToManyField`) and backward (`ManyToManyRel`, `ManyToOneRel`, `OneToOneRel`). The resolving logic depends on which side you're working on, hence the separation.
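To illustrate why the two directions need separate logic, here is a plain-Python stand-in for two FK-linked tables (hypothetical data, not graphene-django's actual resolver code):

```python
# Two "tables" linked by a foreign key from book -> author.
authors = {1: {"id": 1, "name": "Ada"}}
books = [
    {"id": 10, "title": "Analytical Engines", "author_id": 1},
    {"id": 11, "title": "Notes", "author_id": 1},
]

def resolve_forward(book):
    # Forward (ForeignKey / OneToOneField): the related id is stored on the
    # object itself, so the resolver can look it up directly (or skip the
    # extra query entirely when only the id is requested).
    return authors[book["author_id"]]

def resolve_backward(author):
    # Backward (ManyToOneRel / OneToOneRel / ManyToManyRel): nothing on the
    # author row points at its books; the resolver must query the other
    # table, filtering on the FK column.
    return [b for b in books if b["author_id"] == author["id"]]
```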
```python
model = field.related_model

def dynamic_type():
    _type = registry.get_type_for_model(model)
    if not _type:
        return

    return Field(
class CustomField(Field):
```
As a general comment (and I don't mean this as a slight in any way), the code to achieve this behavior feels quite complex/confusing to me, bordering on it seeming like a code smell. e.g., are we properly handling all scenarios with it? How brittle is it? But I'm still not super familiar with all of the internals of graphene-django, so perhaps that's just how something of this lower-level nature would have to be written. I'd be curious to see @mahdyhamad's thoughts, since you're incorporating some of his efforts he'd described in #1315 (comment) but hadn't completed.
I agree this needs to be tested further and maybe simplified/refactored.
Also, it seems that this part from #1361 bypasses `get_queryset` when the resolver is awaitable. I plan to tackle this in a future PR.
```python
if is_resolver_awaitable:
    fk_obj = resolver(root, info, **args)
    # In case the resolver is a custom awaitable resolver that overwrites
    # the default Django resolver
    return fk_obj
```
Hm, unfortunately the changes here do break query optimization in `graphene-django-optimizer`. I overwrote the `converter.py` file in that project's env to match this PR locally, and the master branch of `graphene-django-optimizer` then had 8 new tests failing. Specifically:
tests/test_query.py::test_should_reduce_number_of_queries_by_using_select_related FAILED
tests/test_query.py::test_should_optimize_when_using_fragments FAILED
tests/test_query.py::test_should_select_nested_related_fields FAILED
tests/test_query.py::test_should_prefetch_nested_select_related_field FAILED
tests/test_query.py::test_should_select_nested_prefetch_related_field FAILED
tests/test_query.py::test_should_select_nested_prefetch_and_select_related_fields FAILED
tests/test_query.py::test_should_fetch_fields_of_related_field FAILED
tests/test_relay.py::test_should_reduce_number_of_queries_in_relay_schema_by_using_select_related FAILED
For instance, for `test_should_reduce_number_of_queries_by_using_select_related`:
```
>       assert_query_equality(items, optimized_items)

tests/test_query.py:39:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
left_query = <QuerySet []>, right_query = <QuerySet []>

    def assert_query_equality(left_query, right_query):
>       assert str(left_query.query) == str(right_query.query)
E       assert 'SELECT "test...."name" = bar' == 'SELECT "tests...."name" = bar'
E         - SELECT "tests_item"."id", "tests_item"."name", "tests_item"."parent_id", "tests_item"."item_id", "tests_item"."value" FROM "tests_item" WHERE "tests_item"."name" = bar
E         + SELECT "tests_item"."id", "tests_item"."name", "tests_item"."parent_id", "tests_item"."item_id", "tests_item"."value", T2."id", T2."name", T2."parent_id", T2."item_id", T2."value" FROM "tests_item" LEFT OUTER JOIN "tests_item" T2 ON ("tests_item"."parent_id" = T2."id") WHERE "tests_item"."name" = bar
```
It looks to me like the `select_related` optimization is getting removed. Maybe a quirk of how those tests are written and how this `CustomField` override is happening? Any ideas how to fix that?
I don't think this should be merged until we can be confident it does not introduce these sorts of performance regressions.
@sjdemartini, thank you for taking the time to test my PR on `graphene-django-optimizer`. I'm trying to reproduce the issue, but I can't seem to make the tests pass on `master`:

```shell
virtualenv -p python3 venv
. venv/bin/activate
pip install -r dev-requirements.txt
# run tests:
python setup.py test
```

I'm getting the following for each test: Does that ring a bell to you? Is this the correct repo you are referring to?
Thanks for the review @sjdemartini

> I was still mostly unconvinced that relying on `get_queryset` is really the right way to handle authorization, for the reasons around performance that you quoted from the Django docs, in that it tends to mean very expensive N+1 exponential SQL query fan-out.

For our use case, we have to rely on `get_queryset` to perform auth filtering at the DB level, as pulling all the data from the DB to Python to perform the filter would be a substantial performance/memory cost and would not scale. We're tackling the N+1 problem with dataloaders instead of `select_related`/`prefetch_related`.
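For reference, the batching idea behind a dataloader can be sketched in plain Python (this is not graphene's or any particular dataloader library's API, just the pattern: collect keys during a resolve pass, then issue one batched fetch instead of one query per row):

```python
class BatchLoader:
    """Collects keys, then resolves them all with a single batched fetch."""

    def __init__(self, batch_fn):
        self.batch_fn = batch_fn  # maps a list of keys -> {key: value}
        self.queue = []
        self.cache = {}

    def load(self, key):
        # Called once per resolved row; just records the key.
        if key not in self.cache and key not in self.queue:
            self.queue.append(key)

    def dispatch(self):
        # One batched fetch (e.g. SELECT ... WHERE id IN (...)) instead of
        # one query per parent row.
        if self.queue:
            self.cache.update(self.batch_fn(self.queue))
            self.queue = []

    def get(self, key):
        return self.cache[key]


calls = []

def fetch_authors(ids):
    calls.append(list(ids))  # record how many "queries" were issued
    return {i: f"author-{i}" for i in ids}

loader = BatchLoader(fetch_authors)
for author_id in [1, 2, 1, 3]:  # three distinct FKs across four rows
    loader.load(author_id)
loader.dispatch()
assert len(calls) == 1  # a single batched query served all rows
```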
> Can I suggest that we update the "Global Filtering" graphene-django docs example to be more similar to that, where it doesn't actually change/filter the queryset, but instead just conditionally returns the queryset? I think it would also be worth noting in those docs something like "Warning: if you modify the queryset, such as performing any filtering here, it will result in N+1 queries when this DjangoObjectType is used as a relation from another Django model in graphene."

Updating the documentation is on my to-do for this PR indeed. Also to be noted that the N+1 issue occurs anyway if you use `graphene-django` without `select_related`/`prefetch_related`, so the comment should be more something like "Warning: modifying the queryset in `get_queryset` will cancel out any optimization technique such as `select_related`/`prefetch_related` in the model's manager (link to Django doc)".
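To make that warning concrete, here is a toy model of the behavior (a hypothetical `FakeQuerySet`, mimicking the rule from the Django docs that any further filtering clones the queryset and discards the prefetch cache):

```python
class FakeQuerySet:
    """Toy queryset: tracks whether a prefetch cache is still usable."""

    def __init__(self, rows, prefetch_cache=None):
        self.rows = rows
        self.prefetch_cache = prefetch_cache  # filled by prefetch_related

    def filter(self, predicate):
        # Mirrors Django's documented behavior: filtering returns a NEW
        # queryset, and the prefetch cache does not carry over, so related
        # rows get re-fetched per object (N+1).
        return FakeQuerySet([r for r in self.rows if predicate(r)])


qs = FakeQuerySet(rows=[1, 2, 3], prefetch_cache={"films": ["..."]})
assert qs.prefetch_cache is not None                      # returned as-is: cache valid
assert qs.filter(lambda r: r > 1).prefetch_cache is None  # filtered: cache dropped
```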
Thank you for testing it out! Ah, yeah, that is because
Good point, sounds good to me!
@sjdemartini I've spent some time on `graphene-django-optimizer`. The reason why tests are failing with this branch is that `OptimizedDjangoObjectType` overrides `get_queryset`, which with these changes means the `select_related`/`prefetch_related` optimizations are dropped for nested models. Some other thoughts on `graphene-django-optimizer`: the package does not seem to be actively maintained anymore.
Thanks @superlevure for taking the time to investigate that and figure out what was causing the test failures within `graphene-django-optimizer`! It looks like if we merge this, the optimizer's `OptimizedDjangoObjectType` class will effectively be broken, and it's somewhat of a bummer that there will be no way to generically define a `DjangoObjectType` base class that will be query-optimized for both lists and nodes by default, in the way that was done before there (since overriding `get_queryset` means query optimization will be "broken" for nested models).
But I definitely take your point on the current maintenance state of that package. It seems to have fizzled out around the time that `graphene-django` originally did, but never revived sadly. (I've pinged the owner to ask if I can get added as a maintainer, fwiw.) Also fortunately, the `OptimizedDjangoObjectType` doesn't seem like a critical piece of the package's functionality IMO and is more of a boilerplate-reducer, since you can still use the optimizer directly with `resolve_*` functions and whatnot.
Since it's still possible to optimize queries in general (and with `graphene-django-optimizer`, just not using `OptimizedDjangoObjectType`) with your approach here, your changes seem reasonable to me!
I believe there had been discussions about having the optimizer built into graphene-django; maybe that is the way to go, to have these optimizations as part of graphene-django? Especially now that the original library is not being maintained. @jkimbo @ulgens @sjdemartini what do you think? One thought I have is that using these optimizations is actually a "very" Django thing, so maybe it is the way to go?
I'm supportive of the idea of baking the optimizations into graphene-django itself; it seems like that'd lend itself to cleaner/easier maintenance and likely a better implementation. There's an old issue here #57 where this was discussed a bit and generally landed on the notion that the optimizations should be part of graphene-django (#57 (comment)). Personally I don't think I'd have the time to dig into the details of how it could be done or port that functionality myself (I imagine it's not a small feat), and sadly the graphene-django-optimizer maintainer hasn't been responsive either. But I generally like the concept if someone else wants to pick that up!
* fix: fk resolver permissions leak
* fix: only one query for 1o1 relation
* tests: added queries count check
* fix: docstring
* fix: typo
* docs: added warning to authorization
* feat: added bypass_get_queryset decorator
This PR re-introduces the changes from #1315 to make sure we go through `get_queryset` when resolving foreign relationships, with some differences:

* the `id` of the foreign object can be retrieved from the object itself (via `relation_id`)
* `OneToOneRel` is handled as well
* the default resolver is kept when `get_queryset` was not overloaded by the `Type`, so we can leverage optimization techniques such as `prefetch_related` and `select_related` (see Revert field resolver logic to fix poor query performance #1401)

Left to do: