Only include total count in the first page of list views #1911
Is it possible to just exclude |
It could be included in the same instances when |
Good question, I forgot to mention why this implementation is necessary. I initially tried to simply exclude |
The use case is someone specifying |
FYI, there are some flaky tests which should be fixed by #1910. I noticed a failure in this PR's CI due to that and re-ran the test, which this time succeeded. This may continue to happen if I update this branch. |
Looks good. Relying on @mvandenburgh for a substantive review.
NICE digging! Could someone though run this by https://github.com/encode/django-rest-framework/ , e.g. inquire on
my points are
|
LGTM, just left one suggestion to add a clarifying comment explaining the motivation for this reworked pagination as a whole. It's unfortunate that we have to override so much to accomplish this, but it all makes sense to me.
🚀 PR was released in |
Currently, regardless of which page of a paginated endpoint is fetched, we return the count of the entire queryset, which requires a full table scan. This is a problem for endpoints where optimization is key: no matter how efficiently you structure the database queries, you'll always scan the entire table to get the total number of objects that match the query.
This can be wildly inefficient for queries with a large result set. If, for example, you were to list the assets from a dandiset containing 50k assets and page through them 100 at a time, you'd make 500 requests, and every request would scan the entire table to count all of these assets, even though you're only fetching 100.
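As a rough back-of-the-envelope sketch of the example above (not code from this PR), the snippet below compares how many rows get counted when the total is computed on every page versus only on the first one. The function name `count_overhead` is hypothetical, and it assumes the simplified model that each count touches every matching row once.

```python
import math


def count_overhead(total_rows: int, page_size: int) -> dict:
    """Estimate rows touched by count queries while paging through a result set.

    Simplified model (an assumption): each count scans every matching row once.
    """
    pages = math.ceil(total_rows / page_size)
    return {
        "requests": pages,
        # Count issued on every page: one full scan per request.
        "rows_counted_every_page": pages * total_rows,
        # Count issued on the first page only: a single full scan.
        "rows_counted_first_page_only": total_rows,
    }


overhead = count_overhead(total_rows=50_000, page_size=100)
print(overhead)
```

Under that model, paging through 50k assets 100 at a time counts 25 million rows in total when every page is counted, versus 50k rows when only the first page is.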
To fix this behavior, this PR introduces a few new pagination-related classes. Since the existing implementation of DRF's `PageNumberPagination` class (and the classes it invokes by default) involves many calls to the queryset `count` method from one path or another, specific classes and methods are overridden in order to provide pagination without invoking `count` unless explicitly requested.
At the moment this new pagination is applied to all paginated views in the archive. If this is undesirable, it could easily be applied to only the asset list endpoint, but I'm not a fan of that approach. It's more confusing to have two different behaviors across the archive, and the performance improvement will now benefit all of those endpoints.
The consequences of this change at the API surface are the following:
- `count` will only be returned if `page` is unspecified or equals `1`, and will be `null` otherwise.
- `last` can no longer be specified as a page number. I don't think anyone was really using this to begin with, so that should have no effect.

As far as I can tell, the dandi-cli won't be affected by this change, as it only checks for the count on the first page.