Only include total count in the first page of list views #1911

jjnesbitt · 2024-03-27T20:11:42Z

Currently, regardless of which page of a paginated endpoint is fetched, we return the count of the entire queryset, which requires a full table scan. This is a problem for endpoints where optimization is key, since no matter how efficiently you structure the database queries, you'll always scan the entire table to get the total number of objects that match the query.

This can be wildly inefficient for queries with a large result set. For example, if you were to list the assets of a dandiset containing 50k assets and page through them 100 at a time, you'd be making 500 requests, and every request would scan the entire table to count all of those assets, even though it only fetches 100.
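To make the arithmetic above concrete, here is a small Python calculation of how many count queries a full walk of the listing triggers before and after the change. The function name and parameters are illustrative only, and the model assumes exactly one COUNT query per request that reports a count.

```python
import math


def count_queries(total_items: int, page_size: int, count_every_page: bool) -> int:
    """How many COUNT(*) queries a client triggers by walking every page."""
    # Even an empty result set needs one request for page 1.
    requests = max(1, math.ceil(total_items / page_size))
    # Old behaviour: one COUNT per request. New behaviour: COUNT on page 1 only.
    return requests if count_every_page else 1


print(count_queries(50_000, 100, count_every_page=True))   # 500
print(count_queries(50_000, 100, count_every_page=False))  # 1
```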

To fix this behavior, this PR introduces a few new pagination-related classes. Since DRF's existing PageNumberPagination class (and the classes it invokes by default) calls the queryset's count method from several different code paths, specific classes and methods are overridden so that pagination works without invoking count unless it is explicitly requested.
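The general technique of paginating without a total count can be sketched in plain Python. This is not the PR's implementation (which overrides DRF and Django classes); it is a minimal sketch assuming a queryset-like object that supports slicing and count(), and FakeQuerySet and paginate_without_count are hypothetical names. Fetching one row beyond the page size is an assumed way to tell whether a next page exists, so the full count is only computed when page 1 is requested.

```python
class FakeQuerySet:
    """Hypothetical stand-in for a Django QuerySet: supports slicing and count()."""

    def __init__(self, rows):
        self._rows = list(rows)
        self.count_calls = 0

    def __getitem__(self, s: slice):
        return self._rows[s]

    def count(self) -> int:
        self.count_calls += 1  # a real QuerySet would run SELECT COUNT(*) here
        return len(self._rows)


def paginate_without_count(queryset, page: int, page_size: int) -> dict:
    """Return one page, counting the full result set only on page 1."""
    if page < 1:
        raise ValueError("page must be >= 1")
    offset = (page - 1) * page_size
    # Fetch one extra row: if it exists, there is a next page. No COUNT needed.
    rows = list(queryset[offset : offset + page_size + 1])
    return {
        "count": queryset.count() if page == 1 else None,
        "has_next": len(rows) > page_size,
        "results": rows[:page_size],
    }
```

Walking pages 2, 3, ... of a large listing then issues only bounded slice queries (LIMIT/OFFSET in SQL terms), never a full count.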

At the moment this new pagination is applied to all paginated views in the archive. If this is undesirable, it could easily be applied to only the asset list endpoint, but I'm not a fan of that approach: having two different behaviors across the archive is more confusing, and this way the performance improvement benefits all of those endpoints.

The consequences of this change at the API surface are the following:

1. count will only be returned if page is unspecified or equals 1, and will be null otherwise.
2. last can no longer be specified as a page number. I don't think anyone was really using this to begin with, so that should have no effect.
3. The page number controls will no longer be shown on the stock DRF page for paginated views, as they require knowing the full count of the queryset on every page.
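Even without a count, a page can still link to its neighbours, which is what keeps the listing navigable. A minimal sketch, assuming the {count, next, previous, results} envelope that DRF's PageNumberPagination returns and a has_next flag computed without counting; with_page and envelope are hypothetical helpers, not DRF API:

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(url: str, page: int) -> str:
    """Return `url` with its ?page= query parameter set to `page`."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))  # note: collapses repeated keys
    query["page"] = str(page)
    return urlunsplit(parts._replace(query=urlencode(query)))


def envelope(url: str, page: int, results: list, has_next: bool,
             total: Optional[int]) -> dict:
    """Build a {count, next, previous, results} body without a per-page count.

    `total` is only used on page 1; every other page reports count as None (null).
    """
    return {
        "count": total if page == 1 else None,
        "next": with_page(url, page + 1) if has_next else None,
        "previous": with_page(url, page - 1) if page > 1 else None,
        "results": results,
    }
```

Numbered page controls, by contrast, need the last page number up front, which is why the stock DRF browsable page can no longer render them.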

As far as I can tell, the dandi-cli won't be affected by this change, as it only checks the count on the first page.
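A client stays compatible as long as it reads count from the first page and then follows next links, never relying on count from later pages. A minimal sketch of such a loop, assuming a fetch callable that returns the decoded JSON body for a URL; fetch_all is a hypothetical name and this is not the dandi-cli's actual code:

```python
from typing import Callable, Optional, Tuple


def fetch_all(first_url: str, fetch: Callable[[str], dict]) -> Tuple[Optional[int], list]:
    """Walk a paginated listing by following `next` links.

    `count` is read from the first page only; later pages may report it as None.
    """
    first = fetch(first_url)
    total = first["count"]
    results = list(first["results"])
    url = first["next"]
    while url is not None:
        page = fetch(url)  # page["count"] is None here, so it is ignored
        results.extend(page["results"])
        url = page["next"]
    return total, results


# Usage with canned responses standing in for HTTP calls:
pages = {
    "/assets/?page=1": {"count": 3, "next": "/assets/?page=2", "results": ["a", "b"]},
    "/assets/?page=2": {"count": None, "next": None, "results": ["c"]},
}
print(fetch_all("/assets/?page=1", pages.__getitem__))  # (3, ['a', 'b', 'c'])
```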

mvandenburgh · 2024-03-27T20:14:51Z

count will only be returned if page is unspecified or equals 1, and null otherwise

Is it possible to just exclude count from the response?

waxlamp · 2024-03-27T20:17:00Z

2. last can no longer be specified as a page number. I don't think anyone was really using this to begin with so that should have no effect.

It could be included in the same instances when count is returned, right?

jjnesbitt · 2024-03-27T20:17:56Z

Is it possible to just exclude count from the response?

Good question; I forgot to mention why this implementation is necessary. I initially tried to simply exclude count from the response, but the underlying pagination class calls count on the queryset in various places. This happens when calling paginate_queryset, and by the time the response is generated, it's simply using the cached value of count on that page instance.
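Why omitting count from the response alone doesn't help can be seen in a simplified model. In Django's Paginator, count is a cached property, and validating a requested page number compares it against num_pages, which is derived from count. The sketch below is a stripped-down imitation (SketchPaginator and CountingQuerySet are hypothetical names), not Django's or DRF's actual code:

```python
from functools import cached_property


class CountingQuerySet:
    """Hypothetical stand-in for a QuerySet that records COUNT calls."""

    def __init__(self, rows):
        self.rows = list(rows)
        self.count_calls = 0

    def count(self) -> int:
        self.count_calls += 1  # a real QuerySet would run SELECT COUNT(*)
        return len(self.rows)

    def __getitem__(self, s: slice):
        return self.rows[s]


class SketchPaginator:
    """Simplified model of Django's Paginator, not the real class."""

    def __init__(self, queryset, per_page: int):
        self.queryset = queryset
        self.per_page = per_page

    @cached_property
    def count(self) -> int:
        return self.queryset.count()  # evaluated once, then cached

    @property
    def num_pages(self) -> int:
        return max(1, -(-self.count // self.per_page))  # ceiling division

    def page(self, number: int) -> list:
        if number > self.num_pages:  # validating the number already needs count
            raise ValueError("invalid page")
        start = (number - 1) * self.per_page
        return self.queryset[start:start + self.per_page]


qs = CountingQuerySet(range(250))
paginator = SketchPaginator(qs, per_page=100)
paginator.page(3)                   # triggers the COUNT
print(qs.count_calls)               # 1
body = {"count": paginator.count}   # the response reuses the cached value
print(qs.count_calls)               # 1
```

Dropping "count" from body would not remove the query; it has already run during page validation.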

jjnesbitt · 2024-03-27T20:21:30Z

last can no longer be specified as a page number. I don't think anyone was really using this to begin with so that should have no effect.

It could be included in the same instances when count is returned, right?

The use case is someone specifying page=last to return the last page, which doesn't really apply to normal page queries. I suppose it's actually not strictly necessary to disallow this: since it won't apply to the pages where we're excluding count, I could keep it.
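Keeping page=last is cheap because only that one value needs the total. DRF's PageNumberPagination recognizes it through its last_page_strings attribute (default ('last',)); the sketch below, using a hypothetical resolve_page helper rather than DRF code, shows that numeric pages never trigger a count:

```python
import math
from typing import Callable


def resolve_page(raw: str, per_page: int, count: Callable[[], int]) -> int:
    """Convert a ?page= value into a 1-based page number.

    Only the special value "last" needs the total count; numeric pages never call it.
    """
    if raw == "last":
        return max(1, math.ceil(count() / per_page))
    number = int(raw)
    if number < 1:
        raise ValueError("page must be >= 1")
    return number
```

Passing count as a callable defers the expensive query until it is actually needed.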

jjnesbitt · 2024-03-27T20:26:07Z

FYI, there are some flaky tests which should be fixed by #1910. I noticed a failure in this PR's CI due to that and re-ran the test, which this time succeeded. This may continue to happen if I update this branch.

waxlamp

Looks good. Relying on @mvandenburgh for a substantive review.

dandiapi/api/views/pagination.py

yarikoptic · 2024-03-28T20:54:39Z

NICE digging! Could someone, though, run this by https://github.com/encode/django-rest-framework/, e.g. inquire:

whether the observed overhead is expected
whether DRF has some means to remove that overhead without providing an alternative paginator
whether they would be interested in receiving a contribution to adopt the paginator proposed here

My points are:

it would be great to run this discovery/these observations by the authors of the framework used here
see whether they might come up with a less intrusive solution
long term, offload maintenance of the needed solution to DRF instead of maintaining a custom solution here.

mvandenburgh

LGTM, just left one suggestion to add a clarifying comment explaining the motivation for this reworked pagination as a whole. It's unfortunate that we have to override so much to accomplish this, but it all makes sense to me.

dandiapi/api/views/pagination.py

dandibot · 2024-03-29T21:28:13Z

🚀 PR was released in v0.3.82 🚀

jjnesbitt requested review from waxlamp and mvandenburgh March 27, 2024 20:11

jjnesbitt force-pushed the asset-list-excess-count branch from 0c16cfe to a0b4537 March 27, 2024 20:13

waxlamp reviewed Mar 27, 2024

dandiapi/api/views/pagination.py (outdated)

Only include count in list views on the first page

569e0ef

jjnesbitt force-pushed the asset-list-excess-count branch from a0b4537 to 569e0ef March 27, 2024 23:11

mvandenburgh approved these changes Mar 29, 2024

dandiapi/api/views/pagination.py

Add module docstring to pagination.py

cc661ff

jjnesbitt added the patch (Increment the patch version when merged) and release (Create a release when this pr is merged) labels Mar 29, 2024

jjnesbitt merged commit dafbe3f into master Mar 29, 2024
11 checks passed

jjnesbitt deleted the asset-list-excess-count branch March 29, 2024 21:27

dandibot added the released label (This issue/pull request has been released.) Mar 29, 2024

jjnesbitt mentioned this pull request Apr 5, 2024

dandiset pagination appears broken. #1917

Closed

jjnesbitt mentioned this pull request May 28, 2024

Only use custom pagination class for asset list endpoint #1947

Merged

aaronkanzer mentioned this pull request May 28, 2024

Reference count from cached object_list for lazy paginator #1948

Closed
