Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only include total count in the first page of list views #1911

Merged
merged 2 commits into from
Mar 29, 2024

Conversation

jjnesbitt
Copy link
Member

Currently regardless of which page from a paginated endpoint is fetched, we return the count of the entire queryset, which requires a full table scan. This is a problem for endpoints where optimization is key, since regardless of how efficient you structure the database queries, you'll always scan the entire table to get the total number of objects that match query.

This can be wildly inefficient for queries with a large result set. If, for example, you were to list the assets from a dandiset containing 50k assets and page through them 100 at a time, you'd be making 500 requests, and for every request would scan the entire table to count all of these assets, even though you're only fetching 100.

To fix this behavior, this PR introduces a few new pagination related classes. Since the existing implementation of DRF's PageNumberPagination class (and the classes it invokes by default) involves many calls to the queryset count method from one path or another, specific classes and methods are overridden in order to provide pagination without invoking count, unless explicitly done so.

At the moment this new pagination is applied to all paginated views in the archive. If this is undesirable it could easily be applied to only the asset list endpoint, but I'm not a fan of that approach. It's more confusing to have two different behaviors across the archive, and the performance improvement will now benefit all of those endpoints.

The consequences of this change at the API surface are the following:

  1. count will only be returned if page is unspecified or equals 1, and null otherwise
  2. last can no longer be specified as a page number. I don't think anyone was really using this to begin with so that should have no effect.
  3. The page number controls will no longer be shown on the stock DRF page for paginated views, as that requires knowing the full count of the queryset on every page.

As far as I can tell the dandi-cli won't be affected by this change, as it only checks for the checks for the count on the first page.

@mvandenburgh
Copy link
Member

count will only be returned if page is unspecified or equals 1``, and null` otherwise

Is it possible to just exclude count from the response?

@waxlamp
Copy link
Member

waxlamp commented Mar 27, 2024

2. last can no longer be specified as a page number. I don't think anyone was really using this to begin with so that should have no effect.

It could be included in the same instances when count is returned, right?

@jjnesbitt
Copy link
Member Author

Is it possible to just exclude count from the response?

Good question, I forgot to mention why this implementation is necessary. I initially tried to simply exclude count from the response, but the underlying pagination class calls count on the queryset in various places. This happens when calling paginate_queryset, and by the time the response is generated, it's simply using the cached value of count on that page instance.

@jjnesbitt
Copy link
Member Author

  1. last can no longer be specified as a page number. I don't think anyone was really using this to begin with so that should have no effect.

It could be included in the same instances when count is returned, right?

The use case is someone specifying page=last to return the last page, and doesn't really apply to normal page queries. I suppose it's actually not strictly necessary to disallow this, since it won't apply to the pages where we're excluding count, so I could keep it.

@jjnesbitt
Copy link
Member Author

FYI, there are some flaky tests which should be fixed by #1910. I noticed a failure in this PR's CI due to that and re-ran the test, which this time succeeded. This may continue to happen if I update this branch.

Copy link
Member

@waxlamp waxlamp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Relying on @mvandenburgh for a substantive review.

dandiapi/api/views/pagination.py Outdated Show resolved Hide resolved
@yarikoptic
Copy link
Member

NICE digging! Could someone though run this by https://github.com/encode/django-rest-framework/ , e.g. inquire on

  • either observation on the overhead is expected
  • whether DRF has some means to remove that overhead without providing an alternative paginator
  • would they be interested to receive contribution to adopt proposed here paginator?

my points are

  • would be great to run by this discovery/observations with the authors of the framework used here
  • see if they might come up with some less intrusive solution
  • long term -- offload maintenance of a needed solution to DRF instead of maintaining custom solution here.

Copy link
Member

@mvandenburgh mvandenburgh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just left one suggestion to add a clarifying comment explaining the motivation for this reworked pagination as a whole. It's unfortunate that we have to override so much to accomplish this, but it all makes sense to me.

dandiapi/api/views/pagination.py Show resolved Hide resolved
@jjnesbitt jjnesbitt added patch Increment the patch version when merged release Create a release when this pr is merged labels Mar 29, 2024
@jjnesbitt jjnesbitt merged commit dafbe3f into master Mar 29, 2024
11 checks passed
@jjnesbitt jjnesbitt deleted the asset-list-excess-count branch March 29, 2024 21:27
@dandibot
Copy link
Member

🚀 PR was released in v0.3.82 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
patch Increment the patch version when merged release Create a release when this pr is merged released This issue/pull request has been released.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants