This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

The API result_count is no more than 240 for unauthenticated requests #4474

Closed
obulat opened this issue Jun 12, 2024 · 1 comment
Labels
💻 aspect: code Concerns the software code in the repository 🛠 goal: fix Bug fix 🟧 priority: high Stalls work on the project or its dependents 🧱 stack: api Related to the Django API 🔒 staff only Restricted to staff members 🧹 status: ticket work required Needs more details before it can be worked on

Comments

@obulat
Contributor

obulat commented Jun 12, 2024

Description

The maximum result_count returned by the API for unauthenticated requests is now 240, instead of 10,000 as it was previously.

Reproduction

  1. Go to https://api.openverse.org/v1/images/?q=cat
  2. See the error: result_count is 240 instead of the expected 10000.

Additional context

@sarayourfriend, this was added in your PR, #4372:

_, max_depth = restricted_features.MAX_RESULT_COUNT.request_level(

I think this was unintentional, because we never discussed reducing the result_count shown in API responses. It is tricky, since both 240 and 10000 are confusing: an unauthenticated user will get at most 240 results. However, I think we always wanted to show that we do have more results, but are not showing all of them because of restrictions related to API performance (to prevent scraping).
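To make the difference between the two behaviors concrete, here is a minimal sketch of how a capped result_count, and a page_count derived from it, could be computed. The function name `pagination_summary` and the page size of 20 are assumptions for illustration, not Openverse's actual implementation; only the 240 and 10,000 figures come from this issue.

```python
import math

# 240 (unauthenticated cap) and 10,000 (previous reported maximum) come from
# this issue; the page size of 20 is an assumption for illustration.
UNAUTHENTICATED_MAX_RESULTS = 240
LEGACY_REPORTED_MAX = 10_000
PAGE_SIZE = 20


def pagination_summary(total_hits: int, max_results: int, page_size: int = PAGE_SIZE) -> dict:
    """Return the result_count/page_count pair a client would see.

    result_count is capped at the number of results the requester may page
    through, and page_count is derived from that capped value.
    """
    result_count = min(total_hits, max_results)
    page_count = math.ceil(result_count / page_size)
    return {"result_count": result_count, "page_count": page_count}
```

With 50,000 matching documents, the 240 cap yields result_count=240 and page_count=12 under the assumed page size, whereas reporting 10,000 yields page_count=500, most of which an unauthenticated client cannot actually request.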

@obulat obulat added 🟧 priority: high Stalls work on the project or its dependents 🛠 goal: fix Bug fix 💻 aspect: code Concerns the software code in the repository 🧱 stack: api Related to the Django API labels Jun 12, 2024
@openverse-bot openverse-bot moved this to 📋 Backlog in Openverse Backlog Jun 12, 2024
@sarayourfriend
Collaborator

This was intentional. We have other places to find stats about how many results we have, so why expose a different, non-specific number to a user? 10000 is even more obscure: it doesn't tell the user anything other than that we have a bunch of works they can't access. For a scraper, it might even be a hint to crawl the tags of each work, or something similar, to uncover all the extra works behind the pagination barrier. It was even worse because we also showed a page_count matching the useless 10000 results. If we showed 10k and a page count based on that, someone using the API programmatically could only learn that a query was exhausted by making request after request until the API suddenly decided they weren't allowed anymore and sent them a 401. That's absurd. Why not just state the limit? It is the real limit, for that user, on that instance.

Both are artificial barriers, but 240, with an accurate page count based on it, at least indicates how many real pages the user could request. It means something to API consumers: they can predict how many pages of results will exist for a query (e.g., for a frontend that wants to show this information... maybe even ours?).
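The practical difference for API consumers can be sketched as a client that trusts page_count to know when to stop paginating. `iter_all_results` and `make_fake_fetch` are hypothetical helpers; the fake stands in for HTTP calls to an endpoint like /v1/images/?q=cat and, as an assumption, raises PermissionError to mimic the 401 described above for pages beyond the allowed depth. The page size of 20 is also assumed.

```python
import math
from typing import Callable, Iterator, Optional

# A fetcher takes a 1-based page number and returns the parsed JSON body.
FetchPage = Callable[[int], dict]


def iter_all_results(fetch_page: FetchPage) -> Iterator[dict]:
    """Yield every result, stopping at the page_count reported by the API."""
    page = 1
    page_count = 1  # provisional; replaced by the first response's page_count
    while page <= page_count:
        body = fetch_page(page)
        page_count = body["page_count"]
        yield from body["results"]
        page += 1


def make_fake_fetch(
    allowed_results: int,
    reported_results: Optional[int] = None,
    page_size: int = 20,  # assumed page size, for illustration only
) -> FetchPage:
    """Hypothetical in-memory stand-in for the search endpoint.

    allowed_results is how many results the caller may actually page
    through; reported_results is what the response claims (defaults to
    the same value). Pages past the allowed depth raise PermissionError,
    mimicking the 401 response.
    """
    reported = allowed_results if reported_results is None else reported_results
    allowed_pages = math.ceil(allowed_results / page_size)
    reported_pages = math.ceil(reported / page_size)

    def fetch_page(page: int) -> dict:
        # Page 1 is always served, even when there are no results.
        if page > max(allowed_pages, 1):
            raise PermissionError("401: page is beyond the allowed depth")
        start = (page - 1) * page_size
        stop = min(start + page_size, allowed_results)
        return {
            "result_count": reported,
            "page_count": reported_pages,
            "results": [{"id": i} for i in range(start, stop)],
        }

    return fetch_page
```

With `make_fake_fetch(240)` the loop stops cleanly after 12 pages. With `make_fake_fetch(240, reported_results=10_000)` the advertised page_count is 500, so the client only discovers the real limit when its 13th request fails.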

10000 doesn't do that. It is just as abstract and artificial as 240, and essentially an arbitrary limit (each addresses a different problem). Of the two, 240 (or a different value, if authenticated) is the only one with any real meaning.

I don't believe this is an issue and recommend closing it.

we are not showing all of them due to the restrictions related to the API performance (to prevent scraping)

To clarify, these are separate issues. Scraping can hurt API performance, but the primary reason we prevent scraping is that it is against our ToS. For example, we would not undo this pagination limit just because we could handle its performance impact.

@sarayourfriend sarayourfriend added 🧹 status: ticket work required Needs more details before it can be worked on 🔒 staff only Restricted to staff members labels Jun 12, 2024
@WordPress WordPress locked and limited conversation to collaborators Jun 12, 2024
@obulat obulat converted this issue into discussion #4476 Jun 12, 2024
@openverse-bot openverse-bot moved this from 📋 Backlog to ✅ Done in Openverse Backlog Jun 12, 2024
Projects
Archived in project