Skip to content
This repository has been archived by the owner on Jul 18, 2022. It is now read-only.

Pagination #26

Closed
Tom-Shorter opened this issue Jul 2, 2020 · 12 comments
Closed

Pagination #26

Tom-Shorter opened this issue Jul 2, 2020 · 12 comments
Assignees

Comments

@Tom-Shorter
Copy link

Pagination is a possible query in the .yaml file but there is no matching pagination section within the response, I believe adding in a pagination section in the response which displays the total number of variants found, the current page being displayed and the total number of pages of variants at the specified page limit would be extremely helpful. This would ensure people don't continue requesting pages which don't exist (e.g page 9 of 8 when there are exactly 8 pages of results so no indication that page 9 doesn't exist due to the number of returned results on page 8).

You might also want to clarify the skip integer. I interpret it as if you entered the value 7 then you would expect page 8 to return as you have asked it to skip the first 7. It might be more intuitive to provide the integer for the page you want to return, in this case entering 7 would return page 7. it would also mean that 0 is not a valid input here so the default should be 1.

I would also like to suggest that for the page limit you can use the integer 0 to indicate that you want all results returned, without a page limit but with an exception where a beacon owner could apply a maximum page limit across all queries to all datasets within a beacon or differing limits per dataset, the limits would be detailed in the / endpoint under the beacon if beacon wide or in the individual datasets if different.

@redmitry
Copy link

Note, that there is a standard way to implement it in the HTTP protocol (RFC 7233) which defines partial request/response. Although this method is commonly used by download managers (e.g. response: “ HTTP/1.1 206 Partial Content Content-Range: bytes 0-1023/146515” ), the range may be arbitrary specified for any data. For instance, OpenEBench uses HTTP Range to limit tools collection to be returned (e.g. response: “Range: tools=10-30”). Usage of standard HTTP protocol for the pagination looks more clear and simple for the implementation.

IMO something like "Range: variant=10-30" would be great.

Dmitry

@sdelatorrep
Copy link
Contributor

I think both approaches are compatible as it's similar to what was discussed in #28. WDYT, @jrambla?

@sdelatorrep
Copy link
Contributor

I've created a PR which adds a new field numTotalResults (as the page & limit requested by the user are already present in the meta section, I don't think we need to repeat this in response) and clarifies the usage of skip and limit.
Let me know what you think @Tom-Shorter in the PR, please.
Thanks!

@redmitry
Copy link

All the idea of HTTP protocol usage is to avoid putting this kind of data in the payload.
This was a reason why SOAP protocol failed - it used SOAP Headers (metadata) where HTTP could be reused.
At least for the SOAP excuse was to make it transport protocol agnostic.

sdelatorrep added a commit that referenced this issue Feb 22, 2021
Add numTotalResults and clarify usage of skip & limit (#26)
@Tom-Shorter
Copy link
Author

The implementation of pagination is actually trickier than first imagined, for a single dataset pagination is straight forward but becomes more complex when additional datasets are added.

This is a problem for implementation and not with the specification itself but the implementation problems could be easily overlooked therefore I feel that the specs should highlight the possible problems outlined below.

When querying multiple datasets for variants the current response model aggregates the same variants from different datasets. To enable this aggregation the beacon must know about the existence of every variant in all datasets, therefore each paginated query must return every matching variant across all datasets and from there limit the number of results as requested.

The same principle applies to returning page X of paginated results (through any endpoint), unlike a single dataset where the offset can be provided directly to the query and only the defined subset of results be returned, multiple dataset queries cannot send the offset as part of the query and instead will need to return all results for the beacon to then aggregate them, correctly order them and then identify which subset of results belongs in page X.

A possibly solution is to return all results un-aggregated in dataset/cohort result wrappers. Each wrapper can then have pagination applied to it separately, this would means that a pagination request with a limit of 10 would return 10* no. of datasets/cohorts results instead of 10 results. This seems counter intuitive to me but I believe it could be easily remedied by renaming pagination to resultSetPagination where a resultSet is a wrapper object for a dataset, cohort etc. Responses will then return a list of resultSet objects.

@mbaudis
Copy link
Member

mbaudis commented May 7, 2021

Conclusion sounds good. And I like resultSet as "neutral" wrapper! See #63 and #67.

@mbaudis
Copy link
Member

mbaudis commented May 7, 2021

@Tom-Shorter Do you want to formulate this as an issue - responses delivering resultSets, IMO by default?

@Tom-Shorter
Copy link
Author

@mbaudis I'll be happy to, I've been discussing this idea with Tim Beck, Colin Veal and a few other people in our group and I believe that the resultset response can solve many problems I have with the current Beacon specification.

@mbaudis
Copy link
Member

mbaudis commented May 7, 2021

As in the linked discussions: IMO it would be good to have always one or more resultSet objects in the response. So basically one would introduce another layer instead of now

  • results => array of responseResults (e.g. VariantInSampleResponseResults)

it becomes

  • results => array of resultSet objects (with info about the resultSet, e.g. dataset...) => array of responseResults (e.g. VariantInSampleResponseResults)

@mbaudis
Copy link
Member

mbaudis commented May 7, 2021

One could have this as optional, but IMO this would be confusing.

@jrambla
Copy link
Collaborator

jrambla commented May 12, 2021

Just as a placeholder.
NOTE: review the GA4GH TASC discussion on pagination here.

@jrambla
Copy link
Collaborator

jrambla commented Jul 1, 2021

Moved to Beacon Framework ga4gh-beacon/beacon-framework-v2#9

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants