From b255fd26284a120f72923bfb45a5d0634a635919 Mon Sep 17 00:00:00 2001
From: Tobias Tengler <45513122+tobias-tengler@users.noreply.github.com>
Date: Mon, 3 May 2021 19:55:44 +0200
Subject: [PATCH] Move types of pagination to bottom and add note

---
 .../hotchocolate/fetching-data/pagination.md | 160 +++++++++---------
 1 file changed, 78 insertions(+), 82 deletions(-)

diff --git a/website/src/docs/hotchocolate/fetching-data/pagination.md b/website/src/docs/hotchocolate/fetching-data/pagination.md
index 16f4f52d8df..43e67da51d8 100644
--- a/website/src/docs/hotchocolate/fetching-data/pagination.md
+++ b/website/src/docs/hotchocolate/fetching-data/pagination.md
@@ -4,96 +4,15 @@ title: "Pagination"
 
 import { ExampleTabs } from "../../../components/mdx/example-tabs";
 
-> This document starts by covering different pagination approaches. If you just want to learn how to implement pagination, head over [here](/docs/hotchocolate/fetching-data/pagination/#connections).
-
 Pagination is one of the most common problems that we have to solve when implementing our backend. Often, sets of data are too large to pass them directly to the consumer of our service.
 
 Pagination solves this problem by giving the consumer the ability to fetch a set in chunks.
 
-There are various ways we could implement pagination in our server, but there are mainly two concepts we find in most GraphQL servers: _offset-based_ and _cursor-based_ pagination.
-
-# Types of pagination
-
-## Offset Pagination
-
-_Offset-based_ pagination is found in many server implementations whether the backend is implemented in SOAP, REST or GraphQL.
-
-It is so common, since it is the simplest form of pagination we can implement. All it requires is an `offset` (start index) and a `limit` (number of entries) argument.
-
-```sql
-SELECT * FROM Users
-ORDER BY Id
-LIMIT %limit OFFSET %offset
-```
-
-### Problems
-
-But whilst _offset-based_ pagination is simple to implement and works relatively well, there are also some problems:
-
-- Using `OFFSET` on the database-side does not scale well for large datasets. Most databases work with an index instead of numbered rows. This means the database always has to count _offset + limit_ rows, before discarding the _offset_ and only returning the requested number of rows.
-
-- If new entries are written to or removed from our database at high frequency, the _offset_ becomes unreliable, potentially skipping or returning duplicate entries.
-
-Luckily we can solve these issues pretty easily by switching from an _offset_ to a _cursor_. Continue reading to learn more.
-
-## Cursor Pagination
-
-Contrary to the _offset-based_ pagination, where we identify the position of an entry using an _offset_, _cursor-based_ pagination works by returning the pointer to the next entry in our pagination.
-
-To understand this concept better, let's look at an example: We want to paginate over the users in our application.
-
-First we execute the following to receive our first page:
-
-```sql
-SELECT * FROM Users
-ORDER BY Id
-LIMIT %limit
-```
-
-`%limit` is actually `limit + 1`. We are doing this to know wether there are more entries in our dataset and to receive the _cursor_ of the next entry (in this case its `Id`). This additional entry will not be returned to the consumer of our pagination.
-
-To now receive the second page, we execute:
-
-```sql
-SELECT * FROM Users
-WHERE Id >= %cursor
-ORDER BY Id
-LIMIT %limit
-```
-
-Using `WHERE` instead of `OFFSET` is great, since now we can leverage the index of the `Id` field and the database does not have to compute an _offset_.
-
-For this to work though, our _cursor_ needs to be **unique** and **sequential**. Most of the time the _Id_ field will be the best fit.
-
-But what if we need to sort by a field that does not have the aforementioned properties? We can simply combine the field with another field, which has the needed properties (like `Id`), to form a _cursor_.
-
-Let's look at another example: We want to paginate over the users sorted by their birthday.
-
-After receiving the first page, we create a combined _cursor_, like `"1435+2020-12-31"` (`Id` + `Birthday`), of the next entry. To receive the second page, we convert the _cursor_ to its original values (`Id` + `Birthday`) and use them in our query:
-
-```sql
-SELECT * FROM Users
-WHERE (Birthday >= %cursorBirthday
-OR (Birthday = %cursorBirthday AND Id >= %cursorId))
-ORDER BY Birthday, Id
-LIMIT %limit
-```
-
-### Problems
-
-Even though _cursor-based_ pagination can be more performant than _offset-based_ pagination, it comes with some downsides as well:
-
-- When using `WHERE` and `ORDER BY` on a field without an index, it can be slower than using `ORDER BY` with `OFFSET`.
-
-- Since we now only know of the next entry, there is no more concept of pages. If we have a feed or only _Next_ and _Previous_ buttons, this works great, but if we depend on page numbers, we are in a tight spot.
-
-In the next segment we will look at _Connections_ and how they are implemented in HotChocolate.
-
 # Connections
 
 _Connections_ are a standardized way to expose pagination to clients.
 
-Instead of returning a list, we now return a _Connection_.
+Instead of returning a list of entries, we return a _Connection_.
 
 ```sdl
 type Query {
@@ -577,3 +496,80 @@ public class Startup
     }
 }
 ```
+
+# Types of pagination
+
+In this section we will look at the most common pagination approaches and their downsides. There are mainly two concepts we find today: _offset-based_ and _cursor-based_ pagination.
+
+> Note: This section is intended as a brief overview and should not be treated as a definitive guide or recommendation.
+
+## Offset Pagination
+
+_Offset-based_ pagination is found in many server implementations whether the backend is implemented in SOAP, REST or GraphQL.
+
+It is so common, since it is the simplest form of pagination we can implement. All it requires is an `offset` (start index) and a `limit` (number of entries) argument.
+
+```sql
+SELECT * FROM Users
+ORDER BY Id
+LIMIT %limit OFFSET %offset
+```
+
+### Problems
+
+But whilst _offset-based_ pagination is simple to implement and works relatively well, there are also some problems:
+
+- Using `OFFSET` on the database-side does not scale well for large datasets. Most databases work with an index instead of numbered rows. This means the database always has to count _offset + limit_ rows, before discarding the _offset_ and only returning the requested number of rows.
+
+- If new entries are written to or removed from our database at high frequency, the _offset_ becomes unreliable, potentially skipping or returning duplicate entries.
+
+## Cursor Pagination
+
+Contrary to the _offset-based_ pagination, where we identify the position of an entry using an _offset_, _cursor-based_ pagination works by returning the pointer to the next entry in our pagination.
+
+To understand this concept better, let's look at an example: We want to paginate over the users in our application.
+
+First we execute the following to receive our first page:
+
+```sql
+SELECT * FROM Users
+ORDER BY Id
+LIMIT %limit
+```
+
+`%limit` is actually `limit + 1`. We are doing this to know whether there are more entries in our dataset and to receive the _cursor_ of the next entry (in this case its `Id`). This additional entry will not be returned to the consumer of our pagination.
+
+To now receive the second page, we execute:
+
+```sql
+SELECT * FROM Users
+WHERE Id >= %cursor
+ORDER BY Id
+LIMIT %limit
+```
+
+Using `WHERE` instead of `OFFSET` is great, since now we can leverage the index of the `Id` field and the database does not have to compute an _offset_.
+
+For this to work though, our _cursor_ needs to be **unique** and **sequential**. Most of the time the _Id_ field will be the best fit.
+
+But what if we need to sort by a field that does not have the aforementioned properties? We can simply combine the field with another field, which has the needed properties (like `Id`), to form a _cursor_.
+
+Let's look at another example: We want to paginate over the users sorted by their birthday.
+
+After receiving the first page, we create a combined _cursor_, like `"1435+2020-12-31"` (`Id` + `Birthday`), of the next entry. To receive the second page, we convert the _cursor_ to its original values (`Id` + `Birthday`) and use them in our query:
+
+```sql
+SELECT * FROM Users
+WHERE (Birthday > %cursorBirthday
+OR (Birthday = %cursorBirthday AND Id >= %cursorId))
+ORDER BY Birthday, Id
+LIMIT %limit
+```
+
+### Problems
+
+Even though _cursor-based_ pagination can be more performant than _offset-based_ pagination, it comes with some downsides as well:
+
+- When using `WHERE` and `ORDER BY` on a field without an index, it can be slower than using `ORDER BY` with `OFFSET`.
+
+- Since we now only know of the next entry, there is no more concept of pages. If we have a feed or only _Next_ and _Previous_ buttons, this works great, but if we depend on page numbers, we are in a tight spot.
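
Review note: the cursor walk described in the added section can be tried end to end with a small script. The following is only a sketch under assumptions, not how HotChocolate implements paging. Python, the in-memory SQLite database, the sample rows and the helper name `fetch_page` are all hypothetical; only the `Users` table with its `Id` and `Birthday` columns comes from the documented queries. It fetches `limit + 1` rows, uses the extra row as the next cursor, and compares `Birthday` strictly so that the `Id` tie-break takes effect.

```python
import sqlite3


def fetch_page(conn, limit, cursor=None):
    """Return (rows, next_cursor) for Users ordered by (Birthday, Id).

    `cursor` is the (Birthday, Id) pair of the first row of the requested
    page, or None for the first page. `fetch_page` is a hypothetical helper.
    """
    where = ""
    params = []
    if cursor is not None:
        birthday, user_id = cursor
        # Strict comparison on Birthday; Id breaks ties within one birthday.
        where = "WHERE Birthday > ? OR (Birthday = ? AND Id >= ?)"
        params = [birthday, birthday, user_id]
    sql = f"SELECT Id, Birthday FROM Users {where} ORDER BY Birthday, Id LIMIT ?"
    # Ask for one row more than requested to learn whether a next page exists.
    rows = conn.execute(sql, (*params, limit + 1)).fetchall()
    next_cursor = None
    if len(rows) > limit:
        extra_id, extra_birthday = rows.pop()  # not returned to the consumer
        next_cursor = (extra_birthday, extra_id)
    return rows, next_cursor


# Assumed sample data; ISO date strings sort correctly as text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (Id INTEGER PRIMARY KEY, Birthday TEXT)")
conn.executemany(
    "INSERT INTO Users VALUES (?, ?)",
    [
        (1, "2020-12-31"),
        (2, "2020-01-15"),
        (3, "2020-12-31"),
        (4, "2019-06-01"),
        (5, "2020-12-31"),
    ],
)

# Walk all pages with a page size of 2, collecting the Ids per page.
pages = []
cursor = None
while True:
    rows, cursor = fetch_page(conn, limit=2, cursor=cursor)
    pages.append([row[0] for row in rows])
    if cursor is None:
        break
```

With these five sample rows, `pages` ends up as `[[4, 2], [1, 3], [5]]`: every user appears exactly once, even though three of them share a birthday that spans a page boundary.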