docs(graphql): explain scan limits rationale

MystenLabs · Sep 13, 2024 · c6f728b
1 parent d16c114
commit c6f728b
Showing 1 changed file with 49 additions and 0 deletions.
diff --git a/crates/sui-graphql-rpc/src/types/transaction_block/tx_lookups.rs b/crates/sui-graphql-rpc/src/types/transaction_block/tx_lookups.rs
@@ -1,6 +1,55 @@
 // Copyright (c) Mysten Labs, Inc.
 // SPDX-License-Identifier: Apache-2.0
 
+//! # Transaction Filter Lookup Tables
+//!
+//! ## Schemas
+//!
+//! Tables backing Transaction filters in GraphQL all follow the same rough shape:
+//!
+//! 1. Each filter gets its own table, mapping the filter value to the transaction sequence number.
+//!
+//! 2. They also include a `sender` column, and a secondary index over the sender, filter values
+//!    and the transaction sequence number.
+//!
+//! 3. They also include a secondary index over the transaction sequence number.
+//!
+//! This pattern allows us to offer a simple rule for users: If you are filtering on a single
+//! value, you can do so without worrying. Additionally filtering by the sender is also possible.
+//! If you want to combine any other set of filters, however, you need to use a "scan limit"
+//! (described under "Scan limits" below).
+//!
+//! ## Query construction
+//!
+//! Queries that filter transactions work in two phases: Identify the transaction sequence numbers
+//! to fetch, and then fetch their contents. All filtering happens in the first phase:
+//!
+//! - First, filters are broken down into individual queries targeting the appropriate lookup
+//!   table. Each constituent query is expected to return a sorted run of transaction sequence
+//!   numbers.
+//!
+//! - If a `sender` filter is included, then it is incorporated into each constituent query,
+//!   leveraging their secondary indices (2). Otherwise, each constituent query filters based only
+//!   on its filter value, using the primary index (1).
+//!
+//! - Both the primary and secondary indices contain the transaction sequence number, which helps
+//!   ensure that the output from an index scan is already sorted, avoiding a potentially
+//!   expensive materialize and sort operation.
+//!
+//! - If there are multiple constituent queries, they are intersected using inner joins. Postgres
+//!   can occasionally pick a poor query plan for this merge, so we require that filters resulting in
+//!   such merges also use a "scan limit" (see below).
+//!
+//! ## Scan limits
+//!
+//! The scan limit restricts the number of transactions considered as candidates for the results.
+//! It is analogous to the page size limit, which restricts the number of results returned to the
+//! user, but it operates at the top of the funnel rather than the bottom.
+//!
+//! When Postgres picks a poor query plan, it can end up performing a sequential scan over all
+//! candidate transactions. By limiting the size of the candidate set, we bound the work done in
+//! the worst case (whereas otherwise, the worst case would grow with the history of the chain).
+
 use super::{Cursor, TransactionBlockFilter};
 use crate::{
     data::{pg::bytea_literal, Conn, DbConnection},