-
Notifications
You must be signed in to change notification settings - Fork 11.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[4/n][gql-performance] Indexer handles partitioning per epoch's worth of tx_sequence_number #18224
Merged
wlmyng
merged 3 commits into
idx-breaking-change-park
from
wlmyng/indexer/scan-limit-tx-approach
Jun 18, 2024
Merged
[4/n][gql-performance] Indexer handles partitioning per epoch's worth of tx_sequence_number #18224
wlmyng
merged 3 commits into
idx-breaking-change-park
from
wlmyng/indexer/scan-limit-tx-approach
Jun 18, 2024
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
The latest updates on your projects. Learn more about Vercel for Git ↗︎
3 Ignored Deployments
|
wlmyng
force-pushed
the
wlmyng/indexer/scan-limit-tx-approach
branch
from
June 12, 2024 20:45
33a826e
to
c373fad
Compare
wlmyng
changed the base branch from
main
to
wlmyng/indexer/scan-limit-tx-approach-update-indexing
June 13, 2024 20:02
wlmyng
force-pushed
the
wlmyng/indexer/scan-limit-tx-approach
branch
from
June 13, 2024 20:03
c0a711c
to
8666105
Compare
wlmyng
force-pushed
the
wlmyng/indexer/scan-limit-tx-approach-update-indexing
branch
2 times, most recently
from
June 13, 2024 21:10
fdb9a42
to
537c908
Compare
wlmyng
force-pushed
the
wlmyng/indexer/scan-limit-tx-approach
branch
from
June 13, 2024 21:17
89172bd
to
734e467
Compare
wlmyng
changed the base branch from
wlmyng/indexer/scan-limit-tx-approach-update-indexing
to
wlmyng/indexer/scan-limit-tx-approach-add-cp-tx-table
June 13, 2024 21:17
wlmyng
changed the title
Wlmyng/indexer/scan limit tx approach
[4/n][gql-performance] Indexer handle different partitioning strategies
Jun 13, 2024
wlmyng
requested review from
amnn,
stefan-mysten,
emmazzz and
suiwombat
as code owners
June 13, 2024 21:35
wlmyng
force-pushed
the
wlmyng/indexer/scan-limit-tx-approach
branch
from
June 13, 2024 21:55
fe0f0ba
to
cd3b53c
Compare
wlmyng
changed the title
[4/n][gql-performance] Indexer handle different partitioning strategies
[4/n][gql-performance] Indexer handles different partitioning strategies
Jun 13, 2024
gegaowp
reviewed
Jun 14, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good
wlmyng
force-pushed
the
wlmyng/indexer/scan-limit-tx-approach-add-cp-tx-table
branch
2 times, most recently
from
June 14, 2024 18:20
c6644a0
to
2c93f86
Compare
wlmyng
force-pushed
the
wlmyng/indexer/scan-limit-tx-approach
branch
from
June 14, 2024 18:24
cd3b53c
to
ca5f573
Compare
wlmyng
force-pushed
the
wlmyng/indexer/scan-limit-tx-approach-add-cp-tx-table
branch
from
June 17, 2024 23:04
291bdd8
to
0de380b
Compare
wlmyng
force-pushed
the
wlmyng/indexer/scan-limit-tx-approach
branch
from
June 17, 2024 23:10
1bb3397
to
0950b69
Compare
wlmyng
force-pushed
the
wlmyng/indexer/scan-limit-tx-approach
branch
from
June 17, 2024 23:20
0950b69
to
ad3c136
Compare
7 tasks
wlmyng
added a commit
that referenced
this pull request
Jun 18, 2024
…iles for pg-backed indexer per transactions benchmark findings (#18231) ## Description Apply findings from latest transactions benchmarking. Instead of keeping indexes on `transactions`, we offload all filtering work to the lookup tables. Additionally, we add `tx_calls_pkg` and `tx_calls_mod` and dedicate `tx_calls_fun` to solely handle filters on `package`, `package` and `module`, and `package`, `module`, and `func` respectively. This is because it's possible to invoke the same package, mod, and even func in a transaction. Finally, the new setup drops `cp_sequence_number` from the lookup tables and adds `sender` to them to support `sender` + specialized filter. The Rust structs and code to index data to the lookup tables are also updated. The full stack of changes are listed below, but each can stand independently, and will be individually merged into `idx-breaking-change-park` 1. #18224 - handling tables partitioned on epoch's worth of tx_sequence_number 2. #18244 - storing a mapping of checkpoint -> tx_sequence_number range ## Test plan --- ## Release notes Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required. For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates. - [ ] Protocol: - [ ] Nodes (Validators and Full nodes): - [ ] Indexer: - [ ] JSON-RPC: - [ ] GraphQL: - [ ] CLI: - [ ] Rust SDK:
wlmyng
force-pushed
the
wlmyng/indexer/scan-limit-tx-approach
branch
from
June 18, 2024 00:18
ad3c136
to
885d8e8
Compare
wlmyng
changed the base branch from
wlmyng/indexer/scan-limit-tx-approach-add-cp-tx-table
to
idx-breaking-change-park
June 18, 2024 00:18
by exposing an additional parameter network_total_transactions, we can derive the tx_sequence_number range of an epoch without additional fetches. however, this can be done more cleanly if we can decouple watermarking from checkpoint persisting
wlmyng
force-pushed
the
wlmyng/indexer/scan-limit-tx-approach
branch
from
June 18, 2024 00:22
885d8e8
to
b8b0fc0
Compare
wlmyng
changed the title
[4/n][gql-performance] Indexer handles different partitioning strategies
[4/n][gql-performance] Indexer handles partitioning per epoch's worth of tx_sequence_number
Jun 18, 2024
…- 2 from the db on the indexing side as we advance epoch from epoch X - 1 to X, we may not have committed this data of X - 2 on the committer side. This is unlikely to happen in prod because it requires the lag between indexing and committing to be >= one epoch for the issue to manifest. And we never caught this issue before in testing because we never had to use this network total txn data before.
… into the transaction block instead of making a separate roundtrip
wlmyng
force-pushed
the
wlmyng/indexer/scan-limit-tx-approach
branch
from
June 18, 2024 19:04
833eb0d
to
b371df2
Compare
emmazzz
pushed a commit
that referenced
this pull request
Jun 20, 2024
…iles for pg-backed indexer per transactions benchmark findings (#18231) Apply findings from latest transactions benchmarking. Instead of keeping indexes on `transactions`, we offload all filtering work to the lookup tables. Additionally, we add `tx_calls_pkg` and `tx_calls_mod` and dedicate `tx_calls_fun` to solely handle filters on `package`, `package` and `module`, and `package`, `module`, and `func` respectively. This is because it's possible to invoke the same package, mod, and even func in a transaction. Finally, the new setup drops `cp_sequence_number` from the lookup tables and adds `sender` to them to support `sender` + specialized filter. The Rust structs and code to index data to the lookup tables are also updated. The full stack of changes are listed below, but each can stand independently, and will be individually merged into `idx-breaking-change-park` 1. #18224 - handling tables partitioned on epoch's worth of tx_sequence_number 2. #18244 - storing a mapping of checkpoint -> tx_sequence_number range --- Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required. For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates. - [ ] Protocol: - [ ] Nodes (Validators and Full nodes): - [ ] Indexer: - [ ] JSON-RPC: - [ ] GraphQL: - [ ] CLI: - [ ] Rust SDK:
emmazzz
pushed a commit
that referenced
this pull request
Jun 20, 2024
… of tx_sequence_number (#18224) ## Description To partition an epoch per `tx_sequence_number`, the last `tx_sequence_number` of an epoch is the `network_total_transactions` from the last checkpoint in the current epoch. The first `tx_sequence_number` is then the `network_total_transactions` from the last checkpoint in the previous epoch + 1, or the `network_total_transactions` of the current epoch - `epoch_total_transactions` of the current epoch. Since we have `network_total_transactions` from the checkpoint to commit, and `epoch_total_transactions` since we fetch current epoch from db, we can derive the range without additional db fetches. However, this approach uncovered a bug of sorts: when we read the network total txn of epoch X - 2 from the db on the indexing side, we `select max(network_total_transactions) from checkpoints where epoch = {epoch}`. However, we may not have actually committed epoch X-2's data on the committer side. This is unlikely to happen in prod because it requires the lag between indexing and committing to be >= one epoch. Since an epoch today consists of ~80k checkpoints, it is quite improbably for this to occur. However, in many of our tests it's possible to have as little as one checkpoint and/ or transaction between epochs, which brought this race condition to light. To resolve, we can place the responsibility on `persist_epoch`. When we get to a point to persist epoch X - 1, epoch X - 2's data must have been indexed to db. ## Test plan Existing tests --- ## Release notes Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required. For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates. - [ ] Protocol: - [ ] Nodes (Validators and Full nodes): - [ ] Indexer: - [ ] JSON-RPC: - [ ] GraphQL: - [ ] CLI: - [ ] Rust SDK:
emmazzz
pushed a commit
that referenced
this pull request
Jun 20, 2024
…iles for pg-backed indexer per transactions benchmark findings (#18231) Apply findings from latest transactions benchmarking. Instead of keeping indexes on `transactions`, we offload all filtering work to the lookup tables. Additionally, we add `tx_calls_pkg` and `tx_calls_mod` and dedicate `tx_calls_fun` to solely handle filters on `package`, `package` and `module`, and `package`, `module`, and `func` respectively. This is because it's possible to invoke the same package, mod, and even func in a transaction. Finally, the new setup drops `cp_sequence_number` from the lookup tables and adds `sender` to them to support `sender` + specialized filter. The Rust structs and code to index data to the lookup tables are also updated. The full stack of changes are listed below, but each can stand independently, and will be individually merged into `idx-breaking-change-park` 1. #18224 - handling tables partitioned on epoch's worth of tx_sequence_number 2. #18244 - storing a mapping of checkpoint -> tx_sequence_number range --- Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required. For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates. - [ ] Protocol: - [ ] Nodes (Validators and Full nodes): - [ ] Indexer: - [ ] JSON-RPC: - [ ] GraphQL: - [ ] CLI: - [ ] Rust SDK:
emmazzz
pushed a commit
that referenced
this pull request
Jun 22, 2024
…iles for pg-backed indexer per transactions benchmark findings (#18231) Apply findings from latest transactions benchmarking. Instead of keeping indexes on `transactions`, we offload all filtering work to the lookup tables. Additionally, we add `tx_calls_pkg` and `tx_calls_mod` and dedicate `tx_calls_fun` to solely handle filters on `package`, `package` and `module`, and `package`, `module`, and `func` respectively. This is because it's possible to invoke the same package, mod, and even func in a transaction. Finally, the new setup drops `cp_sequence_number` from the lookup tables and adds `sender` to them to support `sender` + specialized filter. The Rust structs and code to index data to the lookup tables are also updated. The full stack of changes are listed below, but each can stand independently, and will be individually merged into `idx-breaking-change-park` 1. #18224 - handling tables partitioned on epoch's worth of tx_sequence_number 2. #18244 - storing a mapping of checkpoint -> tx_sequence_number range --- Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required. For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates. - [ ] Protocol: - [ ] Nodes (Validators and Full nodes): - [ ] Indexer: - [ ] JSON-RPC: - [ ] GraphQL: - [ ] CLI: - [ ] Rust SDK:
emmazzz
pushed a commit
that referenced
this pull request
Aug 3, 2024
…iles for pg-backed indexer per transactions benchmark findings (#18231) Apply findings from latest transactions benchmarking. Instead of keeping indexes on `transactions`, we offload all filtering work to the lookup tables. Additionally, we add `tx_calls_pkg` and `tx_calls_mod` and dedicate `tx_calls_fun` to solely handle filters on `package`, `package` and `module`, and `package`, `module`, and `func` respectively. This is because it's possible to invoke the same package, mod, and even func in a transaction. Finally, the new setup drops `cp_sequence_number` from the lookup tables and adds `sender` to them to support `sender` + specialized filter. The Rust structs and code to index data to the lookup tables are also updated. The full stack of changes are listed below, but each can stand independently, and will be individually merged into `idx-breaking-change-park` 1. #18224 - handling tables partitioned on epoch's worth of tx_sequence_number 2. #18244 - storing a mapping of checkpoint -> tx_sequence_number range --- Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required. For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates. - [ ] Protocol: - [ ] Nodes (Validators and Full nodes): - [ ] Indexer: - [ ] JSON-RPC: - [ ] GraphQL: - [ ] CLI: - [ ] Rust SDK:
emmazzz
pushed a commit
that referenced
this pull request
Aug 3, 2024
… of tx_sequence_number (#18224) To partition an epoch per `tx_sequence_number`, the last `tx_sequence_number` of an epoch is the `network_total_transactions` from the last checkpoint in the current epoch. The first `tx_sequence_number` is then the `network_total_transactions` from the last checkpoint in the previous epoch + 1, or the `network_total_transactions` of the current epoch - `epoch_total_transactions` of the current epoch. Since we have `network_total_transactions` from the checkpoint to commit, and `epoch_total_transactions` since we fetch current epoch from db, we can derive the range without additional db fetches. However, this approach uncovered a bug of sorts: when we read the network total txn of epoch X - 2 from the db on the indexing side, we `select max(network_total_transactions) from checkpoints where epoch = {epoch}`. However, we may not have actually committed epoch X-2's data on the committer side. This is unlikely to happen in prod because it requires the lag between indexing and committing to be >= one epoch. Since an epoch today consists of ~80k checkpoints, it is quite improbably for this to occur. However, in many of our tests it's possible to have as little as one checkpoint and/ or transaction between epochs, which brought this race condition to light. To resolve, we can place the responsibility on `persist_epoch`. When we get to a point to persist epoch X - 1, epoch X - 2's data must have been indexed to db. Existing tests --- Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required. For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates. - [ ] Protocol: - [ ] Nodes (Validators and Full nodes): - [ ] Indexer: - [ ] JSON-RPC: - [ ] GraphQL: - [ ] CLI: - [ ] Rust SDK:
wlmyng
added a commit
that referenced
this pull request
Aug 15, 2024
…iles for pg-backed indexer per transactions benchmark findings (#18231) Apply findings from latest transactions benchmarking. Instead of keeping indexes on `transactions`, we offload all filtering work to the lookup tables. Additionally, we add `tx_calls_pkg` and `tx_calls_mod` and dedicate `tx_calls_fun` to solely handle filters on `package`, `package` and `module`, and `package`, `module`, and `func` respectively. This is because it's possible to invoke the same package, mod, and even func in a transaction. Finally, the new setup drops `cp_sequence_number` from the lookup tables and adds `sender` to them to support `sender` + specialized filter. The Rust structs and code to index data to the lookup tables are also updated. The full stack of changes are listed below, but each can stand independently, and will be individually merged into `idx-breaking-change-park` 1. #18224 - handling tables partitioned on epoch's worth of tx_sequence_number 2. #18244 - storing a mapping of checkpoint -> tx_sequence_number range --- Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required. For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates. - [ ] Protocol: - [ ] Nodes (Validators and Full nodes): - [ ] Indexer: - [ ] JSON-RPC: - [ ] GraphQL: - [ ] CLI: - [ ] Rust SDK:
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description
To partition an epoch per
tx_sequence_number
, the lasttx_sequence_number
of an epoch is thenetwork_total_transactions
from the last checkpoint in the current epoch. The firsttx_sequence_number
is then thenetwork_total_transactions
from the last checkpoint in the previous epoch + 1, or thenetwork_total_transactions
of the current epoch -epoch_total_transactions
of the current epoch.Since we have
network_total_transactions
from the checkpoint to commit, andepoch_total_transactions
since we fetch current epoch from db, we can derive the range without additional db fetches.However, this approach uncovered a bug of sorts: when we read the network total txn of epoch X - 2 from the db on the indexing side, we
select max(network_total_transactions) from checkpoints where epoch = {epoch}
. However, we may not have actually committed epoch X-2's data on the committer side. This is unlikely to happen in prod because it requires the lag between indexing and committing to be >= one epoch. Since an epoch today consists of ~80k checkpoints, it is quite improbably for this to occur. However, in many of our tests it's possible to have as little as one checkpoint and/ or transaction between epochs, which brought this race condition to light.To resolve, we can place the responsibility on
persist_epoch
. When we get to a point to persist epoch X - 1, epoch X - 2's data must have been indexed to db.Test plan
Existing tests
Release notes
Check each box that your changes affect. If none of the boxes relate to your changes, release notes aren't required.
For each box you select, include information after the relevant heading that describes the impact of your changes that a user might notice and any actions they must take to implement updates.