Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(stages): recover senders for both DB and static file transactions #6089

Merged
merged 15 commits into from
Jan 31, 2024

Conversation

shekhirin
Copy link
Collaborator

@shekhirin shekhirin commented Jan 16, 2024

With this PR, when recovering senders in the SenderRecovery pipeline stage, we will look into both database table Transactions and static files for Transactions segment. It becomes relevant when #5733 is merged.

One thing to note with the current implementation is that we no longer retrieve RawKey and RawValue from the database when querying transactions for sender recovery, but instead decode them before passing the transactions to threads for recovery. The bench showed there's no significant difference in performance between the two approaches.

Holesky benchmark (main vs this PR):

ubuntu@reth4:~/reth-snaps-alexey$ hyperfine './reth_main stage --chain holesky run --datadir ~/.local/share/reth/holesky --from 0 --to 808004 senders'
Benchmark 1: ./reth_main stage --chain holesky run --datadir ~/.local/share/reth/holesky --from 0 --to 808004 senders
  Time (mean ± σ):     48.451 s ±  0.335 s    [User: 1696.042 s, System: 6.879 s]
  Range (min … max):   47.847 s … 48.946 s    10 runs

ubuntu@reth4:~/reth-snaps-alexey$ hyperfine './reth_6089_raw stage --chain holesky run --datadir ~/.local/share/reth/holesky --from 0 --to 808004 senders'
Benchmark 1: ./reth_6089_raw stage --chain holesky run --datadir ~/.local/share/reth/holesky --from 0 --to 808004 senders
  Time (mean ± σ):     52.559 s ±  0.146 s    [User: 1762.202 s, System: 8.185 s]
  Range (min … max):   52.248 s … 52.708 s    10 runs

In addition to that, we need to keep in mind that now transactions, headers, and receipts can be in static files, so when calculating the stage checkpoint, we need to look in both database and static files. I introduced a trait StatsReader for that, and replaced all calls to tx.entries::<T>() with it.

@shekhirin shekhirin added C-enhancement New feature or request A-staged-sync Related to staged sync (pipelines and stages) A-static-files Related to static files labels Jan 18, 2024
@shekhirin shekhirin added the A-db Related to the database label Jan 18, 2024
@shekhirin shekhirin marked this pull request as ready for review January 18, 2024 13:40
@shekhirin shekhirin marked this pull request as draft January 18, 2024 13:40
@shekhirin shekhirin marked this pull request as ready for review January 18, 2024 13:42
@shekhirin
Copy link
Collaborator Author

shekhirin commented Jan 25, 2024

Okay, it performs worse... I need to return the raw query functionality

ubuntu@reth4:~/reth-snaps-alexey$ hyperfine './reth_main stage --chain holesky run --datadir ~/.local/share/reth/holesky --from 0 --to 808004 senders'
Benchmark 1: ./reth_main stage --chain holesky run --datadir ~/.local/share/reth/holesky --from 0 --to 808004 senders
  Time (mean ± σ):     48.451 s ±  0.335 s    [User: 1696.042 s, System: 6.879 s]
  Range (min … max):   47.847 s … 48.946 s    10 runs

ubuntu@reth4:~/reth-snaps-alexey$ hyperfine './reth_6089 stage --chain holesky run --datadir ~/.local/share/reth/holesky --from 0 --to 808004 senders'
Benchmark 1: ./reth_6089 stage --chain holesky run --datadir ~/.local/share/reth/holesky --from 0 --to 808004 senders
  Time (mean ± σ):     64.176 s ±  0.451 s    [User: 1700.714 s, System: 15.671 s]
  Range (min … max):   63.374 s … 64.920 s    10 runs

@shekhirin shekhirin force-pushed the alexey/sender-recovery-snapshot branch from ed39c9a to e9e217d Compare January 25, 2024 16:23
@shekhirin
Copy link
Collaborator Author

shekhirin commented Jan 25, 2024

Raw query looks good

ubuntu@reth4:~/reth-snaps-alexey$ hyperfine './reth_main stage --chain holesky run --datadir ~/.local/share/reth/holesky --from 0 --to 808004 senders'
Benchmark 1: ./reth_main stage --chain holesky run --datadir ~/.local/share/reth/holesky --from 0 --to 808004 senders
  Time (mean ± σ):     48.451 s ±  0.335 s    [User: 1696.042 s, System: 6.879 s]
  Range (min … max):   47.847 s … 48.946 s    10 runs

ubuntu@reth4:~/reth-snaps-alexey$ hyperfine './reth_6089_raw stage --chain holesky run --datadir ~/.local/share/reth/holesky --from 0 --to 808004 senders'
Benchmark 1: ./reth_6089_raw stage --chain holesky run --datadir ~/.local/share/reth/holesky --from 0 --to 808004 senders
  Time (mean ± σ):     52.559 s ±  0.146 s    [User: 1762.202 s, System: 8.185 s]
  Range (min … max):   52.248 s … 52.708 s    10 runs

@shekhirin shekhirin changed the base branch from main to feat/static-files January 25, 2024 17:39
@shekhirin shekhirin changed the base branch from feat/static-files to main January 25, 2024 17:40
@shekhirin shekhirin force-pushed the alexey/sender-recovery-snapshot branch from 7fd3598 to f394358 Compare January 25, 2024 17:43
@shekhirin shekhirin changed the base branch from main to feat/static-files January 25, 2024 17:43
@shekhirin shekhirin changed the base branch from feat/static-files to main January 25, 2024 17:43
@shekhirin shekhirin force-pushed the alexey/sender-recovery-snapshot branch from f394358 to 8394742 Compare January 25, 2024 17:44
@shekhirin shekhirin force-pushed the alexey/sender-recovery-snapshot branch from 8394742 to 019d1bf Compare January 25, 2024 17:45
@shekhirin shekhirin changed the base branch from main to feat/static-files January 25, 2024 17:49
Copy link
Member

@onbjerg onbjerg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

setup::unwind_hashes,
stage,
1..DEFAULT_NUM_BLOCKS,
1..=DEFAULT_NUM_BLOCKS,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was this a bug before?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or well "bug", i guess this is in a benchmark, so doesn't rly matter

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the behavior didn't change: even though we were passing an exclusive range, we constructed StageRange struct with range.start and range.end

@shekhirin shekhirin force-pushed the alexey/sender-recovery-snapshot branch from 7d6bb56 to 68c5296 Compare January 26, 2024 13:14
Copy link
Member

@onbjerg onbjerg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Copy link
Member

@emhane emhane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that you don't constraint generics before it's necessary, it improves readability of the code. it also makes the code more flexible, and therefore easier to test since generic types without constraints can easily be mocked.

in some places the code can panic by calling expect. in these situations, are you sure you want the node to stop execution? I'd suggest using debug_assert and dumping as much state as could be helpful to debug to log in the message. instead in run mode, consider if it's better to conditionally continue execution by if let Some(..) .. and if let Ok(..) ... in some cases it may indeed be best for the node to stop.

let (tx_id, transaction) =
entry.map_err(|e| Box::new(SenderRecoveryStageError::StageError(e.into())))?;
let tx_id = tx_id.key().expect("key to be formated");
let tx = tx.value().expect("value to be formated");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for example here, could it be possible that one tx value is not formatted but other tx values are? in that case, it's better not to panic

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not really, we expect the values both in the database and static files always be decodable, so if it fails, it's a critical unrecoverable error and something went wrong: either user interfered with the database, or we have a bug.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alright, in case of a bug, would be nice to get more context, same advice you gave me last week

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, here, and in tracing, it's nice if the variable names match the log message keys exactly. makes it easier to debug. so,

let my_var = 1;
debug!(target: "..",
    my_var=my_var,
    ""
);

@shekhirin
Copy link
Collaborator Author

Okay there's some sort of a bug with ETA, probably related to stage_checkpoint calculation changes

2024-01-29T19:03:16.582026Z  INFO reth_node_core::events: Stage finished executing pipeline_stages=4/13 stage=SenderRecovery checkpoint=836683 target=836683 stage_progress=100.00% stage_eta=4341721years 8months 26days 23h 27m 58s

@shekhirin
Copy link
Collaborator Author

ETA bug is unrelated to these changes, the progress is calculated correctly

@shekhirin shekhirin merged commit d1abb98 into feat/static-files Jan 31, 2024
24 checks passed
@shekhirin shekhirin deleted the alexey/sender-recovery-snapshot branch January 31, 2024 14:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-db Related to the database A-staged-sync Related to staged sync (pipelines and stages) A-static-files Related to static files C-enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants