Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backfill stream returns after first iteration when parallelism is set to 1 #13279

Closed
1 task done
0xalex88 opened this issue Dec 10, 2024 · 5 comments · Fixed by #13281
Closed
1 task done

Backfill stream returns after first iteration when parallelism is set to 1 #13279

0xalex88 opened this issue Dec 10, 2024 · 5 comments · Fixed by #13281
Labels
C-bug An unexpected or incorrect behavior S-needs-triage This issue needs to be labelled

Comments

@0xalex88
Copy link

0xalex88 commented Dec 10, 2024

Describe the bug

Not sure if it's expected but I was trying to use parallelism = 1 in a backfill job and in that case, the stream ends after returning a single chunk, instead of continuing executing the next chunk in the full backfill range.

Steps to reproduce

Simplest reproduction I've found is in the test_batch test, if you change the code to:

let factory = BackfillJobFactory::new(executor.clone(), blockchain_db.clone())
            .with_thresholds(ExecutionStageThresholds { max_blocks: Some(1), ..Default::default() });
        let mut backfill_stream = factory.backfill(1..=2).into_stream();
        let mut chain = backfill_stream.next().await.unwrap().unwrap();
        chain.execution_outcome_mut().state_mut().reverts.sort();

        assert!(chain.blocks_iter().eq(&blocks[0..1]));

        // expect no more blocks
        assert!(backfill_stream.next().await.is_none());

so backfilling one block at a time it fails because assert!(backfill_stream.next().await.is_none()) is not none (as expected) however when using

let factory = BackfillJobFactory::new(executor.clone(), blockchain_db.clone())
            .with_thresholds(ExecutionStageThresholds { max_blocks: Some(1), ..Default::default() })
            .with_stream_parallelism(1);
        let mut backfill_stream = factory.backfill(1..=2).into_stream();
        let mut chain = backfill_stream.next().await.unwrap().unwrap();
        chain.execution_outcome_mut().state_mut().reverts.sort();

        assert!(chain.blocks_iter().eq(&blocks[0..1]));

        // expect no more blocks
        assert!(backfill_stream.next().await.is_none());

so adding the stream parallelism 1,next() returns none, ending the stream

Node logs

2024-12-10T19:22:57.969204Z DEBUG exex::backfill: Executing block range range=0..=499999
2024-12-10T19:23:16.131201Z DEBUG exex::backfill: Finished executing block range range=0..=499999 block_fetch=2.753609197s execution=15.289312944s throughput="1.89 Ggas/second"
(and then the stream returns)

Platform(s)

No response

Container Type

Not running in a container

What version/commit are you on?

v1.1.3

What database version are you on?

2

Which chain / network are you on?

dev

What type of node are you running?

Archive (default)

What prune config do you use, if any?

No response

If you've built Reth from source, provide the full command you used

No response

Code of Conduct

  • I agree to follow the Code of Conduct
@0xalex88 0xalex88 added C-bug An unexpected or incorrect behavior S-needs-triage This issue needs to be labelled labels Dec 10, 2024
@mattsse
Copy link
Collaborator

mattsse commented Dec 10, 2024

the stream ends after returning a single chunk, instead of continuing executing the next chunk in the full backfill range.

so backfilling one block at a time it fails because assert!(backfill_stream.next().await.is_none()) is not none (as expected)

these two are conflicting, which one is it?

@0xalex88 to me it feels like you're mixing your actual issue together with this repro? but I'm not exactly sure

@shekhirin the reason this test setup fails is because we don't have that block 1 in the test database setup hence the stream returns an error. so imo this repro seems unrelated

@mattsse
Copy link
Collaborator

mattsse commented Dec 10, 2024

@shekhirin the second repro is a logic bug if parallelism is one

because we can end up returning None here

but never spawn the second task for block 2

@0xalex88
Copy link
Author

0xalex88 commented Dec 10, 2024

these two are conflicting, which one is it?

in the first quote I'm mentioning my use case, where backfilling from block 0 to block 13M (chain tip) spawns a task with range 0..=50k but nothing more, so the remaining 50k-13M blocks are not backfilled

in the second quote I'm pointing out the case where the test fails when parallelism = 2, which is correct (because there's one more block range to process) but the same test doesn't fail (even if it should) when parallelism = 1

@mattsse
Copy link
Collaborator

mattsse commented Dec 10, 2024

gotcha, was a bit confused at first, maybe because it's late here.

there's indeed an issue with parallelism 1

@0xalex88
Copy link
Author

gotcha, was a bit confused at first, maybe because it's late here.

there's indeed an issue with parallelism 1

it wasn't very clear, my bad!
anyway thanks for the fix, I've tested it and it works as expected now!

@github-project-automation github-project-automation bot moved this from Todo to Done in Reth Tracker Dec 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug An unexpected or incorrect behavior S-needs-triage This issue needs to be labelled
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

2 participants