Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky optimism p2p::can_sync test #8166

Closed
Rjected opened this issue May 8, 2024 · 10 comments · Fixed by #9113
Closed

Flaky optimism p2p::can_sync test #8166

Rjected opened this issue May 8, 2024 · 10 comments · Fixed by #9113
Labels
A-op-reth Related to Optimism and op-reth C-bug An unexpected or incorrect behavior C-test A change that impacts how or what we test D-good-first-issue Nice and easy! A great choice to get started M-prevent-stale Prevents old inactive issues/PRs from being closed due to inactivity

Comments

@Rjected
Copy link
Member

Rjected commented May 8, 2024

A recent merge queue run failed in a way that indicates we have something flaky in p2p::can_sync:
https://github.com/paradigmxyz/reth/actions/runs/9007839847/job/24748566483

The important logs:

2024-05-08T20:09:21.442257Z ERROR reth_node_builder::launch::common: Failed to build global thread pool: ThreadPoolBuildError { kind: GlobalPoolAlreadyInitialized }
2024-05-08T20:09:26.441518Z ERROR reth_node_builder::launch::common: Failed to build global thread pool: ThreadPoolBuildError { kind: GlobalPoolAlreadyInitialized }
2024-05-08T20:09:44.902706Z ERROR blockchain_tree: Reverting canonical chain failed with error: OptimisticTargetRevert(88)

We might need something lazy for the thread pool, or it might have to do with launching multiple nodes? Have not debugged this in depth

The error is thrown here:

/// Configure global settings this includes:
///
/// - Raising the file descriptor limit
/// - Configuring the global rayon thread pool
pub fn configure_globals(&self) {
// Raise the fd limit of the process.
// Does not do anything on windows.
let _ = fdlimit::raise_fd_limit();
// Limit the global rayon thread pool, reserving 2 cores for the rest of the system
let _ = ThreadPoolBuilder::new()
.num_threads(
available_parallelism().map_or(25, |cpus| max(cpus.get().saturating_sub(2), 2)),
)
.build_global()
.map_err(|e| error!("Failed to build global thread pool: {:?}", e));
}

@Rjected Rjected added C-bug An unexpected or incorrect behavior C-test A change that impacts how or what we test A-op-reth Related to Optimism and op-reth labels May 8, 2024
@mattsse
Copy link
Collaborator

mattsse commented May 8, 2024

we can simply remove the error log here

@mattsse mattsse added the D-good-first-issue Nice and easy! A great choice to get started label May 8, 2024
@Rjected
Copy link
Member Author

Rjected commented May 8, 2024

we can simply remove the error log here

The test is timing out though, so we need to do something else to fix the test probably

@onbjerg
Copy link
Member

onbjerg commented May 8, 2024

This is because you can't have more than one threadpool per process, I ran into this with one of the benchmarks we hadn't kept up to date, too. Not entirely sure what the best way to circumvent this is

Edit: Actually this is a bit different, but might be a similar error. I ran into this with Tokio (the benchmark tool spawned a tokio runtime, after which the function we called did the same)

@Rjected
Copy link
Member Author

Rjected commented May 8, 2024

Actually the flake might be more related to

2024-05-08T20:09:44.902706Z ERROR blockchain_tree: Reverting canonical chain failed with error: OptimisticTargetRevert(88)

But I still need to repro

@onbjerg
Copy link
Member

onbjerg commented May 8, 2024

Yeah reading the threadpoolbuilder docs, it returns an error but does not mention anything about it being fatal or stalling, so should be ok: https://docs.rs/rayon/latest/rayon/struct.ThreadPoolBuilder.html#method.build_global

@joshieDo
Copy link
Collaborator

joshieDo commented May 9, 2024

That specific error log is expected and part of the e2e test @Rjected
so I'm thinking it's really about the long duration of the test (20s if i recall) unsure now on why its timing out

@Rjected
Copy link
Member Author

Rjected commented May 22, 2024

@joshieDo do you mind taking a look at this? I haven't run into this in a while though, so it may just be difficult to repro

Copy link
Contributor

This issue is stale because it has been open for 21 days with no activity.

@github-actions github-actions bot added the S-stale This issue/PR is stale and will close with no further activity label Jun 13, 2024
@shekhirin
Copy link
Collaborator

I believe it's not an issue anymore, please re-open if occurs again.

@github-project-automation github-project-automation bot moved this from Todo to Done in Reth Tracker Jun 13, 2024
@Rjected
Copy link
Member Author

Rjected commented Jun 13, 2024

@Rjected Rjected reopened this Jun 13, 2024
@github-project-automation github-project-automation bot moved this from Done to In Progress in Reth Tracker Jun 13, 2024
@Rjected Rjected added M-prevent-stale Prevents old inactive issues/PRs from being closed due to inactivity and removed S-stale This issue/PR is stale and will close with no further activity labels Jun 13, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in Reth Tracker Jun 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-op-reth Related to Optimism and op-reth C-bug An unexpected or incorrect behavior C-test A change that impacts how or what we test D-good-first-issue Nice and easy! A great choice to get started M-prevent-stale Prevents old inactive issues/PRs from being closed due to inactivity
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

5 participants