rewrite some flaky zombienet polkadot tests to zombienet-sdk #6757

Open
wants to merge 39 commits into
base: master
from


alindima
Contributor

@alindima alindima commented Dec 4, 2024

Will fix:
#6574
#6644
#6062

@alindima alindima added R0-silent Changes should not be mentioned in any release notes T8-polkadot This PR/Issue is related to/affects the Polkadot network. T10-tests This PR/Issue is related to tests. labels Dec 4, 2024
@alindima alindima requested a review from a team as a code owner December 4, 2024 15:15
@pepoviola
Contributor

Hi @alindima, I think this is failing because we need to hide the helpers mod behind the zombie-metadata feature. If you are OK with it, I will move it to a helpers dir and change the imports.
Thx!

@alindima alindima requested review from a team and koute as code owners December 10, 2024 14:41
@paritytech-review-bot paritytech-review-bot bot requested a review from a team December 10, 2024 14:41
Base automatically changed from alindima/rfc-103-test to master December 18, 2024 08:33
@alindima alindima changed the title [WIP] rewrite some flaky zombienet polkadot tests to zombienet-sdk rewrite some flaky zombienet polkadot tests to zombienet-sdk Dec 18, 2024
Member

@eskimor eskimor left a comment


Nice! Thanks Alin!

assert_para_throughput(
&relay_client,
15,
[(2000, 40..46), (2001, 12..16)].into_iter().collect(),
Member


These ranges don't make sense to me. 40 > 3 * 12, so are we too strict for the elastic scaling case or too lenient for the non-elastic scaling case?

Also note: I absolutely had to look up the code of assert_para_throughput to understand what this is doing. Not a big deal, but using types like ParaId, for example, would have made it easier.
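
To illustrate the readability point, here is a minimal sketch of the same expectation map keyed by a newtype instead of a bare integer. The `ParaId` below is a local stand-in defined only for this example, not the real type from the Polkadot primitives, and `expected_throughput` is a hypothetical helper name.

```rust
use std::collections::HashMap;
use std::ops::Range;

/// Local stand-in for a parachain id newtype (assumption: the real
/// `ParaId` type is not reproduced here).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct ParaId(pub u32);

/// Hypothetical helper: builds the expected range of backed candidates
/// for each para, using the values from the snippet under review.
pub fn expected_throughput() -> HashMap<ParaId, Range<u32>> {
    [(ParaId(2000), 40..46), (ParaId(2001), 12..16)]
        .into_iter()
        .collect()
}
```

With the key typed as `ParaId`, a reader can tell at the call site that the first tuple element is a para id and the second is a range of candidate counts.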

Contributor Author

@alindima alindima Dec 19, 2024


> These ranges don't make sense to me. 40 > 3 * 12, so are we too strict for the elastic scaling case or too lenient for the non-elastic scaling case?

These ranges mean the following:

- We wait until we have reached 15 finalized blocks (counting starts after the first session change, and blocks with session changes are not counted).
- Then we check that the number of backed blocks for para 2000 is within the 40..46 range (ideally it would be 45, since it's 3*15, but in practice the CI performance is not that good).
- We also check that the number of backed blocks for para 2001 is within the 12..16 range (this para has only one assigned core, so it can get at most 15 blocks in; again, the range leaves some buffer for less performant hardware).
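
The arithmetic behind those ranges can be sketched as follows. This is a simplified stand-alone model, not the actual `assert_para_throughput` implementation: `ideal_backed` and `out_of_range` are hypothetical names, para ids are plain `u32` values, and the finalized blocks are assumed to be relay chain blocks. Note that Rust ranges such as `40..46` are half-open, so 45 is the highest accepted count.

```rust
use std::collections::HashMap;
use std::ops::Range;

/// Ideal number of backed candidates: one per assigned core per
/// finalized block (assumption: no block is missed).
pub fn ideal_backed(finalized_relay_blocks: u32, cores: u32) -> u32 {
    finalized_relay_blocks * cores
}

/// Returns the sorted ids of the paras whose observed candidate count
/// falls outside the expected half-open range.
pub fn out_of_range(
    observed: &HashMap<u32, u32>,
    expected: &HashMap<u32, Range<u32>>,
) -> Vec<u32> {
    let mut failing: Vec<u32> = expected
        .iter()
        .filter(|(para, range)| {
            // A para with no observed candidates counts as 0.
            let count = observed.get(*para).copied().unwrap_or(0);
            !range.contains(&count)
        })
        .map(|(para, _)| *para)
        .collect();
    failing.sort_unstable();
    failing
}
```

For 15 finalized blocks, `ideal_backed(15, 3)` gives 45 for the para with three cores and `ideal_backed(15, 1)` gives 15 for the para with one core, which is why each range tops out just above its ideal value and extends a few blocks below it.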

> Also note: I absolutely had to look up the code of assert_para_throughput to understand what this is doing. Not a big deal, but using types like ParaId, for example, would have made it easier.

I can do that 👍🏻

.wait_for_finalized_success()
.await?;

log::info!("1 more core assigned to the parachain");
Member


In the previous test we had the assurance that adding additional cores worked because of the throughput assertion. Here we don't have this. I would add a check that the para indeed has two cores (and is still working with a throughput of 1).
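
One possible shape for such a check, as a rough sketch: it assumes the core assignments have already been fetched into a map from core index to a queue of para ids (modeled loosely on the relay chain claim queue). How that map is fetched is left out, and `cores_assigned_to` is a hypothetical name.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Counts the cores whose queue contains `para` at least once.
/// Assumption: keys are core indices and values are the queued para ids.
pub fn cores_assigned_to(claim_queue: &BTreeMap<u32, VecDeque<u32>>, para: u32) -> usize {
    claim_queue
        .values()
        .filter(|queue| queue.contains(&para))
        .count()
}
```

The test could then assert that this count is 2 for the para right after the extra core is assigned, alongside the existing throughput assertion.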

Contributor Author


ok, makes sense

Contributor Author


done


// Assert the parachain finalized block height is also on par with the number of backed
// candidates.
assert_finalized_block_height(&para_node.wait_client().await?, 6..9).await?;
Member


While I get that we don't want flaky tests, it is kind of concerning that we let tests pass if we produce fewer blocks than expected. We definitely need some tests, in some fashion, ensuring that we are not degrading.

Contributor Author


> While I get that we don't want flaky tests, it is kind of concerning that we let tests pass if we produce fewer blocks than expected.

I mean, we've always done this in the CI so far. The only way of fixing this IMO is to invest more in the reliability and performance of the CI machines

@paritytech-workflow-stopper

All GitHub workflows were cancelled due to the failure of one of the required jobs.
Failed workflow url: https://github.com/paritytech/polkadot-sdk/actions/runs/12410396700
Failed job name: test-linux-stable-no-try-runtime
