Add new acceptance test to soak test BFT chains #7023

matthew1001 · 2024-04-30T16:37:49Z

PR description

This PR adds a new long-running acceptance test that is designed as a soak test for BFT chains.

It is disabled by default (i.e. acceptanceTestCliqueBft does not run this test) but can be run as an individual acceptance test with ./gradlew acceptanceTestBftSoak.

The existing Gradle acceptance test tasks have deliberately not been updated to run the new test, so that PRs do not require an hour-long test before they can be merged. It is intended to be run manually by anyone who wants an easy way to soak test the latest code on a QBFT or IBFT chain. Future discussions with the maintainers might decide to add the test somewhere else in the release pipeline.

By default the test runs for 60 minutes, but can be run for much longer by configuring the acctests.soakTimeMins system property in build.gradle. It has been tested for runs of up to 6 hours so far.

Here's a summary of what the test does:

1. Creates a 4-validator QBFT chain with the berlin fork
2. Starts the nodes
3. Deploys a smart contract to create some initial state
4. Attempts to deploy a shanghai contract and checks that it fails
5. Lets the chain continue mining empty blocks for a proportion of the total run time (e.g. for a 60 minute run it will do this step for 10 minutes)
   - Every minute it checks that the chain height is increasing, with some tolerance allowed
6. Stops a node and checks the chain continues to mine blocks
   - Again, every minute it checks that the chain height is increasing, but with more tolerance for the slower mining rate
7. Stops another node and checks that the chain now stalls
8. Starts both of the stopped nodes and waits for the chain to start mining blocks again
9. Stops all of the nodes one-by-one, upgrading them from the berlin to the london fork
10. Checks that the chain continues mining new blocks once the nodes have restarted
11. Does another check that the state of the smart contract deployed at step 3 is correct (periodic updates are made to the state of the contract in between the other steps, so the last check is to make sure the state is correct from the latest update)
12. Upgrades the chain from london to shanghai
13. Attempts to deploy the shanghai contract that failed at step 4 and checks that it was successful
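
The per-minute height checks in the steps above could look roughly like the following. This is a minimal sketch, not code from the PR: the class name, method names, and the blocks-per-minute and tolerance figures used with it are all hypothetical.

```java
import java.util.List;

/** Hypothetical helper, not taken from the PR: checks per-minute chain height samples. */
final class ChainProgressCheck {

  /** True if every minute-to-minute increase is at least (expected - tolerance) blocks. */
  static boolean isProgressing(
      final List<Long> heights, final long expectedBlocksPerMinute, final long tolerance) {
    // Never accept a zero increase, however generous the tolerance is.
    final long minimumIncrease = Math.max(1L, expectedBlocksPerMinute - tolerance);
    for (int i = 1; i < heights.size(); i++) {
      if (heights.get(i) - heights.get(i - 1) < minimumIncrease) {
        return false;
      }
    }
    return true;
  }

  /** True if there are at least two samples and the height never moved between them. */
  static boolean isStalled(final List<Long> heights) {
    if (heights.size() < 2) {
      return false;
    }
    for (int i = 1; i < heights.size(); i++) {
      if (heights.get(i).longValue() != heights.get(i - 1).longValue()) {
        return false;
      }
    }
    return true;
  }
}
```

With one validator stopped, the same samples can be checked with a larger tolerance to allow for the slower mining rate, and once a second validator is stopped `isStalled` is the condition expected to hold.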

If the test is configured to run for twice as long (e.g. 2 hours), each of the main steps runs for twice as long. The test cannot be configured to run for less than 1 hour, because the timings for the nodes resyncing and for QBFT rounds being agreed upon after the chain has stalled have minimum reliable values. There's no theoretical upper limit to how long the test can run for.
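
As a rough illustration of that scaling, assuming the duration is read from the acctests.soakTimeMins system property and each step is defined by how long it takes in a 60-minute run, the arithmetic might be sketched like this. The class and method names are hypothetical and the real test may compute its timings differently.

```java
/** Hypothetical sketch of how step durations could scale with the configured run time. */
final class SoakTestTimings {
  static final int MIN_TEST_TIME_MINS = 60;

  /** Reads acctests.soakTimeMins, falling back to the 60 minute default when unset. */
  static int testDurationMins() {
    return Integer.getInteger("acctests.soakTimeMins", MIN_TEST_TIME_MINS);
  }

  /**
   * Scales a step linearly with the total run time.
   *
   * @param totalMins configured total run time in minutes
   * @param stepMinsInSixtyMinuteRun how long the step takes in a default 60 minute run
   */
  static int stepDurationMins(final int totalMins, final int stepMinsInSixtyMinuteRun) {
    if (totalMins < MIN_TEST_TIME_MINS) {
      throw new IllegalArgumentException(
          "Soak test must run for at least " + MIN_TEST_TIME_MINS + " minutes");
    }
    return totalMins * stepMinsInSixtyMinuteRun / MIN_TEST_TIME_MINS;
  }
}
```

For example, a step that takes 10 minutes in the default run would take 20 minutes when the property is set to 120.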

Future additions I'd like to add to the test:

~~Upgrade the chain from london to shanghai and check that the shanghai contract is deployed successfully~~ Added under the latest commit

macfarla

overall I like the idea of extra tests that can run on demand. Just a bit nervous about deterministically allocating the ports

macfarla · 2024-05-02T06:18:39Z

.../main/java/org/hyperledger/besu/tests/acceptance/dsl/node/configuration/BesuNodeFactory.java

+                Math.abs(name.hashCode() % 60000)
+                    + 1024
+                    + 500) // Generate a consistent port for p2p based on node name (+ 500 to avoid
+            // clashing with RPC port or other nodes with a similar name)


fixing the ports makes me nervous - we've had flaky tests for this reason in the past. Realise it's unlikely but will it be obvious if there is a port conflict at this point?

createQbftNode is used by other tests so would not want to introduce any flakiness here even if it's unlikely

Yes I agree - it's a little difficult to get the choice right re. hard-coding or generating each time. The issue I was hitting was that stopping and starting a node caused it to be restarted with a new p2p port, so either nodes couldn't connect back to the bootnode (if the bootnode had been stopped and restarted) or the other nodes couldn't contact it - I think because its enode had been cached based on the node ID still being the same. The best thing I could think of was to ensure that for a given node name, the port will always be the same. That way if a test creates N nodes of different names they should not clash.

All of the current acceptance tests are still passing with this change in place, so maybe it's OK to leave and see if anyone hits any other issue with test nodes?

An alternative would be to create another version of createQbftNode() that uses fixed ports and one that doesn't. But it feels a little over the top if we're not seeing any port clashes currently.

@matthew1001 It would be a good idea to refactor the method to have the signature public BesuNode createQbftNode(final String name, final int port). The existing method could then pass 0 to use a random port, while new tests pass calculated ports.

Yes I'd be happy to do that if we think fixing ports in the way I have done is going to be problematic. I'd personally vote for it being fairly stable (certainly the existing acceptance tests all still pass) but v happy to refactor in this way if the consensus is to do so.
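
For reference, the port derivation from the diff above can be tried in isolation. This is a small sketch that wraps the same expression in a hypothetical helper (the class and method names are not from the PR). It also shows why a clash is unlikely but not impossible: two different names with the same hash code map to the same port.

```java
/** Hypothetical wrapper around the name-to-port expression shown in the diff. */
final class DeterministicPorts {

  /** Same name always gives the same port, in the range 1524 to 61523 inclusive. */
  static int p2pPortFor(final String name) {
    return Math.abs(name.hashCode() % 60000) + 1024 + 500;
  }

  public static void main(final String[] args) {
    // Stable across restarts of the same node.
    System.out.println(p2pPortFor("miner1") == p2pPortFor("miner1"));
    // "Aa" and "BB" are a well-known String.hashCode() collision, so they share a port.
    System.out.println(p2pPortFor("Aa") == p2pPortFor("BB"));
  }
}
```

Both lines print true, so distinct node names are very likely, but not guaranteed, to get distinct ports.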

usmansaleem · 2024-05-02T23:27:56Z

...sts/tests/src/test/java/org/hyperledger/besu/tests/acceptance/bftsoak/BftMiningSoakTest.java

+    // in between certain steps. There should be no upper-limit to how long the test is run for
+    assertThat(getTestDurationMins()).isGreaterThanOrEqualTo(MIN_TEST_TIME_MINS);
+
+    final BesuNode minerNode1 = nodeFactory.createNode(besu, "miner1");


Isn't this supposed to be createQbftNode?

It actually ends up being createQbftNode - the createNode call ^^ maps to this definition for the parameterised factory:

public static Stream<Arguments> getFactories() {
  return Stream.of(
      Arguments.of(
          "ibft2",
          new BftAcceptanceTestParameterization(
              BesuNodeFactory::createIbft2Node, BesuNodeFactory::createIbft2NodeWithValidators)),
      Arguments.of(
          "qbft",
          new BftAcceptanceTestParameterization(
              BesuNodeFactory::createQbftNode, BesuNodeFactory::createQbftNodeWithValidators))); // <<<---
}
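
The indirection can be shown with a stripped-down stand-in. This is only a sketch of the pattern: the types below are simplified placeholders (nodes are plain strings), not the real BesuNodeFactory or BftAcceptanceTestParameterization classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Simplified stand-in showing how a generic createNode call resolves to a protocol-specific factory. */
final class ParameterizedFactorySketch {

  // Placeholders for the protocol-specific factory methods.
  static String createIbft2Node(final String name) {
    return "ibft2:" + name;
  }

  static String createQbftNode(final String name) {
    return "qbft:" + name;
  }

  /** Holds a method reference and exposes it under a protocol-neutral name. */
  static final class Parameterization {
    private final Function<String, String> creator;

    Parameterization(final Function<String, String> creator) {
      this.creator = creator;
    }

    String createNode(final String name) {
      return creator.apply(name);
    }
  }

  /** One entry per consensus protocol, mirroring the shape of getFactories(). */
  static Map<String, Parameterization> factories() {
    final Map<String, Parameterization> factories = new LinkedHashMap<>();
    factories.put("ibft2", new Parameterization(ParameterizedFactorySketch::createIbft2Node));
    factories.put("qbft", new Parameterization(ParameterizedFactorySketch::createQbftNode));
    return factories;
  }
}
```

The test body only ever calls createNode; which concrete factory runs depends on the parameterization it was handed.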

siladu

LGTM, thanks for the useful PR description and code comments.

I won't approve yet to give the others a chance to re-review.

Also, for reference, just wanted to point out a mainnet/engine api-focussed test that is along similar lines (upgrading to shanghai). For example, this request includes a tx that uses push0:
https://github.com/hyperledger/besu/blob/main/acceptance-tests/tests/src/test/resources/jsonrpc/engine/shanghai/test-cases/14_shanghai_newPayloadV2_push0_tx.json

Obviously very different approach to this PR, but worth highlighting the conceptual duplication. I also think it's too early to attempt any de-duplication.

Longer term, I personally think we could move to something more like this PR for mainnet, although maybe using something like https://github.com/kurtosis-tech/ethereum-package

siladu · 2024-05-03T03:14:19Z

acceptance-tests/tests/shanghai/shanghaicontracts/SimpleStorageShanghai.sol

+// solc SimpleStorageShanghai.sol --bin --abi --optimize --overwrite -o .
+// then create web3j wrappers with:
+// web3j generate solidity -b ./SimpleStorageShanghai.bin -a ./SimpleStorageShanghai.abi -o ../../../../../ -p org.hyperledger.besu.tests.web3j.generated
+contract SimpleStorageShanghai {


Not a blocker, but might be interesting to include a shanghai-specific opcode in this contract to be sure that Shanghai EVM code is triggering. Sorry my solidity experience is next to zero, so can't suggest anything, though I imagine some shanghai contract tests exist somewhere in the community and we could steal one!

Yeah I did check this to make sure SimpleStorage uses PUSH0 if compiled for the shanghai EVM by compiling SimpleStorage with different EVM settings in Remix and checking the op-codes. It does use PUSH0 when compiled for shanghai. Also the test currently checks that deploying SimpleStorageShanghai to the chain when it's still on the berlin fork fails, so I think that confirms it uses PUSH0.
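
The same check could be automated rather than done by eye in Remix. The following is a hypothetical sketch, not part of the PR, that walks compiled bytecode looking for PUSH0 (opcode 0x5f, introduced by EIP-3855) while skipping the immediate data of PUSH1 to PUSH32 so that a data byte of 0x5f is not mistaken for the opcode. It treats the whole input as code, so any trailing metadata or embedded data could still produce a false positive.

```java
/** Hypothetical helper: scans EVM bytecode (hex) for the PUSH0 opcode. */
final class Push0Scanner {
  private static final int PUSH0 = 0x5f;
  private static final int PUSH1 = 0x60;
  private static final int PUSH32 = 0x7f;

  static boolean containsPush0(final String hexBytecode) {
    final String hex = hexBytecode.startsWith("0x") ? hexBytecode.substring(2) : hexBytecode;
    int i = 0;
    while (i + 2 <= hex.length()) {
      final int opcode = Integer.parseInt(hex.substring(i, i + 2), 16);
      if (opcode == PUSH0) {
        return true;
      }
      // PUSH1..PUSH32 are followed by 1..32 bytes of immediate data, which must be skipped.
      final int immediateBytes = (opcode >= PUSH1 && opcode <= PUSH32) ? opcode - PUSH1 + 1 : 0;
      i += 2 * (1 + immediateBytes);
    }
    return false;
  }
}
```

For instance, `containsPush0("0x5f80")` is true, while `containsPush0("605f")` is false because the 0x5f there is the operand of a PUSH1.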

siladu · 2024-05-03T03:15:55Z

...sts/tests/src/test/java/org/hyperledger/besu/tests/acceptance/bftsoak/BftMiningSoakTest.java

+  @ParameterizedTest(name = "{index}: {0}")
+  @MethodSource("factoryFunctions")


Is there value in testing IBFT2 every time as well? Wondering if we could make this QBFT-specific since it's recommended over IBFT2. Or maybe separate them...guess it depends on how these are intended to be run.

I went with just re-using the combinations that all the current BFT acceptance tests use because I think we might still have users on IBFT2 with existing chains who still want to move up to shanghai. I'd personally suggest that we remove IBFT2 from all or none (unless we see issues with the run time for this particular test)

acceptance-tests/tests/shanghai/build.gradle

siladu · 2024-05-03T05:00:59Z

acceptance-tests/tests/build.gradle

+publishing {
+  publications {
+    mavenJava(MavenPublication) { artifactId 'acceptance-tests-tests' }
+  }


would this result in publishing these artifacts to artifactory? We currently publish libs to hyperledger jfrog as part of the release using artifactoryPublish task

That's a good question. I've got to admit I added this to avoid a gradle build failure and wasn't quite sure of the side effects (other than it fixed the build! :) )

OK - think the latest commit sorts this out @siladu. I've removed the publish specs and just disabled jar building for the new shanghai project.

matthew1001 · 2024-05-03T10:15:40Z

Also, for reference, just wanted to point out a mainnet/engine api-focussed test that is along similar lines (upgrading to shanghai). For example, this request includes a tx that uses push0: https://github.com/hyperledger/besu/blob/main/acceptance-tests/tests/src/test/resources/jsonrpc/engine/shanghai/test-cases/14_shanghai_newPayloadV2_push0_tx.json

Yeah there are certainly a couple of different approaches used in the tests currently. The BFT acceptance tests use the web3j gradle extension to compile source .sol files at build time with specific EVM/compiler versions. That feels a little better than having to hard-code pre-compiled contracts into test files, although really I think both are perfectly valid. It does feel like it will be easy to extend to new EVM versions in the future with "simple" gradle changes/additions. The thing I feel most strongly about is having the option to invoke arbitrary numbers of transactions during a test (i.e. not use pre-created transaction payloads with hard-coded nonces in).

macfarla · 2024-05-16T00:36:24Z

hit a failure in starting the cluster in a BFT AT https://github.com/hyperledger/besu/actions/runs/9104377722/job/25028115138

BftMiningAcceptanceTest > shouldStillMineWhenANonProposerNodeFailsAndHasSufficientValidators(String, BftAcceptanceTestParameterization) > 1: ibft2 FAILED

Nothing in the stack trace to tell me whether it's a port conflict.

But - line 450 is waiting for the ports file so that makes me think it is
at app//org.hyperledger.besu.tests.acceptance.dsl.node.BesuNodeRunner.waitForFile(BesuNodeRunner.java:51)
at app//org.hyperledger.besu.tests.acceptance.dsl.node.ProcessBesuNodeRunner.startNode(ProcessBesuNodeRunner.java:450)

That's the only thing stopping me from approving this, that I don't want to add any more flakiness into the ATs.

if you refactor that method @matthew1001 so that port 0 can be used for existing tests, and the deterministic port can be used for the soak test, I'll approve
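
One possible shape for that refactor is sketched below, with the caveat that it does not reproduce the real BesuNodeFactory signatures: NodeSpec, the boolean flag, and the constant name are all hypothetical. The existing single-argument method keeps requesting port 0 (let the OS choose), and only callers that opt in get the name-derived port.

```java
/** Hypothetical sketch of making the fixed p2p port opt-in. */
final class BftNodeFactorySketch {
  static final int OS_ASSIGNED_PORT = 0;

  /** Minimal placeholder for a node configuration. */
  static final class NodeSpec {
    final String name;
    final int p2pPort;

    NodeSpec(final String name, final int p2pPort) {
      this.name = name;
      this.p2pPort = p2pPort;
    }
  }

  /** Existing behaviour for current tests: port 0, so the OS picks a free port. */
  static NodeSpec createQbftNode(final String name) {
    return createQbftNode(name, false);
  }

  /** Opt-in variant for tests that restart nodes and need a stable port. */
  static NodeSpec createQbftNode(final String name, final boolean fixedPort) {
    final int port =
        fixedPort ? Math.abs(name.hashCode() % 60000) + 1024 + 500 : OS_ASSIGNED_PORT;
    return new NodeSpec(name, port);
  }
}
```

This keeps every existing caller on its current code path, which is the property the reviewers were asking for.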

macfarla · 2024-05-22T02:02:57Z

the "compile" error is this

* What went wrong:
Execution failed for task ':checkMavenCoordinateCollisions'.
> Duplicate maven coordinates detected, .:besu:24.5-develop-be7c665 is used by both :acceptance-tests:tests and :acceptance-tests:tests:shanghai.

matthew1001 · 2024-05-22T13:21:12Z

Yeah was just looking at that issue @macfarla It turned out that the approach I used of setting jar { enabled = false } prevented the shanghai contract from being available on the classpath at test runtime. So I re-instated the jar build, but that has re-introduced the earlier error I was seeing where the acceptance-tests:tests and acceptance-tests:tests:shanghai generate the same maven artifact coordinate.

I don't believe we actually publish acceptance test artifacts to maven, and the coordinate it says is clashing is .:besu:24.5-develop-be7c665, which looks like an incomplete artifact coordinate. The other ones we publish are things like org.hyperledger.besu.internal:rlp:24.5-develop-be7c665.

I've just pushed an update which modifies the coordinate-check task. I'm not a gradle expert so I think I'm probably missing a better/simpler fix.
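
The logic of that change (only coordinates starting with org. are checked for collisions) can be illustrated outside Gradle. This Java sketch is not the build script itself: the class name is made up and the project path used for the rlp artifact in the usage note is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of a duplicate-coordinate check limited to 'org.' coordinates. */
final class CoordinateCollisionCheck {

  /** Returns each 'org.'-prefixed coordinate used by more than one project, with those projects. */
  static Map<String, List<String>> collisions(final Map<String, String> coordinateByProject) {
    final Map<String, List<String>> projectsByCoordinate = new TreeMap<>();
    for (final Map.Entry<String, String> entry : coordinateByProject.entrySet()) {
      final String coordinate = entry.getValue();
      // Incomplete coordinates such as ".:besu:<version>" are ignored by this filter.
      if (coordinate.startsWith("org.")) {
        projectsByCoordinate
            .computeIfAbsent(coordinate, key -> new ArrayList<>())
            .add(entry.getKey());
      }
    }
    projectsByCoordinate.values().removeIf(projects -> projects.size() < 2);
    return projectsByCoordinate;
  }
}
```

Given :acceptance-tests:tests and :acceptance-tests:tests:shanghai both mapped to .:besu:24.5-develop-be7c665, plus a project mapped to org.hyperledger.besu.internal:rlp:24.5-develop-be7c665, the result is empty, so the check passes.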

matthew1001 · 2024-05-22T14:58:54Z

Compilation is looking OK again now @macfarla

macfarla · 2024-05-27T00:36:22Z

Yeah was just looking at that issue @macfarla It turned out that the approach I used of setting jar { enabled = false } prevented the shanghai contract from being available on the classpath at test runtime. So I re-instated the jar build, but that has re-introduced the earlier error I was seeing where the acceptance-tests:tests and acceptance-tests:tests:shanghai generate the same maven artifact coordinate.

I don't believe we actually publish acceptance test artifacts to maven, and the coordinate it says is clashing is .:besu:24.5-develop-be7c665, which looks like an incomplete artifact coordinate. The other ones we publish are things like org.hyperledger.besu.internal:rlp:24.5-develop-be7c665.

I've just pushed an update which modifies the coordinate-check task. I'm not a gradle expert so I think I'm probably missing a better/simpler fix.

@usmansaleem can you review the gradle changes?

Add new acceptance test to soak test BFT chains

ca540f5

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

matthew1001 force-pushed the soak3 branch from 566e672 to ca540f5 Compare April 30, 2024 16:38

matthew1001 and others added 3 commits April 30, 2024 18:32

Spotless

9e6c26b

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

Tidy up a little with re-usable start and stop functions with built i…

a51cd78

…n delays Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

Merge branch 'main' into soak3

be50d8c

matthew1001 marked this pull request as ready for review May 1, 2024 07:31

Add shanghai version of Simple Storage contract

5c51c28

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

matthew1001 force-pushed the soak3 branch from c0e83b0 to 5c51c28 Compare May 1, 2024 15:18

matthew1001 and others added 5 commits May 1, 2024 16:20

Put commented gradle code back in. Fix the web3j example commands in …

0e038f2

….sol files Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

Spotless

f013043

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

Set publication artifacts to avoid clash

fc100e2

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

Merge branch 'main' into soak3

50a29f7

Exclude from regular acceptance tests

b59aa53

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

macfarla reviewed May 2, 2024

matthew1001 added 2 commits May 2, 2024 16:15

Add shanghai fork to the test. Stall the chain for less time to reduc…

fd4bdb8

…e the time taken to mine new blocks Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

Tidy up

48f0c32

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

usmansaleem reviewed May 2, 2024

siladu reviewed May 3, 2024

Update acceptance-tests/tests/shanghai/build.gradle

2700089

Co-authored-by: Simon Dudley <simon.dudley@consensys.net> Signed-off-by: Matt Whitehead <matthew1001@hotmail.com>

matthew1001 and others added 3 commits May 3, 2024 16:12

Merge branch 'main' into soak3

31a68e1

Signed-off-by: Matt Whitehead <matthew.whitehead@kaleido.io>

Tidy up var names

b62d2d9

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

Fix ports for IBFT2 as well as QBFT

0573d35

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

matthew1001 force-pushed the soak3 branch from 59741c7 to 0573d35 Compare May 9, 2024 07:58

matthew1001 and others added 3 commits May 9, 2024 09:57

Merge branch 'main' into soak3

62282b1

Remove maven publish spec, disable jar building for shanghai contract…

242ce3d

… project Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

web3j version

4753b28

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

matthew1001 mentioned this pull request May 10, 2024

Implementing support for emptyBlockPeriodSeconds in QBFT (Issue #3810) #6965

Merged

8 tasks

Merge branch 'main' into soak3

c20b131

macfarla added 2 commits May 15, 2024 11:12

Merge branch 'main' into soak3

34003a9

Merge branch 'main' into soak3

89e7738

matthew1001 force-pushed the soak3 branch from 30b9774 to b59412e Compare May 21, 2024 13:40

Make fixed port optional when creating a BFT node

6ec0eeb

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

matthew1001 force-pushed the soak3 branch from b59412e to 6ec0eeb Compare May 21, 2024 13:52

Merge branch 'main' into soak3

be7c665

Only check artifact coordinates for those starting 'org.*'

06b6825

Signed-off-by: Matthew Whitehead <matthew1001@gmail.com>

Merge branch 'main' into soak3

8bbaa5b

macfarla approved these changes Jun 11, 2024

matthew1001 added 3 commits June 11, 2024 08:57

Merge branch 'main' into soak3

600a001

Merge branch 'main' into soak3

16ae7b6

Merge branch 'main' into soak3

0cd4420

matthew1001 merged commit b1ac5ac into hyperledger:main Jun 11, 2024
40 checks passed
