-
Notifications
You must be signed in to change notification settings - Fork 668
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[stateless_validation] Fix tests that fail with stateless validation enabled #10506
Closed
8 of 9 tasks
Labels
A-stateless-validation
Area: stateless validation
Comments
github-merge-queue bot
pushed a commit
that referenced
this issue
Jan 26, 2024
…10507) This PR uncomments two checks - We now filter chunks based on whether they have enough stake endorsed or not. - We validate chunk endorsements in block header. Enabling this breaks a bunch of tests mainly for two reasons - Some tests are handled by mocking out parts/doing manual block production which doesn't take care of chunk endorsement signatures - A number of tests that work with multiple validators don't have the appropriate network request handlers for chunk endorsement and state witness processing. These tests are impossible to explore and fix in a single PR and I've created a task list to help with this here: #10506 Note that we need to merge in the changes from #10501 , #10503 and #10504 before we can merge this. Till then probably the tests would keep failing Tip: Probably a good idea to review PR commit by commit
23 tasks
github-merge-queue bot
pushed a commit
that referenced
this issue
Jan 31, 2024
This test didn't work with stateless validation. Let's fix it by properly handling `ChunkStateWitness` and `ChunkEndorsement` messages. Refs: #10506
We should also look into nayduck, which indicates ~9 more failing tests since enabling endorsements https://nayduck.near.org/#/ |
This was referenced Jan 31, 2024
github-merge-queue bot
pushed a commit
that referenced
this issue
Jan 31, 2024
It looks like this test now works with stateless validation, so let's enable it again. I didn't need to make any changes to the test, maybe something was fixed in the meantime. Refs: #10506
52 tasks
github-merge-queue bot
pushed a commit
that referenced
this issue
Feb 5, 2024
…es writing integration tests easier (#10559) State witnesses are processed asynchronously - a task is spawned on a separate thread, which validates the state witness and then sends out chunk endorsements. Currently there's no way to directly wait for this processing to finish, one can only wait for the chunk endorsements to appear. This makes writing integration tests difficult. Let's add a way to wait for the processing to finish, which will make writing the tests easier. Details: There's no network stack in integration tests, all messages are passed around manually. We first propagate all state witness messages, which starts their processing in the background, but then we have to wait for this processing to finish before propagating chunk endorsements. As there's no way to wait for it, we have to do tricky things. The current approach is to define the set of expected chunk endorsements that should appear, and then wait for the right endorsement messages to be sent out. This approach has some limitations - it's often hard to figure out which endorsements should be sent out exactly, as the tests create various strange scenarios - missing chunks, blocks, forks, etc... It requires a lot of manual work to fully understand the test and prepare the right set. It's also quite fragile, any change in chunk endorsements (e.g not sending them to ourselves) will break all the tests. Adding the ability to wait for witness processing to finish greatly simplifies things. We can just pass around witnesses, wait for them to finish, and then propagate the generated endorsements. Without defining any expected sets to wait for. As an example of the benefits offered by this solution, compare the fix in #10537 to the one added in this PR. The one in #10537 is much more complex and fragile. I saw that there is also a thing called `ChunkEndorsementTracker`, which tracks what chunk endorsements were sent out, but IMO it's a different thing from `ProcessingDoneTracker`. We were already able to track which endorsements are sent out in the tests, but that didn't make writing tests easier. Additionally, in some tests we send out invalid chunks, which wouldn't produce any endorsements at all, so we couldn't wait for them. Refs: #10506, #10537, [zulip conversation](https://near.zulipchat.com/#narrow/stream/407237-pagoda.2Fcore.2Fstateless-validation/topic/A.20way.20to.20wait.20for.20state.20witness.20processing.20to.20finish/near/419062119)
github-merge-queue bot
pushed a commit
that referenced
this issue
Feb 6, 2024
…10571) ## Description The tests in `adversarial_behaviors.rs` didn't work with stateless validation, let's make them pass again. To achieve this we first need to propagate chunk state witnesses and endorsements, and then adjust the expected behavior to match the one exhibited by stateless validation. The most complex part of this PR is modifying which blocks should be skipped in `test_banning_chunk_producer_when_seeing_invalid_chunk_base()`. This function creates a blockchain with one malicious chunk producer and runs it for a few epochs. Which blocks will be skipped depends on whether we are using stateless validation or not. ### Without stateless validation In the old version of the protocol the invalid chunk is included in a block and then the whole block is validated. The chunk is invalid, so the whole block is also invalid and the block should be skipped. After that the malicious chunk producer is banned for the rest of the epoch, which means that block producers will reject chunks sent by this chunk producer, so the subsequent blocks won't contain any invalid chunks and won't be skipped. Only the first block with invalid chunks in an epoch should be skipped. ### With stateless validation With stateless validation the situation is entirely different. When the malicious chunk producer produces an invalid chunk, chunk validators will refuse to send out endorsements for this chunk, so the chunk won't be included in any block. This means that no blocks should be skipped. There is one exception - in the test the block produced at height 2 is invalid and gets skipped. I suspect that this is because the first few blocks after genesis don't use stateless validation, and they are still handled by old protocol. I'm not 100% sure if that's the case, @Longarithm could you confirm that this explanation makes sense? Something is definitely different for the first few blocks, in the logs I can see `client: Banning chunk producer for producing invalid` at height 2, which seems to belong to the old protocol. In the new version malicious chunk producers are banned using `BanPeer`, without any log messages. Refs: #10506, [zulip discussion](https://near.zulipchat.com/#narrow/stream/295558-pagoda.2Fcore/topic/Skipping.20the.20first.20block.20with.20missing.20chunks.20in.20an.20epoch.3F/near/419838610)
jancionear
added a commit
to jancionear/nearcore
that referenced
this issue
Feb 7, 2024
This test didn't work with stateless validation because it didn't propagate chunk state witnesses and endorsements as needed. I modified the test to use `TestEnv`, which allowed me to use the exisiting functionality for propagating all the partial chunk requests, state witnesses and endorsements. This changed the assignment of chunk and block producers. Previously one of the clients was a chunk producer and the other a block producer, with their roles swapped on each block. Now the roles are assigned based on the logic in `EpochManager`. IMO it doesn't affect the test, in the crucial moment when the bad chunk mask is tested, the chunk validator is different from the chunk producer, just like in the original test. The test manually produces a single chunk on shard 0, so I couldn't just use the default way of generating chunks during block processing. I had to implement a function that would manually produce the required chunk. Refs: near#10506
jancionear
added a commit
to jancionear/nearcore
that referenced
this issue
Feb 7, 2024
This test didn't work with stateless validation because it didn't propagate chunk state witnesses and endorsements as needed. I modified the test to use `TestEnv`, which allowed me to use the exisiting functionality for propagating all the partial chunk requests, state witnesses and endorsements. This changed the assignment of chunk and block producers. Previously one of the clients was a chunk producer and the other a block producer, with their roles swapped on each block. Now the roles are assigned based on the logic in `EpochManager`. IMO it doesn't affect the test, in the crucial moment when the bad chunk mask is tested, the chunk producer is different from the block producer, just like in the original test. The test manually produces a single chunk on shard 0, so I couldn't just use the default way of generating chunks during block processing. I had to implement a function that would manually produce the required chunk. Refs: near#10506
jancionear
added a commit
to jancionear/nearcore
that referenced
this issue
Feb 7, 2024
This test didn't work with stateless validation because it didn't propagate chunk state witnesses and endorsements as needed. I modified the test to use `TestEnv`, which allowed me to use the exisiting functionality for propagating all the partial chunk requests, state witnesses and endorsements. This changed the assignment of chunk and block producers. Previously one of the clients was a chunk producer and the other a block producer, with their roles swapped on each block. Now the roles are assigned based on the logic in `EpochManager`. IMO it doesn't affect the test, in the crucial moment when the bad chunk mask is tested, the chunk producer is different from the block producer, just like in the original test. The test manually produces a single chunk on shard 0 to achieve the desired chunk mask, so I couldn't just use the default way of generating chunks during block processing. I had to implement a function that would manually produce the required chunk. Refs: near#10506
jancionear
added a commit
to jancionear/nearcore
that referenced
this issue
Feb 7, 2024
This test didn't work with stateless validation because it didn't propagate chunk state witnesses and endorsements as needed. I modified the test to use `TestEnv`, which allowed me to use the exisiting functionality for propagating all the partial chunk requests, state witnesses and endorsements. This changed the assignment of chunk and block producers. Previously one of the clients was a chunk producer and the other a block producer, with their roles swapped on each block. Now the roles are assigned based on the logic in `EpochManager`. IMO it doesn't affect the test, in the crucial moment when the bad chunk mask is tested, the chunk producer is different from the block producer, just like in the original test. The test manually produces a single chunk on shard 0 to achieve the desired chunk mask, so I couldn't just use the default way of generating chunks during block processing. I had to implement a function that would manually produce the required chunk. Refs: near#10506
jancionear
added a commit
to jancionear/nearcore
that referenced
this issue
Feb 7, 2024
This test didn't work with stateless validation because it didn't propagate chunk state witnesses and endorsements as needed. I modified the test to use `TestEnv`, which allowed me to use the exisiting functionality for propagating all the partial chunk requests, state witnesses and endorsements. This changed the assignment of chunk and block producers. Previously one of the clients was a chunk producer and the other a block producer, with their roles swapped on each block. Now the roles are assigned based on the logic in `EpochManager`. IMO it doesn't affect the test, in the crucial moment when the bad chunk mask is tested, the chunk producer is different from the block producer, just like in the original test. The test manually produces a single chunk on shard 0 to achieve the desired chunk mask, so I couldn't just use the default way of generating chunks during block processing. I had to implement a function that would manually produce the required chunk. Refs: near#10506
github-merge-queue bot
pushed a commit
that referenced
this issue
Feb 8, 2024
This test didn't work with stateless validation because it didn't propagate chunk state witnesses and endorsements as needed. I modified the test to use `TestEnv`, which allowed me to use the exisiting functionality for propagating all the partial chunk requests, state witnesses and endorsements. This changed the assignment of chunk and block producers. Previously one of the clients was a chunk producer and the other a block producer, with their roles swapped on each block. Now the roles are assigned based on the logic in `EpochManager`. IMO it doesn't affect the test, in the crucial moment when the bad chunk mask is tested, the chunk producer is different from the block producer, just like in the original test. The test manually produces a single chunk on shard 0 to achieve the desired chunk mask, so I couldn't just use the default way of generating chunks during block processing. I had to implement a function that would manually produce the required chunk. Refs: #10506
This was referenced Feb 12, 2024
github-merge-queue bot
pushed a commit
that referenced
this issue
Feb 13, 2024
…10603) This test didn't work with stateless validation. It manually creates a chunk and blocks containing this chunk, but it didn't set the `chunk_endorsements` field in the blocks, because of which the blocks couldn't be accepted. Let's fix it by generating the correct chunk endoresment and setting the field in the generated blocks. Refs: #10506
cc. @shreyan-gupta to have another look at the current status of this task |
tayfunelmas
changed the title
[stateless_validation] Fix tests that fail stateless validation
[stateless_validation] Fix tests that fail with stateless validation enabled
Apr 12, 2024
github-merge-queue bot
pushed a commit
that referenced
this issue
May 6, 2024
To make chunks management tests work for stateless validation (#10506), I want to simplify them and fix the broken ones. One problem with these is that main assertions are hidden inside "peer_manager_mock" function which intercepts internal Block messages and checks that chunks are (not) included. This is very ugly. Instead, I create a separate loop which calls view_client GetBlock method sequentially for tested heights. That's the main part of change - unfortunately diff doesn't handle tabs, but you can find that checks move from `NetworkRequests::Block { block } => { ... }` to simple `async move { ... }`. And this allows to fix long-failing `chunks_recovered_from_full_timeout_too_short` tests. As pointed out on [Zulip](https://near.zulipchat.com/#narrow/stream/299922-nearone.2Fnode.2Fprivate.2Farchive/topic/.E2.9C.94.20oncall.20week.20Nov.206.202023/near/402138497), these tests never worked, they panicked because node test4 couldn't state sync, and once it got handled better, test4 stopped failing. Instead, we need to fix the assert condition. The check should be "if some messages to test4 are dropped, then we should assert that some chunks are missing in blocks it produces". This allows to replace `should_panic` with proper assertion. With `block.author` field we can look who produced block directly, instead of check `h % 4 == 3` which was in fact incorrect. Next steps are to handle stateless validation messages and see that everything still works. Could be nice to move that to TestLoop and make tests less expensive, but that's a long way. First step for that could be extending `multi.rs` with tests with assertions on sequences of blocks, not a single block. Nayduck https://nayduck.nearone.org/#/run/77 --------- Co-authored-by: Longarithm <the.aleksandr.logunov@gmail.com>
Closing this task. We can separately track any issues related to chunk management tests and other failing tests in nayduck. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
When trying to enable stateless validation, some of the integration tests are failing. I did some initial investigation and found all the problems are related to our testing infrastructure. It's impossible to fix all these in one go so I'm creating a task to track these.
Tasks
The text was updated successfully, but these errors were encountered: