`KeyValueRuntime` created in `TestEnv::restart` is incomplete #8269

Longarithm · 2022-12-22T11:17:05Z

In `test_request_chunk_restart`, when `env.restart(0)` is called, the `Client` instance is recreated using the `setup_client` function. The apparent goal of the test is to check that a client can still respond to a partial encoded chunk request after a restart.

However, the `KeyValueRuntime` created there is incomplete. It stores all of its data, such as `hash_to_next_epoch`, only in memory and fills it in lazily when `get_epoch_and_valset` is called, so there is an implicit assumption that `get_epoch_and_valset` has already been called for all previous blocks. After a restart this assumption does not hold, and calling `cares_about_shard` leads to a crash: https://buildkite.com/nearprotocol/nearcore/builds/23485#018535ca-c681-4f03-9a9f-bce3ce014ad5
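To make the failure mode concrete, here is a heavily simplified toy model in Rust. It is not the real `KeyValueRuntime`: `ToyKeyValueRuntime`, its fields, the epoch arithmetic, and the shard rule are all hypothetical. It only reproduces the pattern described above: epoch data lives in an in-memory map that is filled as a side effect of `get_epoch_and_valset`, so a freshly constructed instance (as after a restart) cannot answer `cares_about_shard` for blocks it has never seen. Where the real code crashes, the toy returns `Err` so the behaviour can be checked without panicking.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins; the real nearcore types differ.
type BlockHash = u64;
type EpochId = u64;

/// Toy model of the pattern described in the issue: epoch data is kept
/// only in memory and is learned lazily, block by block.
struct ToyKeyValueRuntime {
    // Rough analogue of `hash_to_next_epoch`: filled lazily, never persisted.
    hash_to_epoch: HashMap<BlockHash, EpochId>,
    epoch_length: u64,
}

impl ToyKeyValueRuntime {
    fn new(epoch_length: u64) -> Self {
        Self { hash_to_epoch: HashMap::new(), epoch_length }
    }

    /// Computes the epoch of `block` and, as a side effect, records it.
    fn get_epoch_and_valset(&mut self, block: BlockHash) -> EpochId {
        let epoch = block / self.epoch_length;
        self.hash_to_epoch.insert(block, epoch);
        epoch
    }

    /// Only reads the in-memory map. Fails if `get_epoch_and_valset` was
    /// never called for `block` on this instance (the real code crashes here).
    fn cares_about_shard(&self, block: BlockHash, shard: u64) -> Result<bool, String> {
        let epoch = self
            .hash_to_epoch
            .get(&block)
            .ok_or_else(|| format!("epoch of block {block} is unknown"))?;
        // Arbitrary toy rule standing in for real shard assignment.
        Ok((epoch + shard) % 2 == 0)
    }
}

fn main() {
    let epoch_length = 5;

    // Before the restart: every block went through `get_epoch_and_valset`.
    let mut before = ToyKeyValueRuntime::new(epoch_length);
    for block in 0..12 {
        before.get_epoch_and_valset(block);
    }
    assert_eq!(before.cares_about_shard(11, 0), Ok(true));

    // "Restart": a brand-new runtime starts with an empty map.
    let restarted = ToyKeyValueRuntime::new(epoch_length);
    assert!(restarted.cares_about_shard(11, 0).is_err());

    // Replaying all previous blocks satisfies the implicit assumption again.
    let mut replayed = ToyKeyValueRuntime::new(epoch_length);
    for block in 0..12 {
        replayed.get_epoch_and_valset(block);
    }
    assert_eq!(replayed.cares_about_shard(11, 0), Ok(true));
    println!("ok");
}
```

The last part of `main` shows why the test only works while the implicit assumption holds: replaying `get_epoch_and_valset` over every earlier block restores exactly the state that a restart throws away.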


As discussed in #8193 (comment), flat storage creation makes more sense inside `Client` and `check_triggers`. `update_status` is a job that should be triggered periodically; it does not have to be tied to the completion of block processing. To support that, we introduce the config option `flat_storage_creation_period`, which defines how frequently the creation status update is triggered. Node owners can set it to a higher value if this work, which runs on the main thread, turns out to be time-consuming. We also fix `TestEnv::restart` slightly, because `cares_about_shard` can now be called on the newly created client, and that call fails as described here: #8269.

P.S. This makes #8254 unnecessary, because `Client` already has information about the validator signer, which is even more convenient.

## Testing

* test `test_flat_storage_creation` needed minor changes and still passes;
* https://nayduck.near.org/#/run/2811: nayduck test `python3 pytest/tests/sanity/repro_2916.py` now passes; without this change, a node crashed on restart while trying to create flat storage (FS) for a non-tracked shard.
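The periodic trigger described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from #8262: `PeriodicJob`, `poll`, and `toy_check_triggers` are hypothetical names, and `period` merely plays the role of `flat_storage_creation_period`. Time is passed in explicitly so the behaviour is deterministic.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper deciding whether a periodic job (standing in for the
/// flat storage creation status update) is due. `period` plays the role of
/// the `flat_storage_creation_period` config option.
struct PeriodicJob {
    period: Duration,
    last_run: Option<Instant>,
}

impl PeriodicJob {
    fn new(period: Duration) -> Self {
        Self { period, last_run: None }
    }

    /// Returns true, and records `now`, if the job has never run or if at
    /// least `period` has elapsed since the last run.
    fn poll(&mut self, now: Instant) -> bool {
        let due = match self.last_run {
            None => true,
            Some(last) => now.saturating_duration_since(last) >= self.period,
        };
        if due {
            self.last_run = Some(now);
        }
        due
    }
}

/// Toy analogue of a client tick: the job runs only when it is due,
/// independently of whether a block has just finished processing.
fn toy_check_triggers(job: &mut PeriodicJob, now: Instant, runs: &mut u32) {
    if job.poll(now) {
        // Placeholder for the real status update (hypothetical).
        *runs += 1;
    }
}

fn main() {
    let start = Instant::now();
    let mut job = PeriodicJob::new(Duration::from_secs(1));
    let mut runs = 0;
    // Simulate ticks every 300 ms, from 0 ms up to 2700 ms.
    for i in 0..10 {
        let now = start + Duration::from_millis(300 * i);
        toy_check_triggers(&mut job, now, &mut runs);
    }
    println!("status update ran {runs} times");
}
```

Because the job is gated on elapsed time rather than on block processing, raising the period directly reduces how often the update runs on the main thread.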


Longarithm added the `T-core` label (Team: issues relevant to the core team) on Dec 22, 2022.

Longarithm mentioned this issue on Dec 22, 2022 in "refactor: move background flat storage creation to Client" #8262 (Merged).
