Introduce a way to change capacity of the network on the fly in FT benchmark #11460

aborg-dev · 2024-06-03T11:14:44Z

At the moment, the only easy way to change the capacity of the network is to increase the gas_limit at startup.

For the FT throughput experiment, we want to find the largest capacity at which the network can still operate in a stable way. One way to achieve this is to be able to gradually increase the gas_limit over the course of the experiment until we reach a critical point.

It is already possible to increase gas_limit a bit at every block height, and we should try to leverage that for this experiment.

mooori · 2024-06-18T14:32:36Z

Current situation

It seems like gas_limit is read from the genesis config and then passed on without modification into the structs that carry the gas_limit, such as ChunkExtra, ShardChunkHeader, ApplyState, and others. In line with that, GAS_LIMIT_ADJUSTMENT_FACTOR is currently not used to change the gas_limit.

Proposal for approach to on the fly changes

When a chunk reaches the gas_limit and applying it takes less than (1 second - margin), the chunk producer increases gas_limit by GAS_LIMIT_ADJUSTMENT_FACTOR.
- When the chunk does not reach the gas_limit, predictions regarding capacity are more difficult to make, hence the gas_limit is not changed for now.
When applying a chunk takes more than (1 second + margin), the chunk producer reduces the gas_limit by GAS_LIMIT_ADJUSTMENT_FACTOR.
- Apply times larger than 1 second should be avoided, hence gas_limit is decreased without checking further conditions.
The apply time of 1 second is chosen since that is the current target for mainnet. The purpose of margin is to prevent gas_limit from flip-flopping around an equilibrium: apply times within margin of the target leave the gas_limit unchanged.
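
The rules above can be sketched as a small decision function. This is only an illustration under stated assumptions, not the nearcore implementation: `ChunkObservation`, `Adjustment`, `decide` and `next_gas_limit` are hypothetical names, "reaches the gas_limit" is approximated as `gas_used >= gas_limit`, and "by GAS_LIMIT_ADJUSTMENT_FACTOR" is interpreted as a step of `gas_limit / factor`.

```rust
use std::time::Duration;

/// Outcome of the heuristic for one applied chunk (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Adjustment {
    Increase,
    Decrease,
    Keep,
}

/// What the chunk producer observed while applying one chunk (hypothetical type).
pub struct ChunkObservation {
    pub apply_time: Duration,
    pub gas_used: u64,
    pub gas_limit: u64,
}

/// Decide how to adjust the gas limit after applying a chunk.
pub fn decide(obs: &ChunkObservation, target: Duration, margin: Duration) -> Adjustment {
    let lower = target.saturating_sub(margin);
    let upper = target + margin;
    if obs.apply_time > upper {
        // Too slow: decrease without checking further conditions.
        Adjustment::Decrease
    } else if obs.apply_time < lower && obs.gas_used >= obs.gas_limit {
        // Assumption: a chunk "reaches the gas_limit" when gas_used >= gas_limit.
        Adjustment::Increase
    } else {
        // Within the margin, or the chunk was not full: no prediction is made.
        Adjustment::Keep
    }
}

/// Apply the adjustment. Assumption: the step is gas_limit / factor.
pub fn next_gas_limit(gas_limit: u64, adjustment: Adjustment, factor: u64) -> u64 {
    // A factor of zero yields a step of zero instead of panicking.
    let step = gas_limit.checked_div(factor).unwrap_or(0);
    match adjustment {
        Adjustment::Increase => gas_limit.saturating_add(step),
        Adjustment::Decrease => gas_limit.saturating_sub(step),
        Adjustment::Keep => gas_limit,
    }
}
```

With a 1 second target and a 100 ms margin, a full chunk applied in 700 ms would lead to an increase, any chunk applied in 1.2 seconds to a decrease, and everything in between leaves the limit untouched.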

Underlying assumption

When running multi-node benchmarks, all nodes have the same hardware and configuration and should therefore be able to handle the same load. Hence if one node increases the gas_limit because it can handle more load, other nodes should be able to keep up.

Discussion

This is a rough heuristic, but it has the advantage of being independent of specific hardware and traffic. Therefore I think it can be a starting point for on the fly adjustments of the gas_limit.

Next steps

Implement a proof of concept for gas_limit adjustments and check if it achieves reasonable gas_limits.
- In a single node setup.
- In a multi node setup.
First it can be on a separate branch to verify the approach. If it succeeds, this functionality must be separated from production code.
Allow configuring GAS_LIMIT_ADJUSTMENT_FACTOR to reach the equilibrium gas_limit in benchmark runs quickly. This might be possible since congestion in benchmark runs does not hurt real world users. Still, it should be verified that congestion does not pollute benchmark results.
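
To get a feeling for why a configurable factor matters, the following sketch counts how many consecutive increases are needed to grow from a starting gas_limit to a target. It rests on the assumption that each block may raise the limit by `gas_limit / factor`, so a smaller factor means larger steps; `blocks_to_reach` is a hypothetical helper, not part of nearcore.

```rust
/// Number of consecutive per-block increases needed to grow `start` to at
/// least `target`, assuming each block adds `limit / factor` (an assumption
/// about how the adjustment factor is applied).
///
/// Returns `None` if no progress is possible (factor is zero, or the step
/// rounds down to zero).
pub fn blocks_to_reach(start: u64, target: u64, factor: u64) -> Option<u64> {
    if factor == 0 {
        return None;
    }
    let mut limit = start;
    let mut blocks = 0u64;
    while limit < target {
        let step = limit / factor;
        if step == 0 {
            // Integer division rounded the step to zero: the limit is stuck.
            return None;
        }
        limit = limit.saturating_add(step);
        blocks += 1;
    }
    Some(blocks)
}
```

Comparing, for example, `blocks_to_reach(1_000, 2_000, 10)` with `blocks_to_reach(1_000, 2_000, 100)` shows how much longer a benchmark run has to wait for equilibrium when the per-block step is small.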

Disclaimer

Adjusting the gas_limit on the fly touches on many concepts and places in the code base, so it might well be that I'm missing something here. I think it would be good to get started with something and build a better understanding of the topic while working on it. The heuristics for gas_limit adjustments can be refined later on in case this approach works.

@Akashin what do you think about this approach and the plan for the next steps?

aborg-dev · 2024-06-19T13:57:54Z

@pugachAG, @Longarithm - can you please suggest the right place to change the gas limit that the chunk producer proposes for the next chunk?

mooori · 2024-06-19T14:29:49Z

Notes from the offline discussion:

This feature is for benchmarks only; it is not intended to make it into production.
Due to periodic workloads (e.g. writing to disk), looking at a single block is not always sufficient to make reasonable gas_limit adjustments. More refined approaches could be:
- Looking at the last n blocks.
- Looking at apply chunk latency histograms.
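
The "last n blocks" idea could look roughly like the sliding window below. It is a sketch only: `ApplyTimeWindow` is a hypothetical type, and whether the decision should use the mean, the maximum, or some quantile of the window is left open here.

```rust
use std::collections::VecDeque;
use std::time::Duration;

/// Keeps the apply times of the most recent `capacity` chunks (hypothetical type).
pub struct ApplyTimeWindow {
    capacity: usize,
    samples: VecDeque<Duration>,
}

impl ApplyTimeWindow {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "window must hold at least one sample");
        Self { capacity, samples: VecDeque::with_capacity(capacity) }
    }

    /// Record a new apply time, evicting the oldest sample when full.
    pub fn record(&mut self, apply_time: Duration) {
        if self.samples.len() == self.capacity {
            self.samples.pop_front();
        }
        self.samples.push_back(apply_time);
    }

    /// True once `capacity` samples have been collected.
    pub fn is_full(&self) -> bool {
        self.samples.len() == self.capacity
    }

    /// Mean apply time over the window, or `None` if it is empty.
    pub fn mean(&self) -> Option<Duration> {
        if self.samples.is_empty() {
            return None;
        }
        let total: Duration = self.samples.iter().sum();
        Some(total / self.samples.len() as u32)
    }

    /// Slowest apply time in the window, or `None` if it is empty.
    pub fn max(&self) -> Option<Duration> {
        self.samples.iter().max().copied()
    }
}
```

A single slow block caused by a periodic disk write then moves the mean only slightly, while the maximum still exposes it. Waiting until `is_full()` before adjusting avoids reacting to the first few blocks after startup.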

pugachAG · 2024-06-20T15:24:09Z

@Akashin I suggest implementing that as part of the runtime if possible. Currently we set the same gas_limit on NewChunkResult. Instead, you can add gas_limit to ApplyChunkResult and then use that. That value will then be picked up by the chunk producer when creating a new chunk.

mooori · 2024-07-18T11:40:22Z

cc #11808

mooori · 2024-08-02T10:46:58Z

Status update 2024-08-02

A draft implementation is available here.
- The approach based on Prometheus quantiles is too sluggish, as buckets contain metrics for the node's entire uptime, whereas the decision to increase or decrease the gas limit depends mostly on recent chunk apply times.
- Looking only at the most recent chunk apply time and delayed receipts gas works well for me in local runs. I can start with a gas limit that is either too high or too low and the adjustment mechanism brings it to a reasonable stable state that maxes out node performance.
- Parameters need fine tuning, but before getting into that I suggest merging the feature into master.
Reviewing these changes will probably be easier when the work is split into two PRs:
- A PR that adds the configuration: feat(config): add config for dynamic gas_limit adjustment #11863
- A PR that adds the adjustment logic: TODO based on the draft linked above
As an alternative, adjusting the gas limit in RuntimeAdapter might be more future-proof, e.g. when benchmarks are run in a lightweight environment. However, as I understand it, the gas limit is currently an external parameter for the runtime, which is passed in here for each chunk. Doing the adjustment in RuntimeAdapter would require adding a mechanism for RuntimeAdapter to maintain or pass back the gas limit that it adjusted internally. That might be more intrusive on critical code paths than the current approach.

aborg-dev mentioned this issue Jun 3, 2024: [Tracking issue] Setup a continuous benchmark for single-shard throughput #11348 (Closed)

aborg-dev assigned mooori Jun 15, 2024

github-actions bot mentioned this issue Jul 1, 2024: Monthly issue metrics report #11690 (Open)

aborg-dev mentioned this issue Jul 3, 2024: [Tracking issue] Measure performance improvements of stateless validation on FT benchmark #11680 (Closed, 3 tasks)

mooori mentioned this issue Aug 2, 2024: feat(config): add config for dynamic gas_limit adjustment #11863 (Closed)

mooori mentioned this issue Aug 16, 2024: fix: remove contradictory gas limit check #11958 (Open)
