[Tracking issue] Setup a continuous benchmark for single-shard throughput #11348

aborg-dev · 2024-05-20T14:11:06Z

This is a tracking issue for setting up a benchmark that runs every day on the latest version of nearcore, with the purpose of measuring the throughput we can achieve on a representative workload.
It is a first step towards #10885.

At the moment we are focusing on the following setup:

NEAR network with a single shard
Traffic generated with Locust Fungible Token (FT) benchmark
The size of the FT contract needs to be big enough to expose costs of storage operations similar to the ones on mainnet
Network settings close to mainnet (e.g. epoch length of ~24 hours)

The current findings are described in this document.


aborg-dev · 2024-05-20T14:51:51Z

A related issue about the same benchmark with multiple shards: #11347

This PR aims to provide an MVP of the FT transfer benchmark (#11348). This `.sh` script restarts locust and neard if there are changes on the remote. It will be run on one of our VMs. Later it should probably be triggered by a CI job, but for now we can set up a cron job that runs it from time to time. Co-authored-by: Viktar Makouski <viktar@neaar.org>
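The restart-on-change logic described above can be sketched roughly as follows. This is a minimal Python illustration, not the `.sh` script from the PR: the function names, the `master` branch default, and the `restart` callback are assumptions made for the example. Only standard `git` subcommands (`fetch`, `rev-parse`, `merge --ff-only`) are used.

```python
import subprocess
from typing import Callable, Sequence


def git_output(args: Sequence[str], repo: str) -> str:
    """Run a git command inside `repo` and return its trimmed stdout."""
    result = subprocess.run(
        ["git", "-C", repo, *args],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def remote_has_changes(
    repo: str,
    branch: str = "master",  # assumption: the branch being tracked
    run: Callable[[Sequence[str], str], str] = git_output,
) -> bool:
    """Fetch the remote branch and report whether it differs from local HEAD."""
    run(["fetch", "origin", branch], repo)
    local = run(["rev-parse", "HEAD"], repo)
    remote = run(["rev-parse", f"origin/{branch}"], repo)
    return local != remote


def restart_if_changed(
    repo: str,
    restart: Callable[[], None],  # hypothetical hook that restarts neard and locust
    branch: str = "master",
    run: Callable[[Sequence[str], str], str] = git_output,
) -> bool:
    """Fast-forward to the remote and call `restart` only if there are new commits."""
    if not remote_has_changes(repo, branch, run):
        return False
    run(["merge", "--ff-only", f"origin/{branch}"], repo)
    restart()
    return True
```

A cron job could call `restart_if_changed` periodically; when nothing changed on the remote, the call does nothing and returns `False`.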

MCJOHN974 · 2024-05-24T17:02:21Z

Status update:

It currently doesn't work; I hope #11395 will fix that. The setup I'm trying to start is pretty simple: 1 node, 1 shard, locust and neard on the same machine, FT contract state flashing on each commit.

MCJOHN974 · 2024-05-28T17:53:56Z

Update: we now have a continuous benchmark; see #11404 for a description.

aborg-dev · 2024-06-12T08:14:49Z

A summary of the current progress:

We have run the benchmark with multiple configurations:
- With 1, 2 and 6 shards - the throughput scales roughly linearly for this workload
- With 10 PGas gas limit - the throughput scales roughly linearly for this workload
- With in-memory trie enabled - we didn't see any noticeable improvement at the current state size; this needs further investigation
- On different hardware - the throughput heavily depends on it, with disk and CPU as the bottlenecks
We have a database to store benchmark results and a Grafana dashboard visualizing them
Experimented with creating a larger state for the FT contract; this needs further optimization to be performant enough
Set up automated runs of the benchmark every day
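One way to make "scales roughly linearly" checkable is to compare throughput per shard against the smallest configuration measured. The sketch below is an illustration only: the function names and the 20% tolerance are assumptions, and it carries no benchmark numbers of its own.

```python
from typing import Dict


def scaling_efficiency(tps_by_shards: Dict[int, float]) -> Dict[int, float]:
    """Per-shard throughput relative to the smallest shard count measured.

    1.0 means perfectly linear scaling; values below 1.0 mean sub-linear.
    """
    if not tps_by_shards:
        raise ValueError("need at least one measurement")
    base_shards = min(tps_by_shards)
    base_per_shard = tps_by_shards[base_shards] / base_shards
    return {
        shards: (tps / shards) / base_per_shard
        for shards, tps in sorted(tps_by_shards.items())
    }


def is_roughly_linear(tps_by_shards: Dict[int, float], tolerance: float = 0.2) -> bool:
    """True if every configuration is within `tolerance` of linear scaling.

    The default tolerance is an arbitrary choice for this sketch.
    """
    efficiencies = scaling_efficiency(tps_by_shards)
    return all(abs(e - 1.0) <= tolerance for e in efficiencies.values())
```

Feeding it the measured transactions per second keyed by shard count (for example for the 1, 2 and 6 shard runs) gives one efficiency value per configuration that can be tracked over time.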

Next steps:

Publish benchmark results to DB and Grafana automatically - @MCJOHN974 working on this
Make benchmark more representative
- Larger contract state size - @Ekleog working on this
- Larger network (>5 validators) - @Akashin working on this
Investigate disk bottleneck at high throughput
Automate search of optimal gas limit - @mooori working on this
Quantify effects of in-memory trie

aborg-dev · 2024-06-19T15:12:17Z

A summary of the current progress:

@Akashin and @Ekleog prepared a larger contract state for the benchmark (5 GB of contract size, 250 GB of on-disk state): "Identify and document the setup for large contract state for Locust FT benchmark" #11359
@MCJOHN974 together with @mooori managed to send the first data point to the benchmark DB from GCP using the end-to-end script: "[ft-benchmark] some fixes for benchmark infra" #11604
@Akashin measured the effects of the in-memory trie in a setup with no trie caches and minimal state - it yields ~30% throughput improvement
@Akashin submitted Terraform scripts for setting up machines for a multi-node run: https://github.com/Near-One/infra-ops/pull/133

Next steps:

Find interesting workloads to test with a large contract size. This is not trivial: the workload needs to keep the working set large, otherwise all state will be cached and we will see no difference from a setup with no state (@Akashin)
Set up a cron job to run the end-to-end script on GCP every day (@MCJOHN974)
Run an experiment with a larger gas limit of 3 PGas (@MCJOHN974)
Prototype automatic gas limit adjustment for "Introduce a way to change capacity of the network on the fly in FT benchmark" #11460 (@mooori)
Run the benchmark with multiple validator nodes (@Akashin)
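To illustrate the working-set concern above: if senders and receivers are drawn uniformly from the whole account population, the set of accounts touched keeps growing towards the full population instead of staying in a small cached subset. The sketch below is a hypothetical helper, not part of the Locust FT benchmark; the function names and integer account indices are assumptions.

```python
import random
from typing import Tuple


def pick_transfer_pair(rng: random.Random, num_accounts: int) -> Tuple[int, int]:
    """Pick distinct sender and receiver indices uniformly from all accounts."""
    if num_accounts < 2:
        raise ValueError("need at least two accounts")
    sender = rng.randrange(num_accounts)
    # Draw from the remaining accounts and shift past the sender to keep them distinct.
    receiver = rng.randrange(num_accounts - 1)
    if receiver >= sender:
        receiver += 1
    return sender, receiver


def distinct_accounts_touched(
    rng: random.Random, num_accounts: int, num_transfers: int
) -> int:
    """Count how many different accounts a run of uniform transfers touches."""
    touched = set()
    for _ in range(num_transfers):
        touched.update(pick_transfer_pair(rng, num_accounts))
    return len(touched)
```

Comparing `distinct_accounts_touched` against the number of entries the trie caches can hold gives a rough sense of whether a workload would be served mostly from cache.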

aborg-dev · 2024-07-03T09:13:57Z

To summarize latest progress:

We now have a benchmark that runs every 2 hours and is visualized on Grafana (by @MCJOHN974)
The large contract state is ready and can be used in the benchmark (@Akashin)
- An example run can be found at https://grafana.nearone.org/goto/TYz8A3wIR?orgId=1
- It behaves as expected and slows down the network operation
- We still need to tune the size of the state that we actually want to use and enable it in the periodic benchmark runs
The work on automatic gas limit is in progress and we have a prototype that searches for optimal limit (@mooori)
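The thread does not describe how the prototype searches for the optimal gas limit. As one possible approach, the sketch below uses a binary search over candidate limits, assuming that sustainability is monotonic (if a limit is sustainable, every smaller limit is too). The `is_sustainable` callback, which would run the benchmark at a given limit and report whether the network keeps up, is hypothetical, as are the function name and the integer units.

```python
from typing import Callable


def find_max_sustainable_gas_limit(
    lo: int,
    hi: int,
    is_sustainable: Callable[[int], bool],
    step: int = 1,
) -> int:
    """Return the largest limit lo + k * step <= hi for which is_sustainable holds.

    Assumes monotonicity: once a limit is unsustainable, all larger ones are too.
    Raises ValueError if even `lo` is not sustainable.
    """
    if step <= 0 or hi < lo:
        raise ValueError("need step > 0 and hi >= lo")
    if not is_sustainable(lo):
        raise ValueError("lower bound is not sustainable")
    low_k, high_k = 0, (hi - lo) // step
    # Invariant: lo + low_k * step is sustainable.
    while low_k < high_k:
        mid = (low_k + high_k + 1) // 2
        if is_sustainable(lo + mid * step):
            low_k = mid
        else:
            high_k = mid - 1
    return lo + low_k * step
```

Each call to `is_sustainable` corresponds to a full benchmark run, so a binary search keeps the number of runs logarithmic in the number of candidate limits. If the measurements are noisy, the monotonicity assumption may not hold and a more conservative strategy would be needed.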

I think we can close this off and continue the future work in #11680:

Running a benchmark with multiple nodes
Using optimal gas limit
Using appropriate state size

aborg-dev self-assigned this May 20, 2024

aborg-dev mentioned this issue May 20, 2024: Metric: number of fungible token transfers #10885 (Open)

MCJOHN974 mentioned this issue May 24, 2024: [ft transfer benchmark] Restart locust script #11391 (Merged)

MCJOHN974 self-assigned this May 24, 2024

aborg-dev mentioned this issue Jun 15, 2024: Investigate high memory and disk usage for FT benchmark #11585 (Closed)

aborg-dev mentioned this issue Jun 27, 2024: [Tracking issue] Measure performance improvements of stateless validation on FT benchmark #11680 (Closed, 3 tasks)

aborg-dev closed this as completed Jul 3, 2024
