Ensure predictable latency to apply chunks #11143

Closed
3 of 5 tasks
aborg-dev opened this issue Apr 24, 2024 · 3 comments
Labels
A-contract-runtime Area: contract compilation and execution, virtual machines, etc C-tracking-issue Category: a tracking issue T-contract-runtime Team: issues relevant to the contract runtime team

Comments

@aborg-dev
Contributor

aborg-dev commented Apr 24, 2024

This is a tracking issue for a top-level goal: ensure that the 99th percentile of apply-chunk latency across all shards stays below 800 ms on the recommended validator hardware on mainnet.
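
For anyone checking this goal against exported latency samples, here is a minimal sketch of the check. It assumes the nearest-rank definition of a percentile and that the target applies to each shard separately (the goal could also be read as pooling all shards). The function names and the input shape are hypothetical and not part of nearcore.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def shards_meeting_target(latencies_ms_by_shard, target_ms=800, pct=99):
    """Map each shard id to True if its percentile latency is below the target.

    latencies_ms_by_shard: hypothetical dict of shard id -> list of
    apply-chunk latencies in milliseconds.
    """
    return {
        shard: percentile(latencies, pct) < target_ms
        for shard, latencies in latencies_ms_by_shard.items()
    }
```

With 100 samples per shard, a single slow chunk still passes the check, while two slow chunks push the 99th percentile over the target.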

Setup


Workloads

@aborg-dev aborg-dev added A-contract-runtime Area: contract compilation and execution, virtual machines, etc T-contract-runtime Team: issues relevant to the contract runtime team C-tracking-issue Category: a tracking issue labels Apr 24, 2024
@aborg-dev
Contributor Author

aborg-dev commented Apr 24, 2024

Here is the VM instance: https://console.cloud.google.com/compute/instancesDetail/zones/europe-west4-a/instances/crt-mainnet?hl=en&project=nearone-crt. It uses this Terraform script: https://github.com/Near-One/infra-ops/pull/64

It is currently downloading blocks:

Apr 24 13:00:13 crt-mainnet neard[6014]: 2024-04-24T13:00:13.293735Z  INFO stats: #117520067 Downloading blocks 17.79% (2519 left; at 117520067) 32 peers ⬇ 10.0 MB/s ⬆ 4.23 MB/s 1.00 bps 958 Tgas/s CPU: 472%, Mem: 9.67 GB
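
If it helps to track sync progress outside Grafana, a line like this can be parsed with a couple of regular expressions. This is only a sketch: the patterns are inferred from the single log line above, not from a documented neard log format, and `parse_stats_line` is a hypothetical helper.

```python
import re

# Patterns inferred from the one sample line in this comment (an assumption).
SYNC_RE = re.compile(
    r"#(?P<head>\d+) Downloading blocks (?P<percent>\d+(?:\.\d+)?)% "
    r"\((?P<left>\d+) left; at (?P<at>\d+)\) (?P<peers>\d+) peers"
)
USAGE_RE = re.compile(
    r"(?P<bps>\d+(?:\.\d+)?) bps (?P<tgas>\d+(?:\.\d+)?) Tgas/s "
    r"CPU: (?P<cpu>\d+)%, Mem: (?P<mem_gb>\d+(?:\.\d+)?) GB"
)

SAMPLE = (
    "Apr 24 13:00:13 crt-mainnet neard[6014]: 2024-04-24T13:00:13.293735Z  "
    "INFO stats: #117520067 Downloading blocks 17.79% (2519 left; at 117520067) "
    "32 peers ⬇ 10.0 MB/s ⬆ 4.23 MB/s 1.00 bps 958 Tgas/s CPU: 472%, Mem: 9.67 GB"
)


def parse_stats_line(line):
    """Return the sync and resource fields of a stats line, or None if it does not match."""
    sync = SYNC_RE.search(line)
    usage = USAGE_RE.search(line)
    if sync is None or usage is None:
        return None
    return {
        "head": int(sync["head"]),
        "percent_done": float(sync["percent"]),
        "blocks_left": int(sync["left"]),
        "peers": int(sync["peers"]),
        "bps": float(usage["bps"]),
        "tgas_per_s": float(usage["tgas"]),
        "cpu_percent": int(usage["cpu"]),
        "mem_gb": float(usage["mem_gb"]),
    }
```

Calling `parse_stats_line(SAMPLE)` returns the head height, percent done, blocks left, peer count, and resource usage as numbers. Lines that do not match both patterns return `None`.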

I believe it should automatically appear on nearone Grafana, I'll share the link when it does.

@aborg-dev
Contributor Author

The node has been provisioned and is now running. Here are the relevant Grafana dashboards:

I already see some long chunk apply latencies on shards 2, 3 and 5.

I think we can start with a 2s threshold for the alert; I see 2 such events on shard 5 in the last 6 hours. I'll add some logging to the node to find the problematic block heights.
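
A rough sketch of the threshold check described above, assuming the extra logging yields one record per applied chunk. The record shape `(timestamp, shard_id, block_height, latency_s)` and the function name are hypothetical; the real alert would more likely be defined in Grafana.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def slow_applies(events, now, threshold_s=2.0, window=timedelta(hours=6)):
    """Group block heights whose apply latency exceeded the threshold by shard.

    events: iterable of (timestamp, shard_id, block_height, latency_s) tuples,
    a hypothetical record shape. Only events inside [now - window, now] count.
    """
    by_shard = defaultdict(list)
    for timestamp, shard_id, block_height, latency_s in events:
        in_window = now - window <= timestamp <= now
        if in_window and latency_s > threshold_s:
            by_shard[shard_id].append(block_height)
    return dict(by_shard)
```

The result maps each shard to the block heights worth investigating, so a shard with two slow applies in the window shows up with two heights.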

@aborg-dev
Contributor Author

We decided to decommission the node and instead focus on the FT benchmark, so closing this issue.
