Rework resource metrics (#472 #472

Merged: drewstone merged 10 commits into develop from felix/rework-resource-metrics on Apr 24, 2023


@Nutomic (Contributor) commented Apr 20, 2023

Summary of changes

  • The previous Grafana dashboard was broken for an unknown reason, so I recreated it from scratch
  • The dashboard now includes resource metrics
  • The chain_account_balance metric was never written to; it is set now

Reference issue to close (if applicable)


Code Checklist

  • Tested
  • Documented

@Nutomic force-pushed the felix/rework-resource-metrics branch from fc77fcf to 91c2f4d on April 20, 2023 10:31
Cargo.toml (outdated, resolved)
let total_gas_spent = register_counter!(opts!(
"resource_total_gas_spent",
"Total number of gas spent on resource",
labels!("resource_id" => &resource_name)
Collaborator

I like the idea of using labels here for resource_ids!
What if we went even further and made it more readable with the following:

  1. The API stays the same and still operates on resource IDs, but we add several labels:
    chain_type (evm, substrate, etc.) and chain_id, a separate label holding the chain ID;
    target_system_type (contract, tree_id, etc.) and target_system_value (0xAe91...e28D, 4, etc.).

What do you think? We could then filter by labels, e.g. chain_type evm, chain_id 5, target_system_type contract, and target_system_value 0xAe91...e28D.
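The label split proposed in this comment can be sketched in plain Rust. This is a minimal, std-only illustration, not the relayer's code: `ResourceLabels`, `render`, and `matches` are hypothetical names. With the prometheus crate, the same name/value pairs would be passed as const labels, as the `labels!` call in the reviewed snippet does. `matches` mimics PromQL equality matchers such as `{chain_type="evm", chain_id="5"}`.

```rust
use std::collections::BTreeMap;

/// Hypothetical per-resource label set, split into the separate
/// dimensions proposed in the review.
struct ResourceLabels {
    chain_type: &'static str,         // e.g. "evm" or "substrate"
    chain_id: u64,                    // chain ID within that chain type
    target_system_type: &'static str, // e.g. "contract" or "tree_id"
    target_system_value: String,      // e.g. a contract address or a tree ID
}

impl ResourceLabels {
    /// Returns the labels as a name -> value map, sorted by name.
    fn to_map(&self) -> BTreeMap<&'static str, String> {
        BTreeMap::from([
            ("chain_type", self.chain_type.to_string()),
            ("chain_id", self.chain_id.to_string()),
            ("target_system_type", self.target_system_type.to_string()),
            ("target_system_value", self.target_system_value.clone()),
        ])
    }

    /// Renders the labels like a Prometheus series selector, e.g.
    /// {chain_id="5",chain_type="evm",...}. Values are not escaped here.
    fn render(&self) -> String {
        let pairs: Vec<String> = self
            .to_map()
            .iter()
            .map(|(name, value)| format!("{name}=\"{value}\""))
            .collect();
        format!("{{{}}}", pairs.join(","))
    }

    /// True if every (name, value) pair in `selector` equals this label
    /// set's value for that name, like equality matchers in PromQL.
    fn matches(&self, selector: &[(&str, &str)]) -> bool {
        let labels = self.to_map();
        selector
            .iter()
            .all(|(name, value)| labels.get(*name).map(String::as_str) == Some(*value))
    }
}
```

Because `chain_type` is its own label, two chains that share a numeric chain ID remain distinct series; a dashboard filters on both labels rather than on `chain_id` alone.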

Contributor Author

I will give this a try. I'm not entirely sure how it will work with Grafana if, for example, two different chains have identical chain_ids, but it can probably be made to work.

Collaborator

@shekohex left a comment

Looks good! I left some comments.


crates/relayer-utils/src/metric.rs (outdated, resolved)
)))?;
metrics
.account_balance_entry(typed_chain_id)
.set(balance.data.free as f64);
Contributor Author (@Nutomic, Apr 24, 2023)

I think I mentioned before that the error handling in the vanchor files is really awkward. We should change it to return a real error type instead of using map_err() to convert errors to String everywhere.
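A minimal sketch of what a dedicated error type could look like, assuming the handler parses a balance and updates a per-chain gauge. `HandlerError`, its variants, and `set_account_balance` are hypothetical, and the `HashMap` stands in for the real metric handles. Implementing `From` for each underlying error lets `?` convert it automatically, so no `map_err()` to `String` is needed.

```rust
use std::collections::HashMap;
use std::fmt;
use std::num::ParseIntError;

/// Hypothetical error type for a metrics-updating handler.
#[derive(Debug)]
enum HandlerError {
    /// The raw balance could not be parsed as an integer.
    InvalidBalance(ParseIntError),
    /// No balance gauge is registered for this chain.
    UnknownChain(u64),
}

impl fmt::Display for HandlerError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            HandlerError::InvalidBalance(e) => write!(f, "invalid balance: {e}"),
            HandlerError::UnknownChain(id) => write!(f, "no balance gauge for chain {id}"),
        }
    }
}

impl std::error::Error for HandlerError {}

// Lets `?` turn a ParseIntError into a HandlerError without map_err().
impl From<ParseIntError> for HandlerError {
    fn from(e: ParseIntError) -> Self {
        HandlerError::InvalidBalance(e)
    }
}

/// Parses `raw_balance` and stores it in the gauge for `chain_id`.
/// `gauges` is a hypothetical stand-in for real metric handles.
fn set_account_balance(
    gauges: &mut HashMap<u64, f64>,
    chain_id: u64,
    raw_balance: &str,
) -> Result<(), HandlerError> {
    let balance: u128 = raw_balance.parse()?; // ParseIntError -> HandlerError via From
    let gauge = gauges
        .get_mut(&chain_id)
        .ok_or(HandlerError::UnknownChain(chain_id))?;
    *gauge = balance as f64;
    Ok(())
}
```

Callers still get a readable message through `Display`, but they can also match on the variant instead of inspecting a string.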

@Nutomic force-pushed the felix/rework-resource-metrics branch from f46d9b4 to 0efa051 on April 24, 2023 14:13

fn wei_to_gwei(wei: u128) -> f64 {
    // 1 gwei = 10^9 wei. In Rust `10 ^ 9` is bitwise XOR (== 3), so use the literal.
    (wei / 1_000_000_000) as f64
}
Contributor Author

I didn't find any built-in function in subxt for this.

Collaborator

We can use a similar approach to the one in #471; I think it is okay to use the ethers utils here.
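Whether the conversion comes from ethers (which ships unit-formatting helpers such as `ethers::utils::format_units`) or is written by hand, it is small. Below is a hand-written, std-only sketch; the constant name and the whole/fractional split are assumptions, not necessarily what the PR ended up using. Note that in Rust `^` is bitwise XOR, so `10 ^ 9` evaluates to 3, not one billion.

```rust
/// Wei per gwei (10^9), written as a literal because `10 ^ 9` is XOR in Rust.
const WEI_PER_GWEI: u128 = 1_000_000_000;

/// Converts wei to gwei as f64. The whole-gwei part and the sub-gwei
/// remainder are converted separately, so the fractional part is kept
/// instead of being truncated by integer division.
fn wei_to_gwei(wei: u128) -> f64 {
    let whole = wei / WEI_PER_GWEI;
    let frac = wei % WEI_PER_GWEI;
    whole as f64 + frac as f64 / WEI_PER_GWEI as f64
}
```

If truncating to whole gwei is acceptable for a metric, `(wei / WEI_PER_GWEI) as f64` is enough; keeping the remainder matters when values below 1 gwei are common.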

@Nutomic (Contributor Author) commented Apr 24, 2023

This is finished. Tests keep timing out in CI but they are passing locally.

@Nutomic requested a review from shekohex on April 24, 2023 14:29
@drewstone changed the title from "Rework resource metrics" to "Rework resource metrics (#472" on Apr 24, 2023
@drewstone merged commit e768ad6 into develop on Apr 24, 2023
@drewstone deleted the felix/rework-resource-metrics branch on April 24, 2023 22:20
shekohex added a commit that referenced this pull request Jun 8, 2023
* Fast sync and merkle root validation (#354)

Co-authored-by: shekohex <dev+github@shadykhalifa.me>

* Update the CI to run on develop too (#362)

* Get the Eth2 Substrate hooked in to the relayer (#335)

Co-authored-by: Drew Stone <drewstone329@gmail.com>

* relayer transfer test (#365)

* relayer transfer test

* fix test

* use relayer for transferring assets

* add comments

* Update Relayer configuration docs and Examples (#369)

Co-authored-by: Dustin Brickwood <dustinbrickwood204@gmail.com>
Co-authored-by: salman01zp <pathansalman555@gmail.com>

* Substrate governor update  (#371)

* send nonce+dkg_key

* governor update test

* update packages and tests

* downgrade nightly

* fix tests

* use typed chainId as Bridge key

* skip transfer test for now

* yarn format

---------

Co-authored-by: shekohex <dev+github@shadykhalifa.me>

* Bridge Registry pallet integration (#355)

Co-authored-by: shekohex <dev+github@shadykhalifa.me>
Co-authored-by: drewstone <drewstone329@gmail.com>
Co-authored-by: Salman Pathan <pathansalman555@gmail.com>

* fix dkd node connection (#378)

* Add etherscan api configuration (#372)

* Change how DKG client for bridge registry is initialized (#379)

* Update webb-rs (#400)

* Refactor service.rs into smaller modules (#405)

* Refactor service.rs into smaller modules

* make substrate related crates feature flagged

* Update the imports

* Split the services by the system

* Update Cargo.lock

* Add `assets` to the Relayer config (#407)

* delete unused file

* Add assets section to the config

* Update configuration in tests

* Update example config with docs

* Make use of assets in the config when the token is not found on the price oracle

* Update the returned status code for errors

* Update docs

* Hardcode the secret phrase

* Rename Fields with serde while serialization to be `camelCase` (#409)

* Rename Fields with serde while serialization to be `camelCase`

* more fields to change

* Update `chain-id` to `chainId`

* Update `chain-id` and `tree-id`

* rename `resource-id`

* create defaults module

* better use of serde's `rename_all` feature

* be more consistent with the signing backend.

* Update docs and simple config files to reflect the new changes

* fix the code with the new changes

* [feat] Introduce Price Oracle Backends (#411)

Co-authored-by: drewstone <drewstone329@gmail.com>

* Add more clippy checks to avoid potential runtime crashes (#408)

* Get rid of unwrap usage

* Add more clippy checks

* add ReadSubstrateStorageError

* fixes

* review changes

* feat: Improve the `GasOracle` usage (#414)

* Handle webb tokens (#415)

* support webbAlpha

* use VAnchor contract

* handle webbtTNT-standalone

* trigger actions

* Add webbStandalone

* [feat] Add Webb Chains Info crate (#416)

* delete unused mod.rs file

* Add Webb Chains Info crate

* make use of the new package

* fix the chain id for mumbai

* abstract away coingecko specific ids

This hides the complexity of CoinGecko's coin IDs from the original
trait; we can now map them internally in the
backend implementation.

* make clippy happy

* support more chains

* update wrapped token names

* [chore] Create CODEOWNERS (#446)

Adds [Codeowners](https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-code-owners) file

* Fix asset transfer test (#445)

* fix asset transfer test

* trigger actions

* Use bridge registry for substrate chains (#447)

* use bridge registry for substrate

* cargo.lock file

* Attempt to speedup CI steps (ref #449) (#454)

* Bridge registry integration tests (#457)

* bridge registry test

* revert test timeout

* update test

* update doc

* Prometheus/Grafana setup (#413)

* Basic Prometheus/Grafana setup

* fix metrics endpoint

* Add full setup instructions

* docker volume, improved instructions

* Create basic Grafana dashboard with all available metrics (#463)

* Use provisioning to setup Grafana dashboard and alerts (#466)

* Add Webb Orbit Chains to the Supported Chains (#469)

* fix: `u64` overflow when updating relayer metrics (#471)

* Use transaction queue for DKG/Substrate (fixes #450) (#461)

* Use transaction queue for DKG/Substrate (fixes #450)

* don't need chain id param

* address review comments

* add proof-generation crate for masp integration (#467)

* add proof-generation crate for the masp

* add a negative test case for batch-tree updating. It is (incorrectly) able to generate a proof with invalid inputs, but the proof doesn't verify

* removing prints

* clippy

* clippy

* add cargo.lock

* update proof-delegation to receive new transaction parameters

* ignore test during CI (since it doesn't download fixtures)

* update dvc file for solidity-fixtures

* up protocol-solidity version

* update lockfile

* revert vanchor fixture

* Update crates/circom-proving package name

Co-authored-by: shekohex <dev+github@shadykhalifa.me>

* change circom-proving to webb-circom-proving

* fmt

* clippy

* re fmt

* add cargo.lock

* downgrade protocol-solidity verifier contracts

* add lockfile...

---------

Co-authored-by: drewstone <drewstone329@gmail.com>
Co-authored-by: shekohex <dev+github@shadykhalifa.me>

* Rework resource metrics (#472

* Rework resource metrics (ref #387)

* Use Display trait for ResourceId

* split resource metric labels into separate values

* rework chain balance metrics

* fix clippy

* update dashboard

* fix overflow

* consistently use gwei for certain metrics fields

* Remove dynamic tx from relayer (#477)

* fix: DKG Signing Backend

* remove dynamic txs

* remove dynamic payload type

---------

Co-authored-by: Shady Khalifa <dev+github@shadykhalifa.me>

* Grafana alerts provisioning (#479)

Co-authored-by: Salman Pathan <pathansalman555@gmail.com>

* Run CI integration tests in parallel (#458)

* Run CI integration tests in parallel

* remove cargo build step

* fix test filter

* skip mixer tests

* disable `fail-fast`

---------

Co-authored-by: Salman Pathan <pathansalman555@gmail.com>
Co-authored-by: drewstone <drewstone329@gmail.com>
Co-authored-by: shekohex <dev+github@shadykhalifa.me>

* Try using shared key for rust cache to speedup CI (#483)

* Try using shared key for rust cache to speedup CI

* try cache linux check

* cache dvc, linux unit tests

* fix dvc cache path

* don't cache cargo cross builds

* Changes to CI conditions, push new commits on develop to `edge` tag (#480)

* Don't run checks on main/dev branch

* Push edge image from develop branch

* Fix name of docker edge tag (#487)

* fix and detect Leaf Cache flaky tests on EVM (#485)

* use hex values while printing the leaves in the logs

* update the types for the leaf cache response

* compare strings instead of bytes to reproduce the bug

* optimize the CI even more

* make the test retry 3 times just in case

* make the test predictable

* improve the lints and use the cache

* optimize the ci

* optimize the ci and update names

* fixing group names

* make group unique

* run integration tests on gnu linux instead of musl

* add cache to linux unit testings

* Update relayer to node v18 (#465)

* update to node v18

* update nvmrc

* update dkg-types

* update webb.js

* fix yarn.lock and update solidity packages

* update solidity packages

* update governor type change

---------

Co-authored-by: shekohex <dev+github@shadykhalifa.me>

* Merge relayer and grafana docker, add setup documentation (#486)

* Merge relayer and grafana docker, add setup documentation

* mention cronjob, remove old readme

---------

Co-authored-by: shekohex <dev+github@shadykhalifa.me>
Co-authored-by: drewstone <drewstone329@gmail.com>

* Use tangle runtime metadata (#462)

* use tangle runtime

* update configs

* localTangle

* update tests

* use local chainspec for tangle

* update tangle node and pallet idx

* tangle docker image for integration tests

* Merge remote-tracking branch 'origin/develop' into salman/tangle-runtime-metadata

* use tangle-substrate-types

* update docker image

* update images

* update tests

* update test

* remove .only

* fix evm-substrate cross chain test

---------

Co-authored-by: shekohex <dev+github@shadykhalifa.me>
Co-authored-by: drewstone <drewstone329@gmail.com>

* Setup Development env using Nix and Flakes (#492)

* fix: Add RetryClient to the EVM Providers

* bugfix: Custom Retry Policy for EVM providers (#497)

* bugfix: building MerkleTree history (#500)

* Update The logic for Merkle Tree building

* add dvc

* Update Example to use WebbAlpha contract address

* Use the `chain_id` when possible instead of refetching it

* Make the default `max_blocks_per_step` 500 blocks/step

* Correctly build the merkle tree in the leaves handler

* Remove the unused imports

* Cache the EVM Providers instead of creating new ones every time

* Change the way we return the cached leaves

* Update the error handling

* Update the leaves endpoint

* Make use of the correct ChainId type, instead of strings (for EVM)

* Do not fetch the chainId nor the latest block number when not needed

* Update the service file with the new changes

* Optimize the way we fetch events. (see details)

The old implementation fetched the latest block number at each iteration,
calculated the destination block number, and then fetched the events. As
long as we had not reached the latest block (which is the case most of the
time), we repeated the process without a cooldown. This was a waste of
resources.

The new implementation fetches the latest block number once and does not
update it until we have fully caught up with that block. This way we can
fetch the events in fewer calls, without cooling down, until we reach the
latest block. Only then do we know we are fully synced, and only then do
we update the latest block number.

From now on, fetching the latest block should not be in a hot loop,
as we only fetch it after `polling interval` milliseconds.

* Use gmp if we are running tangle/dkg locally

* Formatting

* Update the tests with the new changes

* Make use of RustCache in CI

* Run `cargo fmt`

* Add a way to save the target block number

* Make use of the `target_block_number` feature

* Run `cargo fmt`

* Ignore the .env file in the root of the repo

* clear the logs when not needed

* Fees for substrate (#375)

Co-authored-by: shekohex <dev+github@shadykhalifa.me>
Co-authored-by: Salman Pathan <pathansalman555@gmail.com>
Co-authored-by: drewstone <drewstone329@gmail.com>

* add avalanche fuji testnet to supported chains (#505)

* Fix and make DKG tests pass CI (#511)

* dkg test on CI

* Update CI: build relayer before executing test

* update and fix dkg tests

* use docker host

* add signature bridge test

* use docker host

* update signature bridge test

* update metadata and fix test

* remove test timeout

* Fix partial writes (#517)

* Update relayer endpoints (#519)

* update relayer leaf caching and info endpoints

* update relayer leaf caching and info endpoint

* update doc

* Trigger actions

* update info response and doc

* update dev branch version

* update build info

* feat: Proposals Queue (#515)

* flake.lock: Update

Flake lock file updates:

• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/44f30edf5661d86fb3a95841c35127f3d0ea8b0f' (2023-05-02)
  → 'github:NixOS/nixpkgs/d4825e5e4ac1de7d5bb99381534fd0af3875a26d' (2023-05-16)
• Updated input 'rust-overlay':
    'github:oxalica/rust-overlay/d59c3fa0cba8336e115b376c2d9e91053aa59e56' (2023-05-03)
  → 'github:oxalica/rust-overlay/65c3f2655f52a81e1b3e629d4c07df4873d0f2bb' (2023-05-16)

* Base case testing

* Remove `ProposalsStore`

* Add more logging

* Add ProposalsQueue

* Move the crate to the workspace

* Added ProposalQueue with Policies

* Move `parking_lot` to the workspace

* Remove the Queue type from the PSB

* Simplify the Proposal Queue

* Working Nonce and Time delay Policies

* Working on fixing the current impl

* Working on fixing the current Nonce Policy Impl

* Oops

* updates that make it work

* cargo fmt

* fix clippy

* Update CODEOWNERS

* remove unused test files

* feat(tests): add `SmartAnchorUpdatesConfig` to `LocalChain` configuration options

SUMMARY:
* Added a new configuration option smartAnchorUpdates to the ExportedConfigOptions type in the localTestnet.ts file.
* Added the SmartAnchorUpdatesConfig interface in the webbRelayer.ts file.

* add simulation

* fix failing test

* Trigger CI

* fix tests

---------

Co-authored-by: Salman Pathan <pathansalman555@gmail.com>

* Handle substrate node disconnection (#522)

* create new client if connection drops

* remove logs

* remove unused dependency

* clippy fix

* remove unused deps

* Fix error handling (#523)

* fix: Move tx signing logic to the substrate transaction queue (#525)

* Add logging to the create proposal handler

* Make substrate chain id generic

* Move the signing logic to the tx queue

* Update the logging format

* Add Type-Erased Static Transaction Payload for Substrate

* cargo fmt

* Use `TypeErasedStaticTxPayload` for Signature Bridge Watcher

* fix substrate failing tests

* Add Timelag middleware and code cleanup (#527)

* Update Orbit Network chainIds (#532)

* Bundle Multiple EVM Providers (#531)

* bundle multiple providers in retry client

* update doc

* undo quorum changes

* multi provider client

* use multi provider client

* error handling and update readme doc

* update docs in config files and incorporate review feedback

* Prepare to Release stable version 0.5.0 (#533)

* Release 0.5.0

---------

Co-authored-by: Salman Pathan <pathansalman555@gmail.com>
Co-authored-by: Thomas Braun <38082993+tbraun96@users.noreply.github.com>
Co-authored-by: Drew Stone <drewstone329@gmail.com>
Co-authored-by: Dustin Brickwood <dustinbrickwood204@gmail.com>
Co-authored-by: Nutomic <me@nutomic.com>
Co-authored-by: Semar Augusto <semaraugusto@dcc.ufmg.br>
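The "abstract away coingecko specific ids" commit in the list above describes keeping CoinGecko's coin IDs out of the price-oracle trait and mapping them inside the backend. A std-only sketch of that design: `PriceBackend`, `CoinGeckoBackend`, and `get_price_usd` are hypothetical names, the in-memory price table stands in for HTTP calls, and the symbol-to-ID entries are illustrative only.

```rust
use std::collections::HashMap;

/// Hypothetical price-oracle trait: callers pass plain token symbols and
/// never see backend-specific identifiers.
trait PriceBackend {
    fn get_price_usd(&self, token_symbol: &str) -> Option<f64>;
}

/// Hypothetical CoinGecko-style backend. The symbol -> coin ID mapping is
/// an internal detail of this implementation, not part of the trait.
struct CoinGeckoBackend {
    coin_ids: HashMap<&'static str, &'static str>,
    // Stand-in for HTTP responses, keyed by CoinGecko coin ID.
    prices_by_coin_id: HashMap<&'static str, f64>,
}

impl CoinGeckoBackend {
    fn new(prices_by_coin_id: HashMap<&'static str, f64>) -> Self {
        // Illustrative entries only.
        let coin_ids = HashMap::from([("ETH", "ethereum"), ("USDC", "usd-coin")]);
        Self { coin_ids, prices_by_coin_id }
    }
}

impl PriceBackend for CoinGeckoBackend {
    fn get_price_usd(&self, token_symbol: &str) -> Option<f64> {
        // Translate the generic symbol into this backend's own ID first.
        let coin_id = self.coin_ids.get(token_symbol)?;
        self.prices_by_coin_id.get(coin_id).copied()
    }
}
```

Adding another backend only requires its own internal mapping; code that calls `get_price_usd` with token symbols does not change.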
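The commit in the list above that reworks event fetching (alongside the default of 500 blocks per step) describes a two-phase loop: read the chain head once, catch up to it in fixed-size steps without pausing, then look at the head again only after the polling interval. A std-only sketch of that loop; the `Chain` trait, `catch_up`, `sync_loop`, and `MockChain` are hypothetical stand-ins for the relayer's EVM provider and watcher code.

```rust
use std::time::Duration;

/// Hypothetical chain interface; a real relayer would back this with an RPC provider.
trait Chain {
    /// Returns the current head (latest block number).
    fn latest_block(&mut self) -> u64;
    /// Fetches and processes events in the inclusive range [from, to].
    fn fetch_events(&mut self, from: u64, to: u64);
}

/// Catches up from `next_block` to the head observed at the start of the call.
/// The head is queried once, and events are fetched in chunks of at most
/// `max_blocks_per_step` blocks with no pause in between.
/// Returns the next block that still needs processing.
fn catch_up<C: Chain>(chain: &mut C, mut next_block: u64, max_blocks_per_step: u64) -> u64 {
    let step = max_blocks_per_step.max(1); // guard against a zero step
    let target = chain.latest_block();
    while next_block <= target {
        let to = next_block.saturating_add(step - 1).min(target);
        chain.fetch_events(next_block, to);
        next_block = to + 1;
    }
    next_block
}

/// Once synced, waits `polling_interval` before looking at the head again, so
/// the latest-block query is not in a hot loop. `rounds` bounds the loop for
/// demonstration; a long-running service would loop indefinitely.
fn sync_loop<C: Chain>(
    chain: &mut C,
    start: u64,
    max_blocks_per_step: u64,
    polling_interval: Duration,
    rounds: usize,
) -> u64 {
    let mut next_block = start;
    for _ in 0..rounds {
        next_block = catch_up(chain, next_block, max_blocks_per_step);
        std::thread::sleep(polling_interval);
    }
    next_block
}

/// In-memory mock that records calls, for illustration.
struct MockChain {
    head: u64,
    latest_calls: u32,
    ranges: Vec<(u64, u64)>,
}

impl Chain for MockChain {
    fn latest_block(&mut self) -> u64 {
        self.latest_calls += 1;
        self.head
    }
    fn fetch_events(&mut self, from: u64, to: u64) {
        self.ranges.push((from, to));
    }
}
```

With a head of 1,200 and 500 blocks per step, catching up from block 0 reads the head once and issues three range queries; after that, the head is only re-read once per polling interval.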
3 participants