Incorrect handling of the metrics when a packet is relayed by another relayer #2467
Labels
A: bug
Admin: something isn't working
A: help-wanted
Admin: extra attention is needed, good for seniors
I: telemetry
Internal: related to Telemetry & metrics
Summary of Bug
Periodically, the packet worker pops operational data from the queues and checks whether they were already submitted by another relayer (see this function). From the code, it looks like the metrics are not updated in this scenario.
Below, I try to create a scenario where a second relayer instance relays the packet and the metrics of the first relayer become inconsistent.
Version
hermes 1.0.0-rc.0+6c23af53
Steps to Reproduce
You should run this on branch ali/fix_oldest_sequence (or master after merge) because the process used to generate pending packets has a high probability of failure and, if it fails, will generate a timeout. This branch correctly handles the oldest_* metrics when a timeout occurs. If you are not using this branch, a timeout will freeze the metrics.
hermes start
hermes tx ft-transfer --dst-chain ibc-0 --src-chain ibc-1 --src-port transfer --src-channel channel-0 --amount 1000 --timeout-seconds 10
This should work most of the time.
hermes query packet pending --chain ibc-1 --port transfer --channel channel-0
You should see an output similar to this:
The oldest_* metrics are frozen at a non-zero value. If that is not the case, either no packet is pending or the code is not running on the correct branch. You should see non-zero metrics for these two fields:
hermes clear packets --chain ibc-1 --port transfer --channel channel-0
hermes query packet pending --chain ibc-1 --port transfer --channel channel-0
You should see this output:
Acceptance Criteria
Metrics are correctly updated when a packet is relayed by another relayer.
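To illustrate the expected behavior, here is a minimal Rust sketch. It is not the Hermes implementation: BacklogMetrics, Outcome and process are hypothetical names invented for this example, and the convention that the oldest-sequence gauge reports 0 when nothing is pending is an assumption. The idea is only that the backlog is updated on both paths, whether this relayer submits the packet or finds that another relayer already did.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for the relayer's backlog metrics (not a Hermes type).
#[derive(Debug, Default)]
struct BacklogMetrics {
    /// Sequences of packets that are still pending, kept in sorted order.
    pending: BTreeSet<u64>,
}

impl BacklogMetrics {
    /// Record a packet sequence as pending.
    fn observe(&mut self, seq: u64) {
        self.pending.insert(seq);
    }

    /// Remove a sequence from the backlog, regardless of who relayed it.
    fn clear(&mut self, seq: u64) {
        self.pending.remove(&seq);
    }

    /// Value an oldest-sequence gauge would report.
    /// Assumption of this sketch: 0 means that nothing is pending.
    fn oldest_sequence(&self) -> u64 {
        self.pending.iter().next().copied().unwrap_or(0)
    }
}

/// Outcome of handling one queued packet.
#[derive(Debug, PartialEq)]
enum Outcome {
    Relayed,
    AlreadyRelayed,
}

/// Handle one queued packet. `already_relayed` models the check that
/// another relayer has already submitted the packet.
fn process(metrics: &mut BacklogMetrics, seq: u64, already_relayed: bool) -> Outcome {
    let outcome = if already_relayed {
        // Nothing to submit: another relayer got there first.
        Outcome::AlreadyRelayed
    } else {
        // Submitting the packet would happen here.
        Outcome::Relayed
    };
    // The point of the sketch: the metrics are updated on both paths.
    metrics.clear(seq);
    outcome
}
```

For example, after observing sequences 7 and 8, processing 7 with already_relayed set to true moves the oldest sequence to 8 instead of leaving it frozen at 7, and processing 8 afterwards brings it back to 0.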
For Admin Use