Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for disabling health check #1336

Closed
5 of 7 tasks
Tracked by #1350
adizere opened this issue Sep 8, 2021 · 3 comments · Fixed by #1361
Closed
5 of 7 tasks
Tracked by #1350

Support for disabling health check #1336

adizere opened this issue Sep 8, 2021 · 3 comments · Fixed by #1361
Assignees
Labels
I: CLI Internal: related to the relayer's CLI I: configuration Internal: related to Hermes configuration O: new-feature Objective: cause to add a new feature or support O: usability Objective: cause to improve the user experience (UX) and ease using the product
Milestone

Comments

@adizere
Copy link
Member

adizere commented Sep 8, 2021

Crate

relayer-cli

Summary

A feedback from the Cephalopod ops team is that they would like a method to optionally disable the health checkup that Hermes does indiscriminately.

Problem Definition

Whenever Hermes starts up, before running any command, it runs a health checkup mechanism that involves a few rounds of RPC calls to each chain. This functionality is especially useful for users who have not much experience with using Hermes (because the health check can surface problems that would otherwise be hidden cf #697). The functionality, however, is not very useful for power-users such as relayer operators.

The health checkup involves pulling the genesis file, and some chains (e.g., hub-4) have very large genesis files (~100MB). This means that the health check is a liability slowing down the whole ops process.

Proposal

A new option is necessary to allow disabling the health checkup mechanism.
This is necessary in particular for CLIs such as:

  • tx raw packet-recv,
  • packet-ack,
  • query unreceived-acks or
  • query unreceived-packets.

Acceptance Criteria


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate milestone (priority) applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@adizere adizere added O: new-feature Objective: cause to add a new feature or support I: CLI Internal: related to the relayer's CLI O: usability Objective: cause to improve the user experience (UX) and ease using the product I: configuration Internal: related to Hermes configuration labels Sep 8, 2021
@adizere adizere added this to the 09.2021 milestone Sep 8, 2021
@adizere adizere self-assigned this Sep 8, 2021
@adizere adizere mentioned this issue Sep 8, 2021
10 tasks
@romac
Copy link
Member

romac commented Sep 9, 2021

It is unclear which method would be best for disabling the health checkup, either a new config.toml option or a global CLI flag. This can be decided as we go along after consulting with Cephalopod.

I think such commands should just never do the health check (but we can add a command that does just that). This could be achieved by supplying an option to the chain runtime when starting it, but there is no need to expose this in the config file or the CLI options (aside perhaps for the start command).

@adizere
Copy link
Member Author

adizere commented Sep 9, 2021

It is unclear which method would be best for disabling the health checkup, either a new config.toml option or a global CLI flag. This can be decided as we go along after consulting with Cephalopod.

I think such commands should just never do the health check (but we can add a command that does just that). This could be achieved by supplying an option to the chain runtime when starting it, but there is no need to expose this in the config file or the CLI options (aside perhaps for the start command).

Perfect timing for this suggestion, Romain. Mircea and I were reading your idea and concluded it would be the best way to go forward. It seems not even the start command would require the health checkup.

Concrete notes after discussing with Cephalopod:

  • Between the option of having a global flag for all CLIs (e.g., --no-health-check or a config.toml toggle), best would be to entirely bypass the health check in all CLIs, except for a dedicated new CLI that does solely the health check, e.g., hermes health-check
  • the health check proved useful at least on occasions when the operator misconfigured the max_tx_size to exceed the genesis/max_block_bytes parameter, so the check itself is useful

@andynog
Copy link
Contributor

andynog commented Sep 10, 2021

@adizere and @romac, I will look into this one. Also for the health check for the max_block_bytes I'll try to replace the logic. Using the /genesis endpoint might not be optimal because the genesis information might be huge and might time out when retrieving the information, probably a better RPC to get that information is the /consensus_params endpoint like https://rpc-cosmos.cosmostation.io/consensus_params?height=7511042.

This endpoint might provide a better performance and these params can be changed through governance so it's better to take a value close to the latest block height available to ensure it's accurate. The genesis value might not reflect the current value for that parameter.

@romac romac mentioned this issue Sep 14, 2021
11 tasks
romac added a commit that referenced this issue Sep 19, 2021
…art` (#1361)

* Initial implementation logic for the health-check command (#1336)

* Move health check code in `Chain::health_check`

* Perform health check on supervisor start

* Improve output of `health-check` command

* Use the `/consensus_params` endpoint to get the max block size

* Update comment

* Display non-debug chain id in health check command

* Use `latest_consensus_params` endpoint

* Revert to using tendermint-rs master

* Add .changelog entry

* Remove stacktrace from health check logs

Co-authored-by: Andy Nogueira <me@andynogueira.dev>
Co-authored-by: Romain Ruetschi <romain@informal.systems>
hu55a1n1 pushed a commit to hu55a1n1/hermes that referenced this issue Sep 13, 2022
…art` (informalsystems#1361)

* Initial implementation logic for the health-check command (informalsystems#1336)

* Move health check code in `Chain::health_check`

* Perform health check on supervisor start

* Improve output of `health-check` command

* Use the `/consensus_params` endpoint to get the max block size

* Update comment

* Display non-debug chain id in health check command

* Use `latest_consensus_params` endpoint

* Revert to using tendermint-rs master

* Add .changelog entry

* Remove stacktrace from health check logs

Co-authored-by: Andy Nogueira <me@andynogueira.dev>
Co-authored-by: Romain Ruetschi <romain@informal.systems>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I: CLI Internal: related to the relayer's CLI I: configuration Internal: related to Hermes configuration O: new-feature Objective: cause to add a new feature or support O: usability Objective: cause to improve the user experience (UX) and ease using the product
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants