Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix crash during initialization of event monitor when node is down #895

Merged
merged 10 commits into from
May 6, 2021

Conversation

romac
Copy link
Member

@romac romac commented May 4, 2021

Closes: #863

Description

  • Fix crash during initialization of event monitor when node is down
  • Skip spawning the supervisor for a pair of chains if the chain runtime initialization fails
  • Improve modeling of errors in event monitor
  • Add more context to worker error trace (ie. path short name)

Follow-up

  • It seems like the WebSocketClient does not notify us when the connection goes down so we cannot print a warning and try to reconnect. Will investigate and open an issue on tendermint-rs.

For contributor use:

  • Updated the Unreleased section of CHANGELOG.md with the issue.
  • If applicable: Unit tests written, added test to CI.
  • Linked to Github issue with discussion and accepted design OR link to spec that describes this work.
  • Updated relevant documentation (docs/) and code comments.
  • Re-reviewed Files changed in the Github PR explorer.

@romac
Copy link
Member Author

romac commented May 5, 2021

It seems like the WebSocketClient does not notify us when the connection goes down so we cannot print a warning and try to reconnect. Will investigate and open an issue on tendermint-rs.

@romac romac marked this pull request as ready for review May 5, 2021 09:30
@romac romac requested a review from ancazamfir as a code owner May 5, 2021 09:30
@romac romac requested a review from adizere May 5, 2021 09:30
@romac romac marked this pull request as draft May 5, 2021 14:50
@romac romac marked this pull request as ready for review May 5, 2021 15:55
Comment on lines +406 to +407
let span = error_span!("worker loop", worker = %self);
let _guard = span.enter();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

Copy link
Collaborator

@ancazamfir ancazamfir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff!!! Thanks Romain!

@ancazamfir ancazamfir merged commit 37f05c9 into master May 6, 2021
@ancazamfir ancazamfir deleted the romac/handle-ws-failure branch May 6, 2021 07:16
hu55a1n1 pushed a commit to hu55a1n1/hermes that referenced this pull request Sep 13, 2022
…nformalsystems#895)

* Introduce error type for event monitor

* Do not crash if chain runtime fails to subscribe to events

* Gracefully handle runtime init error in `start-multi` command

* Improve worker error context

* Fix chain id in error message if spawning chain runtime fails

* Print channel error when one arises

* Refactor event monitor loop a little bit

* Update changelog
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Event monitor crashes when the RPC node goes away
2 participants