-
Notifications
You must be signed in to change notification settings - Fork 608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Catch panics in Non-tx related code #1305
Comments
How are we planning to distinguish between what's panic worthy and what may be skipped before we get to the sentinel prefix? |
What's the recovery mechanism for failed hook executions? I.e. what is the procedure to investigate, understand and potentially fix them? Or will they just fail silently? |
Since the majority of panics are unanticipated, I propose that the way this would work is a given module would decide that I want any panic within this module to be state machine halting. (And the default be for module panics to not cause state machine halts) If thats deemed as fine, then a given module can wrap its panics if it wants to make them state machine halting, e.g. https://play.golang.com/p/ZUFo0KqevWJ We'd expose a function like that in osmoutils that adds the sentinel prefix to a panic message
A failed hook execution will currently log as an error in a node. Hopefully theres a process for errors observed in full nodes in production to surface to github issues. Though this may not be the case due to noise from p2p logs. (Has there been any work Tendemrint side to combine these logs, or separate them per reactor into different files?) |
At the moment, any panic in BeginBlock, or EndBlock will halt the chain. This by proxy means that any panic in epoch logic will halt the chain. Only panics inside of transactions get caught.
The causes of errors that will lead to panics in BeginBlock/EndBlock should not all be assumed to be chain halt worthy. In fact a lot may stem from edge cases as more automated chain logic is added, that should just be reported but not halt the chain.
There are some cases where the intent may explicitly be for a panic to halt the chain. In the future, I think there should be a sentinel prefix, which if a panic message starts with leads to chain halt.
I propose that we:
One thing we need to be cognizant of, is that this doesn't slow down epoch processing due to: cosmos/cosmos-sdk#10310 . It is unclear to what extent that may be a concern here, after Roman's IAVL improvements, and should be empirically measured.
The text was updated successfully, but these errors were encountered: