Catch panics in Non-tx related code #1305

ValarDragon · 2022-04-20T19:25:52Z

At the moment, any panic in BeginBlock, or EndBlock will halt the chain. This by proxy means that any panic in epoch logic will halt the chain. Only panics inside of transactions get caught.

The causes of errors that will lead to panics in BeginBlock/EndBlock should not all be assumed to be chain halt worthy. In fact a lot may stem from edge cases as more automated chain logic is added, that should just be reported but not halt the chain.

There are some cases where the intent may explicitly be for a panic to halt the chain. In the future, I think there should be a sentinel prefix, which if a panic message starts with leads to chain halt.

I propose that we:

Catch panics in epoch hooks, and if a single epoch hook logic fails, revert its state change, and then proceed to the next hook
Catch panics in beginblock and endblock,
Make a 'signal code' that panics can use to indicate halting anyway.

One thing we need to be cognizant of, is that this doesn't slow down epoch processing due to: cosmos/cosmos-sdk#10310 . It is unclear to what extent that may be a concern here, after Roman's IAVL improvements, and should be empirically measured.

p0mvn · 2022-04-21T16:25:30Z

There are some cases where the intent may explicitly be for a panic to halt the chain. In the future, I think there should be a sentinel prefix, which if a panic message starts with leads to chain halt.

How are we planning to distinguish between what's panic worthy and what may be skipped before we get to the sentinel prefix?

alexanderbez · 2022-04-21T19:30:16Z

Catch panics in epoch hooks, and if a single epoch hook logic fails, revert its state change, and then proceed to the next hook

What's the recovery mechanism for failed hook executions? I.e. what is the procedure to investigate, understand and potentially fix them? Or will they just fail silently?

ValarDragon · 2022-04-21T19:55:47Z

How are we planning to distinguish between what's panic worthy and what may be skipped before we get to the sentinel prefix?

Since the majority of panics are unanticipated, I propose that the way this would work is a given module would decide that I want any panic within this module to be state machine halting. (And the default be for module panics to not cause state machine halts)

If thats deemed as fine, then a given module can wrap its panics if it wants to make them state machine halting, e.g. https://play.golang.com/p/ZUFo0KqevWJ

We'd expose a function like that in osmoutils that adds the sentinel prefix to a panic message

What's the recovery mechanism for failed hook executions? I.e. what is the procedure to investigate, understand and potentially fix them? Or will they just fail silently?

A failed hook execution will currently log as an error in a node. Hopefully theres a process for errors observed in full nodes in production to surface to github issues. Though this may not be the case due to noise from p2p logs. (Has there been any work Tendemrint side to combine these logs, or separate them per reactor into different files?)

daniel-farina added this to Osmosis Chain Development Apr 20, 2022

daniel-farina moved this to 🔍 Needs Review in Osmosis Chain Development Apr 20, 2022

This was referenced Apr 20, 2022

Catch panics in endblock / beginblock code, and in epoch code #1278

Closed

Epoch hooks, panic recovery #1308

Merged

ValarDragon mentioned this issue Apr 25, 2022

Return error instead of panicking when modifying a sealed app cosmos/cosmos-sdk#11600

Closed

ValarDragon mentioned this issue Aug 22, 2022

EPIC: Isolated Errors cosmos/cosmos-sdk#12985

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Catch panics in Non-tx related code #1305

Catch panics in Non-tx related code #1305

ValarDragon commented Apr 20, 2022 •

edited

Loading

p0mvn commented Apr 21, 2022

alexanderbez commented Apr 21, 2022

ValarDragon commented Apr 21, 2022

Catch panics in Non-tx related code #1305

Catch panics in Non-tx related code #1305

Comments

ValarDragon commented Apr 20, 2022 • edited Loading

p0mvn commented Apr 21, 2022

alexanderbez commented Apr 21, 2022

ValarDragon commented Apr 21, 2022

ValarDragon commented Apr 20, 2022 •

edited

Loading