-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Never panic during execution proof check #3504
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Grammar nits. Also, a proper enum should be used instead of a Boolean flag.
Co-Authored-By: DemiMarie-parity <48690212+DemiMarie-parity@users.noreply.github.com>
Co-Authored-By: DemiMarie-parity <48690212+DemiMarie-parity@users.noreply.github.com>
Co-Authored-By: DemiMarie-parity <48690212+DemiMarie-parity@users.noreply.github.com>
…b.com/paritytech/substrate into do_not_panic_when_checking_exec_proof
/// Unwind when panic occurs. | ||
Unwind, | ||
/// Always unwind even if someone changes strategy to Abort afterwards. | ||
NeverAbort, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh no, this is worrysome because if I see force_abort
it doesn't mean that it will abort. The name force_abort
doesn't add clarity as well...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes - I agree :) I could rename to unwind()
, abort()
and force_unwind()
if you think it is better. Or you could suggest some other naming
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes looks good broadly, however...
TBH, I wasn't really happy about this panic_abort
situtation in externalities which seemed to be clunky, error prone and hard to reason about. But this PR brings it to a new level.
Now we admit that panics can happen in not runtime code and in some situations they are OK even though they are forced to abort the process. To add on that this looks esp. weird in the light of our proof-or-remove policy for the panics.
I agree with your point that this case is more of a edge-case and the runtime should not care about it (after all what kind of handling we can do there anyway? The only reasonable way I see is to panic). I am just not sure about the solution.
And yeah, I don't have a better idea either...
Co-Authored-By: DemiMarie-parity <48690212+DemiMarie-parity@users.noreply.github.com>
Co-Authored-By: DemiMarie-parity <48690212+DemiMarie-parity@users.noreply.github.com>
I'm not super-happy with this solution too, but also was unable to find better solution for 2 days. Tried many options (could share some on request), but this one looked 'mostly OK' to me. And let me be a devil's advocate here - expects in |
To be clear, I am not proposing removing panics there and I also share the opinion that aborting the process is better than swallowing/coercing these panics to runtime execution errors. Ok, I think a better solution might be to centralize panic handling. I.e. instead of directly specifying code scopes where either to abort, to unwind or to forceful abort, we could instead do something like this:
We track the source of a panic to know where a specific panic came from: runtime or externalities. All panics unwind regardless where they came from. We catch them before entering to the runtime code. We handle these panics depending on the mode (trusted/not trusted) and source. E.g. we would return a runtime error if it came from the runtime, if panic came from externalities, we would return a special backend error in case we are in untrusted backend mode or straight abort if the backend is trusted. |
I'm not sure I got the idea - your suggestion (at least how I understood it) is to match storage.get(hash) {
Ok(value) => Ok(value),
Err(error) => {
i_m_going_to_panic_inside_externalities();
panic!("Externalities not allowed to fail within runtime");
}
} , or
If that's what you suggesting, I do not see - why it is better than scoped approach :/ |
Yeah, basically you can think of it as "throwing an exception" and yes, you are right, there is no way to directly associate a custom payload with a panic. However, we already doing precisely that with panic-handler. So code-wise my idea is something like this. Externalities could look almost the same from outside. fn original_storage(&self, key: &[u8]) -> Option<Vec<u8>> {
let _guard = ExtGuard::ext_guard();
self.backend.storage(key).expect(EXT_NOT_ALLOWED_TO_FAIL)
} However, they use an The consequence of this change is that all panics both from the (native) runtime and from externalities always unwind. Then, somewhere where we call into a runtime, we put a single centralized panic handler: match catch_unwind(|| call_into_runtime(..)) {
Ok(result) => Ok(result),
Err(_) if untrusted_backend && ExtGuard::is_in_ext() => Err(InvalidProof),
err @ Err(_) => resume_unwind(err),
} Advantage of this approach is that you'd have a single place where you decide what to do in case of panic, instead of overriding logic in specific places. |
OK, I still do not think it is any better than proposed implementation. That's why:
fn fn_from_ext() {
we_are_in_ext();
// do what required
we_are_not_in_ext();
} in every method of |
I see really good points there! However, it seems to me that they mostly address the particular code snippet. Of course, I am not even sure if it compiles and there might be other issues: I haven't delved into the code to check everything. IMO they should be solvable (e.g. pt.1 is solved by using a closure) if we put some thought into it (OTOH maybe some local hacks would be required.). It seems to be that the main issue is with
I'd argue that this is perflectly fine and reflects the real state of things. Let's remind ourselves that the runtime, or rather it's native version, is really special. The runtime is written with an assumption that is has some sort of ambient state around it. In wasm runtime we have wasm_executor. For the native API we have this: note all of this
Right, although it doesn't seem to me as an advantage: it allows code that overrides overriden modes which would be perfectly valid and there is no way to check that. OTOH, the explicitly scoped approach would allow you to actually check if the closure was called outside the runtime call scope to catch this. If you bought on the advantages of the approach we can discuss the details : ) |
I think the real problem here is that we are using panics for exception handling, which they are’t designed for. That brings in all of exception handling’s problems. A better idea, in my opinion, is to wrap a |
@pepyakin Nope - my comment is not about the snippet, it is mostly about number of hacks we'll need to introduce to 'centralize' panic handling. IMO instead this makes panic handling spread all over the codebase (by introducing additional strange global vars, by preserving panic' artifacts for processing elsewhere) + introducing additional error-prone assumptions (like marking everything That said, I don't think that 'centralized' panic handling worth implementing that structure you've suggested + I'm not seeing how this makes panic handling actually centralized. I do not want to continue arguing, though - I think we both have had enough arguments here. I can fix the issue once the 'centralized handling' is ready, but I'm not going to sign it off, or be a co-author. Feel free to close this PR once there's issue/other PR that suggests/introduces that new panic handling mechanism. |
While I still think that the centralized approach is worth to explore, I am not 100% sure that it will be fruitful. Failing to convience you in advantages of this didn't give me more confidence. Since I am completely agree that we spent on this enough time, I wouldn't block merging this PR and not going to work towards exploring my idea unless someone else comes in and either supports it or has a better idea. (I also think that this hasn't had enough attention from the rest of the team, but oh well). Anyway, the code looks good in that it implements the stated approach properly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally ok, couple of nits.
Co-Authored-By: Gavin Wood <gavin@parity.io>
Co-Authored-By: Gavin Wood <gavin@parity.io>
…b.com/paritytech/substrate into do_not_panic_when_checking_exec_proof
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
* do not panic during execution proof check * Update core/state-machine/src/lib.rs Co-Authored-By: DemiMarie-parity <48690212+DemiMarie-parity@users.noreply.github.com> * Update core/state-machine/src/lib.rs Co-Authored-By: DemiMarie-parity <48690212+DemiMarie-parity@users.noreply.github.com> * Update core/state-machine/src/lib.rs Co-Authored-By: DemiMarie-parity <48690212+DemiMarie-parity@users.noreply.github.com> * BackendTrustLevel enum * up runtime version * Update core/state-machine/src/lib.rs Co-Authored-By: DemiMarie-parity <48690212+DemiMarie-parity@users.noreply.github.com> * Update core/state-machine/src/lib.rs Co-Authored-By: DemiMarie-parity <48690212+DemiMarie-parity@users.noreply.github.com> * fixed some grumbles * Update core/state-machine/src/testing.rs Co-Authored-By: Gavin Wood <gavin@parity.io> * Update core/state-machine/src/lib.rs Co-Authored-By: Gavin Wood <gavin@parity.io> * mov where * spaces -> tabs (to restart build)
* do not panic during execution proof check * Update core/state-machine/src/lib.rs Co-Authored-By: DemiMarie-parity <48690212+DemiMarie-parity@users.noreply.github.com> * Update core/state-machine/src/lib.rs Co-Authored-By: DemiMarie-parity <48690212+DemiMarie-parity@users.noreply.github.com> * Update core/state-machine/src/lib.rs Co-Authored-By: DemiMarie-parity <48690212+DemiMarie-parity@users.noreply.github.com> * BackendTrustLevel enum * up runtime version * Update core/state-machine/src/lib.rs Co-Authored-By: DemiMarie-parity <48690212+DemiMarie-parity@users.noreply.github.com> * Update core/state-machine/src/lib.rs Co-Authored-By: DemiMarie-parity <48690212+DemiMarie-parity@users.noreply.github.com> * fixed some grumbles * Update core/state-machine/src/testing.rs Co-Authored-By: Gavin Wood <gavin@parity.io> * Update core/state-machine/src/lib.rs Co-Authored-By: Gavin Wood <gavin@parity.io> * mov where * spaces -> tabs (to restart build)
* do not panic during execution proof check * Update core/state-machine/src/lib.rs Co-Authored-By: DemiMarie-parity <48690212+DemiMarie-parity@users.noreply.github.com> * Update core/state-machine/src/lib.rs Co-Authored-By: DemiMarie-parity <48690212+DemiMarie-parity@users.noreply.github.com> * Update core/state-machine/src/lib.rs Co-Authored-By: DemiMarie-parity <48690212+DemiMarie-parity@users.noreply.github.com> * BackendTrustLevel enum * up runtime version * Update core/state-machine/src/lib.rs Co-Authored-By: DemiMarie-parity <48690212+DemiMarie-parity@users.noreply.github.com> * Update core/state-machine/src/lib.rs Co-Authored-By: DemiMarie-parity <48690212+DemiMarie-parity@users.noreply.github.com> * fixed some grumbles * Update core/state-machine/src/testing.rs Co-Authored-By: Gavin Wood <gavin@parity.io> * Update core/state-machine/src/lib.rs Co-Authored-By: Gavin Wood <gavin@parity.io> * mov where * spaces -> tabs (to restart build)
closes #3470
Description of problem: remote execution proof only contains parts of state (trie) that were touched during execution on remote node. Like (simplifying) if full state is
{ K1=V1, K2=V2, K3=V3 }
, then remote proof could be{ K2=V2 }
. And if, when executed locally, code will try to accessK3
, it'll probably panic, because as we all knowExternalities not allowed to fail within runtime
. So this assumption isn't valid for this case.When I was thinking about possible solution, I assumed that:
Externalities
methods returnResult
- this'll lead either to changing the whole runtime(s) to return results everywhere, or infinite number of expects in runtime code, or the same number of expects in some glue code (WasmExecutor
, runtime support, ...);Externalities
(that's what happens now).So the solution I suggest is to:
NeverAbort
), which will ignore all further changes toAbort
until we drop the guard;Externalities
there's nowStorageExternalities
. The main difference is that all methods of all implementations ofStorageExternalities
should only modify its internal state (likeOverlayChanges
), which must be discarded if execution fails;Externalities
:with_ext
- will work as before.with_storage_ext
- will provide access only toStorageExternalities
part AND will capture all panics, turning them intoError::Runtime
;StorageExternalities
when wasm code is executing, it will be captured and transformed into execution error, allowing us to handle this properly. As before, if 'trusted state backend' is used (it is, in all cases, but the remote execution proof check), all externalities panics will lead to process abort.