Never panic during execution proof check #3504

svyatonik · 2019-08-28T15:02:15Z

Description of problem: remote execution proof only contains parts of state (trie) that were touched during execution on remote node. Like (simplifying) if full state is { K1=V1, K2=V2, K3=V3 }, then remote proof could be { K2=V2 }. And if, when executed locally, code will try to access K3, it'll probably panic, because as we all know Externalities not allowed to fail within runtime. So this assumption isn't valid for this case.

When I was thinking about possible solution, I assumed that:

we do not want to make Externalities methods return Result - this'll lead either to changing the whole runtime(s) to return results everywhere, or infinite number of expects in runtime code, or the same number of expects in some glue code (WasmExecutor, runtime support, ...);
we only want to capture unwindings of panics from within Externalities (that's what happens now).

So the solution I suggest is to:

during execution proof check, mark state backend as 'untrusted';
if state backend is untrusted, then we're turning on special panic handler mode (NeverAbort), which will ignore all further changes to Abort until we drop the guard;
in addition to Externalities there's now StorageExternalities. The main difference is that all methods of all implementations of StorageExternalities should only modify its internal state (like OverlayChanges), which must be discarded if execution fails;
in wasm executor there are now two methods that provide access to Externalities: with_ext - will work as before. with_storage_ext - will provide access only to StorageExternalities part AND will capture all panics, turning them into Error::Runtime;
so, when working with 'untrusted state backend' if panic occurs within StorageExternalities when wasm code is executing, it will be captured and transformed into execution error, allowing us to handle this properly. As before, if 'trusted state backend' is used (it is, in all cases, but the remote execution proof check), all externalities panics will lead to process abort.

Demi-Marie

Grammar nits. Also, a proper enum should be used instead of a Boolean flag.

core/state-machine/src/lib.rs

Co-Authored-By: DemiMarie-parity <48690212+DemiMarie-parity@users.noreply.github.com>

…b.com/paritytech/substrate into do_not_panic_when_checking_exec_proof

pepyakin · 2019-08-29T15:25:15Z

core/panic-handler/src/lib.rs

+	/// Unwind when panic occurs.
+	Unwind,
+	/// Always unwind even if someone changes strategy to Abort afterwards.
+	NeverAbort,


Oh no, this is worrysome because if I see force_abort it doesn't mean that it will abort. The name force_abort doesn't add clarity as well...

Yes - I agree :) I could rename to unwind(), abort() and force_unwind() if you think it is better. Or you could suggest some other naming

pepyakin

The changes looks good broadly, however...

TBH, I wasn't really happy about this panic_abort situtation in externalities which seemed to be clunky, error prone and hard to reason about. But this PR brings it to a new level.

Now we admit that panics can happen in not runtime code and in some situations they are OK even though they are forced to abort the process. To add on that this looks esp. weird in the light of our proof-or-remove policy for the panics.

I agree with your point that this case is more of a edge-case and the runtime should not care about it (after all what kind of handling we can do there anyway? The only reasonable way I see is to panic). I am just not sure about the solution.

And yeah, I don't have a better idea either...

core/executor/src/wasm_executor.rs

core/state-machine/src/lib.rs

Co-Authored-By: DemiMarie-parity <48690212+DemiMarie-parity@users.noreply.github.com>

svyatonik · 2019-08-30T05:05:34Z

I'm not super-happy with this solution too, but also was unable to find better solution for 2 days. Tried many options (could share some on request), but this one looked 'mostly OK' to me.

And let me be a devil's advocate here - expects in Ext implementation do not fit proof-or-remove policy, for sure. And I would prefer them not to be there. Nevertheless, there were so many issues that we have found + not ignored simply because process has aborted + fixed because they're panics.

pepyakin · 2019-09-02T11:06:14Z

To be clear, I am not proposing removing panics there and I also share the opinion that aborting the process is better than swallowing/coercing these panics to runtime execution errors.

Ok, I think a better solution might be to centralize panic handling.

I.e. instead of directly specifying code scopes where either to abort, to unwind or to forceful abort, we could instead do something like this:

           ┌──────────────────────────────┐   
           │  > Panics are caught here <  │   
           │Handling depends on source and│   
           │          exec mode           │   
        ┌──┴──────────────────────────────┴──┐
        │                                    │
        │                                    │
        │  ▲                                 │
           │                                  
           │                                  
           │    ┌─────────────────┐           
           │    │                 │           
           │    │                 │           
  Panics   │    │     Runtime     │           
 bubble up │    │                 │           
 in stack  │    │                 │           
           │    │                 │           
           │    └─────────────────┘           
           │                                  
           │    ━━━━━━━━━━━━━━━━━━━           
           │                                  
                   Externalities

We track the source of a panic to know where a specific panic came from: runtime or externalities. All panics unwind regardless where they came from. We catch them before entering to the runtime code. We handle these panics depending on the mode (trusted/not trusted) and source. E.g. we would return a runtime error if it came from the runtime, if panic came from externalities, we would return a special backend error in case we are in untrusted backend mode or straight abort if the backend is trusted.

svyatonik · 2019-09-03T07:11:36Z

I'm not sure I got the idea - your suggestion (at least how I understood it) is to throw ExternalitiesError() from all ext.rs panics. But there's no way to associate some data with panic in rust, right? So we must store the 'type' of panic somewhere else. Some global var? So basically - either we will add something like

match storage.get(hash) {
  Ok(value) => Ok(value),
  Err(error) => {
     i_m_going_to_panic_inside_externalities();
     panic!("Externalities not allowed to fail within runtime");
  }
}

, or

let ext_result = catch_unwind(|| { ext.storage(hash); })
if let Err(error) = ext_result {
     i_m_going_to_panic_inside_externalities();
     resume_unwind(error);
}

If that's what you suggesting, I do not see - why it is better than scoped approach :/

pepyakin · 2019-09-03T11:20:10Z

Yeah, basically you can think of it as "throwing an exception" and yes, you are right, there is no way to directly associate a custom payload with a panic. However, we already doing precisely that with panic-handler.

So code-wise my idea is something like this.

Externalities could look almost the same from outside.

	fn original_storage(&self, key: &[u8]) -> Option<Vec<u8>> {
		let _guard = ExtGuard::ext_guard();
		self.backend.storage(key).expect(EXT_NOT_ALLOWED_TO_FAIL)
	}

However, they use an ext_guard or something. ExtGuard can also answer the is_in_ext() -> bool query. Note that it is similar to AbortGuard in that it also has a thread_local! flag. However, in contrast to AbortGuard, ExtGuard doesn't change any panic semantics - it is just a dumb flag.

The consequence of this change is that all panics both from the (native) runtime and from externalities always unwind.

Then, somewhere where we call into a runtime, we put a single centralized panic handler:

match catch_unwind(|| call_into_runtime(..)) {
  Ok(result) => Ok(result),
  Err(_) if untrusted_backend && ExtGuard::is_in_ext() => Err(InvalidProof),
  err @ Err(_) => resume_unwind(err),
}

Advantage of this approach is that you'd have a single place where you decide what to do in case of panic, instead of overriding logic in specific places.

svyatonik · 2019-09-03T14:11:18Z

OK, I still do not think it is any better than proposed implementation. That's why:

ExtGuard won't work that way, simply because ExtGuard::drop() will be called on panic, which will set is_in_ext to false again. So we'll either need additional catch_unwinds, or something like:

fn fn_from_ext() {
	we_are_in_ext();
	// do what required
	we_are_not_in_ext();
}

in every method of Ext (and all other non-test Externalities implementations that will ever be created);
2) actually, some kind of ExtGuard will only be helpful where we call catch_unwind - we may want clear that flag. And, tbh, we must clear it, because it has very strange semantics - 'last panic has happened inside Externalities'. This semantics is only defined within runtime call. Compare this to general panic handler strategy, which is always valid;
3) if we'll use resume_unwind, that would mean that the process won't abort if panic happens in non-main thread. I think we agreed that this isn't good;
4) if we'll abort instead of resume_unwind, then we're losing location+backtrace of original panic. We can try to save this in the same global variable, but this makes the whole construction even worse. Like infamous GetLastError function from WinApi;
5) and last, but not the least, we'll have to mark closure used in catch_unwind as AssertUnwindSafe. And though the handler you're proposing should be fine with that, I still do not like the fact - objects that will be passed to that closure aren't actually safe for unwinding. That's why I've separated Externalities and StorageExternalities traits.

pepyakin · 2019-09-03T15:48:54Z

I see really good points there! However, it seems to me that they mostly address the particular code snippet. Of course, I am not even sure if it compiles and there might be other issues: I haven't delved into the code to check everything. IMO they should be solvable (e.g. pt.1 is solved by using a closure) if we put some thought into it (OTOH maybe some local hacks would be required.).

It seems to be that the main issue is with

This semantics is only defined within runtime call. Compare this to general panic handler strategy, which is always valid;

I'd argue that this is perflectly fine and reflects the real state of things. Let's remind ourselves that the runtime, or rather it's native version, is really special. The runtime is written with an assumption that is has some sort of ambient state around it. In wasm runtime we have wasm_executor. For the native API we have this: note all of this ext::with and also the with_externalities. This implies that the runtime is inherently scoped.

Compare this to general panic handler strategy, which is always valid;

Right, although it doesn't seem to me as an advantage: it allows code that overrides overriden modes which would be perfectly valid and there is no way to check that. OTOH, the explicitly scoped approach would allow you to actually check if the closure was called outside the runtime call scope to catch this.

If you bought on the advantages of the approach we can discuss the details : )

Demi-Marie · 2019-09-04T00:34:18Z

I think the real problem here is that we are using panics for exception handling, which they are’t designed for. That brings in all of exception handling’s problems.

A better idea, in my opinion, is to wrap a catch_unwind around exactly the blocks of code where we do expect to panic, and no others.

svyatonik · 2019-09-04T05:39:13Z

@pepyakin Nope - my comment is not about the snippet, it is mostly about number of hacks we'll need to introduce to 'centralize' panic handling. IMO instead this makes panic handling spread all over the codebase (by introducing additional strange global vars, by preserving panic' artifacts for processing elsewhere) + introducing additional error-prone assumptions (like marking everything AssertUnwindSafe) + by changing implementations. Instead of aborting as soon, as we see the panic, we're trying to introduce (setjmp+longjmp+static)-type exception handling just to...abort.

That said, I don't think that 'centralized' panic handling worth implementing that structure you've suggested + I'm not seeing how this makes panic handling actually centralized. I do not want to continue arguing, though - I think we both have had enough arguments here. I can fix the issue once the 'centralized handling' is ready, but I'm not going to sign it off, or be a co-author. Feel free to close this PR once there's issue/other PR that suggests/introduces that new panic handling mechanism.

pepyakin · 2019-09-04T12:33:51Z

While I still think that the centralized approach is worth to explore, I am not 100% sure that it will be fruitful. Failing to convience you in advantages of this didn't give me more confidence.

Since I am completely agree that we spent on this enough time, I wouldn't block merging this PR and not going to work towards exploring my idea unless someone else comes in and either supports it or has a better idea. (I also think that this hasn't had enough attention from the rest of the team, but oh well).

Anyway, the code looks good in that it implements the stated approach properly.

core/state-machine/src/lib.rs

core/state-machine/src/testing.rs

gavofyork

Generally ok, couple of nits.

Co-Authored-By: Gavin Wood <gavin@parity.io>

…b.com/paritytech/substrate into do_not_panic_when_checking_exec_proof

Demi-Marie

LGTM!

* do not panic during execution proof check * Update core/state-machine/src/lib.rs Co-Authored-By: DemiMarie-parity <48690212+DemiMarie-parity@users.noreply.github.com> * Update core/state-machine/src/lib.rs Co-Authored-By: DemiMarie-parity <48690212+DemiMarie-parity@users.noreply.github.com> * Update core/state-machine/src/lib.rs Co-Authored-By: DemiMarie-parity <48690212+DemiMarie-parity@users.noreply.github.com> * BackendTrustLevel enum * up runtime version * Update core/state-machine/src/lib.rs Co-Authored-By: DemiMarie-parity <48690212+DemiMarie-parity@users.noreply.github.com> * Update core/state-machine/src/lib.rs Co-Authored-By: DemiMarie-parity <48690212+DemiMarie-parity@users.noreply.github.com> * fixed some grumbles * Update core/state-machine/src/testing.rs Co-Authored-By: Gavin Wood <gavin@parity.io> * Update core/state-machine/src/lib.rs Co-Authored-By: Gavin Wood <gavin@parity.io> * mov where * spaces -> tabs (to restart build)

do not panic during execution proof check

f373a98

svyatonik added A0-please_review Pull request needs code review. M4-core labels Aug 28, 2019

svyatonik requested a review from pepyakin as a code owner August 28, 2019 15:02

Demi-Marie suggested changes Aug 29, 2019

View reviewed changes

Update core/state-machine/src/lib.rs

bfc78d8

Co-Authored-By: DemiMarie-parity <48690212+DemiMarie-parity@users.noreply.github.com>

svyatonik requested a review from kianenigma as a code owner August 29, 2019 05:12

svyatonik and others added 7 commits August 29, 2019 08:12

Update core/state-machine/src/lib.rs

9d51eee

Co-Authored-By: DemiMarie-parity <48690212+DemiMarie-parity@users.noreply.github.com>

Update core/state-machine/src/lib.rs

25051c0

Co-Authored-By: DemiMarie-parity <48690212+DemiMarie-parity@users.noreply.github.com>

BackendTrustLevel enum

95b64e6

up runtime version

a5de0cd

Merge branch 'do_not_panic_when_checking_exec_proof' of https://githu…

35e9419

…b.com/paritytech/substrate into do_not_panic_when_checking_exec_proof

Merge branch 'master' into do_not_panic_when_checking_exec_proof

90def8d

Merge branch 'master' into do_not_panic_when_checking_exec_proof

2c650f6

pepyakin reviewed Aug 29, 2019

View reviewed changes

core/executor/src/wasm_executor.rs Outdated Show resolved Hide resolved

core/state-machine/src/lib.rs Outdated Show resolved Hide resolved

core/state-machine/src/lib.rs Outdated Show resolved Hide resolved

core/state-machine/src/lib.rs Outdated Show resolved Hide resolved

Demi-Marie suggested changes Aug 29, 2019

View reviewed changes

core/state-machine/src/lib.rs Outdated Show resolved Hide resolved

core/state-machine/src/lib.rs Show resolved Hide resolved

core/state-machine/src/lib.rs Outdated Show resolved Hide resolved

svyatonik and others added 3 commits August 30, 2019 07:27

Update core/state-machine/src/lib.rs

a16359e

Co-Authored-By: DemiMarie-parity <48690212+DemiMarie-parity@users.noreply.github.com>

Update core/state-machine/src/lib.rs

36a5b97

Co-Authored-By: DemiMarie-parity <48690212+DemiMarie-parity@users.noreply.github.com>

fixed some grumbles

3d5aaa5

Demi-Marie approved these changes Sep 4, 2019

View reviewed changes

Demi-Marie self-requested a review September 4, 2019 00:34

Merge branch 'master' into do_not_panic_when_checking_exec_proof

e443e77

svyatonik requested review from andresilva and tomusdrw as code owners September 4, 2019 19:58

gavofyork reviewed Sep 7, 2019

View reviewed changes

core/state-machine/src/lib.rs Outdated Show resolved Hide resolved

gavofyork reviewed Sep 7, 2019

View reviewed changes

core/state-machine/src/testing.rs Outdated Show resolved Hide resolved

gavofyork reviewed Sep 7, 2019

View reviewed changes

core/state-machine/src/testing.rs Outdated Show resolved Hide resolved

gavofyork approved these changes Sep 7, 2019

View reviewed changes

svyatonik and others added 5 commits September 9, 2019 08:18

Update core/state-machine/src/testing.rs

36dd630

Co-Authored-By: Gavin Wood <gavin@parity.io>

Update core/state-machine/src/lib.rs

cbe574a

Co-Authored-By: Gavin Wood <gavin@parity.io>

mov where

95564c9

Merge branch 'master' into do_not_panic_when_checking_exec_proof

da8cdc1

Merge branch 'do_not_panic_when_checking_exec_proof' of https://githu…

652c03c

…b.com/paritytech/substrate into do_not_panic_when_checking_exec_proof

Demi-Marie approved these changes Sep 10, 2019

View reviewed changes

svyatonik added A3-needsresolving and removed A0-please_review Pull request needs code review. labels Sep 11, 2019

Merge branch 'master' into do_not_panic_when_checking_exec_proof

7c2c942

svyatonik added A8-looksgood and removed A3-needsresolving labels Sep 11, 2019

spaces -> tabs (to restart build)

e0c0ac7

svyatonik merged commit 12896e9 into master Sep 11, 2019

svyatonik deleted the do_not_panic_when_checking_exec_proof branch September 11, 2019 07:27

pepyakin mentioned this pull request Oct 9, 2020

Deprecation and Removal of Substrate Native Runtime Optimization #7288

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Never panic during execution proof check #3504

Never panic during execution proof check #3504

svyatonik commented Aug 28, 2019

Demi-Marie left a comment

pepyakin Aug 29, 2019

svyatonik Aug 30, 2019

pepyakin left a comment

svyatonik commented Aug 30, 2019

pepyakin commented Sep 2, 2019

svyatonik commented Sep 3, 2019

pepyakin commented Sep 3, 2019 •

edited

Loading

svyatonik commented Sep 3, 2019

pepyakin commented Sep 3, 2019 •

edited

Loading

Demi-Marie commented Sep 4, 2019

svyatonik commented Sep 4, 2019

pepyakin commented Sep 4, 2019

gavofyork left a comment

Demi-Marie left a comment

Never panic during execution proof check #3504

Never panic during execution proof check #3504

Conversation

svyatonik commented Aug 28, 2019

Demi-Marie left a comment

Choose a reason for hiding this comment

pepyakin Aug 29, 2019

Choose a reason for hiding this comment

svyatonik Aug 30, 2019

Choose a reason for hiding this comment

pepyakin left a comment

Choose a reason for hiding this comment

svyatonik commented Aug 30, 2019

pepyakin commented Sep 2, 2019

svyatonik commented Sep 3, 2019

pepyakin commented Sep 3, 2019 • edited Loading

svyatonik commented Sep 3, 2019

pepyakin commented Sep 3, 2019 • edited Loading

Demi-Marie commented Sep 4, 2019

svyatonik commented Sep 4, 2019

pepyakin commented Sep 4, 2019

gavofyork left a comment

Choose a reason for hiding this comment

Demi-Marie left a comment

Choose a reason for hiding this comment

pepyakin commented Sep 3, 2019 •

edited

Loading

pepyakin commented Sep 3, 2019 •

edited

Loading