Specification for Transactional Storage as Default in FRAME #10806

shawntabrizi · 2022-02-07T06:31:23Z

This issue will outline how I believe we should make transactional storage production usable within FRAME, and even a default part of all FRAME extrinsics.

Steps

Add Storage Layer Limit Add Limit to Tranasctional Layers #10808
Make spawning more layers explicit, and implicitly keep only 1 layer if it already exists
Name change

Background

At a very high level, storage in the runtime has two main abstractions:

An underlying DB, which is normally written to a hard-disk.
An in-memory overlay which is used to keep track of read/written keys until the end of a block where those values are then committed to the DB.

When writing function within the runtime, making modifications to storage will affect the values in the in-memory overlay. These changes are pooled together with any other changes that happened in previous and future transactions during the block building process, and thus if storage is modified during an extrinsic, it is not possible to undo those specific changes.

This has lead to a critical "best practice" when doing runtime development: "verify first, write last", which says that you should never return an Err from an extrinsic after you have modified storage, as those changes will persist, and you may end up with an inconsistent state.

An easy example of this can be seen by executing the following:

#[test]
fn storage_modified_then_error() {
	fn set_value_then_error(v: u32) -> DispatchResult {
		Value::set(v);
		Err("never do this")?;
		Ok(())
	}

	TestExternalities::default().execute_with(|| {
		// `assert_noop` guarantees that the state root is unchanged after some function is called
		// and an error is returned.
		// This assertion will fail, because storage was already modified before the error was
		// returned.
		assert_noop!(set_value_then_error(3), "never do this");
	});
}

What is `transactional`?

Transactional is a feature that was implemented specifically to address this problem. The original PR can be found here: #6269.

Basically, the low level storage APIs now provide a way to spawn additional in-memory overlays and choose whether you want to commit those changes to the main storage overlay or not. When used properly, this can allow you to isolate changes which came about due to a specific extrinsic, and at any point, choose not to commit them.

#[test]
fn storage_modified_then_error_with_transactional() {
	#[transactional] // <-- This flag fixes our bug
	fn set_value_then_error_transactional(v: u32) -> DispatchResult {
		Value::set(v);
		Err("this is totally okay")?;
		Ok(())
	}

	TestExternalities::default().execute_with(|| {
		// This now succeeds!
		assert_noop!(set_value_then_error_transactional(3), "this is totally okay");
	});
}

Additionally, transactional functions can be nested, such that each time a new transactional layer is created, you can choose whether you want to commit those changes to the transactional layer below you. This means you can have storage modifying functions nested within other storage modifying functions, and have pretty much full control over what you do and do not want to commit to the final database.

Problems with the current system

The current transactional system does not take into account limitations of the runtime in terms of computational or memory usage, thus it is not really "safe by default" to use in the runtime. A user can nest transactional layers as much as they want, and there really is no integration of this functionality specifically for benchmarking worst case scenarios.

Computational Overhead

There is a non-zero cost to resolving a transactional layer into the overlay below it.

We don't really need to copy. All values are stored in heap and we just move pointers.
So the overhead does not depend on the size of the storage items but only on the count of items.

Assuming multiple nested layers, then that layer would need to be copied to the layer below, and so on.

The overhead is very low relative to other kinds of operations within the runtime, but still, at a high enough nesting level, is non-negligible.

Memory Overhead

From @athei, all of the transactional layers use client memory, not Wasm, so there should be no practical resource limitations here.

What we want

The end goal for FRAME is to make it as easy as possible to write Pallets which are correct by default. The chance that users can make a mistake of committing changes to storage, returning an error, and then expecting nothing to change is very high. This is especially true when we note that most Pallet developers are probably use to writing code exactly in this way from Smart Contract development on platforms like Ethereum.

As such, FRAME wants to take advantage of transactional as both usable and potentially even a default part of the Pallet development experience.

To do that, we must address some of the problems with the existing system.

Proposed Solutions

To make this feature production ready, we need to address a couple of different problems.

Hard limit to nesting transactional layers

As mentioned above, there is currently no limits in place for nesting transactional layers, however we know that there is a non-zero resource impact when doing this. Even within software development, there are stack limits which when reached, lead to stack-overflows.

I propose we introduce a conservative hard limit of 10 nested transactional layers as a default in FRAME. Potentially we could allow users to override or bypass this limit through lower level functions, but when simply using the transactional feature, this limit should be enforced.

When the limit is reached, trying to spawn a new transactional layer will return an Err, and can be handled by developer.

Default single transaction layer per extrinsic

The overhead of a single transactional layer should be negligible for nearly all runtime functions, and the benefits in terms of developer experience are huge.

I propose that all extrinsics by default spawn a single transactional storage layer.

If that extrinsic returns Err, we can expect that all modified storage items will be reverted, and the underlying state root will be unaffected. However, if that function returns Ok, we will instead commit any changes which are present in this single transactional level, just like developers would expect from Smart Contract platforms.

By default, dispatching other calls within a call would not lead to generating more transactional layers.

Dispatch with transactional layer

There are cases where users may want to dispatch another call within its own transactional layer, but within the limits defined above.

In that case, the user can call a specific dispatch_with_transactional(call) which will explicitly spawn a new transactional layer and then execute the call, allowing the user to handle the result.

Currently the #[tranasctional] tag is placed above different function definitions, but this does not really make sense if all extrinsics spawn at least one transactional layer by default. Instead, it should be the person writing the dispatch function to determine if a function they are calling should be called within an additional transactional context.

Annotation for safe without storage layer

Now that the default behavior of extrinsics will be to spawn at least one transactional layer, we can introduce an opt-in optimization where a user can state that a function is safe to be executed without its own transactional layer.

For example:

/// This function is safe to execute without an additional transactional storage layer.
#[without_transactional]
fn set_value(x: u32) -> DispatchResult {
    Self::check_value(x)?;
    MyStorage::set(x);
    Ok(())
}

When this function is called directly, a transactional layer should not spawn for it.

If a user called dispatch_with_transactional to this function, a transactional layer also does not need to spawn.

Name Change

As a final part to this specification, I propose we drop the name transactional for something more clear to what is happening here. The term transaction is already confusing within the Substrate ecosystem, especially in context with extrinsics. Also, it is not clear that the behavior here really has anything to do with transactions.

Instead I propose we call these apis storage_layers, and basically replace all uses of transnational with some appropriate use of that term.

For example:

dispatch_with_storage_layer
#[without_storage_layer]
get_storage_layer() -> u8
add_storage_layer() -> Result
etc...

The text was updated successfully, but these errors were encountered:

athei · 2022-02-07T09:51:04Z

This all makes sense to me:

Name change: Transactional is a good name when you are a database but it is too overloaded in our case. Name change is warranted. I agree.
Making transactional opt-out: I like that we revert on Err by default. But is the performance gain really worth the hassle of allowing an opt-out? This will increase review time for code that does this. We need rules when you should opt-out then. I lean towards "not worth". Or we should make clear that this really is the exception.
Hard limit: Yeah makes sense. Question is how to implement that. Maybe we can add a new fallible version of start_transaction which returns an Err when the limit is reached? This would require a client update, though.

shawntabrizi · 2022-04-04T05:15:27Z

Now that transactional layers are tracked, we should make it so that a new transactional layer is not spawned by default, but instead, only spawned if a transactional layer does not exist.

A user can instead explicitly spawn a new nested transactional layer if they want.

Something like this? https://github.com/paritytech/substrate/compare/shawntabrizi-transactional-nesting?expand=1

athei · 2022-04-04T12:16:50Z

But this will change the behavior. When annotating #[transactional] you would expect that only the changes done in this extrinsic will be rolled back. If you are part of a batch you will also roll back the changes made by the other extrinsics of the batch?

shawntabrizi · 2022-04-04T20:45:13Z

The main thing we don't want is that when dispatching a call from one extrinsic, we don't go into another layer unless the user wants that.

so call.dispatch(params) doesnt spawn a new layer, but call.dispatch_with_transactional(params) will.

This will mean that all things like batch will be safe from any limits since it will never spawn more than 1 layer.

Also, some functions may be marked as require_transactional, but does not actually want to spawn a new layer. In this case we spawn a new layer if none exist, or we just use the existing layer.

athei · 2022-04-05T07:37:36Z

Okay I misunderstood your changes. So #[transactional] will always spawn a layer but #[with_transaction] only on demand. I have a hard time imagining some code which wants to use the latter one. I can't find a single user of #[require_transaction] within substrate. Do you have an example?

xlc · 2022-04-05T07:43:49Z

I proposed & implemented require_transactional initially because we (Acala) need it. You can find some usages at Acala & ORML.

It should be used to annotate all the methods that could fail after storage write.

https://github.com/open-web3-stack/open-runtime-module-library/blob/c439a50e01944aedeef33231e0824a17ed1813bc/xtokens/src/lib.rs#L904

https://github.com/AcalaNetwork/Acala/blob/f4b80d7200c19b78d3777e8a4a87bc6893740d23/modules/nft/src/lib.rs#L387

bkchr · 2022-04-05T09:03:10Z

so call.dispatch(params) doesnt spawn a new layer, but call.dispatch_with_transactional(params) will.

Please don't let us do this. If you want a new layer, just run with with_transaction(|| call.dispatch(params)).

layer but #[with_transaction] only on demand

But then we don't need require_transactional anymore? I mean both are expressing the same, but just a little bit differently?

athei · 2022-04-05T09:07:35Z

But then we don't need require_transactional anymore? I mean both are expressing the same, but just a little bit differently?

Yes. In Shawn's proposal require_transactional was renamed to with_transaction.

athei · 2022-04-05T09:31:31Z

Please don't let us do this. If you want a new layer, just run with with_transaction(|| call.dispatch(params)).

Why? It seems like a harmless convenience function to me.

bkchr · 2022-04-05T10:41:55Z

Okay, I just checked the code and dispatch_bypass_filter is already in some kind of extension trait. If we add dispatch_with_transactional into another extension trait, I'm fine with it. I just had the fear that we add this all to the same trait, which was already not true from the beginning.

Every extrinsic now runs in transaction implicitly, and `#[transactional]` on pallet dispatchable is now meaningless Upstream-Change: paritytech/substrate#10806 Signed-off-by: Yaroslav Bolyukin <iam@lach.pw>

shawntabrizi added the J1-meta A specific issue for grouping tasks or bugs of a specific category. label Feb 7, 2022

shawntabrizi mentioned this issue Feb 7, 2022

Benchmark overhead of #[transactional] #8975

Closed

athei mentioned this issue Feb 7, 2022

Add host function that enforces transaction nesting depth #10811

Closed

shawntabrizi mentioned this issue Mar 24, 2022

Use Storage for Tracking Transactional Layers #10744

Closed

shawntabrizi mentioned this issue May 19, 2022

Storage Layer for All FRAME Extrinsics #11431

Merged

paritytech-processbot bot closed this as completed in #11431 May 26, 2022

This was referenced May 26, 2022

Add #[without_storage_layer] tag which explicitly allows an extrinsic to not have a storage layer #11533

Closed

VISION: Try..catch for substrate runtimes paritytech/polkadot-sdk#370

Open

tdelabro mentioned this issue Jun 13, 2022

feature master benefeciary kabocha-network/pallet_supersig#10

Merged

CertainLach mentioned this issue Jul 21, 2022

Upgrade to polkadot v0.9.26 UniqueNetwork/unique-chain#445

Closed

shawntabrizi mentioned this issue Sep 27, 2022

Member Request polkadot-fellows/seeding#8

Merged

hbulgarini mentioned this issue Jul 21, 2023

HB SBP Milestone review #2 Acurast/acurast-core#110

Closed

gpestana mentioned this issue Aug 7, 2023

Add simple collator election mechanism paritytech/cumulus#2960

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Specification for Transactional Storage as Default in FRAME #10806

Specification for Transactional Storage as Default in FRAME #10806

shawntabrizi commented Feb 7, 2022 •

edited

Loading

athei commented Feb 7, 2022 •

edited

Loading

shawntabrizi commented Apr 4, 2022

athei commented Apr 4, 2022 •

edited

Loading

shawntabrizi commented Apr 4, 2022

athei commented Apr 5, 2022

xlc commented Apr 5, 2022 •

edited

Loading

bkchr commented Apr 5, 2022

athei commented Apr 5, 2022

athei commented Apr 5, 2022

bkchr commented Apr 5, 2022

Specification for Transactional Storage as Default in FRAME #10806

Specification for Transactional Storage as Default in FRAME #10806

Comments

shawntabrizi commented Feb 7, 2022 • edited Loading

Steps

Background

What is transactional?

Problems with the current system

Computational Overhead

Memory Overhead

What we want

Proposed Solutions

Hard limit to nesting transactional layers

Default single transaction layer per extrinsic

Dispatch with transactional layer

Annotation for safe without storage layer

Name Change

athei commented Feb 7, 2022 • edited Loading

shawntabrizi commented Apr 4, 2022

athei commented Apr 4, 2022 • edited Loading

shawntabrizi commented Apr 4, 2022

athei commented Apr 5, 2022

xlc commented Apr 5, 2022 • edited Loading

bkchr commented Apr 5, 2022

athei commented Apr 5, 2022

athei commented Apr 5, 2022

bkchr commented Apr 5, 2022

shawntabrizi commented Feb 7, 2022 •

edited

Loading

What is `transactional`?

athei commented Feb 7, 2022 •

edited

Loading

athei commented Apr 4, 2022 •

edited

Loading

xlc commented Apr 5, 2022 •

edited

Loading