Refactor contract initialization and finalization #280

Robbepop · 2019-11-27T10:42:28Z

Tasks

New System

Deploy

Decode byte input and check for proper selector and argument encoding.
Construct the base contract state (via selected constructor).
Initialize contract storage via FlushAt trait.

Call

Decode byte input and check for proper selector and argument encoding.
Fetch from contract storage and reconstruct storage state struct via FetchAt trait.
Run selected contract message M.
If message M is able to mutate the contract storage, flush the storage via the FlushAt trait.

Traits

/// Defines the static size on the storage for the implementing type.
pub trait StorageSize {
    /// This constant is ideally computed via Rust proc. macros
    /// as a summation of the struct's field storage sizes.
    const SIZE: u64;
}

/// Flushes the state of the type to the contract storage.
pub trait FlushAt {
    fn flush_at(&self, at: Key);
}

/// Fetches the state of the type from the contract storage.
pub trait FetchAt {
    fn fetch_at(at: Key) -> Self;
}
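The deploy and call flows above can be sketched roughly as follows. This is only an illustration under several assumptions: `Key` is a plain `u64` stand-in, `STORAGE` is a thread-local mock of the contract storage, `fetch_at` is assumed to return `Self`, and `Counter`, `deploy` and `call_increment` are made-up names. Input decoding and selector checks are left out.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical stand-in for the storage key type.
pub type Key = u64;

thread_local! {
    /// Mock contract storage for this sketch: one byte buffer per cell.
    static STORAGE: RefCell<HashMap<Key, Vec<u8>>> = RefCell::new(HashMap::new());
}

pub trait StorageSize {
    const SIZE: u64;
}

pub trait FlushAt {
    fn flush_at(&self, at: Key);
}

pub trait FetchAt: Sized {
    // Assumption: fetching returns the reconstructed value.
    fn fetch_at(at: Key) -> Self;
}

impl StorageSize for u32 {
    const SIZE: u64 = 1;
}

impl FlushAt for u32 {
    fn flush_at(&self, at: Key) {
        STORAGE.with(|s| {
            s.borrow_mut().insert(at, self.to_le_bytes().to_vec());
        });
    }
}

impl FetchAt for u32 {
    fn fetch_at(at: Key) -> Self {
        STORAGE.with(|s| {
            let storage = s.borrow();
            let bytes = storage.get(&at).expect("cell was never flushed");
            let mut buf = [0u8; 4];
            buf.copy_from_slice(bytes);
            u32::from_le_bytes(buf)
        })
    }
}

/// Made-up contract state with two eager fields.
pub struct Counter {
    pub value: u32,
    pub step: u32,
}

impl StorageSize for Counter {
    // Summation of the field sizes, as a proc. macro might generate it.
    const SIZE: u64 = <u32 as StorageSize>::SIZE + <u32 as StorageSize>::SIZE;
}

impl FlushAt for Counter {
    fn flush_at(&self, at: Key) {
        self.value.flush_at(at);
        self.step.flush_at(at + <u32 as StorageSize>::SIZE);
    }
}

impl FetchAt for Counter {
    fn fetch_at(at: Key) -> Self {
        Counter {
            value: u32::fetch_at(at),
            step: u32::fetch_at(at + <u32 as StorageSize>::SIZE),
        }
    }
}

const ROOT: Key = 0;

/// Deploy: construct the state via the constructor, then initialize storage.
pub fn deploy(initial: u32, step: u32) {
    let state = Counter { value: initial, step };
    state.flush_at(ROOT);
}

/// Call: fetch the state, run a mutating message, flush it back.
pub fn call_increment() -> u32 {
    let mut state = Counter::fetch_at(ROOT);
    state.value += state.step;
    state.flush_at(ROOT);
    state.value
}
```

Note that the field offsets are derived from `StorageSize::SIZE`, so neither `u32` nor `Counter` has to remember where it lives in storage.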

Current State

The ink_core::storage::Flush trait is defined as:

pub trait Flush {
    fn flush(&mut self) {}
}

Flushing is the process of writing all intermediate computed and cached values back into the real contract storage. We do this to avoid loading from and storing to the contract storage for every operation. Instead we try to cache values in memory because it is much faster and we do not have to encode or decode them all the time.

So the process behind flushing is very important for performance and to not waste gas.

Problem & Proposal

The problems with the current flushing system and trait are:

Because fn flush(&mut self) has no at: Key argument, all flushable storage entities must know their location in the storage. This prevents us from providing a useful implementation of Flush for a variety of types, such as all primitive types.
We propose to redesign the Flush trait in a way that types no longer necessarily have to know their exact location in storage. Types that do know their storage location shall experience no obvious downsides due to the new redesigned Flush trait.
The flush method takes self by exclusive reference (&mut self). This was done to make the interface friendly towards cache implementations, because caches generally have to mutate themselves when they are flushed and emptied. Types that do not have such a cache generally do not need to be mutated. We propose to change the signature to take &self and not go out of our way to accommodate types with caches, since caches in Rust need interior mutability anyway.
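To illustrate the last point, here is a minimal sketch of a cache that can be flushed through `&self` thanks to interior mutability. Everything in it is an assumption made for illustration: `Key` is a `u64` stand-in, `STORAGE` is a thread-local mock, and `CachedU32` is a made-up type, not an existing ink! abstraction.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

/// Hypothetical stand-in for the storage key type.
pub type Key = u64;

thread_local! {
    /// Mock contract storage for this sketch.
    static STORAGE: RefCell<HashMap<Key, u32>> = RefCell::new(HashMap::new());
}

/// Key-aware flush that only needs a shared reference.
pub trait Flush {
    fn flush(&self, at: Key);
}

/// A plain value has no cache, so there is nothing to mutate on flush.
impl Flush for u32 {
    fn flush(&self, at: Key) {
        STORAGE.with(|s| {
            s.borrow_mut().insert(at, *self);
        });
    }
}

/// Made-up cached value that tracks whether it must be written back.
pub struct CachedU32 {
    value: Cell<u32>,
    dirty: Cell<bool>,
}

impl CachedU32 {
    pub fn new(value: u32) -> Self {
        Self { value: Cell::new(value), dirty: Cell::new(true) }
    }

    pub fn set(&self, value: u32) {
        self.value.set(value);
        self.dirty.set(true);
    }

    pub fn is_dirty(&self) -> bool {
        self.dirty.get()
    }
}

impl Flush for CachedU32 {
    fn flush(&self, at: Key) {
        // The dirty flag is reset through `Cell`, so `&self` is enough.
        if self.dirty.get() {
            self.value.get().flush(at);
            self.dirty.set(false);
        }
    }
}

/// Helper to inspect the mock storage.
pub fn stored(at: Key) -> Option<u32> {
    STORAGE.with(|s| s.borrow().get(&at).copied())
}
```

The plain `u32` and the cached type share the same `&self` signature, and the key is supplied by the caller in both cases.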

Storage Mapping

It isn't enough to simply redesign the Flush trait. We also need to redesign how storage entities are mapped into the storage if they are no longer explicitly aware of their storage locations.

Looking at the following example:

#[ink(storage)]
struct Storage1 {
    val: storage::Value<bool>,
    vec: storage::Vec<i32>,
}

We see that currently this storage struct is laid out in the contract storage in the following way:

| val | vec::len | ... vec::elems ... |

Where

| foo | represents a single cell and | foo | bar | would represent two.
A cell that looks like | ... foo ... | represents a chunk of cells.

So we see that right now we store all those elements in their respective cell in the root.
Note that also each element in the vec has its own cell that is contiguously aligned.

But if we are always in need of all the elements anyways it makes sense to store them under the same cell. This is done by:

#[ink(storage)]
struct Storage2 {
    same_cell: storage::Value<(
        bool, // val
        storage::Vec<i32>, // vec
    )>,
}

So we end up having indirections where we have previously used the storage::Vec.
This is because internally the storage::Vec looks like this:

struct Vec<T> {
    len: storage::Value<u32>, // single cell
    elems: SyncChunk<T>,     // chunk of cells
}

Potential Solution

Another problem of the current system is that our storage abstractions are not fine grained enough. Taking the storage::Value:

It provides an indirection through having its own key
It provides a cache to cache the internal storage value

So actually the storage::Value does two things at the same time.
This means we could break it up into two different components with different jobs.

storage::Box<T>: Provides an indirection. Useful for splitting things apart into their own cells.
storage::Cached<T>: Provides a cache to cache the internal storage entity.

To emulate the old behavior one would simply use: storage::Box<storage::Cached<T>>

Note that since storage::Cached<T> doesn't have its own key we are back at our original proposal to make Flush key aware.

Why this works

The new design differentiates between two classes of storage entities.

Eager: Storage entities that do no caching and immediately load their values upon contract start.
Lazy: Storage entities that only load from the actual contract storage when needed.

Note that only the Lazy storage entities are in need of their own storage keys to be able to load from their mapped storage key when needed. The Eager storage entities only ever have to load and store from and to contract storage upon contract storage allocation and upon flushing. During the time of contract execution they solely operate on their memory mapped values.

One could say that memory::Vec is an eager vector type that eagerly loads all vector elements upon allocation and storage::Vec is a lazy vector type that loads an element only if needed. Also another distinction between them is that memory::Vec stores all elements under a single cell whereas storage::Vec provides every element its own cell.

The storage::Value type no longer exists. Instead we have storage::Box, which is a lazy storage value because of its additional indirection (although the box itself is loaded eagerly), and storage::Cached, which is lazy because it loads the underlying value only when needed but requires a storage key to do so.

Values such as plain i32 can be used as storage entities with the new design. They are Eager storage entities since they have no notion of storage keys. During contract execution the contract operates on their plain values and upon flushing the contract provides them with their correct storage locations to make them able to flush there.

Initialization

Due to the introduction of Eager storage entities we have a need to change the way a contract initializes itself since Eager types under the current initialization scheme would always result in a panic upon contract deployment. This is because they would expect values at their mapped storage locations even though the contract storage hasn't been touched at that point in time yet. So we are in need of changing the contract initialization scheme to make it possible to allow for mapped Lazy storage entities as well as unmapped Eager storage entities.

One way forward is to finally declare #[ink(constructor)] to be of signature:

#[ink(constructor)]
fn new(arg1: Foo, arg2: Bar, ...) -> Self { ... }

In other words: Make them just like any other Rust constructors.
Instead of automating the whole allocation and try-default-initialization machinery, contract writers would instead simply write their initialized values just as if it was a normal Rust constructor. Since flush no longer requires storage entities to know where they are mapped in storage we can simply do this.

Concrete Proposal

Introduce three new traits and remove the former old Flush trait:

Proposal 1

pub trait FlushAt {
    fn flush_at(&self, at: Key, seal: Sealed);
}

pub trait FlushPropagate {
    fn flush_propagate(&self, at: Key, seal: Sealed) {}
}

pub trait Flush: FlushAt + FlushPropagate {
    fn flush(&self, at: Key);
}

const _: () = {
    /// Prevents `FlushAt` and `FlushPropagate` trait methods from being called
    /// from user code.
    pub struct Sealed {
        seal: (),
    }

    impl<T> Flush for T
    where
        T: FlushAt + FlushPropagate,
    {
        fn flush(&self, at: Key) {
            // Flush nested components first, then the entity itself.
            FlushPropagate::flush_propagate(self, at, Sealed { seal: () });
            FlushAt::flush_at(self, at, Sealed { seal: () });
        }
    }
};

Users should only ever interact with the main Flush trait still and still only call that.
FlushAt is used to flush the entity at hand itself.
FlushPropagate is used to flush nested components of the entity.
We do the separation to guarantee that the order in which the flush happens can be relied upon.
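A compiling sketch of the sealed three-trait idea might look like this. It uses a module instead of a `const _` block so that `Sealed` can be named in the trait signatures while its field stays private. `Key` as `u64`, the `LOG` recorder, and the `Leaf` and `Pair` types are assumptions made up for illustration; how children derive their keys (`at + 1`, `at + 2`) is arbitrary here.

```rust
use std::cell::RefCell;

mod flush {
    /// Hypothetical stand-in for the storage key type.
    pub type Key = u64;

    /// Token with a private field: only code in this module can create one,
    /// so user code cannot call `flush_at` or `flush_propagate` directly.
    pub struct Sealed {
        seal: (),
    }

    pub trait FlushAt {
        fn flush_at(&self, at: Key, seal: Sealed);
    }

    pub trait FlushPropagate {
        fn flush_propagate(&self, _at: Key, _seal: Sealed) {}
    }

    pub trait Flush: FlushAt + FlushPropagate {
        fn flush(&self, at: Key);
    }

    impl<T> Flush for T
    where
        T: FlushAt + FlushPropagate,
    {
        fn flush(&self, at: Key) {
            // Fixed order: nested components first, then the entity itself.
            FlushPropagate::flush_propagate(self, at, Sealed { seal: () });
            FlushAt::flush_at(self, at, Sealed { seal: () });
        }
    }
}

use flush::{Flush, FlushAt, FlushPropagate, Key, Sealed};

thread_local! {
    /// Records the keys in the order they are flushed (illustration only).
    static LOG: RefCell<Vec<Key>> = RefCell::new(Vec::new());
}

fn record(at: Key) {
    LOG.with(|log| log.borrow_mut().push(at));
}

/// Made-up entity without nested components.
struct Leaf;

impl FlushAt for Leaf {
    fn flush_at(&self, at: Key, _seal: Sealed) {
        record(at);
    }
}

impl FlushPropagate for Leaf {}

/// Made-up entity with two nested components.
struct Pair {
    a: Leaf,
    b: Leaf,
}

impl FlushAt for Pair {
    fn flush_at(&self, at: Key, _seal: Sealed) {
        record(at);
    }
}

impl FlushPropagate for Pair {
    fn flush_propagate(&self, at: Key, _seal: Sealed) {
        // Children are flushed through the public `Flush` entry point.
        self.a.flush(at + 1);
        self.b.flush(at + 2);
    }
}

/// Flushes a `Pair` rooted at key 10 and returns the recorded order.
fn flush_order() -> Vec<Key> {
    LOG.with(|log| log.borrow_mut().clear());
    let pair = Pair { a: Leaf, b: Leaf };
    pair.flush(10);
    LOG.with(|log| log.borrow().clone())
}
```

In this sketch the nested leaves are recorded before the parent, which is the "propagate then flush" order the blanket implementation is meant to guarantee.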

Advantages

Strict splitting of FlushAt and FlushPropagate
Automated Flush implementation that can be relied upon (propagate then flush)

Downsides

3 traits in total
Might be misleading or confusing if it is not clear which trait to call
Seals might be a bit too pervasive in signatures

Proposal 2

Just go with one trait as usual.

pub trait FlushAt {
    fn flush_at(&self, at: Key);
}

Advantages

Simple Interface

Downsides

No strict guarantee of the order of recursion in nested structs

Rename

We should generally think about renaming our traits.
Flushing has no proper English opposite, but if we use Push instead of Flush we could introduce Pull as the new name for AllocateUsing.
So when executing a contract the first thing we do is to Pull from storage and the last thing we do is to Push back to storage.
With the refactored Flush (or Push) trait we'd have to also change semantics of AllocateUsing so that the renaming is justified.

Downside

By introducing at: Key into the new Flush trait we would be no longer able to move all of these computations into compilation time since the opposite Pull from storage (that would then be required to make all of this work) is directly connected with the live contract storage because of Eager storage entities.

This is the biggest downside I see because we'd have to do this all over again for every ink! contract execution. We could introduce some speedups by using u128 or even u64 instead of a whole [u8; 32] key. However, especially for large static storage entities (e.g. an #[ink(storage)] struct with lots of nested fields), this computation might become compute-intensive given that it is required for every contract execution.

Before & After

BEFORE

All storage entities are lazy.
All storage entities know their storage locations through a sweep by AllocateUsing
All storage entities can flush themselves with Flush
Only a selected set of types are storage entities, e.g.
- storage::Value<T>
- storage::Vec<T>
- storage::BTreeMap<K, V>
Storage entities can only be combined in limited ways, e.g.
- storage::Value<storage::Value<i32>> is certainly not what you want
- storage::Value<storage::Vec<u8>> as well
- storage::Vec<storage::Value<i32>> is pretty useless, too

AFTER

All storage entities are either lazy or eager
Lazy storage entities know their storage locations through a sweep by Pull, eager storage entities will load immediately through Pull
Storage entities are guided upon Flush (or Push) by the at: Key parameter
- Still: Lazy storage entities could in theory still be pushed without at
Most Rust types can be eager storage entities
The provided storage entities are better combinable, e.g.
- storage::Box<storage::Cached<T>> or storage::Cached<T> behaves comparably to today's storage::Value
- We'd be able to just use storage::Cached<T> in most places where we used storage::Value<T> before, removing some indirections in nested combinations such as:
  - storage::Value<storage::Vec<T>> where the len field of storage::Vec<T> has been indirected twice and would now be storage::Cached<storage::Vec<T>> without additional indirection.

Proposal 3

This proposal takes a completely different approach.
We now base our whole computation on top of the StorageSize trait:

pub trait StorageSize {
   const SIZE: u64 = 0;
}

It needs to be implemented for all storage entities and can also be default implemented by primitive types such as u32, bool, etc.
The SIZE constant value describes how many storage cells it requires in order to operate.
For example a single storage cell requires one cell, whereas a storage chunk requires 2^32 cells and a storage::Vec consisting of a storage::SyncCell and a storage::SyncChunk thus requires 1 + 2^32 cells.
The dynamic allocator requires 2^64 cells.
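The cell counts above can be reproduced with associated constants. This is a sketch only: `SyncCell`, `SyncChunk` and `StorageVec` are simplified stand-ins for the real types, the default `SIZE = 0` is omitted, and the dynamic allocator is left out because 2^64 does not fit into a `u64`.

```rust
use std::marker::PhantomData;

pub trait StorageSize {
    /// Number of storage cells the type requires.
    const SIZE: u64;
}

/// Simplified stand-in for a single storage cell.
pub struct SyncCell<T>(PhantomData<T>);

/// Simplified stand-in for a chunk of storage cells.
pub struct SyncChunk<T>(PhantomData<T>);

impl<T> StorageSize for SyncCell<T> {
    const SIZE: u64 = 1;
}

impl<T> StorageSize for SyncChunk<T> {
    const SIZE: u64 = 1 << 32;
}

/// Simplified stand-in for `storage::Vec`: a length cell plus a chunk.
pub struct StorageVec<T> {
    pub len: SyncCell<u32>,
    pub elems: SyncChunk<T>,
}

impl<T> StorageSize for StorageVec<T> {
    // Summation of the field sizes, as a proc. macro might generate it.
    const SIZE: u64 =
        <SyncCell<u32> as StorageSize>::SIZE + <SyncChunk<T> as StorageSize>::SIZE;
}

/// Mirrors the `Storage1` example: one value cell plus one vector.
pub struct Storage1 {
    pub val: SyncCell<bool>,
    pub vec: StorageVec<i32>,
}

impl StorageSize for Storage1 {
    const SIZE: u64 =
        <SyncCell<bool> as StorageSize>::SIZE + <StorageVec<i32> as StorageSize>::SIZE;
}
```

All sums are evaluated by the compiler, so a struct's total footprint is known before the contract runs.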

By introducing the new concept for the Key type based on u128 we can now take advantage of the fact that u128 can compute many operations at compile time.
Coupled with const_fn functions we declare that all storage entities must be constructible given a single Key as offset using the following signature:

pub const fn from_offset(offset: Key) -> Self;

Unfortunately we cannot encode this as a trait at the moment since Rust does not yet support const_fn in traits.

Using these key components we can guarantee const construction of our storage entities so no more runtime computation is required for constructing the contract's storage entities.
Note that we still need to support Flush as described above with the additional at: Key component to make it work for all Rust types.
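As a rough sketch of what const construction from an offset could look like for Lazy entities: the `Key` newtype over `u128`, the `offset_by` helper, and the simplified `SyncCell`, `SyncChunk`, `StorageVec` and `Contract` types below are all assumptions for illustration. `from_offset` is written as an inherent method on each type, since it cannot be expressed as a trait method.

```rust
/// Hypothetical key type based on `u128`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Key(pub u128);

impl Key {
    /// Returns the key that lies `cells` cells after `self`.
    pub const fn offset_by(self, cells: u64) -> Key {
        Key(self.0 + cells as u128)
    }
}

/// Simplified lazy cell: it only remembers where it lives.
pub struct SyncCell {
    pub key: Key,
}

impl SyncCell {
    pub const SIZE: u64 = 1;

    pub const fn from_offset(offset: Key) -> Self {
        Self { key: offset }
    }
}

/// Simplified lazy chunk of cells.
pub struct SyncChunk {
    pub key: Key,
}

impl SyncChunk {
    pub const SIZE: u64 = 1 << 32;

    pub const fn from_offset(offset: Key) -> Self {
        Self { key: offset }
    }
}

/// Simplified `storage::Vec`: a length cell followed by the element chunk.
pub struct StorageVec {
    pub len: SyncCell,
    pub elems: SyncChunk,
}

impl StorageVec {
    pub const SIZE: u64 = SyncCell::SIZE + SyncChunk::SIZE;

    pub const fn from_offset(offset: Key) -> Self {
        Self {
            len: SyncCell::from_offset(offset),
            elems: SyncChunk::from_offset(offset.offset_by(SyncCell::SIZE)),
        }
    }
}

/// Made-up contract storage struct.
pub struct Contract {
    pub flag: SyncCell,
    pub items: StorageVec,
}

impl Contract {
    pub const fn from_offset(offset: Key) -> Self {
        Self {
            flag: SyncCell::from_offset(offset),
            items: StorageVec::from_offset(offset.offset_by(SyncCell::SIZE)),
        }
    }
}

/// The whole key layout is computed during compilation.
pub const CONTRACT: Contract = Contract::from_offset(Key(0));
```

Only keys are computed here; no storage value is read, which is why this works for Lazy entities but not directly for Eager ones.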

Problems

The problems with the 3rd approach are:

The signature pub const fn from_offset(offset: Key) -> Self is problematic for Eager storage types such as u32, bool, etc., since they cannot be initialized from a storage value at compile time. We need to encode this in the type system by providing a return type such as MaybeUninit<Self> or Result<Self, Err>, a combination such as Result<MaybeUninit<Self>, Err>, or something unique:

enum FromOffsetResult<T> {
    /// For Lazy storage entities that include some Eager storage entities.
    /// They cannot be both at the same time because their sub-fields expect `T` but
    /// might receive `MaybeUninit<T>` for Eager storage entities sub fields.
    Err(FromOffsetError),
    /// For Lazy storage entities because they can be constructed totally fine by this.
    Ok(T),
    /// For Eager storage entities because they can be constructed but do not have a definite value accessible at compile-time.
    Uninit(T),
}


ascjones · 2019-11-29T15:44:19Z

Judging by the task list you appear to favour the 3 trait approach in Proposal 1. I'm inclined to agree. Is this the main reason:

By introducing at: Key into the new Flush trait we would be no longer able to move all of these computations into compilation time

Robbepop · 2019-11-29T16:08:02Z

I'm inclined to agree.

Your job is to disagree with me. :D

You can view this whole write-down as a write-down of my thoughts about this topic.
I also have many thoughts about this that haven't made it into this write down yet.
I am also not happy with what has been presented so far because it won't solve our problem of moving common computations into compile-time as much as possible.

I wrote that the at: Key parameter might prevent us from getting things more into compile-time but to be honest this leaves out details that explain why. One of the simple answers to that is that const_fn doesn't work with traits. So we are bound to nightly features or to other ways of handling the whole thing.

I also proposed in another thread to change our Key type from [u8; 32] to u128 to make this transition easier.

Robbepop added the A-ink_storage ([ink_storage] Work Item), B-design (Designing a new component, interface or functionality.) and B-enhancement (New feature or request) labels and removed the B-enhancement (New feature or request) label on Nov 27, 2019

Robbepop changed the title ~~Refactor the core Flush trait~~ Refactor contract initialization and finalization Jan 13, 2020

athei linked a pull request Feb 28, 2020 that will close this issue

Implement storage (revision 2) module #311

Merged

81 tasks

This was referenced Mar 21, 2020

Refactor Initialize and AllocateUsing core traits #244

Closed

Add PresetAllocator - The Prellocator #17

Closed

Robbepop closed this as completed in #311 May 20, 2020
