feat: update StateMap to take borrowee of keys #644

Closed
wants to merge 5 commits into from
Changes from 1 commit
2 changes: 1 addition & 1 deletion module-system/module-implementations/sov-evm/src/call.rs
@@ -35,7 +35,7 @@ impl<C: sov_modules_api::Context> Evm<C> {
         let cfg_env = get_cfg_env(&block_env, cfg, None);

         let hash = evm_tx_recovered.hash();
-        self.transactions.set(&hash, &tx, working_set);
+        self.transactions.set(&hash[..], &tx, working_set);

         let evm_db: EvmDb<'_, C> = self.get_db(working_set);
4 changes: 2 additions & 2 deletions module-system/module-implementations/sov-evm/src/query.rs
@@ -62,7 +62,7 @@ impl<C: sov_modules_api::Context> Evm<C> {
         working_set: &mut WorkingSet<C::Storage>,
     ) -> RpcResult<Option<Transaction>> {
         info!("evm module: eth_getTransactionByHash");
-        let evm_transaction = self.transactions.get(&hash.into(), working_set);
+        let evm_transaction = self.transactions.get(&hash[..], working_set);
         let result = evm_transaction.map(Transaction::try_from).transpose();
         result.map_err(|e| to_jsonrpsee_error_object(e, "ETH_RPC_ERROR"))
     }
@@ -75,7 +75,7 @@ impl<C: sov_modules_api::Context> Evm<C> {
         working_set: &mut WorkingSet<C::Storage>,
     ) -> RpcResult<Option<TransactionReceipt>> {
         info!("evm module: eth_getTransactionReceipt");
-        let receipt = self.receipts.get(&hash.into(), working_set);
+        let receipt = self.receipts.get(&hash[..], working_set);
         Ok(receipt.map(|r| r.into()))
     }
18 changes: 9 additions & 9 deletions module-system/sov-state/src/internal_cache.rs
@@ -42,12 +42,12 @@ impl StorageInternalCache {
     /// Gets a value from the cache or reads it from the provided `ValueReader`.
     pub(crate) fn get_or_fetch<S: Storage>(
         &mut self,
-        key: StorageKey,
+        key: &StorageKey,
         value_reader: &S,
         witness: &S::Witness,
     ) -> Option<StorageValue> {
-        let cache_key = key.clone().as_cache_key();
-        let cache_value = self.get_value_from_cache(cache_key.clone());
+        let cache_key = key.as_cache_key();
+        let cache_value = self.get_value_from_cache(cache_key);

         match cache_value {
             cache::ValueExists::Yes(cache_value_exists) => cache_value_exists.map(Into::into),
@@ -56,30 +56,30 @@ impl StorageInternalCache {
                 let storage_value = value_reader.get(key, witness);
                 let cache_value = storage_value.as_ref().map(|v| v.clone().as_cache_value());

-                self.add_read(cache_key, cache_value);
+                self.add_read(cache_key.clone(), cache_value);
                 storage_value
             }
         }
     }

-    pub fn try_get(&self, key: StorageKey) -> ValueExists {
+    pub fn try_get(&self, key: &StorageKey) -> ValueExists {
         let cache_key = key.as_cache_key();
         self.get_value_from_cache(cache_key)
     }

     pub(crate) fn set(&mut self, key: StorageKey, value: StorageValue) {
-        let cache_key = key.as_cache_key();
+        let cache_key = key.to_cache_key();
         let cache_value = value.as_cache_value();
         self.tx_cache.add_write(cache_key, Some(cache_value));
     }

     pub(crate) fn delete(&mut self, key: StorageKey) {
-        let cache_key = key.as_cache_key();
+        let cache_key = key.to_cache_key();
         self.tx_cache.add_write(cache_key, None);
     }

-    fn get_value_from_cache(&self, cache_key: CacheKey) -> cache::ValueExists {
-        self.tx_cache.get_value(&cache_key)
+    fn get_value_from_cache(&self, cache_key: &CacheKey) -> cache::ValueExists {
+        self.tx_cache.get_value(cache_key)
     }

     pub fn merge_left(
110 changes: 85 additions & 25 deletions module-system/sov-state/src/map.rs
@@ -1,4 +1,5 @@
-use std::marker::PhantomData;
+use core::borrow::Borrow;
+use core::marker::PhantomData;

 use borsh::{BorshDeserialize, BorshSerialize};
 use thiserror::Error;
@@ -21,57 +22,116 @@ pub enum Error {
     MissingValue(Prefix, StorageKey),
 }

-impl<K: BorshSerialize, V: BorshSerialize + BorshDeserialize> StateMap<K, V> {
-    pub fn new(prefix: Prefix) -> Self {
+impl<K, V> StateMap<K, V>
+where
+    K: BorshSerialize,
+    V: BorshSerialize + BorshDeserialize,
+{
+    /// Creates a new `StateMap`.
+    pub const fn new(prefix: Prefix) -> Self {
         Self {
             _phantom: (PhantomData, PhantomData),
             prefix,
         }
     }

     /// Inserts a key-value pair into the map.
-    pub fn set<S: Storage>(&self, key: &K, value: &V, working_set: &mut WorkingSet<S>) {
-        working_set.set_value(self.prefix(), key, value)
+    pub fn set<S, Q>(&self, key: &Q, value: &V, working_set: &mut WorkingSet<S>)
+    where
+        S: Storage,
+        Q: BorshSerialize + ?Sized,
+        K: Borrow<Q>,
+    {
+        working_set.set_value(self.prefix(), &key, value)
     }

     /// Returns the value corresponding to the key or None if key is absent in the StateMap.
-    pub fn get<S: Storage>(&self, key: &K, working_set: &mut WorkingSet<S>) -> Option<V> {
-        working_set.get_value(self.prefix(), key)
+    ///
+    /// # Examples
+    ///
+    /// We can use as argument any type that can be borrowed by the key.
+    ///
+    /// ```rust
+    ///
+    /// fn foo<S>(map: StateMap<Vec<u8>, u64>, key: &[u8], ws: &mut WorkingSet<S>) -> Option<u64>
+    /// where
+    ///     S: Storage,
+    /// {
+    ///     // we perform the `get` with a slice, and not the `Vec`. it is so because `Vec` borrows
+    ///     // `[T]`.
+    ///     map.get(&key[..], ws)
+    /// }
+    /// ```
+    ///
+    /// However, some concrete types won't implement `Borrow`, but we can easily cast them into
+    /// common types that will
+    ///
+    /// ```rust
+    ///
+    /// fn foo<S>(map: StateMap<Vec<u8>, u64>, key: [u8; 32], ws: &mut WorkingSet<S>) -> Option<u64>
+    /// where
+    ///     S: Storage,
+    /// {
+    ///     map.get(&key[..], ws)
+    /// }
+    /// ```
Member

@bkolad bkolad Aug 15, 2023

There is an important hidden assumption we need to address. The standard requirement for Borrow is that Eq, Ord, and Hash must be equivalent for borrowed and owned values. We are adding another requirement here: the serialized versions of the borrowed and owned values must be equivalent. So it could happen that we have two types, A that can be borrowed as B, and everything works for some codec, but if someone decides to change the codec to something else via #648, it will stop working because, for example, Vec and &[] have different representations.

Is it even true for borsh?
Like, Vec<_> can be borrowed as a slice, but it looks like they have different serialization logic for ZSTs:

@citizen-stig @preston-evans98 @neysofu WDYT?

Member

We are adding another requirement here: the serialized versions of the borrowed and owned values must be equivalent.

Yes, by adding this requirement we make it possible to reduce cloning.
The question is, can we express this restriction in the code, so users won't get hit by

Is it even true for borsh?

But what if you have a ZST as the key in the first place? Is it supposed to break?

So I have this test, and it looks like this:

   use sov_state::{DefaultStorageSpec, Prefix, ProverStorage, StateMap};
    use super::*;

    #[test]
    fn check_something() {
        let tmpdir = tempfile::tempdir().unwrap();
        let mut working_set =
            WorkingSet::new(ProverStorage::<DefaultStorageSpec>::with_path(tmpdir.path()).unwrap());

        let prefix = Prefix::new("test".as_bytes().to_vec());
        let map_1: StateMap<Vec<()>, u8> = StateMap::new(prefix.clone());

        // Note: `^` is bitwise XOR in Rust, so `2 ^ 64` evaluates to 66, not 2 to the power of 64.
        let d1 = [(); 2 ^ 64].to_vec();

        map_1.set(&d1, &3, &mut working_set);
        let a = map_1.get(&d1, &mut working_set);
        println!("A: {:?}", a);

        let map_2: StateMap<&'static [()], u8> = StateMap::new(prefix);

        let d2 = [(); 2 ^ 64];
        map_2.set(&&d2[..], &3, &mut working_set);

        let b = map_2.get(&&d2[..], &mut working_set);
        println!("B: {:?}", b);

        assert!(a.is_some());
        assert!(b.is_some());
        assert_eq!(a, b);
    }

Nothing blew up. But I probably misunderstood the question.

I hope we can agree that there's a usability problem. Often you only have a reference to a slice of bytes and you need to use it as a key, so you have to clone it. Then during set it is cloned again:

impl StorageKey {
    pub fn new<K: BorshSerialize>(prefix: &Prefix, key: &K) -> Self {
        let encoded_key = key.try_to_vec().unwrap();

So we trade off API purity for double cloning. We need to understand that.
And in the current scenario, we push the burden onto our customers. So I think it is in our interest to find a solution to this problem. Maybe we need to add some constraints to the codec type, so it won't be easy to use a codec which has a different serialization. Maybe each codec has to implement those kinds of checks for types that can be keys.
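
The cloning cost described above can be illustrated with the standard library alone. This is a minimal sketch, not SDK code: `strict_get` is a hypothetical helper whose signature mirrors the pre-PR `get(&K)`, and `HashMap::get` stands in for a `Borrow`-based lookup.

```rust
use std::collections::HashMap;

/// Hypothetical lookup that, like the pre-PR `StateMap::get`, only accepts `&K`.
pub fn strict_get(map: &HashMap<Vec<u8>, u64>, key: &Vec<u8>) -> Option<u64> {
    map.get(key).copied()
}

/// A caller that only holds a byte slice must allocate a `Vec` just to look up.
pub fn lookup_with_clone(map: &HashMap<Vec<u8>, u64>, key: &[u8]) -> Option<u64> {
    let owned: Vec<u8> = key.to_vec(); // extra allocation and copy
    strict_get(map, &owned)
}

/// `HashMap::get` is generic over `Q` with `K: Borrow<Q>`, so the slice is used as-is.
pub fn lookup_borrowed(map: &HashMap<Vec<u8>, u64>, key: &[u8]) -> Option<u64> {
    map.get(key).copied()
}
```

Both helpers return the same result; the second one simply avoids the intermediate `Vec`.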

Member

I know it is not a strong reason, but how likely is it that someone will use a ZST for keys?

Also, can we limit the current CacheKey implementation not to all AsRef but only to Vec and slices of bytes? That might be an option, since that will be the most probable use case.

Member

@bkolad bkolad Aug 15, 2023

@citizen-stig
So this

map.get(&vec[()]) 

panics while

map.get(&[();1]);

won't. And in the test we could probably run out of memory when we try to deserialize Vec<()> values which are serialized as very big slices instead of keys.

So it is inconsistent.
The ZST and slice vs. Vec case is just an example of a bigger problem. We added another requirement to Borrow, and all the serialization formats on crates.io don't care about it (for bincode, for instance, something else might break). To use it safely, a user has to examine all types which can be borrowed and check that the implementation of the serialization format they choose doesn't break the assumption (and won't break it in the future). We already saw that it is broken even in the simplest case.

We could have another trait, OurBorrow, with the additional requirement documented, but still, third-party serialization crates won't care.

For HashMap this kind of API works because everyone in the Rust ecosystem agreed that Eq, Ord, and Hash must be equivalent for borrowed and owned values, but this is just a convention. You can easily implement Borrow and Eq or Hash for your type in a way that will break the HashMap. There is no way to express it in the type system.
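
To make the "just a convention" point concrete, here is a small sketch of a type that compiles fine yet violates the `Borrow` contract. `CaseInsensitive` and `eq_agrees_with_borrow` are hypothetical names introduced for illustration only; a `HashMap` keyed by such a type may miss entries when queried through `&str`.

```rust
use std::borrow::Borrow;
use std::hash::{Hash, Hasher};

/// Hypothetical key type whose equality and hash ignore ASCII case.
#[derive(Debug, Clone)]
pub struct CaseInsensitive(pub String);

impl PartialEq for CaseInsensitive {
    fn eq(&self, other: &Self) -> bool {
        self.0.eq_ignore_ascii_case(&other.0)
    }
}
impl Eq for CaseInsensitive {}

impl Hash for CaseInsensitive {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Hash the lowercased bytes so "Key" and "KEY" hash identically.
        for b in self.0.bytes() {
            state.write_u8(b.to_ascii_lowercase());
        }
    }
}

// The compiler accepts this impl, but `str` compares and hashes
// case-sensitively, so owned and borrowed forms disagree.
impl Borrow<str> for CaseInsensitive {
    fn borrow(&self) -> &str {
        &self.0
    }
}

/// Returns true when `a == b` gives the same answer on the owned values
/// and on their borrowed `str` forms, as the `Borrow` docs require.
pub fn eq_agrees_with_borrow(a: &CaseInsensitive, b: &CaseInsensitive) -> bool {
    let owned_eq = a == b;
    let borrowed_eq = <CaseInsensitive as Borrow<str>>::borrow(a)
        == <CaseInsensitive as Borrow<str>>::borrow(b);
    owned_eq == borrowed_eq
}
```

Nothing in the type system rejects the `Borrow<str>` impl; only a check like `eq_agrees_with_borrow` (or a failing lookup at runtime) reveals the mismatch.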

Member

@neysofu neysofu Aug 15, 2023

@citizen-stig your example gave me an idea. You tested Vec<()> against &[()], which have the same encodings in Borsh, so it succeeds. What about arrays, though? Arrays don't have a length marker because there's no need for those, yet they Borrow to slices. This test fails:

#[test]
fn check_something() {
    use super::*;
    use crate::{DefaultStorageSpec, Prefix, ProverStorage};

    let tmpdir = tempfile::tempdir().unwrap();
    let mut working_set =
        WorkingSet::new(ProverStorage::<DefaultStorageSpec>::with_path(tmpdir.path()).unwrap());

    let prefix = Prefix::new("test".as_bytes().to_vec());

    type Key = [u8; 4];
    let map: StateMap<Key, u32> = StateMap::new(prefix.clone());

    let key = [0u8; 4];
    let key_ref_array: &[u8; 4] = <Key as Borrow<[u8; 4]>>::borrow(&key);
    let key_ref_slice: &[u8] = <Key as Borrow<[u8]>>::borrow(&key);

    // We set with &[u8; 4]...
    map.set(key_ref_array, &3, &mut working_set);
    // ...getting with &[u8; 4] works, obviously...
    assert!(map.get(key_ref_array, &mut working_set).is_some());
    // ...but changing the target of the Borrow to &[u8] doesn't.
    assert!(map.get(key_ref_slice, &mut working_set).is_some());
}
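
Why the last assertion fails can be seen from the byte layouts alone. The sketch below does not use the borsh crate; `encode_slice` and `encode_array` are hand-rolled stand-ins that assume Borsh's documented layout (a little-endian `u32` length prefix for slices and `Vec`, no prefix for fixed-size arrays).

```rust
/// Stand-in for how a slice or `Vec<u8>` is laid out: u32 LE length, then the bytes.
pub fn encode_slice(bytes: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(4 + bytes.len());
    out.extend_from_slice(&(bytes.len() as u32).to_le_bytes());
    out.extend_from_slice(bytes);
    out
}

/// Stand-in for how a fixed-size array is laid out: the bytes only,
/// because the length is already known from the type.
pub fn encode_array<const N: usize>(bytes: &[u8; N]) -> Vec<u8> {
    bytes.to_vec()
}
```

For `[0u8; 4]` the array form is four bytes while the slice form is eight, so a value stored under one encoding cannot be found under the other.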

Member

@bkolad I didn't see your comment before posting mine, but yeah basically +1.

Member

@neysofu neysofu Aug 15, 2023

One possibility would be to extend StateKeyCodec and shift the responsibility of ensuring that K and Q serialize to the same byte sequence to the codec author, instead of the caller.

pub trait StateKeyCodec<K, Q = K> {
    type KeyError: std::fmt::Debug;

    fn encode_key(&self, key: &Q) -> Vec<u8>;

    fn try_decode_key(&self, bytes: &[u8]) -> Result<K, Self::KeyError>;

    // ...
}

Effectively, the default type parameter Q = K would be an opt-in mechanism. We'd have to modify this PR to allow only the types Q for which the codec is valid: impl<K, V, C, Q> StateMap<K, V, C> where C: StateKeyCodec<K, Q> + StateValueCodec<V>.

Blanket implementations like the following won't support the borrowee pattern, and it'll be up to us to choose when and for which types to add support:

impl<K> StateKeyCodec<K> for BorshCodec
where
    K: borsh::BorshSerialize + borsh::BorshDeserialize,
{
   // ...
}

I'd have to think it through more, but I think it would work. Whether or not it's worth it is another story.
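
A reduced, standard-library-only sketch of this opt-in idea follows. It is an illustration under assumptions, not the SDK's API: the trait is trimmed to `encode_key` (the error type and decoding are omitted), and `LengthPrefixedCodec`, `Map`, and `storage_key` are hypothetical names.

```rust
use std::marker::PhantomData;

/// `Q` defaults to `K`, so borrowed lookups must be opted into per codec.
pub trait StateKeyCodec<K, Q: ?Sized = K> {
    fn encode_key(&self, key: &Q) -> Vec<u8>;
}

/// Hypothetical codec that length-prefixes byte strings.
#[derive(Default)]
pub struct LengthPrefixedCodec;

fn length_prefixed(bytes: &[u8]) -> Vec<u8> {
    let mut out = (bytes.len() as u32).to_le_bytes().to_vec();
    out.extend_from_slice(bytes);
    out
}

// The default case: keys are encoded from `&K`.
impl StateKeyCodec<Vec<u8>> for LengthPrefixedCodec {
    fn encode_key(&self, key: &Vec<u8>) -> Vec<u8> {
        length_prefixed(key)
    }
}

// Explicit opt-in: the codec author vouches that `[u8]` encodes like `Vec<u8>`.
impl StateKeyCodec<Vec<u8>, [u8]> for LengthPrefixedCodec {
    fn encode_key(&self, key: &[u8]) -> Vec<u8> {
        length_prefixed(key)
    }
}

pub struct Map<K, C> {
    codec: C,
    _key: PhantomData<K>,
}

impl<K, C> Map<K, C> {
    pub fn new(codec: C) -> Self {
        Self { codec, _key: PhantomData }
    }

    /// Only compiles for a `Q` that the codec explicitly supports for `K`.
    pub fn storage_key<Q: ?Sized>(&self, key: &Q) -> Vec<u8>
    where
        C: StateKeyCodec<K, Q>,
    {
        self.codec.encode_key(key)
    }
}
```

With this shape, passing a `&[u8; 4]` to a `Map<Vec<u8>, LengthPrefixedCodec>` is a compile error rather than a silent lookup miss, because no `StateKeyCodec<Vec<u8>, [u8; 4]>` impl exists.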

Member

@bkolad

So

map.get(&vec[()]) 

// panics while

Does not panic for me:

        type Key = Vec<()>;
        let map: StateMap<Key, u32> = StateMap::new(prefix.clone());

        let a = map.get(&vec![()], &mut working_set);
        assert!(a.is_none())

@neysofu, thank you for your example. I can confirm that it does not fetch the key correctly =(.

I just want to reiterate my point. I am not trying to prove that ZSTs are supposed to work. Or that serialization adds extra constraints.

I just want to emphasize that there's undeniably a usability problem for people in the simple case of &Vec<u8> or &String. And we have 2 options:

  1. "Sorry devs, that's life, deal with it".
  2. Try to find some practical solution, that will improve usability and bring value to the users.

In my opinion the second option is preferable, even if it is harder.
Yes, people can break HashMap, and that's fine, because everyday usage is more important than edge cases.
I think in the same way, if we push for this convention, it can be easier to use our SDK every day.

So to use it safely a user has to examine all types which can be borrowed and check if the implementation of the serialization format they choose doesn't break the assumption (and won't break it in the future). We already saw that it is broken even in the simplest case.

Can you please provide another example, besides ZSTs, which can break it?

all types which can be borrowed

All types that are used as keys.

(and won't break it in the future)

A breaking change in a serialization library is a big thing in itself. If you have a running system and the serialization for Vec has changed, you cannot just upgrade.

Anyway.

Other options that we can use:

  • Some automatic way to check the code base, so that anything used in a key has the same serialization with the given codec. We can even add a macro that will generate those tests.
  • Update our documentation to make sure that if people select something special for their case, they know to be careful.
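
The first option could look roughly like the sketch below. Everything here is hypothetical: `Encode` stands in for "the configured codec" (with a Borsh-like layout assumed), and `same_encoding` and `key_borrow_test!` are illustrative names, not existing SDK items.

```rust
use std::borrow::Borrow;

/// Hypothetical stand-in for the configured key codec.
pub trait Encode {
    fn encode(&self) -> Vec<u8>;
}

impl Encode for [u8] {
    fn encode(&self) -> Vec<u8> {
        // Length-prefixed, mimicking how slices are assumed to be encoded.
        let mut out = (self.len() as u32).to_le_bytes().to_vec();
        out.extend_from_slice(self);
        out
    }
}

impl Encode for Vec<u8> {
    fn encode(&self) -> Vec<u8> {
        self.as_slice().encode()
    }
}

impl<const N: usize> Encode for [u8; N] {
    fn encode(&self) -> Vec<u8> {
        self.to_vec() // no length prefix for fixed-size arrays
    }
}

/// True when the owned key and its borrowed form encode to the same bytes.
pub fn same_encoding<K, Q>(owned: &K) -> bool
where
    K: Encode + Borrow<Q>,
    Q: Encode + ?Sized,
{
    owned.encode() == <K as Borrow<Q>>::borrow(owned).encode()
}

/// Generates one unit test per (owned, borrowed) key pair.
macro_rules! key_borrow_test {
    ($name:ident, $owned:ty, $borrowed:ty, $sample:expr) => {
        #[test]
        fn $name() {
            let sample: $owned = $sample;
            assert!(same_encoding::<$owned, $borrowed>(&sample));
        }
    };
}

key_borrow_test!(vec_u8_matches_slice, Vec<u8>, [u8], vec![1, 2, 3]);
```

A check of this kind would flag the array-to-slice pair, since `same_encoding::<[u8; 4], [u8]>` returns `false` under the assumed layout.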

Member

@bkolad bkolad Aug 15, 2023

@citizen-stig we are on the old version of borsh; the check and panic are in the latest master.
I am not against improving it if we can, I just don't see any way.
If @neysofu's solution works and we don't complicate the code much, then OK, but I would avoid relying on documentation for this.

Member

I'll try putting something together 👍

+    pub fn get<S, Q>(&self, key: &Q, working_set: &mut WorkingSet<S>) -> Option<V>
+    where
+        S: Storage,
+        Q: BorshSerialize + ?Sized,
+        K: Borrow<Q>,
+    {
+        working_set.get_value(self.prefix(), &key)
     }

     /// Returns the value corresponding to the key or Error if key is absent in the StateMap.
-    pub fn get_or_err<S: Storage>(
-        &self,
-        key: &K,
-        working_set: &mut WorkingSet<S>,
-    ) -> Result<V, Error> {
+    ///
+    /// For reference, check [Self::get].
+    pub fn get_or_err<S, Q>(&self, key: &Q, working_set: &mut WorkingSet<S>) -> Result<V, Error>
+    where
+        S: Storage,
+        Q: BorshSerialize + ?Sized,
+        K: Borrow<Q>,
+    {
         self.get(key, working_set).ok_or_else(|| {
-            Error::MissingValue(self.prefix().clone(), StorageKey::new(self.prefix(), key))
+            Error::MissingValue(self.prefix().clone(), StorageKey::new(self.prefix(), &key))
         })
     }

     /// Removes a key from the StateMap, returning the corresponding value (or None if the key is absent).
-    pub fn remove<S: Storage>(&self, key: &K, working_set: &mut WorkingSet<S>) -> Option<V> {
-        working_set.remove_value(self.prefix(), key)
+    pub fn remove<S, Q>(&self, key: &Q, working_set: &mut WorkingSet<S>) -> Option<V>
+    where
+        S: Storage,
+        Q: BorshSerialize + ?Sized,
+        K: Borrow<Q>,
+    {
+        working_set.remove_value(self.prefix(), &key)
     }

     /// Removes a key from the StateMap, returning the corresponding value (or Error if the key is absent).
-    pub fn remove_or_err<S: Storage>(
-        &self,
-        key: &K,
-        working_set: &mut WorkingSet<S>,
-    ) -> Result<V, Error> {
-        self.remove(key, working_set).ok_or_else(|| {
-            Error::MissingValue(self.prefix().clone(), StorageKey::new(self.prefix(), key))
+    pub fn remove_or_err<S, Q>(&self, key: &Q, working_set: &mut WorkingSet<S>) -> Result<V, Error>
+    where
+        S: Storage,
+        Q: BorshSerialize + ?Sized,
+        K: Borrow<Q>,
+    {
+        self.remove(&key, working_set).ok_or_else(|| {
+            Error::MissingValue(self.prefix().clone(), StorageKey::new(self.prefix(), &key))
         })
     }

     /// Deletes a key from the StateMap.
-    pub fn delete<S: Storage>(&self, key: &K, working_set: &mut WorkingSet<S>) {
-        working_set.delete_value(self.prefix(), key);
+    pub fn delete<S, Q>(&self, key: &Q, working_set: &mut WorkingSet<S>)
+    where
+        S: Storage,
+        Q: BorshSerialize + ?Sized,
+        K: Borrow<Q>,
+    {
+        working_set.delete_value(self.prefix(), &key);
     }

-    pub fn prefix(&self) -> &Prefix {
+    /// Returns the storage prefix for the instance.
+    pub const fn prefix(&self) -> &Prefix {
         &self.prefix
     }
 }
13 changes: 8 additions & 5 deletions module-system/sov-state/src/prover_storage.rs
@@ -41,7 +41,7 @@ impl<S: MerkleProofSpec> ProverStorage<S> {
         })
     }

-    fn read_value(&self, key: StorageKey) -> Option<StorageValue> {
+    fn read_value(&self, key: &StorageKey) -> Option<StorageValue> {
         match self
             .db
             .get_value_option_by_key(self.db.get_next_version(), key.as_ref())
@@ -68,7 +68,7 @@ impl<S: MerkleProofSpec> Storage for ProverStorage<S> {
         Self::with_path(config.path.as_path())
     }

-    fn get(&self, key: StorageKey, witness: &Self::Witness) -> Option<StorageValue> {
+    fn get(&self, key: &StorageKey, witness: &Self::Witness) -> Option<StorageValue> {
         let val = self.read_value(key);
         witness.add_hint(val.clone());
         val
@@ -240,7 +240,7 @@ mod test {
                 .validate_and_commit(cache, &witness)
                 .expect("storage is valid");

-            assert_eq!(test.value, prover_storage.get(test.key, &witness).unwrap());
+            assert_eq!(test.value, prover_storage.get(&test.key, &witness).unwrap());
             assert_eq!(prover_storage.db.get_next_version(), test.version + 1)
         }
     }
@@ -251,7 +251,7 @@ mod test {
         for test in tests {
             assert_eq!(
                 test.value,
-                storage.get(test.key, &Default::default()).unwrap()
+                storage.get(&test.key, &Default::default()).unwrap()
             );
         }
     }
@@ -284,7 +284,10 @@ mod test {
         {
             let prover_storage = ProverStorage::<DefaultStorageSpec>::with_path(path).unwrap();
             assert!(!prover_storage.is_empty());
-            assert_eq!(value, prover_storage.get(key, &Default::default()).unwrap());
+            assert_eq!(
+                value,
+                prover_storage.get(&key, &Default::default()).unwrap()
+            );
         }
     }
 }
23 changes: 11 additions & 12 deletions module-system/sov-state/src/scratchpad.rs
@@ -53,7 +53,7 @@ impl<S: Storage> StateCheckpoint<S> {
         }
     }

-    pub fn get(&mut self, key: StorageKey) -> Option<StorageValue> {
+    pub fn get(&mut self, key: &StorageKey) -> Option<StorageValue> {
         self.delta.get(key)
     }

@@ -105,7 +105,7 @@ impl<S: Storage> WorkingSet<S> {
         }
     }

-    pub(crate) fn get(&mut self, key: StorageKey) -> Option<StorageValue> {
+    pub(crate) fn get(&mut self, key: &StorageKey) -> Option<StorageValue> {
         self.delta.get(key)
     }

@@ -152,7 +152,7 @@ impl<S: Storage> WorkingSet<S> {
         storage_key: &K,
     ) -> Option<V> {
         let storage_key = StorageKey::new(prefix, storage_key);
-        self.get_decoded(storage_key)
+        self.get_decoded(&storage_key)
     }

     pub(crate) fn remove_value<K: BorshSerialize, V: BorshDeserialize>(
@@ -161,7 +161,7 @@ impl<S: Storage> WorkingSet<S> {
         storage_key: &K,
     ) -> Option<V> {
         let storage_key = StorageKey::new(prefix, storage_key);
-        let storage_value = self.get_decoded(storage_key.clone())?;
+        let storage_value = self.get_decoded(&storage_key)?;
         self.delete(storage_key);
         Some(storage_value)
     }
@@ -171,7 +171,7 @@ impl<S: Storage> WorkingSet<S> {
         self.delete(storage_key);
     }

-    fn get_decoded<V: BorshDeserialize>(&mut self, storage_key: StorageKey) -> Option<V> {
+    fn get_decoded<V: BorshDeserialize>(&mut self, storage_key: &StorageKey) -> Option<V> {
         let storage_value = self.get(storage_key)?;

         // It is ok to panic here. Deserialization problem means that something is terribly wrong.
@@ -183,21 +183,20 @@ impl<S: Storage> WorkingSet<S> {
 }

 impl<S: Storage> RevertableDelta<S> {
-    fn get(&mut self, key: StorageKey) -> Option<StorageValue> {
-        let key = key.as_cache_key();
-        if let Some(value) = self.writes.get(&key) {
+    fn get(&mut self, key: &StorageKey) -> Option<StorageValue> {
+        if let Some(value) = self.writes.get(key.as_cache_key()) {
             return value.clone().map(Into::into);
         }
-        self.inner.get(key.into())
+        self.inner.get(key)
     }

     fn set(&mut self, key: StorageKey, value: StorageValue) {
         self.writes
-            .insert(key.as_cache_key(), Some(value.as_cache_value()));
+            .insert(key.to_cache_key(), Some(value.as_cache_value()));
     }

     fn delete(&mut self, key: StorageKey) {
-        self.writes.insert(key.as_cache_key(), None);
+        self.writes.insert(key.to_cache_key(), None);
     }
 }

@@ -253,7 +252,7 @@ impl<S: Storage> Debug for Delta<S> {
 }

 impl<S: Storage> Delta<S> {
-    fn get(&mut self, key: StorageKey) -> Option<StorageValue> {
+    fn get(&mut self, key: &StorageKey) -> Option<StorageValue> {
         self.cache.get_or_fetch(key, &self.inner, &self.witness)
     }
10 changes: 8 additions & 2 deletions module-system/sov-state/src/storage.rs
@@ -29,7 +29,13 @@ impl StorageKey {
         self.key.clone()
     }

-    pub fn as_cache_key(self) -> CacheKey {
+    pub fn as_cache_key(&self) -> &CacheKey {
+        // Safety: they are currently equivalent
+        // TODO https://github.com/Sovereign-Labs/sovereign-sdk/issues/643
+        unsafe { core::mem::transmute(self) }
Member

Since we don't use #[repr(transparent)], I think rustc is allowed to use whatever layout it wants for storage keys; so while they're likely the same as cache keys in practice we can't guarantee that.

Member

Since they're all just wrappers around Arc, wouldn't cloning the Arc just work?

Contributor Author

@vlopes11 vlopes11 Aug 11, 2023

@preston-evans98 I think it's ok for this case since the type declaration is the same. The transparent would be required when we convert from struct to enum, for example. In that case, the compiler would, maybe, use a different layout.

@citizen-stig strictly speaking, cloning the arc into an owned cache key would be just fine as internally it would just increment the reference count to the vec. I'll dive into more details of what I had in mind, but first, a preliminary snippet:

pub fn as_cache_key(&self) -> &CacheKey {
    &CacheKey { key: self.key }
}

The snippet above doesn't work because CacheKey is created inside the function, even though it contains just a pointer that will be valid for the lifetime of StorageKey.

The idea of having references to the storage key instead of owned storage key has two main pillars:

  • The assumption of storage key being a pointer is an implementation detail of the struct; otherwise, it would implement Copy
  • According to the snippet below, we need a reference to the Arc to satisfy the API constraints
use std::{borrow::Borrow, marker::PhantomData, sync::Arc};

#[derive(Default)]
pub struct Map<K, V> {
    _data: PhantomData<(K, V)>,
}

impl<K, V> Map<K, V> {
    pub fn get<Q>(&self, _k: &Q) -> Option<V>
    where
        Q: ?Sized,
        K: Borrow<Q>,
    {
        todo!()
    }
}

Map::<Vec<u8>, u8>::default().get(&[0][..]); // works fine
Map::<Arc<Vec<u8>>, u8>::default().get(&Arc::new(vec![0u8])); // works too since arc borrows
                                                              // whatever T borrows
Map::<Arc<Vec<u8>>, u8>::default().get(Arc::new(vec![0u8])); // doesn't work since we expect a
                                                             // reference to a borrowable type

The idea of making StorageKey always a reference challenges the assumption passed to the user that cloning this key is fine (in reality it is, but as mentioned above, it is an implementation detail of the struct).

Also, the idea of making StorageKey always a reference is more ergonomic to work with map keys that will, by default, expect references of generics.

Wdyt of this approach?


There is also a rather questionable point with the PR. The previous API was using as_cache_key to convert owned -> owned. Even though it is a sort of free conversion (pointer to pointer), it is still owned to owned and maybe it should be to_* 🤔

Contributor Author

@vlopes11 vlopes11 Aug 11, 2023

Finally, this solution is meant to be a cheap one until we decide what to do with #643 . If we unify the types, we could even implement copy for the struct and treat it as a pointer.

The intent was to preserve a change scope separation to keep the changeset of the PR minimal, but we can easily integrate the solution of #643 into this PR

Member

I think it's ok for this case since the type declaration is the same. The transparent would be required when we convert from struct to enum, for example. In that case, the compiler would, maybe, use a different layout.

Logically I agree, especially since there's only a single inner attribute.

But I still think that this reasoning might not be 100% sound, because instead of relying on a contract with the compiler, we rely on our understanding of the compiler.
Maybe let's just add repr(transparent) to be on the safe side?

Contributor Author

Sounds good! I added the transparent directive and a test-only assertion that will pass only if the two structs are exactly equal

7aae44d
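
For readers following along, the safeguard described here might look roughly like the sketch below. It is not the content of commit 7aae44d: the struct definitions are assumed to mirror the SDK's (a single `Arc<Vec<u8>>` field), and `layouts_match` and `bytes` are hypothetical helpers.

```rust
use std::mem::{align_of, size_of};
use std::sync::Arc;

/// Assumed shape of the storage key: a single shared byte buffer.
#[repr(transparent)]
pub struct StorageKey {
    key: Arc<Vec<u8>>,
}

/// Assumed shape of the cache key: the same single field.
#[repr(transparent)]
pub struct CacheKey {
    key: Arc<Vec<u8>>,
}

impl StorageKey {
    pub fn new(bytes: Vec<u8>) -> Self {
        Self { key: Arc::new(bytes) }
    }

    pub fn as_cache_key(&self) -> &CacheKey {
        // SAFETY: both structs are `#[repr(transparent)]` wrappers around the
        // same field type, so they are guaranteed to have the same layout.
        unsafe { &*(self as *const StorageKey as *const CacheKey) }
    }
}

impl CacheKey {
    pub fn bytes(&self) -> &[u8] {
        &self.key
    }
}

/// The kind of check a test-only assertion could perform.
pub fn layouts_match() -> bool {
    size_of::<StorageKey>() == size_of::<CacheKey>()
        && align_of::<StorageKey>() == align_of::<CacheKey>()
}
```

The size and alignment check is a cheap tripwire; the actual soundness argument rests on both types being `#[repr(transparent)]` over the same field type.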

Member

Looks great, thanks!

Member

To recap, two options were proposed here:

  1. Clone the Arc.
  2. std::mem::transmute.

I appreciate the safeguards we added here, and I'm actually convinced it's safe (MIRI agrees), but we're still relying on the internals of the two structs as an implementation detail during the transmute. The first option seems simpler to understand, 100% safe, and quite intuitive. It relies on an implementation detail, which is unfortunate, but the same can be said about this unsafe.

I'd be for dropping this unsafe and cloning the Arc instead, which is a safer stopgap until #643 lands.
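
The first option can be sketched without any `unsafe`. The struct shapes are assumed to match the SDK's (one `Arc<Vec<u8>>` field each); `cloned_cache_key`, `shares_buffer_with`, and `ref_count` are hypothetical names used only to show that the bytes are never copied.

```rust
use std::sync::Arc;

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct StorageKey {
    key: Arc<Vec<u8>>,
}

#[derive(Clone, Debug, PartialEq, Eq)]
pub struct CacheKey {
    key: Arc<Vec<u8>>,
}

impl StorageKey {
    pub fn new(bytes: Vec<u8>) -> Self {
        Self { key: Arc::new(bytes) }
    }

    /// Borrowed-to-owned conversion: increments the reference count only.
    pub fn cloned_cache_key(&self) -> CacheKey {
        CacheKey { key: Arc::clone(&self.key) }
    }

    /// True when both keys point at the same heap buffer.
    pub fn shares_buffer_with(&self, other: &CacheKey) -> bool {
        Arc::ptr_eq(&self.key, &other.key)
    }

    pub fn ref_count(&self) -> usize {
        Arc::strong_count(&self.key)
    }
}
```

The cost is one atomic increment per conversion, in exchange for not depending on the two structs having identical layouts.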

+    }
+
+    pub fn to_cache_key(self) -> CacheKey {
         CacheKey { key: self.key }
     }
Comment on lines +47 to 49
Member

@vlopes11 I agree with your previous point here that this method is a bit of a misnomer. I think it should be named into_cache_key because it's an owned-to-owned conversion: https://rust-lang.github.io/api-guidelines/naming.html#ad-hoc-conversions-follow-as_-to_-into_-conventions-c-conv
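
As a quick reminder of the convention in the linked guideline, the three prefixes map to three conversion shapes. The `Key` type and its methods below are hypothetical and exist only to illustrate the naming.

```rust
/// Hypothetical key type used only to illustrate the naming convention.
pub struct Key {
    bytes: Vec<u8>,
}

impl Key {
    pub fn new(bytes: Vec<u8>) -> Self {
        Self { bytes }
    }

    /// `as_`: borrowed -> borrowed, free.
    pub fn as_bytes(&self) -> &[u8] {
        &self.bytes
    }

    /// `to_`: borrowed -> owned, does real work (allocates a new `String`).
    pub fn to_hex(&self) -> String {
        self.bytes.iter().map(|b| format!("{:02x}", b)).collect()
    }

    /// `into_`: owned -> owned, consumes `self` and moves the buffer out.
    pub fn into_bytes(self) -> Vec<u8> {
        self.bytes
    }
}
```

Under this convention, a method taking `self` and returning an owned `CacheKey` falls in the `into_` row.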

 }
@@ -137,7 +143,7 @@ pub trait Storage: Clone {
     fn with_config(config: Self::RuntimeConfig) -> Result<Self, anyhow::Error>;

     /// Returns the value corresponding to the key or None if key is absent.
-    fn get(&self, key: StorageKey, witness: &Self::Witness) -> Option<StorageValue>;
+    fn get(&self, key: &StorageKey, witness: &Self::Witness) -> Option<StorageValue>;

     /// Returns the latest state root hash from the storage.
     fn get_state_root(&self, witness: &Self::Witness) -> anyhow::Result<[u8; 32]>;
2 changes: 1 addition & 1 deletion module-system/sov-state/src/zk_storage.rs
@@ -42,7 +42,7 @@ impl<S: MerkleProofSpec> Storage for ZkStorage<S> {
         Ok(Self::new(config))
     }

-    fn get(&self, _key: StorageKey, witness: &Self::Witness) -> Option<StorageValue> {
+    fn get(&self, _key: &StorageKey, witness: &Self::Witness) -> Option<StorageValue> {
         witness.get_hint()
     }