-
Notifications
You must be signed in to change notification settings - Fork 108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: update StateMap
to take borrowee of keys
#644
Conversation
This commit updates the API of `StateMap` to behave as common map implementations in Rust: the argument of a get method will receive a value that can be borrowed from the concrete key type. This is useful for recurrent situations such as when we want to perform a get from a map of `Vec<u8>` without cloning the key index, by using a slice. The focus of the PR is to touch the public API of the map and trait, without changing too much the internals. A new issue is introduced to propose an upgrade: #643 It also updates the public interface of the storage to take references by default, instead of owned values. It will allow the user to perform storage queries without necessarily cloning the values. Resolves #427
pub fn as_cache_key(&self) -> &CacheKey { | ||
// Safety: they are currently equivalent | ||
// TODO https://github.com/Sovereign-Labs/sovereign-sdk/issues/643 | ||
unsafe { core::mem::transmute(self) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we don't use #[repr(transparent)]
, I think rustc is allowed to use whatever layout it wants for storage keys; so while they're likely the same as cache keys in practice we can't guarantee that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there all just wrapper around arc, wouldn't cloning arc just work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@preston-evans98 I think it's ok for this case since the type declaration is the same. The transparent would be required when we convert from struct to enum, for example. In that case, the compiler would, maybe, use a different layout.
@citizen-stig strictly speaking, cloning the arc into an owned cache key would be just fine as internally it would just increment the reference count to the vec. I'll dive into more details of what I had in mind, but first, a preliminary snippet:
pub fn as_cache_key(&self) -> &CacheKey {
&CacheKey { key: self.key }
}
The snippet above doesn't work because CacheKey
is created inside the function, even tho it contains just a pointer that will be valid for the lifetime of StorageKey
.
The idea of having references to the storage key instead of owned storage key has two main pillars:
- The assumption of storage key being a pointer is an implementation detail of the struct; otherwise, it would implement
Copy
- According to the snippet below, we need a reference to the Arc to satisfy the API constraints
use std::{borrow::Borrow, marker::PhantomData, sync::Arc};
#[derive(Default)]
pub struct Map<K, V> {
_data: PhantomData<(K, V)>,
}
impl<K, V> Map<K, V> {
pub fn get<Q>(&self, _k: &Q) -> Option<V>
where
Q: ?Sized,
K: Borrow<Q>,
{
todo!()
}
}
Map::<Vec<u8>, u8>::default().get(&[0][..]); // works fine
Map::<Arc<Vec<u8>>, u8>::default().get(&Arc::new(vec![0u8])); // works too since arc borrows
// whatever T borrows
Map::<Arc<Vec<u8>>, u8>::default().get(Arc::new(vec![0u8])); // doesn't work since we expect a
// reference to a borrowable type
The idea of making StorageKey
always a reference challenges the assumption passed to the user that cloning this key is fine (in reality it is, but as mentioned above, it is an implementation detail of the struct).
Also, the idea of making StorageKey
always a reference is more ergonomic to work with map keys that will, by default, expect references of generics.
Wdyt of this approach?
There is also a rather questionable point with the PR. The previous API was using as_cache_key
to convert owned -> owned. Even tho it is a sort of free conversion (pointer to pointer), it is still owned to owned and maybe it should be to_*
🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally, this solution is meant to be a cheap one until we decide what to do with #643 . If we unify the types, we could even implement copy for the struct and treat it as a pointer.
The intent was to preserve a change scope separation to keep the changeset of the PR minimal, but we can easily integrate the solution of #643 into this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's ok for this case since the type declaration is the same. The transparent would be required when we convert from struct to enum, for example. In that case, the compiler would, maybe, use a different layout.
Logically I agree, espeically that there's only single inner attribute.
But I still think that this reasoning might be not 100% sound, because instead of relying on contract with compiler, we rely on our understanding of compiler.
Maybe let's just add repr(transparent)
to be on the safe side?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good! I added the transparent directive and a test-only assertion that will pass only if the two structs are exactly equal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To recap, two options were proposed here:
- Clone the
Arc
. std::mem::transmute
.
I appreciate the the safeguards we added here, and I'm actually convinced it's safe (MIRI agrees), but we're still relying on the internals of the two structs as an implementation detail during the transmute. The first option seems simpler to understand, 100% safe, and quite intuitive. It relies on an implementation detail, which is unfortunate, but the same can be said about this unsafe
.
I'd be for dropping this unsafe
and clone the Arc
instead, which is a safer stopgap until #643 lands.
Codecov Report
|
/// { | ||
/// map.get(&key[..], ws) | ||
/// } | ||
/// ``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an important hidden assumption we need to address. The standard requirement for Borrow is" Eq, Ord, and Hash must be equivalent for borrowed and owned values
We are adding another requirement here: serialized version of borrowed and owned value must be equivalent. So it could happen that we have 2 types, A
that can be borrowed as B
, everything works for some codec
, but if someone decides to change the codec to something else via #648 and it will stop working because for example vec and &[]
have different representation.
Is it even true for borsh?
Like Vec<_> can be borrowed as slice but it looks like they have a different serialization logic for ZST:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are adding another requirement here: serialized version of borrowed and owned value must be equivalent.
Yes, by adding this requirement we allow to reduce cloning.
Question is, can we express this restriction in the code, so users won't get hit by
Is it even true for borsh?
But what if you have ZST as key in the first place? Would it suppose to break?
So I have this test and it looks like this;
use sov_state::{DefaultStorageSpec, Prefix, ProverStorage, StateMap};
use super::*;
#[test]
fn check_something() {
let tmpdir = tempfile::tempdir().unwrap();
let mut working_set =
WorkingSet::new(ProverStorage::<DefaultStorageSpec>::with_path(tmpdir.path()).unwrap());
let prefix = Prefix::new("test".as_bytes().to_vec());
let map_1: StateMap<Vec<()>, u8> = StateMap::new(prefix.clone());
let d1 = [(); 2 ^ 64].to_vec();
map_1.set(&d1, &3, &mut working_set);
let a = map_1.get(&d1, &mut working_set);
println!("A: {:?}", a);
let map_2: StateMap<&'static [()], u8> = StateMap::new(prefix);
let d2 = [(); 2 ^ 64];
map_2.set(&&d2[..], &3, &mut working_set);
let b = map_2.get(&&d2[..], &mut working_set);
println!("B: {:?}", b);
assert!(a.is_some());
assert!(b.is_some());
assert_eq!(a, b);
}
Nothing blew up. But probably I misunderstood question.
I hope we can agree that there's usability problem. Often you have only reference to slice of bytes and you need to use it as key, so you have to clone it. Then during set it is cloned again:
impl StorageKey {
pub fn new<K: BorshSerialize>(prefix: &Prefix, key: &K) -> Self {
let encoded_key = key.try_to_vec().unwrap();
So we trade off API purity for double cloning. We need to understand that.
And in current scenario, we push burden to our customers. So I think it is in our interest to find solution to this problem. Maybe we need to add some constraints to codec
type, so it won't be easy to use coded which has different serialization. Maybe each codec have to implement those kind of checks for types that can be keys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it is not strong reason, but how likely it is that someone will use ZST for keys?
Also, can we limit CacheKey current implementation not to all AsRef but only to Vec and slice of bytes. That might be option, since that will be most probable use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@citizen-stig
So this
map.get(&vec[()])
panics while
map.get(&[();1]);
won't. And in the test probably we could get out of memory when we try to deserialize Vec<()>
values which are serialized as very big slices instead of keys.
So it is inconsistent.
The ZST and slice vs vec is just an example of a bigger problem. We added another requirement to Borrow
, and all the serialization formats in cartes.io don't care about it (like for bincode
something else might break). To use it safely a user has to examine all types which can be borrowed and check if the implementation of the sterilization format they choose doesn't break the assumption (and won't break it in the future). We already saw that it is a broken event in the simplest case.
We could have another trait OurBorrow
with the additional requirement documented but still, 3rd party serialization crates won't care.
For HashMap
this kind of API works, because everyone in the rust ecosystem agreed on Eq, Ord, and Hash, must be equivalent for borrowed and owned value
but this is just a convention. You can easily implement Borrow
& Eq
or Hash
for your type which will break the HashMap
. There is no way to express it in type system.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@citizen-stig your example gave me an idea. You tested Vec<()>
against &[()]
, which have the same encodings in Borsh, so it succeeds. What about arrays, though? Arrays don't have a length marker because there's no need for those, yet they Borrow
to slices. This test fails:
#[test]
fn check_something() {
use super::*;
use crate::{DefaultStorageSpec, Prefix, ProverStorage};
let tmpdir = tempfile::tempdir().unwrap();
let mut working_set =
WorkingSet::new(ProverStorage::<DefaultStorageSpec>::with_path(tmpdir.path()).unwrap());
let prefix = Prefix::new("test".as_bytes().to_vec());
type Key = [u8; 4];
let map: StateMap<Key, u32> = StateMap::new(prefix.clone());
let key = [0u8; 4];
let key_ref_array: &[u8; 4] = <Key as Borrow<[u8; 4]>>::borrow(&key);
let key_ref_slice: &[u8] = <Key as Borrow<[u8]>>::borrow(&key);
// We set with &[u8; 4]...
map.set(key_ref_array, &3, &mut working_set);
// ...getting with &[u8; 4] works, obviously...
assert!(map.get(key_ref_array, &mut working_set).is_some());
// ...but changing the target of the Borrow to &[u8] doesn't.
assert!(map.get(key_ref_slice, &mut working_set).is_some());
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bkolad I didn't see your comment before posting mine, but yeah basically +1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One possibility would be to extend StateKeyCodec
and shift the responsibility of ensuring that K
and Q
serialize to the same byte sequence to the codec author, instead of the caller.
pub trait StateKeyCodec<K, Q = K> {
type KeyError: std::fmt::Debug;
fn encode_key(&self, key: &Q) -> Vec<u8>;
fn try_decode_key(&self, bytes: &[u8]) -> Result<K, Self::KeyError>;
// ...
}
Effectively, the default type parameter Q = K
would be an opt-in mechanism. We'd have to modify this PR to allow only the types Q
for which the codec is valid: impl<K, V, C, Q> StateMap<K, V, C> where C: StateKeyCodec<K, Q> + StateValueCodec<V>
.
Blanket implementations like the following won't support the borrowee pattern, and it'll be up to us to choose when and for which types to add support:
impl<K> StateKeyCodec<K> for BorshCodec
where
K: borsh::BorshSerialize + borsh::BorshDeserialize,
{
// ...
}
I'd have to think it more through, but I think it would work. Whether or not it's worth it is another story.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So
map.get(&vec[()])
// panics while
Does not panic for me:
type Key = Vec<()>;
let map: StateMap<Key, u32> = StateMap::new(prefix.clone());
let a = map.get(&vec![()], &mut working_set);
assert!(a.is_none())
@neysofu , thank you for your example, I can confirm that it does not fetch key correctly =(.
I just want to reiterate my point. I am not trying to prove that ZST types suppose to work. Or that serialization adds extra constraints.
I just want to empathize, that there's undeniably usability problem for people in a simple case of &Vec<u8>
or &String
. And we have 2 options:
- "Sorry devs, that's life, deal with it".
- Try to find some practical solution, that will improve usability and bring value to the users.
In my opinion second option is preferable, even if it is harder.
Yes, people can brake HashMap, and that's fine, because everyday usage is more important, than edge cases.
I think the same way if we push for this convention, it can be easier to use our SDK everyday.
So to use it safely a user has to examine all types which can be borrowed and check if the implementation of the sterilization format they choose doesn't break the assumption (and won't break it in the future). We already saw that it is a broken event in the simplest case.
Can you please provide another example, besides ZST, which can brake it?
all types which can be borrowed
All types that are used as keys.
(and won't break it in the future)
Breaking change in serialization library is a big thing on itself. If you have running system and serialization for Vec has changed, you cannot just upgrade.
Anyway.
Other options that we can use:
- some automatic way to check code base, that anything used in a key, should have same serialization with given coded. We can even add macro that will generate those tests.
- Update our documentation to make sure that if people select something special for the case, that they should be carefule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@citizen-stig we are on the old version of borsh, the check and panic is in the latest master
.
I am not against improving it if we can I just don't see any way.
If @neysofu solution works and we don't complicate the code much then ok but I would avoid relaying on documentation for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try putting something together 👍
pub fn to_cache_key(self) -> CacheKey { | ||
CacheKey { key: self.key } | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vlopes11 I agree with your previous point here that this method is a bit of a misnomer. I think it should be named into_cache_key
because it's an owned-to-owned conversion: https://rust-lang.github.io/api-guidelines/naming.html#ad-hoc-conversions-follow-as_-to_-into_-conventions-c-conv
pub fn as_cache_key(&self) -> &CacheKey { | ||
// Safety: they are currently equivalent | ||
// TODO https://github.com/Sovereign-Labs/sovereign-sdk/issues/643 | ||
unsafe { core::mem::transmute(self) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To recap, two options were proposed here:
- Clone the
Arc
. std::mem::transmute
.
I appreciate the the safeguards we added here, and I'm actually convinced it's safe (MIRI agrees), but we're still relying on the internals of the two structs as an implementation detail during the transmute. The first option seems simpler to understand, 100% safe, and quite intuitive. It relies on an implementation detail, which is unfortunate, but the same can be said about this unsafe
.
I'd be for dropping this unsafe
and clone the Arc
instead, which is a safer stopgap until #643 lands.
Closing in favor of #718 🎉 |
Description
This commit updates the API of
StateMap
to behave as common map implementations in Rust: the argument of a get method will receive a value that can be borrowed from the concrete key type.This is useful for recurrent situations such as when we want to perform a get from a map of
Vec<u8>
without cloning the key index, by using a slice.The focus of the PR is to touch the public API of the map and trait, without changing too much the internals. A new issue is introduced to propose an upgrade: #643
It also updates the public interface of the storage to take references by default, instead of owned values. It will allow the user to perform storage queries without necessarily cloning the values.
Linked Issues
SovState
use Borrow to behave similar to std::collections::HashMap #427Testing
An example of the new API is included as doctest