-
Notifications
You must be signed in to change notification settings - Fork 108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: update StateMap
to take borrowee of keys
#644
Changes from all commits
d0e0bfb
6e368cb
44cd263
86e0505
7aae44d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -14,6 +14,7 @@ use crate::Prefix; | |
|
||
// `Key` type for the `Storage` | ||
#[derive(Clone, PartialEq, Eq, Debug, Serialize, Deserialize, BorshDeserialize, BorshSerialize)] | ||
#[repr(transparent)] | ||
pub struct StorageKey { | ||
key: Arc<Vec<u8>>, | ||
} | ||
|
@@ -29,7 +30,21 @@ impl StorageKey { | |
self.key.clone() | ||
} | ||
|
||
pub fn as_cache_key(self) -> CacheKey { | ||
pub fn as_cache_key(&self) -> &CacheKey { | ||
debug_assert_eq!({ | ||
let StorageKey { key } = self; | ||
key | ||
}, { | ||
let CacheKey { key } = unsafe { core::mem::transmute::<_, &CacheKey>(self) }; | ||
key | ||
}, "the structure of the CacheKey must be exactly equal to the StorageKey so the unsafe transformation will work as expected"); | ||
|
||
// Safety: they are currently equivalent | ||
// TODO https://github.com/Sovereign-Labs/sovereign-sdk/issues/643 | ||
unsafe { core::mem::transmute(self) } | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since we don't use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since there all just wrapper around arc, wouldn't cloning arc just work? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @preston-evans98 I think it's ok for this case since the type declaration is the same. The transparent would be required when we convert from struct to enum, for example. In that case, the compiler would, maybe, use a different layout. @citizen-stig strictly speaking, cloning the arc into an owned cache key would be just fine as internally it would just increment the reference count to the vec. I'll dive into more details of what I had in mind, but first, a preliminary snippet: pub fn as_cache_key(&self) -> &CacheKey {
&CacheKey { key: self.key }
} The snippet above doesn't work because The idea of having references to the storage key instead of owned storage key has two main pillars:
use std::{borrow::Borrow, marker::PhantomData, sync::Arc};
#[derive(Default)]
pub struct Map<K, V> {
_data: PhantomData<(K, V)>,
}
impl<K, V> Map<K, V> {
pub fn get<Q>(&self, _k: &Q) -> Option<V>
where
Q: ?Sized,
K: Borrow<Q>,
{
todo!()
}
}
Map::<Vec<u8>, u8>::default().get(&[0][..]); // works fine
Map::<Arc<Vec<u8>>, u8>::default().get(&Arc::new(vec![0u8])); // works too since arc borrows
// whatever T borrows
Map::<Arc<Vec<u8>>, u8>::default().get(Arc::new(vec![0u8])); // doesn't work since we expect a
// reference to a borrowable type The idea of making Also, the idea of making Wdyt of this approach? There is also a rather questionable point with the PR. The previous API was using There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Finally, this solution is meant to be a cheap one until we decide what to do with #643 . If we unify the types, we could even implement copy for the struct and treat it as a pointer. The intent was to preserve a change scope separation to keep the changeset of the PR minimal, but we can easily integrate the solution of #643 into this PR There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Logically I agree, espeically that there's only single inner attribute. But I still think that this reasoning might be not 100% sound, because instead of relying on contract with compiler, we rely on our understanding of compiler. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sounds good! I added the transparent directive and a test-only assertion that will pass only if the two structs are exactly equal There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looks great, thanks! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To recap, two options were proposed here:
I appreciate the the safeguards we added here, and I'm actually convinced it's safe (MIRI agrees), but we're still relying on the internals of the two structs as an implementation detail during the transmute. The first option seems simpler to understand, 100% safe, and quite intuitive. It relies on an implementation detail, which is unfortunate, but the same can be said about this I'd be for dropping this |
||
} | ||
|
||
pub fn to_cache_key(self) -> CacheKey { | ||
CacheKey { key: self.key } | ||
} | ||
Comment on lines
+47
to
49
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @vlopes11 I agree with your previous point here that this method is a bit of a misnomer. I think it should be named |
||
} | ||
|
@@ -137,7 +152,7 @@ pub trait Storage: Clone { | |
fn with_config(config: Self::RuntimeConfig) -> Result<Self, anyhow::Error>; | ||
|
||
/// Returns the value corresponding to the key or None if key is absent. | ||
fn get(&self, key: StorageKey, witness: &Self::Witness) -> Option<StorageValue>; | ||
fn get(&self, key: &StorageKey, witness: &Self::Witness) -> Option<StorageValue>; | ||
|
||
/// Returns the latest state root hash from the storage. | ||
fn get_state_root(&self, witness: &Self::Witness) -> anyhow::Result<[u8; 32]>; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an important hidden assumption we need to address. The standard requirement for Borrow is"
Eq, Ord, and Hash must be equivalent for borrowed and owned values
We are adding another requirement here: serialized version of borrowed and owned value must be equivalent. So it could happen that we have 2 types,A
that can be borrowed asB
, everything works forsome codec
, but if someone decides to change the codec to something else via #648 and it will stop working because for examplevec and &[]
have different representation.Is it even true for borsh?
Like Vec<_> can be borrowed as slice but it looks like they have a different serialization logic for ZST:
@citizen-stig @preston-evans98 @neysofu WDT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, by adding this requirement we allow to reduce cloning.
Question is, can we express this restriction in the code, so users won't get hit by
But what if you have ZST as key in the first place? Would it suppose to break?
So I have this test and it looks like this;
Nothing blew up. But probably I misunderstood question.
I hope we can agree that there's usability problem. Often you have only reference to slice of bytes and you need to use it as key, so you have to clone it. Then during set it is cloned again:
So we trade off API purity for double cloning. We need to understand that.
And in current scenario, we push burden to our customers. So I think it is in our interest to find solution to this problem. Maybe we need to add some constraints to
codec
type, so it won't be easy to use coded which has different serialization. Maybe each codec have to implement those kind of checks for types that can be keys.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know it is not strong reason, but how likely it is that someone will use ZST for keys?
Also, can we limit CacheKey current implementation not to all AsRef but only to Vec and slice of bytes. That might be option, since that will be most probable use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@citizen-stig
So this
panics while
won't. And in the test probably we could get out of memory when we try to deserialize
Vec<()>
values which are serialized as very big slices instead of keys.So it is inconsistent.
The ZST and slice vs vec is just an example of a bigger problem. We added another requirement to
Borrow
, and all the serialization formats in cartes.io don't care about it (like forbincode
something else might break). To use it safely a user has to examine all types which can be borrowed and check if the implementation of the sterilization format they choose doesn't break the assumption (and won't break it in the future). We already saw that it is a broken event in the simplest case.We could have another trait
OurBorrow
with the additional requirement documented but still, 3rd party serialization crates won't care.For
HashMap
this kind of API works, because everyone in the rust ecosystem agreed onEq, Ord, and Hash, must be equivalent for borrowed and owned value
but this is just a convention. You can easily implementBorrow
&Eq
orHash
for your type which will break theHashMap
. There is no way to express it in type system.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@citizen-stig your example gave me an idea. You tested
Vec<()>
against&[()]
, which have the same encodings in Borsh, so it succeeds. What about arrays, though? Arrays don't have a length marker because there's no need for those, yet theyBorrow
to slices. This test fails:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bkolad I didn't see your comment before posting mine, but yeah basically +1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One possibility would be to extend
StateKeyCodec
and shift the responsibility of ensuring thatK
andQ
serialize to the same byte sequence to the codec author, instead of the caller.Effectively, the default type parameter
Q = K
would be an opt-in mechanism. We'd have to modify this PR to allow only the typesQ
for which the codec is valid:impl<K, V, C, Q> StateMap<K, V, C> where C: StateKeyCodec<K, Q> + StateValueCodec<V>
.Blanket implementations like the following won't support the borrowee pattern, and it'll be up to us to choose when and for which types to add support:
I'd have to think it more through, but I think it would work. Whether or not it's worth it is another story.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bkolad
So
Does not panic for me:
@neysofu , thank you for your example, I can confirm that it does not fetch key correctly =(.
I just want to reiterate my point. I am not trying to prove that ZST types suppose to work. Or that serialization adds extra constraints.
I just want to empathize, that there's undeniably usability problem for people in a simple case of
&Vec<u8>
or&String
. And we have 2 options:In my opinion second option is preferable, even if it is harder.
Yes, people can brake HashMap, and that's fine, because everyday usage is more important, than edge cases.
I think the same way if we push for this convention, it can be easier to use our SDK everyday.
Can you please provide another example, besides ZST, which can brake it?
All types that are used as keys.
Breaking change in serialization library is a big thing on itself. If you have running system and serialization for Vec has changed, you cannot just upgrade.
Anyway.
Other options that we can use:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@citizen-stig we are on the old version of borsh, the check and panic is in the latest
master
.I am not against improving it if we can I just don't see any way.
If @neysofu solution works and we don't complicate the code much then ok but I would avoid relaying on documentation for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try putting something together 👍