Implement stable type id #182

Open
xlc opened this issue Jun 15, 2023 · 6 comments

Comments

@xlc
Contributor

xlc commented Jun 15, 2023

Something similar to TypeId, but stable, i.e. it never changes as long as the shape of the type remains the same.

This would be useful for detecting unexpected breaking changes to the SCALE encoding, implementing a safe any-type map, etc.

Basically, this should be a deterministic secure hash of the Type, excluding its path.
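To illustrate the idea of hashing the shape while ignoring the path, here is a minimal sketch. Everything in it is an assumption made for illustration: `TypeDesc` is a simplified stand-in and not scale-info's `Type`, `shape_hash` is a made-up name, and FNV-1a replaces the secure hash only so that the example needs no dependencies.

```rust
/// Hypothetical, simplified stand-in for a type description. This is not
/// scale-info's `Type`; it keeps just enough to illustrate the idea.
pub struct TypeDesc {
    /// Module path plus name, e.g. ["my_crate", "Foo"]. Ignored when hashing.
    pub path: Vec<&'static str>,
    /// Field names paired with the name of each field's type.
    pub fields: Vec<(&'static str, &'static str)>,
}

/// FNV-1a offset basis (64-bit).
pub const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;

/// One FNV-1a pass over `bytes`, continuing from `state`.
/// Deterministic, but NOT cryptographically secure.
pub fn fnv1a(mut state: u64, bytes: &[u8]) -> u64 {
    for &b in bytes {
        state ^= b as u64;
        state = state.wrapping_mul(0x0000_0100_0000_01b3);
    }
    state
}

/// Hash the shape of a type: field names and field types, in order.
/// The path is deliberately left out.
pub fn shape_hash(ty: &TypeDesc) -> u64 {
    let mut h = FNV_OFFSET;
    for (name, type_name) in &ty.fields {
        // Length prefixes keep ("ab", "c") distinct from ("a", "bc").
        h = fnv1a(h, &(name.len() as u64).to_le_bytes());
        h = fnv1a(h, name.as_bytes());
        h = fnv1a(h, &(type_name.len() as u64).to_le_bytes());
        h = fnv1a(h, type_name.as_bytes());
    }
    h
}
```

With this sketch, moving or renaming a type leaves the hash unchanged, while changing a field's type changes it.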

@jsdw
Contributor

jsdw commented Jun 15, 2023

Just to note that we already do such hashing in Subxt, outside of scale-info (for the specific purpose of checking that Subxt codegen aligns with the current node metadata):

https://github.com/paritytech/subxt/blob/f8b1b2bf945518e75a7f4d81df0c33d8537c0872/metadata/src/utils/validation.rs#L178

In order to hash types in this way, one needs to iterate through them once the type registry has been built, so I wonder whether there is any utility in adding this logic to scale-info itself rather than some logic on top?

Also, funnily enough we have a PR open for a diff command in our CLI tool to spot potentially breaking changes between metadatas/nodes that may interest you; it makes use of this a little since we can get "stable" hashes of pallets and calls etc already:

paritytech/subxt#1015

@xlc
Contributor Author

xlc commented Jun 15, 2023

One more requirement I would like to add: I want to use this in the Wasm runtime, so it should be efficient. It would be great if the derive macro could just generate the hash at compile time and expose it as a const.
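A compile-time ID exposed as a const could look roughly like the sketch below. The names `fnv1a`, `StableTypeId`, `TYPE_ID`, and `Transfer` are hypothetical, the canonical string is written by hand where a derive macro would generate it, and FNV-1a again stands in for a secure hash purely to avoid dependencies.

```rust
/// FNV-1a (64-bit) written as a `const fn` so it can run at compile time.
/// A stand-in for a real secure hash; it is not collision resistant.
pub const fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    let mut i = 0;
    // `for` loops are not allowed in `const fn`, so use `while`.
    while i < bytes.len() {
        hash ^= bytes[i] as u64;
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
        i += 1;
    }
    hash
}

/// Hypothetical trait exposing the precomputed ID as an associated const.
pub trait StableTypeId {
    const TYPE_ID: u64;
}

pub struct Transfer {
    pub to: u32,
    pub amount: u128,
}

// A derive macro could emit this impl, hashing a canonical string that
// describes the fields. Here the string is written by hand.
impl StableTypeId for Transfer {
    const TYPE_ID: u64 = fnv1a(b"struct{to:u32,amount:u128}");
}
```

Because `TYPE_ID` is an associated const, reading it at runtime costs nothing beyond loading a constant.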

@jsdw
Contributor

jsdw commented Jun 21, 2023

I think I may have misunderstood this before, because there are (at least) two ways one could hash types:

  1. We look at the TypeInfo definition directly (which can have generic params in it) and create a hash based on its shape. (One could backward-compatibly add e.g. a type_hash() -> Option<[u8; 32]> function to TypeInfo which returns None by default. Or one could make every type return a hash, calculated at runtime by default unless the user (or derive macro) overrides type_hash() with a pre-computed output.)
  2. We look at the concrete types stored in the type registry (where each type already has a numeric ID that corresponds to it). Generic params now point to specific type IDs, and each concrete instantiation (e.g. Option<u8> and Option<bool>) would have a different type ID. This can't be done in the derive macro, and requires the type registry to already exist. (This is the version we do in Subxt; it allows us to e.g. compare the types of calls and storage entries and see what has changed between runtime updates.)
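The backward-compatible variant of option 1 relies on a default method body, which might be sketched as follows. `Describe` is a hypothetical stand-in trait, not scale-info's actual `TypeInfo`, and the hash bytes are placeholders rather than a real digest.

```rust
/// Hypothetical stand-in for a type-description trait; this is not
/// scale-info's actual `TypeInfo` definition.
pub trait Describe {
    fn type_name() -> &'static str;

    /// Added later: the default returns `None`, so existing impls that
    /// do not mention this method keep compiling unchanged.
    fn type_hash() -> Option<[u8; 32]> {
        None
    }
}

/// An "old" impl written before `type_hash` existed.
pub struct Legacy;
impl Describe for Legacy {
    fn type_name() -> &'static str {
        "Legacy"
    }
}

/// A "new" impl (hand-written or derived) that supplies a precomputed hash.
pub struct Modern;
impl Describe for Modern {
    fn type_name() -> &'static str {
        "Modern"
    }
    fn type_hash() -> Option<[u8; 32]> {
        // Placeholder bytes, not a real digest.
        Some([0x42; 32])
    }
}
```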

Are you looking for something more like 1? Also, out of curiosity because I want to understand how it would be used better; do you have any examples for things you want to do with this hash?

@xlc
Contributor Author

xlc commented Jun 21, 2023

I am not 100% sure what my requirements are, so let me give you my use case and see if we can come up with something together.

I had this idea when working on open-web3-stack/open-runtime-module-library#927

Essentially I want an AnyMap whose values can be of any type, and yet I want some type safety. The normal Rust solution would be to use std::any::TypeId, but it can change across compiler versions and is therefore not suitable for persistent storage.

So I want a stable TypeId and use it as the key for this storage map. This ensures I can always decode the value into the correct type.
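A rough sketch of such a map is shown below, under several assumptions: the `Stored` trait, its 4-byte `STABLE_ID`, and the hand-written byte encodings are all invented for illustration. A real runtime would use SCALE encoding and a hash-derived ID instead.

```rust
use std::collections::BTreeMap;

/// Hypothetical trait: a type that carries a stable ID and can be turned
/// into bytes and back. Both the ID and the encoding are stand-ins.
pub trait Stored: Sized {
    const STABLE_ID: [u8; 4];
    fn to_bytes(&self) -> Vec<u8>;
    fn from_bytes(bytes: &[u8]) -> Option<Self>;
}

/// One slot per type: the key is the type's stable ID, not `TypeId`.
#[derive(Default)]
pub struct AnyMap {
    slots: BTreeMap<[u8; 4], Vec<u8>>,
}

impl AnyMap {
    pub fn insert<T: Stored>(&mut self, value: &T) {
        self.slots.insert(T::STABLE_ID, value.to_bytes());
    }

    /// Looks up the slot for `T` and decodes it as `T`.
    pub fn get<T: Stored>(&self) -> Option<T> {
        self.slots.get(&T::STABLE_ID).and_then(|b| T::from_bytes(b))
    }
}

#[derive(Debug, PartialEq)]
pub struct Counter(pub u32);
impl Stored for Counter {
    const STABLE_ID: [u8; 4] = *b"cnt0"; // made-up ID
    fn to_bytes(&self) -> Vec<u8> {
        self.0.to_le_bytes().to_vec()
    }
    fn from_bytes(bytes: &[u8]) -> Option<Self> {
        if bytes.len() != 4 {
            return None;
        }
        let mut buf = [0u8; 4];
        buf.copy_from_slice(bytes);
        Some(Counter(u32::from_le_bytes(buf)))
    }
}

#[derive(Debug, PartialEq)]
pub struct Flag(pub bool);
impl Stored for Flag {
    const STABLE_ID: [u8; 4] = *b"flg0"; // made-up ID
    fn to_bytes(&self) -> Vec<u8> {
        vec![self.0 as u8]
    }
    fn from_bytes(bytes: &[u8]) -> Option<Self> {
        match bytes {
            [0] => Some(Flag(false)),
            [1] => Some(Flag(true)),
            _ => None,
        }
    }
}
```

Since the key comes from the type itself, a lookup for `Counter` can only ever reach bytes that were written as a `Counter`.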

Another potential use case is to store the type ID along with the encoded data, so that an unexpected breaking format change or a bad decode can be detected at runtime.
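Storing the ID next to the data could be sketched like this. The 8-byte ID width and the names `tag`, `untag`, and `TagError` are assumptions chosen for the example, not part of any existing crate.

```rust
/// Error returned when the stored tag does not match the expected type.
#[derive(Debug, PartialEq)]
pub enum TagError {
    TooShort,
    Mismatch { expected: [u8; 8], found: [u8; 8] },
}

/// Prefix `payload` with an 8-byte type ID (a made-up width for this sketch).
pub fn tag(type_id: [u8; 8], payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(8 + payload.len());
    out.extend_from_slice(&type_id);
    out.extend_from_slice(payload);
    out
}

/// Check the prefix and hand back the payload only if the IDs agree.
pub fn untag(expected: [u8; 8], data: &[u8]) -> Result<&[u8], TagError> {
    if data.len() < 8 {
        return Err(TagError::TooShort);
    }
    let (head, body) = data.split_at(8);
    let mut found = [0u8; 8];
    found.copy_from_slice(head);
    if found != expected {
        return Err(TagError::Mismatch { expected, found });
    }
    Ok(body)
}
```

If the shape of a type changes and its ID changes with it, old data fails the check in `untag` instead of being decoded into garbage.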

@jsdw
Contributor

jsdw commented Jun 21, 2023

Aah ok, I think I see! To re-iterate (just to make sure I do understand):

  • You want to be able to have a map from "Type ID" to value in storage. Let's just call it an AnyMap.
  • This is runtime code and so needs to be fairly efficient (ie pre-computed IDs ideally).
  • Given the Type ID, you'd want to be able to retrieve the SCALE TypeInfo so that it can be decoded I guess? (note; this would allow you to decode the type into something dynamic like a scale-value::Value but not something static unless you know ahead of time what you'll decode to and can just use the TypeInfo to check that it lines up).
  • The Type ID must be consistent across runtime updates and compiler versions (this is tough because the type registry stored in metadata is just a vec of types, and across updates the positions of types in it can (and are likely to) change).

Similar limitations to the rust AnyMap would apply; you already need to know the Type ID (ie the concrete type you're trying to decode the value into or encode it from) up front.

I realise I was a little wrong above; I imagine that it would actually be possible to write a function that takes some arbitrary type which implements TypeInfo and gives back a consistent hash (i.e. a type ID) for it. I'm not sure whether that sort of thing should actually be a part of scale-info though, mainly because:

  • Making it a required function/constant of the TypeInfo trait is tough; how would people know what hash their type equals when they impl TypeInfo on a new type? (I see such hashing as instead building on top of TypeInfo).
  • How could we pre-compute hashes for types with generic params in? Each instantiation of that type eg Vec<u8>, Vec<u16> etc would need a different hash, so they'd need computing at runtime I think.

I think I'd keep this separate from TypeInfo and build on top of it instead, so the approach I'd take is to build a new crate that at its core has something like:

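use scale_info::TypeInfo;

// Compute a hash from the shape of T's TypeInfo at runtime.
fn get_type_hash<T: TypeInfo + ?Sized>() -> [u8; 32] {
    // ... actual hashing logic here
    todo!()
}

trait TypeInfoHash: TypeInfo {
    fn type_hash() -> [u8; 32] {
        // default impl gets hash at runtime:
        get_type_hash::<Self>()
    }
}
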
  • Now, any built-in types without generics could have a hardcoded hash based on their TypeInfo (not sure how best to hardcode these, but it should be possible somehow).
  • Any types with generics would need to compute a hash at runtime depending on the generic param, so they couldn't have this hard-coded hash.
  • A derive macro optionally could be written too to derive hashes for custom types based on the TypeInfo for those types. Again, it could only hardcode hashes for types without generics in their TypeInfo.
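The split between hardcoded and runtime hashes described in these points could be sketched as follows. This is only an illustration under stated assumptions: `Shape` is a hypothetical stand-in for TypeInfo that returns a textual description, `ShapeHash` plays the role of TypeInfoHash, the hash is shortened to a `u64`, and FNV-1a replaces a real secure hash.

```rust
/// FNV-1a (64-bit) as a `const fn`; a dependency-free stand-in hash.
pub const fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    let mut i = 0;
    while i < bytes.len() {
        hash ^= bytes[i] as u64;
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
        i += 1;
    }
    hash
}

/// Hypothetical stand-in for `TypeInfo`: yields a textual shape description.
pub trait Shape {
    fn shape() -> String;
}

/// Runtime hashing of the shape.
pub fn get_type_hash<T: Shape>() -> u64 {
    fnv1a(T::shape().as_bytes())
}

pub trait ShapeHash: Shape + Sized {
    /// Default impl computes the hash at runtime.
    fn type_hash() -> u64 {
        get_type_hash::<Self>()
    }
}

impl Shape for u8 {
    fn shape() -> String {
        String::from("u8")
    }
}

// No generics involved, so the hash can be fixed at compile time.
impl ShapeHash for u8 {
    fn type_hash() -> u64 {
        const PRECOMPUTED: u64 = fnv1a(b"u8");
        PRECOMPUTED
    }
}

impl<T: Shape> Shape for Vec<T> {
    fn shape() -> String {
        format!("Vec<{}>", T::shape())
    }
}

// Generic type: falls back to the runtime default.
impl<T: Shape> ShapeHash for Vec<T> {}
```

The override for `u8` must agree with what the runtime path would produce, which is the main correctness burden a derive macro would take on.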

It would be quite a lot of work I think, but it's all work that would need doing one way or the other anyway; nothing is really saved by doing it in the scale-info crate.

@xlc
Contributor Author

xlc commented Jun 22, 2023

I like your idea. For generic types, I guess we could do something like:

hash(struct Foo<T1, T2>(T1, T2)) = hash(hash(Foo<_1, _2>) ++ hash(T1) ++ hash(T2))

To calculate the hash of Foo<_1, _2>, we simply replace each reference to T1 or T2 with the index of the corresponding generic parameter.

In this way, we can calculate the hash of generic types at runtime relatively efficiently, because hash(T1) should be a constant (provided T1 does not have generic type parameters of its own).
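The composition rule above might be sketched like this. The trait `StableHash`, the helper `combine`, and the template string for `Foo` are all hypothetical, and FNV-1a is used instead of a secure hash only so the example has no dependencies.

```rust
/// FNV-1a (64-bit) as a `const fn`; a dependency-free stand-in hash.
pub const fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325;
    let mut i = 0;
    while i < bytes.len() {
        hash ^= bytes[i] as u64;
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3);
        i += 1;
    }
    hash
}

/// Hypothetical trait for types with a stable hash.
pub trait StableHash {
    fn stable_hash() -> u64;
}

// Non-generic types: the hash is a compile-time constant.
impl StableHash for u8 {
    fn stable_hash() -> u64 {
        const H: u64 = fnv1a(b"u8");
        H
    }
}
impl StableHash for bool {
    fn stable_hash() -> u64 {
        const H: u64 = fnv1a(b"bool");
        H
    }
}

pub struct Foo<T1, T2>(pub T1, pub T2);

/// Hash of the template `Foo<_1, _2>`: every use of a generic parameter
/// is replaced by its index, so this is also a constant.
pub const FOO_TEMPLATE: u64 = fnv1a(b"struct Foo(_1,_2)");

/// hash(part_0 ++ part_1 ++ ...), with each part as little-endian bytes.
pub fn combine(parts: &[u64]) -> u64 {
    let mut buf = Vec::with_capacity(parts.len() * 8);
    for p in parts {
        buf.extend_from_slice(&p.to_le_bytes());
    }
    fnv1a(&buf)
}

impl<T1: StableHash, T2: StableHash> StableHash for Foo<T1, T2> {
    fn stable_hash() -> u64 {
        // Only this final combine step runs at runtime.
        combine(&[FOO_TEMPLATE, T1::stable_hash(), T2::stable_hash()])
    }
}
```

For `Foo<u8, bool>` the runtime work is one hash over 24 bytes, since all three inputs are constants.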

But yeah, this is indeed a lot more complicated than I initially estimated, and I have found another solution to my original problem, so I don't need this right now.


2 participants