Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merkleized metadata #78

Merged
merged 23 commits into from
Mar 15, 2024
Merged

Merkleized metadata #78

merged 23 commits into from
Mar 15, 2024

Conversation

bkchr
Copy link
Contributor

@bkchr bkchr commented Feb 22, 2024

No description provided.

@xlc
Copy link
Contributor

xlc commented Feb 27, 2024

Maybe we can have a reference test section with some test data? We already have PoC so it shouldn't be hard to make some just by taking from existing tests.

@bkchr
Copy link
Contributor Author

bkchr commented Feb 27, 2024

Maybe we can have a reference test section with some test data? We already have PoC so it shouldn't be hard to make some just by taking from existing tests.

Can you clarify what you mean by test data? What should this section show?

@xlc
Copy link
Contributor

xlc commented Feb 28, 2024

A random example: https://eips.ethereum.org/EIPS/eip-145#test-cases

In this case, I am expecting something like

Input: An example metadata JSON
Output: A digest hash

@bkchr
Copy link
Contributor Author

bkchr commented Feb 28, 2024

Input: An example metadata JSON

The metadata is multiple 100KB's big. I don't want to paste this into the document 🙈 I mean I see what you want, but I don't really see on what this brings us here?

@josepot
Copy link
Contributor

josepot commented Feb 28, 2024

I have a couple of questions:

  1. In the recent metadata versions (at least in versions 14 and 15), the BitSequence definition looks like this:
struct BitSequence {
    bit_store_type: TypeRef,
    bit_order_type: TypeRef,
}

Which I personally find it quite annoying because the TypeRef of the bit_order_type is always an empty composite, and the only way to infer the least_significant_bit_first field is to look into the path property of said reference, which is quite brittle.

The proposed BitSequence definition of this spec is a lot nicer:

struct BitSequence {
    num_bytes: u8,
    least_significant_bit_first: bool,
}

However, unless I'm mistaken, this change won't make it possible to generate the MetadataDigest using "old" versions of the metadata like v14 and v15, correct?

  1. MetadataDigest has references to the fields: token_symbol and decimals. However, AFAIK those fields are currently not being stored on-chain. So, I'm guessing that this RFC implies that these fields will be stored on-chain moving forward?

If that's the case, wouldn't it be beneficial to also mention in the spec what's the canonical way for accessing those field values?

Furthermore, it would probably be a good idea to state what the canonical way is for accessing the values for the fields spec_name, spec_symbol and base58_prefix, which IIRC they are today present in the constants of the "System" pallet. I'm not sure what the right way to reflect that in the spec is, TBH.

@bkchr
Copy link
Contributor Author

bkchr commented Feb 28, 2024

However, unless I'm mistaken, this change won't make it possible to generate the MetadataDigest using "old" versions of the metadata like v14 and v15, correct?

https://github.com/bkchr/merkelized-metadata/blob/f73707f4a0d0dea8cf2c22f632eed828f244f377/src/from_frame_metadata.rs#L332-L359 this is the implementation. There are no other reasonable values for num_bytes nor least_significant_bit_first.

If that's the case, wouldn't it be beneficial to also mention in the spec what's the canonical way for accessing those field values?

I think the current assumption is that these values are just copied from the chain spec. I mean I left as an unresolved question in the RFC that this isn't really perfect. This goes into the direction of semantic information, which we are not supporting at all. It also ignores chains like asset-hub where you can pay with one of the "unlimited" number of assets. All in all not a great solution. I'm open for suggestions.

Furthermore, it would probably be a good idea to state what the canonical way is for accessing the values for the fields spec_name, spec_symbol and base58_prefix, which IIRC they are today present in the constants of the "System" pallet.

Yeah, as you said spec_name, spec_version and base58_prefix are part of the constants. They are also "real" constants per chain and don't change that much. Not sure we need to mention where to get them from? Especially as these values don't "randomly" change. A wallet could also call Core_version to get the runtime version and then fetch these values. If you think I should mention this, I can add a sentence to the RFC.

@xlc
Copy link
Contributor

xlc commented Feb 28, 2024

No need to put a real metadata. Just a hand crafted minimal one would do.
The idea is it will be used as a test case that future implementations must ensure the outcome always matches. The same reason other spec/eip have test cases.

@bkchr
Copy link
Contributor Author

bkchr commented Mar 1, 2024

There a quite a lot of different types and combinations of them that would be required to make any useful test case. I mean I'm in for ensuring that future implementation are compatible, but we should ensure this by testing these implementation against each.

bkchr and others added 3 commits March 4, 2024 23:30
- Include `type_id` in the `Type` directly.
- Change `variant` to `Compact<u32>`
Co-authored-by: Josep M Sobrepere <jm.sobrepere@gmail.com>
Co-authored-by: Josep M Sobrepere <jm.sobrepere@gmail.com>
@bkchr
Copy link
Contributor Author

bkchr commented Mar 4, 2024

Current implementation: https://github.com/bkchr/merkleized-metadata

struct EnumerationVariant {
name: String,
fields: Vec<Field>,
index: Compact<u32>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be a u8, right? Or I'm missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was done on purpose. From time to time the idea pops up to change this index in an enum to a Compact. So, far no one has done this 🙈

For most of the enums this will be u8 any way when using Compact.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good👍

Copy link
Contributor

@SBalaguer SBalaguer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some text edit suggestions.

text/0078-merkleized-metadata.md Outdated Show resolved Hide resolved
text/0078-merkleized-metadata.md Outdated Show resolved Hide resolved
text/0078-merkleized-metadata.md Outdated Show resolved Hide resolved
text/0078-merkleized-metadata.md Outdated Show resolved Hide resolved
text/0078-merkleized-metadata.md Outdated Show resolved Hide resolved
text/0078-merkleized-metadata.md Outdated Show resolved Hide resolved
text/0078-merkleized-metadata.md Outdated Show resolved Hide resolved

The `MetadataDigest` itself is represented as an `enum`. This is done to make it future proof, because a `SCALE` encoded `enum` is prefixed by the `index` of the variant. This `index` represents the version of the digest. As seen above, there is no `index` zero and it starts directly with one. Version one of the digest contains the following elements:

- `type_information_tree_root`: The root of the [Merkleized type information](#merkleizing-type-information) tree.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the link point to "Building the Merkle Tree Root?"

}
```

To begin with, `TypeRef`. This is a unique identifier for a type as found in the type information. Using this `TypeRef`, it is possible to look up the type in the type information tree. More details on this process can be found in the section [Merkleizing type information](#merkleizing-type-information).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above about to which section the link refers to.

bkchr and others added 4 commits March 6, 2024 01:01
Co-authored-by: Santi Balaguer <santiago.balaguer@gmail.com>
Co-authored-by: Santi Balaguer <santiago.balaguer@gmail.com>
bkchr and others added 2 commits March 6, 2024 12:43
Co-authored-by: Santi Balaguer <santiago.balaguer@gmail.com>
text/0078-merkleized-metadata.md Outdated Show resolved Hide resolved
text/0078-merkleized-metadata.md Outdated Show resolved Hide resolved
Co-authored-by: Carlo Sala <carlosalag@protonmail.com>
text/0078-merkleized-metadata.md Outdated Show resolved Hide resolved

The `Hash` is 32 bytes long and `blake3` is used for calculating it. The hash of the `MetadataDigest` is calculated by `blake3(SCALE(MetadataDigest))`. Therefore, `MetadataDigest` is at first `SCALE` encoded, and then those bytes are hashed.

The `MetadataDigest` itself is represented as an `enum`. This is done to make it future proof, because a `SCALE` encoded `enum` is prefixed by the `index` of the variant. This `index` represents the version of the digest. As seen above, there is no `index` zero and it starts directly with one. Version one of the digest contains the following elements:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the enum variant might be encoded as a compact, I'd clarify that it's a u8 in this case (or a compact, I don't really care). Just let's clarify it to make it future-proof

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can not change SCALE that easily any way. So, I would not mention this here yet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough 👍🏻

text/0078-merkleized-metadata.md Outdated Show resolved Hide resolved
CompactU16,
CompactU32,
CompactU64,
CompactU128,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we add CompactU256 to make it 100% future-proof? It might happen at some point.

text/0078-merkleized-metadata.md Outdated Show resolved Hide resolved
bkchr and others added 3 commits March 6, 2024 15:18
Co-authored-by: Carlo Sala <carlosalag@protonmail.com>
- `address_ty`: The address type used by the chain.
- `call_ty`: The `call` type used by the chain. The `call` in FRAME based runtimes represents the type of transaction being executed on chain. It references the actual function to execute and the parameters of this function.
- `signature_ty`: The signature type used by the chain.
- `signed_extensions`: FRAME based runtimes can extend the base extrinsic with extra information. This extra information that is put into an extrinsic is called "signed extensions". These extensions offer the runtime developer the possibility to include data directly into the extrinsic, like `nonce`, `tip`, amongst others. This means that the this data is sent alongside the extrinsic to the runtime. The other possibility these extensions offer is to include extra information only in the signed data that is signed by the sender. This means that this data needs to be known by both sides, the signing side and the verification side. An example for this kind of data is the *genesis hash* that ensures that extrinsics are unique per chain. Another example is the *metadata hash* itself that will also be included in the signed data. The offline wallets need to know which signed extensions are present in the chain and this is communicated to them using this field.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the terminology "signed extension" now become "transaction extension" since paritytech/polkadot-sdk#2280


The `TypeDef` variants have the following meaning:

- `Composite`: A `struct` like type that is composed of multiple different fields. Each `Field` can have its own type. A `Composite` with no fields is expressed as primitive type `Void`.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe mention the order of the fields is significant

The `TypeDef` variants have the following meaning:

- `Composite`: A `struct` like type that is composed of multiple different fields. Each `Field` can have its own type. A `Composite` with no fields is expressed as primitive type `Void`.
- `Enumeration`: Stores a `EnumerationVariant`. A `EnumerationVariant` is a struct that is described by a name, an index and a vector of `Field`s, each of which can have it's own type. Typically `Enumeration`s have more than just one variant, and in those cases `Enumeration` will appear multiple times, each time with a different variant, in the type information. `Enumeration`s can become quite large, yet usually for decoding a type only one variant is required, therefore this design brings optimizations and helps reduce the size of the proof. An `Enumeration` with no variants is expressed as primitive type `Void`.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is worth mentioning, as below that each of the Enumeration instances will have the same TypeRef lookup id. This detail helped me understand how this worked.

For Enumerations all variants have the same unique identifier, while they are represented as multiple type information. All variants need to have the same unique identifier as the reference doesn't know which variant will appear in the actual encoded data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean this is explained in Generating TypeRef.

## Future Directions and Related Material

- Does it work with all kind of offline wallets?
- Generic types currently appear multiple times in the metadata with each instantiation. It could be may be useful to have generic type only once in the metadata and declare the generic parameters at their instantiation.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I once tried to do this at the scale-info level, see paritytech/scale-info#12, but abandoned it.

It would be considerably easier to do as a second pass over the already generated type information.

@bkchr
Copy link
Contributor Author

bkchr commented Mar 7, 2024

/rfc propose

1 similar comment
@bkchr
Copy link
Contributor Author

bkchr commented Mar 8, 2024

/rfc propose

@paritytech-rfc-bot
Copy link
Contributor

Hey @bkchr, here is a link you can use to create the referendum aiming to approve this RFC number 0078.

Instructions
  1. Open the link.

  2. Switch to the Submission tab.

  1. Adjust the transaction if needed (for example, the proposal Origin).

  2. Submit the Transaction


It is based on commit hash b42ba43600f1ff18e9bfb74bcd042cdea2c983fb.

The proposed remark text is: RFC_APPROVE(0078,9378ab7a3871f5051e33f54c11f1a75480e5e2b770b7c6c41aea6be87d745200).

Copy link

github-actions bot commented Mar 8, 2024

Voting for this referenda is ongoing.

Vote for it here


- Does it work with all kind of offline wallets?
- Generic types currently appear multiple times in the metadata with each instantiation. It could be may be useful to have generic type only once in the metadata and declare the generic parameters at their instantiation.
- The metadata doesn't contain any kind of semantic information. This means that the offline wallet for example doesn't know what is a balance etc. The current solution for this problem is to match on the type name, but this isn't a sustainable solution.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. My main complaint about cold wallets at this point is that the rendering is sometimes not helpful. If e.g. an account address is rendered as a decimal or hexadecimal array ... that's not helpful in checking the correctness.

Copy link

PR can be merged.

Write the following command to trigger the bot

/rfc process 0x5945e40d7cc049b8176a4de631ab20d7e2bb8ac9576fa4de2e620ec7660a0272

@lolmcshizz
Copy link

/rfc process 0x5945e40d7cc049b8176a4de631ab20d7e2bb8ac9576fa4de2e620ec7660a0272

@paritytech-rfc-bot paritytech-rfc-bot bot merged commit 56aab07 into main Mar 15, 2024
@paritytech-rfc-bot
Copy link
Contributor

The on-chain referendum has approved the RFC.

@bkchr bkchr deleted the bkchr-merkelized-metadata branch March 15, 2024 10:09
@anaelleltd anaelleltd added the Implemented Is merged or live as a feature/service. label Sep 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Implemented Is merged or live as a feature/service.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants