CIP-0046? | Merkelised Plutus Scripts #385
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,270 @@ | ||
--- | ||
CIP: ? | ||
Title: Merkelised Plutus Scripts | ||
Review comment: To be annoyingly pedantic, I'll point out that the process is named after Ralph Merkle, so it's Merklisation (or Merklization) rather than Merkelisation (which sounds like something from German politics).
Review comment: I realise this. I had thought of this, but Merklisation looks odd. |
||
Authors: Las Safin <las@mlabs.city> | ||
Status: Draft | ||
Type: Standards | ||
Created: 2022-11-29 | ||
License: <CC-BY-4.0> | ||
--- | ||
|
||
## Abstract | ||
|
||
Currently, the hash of a script is simply the hash of its [serialisation]( | ||
https://github.com/input-output-hk/plutus/blob/a645d1ee0dd5efcd7a7da24678461e07396ad26e/plutus-ledger-api/src/PlutusLedgerApi/Common/SerialisedScript.hs#L88). | ||
This CIP proposes changing this such that the hash of a script (term) | ||
is a function of its immediate children's hashes, forming a Merkle Tree from the AST. | ||
This allows one to shallowly verify a script's hash, and is useful on Cardano, | ||
because it allows scripts to **check that a script hash is an instantiation of a parameterised script**. | ||
|
||
In addition, a `blake2b_224` built-in function must be added. | ||
|
||
This is inspired by [BIP-114](https://github.com/bitcoin/bips/blob/master/bip-0114.mediawiki), | ||
but the motivations are very different. | ||
|
||
## Motivation | ||
|
||
Given some core logic expressible as a script, it is common to have parameters in | ||
the form of constants, e.g. fees, references to other scripts, magic numbers. | ||
|
||
These parameters can either be put in a datum somewhere, or can be put into the | ||
script itself, either by inlining them, or applying the unapplied script to the constants. | ||
|
||
On-chain it is currently hard to check that one script is an applied form | ||
of another script. In cases where that is necessary, datums are instead used. | ||
Review comment: You make it sound like this is a problem... but this is basically what the datum is for. What I want to know is why that isn't sufficient. |
||
|
||
By Merkelising the hashing, we make this possible, | ||
which unlocks checking that a script is an application of another script to some parameter. | ||
|
||
Example reasons to apply the parameters to the script: | ||
- Staking validators currently don't support datums, and all staking validators | ||
share a single rewards account. Allowing checking applied parameters | ||
makes staking validators much more powerful. (More about this below) | ||
Review comment: This seems like the main case where it actually makes a substantial difference, because you don't have a datum. |
||
- Constants can be included in the reference script, reducing CPU and memory usage, | ||
since they don't have to be parsed from the adjacent datum (somewhat cheap) | ||
or the script context (very expensive). | ||
Review comment: This doesn't seem that compelling to me. |
||
- A script address plus a datum can't fit in an address; | ||
if you want that, you also need this (or need to change what an address is). | ||
Review comment: Yes, but I have elsewhere suggested that we should fix this by extending CIP-13 to include datums.
Review comment: I don't see how that affects addresses in the ledger. |
||
|
||
## Specification | ||
Review comment: This would also need changes to the ledger spec. At the moment, the ledger doesn't deserialise Plutus scripts at all, it passes them to the evaluator still serialised, and despite this it can still hash them etc, straightforwardly. This CIP would probably require changing that in the spec and the implementation, so that the ledger has deserialized scripts around (one reason for this is that deserialization can fail, whereas hashing cannot). It might be good to have at least a sketch of those changes here. I also don't know whether it violates any principles of the ledger to not have the hash of an item be the hash of its serialised bytes. I think that's true for everything else, it's possible that there's a reason for that (e.g. making it possible to check hashes without having to know the serialization). |
||
|
||
The hash of a script will be derived directly from the AST, rather than its serialisation. | ||
Currently, it is formed by hashing the serialisation prefixed with a byte that represents its version, e.g. 0x02 for Plutus V2. | ||
|
||
The hash of a script becomes the hash of the prefix version annotation prepended to the hash of the term. | ||
|
||
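A minimal sketch of the difference between the two schemes, assuming `blake2b_224` as the script-hash function. The function names, the placeholder byte strings, and the `0x03` version tag are illustrative assumptions, not part of the CIP.

```python
import hashlib


def blake2b_224(data: bytes) -> bytes:
    """28-byte BLAKE2b digest, the size used for script hashes."""
    return hashlib.blake2b(data, digest_size=28).digest()


def current_script_hash(version: int, serialised_script: bytes) -> bytes:
    # Today: hash the version byte followed by the serialised script.
    return blake2b_224(bytes([version]) + serialised_script)


def merkle_script_hash(version: int, term_hash: bytes) -> bytes:
    # Proposed: hash the version byte followed by the hash of the root term.
    return blake2b_224(bytes([version]) + term_hash)


# Illustrative inputs only: placeholder bytes, and 0x03 is a hypothetical
# tag for some future Plutus version.
root_term_hash = blake2b_224(b"placeholder for a term")
new_hash = merkle_script_hash(0x03, root_term_hash)
old_hash = current_script_hash(0x02, b"placeholder for a serialised script")
```

In the proposed scheme the outer hash only ever sees 29 bytes (one version byte plus a 28-byte term hash), regardless of script size.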
[`Term`](https://github.com/input-output-hk/plutus/blob/a645d1ee0dd5efcd7a7da24678461e07396ad26e/plutus-core/untyped-plutus-core/src/UntypedPlutusCore/Core/Type.hs#L69) | ||
currently has 8 constructors. On-chain, annotations are always the unit type, | ||
and are hence ignored for this algorithm. Each case/constructor is generally handled by | ||
hashing the concatenation of a prefix (single byte corresponding to the | ||
constructor index) along with the hashes of the arguments passed to the constructor. | ||
Review comment: This is slightly different to what @kwxm wrote here (https://github.com/input-output-hk/plutus/blob/master/doc/notes/plutus-core/Merklisation/Merklisation-notes.md#modified-merklisation-technique), which I think also included the serialized versions of the nodes in the value that gets hashed. Not sure if that's important, Kenneth do you remember?
Review comment: I'm not quite sure what you mean. It talks about "[serialising] all of the contents of the node into bytestrings", but I think by "contents" I meant all of the fields (things like variable names) except subnodes: you wouldn't serialise those and calculate hashes, but instead recursively apply the Merkle hash process. I think the overall process is basically similar to what's going on here. |
||
|
||
Similar code can be found [in Plutarch](https://github.com/Plutonomicon/plutarch-plutus/blob/95e40b42a1190191d0a07e3e4e938b72e6f75268/Plutarch/Internal.hs#L100) (for a slightly different AST). | ||
|
||
To avoid giving a single script two hashes, | ||
this scheme must be used exclusively, starting from some version after Plutus V2. | ||
|
||
The algorithm for checking a script hash against a supplied script (of a new version) | ||
in the ledger will change slightly: rather than hashing the supplied serialised | ||
script directly, the decoding of the serialised script must be hashed. | ||
(NB: the hashing and decoding can be fused to avoid intermediary structures.) | ||
|
||
To allow computing the hash in scripts, we must support `blake2b_224` in Plutus scripts, | ||
as it is the algorithm currently used for script hashes. The algorithm used might change in the future, but that is | ||
not relevant for this CIP. | ||
|
||
### Hashing `Error` | ||
|
||
Since there are no children, the hash of the `Error` term is the | ||
hash of the prefix byte for the `Error` constructor. | ||
In principle any fixed value could serve as the hash, | ||
but it would have to be demonstrably arbitrary, hence hashing the prefix byte | ||
is the best option. | ||
|
||
In pseudocode: `hash prefix` | ||
Review comment: At first I found it a little confusing that everything used
Review comment: Yes. |
||
|
||
### Hashing `Builtin`, `Var` | ||
|
||
The hash of a `Builtin` is the hash of the prefix prepended to the base-256 encoded | ||
(i.e. serialised to bytestring) index of the built-in function. | ||
Because there are less than 256 built-ins, this is currently the same | ||
Review comment: Less than 256 or less than 257? I think that if we had 256 you could still get away with one byte here.
Review comment: 257 |
||
as hashing the prefix byte prepended to the byte containing the index of | ||
the built-in. | ||
|
||
`Var` is handled in exactly the same way (with a different prefix), | ||
but in this case the index can feasibly exceed 255. | ||
|
||
In pseudocode: `hash $ prefix <> serialiseBase256 index` | ||
|
||
### Hashing `Apply`, `Force`, `Delay` | ||
|
||
These are hashed by hashing the result of prepending the prefix | ||
byte to the concatenation of the hashes of the children. | ||
|
||
In pseudocode: `hash $ foldl' (<>) prefix (hash <$> children)` | ||
|
||
### Hashing `LamAbs` | ||
|
||
This works exactly the same way as above. Notably, the _name_ is excluded, | ||
as it's a constant in the de Bruijn encoding. | ||
|
||
In pseudocode: `hash $ prefix <> hash body` | ||
|
||
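The rules for the non-constant term constructors can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the prefix byte values in `PREFIX` are hypothetical, the dataclasses are a toy stand-in for the real `Term` type, and `serialise_base256` assumes a big-endian encoding with at least one byte, which the text above does not pin down. `Constant` is left out here.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical prefix bytes, one per constructor (assumed values).
PREFIX = {"Var": 0, "LamAbs": 1, "Apply": 2, "Force": 3,
          "Delay": 4, "Builtin": 6, "Error": 7}


def blake2b_224(data: bytes) -> bytes:
    return hashlib.blake2b(data, digest_size=28).digest()


def serialise_base256(n: int) -> bytes:
    # Assumption: big-endian base-256 digits, with 0 encoded as one zero byte.
    return n.to_bytes(max(1, (n.bit_length() + 7) // 8), "big")


@dataclass(frozen=True)
class Error:
    pass

@dataclass(frozen=True)
class Var:
    index: int

@dataclass(frozen=True)
class Builtin:
    index: int

@dataclass(frozen=True)
class LamAbs:
    body: object  # the bound name is omitted, as in the text above

@dataclass(frozen=True)
class Apply:
    fun: object
    arg: object

@dataclass(frozen=True)
class Force:
    body: object

@dataclass(frozen=True)
class Delay:
    body: object


def merkle_hash(term) -> bytes:
    prefix = bytes([PREFIX[type(term).__name__]])
    if isinstance(term, Error):
        return blake2b_224(prefix)  # hash prefix
    if isinstance(term, (Var, Builtin)):
        return blake2b_224(prefix + serialise_base256(term.index))
    # Apply has two children; LamAbs, Force and Delay have one.
    children = [term.fun, term.arg] if isinstance(term, Apply) else [term.body]
    return blake2b_224(prefix + b"".join(merkle_hash(c) for c in children))


identity = LamAbs(Var(1))
applied = Apply(identity, Error())
applied_hash = merkle_hash(applied)
```

Because each node's hash depends only on its prefix and its children's hashes, `applied_hash` can be recomputed from the hashes of `identity` and `Error()` without access to either subterm.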
### Hashing `Constant` | ||
|
||
The universe of types used on-chain is always `DefaultUni`. | ||
Each possible data type is handled differently, with each having | ||
a different prefix. The total number of prefixes does not exceed | ||
255. If it did, the prefix would have to be increased to two bytes. | ||
Review comment: Is it 255 or 256? I think any unsigned byte is a valid prefix, but I could be wrong.
Review comment: You are right. I'm dumb. |
||
|
||
In addition: | ||
Negative integers and non-negative integers have separate prefixes. | ||
False and True also have separate prefixes. | ||
|
||
#### Hashing non-negative integers | ||
|
||
The serialisation according to [CIP-58](https://github.com/cardano-foundation/CIPs/blob/a1b9ff0190ad9f3e51ae23e85c7a8f29583278f0/CIP-%3F/README.md#representation-of-builtininteger-as-builtinbytestring-and-conversions), | ||
prefixed with the two-byte prefix, is hashed. | ||
|
||
In pseudocode: `hash $ prefix <> prefix' <> serialiseCIP58 n` | ||
Review comment: What's going on here? Is it that
Review comment: Oh yes, I guess that's what the sentence on line 121 means.
Review comment: Actually, I think this is a mistake. This is a previous scheme I had, but there's no reason not to collapse it into one byte. |
||
|
||
#### Hashing negative integers | ||
|
||
The same algorithm as above is used, but the number hashed is `1 - n`. | ||
|
||
In pseudocode: `hash $ prefix <> prefix' <> serialiseCIP58 (1 - n)` | ||
|
||
#### Hashing bytestrings | ||
|
||
The bytestring is hashed as-is. | ||
We use the blake2b-256 hash here, such that we can usefully check that | ||
the script refers to a bytestring that we know only the hash of. | ||
|
||
In pseudocode: `hash $ prefix <> blake2b_256 bs` | ||
|
||
#### Hashing strings | ||
|
||
The flat-encoding is hashed. | ||
|
||
In pseudocode: `hash $ prefix <> flat x` | ||
|
||
#### Hashing lists, pairs | ||
|
||
Lists and pairs are hashed like a Merkle tree, | ||
much the same way that terms are. | ||
The children have a known type, and are hashed according to how that | ||
type should be hashed, i.e. with the correct algorithm and prefix. | ||
|
||
In pseudocode: `hash $ foldl' (<>) prefix (hash <$> children)` | ||
|
||
#### Hashing `()`, `False`, `True` | ||
|
||
Each has its own separate prefix, like `Error`, hence: | ||
|
||
In pseudocode: `hash prefix` | ||
|
||
#### Hashing `Data` | ||
|
||
The CBOR encoding is used. Notably, it must be compatible with the `serialiseData` | ||
built-in to be useful on-chain. | ||
We use the blake2b-256 hash here, such that we can usefully check that | ||
the script refers to a datum that we know only the hash of. | ||
If the hashing algorithm for data changes, we must also change it here. | ||
|
||
In pseudocode: `hash $ prefix <> blake2b_256 (serialiseData d)` | ||
|
||
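A short sketch of the bytestring, `Data`, and list rules, assuming hypothetical one-byte prefixes. The `Data` argument is taken as already-serialised bytes standing in for the output of `serialiseData`. The function names are illustrative.

```python
import hashlib

# Hypothetical one-byte prefixes for three constant types (assumed values).
BYTESTRING_PREFIX = b"\x10"
DATA_PREFIX = b"\x11"
LIST_PREFIX = b"\x12"


def blake2b_224(data: bytes) -> bytes:
    return hashlib.blake2b(data, digest_size=28).digest()


def blake2b_256(data: bytes) -> bytes:
    return hashlib.blake2b(data, digest_size=32).digest()


def hash_bytestring(bs: bytes) -> bytes:
    # hash $ prefix <> blake2b_256 bs
    return blake2b_224(BYTESTRING_PREFIX + blake2b_256(bs))


def hash_bytestring_from_digest(digest_256: bytes) -> bytes:
    # Same result, computed by someone who knows only the 256-bit digest.
    return blake2b_224(BYTESTRING_PREFIX + digest_256)


def hash_data(serialised_data: bytes) -> bytes:
    # hash $ prefix <> blake2b_256 (serialiseData d)
    return blake2b_224(DATA_PREFIX + blake2b_256(serialised_data))


def hash_list(child_hashes: list) -> bytes:
    # hash $ foldl' (<>) prefix (hash <$> children)
    return blake2b_224(LIST_PREFIX + b"".join(child_hashes))


secret = b"some bytestring"
full = hash_bytestring(secret)
from_digest = hash_bytestring_from_digest(blake2b_256(secret))
pair_of_bytestrings = hash_list([hash_bytestring(b"a"), hash_bytestring(b"b")])
```

The inner `blake2b_256` step is what lets a checker work from a digest alone: `full` and `from_digest` are equal even though the second call never sees `secret`.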
## Rationale | ||
Review comment: We might need some discussion of the cost of this kind of hashing. Our experiments suggested it was ~10x more expensive (https://github.com/input-output-hk/plutus/blob/master/doc/notes/plutus-core/Merklisation/Merklisation-notes.md#the-cost-of-calculating-merkle-hashes), unclear if this will have a meaningful impact but it might.
Review comment: I think that the potential cost of this is my main concern. Calculating the hash involves traversing the entire AST (although as the document points out it can be fused with the deserialisation process), but also calling the underlying hash function(s) at every node, which could become expensive compared with just feeding the serialised script directly to a hashing function in one go. I'd really like to see some figures for this: it's conceivable that computing the Merkle hash might be more expensive than executing the actual scripts, and that might make this proposal impractical. The estimates from our earlier experiments (which were maybe three years ago) were entirely theoretical though, and things have changed a lot since we did that: for one thing, we're using |
||
|
||
Given this minor change, we can now check that one script is the application of another script. | ||
Review comment: I think you underestimate how much work this would be 😅
Review comment: Quite possibly. |
||
Concretely, given hash `script`, hash `original`, parameter `d` (as data), | ||
intermediate hashes `h0`, `h1`, hashing prefixes `ver_prefix`, `app_prefix`, `const_data_prefix`, we check: | ||
``` | ||
script == blake2b_224 $ ver_prefix <> h0 | ||
h0 == blake2b_224 $ app_prefix <> original <> h1 | ||
h1 == blake2b_224 $ const_data_prefix <> blake2b_256 (serialiseData d) | ||
``` | ||
|
||
We essentially open the Merkle tree commitment partially and check that the supplied path is correct. | ||
|
||
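The three-equation check can be sketched as a single function. This is an off-chain illustration only: it recomputes `h1` and `h0` rather than taking them as supplied witnesses, which is equivalent. The prefix values and placeholder inputs are hypothetical.

```python
import hashlib


def blake2b_224(data: bytes) -> bytes:
    return hashlib.blake2b(data, digest_size=28).digest()


def blake2b_256(data: bytes) -> bytes:
    return hashlib.blake2b(data, digest_size=32).digest()


def is_application_of(script: bytes, original: bytes, serialised_param: bytes,
                      ver_prefix: bytes, app_prefix: bytes,
                      const_data_prefix: bytes) -> bool:
    """Check that `script` is the hash of `original` applied to one Data parameter."""
    h1 = blake2b_224(const_data_prefix + blake2b_256(serialised_param))
    h0 = blake2b_224(app_prefix + original + h1)
    return script == blake2b_224(ver_prefix + h0)


# Hypothetical prefixes and placeholder inputs, for illustration only.
VER, APP, CONST_DATA = b"\x03", b"\x02", b"\x11"
original = blake2b_224(b"placeholder for the unapplied script term")
param = b"placeholder for serialiseData d"

# Build the hash an applied script would have, then verify it.
script = blake2b_224(
    VER + blake2b_224(APP + original + blake2b_224(CONST_DATA + blake2b_256(param))))
accepted = is_application_of(script, original, param, VER, APP, CONST_DATA)
rejected = is_application_of(script, original, b"another parameter", VER, APP, CONST_DATA)
```

The check costs three `blake2b_224` calls and one `blake2b_256` call, independent of the size of the original script.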
### Relation with CIP-58 | ||
|
||
This CIP does not _depend_ on CIP-58, but to hash integers on-chain | ||
the way it's done here, CIP-58's integer-to-bytestring serialisation built-in | ||
must be available in Plutus. | ||
|
||
### Relation with BIP-114 | ||
|
||
BIP-114 uses this trick to avoid submitting the parts of the script that aren't used. | ||
Given that reference scripts are common in Haskell, this isn't a big win for efficiency, | ||
Review comment: Not sure what this means, do you just mean "Given that Cardano supports reference scripts"?
Review comment: brainfart |
||
but it might be worth implementing for the sake of scripts used only once. | ||
This CIP however doesn't require that that be implemented. | ||
Review comment: We looked into MAST during the development of Plutus Core, but we concluded that it wasn't worth it because the size of the hashes corresponding to omitted subtrees cancelled out the saving from omitting the subtree. You can read some notes on it here: https://github.com/input-output-hk/plutus/tree/master/doc/notes/plutus-core/Merklisation
Review comment: Yes, we tried a very similar Merklisation scheme, but for different reasons. We were looking at ways to reduce script sizes and the idea of using Merklisation to let us omit unused parts of the AST in fully applied validators seemed promising. It turned out that that involved replacing subtrees of the AST with hashes which were large (32 bytes) and incompressible, and that meant that we couldn't get any worthwhile size reductions, so we abandoned that idea. However that was for an entirely different purpose, so I don't think it's too relevant here. |
||
|
||
The argument for privacy doesn't apply: private smart contracts can be achieved through | ||
the use of non-interactive zero-knowledge probabilistic proofs. | ||
Review comment: Not today they can't. So I think it is still quite relevant.
Review comment: Wdym? They definitely can once we have at least bitwise primitives. |
||
|
||
### Reference scripts | ||
|
||
Currently, different instances of the same script will need their own reference inputs | ||
since their hashes are different. It seems feasible to allow sharing of a single reference script, | ||
Review comment: ... or they can put them in the datum? |
||
given the parameters and language version as witnesses, but given the complexity | ||
involved, it is not specified in this CIP. | ||
|
||
### Staking | ||
|
||
This makes staking validators much more powerful, since a single protocol can | ||
now manage many rewards accounts (by instantiating the script with a numeric identifier). | ||
Review comment: Please can you write out this use case in more detail. You've alluded to it a few times but I'd really like to see more detail because I'm not familiar with it and I'm trying to back-infer the actual details, probably wrongly. And it seems to be the load-bearing example here.
Review comment: Will do |
||
However, it is arguably not the optimal solution due to the reference | ||
script problem described above. Even if the reference script problem | ||
is solved as described above, it seems logical to allow supplying a datum | ||
to a staking validator, or somehow combining the payment address and staking address for scripts, | ||
Review comment: The problem with supplying a datum to anything is where does the datum live? For a validator script it lives on the output. Where could it live for a staking validator? If we can come up with a sane answer to that, then in principle we could just give staking validators datums. |
||
and using the same datum for both, while somehow solving the separate accounts problem. | ||
|
||
Given the heavy complexity of fixing staking validators, Merkelising script hashing seems much more feasible. | ||
Review comment: It's not really clear to me that it's complex, just that we don't have a design right now. |
||
|
||
### Alternatives | ||
|
||
#### Parameterised Reference Scripts | ||
|
||
See https://github.com/cardano-foundation/CIPs/pull/354. | ||
|
||
Seemingly, Merkelisation is a less invasive and possibly cleaner change. | ||
|
||
#### Changing how constants are hashed | ||
|
||
The hashing of constants might not have a clear best solution. Most notably, | ||
it is not clear how far the Merkelisation should extend. | ||
E.g., the hashing of data itself could be Merkelised. This is not done in this CIP. | ||
The hashing of a `Data` constant could also prepend the prefix directly to the serialisation, | ||
rather than to the hash of the `Data`. It is not clear what is best. | ||
Review comment: I think stopping the merkelization at the constants is the right place. |
||
|
||
##### Hashing strings, lists, pairs differently | ||
|
||
Strings are not very useful in Plutus. | ||
Hence, the hashing algorithm for them isn't optimised such that | ||
they can be easily verified. | ||
|
||
Strings have essentially no purpose on-chain, since they're only used | ||
for tracing, which should not be used in production. | ||
|
||
In the context of checking applied parameters, it is likely that only | ||
`Data`, `Integer`, `Bool`, and `ByteString` will be used as parameters, | ||
since they cover all useful behaviour in an efficient way. | ||
If you want to parameterise your script by a pair of integers, | ||
it is likely best to unwrap that into two separate integer parameters | ||
for the sake of efficiency of _running_ the script, which is likely | ||
to be more common than checking the parameters. | ||
|
||
Built-in lists and pairs are not commonly used as parameters, but it's plausible | ||
that they might still be the most efficient method in some scenarios. | ||
Hence, they have been included. | ||
They use Merkle-tree hashing since that's the simplest and most useful in this case. | ||
|
||
## Path to Active | ||
Review comment: I think this should have some Acceptance Criteria a la the new CIP-001. Perhaps:
Review comment: This seems reasonable to me, but calculating a few hashes (see example pseudocode) is well within the budget last time I checked.
Review comment: Is that really true? If you're referring to the pseudocode here (under Rationale), then you need the hashes
|
||
|
||
### Implementation plan | ||
|
||
Las Safin will implement this if IOG don't have time. | ||
|
||
## Copyright | ||
|
||
This CIP is licensed under CC-BY-4.0. |