From 08aafcf6565952411379fe81a626777260782fc7 Mon Sep 17 00:00:00 2001 From: Michael Peyton Jones Date: Mon, 29 Nov 2021 14:29:05 +0000 Subject: [PATCH 1/4] Initial version of CIP-32 --- CIP-0032/README.md | 129 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 129 insertions(+) create mode 100644 CIP-0032/README.md diff --git a/CIP-0032/README.md b/CIP-0032/README.md new file mode 100644 index 000000000..333f6f04a --- /dev/null +++ b/CIP-0032/README.md @@ -0,0 +1,129 @@ +--- +CIP: 32 +Title: Inline datums +Authors: Michael Peyton Jones +Comments-Summary: No comments +Comments-URI: +Status: Draft +Type: Standards Track +Created: 2021-11-29 +License: CC-BY-4.0 +--- + +# Inline datums + +## Abstract + +We propose to allow datums themselves to be attached to outputs instead of datum hashes. +This will allow much simpler communication of datum values between users. + +## Motivation + +Conceptually, datums are pieces of data that are attached to outputs. +However, in practice datums are implemented by attaching _hashes_ of datums to outputs, and requiring that the spending transaction provides the actual datum. + +This is quite inconvenient for users. +Datums tend to represent the result of computation done by the party who creates the output, and as such there is almost no chance that the spending party will know the datum without communicating with the creating party. +That means that either the datum must be communicated between parties off-chain, or communicated on-chain by attaching it to the transaction that creates the output (which is also inconvenient in that the spending party must watch the whole chain to spot it). + +It would be much more convenient to just put the _datum itself_ in an output, which is what we propose. + +### Use cases + +We expect that, provided we are able to bring the cost low enough, a large proportion of dapp developers will make use of this feature, as it will simplify their systems substantially. + +## Specification + +Transaction outputs are changed so that the datum field can contain either a hash or a datum (an "inline datum"). + +The min UTXO value for an output with an inline datum depends on the size of the datum, following the `coinsPerUTxOWord` protocol parameter. + +When an output with an inline datum is spent, the spending transaction does not need to provide the datum itself. + +### Script context + +Scripts are passed information about transactions via the script context. +The script context therefore needs to be augmented to contain information about inline datums. + +Changing the script context will require a new Plutus language version in the ledger to support the new interface. + +There are two changes in the new version of the interface: +- The datum field on transaction outputs can either be a hash or the actual datum. +- The datum field on transaction inputs can either be a hash or the actual datum. + +Old versions of the language will retain the old interface, but the construction of the script context in the old interface is also changed as follows: if either an input or an output has an inline datum, we replace it by its hash, and put the datum in the existing datum-hash-to-datum mapping. + +### CDDL + +The CDDL for transaction outputs will change as follows to reflect the new alternative. +``` +transaction_output = + [ address + , amount : value + , ? datum : $hash32 / plutus_data + ] +``` +TODO: should there be a dedicated production for datum-hash-or-datum? Does it need to be tagged? + +## Rationale + +The key idea of this proposal is simply to restore the conceptually straightforward situation where datums are attached to outputs. +Historically, this was the way that the EUTXO model was designed, and switching to datum hashes on outputs was done to avoid bloating UTXO entries, which at that time (pre-multiasset) were constant-size (see [1] page 7). + +Now that we have variable-sized UTXO entries and the accounting to support them, we can restore inline datums. + +Since inline datums change very little about the model apart from where data is stored, we don't need to worry about violating any of the other requirements of the ledger, but we do need to worry about the effect on the size of the UTXO set. + +### UTXO set size + +This proposal gives users a way to put much larger amounts of data into the UTXO set. +Won’t this lead to much worse UTXO set bloat? + +The answer is that we already have a mechanism to discourage this, namely the minimum UTXO value. +If inline datums turns out to drive significantly increased space usage, then we may need to increase `coinsPerUTxOWord` in order to keep the UTXO size down. +That will be costly and inconvenient for users, but will still allow them to use inline datums where they are most useful and the cost is bearable. +Furthermore, we hope that we will in fact be able to _reduce_ `coinsPerUTxOWord` when the upcoming work on moving the UTXO mostly to on-disk storage is complete. + +Another guard rail would be to enforce upper limits on the size of inline datums. +At the extreme, we could bound them to the size of a hash, which would guarantee no more space usage than today. +However, this is much worse for users, since it introduces a sharp discontinuity where an inline datum is entirely acceptable, until it crosses the size threshold at which point it is unacceptable, and there is no way to avoid this. +Generally we prefer to avoid such discontinuities in favour of gradually increasing costs. + +In practice, what is implemented here may depend on whether the UTXO-on-disk work is completed by the time that this proposal is implemented. + +### Other modes of specifying datums + +We could deprecate the other methods of specifying datums (datum hashes or datum hashes+optional datums). +However, the other approaches also have some advantages. + +- Transmission costs: creator pays versus consumer pays + - Inline datums: creator pays + - Datum hashes: consumer pays + - Datum hashes+optional datums: both pay +- Min UTXO value costs + - Inline datums: depends on data size + - Datum hashes: fixed cost + - Datum hashes+optional datums: fixed cost +- Privacy + - Inline datums: datum is immediately public + - Datum hashes: datum is not public until consumed + - Datum hashes+optional datums: datum is immediately public, but only to chain-followers +- Communication of datums + - Inline datums: easy on-chain communication + - Datum hashes: off-chain communication necessary + - Datum hashes+optional datums: complicated on-chain communication + +Any one of these factors could be important to particular use cases, so it is good to retain the other options. + +### Changing the script context + +We don't strictly need to change the script context. +We could use the fall-back approach that we use for the old interface, namely continuing to pretend that datums are always hashes, and providing the mapping from hashes to datums. +This would be functionally equivalent. + +The only advantage of changing it is that it's more honest to the real representation of transactions, and therefore allows scripts to insist on inline datums if they want. +It's unclear whether this is desirable. + +## References + +[1]: Chakravarty, Manuel MT, et al. "The extended UTXO model." From 3860052ccf0c89d2fceefb735ee4bbd71954a609 Mon Sep 17 00:00:00 2001 From: Michael Peyton Jones Date: Wed, 26 Jan 2022 15:02:31 +0000 Subject: [PATCH 2/4] Revise discussion of script context --- CIP-0032/README.md | 51 +++++++++++++++++++++++++++++++++++++++------- 1 file changed, 44 insertions(+), 7 deletions(-) diff --git a/CIP-0032/README.md b/CIP-0032/README.md index 333f6f04a..c4453744c 100644 --- a/CIP-0032/README.md +++ b/CIP-0032/README.md @@ -51,7 +51,7 @@ There are two changes in the new version of the interface: - The datum field on transaction outputs can either be a hash or the actual datum. - The datum field on transaction inputs can either be a hash or the actual datum. -Old versions of the language will retain the old interface, but the construction of the script context in the old interface is also changed as follows: if either an input or an output has an inline datum, we replace it by its hash, and put the datum in the existing datum-hash-to-datum mapping. +The interface for old versions of the language will not be changed, and scripts with old versions will not be able to be spent in transactions that include inline datums. ### CDDL @@ -115,14 +115,51 @@ However, the other approaches also have some advantages. Any one of these factors could be important to particular use cases, so it is good to retain the other options. -### Changing the script context +### The script context -We don't strictly need to change the script context. -We could use the fall-back approach that we use for the old interface, namely continuing to pretend that datums are always hashes, and providing the mapping from hashes to datums. -This would be functionally equivalent. +#### Including information about inline datums -The only advantage of changing it is that it's more honest to the real representation of transactions, and therefore allows scripts to insist on inline datums if they want. -It's unclear whether this is desirable. +In principle we do not need to let scripts see whether a datum is inline or not. +We could pretend that inline datums are non-inline and insert them into the datum witness map. + +The underlying question is: do we want scripts to be able to make assertions about whether datums are inline or not? +There are reasons to want to do this: _not_ using inline datums causes communication issues for users, and so it is quite reasonable that an application developer may want to be able to enforce their use. + +Furthermore, as a general principle we try to keep the script context as faithful to real transactions as possible. +Even if the use for the information is not immediately obvious, we try to err on the side of providing it and letting users decide. + +Hence we _do_ include information about inline datums in the script context. + +#### Representation and backwards compatibility + +There are a couple of options for how to change the representation of the script context to include the new information, and whether to make a backwards compatibility effort for old language versions. + +For the new script context: + +1. Match the ledger representation as much as possible: change the fields on inputs and outputs to be either a datum hash or a datum. +2. Try to only have one way to look up datums: put inline datums in the datum witness map and insert their hashes into the corresponding inputs and outputs; optionally add a boolean to inputs and outputs to indicate whether the datum was originally inline. + +For backwards compatibility: + +1. Don't try to represent inline datums for scripts using old language versions: old scripts simply can't be run in transactions that use inline datums (since we can't represent the information). +2. Rewrite inline datums as non-inline datums: put inline datums in the datum witness map and insert their hashes into the corresponding inputs and outputs. + +For the new script context, option 1 has the significant advantage of matching the ledger representation of transactions. +This makes it easier to implement, and also avoids conceptual overhead for users who would have to distinguish the two ways of representing transactions. +While the conceptual distance here may be small, if we let it grow over time then it may become quite confusing. + +We then have the choice of what to do about backwards compatibility. +Option 2 would work, but is more complicated for the ledger and is inconsistent in representation with our choice for the new script context (inline datums are sometimes represented faithfully, and sometimes put in the witness map). +Option 1 is simple, but doesn't allow old scripts to work with inline datums. +This would not be so bad if it just meant that old scripts could not be spent in transactions that include inline datums, but it also introduces a new way to make an unspendable output. +If a user creates an output with an old script and an inline datum, then any transaction spending that output will include an inline datum, which we would not allow. + +Unfortunately, we cannot prevent users from creating such outputs in general, since script addresses do not include the script language, so the ledger cannot tell whether the inline datum is permissible or not. +Client code typically _will_ be able to do this, since it will usually know the script. + +The mitigating factor is that we expect this to be uncommon in practice, particularly since we expect that most users will move to the new version relatively quickly, since we expect support for inline datums to be desirable, and released alongside other desirable features. + +Hence we choose both option 1s and do _not_ provide backwards compatibility for old language versions. ## References From fa17b3b5425c7866fb9a0577b9651cd3361248cd Mon Sep 17 00:00:00 2001 From: Michael Peyton Jones Date: Thu, 27 Jan 2022 14:33:01 +0000 Subject: [PATCH 3/4] Be more explicit about failure --- CIP-0032/README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/CIP-0032/README.md b/CIP-0032/README.md index c4453744c..996371478 100644 --- a/CIP-0032/README.md +++ b/CIP-0032/README.md @@ -51,7 +51,8 @@ There are two changes in the new version of the interface: - The datum field on transaction outputs can either be a hash or the actual datum. - The datum field on transaction inputs can either be a hash or the actual datum. -The interface for old versions of the language will not be changed, and scripts with old versions will not be able to be spent in transactions that include inline datums. +The interface for old versions of the language will not be changed. +Scripts with old versions cannot be spent in transactions that include inline datums, attempting to do so will be a phase 1 transaction validation failure. ### CDDL From be593e481a58a5b4731a0f7aa8a232af4bf4441c Mon Sep 17 00:00:00 2001 From: Michael Peyton Jones Date: Fri, 4 Feb 2022 11:33:11 +0000 Subject: [PATCH 4/4] Clarify reference to optional datums --- CIP-0032/README.md | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/CIP-0032/README.md b/CIP-0032/README.md index 996371478..3751d7bc4 100644 --- a/CIP-0032/README.md +++ b/CIP-0032/README.md @@ -24,7 +24,8 @@ However, in practice datums are implemented by attaching _hashes_ of datums to o This is quite inconvenient for users. Datums tend to represent the result of computation done by the party who creates the output, and as such there is almost no chance that the spending party will know the datum without communicating with the creating party. -That means that either the datum must be communicated between parties off-chain, or communicated on-chain by attaching it to the transaction that creates the output (which is also inconvenient in that the spending party must watch the whole chain to spot it). +That means that either the datum must be communicated between parties off-chain, or communicated on-chain by including it in the witness map of the transaction that creates the output ("extra datums"). +This is also inconvenient for the spending party, who must watch the whole chain to spot it. It would be much more convenient to just put the _datum itself_ in an output, which is what we propose. @@ -94,25 +95,25 @@ In practice, what is implemented here may depend on whether the UTXO-on-disk wor ### Other modes of specifying datums -We could deprecate the other methods of specifying datums (datum hashes or datum hashes+optional datums). +We could deprecate the other methods of specifying datums (datum hashes or datum hashes+extra datums). However, the other approaches also have some advantages. - Transmission costs: creator pays versus consumer pays - Inline datums: creator pays - Datum hashes: consumer pays - - Datum hashes+optional datums: both pay + - Datum hashes+extra datums: both pay - Min UTXO value costs - Inline datums: depends on data size - Datum hashes: fixed cost - - Datum hashes+optional datums: fixed cost + - Datum hashes+extra datums: fixed cost - Privacy - Inline datums: datum is immediately public - Datum hashes: datum is not public until consumed - - Datum hashes+optional datums: datum is immediately public, but only to chain-followers + - Datum hashes+extra datums: datum is immediately public, but only to chain-followers - Communication of datums - Inline datums: easy on-chain communication - Datum hashes: off-chain communication necessary - - Datum hashes+optional datums: complicated on-chain communication + - Datum hashes+extra datums: complicated on-chain communication Any one of these factors could be important to particular use cases, so it is good to retain the other options.