
Write down ADR for data serialization within on-chain validator. #147

Closed
wants to merge 1 commit

Conversation

@KtorZ (Collaborator) commented Dec 21, 2021

13. Data Serialization Within On-Chain Validators

Date: 2021-12-21

Status

Proposed

Context

In Hydra, during the Close and Contest transitions, one must verify, within on-chain validators, that a certain piece of data has been multi-signed by all head participants. While verifying a multi-signature produced via MuSig2 (which can be made Schnorr-compatible) is relatively easy and can rely on existing Plutus built-ins, producing the payload / pre-image that was signed is problematic, for there are no Plutus built-ins for data serialization.

Incidentally, even though there exist quite simple and compact (implementation-wise) serialization algorithms (e.g. CBOR), this is a path we do not want to follow, as it would very likely push the validator size far above an acceptable limit.

Hence: how can one obtain arbitrary serialized data within an on-chain validator?

Decision

Overview

In Cardano, transactions may carry information in various ways; in particular, one may provide Plutus data as part of a transaction witness set. Those data are made available to the underlying validator's script context as a (key, value) list, where keys are data hashes and values are the data themselves. It's important to note that the correspondence between a hash and its data is verified by the ledger during phase-1 validation.

We want to leverage this data lookup table to pass arbitrary data and their corresponding hashes to a validator. This effectively means introducing an extra indirection in the redeemer of the Close and Contest transitions. Indeed, instead of passing the full data as the redeemer, we need only give a hash, which can be looked up in the script context to obtain its corresponding data. This can be achieved with the following on-chain function:

import Ledger
import qualified PlutusTx
import qualified PlutusTx.AssocMap as AssocMap

reifyData :: PlutusTx.FromData a => ScriptContext -> DatumHash -> Maybe a
reifyData (ScriptContext info _) h =
  AssocMap.lookup h (AssocMap.fromList (txInfoData info))
    >>= PlutusTx.fromBuiltinData . getDatum
{-# INLINEABLE reifyData #-}
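For illustration, here is a hedged sketch (not part of the PR) of how a validator could use reifyData; `validateClose`, `Snapshot`, and `verifySnapshotSignature` are hypothetical names, not actual Hydra API:

```haskell
-- Hypothetical usage sketch: the redeemer carries only the hash of the
-- multi-signed snapshot; the validator reifies the full data from the
-- script context and then checks the multi-signature.
validateClose :: DatumHash -> ScriptContext -> Bool
validateClose snapshotHash ctx =
  case reifyData ctx snapshotHash of
    Nothing       -> False -- hash absent from txInfoData: reject
    Just snapshot -> verifySnapshotSignature (snapshot :: Snapshot)
{-# INLINEABLE validateClose #-}
```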

Obstacles

There's a little quirk with this approach, unfortunately: the ledger does not allow the presence of extraneous datums in the witness set. In fact, the ledger will fail phase-1 validation with a NonOutputSupplimentaryDatums error if a transaction includes any datum that is neither:

a. required by an input associated with a script address, nor
b. referenced by an output.

Thus, without requiring a hard-fork, we must work around this by including an extra output carrying the required datum hash. In contexts where we control the underlying wallet, we can rather easily add this to a change output already fueling the transaction. Note that this barely changes anything for a vk output; the datum will simply be ignored by the ledger and not be required for spending.
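Off-chain, this amounts to decorating the wallet's change output with the datum hash. A rough sketch using cardano-api types circa the Alonzo era follows; the constructor names (`TxOut`, `TxOutDatumHash`, `ScriptDataInAlonzoEra`) are an assumption based on that era's API and may differ across versions:

```haskell
-- Hedged sketch, not the PR's code: attach the supplementary datum hash
-- to an existing change output, so the ledger does not reject it as a
-- non-output supplementary datum during phase-1 validation.
attachDatumHash :: Hash ScriptData -> TxOut AlonzoEra -> TxOut AlonzoEra
attachDatumHash h (TxOut addr value _) =
  TxOut addr value (TxOutDatumHash ScriptDataInAlonzoEra h)
```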

Consequences

  • We can actually write the Close and Contest validators.

@KtorZ KtorZ requested review from ch1bo and a user December 21, 2021 10:23
@KtorZ KtorZ self-assigned this Dec 21, 2021
@github-actions

Unit Test Results

    4 files  ±0    63 suites  ±0   3m 43s ⏱️ +3s
184 tests ±0  182 ✔️ ±0  2 💤 ±0  0 ±0 

Results for commit 5482529. ± Comparison against base commit c62e0bf.

@kk-hainq (Contributor) commented Dec 22, 2021

I don't know what the requirements of the Close and Contest transactions are, so the question might be stupid... But if the multi-signed data piece isn't duplicated, is representing it as a datum more efficient than a redeemer? Both require the data in the transaction witness set, but finding data in the script context is more costly inside validator scripts. Redeemers also don't require storing unused data (the datum hash of the multi-signed data piece) in the next UTxO set.

@KtorZ (Collaborator, Author) commented Dec 22, 2021

I am not quite sure what you mean by "isn't duplicated"?

The multisigned payloads are produced off-chain, as part of the running head. They are basically what entitles participants to move the contract on-chain. Thus, this is precisely something we cannot store on-chain upfront, of course. Nor is it something we can pass as-is as a redeemer to the script (because of the surrounding information that comes with the multisigned payload).

I would need to double-check that but, as far as I remember, the ledger rules also forbid adding supplementary 'unused' redeemers to the script data. Hence the use of a datum for that.

@kk-hainq (Contributor) commented Dec 22, 2021

I am not quite sure what you mean by "isn't duplicated"?

Sorry for the vague description. By that, I meant redeemers are stored per input (more precisely per pointer as things like minting policies have redeemers too) while datums are stored per unique hash. If the multi-signed data is only used in the validation of one input then the storage in the witness set is roughly the same. Datums are only "lighter" when that same data is used in the validation of multiple inputs. In this context, I guess that the multi-signed data is only used in the validation of its specific head UTxO, hence no storage gains from the datum way.

Nor it is something we can pass as-is for redeemer to the script (because of the surrounding information that come with the multisigned payload).

I'm still confused why you can represent something as a datum but not as a redeemer when building transactions. Aren't they represented in the same Data type at both the ledger and script levels?

I would need to double check that but, as far as I remember, the ledger rules also forbid to add supplementary 'unused' redeemer to the script data. Hence the use of a datum for that.

The redeemer will be used in the validation script for the multi-signature check?

@ch1bo (Collaborator) commented Dec 22, 2021

LGTM. Concerning the obstacle: aren't there also auxiliary datums which are also hashed into the script validity of the body? In any case, we should ask the ledger people and draw the consequences here. That is, in the worst case, requiring changes in the ledger -> additional scope for the Babbage hard-fork.

@ghost left a comment

I am unsure what problem this solves 🤔
Having spent a few hours dealing with reconciling serialisation representations on- and off-chain to validate some output, I have doubts about this approach. It seems to me it adds complexity both on-chain (adding the need to lookup actual data from a hash that's in another txout, so double indirection) and off-chain (adding the need to pack the hash and the datum in some other txout).
On-chain we can always hash; if we cannot serialise, then either we should ask for serialisation to be included as a builtin, or pass an additional serialised representation should we need it.
As for all our other ADRs, I think this should be based on a proper experiment demonstrating the validity and generality of the solution.

@KtorZ (Collaborator, Author) commented Dec 22, 2021

@abailly-iohk keep in mind that this proposal's strong hypothesis is that we cannot / do not want to modify the ledger rules. While this discussion could happen with the ledger team, it would likely push back any testnet or mainnet integration quite a lot. Hence, in the meantime, this proposal offers "a way" to work around the issue.

As for "pass additional serialised representation should we need it", this is precisely what we cannot do 😅! Or more exactly, the on-chain validator must be in a position to verify that the serialised data matches the unserialised content it is validating! Otherwise an attacker may close the head with a different UTxO set than the one in the signature... So, without re-implementing the serialization logic in the on-chain validator, the only hope I see is to use the ledger as a middleman for validation.

@ghost commented Dec 22, 2021

Fair enough. That said, I still want to see a proper spike done before considering adopting this as a guiding principle. As an alternative, which I have worked on this morning, we should consider "serialising" the data we are interested in, should it matter. I suspect we might have the problem this is supposed to solve because of current limitations or shortcomings in our implementation, not because there will be a need for it in the long run.

@ch1bo (Collaborator) commented Dec 29, 2021

Can we also document the alternative? That is, requiring a Plutus language builtin for serializing any ToData.

@KtorZ (Collaborator, Author) commented Jan 12, 2022

Closing this as we went with writing on-chain encoders. In the end, this approach would probably have worked if we had needed a single hash of a large data structure, but in practice we need individual hashes for many tx outs, which would be impractical to have all as independent datums.
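For context, a minimal flavour of what such an on-chain encoder can look like; this is a hedged sketch (not the actual Hydra encoder), covering only CBOR major type 0 (unsigned integers, RFC 8949) with the Plutus builtins `consByteString`, `emptyByteString`, and `traceError`:

```haskell
-- Hedged sketch of on-chain CBOR encoding for small unsigned integers.
-- Values < 24 fit directly in the initial byte; values up to 255 use the
-- one-byte argument form (0x18 prefix). Wider encodings are omitted.
encodeUnsigned :: Integer -> BuiltinByteString
encodeUnsigned n
  | n < 24    = consByteString n emptyByteString
  | n < 256   = consByteString 24 (consByteString n emptyByteString)
  | otherwise = traceError "encodeUnsigned: width not implemented"
{-# INLINEABLE encodeUnsigned #-}
```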

@KtorZ KtorZ closed this Jan 12, 2022
@KtorZ KtorZ deleted the KtorZ/ADR-0013 branch January 13, 2022 08:31