
Write down ADR for data serialization within on-chain validator. #147

Closed
wants to merge 1 commit

Conversation

@KtorZ (Collaborator) commented Dec 21, 2021

13. Data Serialization Within On-Chain Validators

Date: 2021-12-21

Status

Proposed

Context

In Hydra, during the Close and Contest transitions, one must verify, within on-chain validators, that a certain piece of data has been multi-signed by all head participants. While verifying a multi-signature produced via MuSig2 (which can be made Schnorr-compatible) is relatively easy and can rely on existing Plutus built-ins, producing the payload / pre-image that was signed is problematic, for there are no Plutus built-ins for data serialization.

Incidentally, even though there exist quite simple and compact (implementation-wise) serialization algorithms (e.g. CBOR), this is a path we do not want to follow, as it would very likely push the validator size far above an acceptable limit.

Hence: how can one obtain arbitrary serialized data within an on-chain validator?

Decision

Overview

In Cardano, transactions may carry information in various ways; in particular, one may provide Plutus data as part of a transaction witness set. Those data are made available to the underlying validator's script context as a (key, value) list, where keys are data hashes and values are the data themselves. It's important to note that the correspondence between a hash and its data is verified by the ledger during phase-1 validation.

We want to leverage this data lookup table to pass arbitrary data and their corresponding hashes to a validator. This effectively means introducing an extra indirection in the redeemer of the Close and Contest transitions. Indeed, instead of passing the full data as the redeemer, we need only give a hash, which can be looked up in the script context to obtain its corresponding data. This can be achieved with the following on-chain function:

import Ledger
import qualified PlutusTx
import qualified PlutusTx.AssocMap as AssocMap

reifyData :: PlutusTx.FromData a => ScriptContext -> DatumHash -> Maybe a
reifyData (ScriptContext info _) h =
  AssocMap.lookup h (AssocMap.fromList (txInfoData info))
    >>= PlutusTx.fromBuiltinData . getDatum
{-# INLINEABLE reifyData #-}
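For illustration, here is a hedged sketch (not part of the PR) of how a validator could use reifyData; `validateClose`, `Snapshot`, and `verifySnapshotSignature` are hypothetical names, not actual Hydra API:

```haskell
-- Hypothetical usage sketch: the redeemer carries only the hash of the
-- multi-signed snapshot; the validator reifies the full data from the
-- script context and then checks the multi-signature.
validateClose :: DatumHash -> ScriptContext -> Bool
validateClose snapshotHash ctx =
  case reifyData ctx snapshotHash of
    Nothing       -> False -- hash absent from txInfoData: reject
    Just snapshot -> verifySnapshotSignature (snapshot :: Snapshot)
{-# INLINEABLE validateClose #-}
```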

Obstacles

There's a little quirk with this approach, unfortunately: the ledger does not allow the presence of extraneous datums in the witness set. In fact, the ledger will fail phase-1 validation with a NonOutputSupplimentaryDatums error if a transaction includes any datum that is neither:

a. required by an input associated with a script address, nor
b. referenced by an output.

Thus, without requiring a hard-fork, we must work around this by including an extra output carrying the required datum hash. In contexts where we control the underlying wallet, we can rather easily add this to a change output already fueling the transaction. Note that this barely changes anything for a vk output; the datum will simply be ignored by the ledger and not be required for spending.
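Off-chain, this amounts to decorating the wallet's change output with the datum hash. A rough sketch using cardano-api types circa the Alonzo era follows; the constructor names (`TxOut`, `TxOutDatumHash`, `ScriptDataInAlonzoEra`) are an assumption based on that era's API and may differ across versions:

```haskell
-- Hedged sketch, not the PR's code: attach the supplementary datum hash
-- to an existing change output, so the ledger does not reject it as a
-- non-output supplementary datum during phase-1 validation.
attachDatumHash :: Hash ScriptData -> TxOut AlonzoEra -> TxOut AlonzoEra
attachDatumHash h (TxOut addr value _) =
  TxOut addr value (TxOutDatumHash ScriptDataInAlonzoEra h)
```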

Consequences

  • We can actually write the Close and Contest validators.

@KtorZ KtorZ requested review from ch1bo and a user December 21, 2021 10:23
@KtorZ KtorZ self-assigned this Dec 21, 2021
@github-actions

Unit Test Results

    4 files  ±0    63 suites  ±0   3m 43s ⏱️ +3s
184 tests ±0  182 ✔️ ±0  2 💤 ±0  0 ±0 

Results for commit 5482529. ± Comparison against base commit c62e0bf.

@kk-hainq (Contributor) commented Dec 22, 2021

I don't know what the requirements of the Close and Contest transactions are, so the question might be stupid... But if the multi-signed data piece isn't duplicated, is representing it as a datum more efficient than a redeemer? Both require the data in the transaction witness set, but finding data in the script context is more costly inside validator scripts. Redeemers also don't require storing unused data (the datum hash of the multi-signed data piece) in the next UTxO set.

@KtorZ (Collaborator, Author) commented Dec 22, 2021

I am not quite sure what you mean by "isn't duplicated"?

The multisigned payloads are produced off-chain, as part of the running head. They are basically what entitles participants to move the contract on-chain. Thus, this is precisely something we cannot store on-chain upfront, of course. Nor is it something we can pass as-is as a redeemer to the script (because of the surrounding information that comes with the multisigned payload).

I would need to double-check that but, as far as I remember, the ledger rules also forbid adding supplementary 'unused' redeemers to the script data. Hence the use of a datum for that.

@kk-hainq (Contributor) commented Dec 22, 2021

I am not quite sure what you mean by "isn't duplicated"?

Sorry for the vague description. By that, I meant redeemers are stored per input (more precisely per pointer as things like minting policies have redeemers too) while datums are stored per unique hash. If the multi-signed data is only used in the validation of one input then the storage in the witness set is roughly the same. Datums are only "lighter" when that same data is used in the validation of multiple inputs. In this context, I guess that the multi-signed data is only used in the validation of its specific head UTxO, hence no storage gains from the datum way.

Nor it is something we can pass as-is for redeemer to the script (because of the surrounding information that come with the multisigned payload).

I'm still confused why you can represent something as a datum but not as a redeemer when building transactions. Aren't they represented in the same Data type at both the ledger and script levels?

I would need to double check that but, as far as I remember, the ledger rules also forbid to add supplementary 'unused' redeemer to the script data. Hence the use of a datum for that.

The redeemer will be used in the validation script for the multi-signature check?

@ch1bo (Collaborator) commented Dec 22, 2021

LGTM. Concerning the obstacle: aren't there also auxiliary datums which are also hashed into the script validity of the body? In any case, we should ask the ledger people and draw the consequences here. That is, in the worst case, requiring changes in the ledger -> additional scope for the Babbage hard-fork.

@ghost left a comment

I am unsure what problem this solves 🤔
Having spent a few hours dealing with reconciling serialisation representations on- and off-chain to validate some output, I have doubts about this approach. It seems to me it adds complexity both on-chain (adding the need to lookup actual data from a hash that's in another txout, so double indirection) and off-chain (adding the need to pack the hash and the datum in some other txout).
On-chain we can always hash; if we cannot serialise, then either we should ask for serialisation to be included as a builtin, or pass an additional serialised representation should we need it.
As for all our other ADRs, I think this should be based on a proper experiment demonstrating the validity and generality of the solution.

@KtorZ (Collaborator, Author) commented Dec 22, 2021

@abailly-iohk keep in mind that this proposal's strong hypothesis is that we cannot / do not want to modify the ledger rules. While this discussion could happen with the ledger team, it would likely push back any testnet or mainnet integration quite a lot. Hence, in the meantime, this proposal offers "a way" to work around the issue.

As for "pass additional serialised representation should we need it", this is precisely what we cannot do 😅! Or more exactly, the on-chain validator must be in a position to verify that the serialised data matches the unserialised content it is validating! Otherwise an attacker may close the head with a different UTxO set than the one in the signature... So, without re-implementing the serialization logic in the on-chain validator, the only hope I see is to use the ledger as a middleman for validation.

@ghost commented Dec 22, 2021

Fair enough. That said, I still want to see a proper spike done before considering adopting this as a guiding principle. As an alternative, which I have worked on this morning, we should consider "serialising" the data we are interested in, should it matter. I suspect we might have the problem this is supposed to solve because of current limitations or shortcomings in our implementation, not because there will be a need for it in the long run.

@ch1bo (Collaborator) commented Dec 29, 2021

Can we also document the alternative? That is, requiring a Plutus language builtin for serializing any ToData.

@KtorZ (Collaborator, Author) commented Jan 12, 2022

Closing this as we went with writing on-chain encoders. In the end, this approach would probably have worked if we had needed a single hash of a large data structure, but in practice we need individual hashes for many tx outs, which would be impractical to have all as independent datums.
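For context, a minimal flavour of what such an on-chain encoder can look like; this is a hedged sketch (not the actual Hydra encoder), covering only CBOR major type 0 (unsigned integers, RFC 8949) with the Plutus builtins `consByteString`, `emptyByteString`, and `traceError`:

```haskell
-- Hedged sketch of on-chain CBOR encoding for small unsigned integers.
-- Values < 24 fit directly in the initial byte; values up to 255 use the
-- one-byte argument form (0x18 prefix). Wider encodings are omitted.
encodeUnsigned :: Integer -> BuiltinByteString
encodeUnsigned n
  | n < 24    = consByteString n emptyByteString
  | n < 256   = consByteString 24 (consByteString n emptyByteString)
  | otherwise = traceError "encodeUnsigned: width not implemented"
{-# INLINEABLE encodeUnsigned #-}
```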

@KtorZ KtorZ closed this Jan 12, 2022
@KtorZ KtorZ deleted the KtorZ/ADR-0013 branch January 13, 2022 08:31