Logbook 2021 H2

December 2021

2021-12-30

Pairing session (SN)

Add a mutator to closeTx which changes the snapshot number without changing the signature -> tests fail sometimes.
We added labels to locate which mutation failed -> red bin: do this properly, it feels hacked and in a way we should have a sum type enumerating close-specific mutations
Correctly discard the healthy case when mutating the snapshot number -> tests pass always.
Add a third close mutator to improve the implementation: changing both the signature and the snapshot number to a valid but unexpected value
- First we did this using the closeRedeemer smart constructor, thus always getting well-formed snapshot numbers
- Later we deliberately did not use closeRedeemer, to test using the "on-chain type", that is (still) a ByteString
- For example, mutating this to something not resembling a serialized integer (but still a valid signature)
Then we changed the snapshot number back to an Integer, as we will want to check > 0 later
When snapshotNumber is an Integer, we need to serialize and hash on-chain to verify a signature
- We implemented a basic Integer-to-CBOR-encoder or rather Natural-to-CBOR as it will error on negative numbers
- The fact that negative Integers error made our tests pass, as ill-formed values (< 0) were not valid despite correct signatures
Discussion on what we eventually will need on-chain with the realization that ultimately we only will ever need to serialize/hash Integers and Hashes for closeTx, but TxOut for fanoutTx
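
A minimal off-chain sketch of such a Natural-to-CBOR encoder, in plain Haskell rather than Plutus. The names `encodeUnsigned` and `bigEndian` are made up for illustration and are not the identifiers used in the code base; an on-chain version could not rely on `Data.Bits` and would have to work with quotient and remainder instead.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Word (Word8)

-- | Big-endian bytes of a non-negative integer, padded to exactly @n@ bytes.
bigEndian :: Int -> Integer -> [Word8]
bigEndian n x =
  [fromIntegral ((x `shiftR` (8 * i)) .&. 0xff) | i <- [n - 1, n - 2 .. 0]]

-- | Encode a non-negative Integer as a CBOR unsigned integer (major type 0).
-- Errors on negative input, mirroring the behaviour described above.
encodeUnsigned :: Integer -> [Word8]
encodeUnsigned n
  | n < 0 = error "encodeUnsigned: negative number"
  | n < 24 = [fromIntegral n] -- value fits in the initial byte
  | n < 2 ^ (8 :: Int) = 0x18 : bigEndian 1 n
  | n < 2 ^ (16 :: Int) = 0x19 : bigEndian 2 n
  | n < 2 ^ (32 :: Int) = 0x1a : bigEndian 4 n
  | n < 2 ^ (64 :: Int) = 0x1b : bigEndian 8 n
  | otherwise = error "encodeUnsigned: bignums are not covered by this sketch"
```

For example, `encodeUnsigned 1000` yields the bytes `19 03 e8`, while any negative input raises an error instead of producing a signable value.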

Pairing session (MB)

We continued working on the close tx validator, starting with a first observation: there was a discrepancy between the on-chain and off-chain representations of a snapshot number (bytestring vs natural). Since any positive number is actually a potentially valid snapshot number, we created a mutation to generate negative numbers as snapshot numbers. As a consequence and to cope with the now failing test, we changed the on-chain representation to an Integer (there's no Natural available in Plutus!) and we wrote a basic CBOR encoder for unsigned integers (to match the off-chain signable representation).
From there, we discussed whether this approach could indeed be used in the long run. Writing a CBOR encoder for unsigned integers is quite trivial and does not require much code. While this is mostly sufficient for the close tx (which only requires serializing the snapshot number), it isn't for the fanout. In its simplest form (i.e. no split, full UTXO fits in the transaction), it is necessary for the validator to verify that the output UTXO does indeed match whatever UTXO's hash was specified during the close and stored in the state-machine datum. This could potentially be addressed by #147. However, when we consider the realistic form of the fanout, which will likely require splitting the UTXO into sub-UTXOs, we will need to:
- (a) Split the UTXO into structured subsets;
- (b) Prove inclusion of a subset into the bigger set.
In the paper, this is achieved via Merkle-Patricia-Trees, but as discussed previously, in the coordinated form it can also be achieved with much simpler Merkle-Trees. This still means that we will need, eventually, to construct a hierarchical structure of hashes, where signable representations are Merkle nodes whose leaves are transaction outputs. Unfortunately, we can't really use the trick described in #147 in this case because it would require one extra datum PER TXOUT. Thus, without added builtins in Plutus, we are left with no choice but to find some on-chain signable representation of TxOut. Could this be CBOR in the same way we approached the signable representation of snapshot numbers? Maybe. We would need to write a CBOR encoder for:
- non-negative integers (CBOR major type 0)
- negative integers (CBOR major type 1)
- bytestrings (CBOR major type 2)
- (finite) lists (CBOR major type 4)
- (finite) dictionaries (CBOR major type 5)
Note that this can be quite straightforward (e.g. https://github.com/elm-toulouse/cbor/blob/master/src/Cbor/Encode.elm) so it may not be a bad idea, possibly as an independent "Plutus library". We should also probably start a discussion with the ledger and Plutus teams about including CBOR serialization of builtin types as a builtin.
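
To give an idea of the size of such an encoder, here is a rough sketch in plain Haskell covering only the five cases listed above. The `Term` type and the function names are hypothetical, only definite-length encodings are produced, and nothing here addresses canonical ordering of dictionary keys or on-chain cost.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word8)

-- | Hypothetical term type covering only the cases listed above.
data Term
  = TInt Integer
  | TBytes [Word8]
  | TList [Term]
  | TMap [(Term, Term)]

-- | Initial byte(s): the major type in the top 3 bits, followed by the argument.
encodeHead :: Word8 -> Integer -> [Word8]
encodeHead major n
  | n < 24 = [tag .|. fromIntegral n]
  | n < 2 ^ (8 :: Int) = (tag .|. 24) : be 1
  | n < 2 ^ (16 :: Int) = (tag .|. 25) : be 2
  | n < 2 ^ (32 :: Int) = (tag .|. 26) : be 4
  | n < 2 ^ (64 :: Int) = (tag .|. 27) : be 8
  | otherwise = error "encodeHead: argument does not fit in 64 bits"
 where
  tag = major `shiftL` 5
  -- Big-endian representation of the argument on exactly k bytes.
  be :: Int -> [Word8]
  be k = [fromIntegral ((n `shiftR` (8 * i)) .&. 0xff) | i <- [k - 1, k - 2 .. 0]]

encode :: Term -> [Word8]
encode term = case term of
  TInt n
    | n >= 0 -> encodeHead 0 n
    | otherwise -> encodeHead 1 (negate n - 1) -- CBOR stores -1 - n
  TBytes bs -> encodeHead 2 (fromIntegral (length bs)) ++ bs
  TList xs -> encodeHead 4 (fromIntegral (length xs)) ++ concatMap encode xs
  TMap kvs ->
    encodeHead 5 (fromIntegral (length kvs))
      ++ concatMap (\(k, v) -> encode k ++ encode v) kvs
```

All five cases share the same header logic, which is why the whole encoder stays this small; the open question remains how much of it fits in an on-chain execution budget.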

2021-12-29

Pairing session

Discussed the ADR13 candidate:
- There was an "optional datum" field in Alonzo before, not in the final spec though.
- If we can get the ledger team to drop https://github.com/input-output-hk/cardano-ledger/blob/70cfbf9be79533a6d1b2ff446567f5b78bf945aa/eras/alonzo/impl/src/Cardano/Ledger/Alonzo/Rules/Utxow.hs#L290-L301, this approach would be less hacky.
- We should write down the alternative: Adding a serialization (+ hashing) builtin to plutus.
Reviewed open PRs and what had been merged to master
- realized that the mock implementation is actually wrong: nothing checks whether the included hash (which is verified) is indeed corresponding to some specific snapshot number / pre-image, i.e. we could change the "content" + the signature to another valid value
Goal for the week:
- complete the mock implementation using an on-chain encoder for the snapshot number (it's only an integer)
- implement the ADR13 method for close on a branch to check feasibility

2021-12-23

AB Solo Programming

Plan for today:

- merge pending PRs
- error handling in Head Logic
- architecture writeup
- Mithril

Rebased https://github.com/input-output-hk/hydra-poc/pull/144 on master after merging branch expanding MockHead contracts

Writing NodeSpec test to check we properly notify clients when an exception is raised in the Chain component.

Added a new PostTxOnChainFailed message to the server output
Introduce combinators to capture server output and mock exceptions in the Chain component
Not sure this is the right thing to do however, seems like we are somewhat tightly coupling node and implementation of other components?

Also, perhaps "let-it-crash" strategy would be better for the Hydra node?

Need to adapt YAML specifications to add new message and move shared PostChainTx and InvalidTxError to common.yaml.

Got an "interesting" error in the Log API tests:

  1) Hydra.Logging HydraLog
       Assertion failed (after 1 test and 9 shrinks):
         [Envelope {namespace = "", timestamp = 1864-05-09 09:18:20.203694887175 UTC, threadId = 0, message = DirectChain {directChain = PostingTxFailed {toPost = AbortTx {utxo = fromList []}, reason = CannotSpendInput {input = "", walletUtxo = fromList [], headUtxo = fromList []}}}}]
         Traceback (most recent call last):
           File "/nix/store/ypidjcrcsxpnrqm4ivxf8pg475m0axqd-python3.8-jsonschema-3.2.0/lib/python3.8/site-packages/jsonschema/validators.py", line 811, in resolve_fragment
             document = document[part]
         KeyError: 'PostChainTx'

         During handling of the above exception, another exception occurred:

         Traceback (most recent call last):
           File "/nix/store/ypidjcrcsxpnrqm4ivxf8pg475m0axqd-python3.8-jsonschema-3.2.0/bin/.jsonschema-wrapped", line 9, in <module>
             sys.exit(main())
           File "/nix/store/ypidjcrcsxpnrqm4ivxf8pg475m0axqd-python3.8-jsonschema-3.2.0/lib/python3.8/site-packages/jsonschema/cli.py", line 76, in main
             sys.exit(run(arguments=parse_args(args=args)))
           File "/nix/store/ypidjcrcsxpnrqm4ivxf8pg475m0axqd-python3.8-jsonschema-3.2.0/lib/python3.8/site-packages/jsonschema/cli.py", line 87, in run
             for error in validator.iter_errors(instance):

Errors are now properly reported to clients and the TUI. There are still errors which make the node crash, esp. the ones related to failure to validate the tx and validators.

These should be also reported as InvalidTxError but this is left for next year

2021-12-22

AB Solo Programming

Plans for today:

- Spike implementation of matching mock crypto so that we can verify signatures in MockHead
- PR about reporting failures to submit transaction to end-user, catching exceptions and sending messages

Signature property check fails immediately => algorithm for computing encoding is basically wrong...

I had inverted quotient and remainder in my transformation from integer to bytes on-chain 🤦
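
For reference, the intended transformation looks roughly like the following plain Haskell sketch; `integerToBytes` is an illustrative name, not the on-chain function. The remainder is the next (least significant) byte and the quotient is what is left to encode, so swapping the two scrambles every value of 256 or more.

```haskell
import Data.Word (Word8)

-- | Big-endian bytes of a non-negative integer, using only division by 256.
integerToBytes :: Integer -> [Word8]
integerToBytes n
  | n < 0 = error "integerToBytes: negative number"
  | otherwise = go n []
 where
  go x acc
    | x < 256 = fromIntegral x : acc -- most significant byte, emitted last
    | otherwise =
        let (q, r) = x `quotRem` 256
         in go q (fromIntegral r : acc) -- r is the byte, q is the rest
```

A quick sanity check is `integerToBytes 258`, which should give the two bytes 1 and 2.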

The error we get with a failed property is not very helpful as it does not show anything about datums or redeemers; need to enhance the output to show those?

Adding redeemers display and datums to the describeCardanoTx function

Still have failure: There's probably a difference in the representation of snapshot (number) on and off chain that explains the failing signatures verification

In the SignableRepresentation of Snapshot we used show but in constructing the datum in closeTx we use serialise'

Still having failure: There is a one character difference between what show displays for off-chain Signed data and what it displays from Datums:

MultiSigned {multiSignature = [UnsafeSigned "K\245\DC2/4ET\197\NUL\NUL\NUL\NUL\NUL\NUL\NUL\SOH",UnsafeSigned "K\245\DC2/4ET\197\NUL\NUL\NUL\NUL\NUL\NUL\NUL\STX",UnsafeSigned "K\245\DC2/4ET\197\NUL\NUL\NUL\NUL\NUL\NUL\NUL\ETX"]}

vs

DataConstr Constr 1 [B "\SOH",List [B "PK\245\DC2/4ET\197\NUL\NUL\NUL\NUL\NUL\NUL\NUL\SOH",B "PK\245\DC2/4ET\197\NUL\NUL\NUL\NUL\NUL\NUL\NUL\STX",B "PK\245\DC2/4ET\197\NUL\NUL\NUL\NUL\NUL\NUL\NUL\ETX"]]

That must come from the ToCBOR instance, which certainly adds some prefix when we invoke serialize': the CBOR encoding of a bytestring prepends one or more bytes identifying the type and length of the bytes.
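
The extra leading "P" is consistent with this: the signatures shown are 16 bytes long, and the CBOR header of a 16-byte bytestring is major type 2 (0x40) plus the length 16, i.e. 0x50, which is ASCII 'P'. A tiny sketch to double-check the arithmetic (the helper name is made up):

```haskell
import Data.Char (chr)

-- | CBOR header byte of a definite-length bytestring shorter than 24 bytes:
-- major type 2 in the top three bits (0x40) plus the length.
shortBytesHeader :: Int -> Char
shortBytesHeader len
  | len >= 0 && len < 24 = chr (0x40 + len)
  | otherwise = error "shortBytesHeader: length needs a multi-byte header"

-- | First signature from the off-chain output above.
signature :: String
signature = "K\245\DC2/4ET\197\NUL\NUL\NUL\NUL\NUL\NUL\NUL\SOH"

-- | The same signature with its CBOR header prepended, as seen in the datum.
withHeader :: String
withHeader = shortBytesHeader (length signature) : signature
```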

Using CBOR encoding to pass bytes on- and off-chain is problematic because we don't have CBOR parsing capabilities on-chain so we must ensure whatever bytes manipulation we do works on compatible representations.

Trying to use directly the bytes from the underlying signature => Unit tests pass

Got some tests still failing with new closeTx validator:

- DirectChainSpec is failing but this is expected as we don't pass any signature in the Snapshot we post :)
- TxSpec is also failing to observe closeTx
- ETESpec and TUISpec

Fixing first DirectChainSpec was easy enough, just needed to sign the snapshot.

There's a minor snag in that we pass the Party to withDirectChain which means we need to draw the signing key from somewhere else. Perhaps it would make sense to pass the SigningKey to withDirectChain?

The problem with observing closeTx now is that we expect to decode an integer as SnapshotNumber but of course we get a bytestring! https://github.com/input-output-hk/hydra-poc/blob/ensemble/more-contract-testing/hydra-node/src/Hydra/Chain/Direct/Tx.hs#L592

Spent an hour troubleshooting the TxSpec test which was not passing because we passed a SNothing datum. The output of the test failure is particularly cryptic and does not provide many clues about what's going on.

I now only have the ETE test failing on validating the signatures, not sure why however 🤔 The cool thing is that we have a proper failure being reported in the test.

Having a look at the node's logs: The CloseTx transaction is properly posted on-chain by node 1 AFAICT, but the message is actually misleading: What happens is that the transaction gets constructed properly but the submission fails and crashes the node => Adding some more detailed log messages

Seems like not all parties have signed the snapshot, here is the plutus reported error:

The data is: Constr 1 [List [I 10,I 20,I 30]]
The redeemer is: Constr 1 [B "\SOH",List [B "K\245\DC2/4ET\197\NUL\NUL\NUL\NUL\NUL\NUL\NUL\n",B "K\245\DC2/4ET\197\NUL\NUL\NUL\NUL\NUL\NUL\NUL\RS"]]

We have 3 parties but only 2 signatures, which is fishy! The error might come from the HeadLogic: the confirmed snapshot contains only part of the signatures we received! How come we did not catch this bug earlier 😮?

Checking the ETE test passes with the "correct" signatures set then will write a proper test for that. After the change we get the proper number of signatures:

The data is: Constr 1 [List [I 10,I 20,I 30]]
The redeemer is: Constr 1 [B "\SOH",List [B "K\245\DC2/4ET\197\NUL\NUL\NUL\NUL\NUL\NUL\NUL\DC4",B "K\245\DC2/4ET\197\NUL\NUL\NUL\NUL\NUL\NUL\NUL\n",B "K\245\DC2/4ET\197\NUL\NUL\NUL\NUL\NUL\NUL\NUL\RS"]]

but still have a failure to validate signatures.

This comes from the fact that the signatures are not in the same order as the parties, which is an important assumption made in the mock head code.

Possible solutions: Pass a map of parties to signatures, check the signatures differently (eg. shuffling lists), or make sure the 2 lists are in the same order

Adding the signatures to the SnapshotConfirmed message so that we can observe it and have a proper unit test

Decided not to implement complex ordering logic checking within MockHead but rather to order the multisignatures in the HeadLogic, where they are produced, to match the parties' ordering.
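
A minimal sketch of that ordering step, with `Party` and `Signature` as hypothetical stand-ins for the real types and `orderByParties` as an illustrative name:

```haskell
-- | Hypothetical stand-ins for the real Party and signature types.
type Party = Int

type Signature = String

-- | Signatures in the order of the given parties, or Nothing if some party
-- has not signed.
orderByParties :: [Party] -> [(Party, Signature)] -> Maybe [Signature]
orderByParties parties received = traverse (\p -> lookup p received) parties
```

With parties `[10, 20, 30]` and signatures received in the order 20, 10, 30, the result is ordered 10, 20, 30. Returning `Nothing` when a signature is missing would also have flagged the earlier case of only 2 signatures for 3 parties.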

Ideally I would have liked to do that in the aggregate function but this one works on already signed ByteStrings so the Party information is buried.
Also noticed that the APISpec tests did not fail in spite of me adding a field to a ServerOutputs constructor: I would expect the JSON specification validation code to have caught this but it did not.

TUISpec test cannot pass in the current implementation of validators because it does not produce any snapshot, hence the Close transaction fails to pass validation. This case is explicitly handled in the paper (e.g. when the snapshot number is 0), hence we should deal with it in our ContractSpec tests.

Made all tests green but for one TUISpec test which I put in pending because we don't have the ability to make it pass right now: The TUI full lifecycle test does not produce any snapshot hence we close with snapshot 0 and no signatures and the MockHead validator does not cover this.

2021-12-16

Ensemble Session

Trying to generalise mutations, to produce more redeemers then introducing different and more significant ones.

I am "surprised" by the fact one cannot automatically derive instances for Plutus datatypes:

src/Hydra/Contract/MockHead.hs:42:13: error: [-Wmissing-methods, -Werror=missing-methods]
    • No explicit implementation for
        ‘==’
    • In the instance declaration for ‘Eq Input’
   |
42 |   deriving (Eq, Generic, Show)

Also, the Eq instance does not even seem to be visible to test code. This is probably an artifact of the Plutus plugin compiler and transformation?


test/Hydra/Chain/Direct/ContractSpec.hs:166:15: error:
    • No instance for (Arbitrary Plutus.V1.Ledger.Value.Value)
        arising from a use of ‘genericArbitrary’
      There are instances for similar types:
        instance cardano-ledger-shelley-test-0.1.0.0:Test.Cardano.Ledger.Shelley.ConcreteCryptoTypes.Mock
                   c =>
                 Arbitrary (Cardano.Ledger.Mary.Value.Value c)
          -- Defined in ‘Test.Cardano.Ledger.ShelleyMA.Serialisation.Generators’
    • In the expression: genericArbitrary
      In an equation for ‘arbitrary’: arbitrary = genericArbitrary
      In the instance declaration for ‘Arbitrary MockHead.Input’
    |
166 |   arbitrary = genericArbitrary

Perhaps they are available in some module we do not import?

Looks like there are some in the PAB code. As we don't depend on plutus-app anymore, we'll need to rewrite them or vendor this file, like we did for SM code.
Created a Plutus.Orphans module vendoring stuff from PAB.

I have a test failure with the generator when the redeemer is a Close with a different snapshot number, which is an expected error I would say? Anyhow, the close redeemer should not be only a snapshot number but a whole signed snapshot so this paves the way to do the needed changes.

Trying to remove non-documented symbols from haddock document generation, seems like this is only available as a module-level prune attribute, which sucks...

Some discussion around the "mutation testing" approach:

- There is nothing there forcing us into implementing correctly the output datum, so that a close tx could just as well produce an invalid output datum which won't be consumable by the fanout tx
- However we already have tests in place for the whole "happy path" so if a close does produce some invalid datum, this will be caught by the fanout, or later when we check the produced UTXO
- By thoroughly testing each validator, we ensure each link of the chain is correct but we need test(s) to ensure the whole chain is correct
- Validators are always checked and implemented as purely local functions so it makes sense to test them locally

2021-12-15

Ensemble Session

Discussing what to do next, and which contracts to implement. Seems like Close is the best candidate because it's one we have barely touched so far.

What about the contestationPeriod? Discussing the passing of time in the Head logic: it should probably be reported by the underlying ChainComponent as ticks, instead of putting the contestation computation logic in the on-chain component. 2 different things:

- Has enough time passed?
- Is the fanout posting the right UTXO?

Discussing testing strategy for contracts:

Mutation approach: Generate valid transactions, then mutate them to render them invalid and make sure the validator fails
Constraints-based approach: Start with an empty transaction then add more "constraints" representing how we expect the transaction to be
- requires to start with a Const False validator to make sure we have a failing test
We need some higher level of testing involving sequence of transactions/validators to express properties: for example,
- It should always be possible to abort if the head is not open
We also want to have rock-solid contracts so we need to test all kind of non-happy paths

We decide to give the "mutation" approach on Close validator a go. The idea is the following:

We start from an arbitrary supposedly valid Tx (and relevant UTXO) of the required type, in this case genCloseTx
We start from a Const True validator, i.e. one that validates any transaction
We indeed verify this transaction passes stage 2 validation
- We don't care about stage 1 as it's supposed to be validated before that, which kind of implies we need to generate structurally valid transactions
We then generate various relevant mutations to this valid transaction that are supposed to make it invalid
- We started with a simple one, namely replacing the redeemer with Abort
- Various possibilities include:
  - Modifying the value of an input or an output,
  - modifying the datum of an input (note that we modify the hash in the provided UTXO because if the hash is not compatible the transaction is structurally invalid)
  - Removing some output or some input
By adding more and more mutations, the goal is to "triangulate" the validators, to make them more and more precise and verify more and more conditions
While constructing the mutations, we let emerge some kind of DSL to construct transactions that we use in Tx module to simplify it
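
The shape of this approach can be sketched with toy types. Everything below (`Tx`, `Redeemer`, `Mutation`, the toy `validator`) is a hypothetical simplification for illustration, not the types or generators used in ContractSpec, and the real tests draw transactions and mutations from generators rather than a fixed list.

```haskell
-- | Hypothetical, heavily simplified stand-ins for the real transaction types.
data Redeemer = Close Integer | Abort
  deriving (Eq, Show)

data Tx = Tx {redeemer :: Redeemer, outputValue :: Integer}
  deriving (Eq, Show)

-- | A labelled mutation turning a valid transaction into one that is
-- expected to be rejected.
data Mutation = Mutation {label :: String, apply :: Tx -> Tx}

-- | Toy validator for a close transaction.
validator :: Tx -> Bool
validator tx = case redeemer tx of
  Close n -> n > 0 && outputValue tx == 100
  Abort -> False

-- | The "healthy" transaction all mutations start from.
healthyCloseTx :: Tx
healthyCloseTx = Tx {redeemer = Close 1, outputValue = 100}

mutations :: [Mutation]
mutations =
  [ Mutation "replace redeemer with Abort" (\tx -> tx {redeemer = Abort})
  , Mutation "change output value" (\tx -> tx {outputValue = outputValue tx + 1})
  , Mutation "negative snapshot number" (\tx -> tx {redeemer = Close (-1)})
  ]

-- | Labels of the mutations a validator fails to reject; an empty list means
-- every mutation was caught.
survivors :: (Tx -> Bool) -> [String]
survivors v = [label m | m <- mutations, v (apply m healthyCloseTx)]
```

Starting with `survivors (const True)` lists every mutation as surviving; each check added to the validator should remove labels from that list until it is empty, which is the "triangulation" described above.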

2021-12-14

Research Meeting

Merkle Patricia Tree

Raised the question of MPT again with researchers based on mostly one observation / concern: Hydra's specification for MPTs is different from Ethereum's, and requires the prefix for each node to be part of the node's hash. Implementation-wise, this introduces quite significant complexity since it requires hashes to be constructed one-by-one, as an onion layer, for each digit of the prefix -- without which it is not feasible to add / remove an element with only a proof and a root hash. This means that in the case of Cardano output references, an MPT path is at minimum 32 hashes! We are afraid that this would make the computational budget go through the roof.

Thus two legitimate questions:
1. Why include the prefix / path as part of the hash? What would be the security consequences / hypotheses for omitting the prefix / path from the hash structure?
2. Do we actually need to add / remove elements from an MPT at all in a coordinated Head context?
Researchers are investigating (1) and comparing with Ethereum's implementation, looking at the trade-offs in both solutions. For (2), the answer is almost in the question. Adding and removing elements to and from an MPT is required by the OCV code for the close transition in the presence of dangling transactions (basically, the OCV is re-applying those transactions on top of the signed snapshot and checks that the resulting UTXO matches with the MPT's root). In a coordinated context, there's none. Which means that MPT are only truly needed for fanout splitting.
As a consequence of (2) above, we may also wonder why an MPT is even needed at all. Since the output references are meaningless on the layer 1, all that is really necessary during fanout is to check that resulting UTxO (i.e. addresses, value and datums) do indeed correspond to what's been agreed by the participants. As such, a "simple" Merkle Tree (perhaps enhanced with the accumulated size to facilitate splitting) would be sufficient in order to create and verify the split transactions.
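
To make the "simple" Merkle Tree option concrete, here is a rough sketch of building a root and checking an inclusion proof. It is parameterised over the hash function; `toyHash` is NOT a cryptographic hash and only exists so the sketch runs without dependencies. All names are illustrative, and the sketch ignores both the accumulated-size enhancement and any domain separation between leaves and inner nodes.

```haskell
import Data.Word (Word8)

type Hash = [Word8]

-- | One proof step: the sibling hash and the side it sits on, from leaf to root.
data Step = SiblingLeft Hash | SiblingRight Hash

-- | Root of a Merkle tree over the given leaves; an odd node at the end of a
-- level is promoted unchanged to the next level.
merkleRoot :: ([Word8] -> Hash) -> [[Word8]] -> Hash
merkleRoot h leaves = go (map h leaves)
 where
  go [] = h []
  go [x] = x
  go xs = go (pairUp xs)
  pairUp (a : b : rest) = h (a ++ b) : pairUp rest
  pairUp rest = rest

-- | Recompute the root from a leaf and its proof, then compare.
verifyInclusion :: ([Word8] -> Hash) -> Hash -> [Word8] -> [Step] -> Bool
verifyInclusion h root leaf proof = foldl step (h leaf) proof == root
 where
  step acc (SiblingLeft s) = h (s ++ acc)
  step acc (SiblingRight s) = h (acc ++ s)

-- | Placeholder, non-cryptographic hash (assumption for this sketch only).
toyHash :: [Word8] -> Hash
toyHash bs = [foldl (\acc b -> acc * 31 + b) 7 bs]
```

A fanout validator following this idea would only need the root stored at close time plus, for each split transaction, the outputs and their proofs; whether that fits the on-chain budget is exactly what remains to be checked.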

Adding / Removing Head participants

We discussed the possibility of adding and removing participants to and from an already established head. This would allow a head to on-board new participants, for either a short period of time, or until the end. While there's no apparent issue or concern with this (it was discussed during the writing of the paper, but omitted to avoid bloating the paper), there hasn't been any explicit use-case made for it. One possible idea would be to on-board new validating participants and make the head a bit more of an "open" network (so long as participants agree to onboard someone...).

AB Solo Programming

Trying to remove the annoying messages that are printed when the thread controlling the TUI exits because it's blocked on an STM operation.

Seems like it's thrown in the MonadSTM but the exception type is not exported which means there's no way to catch it.
I presume this is intentional, in order to remove the temptation users could have to tamper with those exceptions but in our case this is pretty annoying. Perhaps I could wrap it in a silently call but then how about interaction with stdin/stdout?
It's possible to add a custom event that would be handled in the main event loop as an AppEvent that will invoke the halt function to stop the TUI. However, it's not clear how to inject that custom event into the channel that distributes them as it's currently private, so there would be a need for some kind of control side channel that would stop the TUI inner loop, but this is too significant a change to be dealt with right now.

In CI Build https://github.com/input-output-hk/hydra-poc/runs/4516685770?check_suite_focus=true there is something odd: it reports failed tests but those are nowhere to be seen.

The fact hydra-tui is not built along with hydra-node in the docker-compose is annoying; also there aren't any build instructions so it's not possible to build them in compose.

Seems like demo instructions assume user pulls images and does not build them locally, going to add instructions on how to build them locally.

There is a need for another level of ETE test, one that would check the docker images are properly working. We could test through the TUI, using existing infrastructure but running the nodes as containers and the TUI in-process with several instances interacting with a cluster.

Trying to simplify TxSpec tests and see if I can extract common features to reuse in testing contracts.

My idea about contracts would be to provide a way to build transactions and UTXO then apply the tx against given UTXO using underlying ledger-specs infra as provided by Hydra.Ledger.Cardano.
To test contracts we could do something similar to the constraints eDSL in Plutus: Start from a blank transaction then generate a new transaction applying a sequence of arbitrary constraints to generate a tx that would or would not pass the validator. Then trying to validate the transaction. Problem is the oracle: How do we know the property holds? Perhaps what we could do is to have the generator express what a valid init/commit/... transaction is? Like:
```
prop "check valid commit" $ \ (ACommitTx tx utxo) ->
  isRight $ runIdentity $ evaluateTransactionExecutionUnits pparams tx utxo epochInfo systemStart costmodels
```

2021-12-13

Pair Programming

Goal: Make the demo work.

We have a flawed logic in the observation of commits: We remove the inputs from initials that have not been observed in the commit which obviously leads to the inability to commit after having observed another commit tx.

To test the observeCommit tx and modify the Onchain state, we need to generate a list of initials (TxIn, PKH). But we also need to populate the list of initials :: [(TxIn, TxOut, Data)]
The reason why we have both TxOut and Data is that we are using ledger specs where TxOut only contains datum hash.

Going to fix the TUI's commits. The problem is that we cannot build transactions when the head is open as the UTXO committed are now using full cardano tx hence we need to identify them according to our own addresses. We also need matching signing key to be able to sign the tx

While refactoring the TUI we are bitten by the problem that Party only contains the multisig key and not the cardano key, which makes it impossible to use it to identify addresses to use in the TUI

Workaround is to infer the list of addresses to send money to in the TUI from the list of existing UTXO

Some little bugs remaining in the TUI:

- List of UTXO displays duplicate addresses which messes up with navigation
- When user has no UTXO to send, she can still go to the recipient list but this crashes afterwards => Won't fix for now

We were able to complete the journey through the TUI, observing the fanout transaction in the cardano-node 🎉 :

df44aeb02bb740c86b3745b604bf3eccbb77f13d3493532df8fda38eea95de3a     0        2000000 lovelace + TxOutDatumHash ScriptDataInAlonzoEra "67d8ed01e13f33438ea9059ac9be2e159f943cffe054283485e0300271e3e9f9"
df44aeb02bb740c86b3745b604bf3eccbb77f13d3493532df8fda38eea95de3a     1        100000000 lovelace + TxOutDatumNone
df44aeb02bb740c86b3745b604bf3eccbb77f13d3493532df8fda38eea95de3a     2        100000000 lovelace + TxOutDatumNone
df44aeb02bb740c86b3745b604bf3eccbb77f13d3493532df8fda38eea95de3a     3        10000000 lovelace + TxOutDatumNone
df44aeb02bb740c86b3745b604bf3eccbb77f13d3493532df8fda38eea95de3a     4        90000000 lovelace + TxOutDatumNone

Also, we were able to open another Head after having closed the first one, and have one party not committing anything which is fine 🍾 .

A problem with our current scheme is that a party which commits nothing or which has consumed all its UTXO won't be listed in the recipients list.

2021-12-10

Pair Programming

[1 of 7] Compiling CardanoClusterFixture ( src/CardanoClusterFixture.hs, dist/build/CardanoClusterFixture.o, dist/build/CardanoClusterFixture.dyn_o )

src/CardanoClusterFixture.hs:14:15: error:
    • Exception when trying to run compile-time code:
        /tmp/nix-build-local-cluster-lib-local-cluster-0.1.0.drv-0/hydra-poc-root-local-cluster-lib-local-cluster-root/local-cluster/config: getDirectoryContents:openDirStream: does not exist (No such file or directory)
      Code: makeRelativeToProject "config" >>= embedDir
    • In the untyped splice:
        $(makeRelativeToProject "config" >>= embedDir)
   |
14 | configFiles = $(makeRelativeToProject "config" >>= embedDir)

Seems like file-embed does not work correctly inside nix build :sad:

Using cabal API to package and extract extra data files across packages works just fine, no need to use file-embed module: https://cabal.readthedocs.io/en/3.4/cabal-package.html#accessing-data-files-from-package-code

Still failing to build inside nix:

Setup: filepath wildcard 'config/*.json' refers to the directory 'config',
which does not exist or is not a directory.

=> We probably want to list all files explicitly
We need to regenerate the materialisation when changing a data file (or any content) of a local package

Docker compose build and run is working fine now, needed to:

- Update permissions when running prepare-devnet.sh so that files have 0400 perms
- rebuild hydra-tui properly

Cool thing is that running hydra-tui works fine from the docker-compose using just

docker-compose --profile tui run hydra-tui-alice

Injecting UTXO(s) into demo cardano-node so that TUI user can post transaction and commit

Trying to simplify key juggling code between crypto keys and cardano-api keys

I am hitting a small snag with the hashKey function which is used by the Tx module to pack in the initial datum, trying to find a hashing function that works with API types

Got a first working version of an exe injecting seed payment for one address, but got a submission error when trying to run it, might be an issue with version of nodes

Rebuilding docker containers...

Program can inject a single UTXO + seed payment to be used in the network:

$ cabal run seed-network -- --cardano-node-socket demo/devnet/ipc/node.socket --cardano-signing-key demo/devnet/credentials/alice.sk
Querying node for Protocol Parameters at demo/devnet/ipc/node.socket
Posting seed payment transaction at demo/devnet/ipc/node.socket, amount: Lovelace 100000000, key: demo/devnet/credentials/alice.sk
UTXO for address ShelleyAddress Testnet (KeyHashObj (KeyHash "f8a68cd18e59a6ace848155a0e967af64f4d00cf8acee8adc95a6b0d")) StakeRefNull
{
    "223de11cbda4126bae963c1d653e7c4711554011bcd807ec3eea8bf958199fa7#0": {
        "address": "addr_test1vru2drx33ev6dt8gfq245r5k0tmy7ngqe79va69de9dxkrg09c7d3",
        "value": {
            "lovelace": 100000000
        }
    },
    "223de11cbda4126bae963c1d653e7c4711554011bcd807ec3eea8bf958199fa7#1": {
        "address": "addr_test1vru2drx33ev6dt8gfq245r5k0tmy7ngqe79va69de9dxkrg09c7d3",
        "datumhash": "a654fb60d21c1fed48db2c320aa6df9737ec0204c0ba53b9b94a09fb40e757f3",
        "value": {
            "lovelace": 899899828691
        }
    }
}

2021-12-09

AB Solo Programming

Fixed generators for WalletSpec so that we have 100% Right coverage. Trying to get code coverage information to understand what we are testing really

Somewhat correct invocation for code coverage with hpc, generating HTML files but failing to generate an index:

hpc markup \
 '--destdir=/home/curry/hydra-poc/dist-newstyle/build/x86_64-linux/ghc-8.10.7/hydra-node-0.1.0.0/t/hydra-node/hpc/vanilla/html/hydra-node' \
 --hpcdir=/home/curry/hydra-poc/dist-newstyle/build/x86_64-linux/ghc-8.10.7/hydra-node-0.1.0.0/t/hydra-node/hpc/vanilla/mix/hydra-node  \
 --hpcdir=/home/curry/hydra-poc/dist-newstyle/build/x86_64-linux/ghc-8.10.7/hydra-node-0.1.0/hpc/vanilla/mix/hydra-node-0.1.0/ \
 --hpcdir=/home/curry/hydra-poc/dist-newstyle/build/x86_64-linux/ghc-8.10.7/hydra-node-0.1.0.0/hpc/vanilla/mix/hydra-node-0.1.0.0 \
 --srcdir hydra-node \
 ./dist-newstyle/build/x86_64-linux/ghc-8.10.7/hydra-node-0.1.0/t/tests/hpc/vanilla/tix/tests/tests.tix

Ensemble Session

Discussing how to retrieve the UTXO from the cardano node, whether or not to put it in existing Client in TUI.

It makes sense to separate responsibilities between component talking to Hydra and one talking to the node, even though in the end, from the perspective of the TUI, it's a single entry point

Got stuck once more in issues with various types of UTXO being available:

- queryUtxo returns a cardano-api UTxO
- we need a Hydra Utxo

We need to filter the UTXO used for payment with the markedDatum

We managed to see the Head open in the TUI, with the right commits available

Troubleshooting the issue on CI with TUISpec, namely that End-to-end tests fail, looks related to the fact the fd used to write is not a vty: https://stackoverflow.com/questions/1605195/inappropriate-ioctl-for-device

2021-12-08

SN Solo work on TUI tests

Starting to get the demo running: it's IMPORTANT to have the devnet re-created as the nodes do not sync back in time.
Host-mounting the node.socket allows for some convenient cardano-cli querying
hydra-node crashes when the TUI selects a randomly generated utxo to commit with "CannotSpendInput" -> expected
By exposing the IO Vty initializer, we can hook into the Vty interface and re-direct the Output into a normal file
A generic BrickTest handle / with pattern emerges
Realized that the update function does write continuously into the file and it contains "multiple" screens
- try to seek backward on each update
- this messes with the terminal
Theory why splitting individual frames is not possible: only changes are drawn to the Fd?
There is a Mock output in vty: https://hackage.haskell.org/package/vty-5.33/docs/Graphics-Vty-Output-Mock.html
Maybe outputPicture can be used instead? If used with a "fresh" displayContext, this shows the picture for real instead of forwarding it to the Fd
Using a custom Output I can hook into 'outputByteBuffer' and redirect that? This seems to also allow providing an 'assumedStateRef' which could be cleared outside to force a "full re-render"?
By writing into an IORef, which is cleared before each call to outputPicture we can keep a single frame!
If we use the real display context, it draws correctly using writeMoveCursor but this output is harder to reason about programmatically
Maybe I could drop the escape codes for expectations and keep them for displaying?
Adding the shouldRender function was easy now
- threadDelays are necessary right now -> ugly .. double buffering where the getPicture uses a (T)MVar to block on the next frame could help
- it outputs the frame which does not have the expected bytes -> nice!
Also, I do get a BlockedIndefinitely exception on failing tests.. should be okay
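
A minimal sketch of the "single frame" idea above, using only base. It does not use the vty API: FrameBuffer, writeChunk and captureFrame are hypothetical names, and bytes are modelled as a plain String. The assumption is that a custom Output would call something like writeChunk from its byte-writing hook.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)

-- | Hypothetical stand-in for the buffer behind a custom output.
newtype FrameBuffer = FrameBuffer (IORef String)

newFrameBuffer :: IO FrameBuffer
newFrameBuffer = FrameBuffer <$> newIORef ""

-- | What the output's byte-writing hook would do: append to the current frame.
writeChunk :: FrameBuffer -> String -> IO ()
writeChunk (FrameBuffer ref) chunk = modifyIORef' ref (<> chunk)

-- | Clear the buffer, run the drawing action, and return only what it wrote.
captureFrame :: FrameBuffer -> IO () -> IO String
captureFrame (FrameBuffer ref) draw = do
  writeIORef ref ""
  draw
  readIORef ref
```

Because the buffer is reset before every draw, whatever was written for earlier frames never leaks into the frame under test.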

Pairing session

We want to add hydra & cardano node to the TUI tests using withBFTNode and withHydraNode
- rewriting the tests was fine
- but fails now as the withBFTNode from local-cluster can't find the hard-coded fixtures in config/..
We see two options now:
- generate everything, similar to how cardano-testnet does it
- embed our hard-coded local-cluster files when compiling the local-cluster library -> we go for this as it's more in line with what we have right now
Writing config JSON files which had been copied before from memory
Cardano keys are a bit more involved. We had been pointing the withBFTNode to the actual file paths instead of copying them to a temporary directory, so we tackle this and need to change quite some signatures of keysFor or signingKeyPathFor
After rebasing the TUI test work, we also need to distribute initialFunds and make a "seed payment" using mkSeedPayment -> Success, all tests but the expected failure pass!

2021-12-07

AB Solo Programming

Goal: Fix all tests implementing fee coverage using "marked" UTXO

Still stuck in fixing DirectSpec test, probably because there's a race condition while we are waiting for the payment utxo to appear

Inject marker UTXO
Throwing an exception when we cannot cover fee

DirectSpec test is passing but WalletSpec properties never cover Right side:

  coverFee
    balances transaction with fees
      +++ OK, passed 100 tests (100% Left).
    transaction's inputs are removed from wallet
      +++ OK, passed 100 tests (100% ErrNoPaymentUtxoFound).

Struggling to write a correct seed payment transaction generator for use in Integration and ETE tests

There is a mismatch between the config we generate as part of the CardanoNode setup and the existing initial funds: In one case we use 900000 ADA and in the other case 900 ADA. As they use the same address, when the mkGenesisTx function runs it retrieves one or the other.
I would like to start with empty initialFunds and then fill them up as we need when we start the cluster

Paying InitTx succeeds but posting all subsequent transactions in DirectChainSpec fails, probably because they are not waiting for payment to appear

Retrying blindly without timeout does not work, of course, so need to add a timeout to all retries
Problem now is that generatePaymentToCommit is probably consuming the markerDatum without recreating it so it disappears

Forgot to retry some postTx calls, the ones that are supposed to fail

I can confirm the "marked" UTXO is consumed by the generatePaymentToCommit.

ETE tests still failing because we don't have a seed transaction posted so there's no payment utxo available => seeding UTXO

Is it possible there is a race condition between the time the node sees the commits and the time the wallet takes into account the payment utxo to cover the collectCom tx? It's Bob who is trying to post the collectCom tx and is failing to cover its fees. I can see this in Bob's log:

{
  "message": {
    "directChain": {
      "contents": {
        "after": {
          "d0e48424eed4e798aac21e0caae434aa3fbb2fafa4dd62f40f568c6a7c895bdb#0": {
            "address": "601052386136b347f3bb7c67fe3f2ee4ef120e1836e5d2707bb068afa6",
            "datahash": null,
            "value": {
              "policies": {
              },
              "lovelace": 97834279
            }
          },
          "876818297ef126d372d05572ddeb3a4dd971d72eb6062a119987bc20ab6212c5#1": {
            "address": "601052386136b347f3bb7c67fe3f2ee4ef120e1836e5d2707bb068afa6",
            "datahash": "a654fb60d21c1fed48db2c320aa6df9737ec0204c0ba53b9b94a09fb40e757f3",
            "value": {
              "policies": {
              },
              "lovelace": 899896530691
            }
          }
        },
        "before": {
          "d0e48424eed4e798aac21e0caae434aa3fbb2fafa4dd62f40f568c6a7c895bdb#0": {
            "address": "601052386136b347f3bb7c67fe3f2ee4ef120e1836e5d2707bb068afa6",
            "datahash": null,
            "value": {
              "policies": {
              },
              "lovelace": 97834279
            }
          }
        },
        "tag": "ApplyBlock"
      },
      "tag": "Wallet"
    },
    "tag": "DirectChain"
  },
  "timestamp": "2021-12-07T12:14:49.531989295Z",
  "threadId": 20,
  "namespace": "HydraNode-2"
}

This happens "after" it tries to post the collectCom, so it's perfectly possible it's missing the payment TxOut

Should the node try again to post it if it fails, or should this be handled in the postTx definition in the Direct module? => It's reasonable to expect various race conditions and the need to retry posting, given some conditions are not yet met but could be met in the future.

Pair Programming

Added a timeout around finalizeTx in Direct and reinstated retry in the wallet so that we can wait for payment utxo to appear.
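
A rough sketch of what "retry, but bounded by a timeout" can look like, using only base. The names retryUntilJust and retryWithin are made up for illustration and are not the functions in Direct or the wallet; the action is assumed to return Nothing while the payment utxo has not appeared yet.

```haskell
import Control.Concurrent (threadDelay)
import System.Timeout (timeout)

-- | Retry an action until it yields a result, pausing between attempts.
-- On its own this may loop forever, hence 'retryWithin' below.
retryUntilJust :: Int -> IO (Maybe a) -> IO a
retryUntilJust delayMicros action = go
 where
  go = action >>= maybe (threadDelay delayMicros >> go) pure

-- | Bound the whole retry loop; 'Nothing' means the deadline passed.
retryWithin :: Int -> Int -> IO (Maybe a) -> IO (Maybe a)
retryWithin limitMicros delayMicros =
  timeout limitMicros . retryUntilJust delayMicros
```

With this shape the caller gets a Nothing back when the deadline passes and can turn it into a proper error for the client, instead of blocking indefinitely.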

This is actually a case where the client could be interested in the error reported and do something about it, eg. send money to the Node's "wallet" to pay for Head SM

LocalClusterSpec fails because there's no initialFunds as I removed it from the genesis-shelley.json file => We want to put them back, and overwrite them in tests.

2021-12-06

Pair programming

Start pairing with realization that the MockHead is already checking things:
- it asserts the "newValue" is the same as "oldValue", i.e. nothing is added
- but of course we are adding the collected value from the commits
- So we start by passing off-chain knowledge to the Head (SM) validator via the redeemer (SM Input)
- NOTE: This might not be a good idea and instead we should look at the script context / all commit (PT) inputs
We improve error printing on tx submission failures of Chain.Direct
Now the close fails because it does not preserve value and we pass in a TxOut of the head utxo

Goal: Have ETE and Benchmarks pass

The errors we are seeing from test execution are painful, so we want to improve their formatting:

Formatting the submission error is a bit annoying as it requires peeling several layers of stacked errors. The plutus error is already formatted, so it would be nice to print it directly rather than showing it
Is there not already a way to PPrint ledger/node errors?

The error in the CollectCom comes from the SM:

It checks the value stored in the SM UTXO is preserved between transitions, and this is not the case currently, hence the errors reported on script execution
We collect the total value in the CollectCom redeemer and use that to update the destination state's value

SM validator now fails on the close tx, for the same reason, eg. missing values

Then fanoutTx also fails, for the same reason, but now we must ensure the Final state is really final so that the SM logic checks the destination state value is 0 and there's no additional output for the SM.
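
To make the value rule discussed above concrete, here is a simplified sketch. It is not the actual on-chain validator: it only tracks lovelace (real values are multi-asset), and Transition, expectedValue and preservesValue are hypothetical names.

```haskell
-- | Lovelace only; real values are multi-asset (simplifying assumption).
type Lovelace = Integer

-- | Hypothetical, simplified transitions of the Head state machine.
data Transition
  = CollectCom Lovelace -- ^ total value of the commits, carried by the redeemer
  | Close
  | Fanout

-- | Value the state machine output is expected to hold after a transition.
expectedValue :: Lovelace -> Transition -> Lovelace
expectedValue old (CollectCom collected) = old + collected
expectedValue old Close = old
expectedValue _ Fanout = 0

-- | The check: the new output must hold exactly the expected value.
preservesValue :: Lovelace -> Transition -> Lovelace -> Bool
preservesValue old transition new = new == expectedValue old transition
```

For example, a head output of 2000000 lovelace collecting commits worth 10000000 must produce an output of 12000000, so preservesValue 2000000 (CollectCom 10000000) 12000000 holds.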

Commit is failing in the benchmarks run with not enough fees => Trying to fix the wallet logic to remove the input txin from selection logic, but this does not work

Unit tests are failing:

Struggling with the ledger/api discrepancies to fix unit tests
We are missing the datum to pass to the collectComTx function so we add them, but now this breaks some tests which require the UTXO, but not the datum and we are stuck in a maze of mapping and transforming back and forth between ledger and api

We are having an error in DirectSpec which fails to post the init tx: It seems we are retrying when posting the initTx but catching all errors, which is sledgehammerish => Adding a proper exception instead

InitTx submission blocks because we changed the way inputs are selected in the finalizeTx and in the wallet we retry when there's no availableUtxo, removing the retry reveals the error

Idea:

We need to have a distinguished address or utxo we carry around to pay for the fees. Could we use a datum for that?
Other option: Simply call cardano-cli to do the balancing of a tx
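
A quick sketch of the datum idea, to see what the selection could look like. All types here are simplified stand-ins (strings for TxIn and datum hashes), and findMarkedUtxo is a hypothetical name, not an existing wallet function.

```haskell
import Data.List (find)

-- Simplified stand-ins for the real ledger types (assumption).
type TxIn = String
type DatumHash = String

data TxOut = TxOut
  { lovelace :: Integer
  , datumHash :: Maybe DatumHash
  }
  deriving (Eq, Show)

-- | Pick the first output carrying the marker datum hash, if any.
findMarkedUtxo :: DatumHash -> [(TxIn, TxOut)] -> Maybe (TxIn, TxOut)
findMarkedUtxo marker = find ((== Just marker) . datumHash . snd)
```

The wallet would then pay fees only from the output returned here, and any transaction spending it has to recreate an output with the same datum, otherwise the marker is lost.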

2021-12-03

Pair Programming

Trying to make the ETE fail by having both Alice and Bob committing UTXO. This should fail according to my theory that we are consuming the wrong UTXO in the collectCom tx

ETE test now fails for the same reason the benchmark fails:

  CannotSpendInput
    { input = ("9546383daca50c0c643abca09331c5e58cfef49fa899eb8d15bfb2347ba1b001", 1)
    , walletUtxo =
        fromList
          [ (TxIn "8d383a29a211578298143ab26b3b2e1c4406abe5d7a905c49b234fdccf2627c8" (TxIx 1), TxOut (AddressInEra (ShelleyAddressInEra ShelleyBasedEraAlonzo) (ShelleyAddress Testnet (KeyHashObj (KeyHash "3aaa2e3de913b0f5aa7e7f076e122d737db5329df1aa905192284fea")) StakeRefNull)) (TxOutValue MultiAssetInAlonzoEra (valueFromList [(AdaAssetId, 899996702000)])) TxOutDatumNone)
          ]
    , headUtxo =
        fromList
          [ (TxIn "bd71f91ba872c4d79f45163dae877c1644a22527df47e1c49c25657acd10603c" (TxIx 0), TxOut (AddressInEra (ShelleyAddressInEra ShelleyBasedEraAlonzo) (ShelleyAddress Testnet (ScriptHashObj (ScriptHash "07204aff6885e57d799ee359f7e401d64ff3b54e0de7c4edc3f163d9")) StakeRefNull)) (TxOutValue MultiAssetInAlonzoEra (valueFromList [(AdaAssetId, 2000000)])) (TxOutDatumHash ScriptDataInAlonzoEra "c2f7589a052854c8877e74b7ec3de892981766ef819fc03bc8c893daf66dd72e"))
          , (TxIn "bd71f91ba872c4d79f45163dae877c1644a22527df47e1c49c25657acd10603c" (TxIx 1), TxOut (AddressInEra (ShelleyAddressInEra ShelleyBasedEraAlonzo) (ShelleyAddress Testnet (ScriptHashObj (ScriptHash "12c4f8ff8070f0d659bdb2ecf844190ddb1134ef821764bd2b4649b2")) StakeRefNull)) (TxOutValue MultiAssetInAlonzoEra (valueFromList [(AdaAssetId, 2000000)])) (TxOutDatumHash ScriptDataInAlonzoEra "2502ff9c9c341dd1384724ae35eab0b19e394c90226892fcc8e7cc86342d324e"))
          , (TxIn "bd71f91ba872c4d79f45163dae877c1644a22527df47e1c49c25657acd10603c" (TxIx 2), TxOut (AddressInEra (ShelleyAddressInEra ShelleyBasedEraAlonzo) (ShelleyAddress Testnet (ScriptHashObj (ScriptHash "12c4f8ff8070f0d659bdb2ecf844190ddb1134ef821764bd2b4649b2")) StakeRefNull)) (TxOutValue MultiAssetInAlonzoEra (valueFromList [(AdaAssetId, 2000000)])) (TxOutDatumHash ScriptDataInAlonzoEra "3b5e4228faf69ddf21fb84990b54d806c2b1234a250e0f4c54cc953257ff57ac"))
          , (TxIn "bd71f91ba872c4d79f45163dae877c1644a22527df47e1c49c25657acd10603c" (TxIx 3), TxOut (AddressInEra (ShelleyAddressInEra ShelleyBasedEraAlonzo) (ShelleyAddress Testnet (ScriptHashObj (ScriptHash "12c4f8ff8070f0d659bdb2ecf844190ddb1134ef821764bd2b4649b2")) StakeRefNull)) (TxOutValue MultiAssetInAlonzoEra (valueFromList [(AdaAssetId, 2000000)])) (TxOutDatumHash ScriptDataInAlonzoEra "59b19510ad6d701df3df01804374a9a5126b265b779804d739e78721ebd3872c"))
          ]
    }

This error is returned by the Wallet when it tries to resolveInputs. Adding more info about all the inputs of the transaction

I don't see a way to have a proper CollectComTx than collecting the outputs of the commit transactions and keeping them around in the Direct chain state, or the head state, to pass later on when submitting collect com tx. Now trying to retrieve the commits from observing the UTXO, we need to return the ν_commit UTXO from the observed commit txs.
Trying to fix the value produced in the commit output

Once again lost in the maze of types between ledger and API...

Why don't we simply pass a Utxo to the commitTx function instead of passing a Maybe (x,y) tuple?

This tuple is what is selected by the user, which should be a proper Utxo containing a single input.

Observation test now passes but the test for transaction size fails

The problem in the TX size comes from the generated Value which is huge. This is also what we observe in the CI and this comes from our use of arbitrary Txs from the Ledger. We have something in the WalletSpec already for trimming down the values to something more palatable, perhaps using ReasonablySized ?

Struggled quite a bit but test checking we observe collectCom properly is failing because we do not use the commit outputs.

Note: I really think the observeXXX functions should work with OnChainHeadState as:

What is relevant for observation depends on the state
The state can be modified by the observed TX, like the initials being produced/consumed by various transactions

Managed to have the CollectCom transaction consume the commits UTXO and not the committed ones, now checking what happens in ETE test

ETE and DirectChainSpec tests still failing

I think I know what happens: The Wallet tries to resolve an input corresponding to the commit UTXO but it does not have it in its UTXO set because it's a UTXO paid to a script address and not to the Wallet's owner address so we don't track it.

But what happens for Head script UTXO?
We pass more UTXO to resolve when we call coverFee

=> Trying to add the accumulated commits in OnChainHeadState to the cover fee function
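
A simplified picture of the lookup being changed here, assuming the resolution step can consult an extra UTXO (head and commit outputs) on top of the wallet's own. Types are stand-ins and this resolveInputs is a sketch, not the wallet's actual function.

```haskell
-- Simplified stand-ins: a TxIn is a string, an output is just its lovelace.
type TxIn = String
type Utxo = [(TxIn, Integer)]

-- | Resolve each input against the wallet's own UTXO first, then the extra
-- one. Returns the first input that cannot be found in either.
resolveInputs :: Utxo -> Utxo -> [TxIn] -> Either TxIn [Integer]
resolveInputs walletUtxo extraUtxo = traverse resolve
 where
  resolve txIn =
    maybe (Left txIn) Right (lookup txIn (walletUtxo <> extraUtxo))
```

Without the commit outputs in the extra UTXO, an input paid to a script address comes back as Left, which corresponds to the CannotSpendInput situation described above.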

We are now observing a

WrappedShelleyEraFailure
  ( MissingScriptWitnessesUTXOW
      ( fromList
          [ ScriptHash "6679d3c92844becb16c55161b60111336e1ba2f3d14bbb52b051c4db"
          ]
      )
  )

which probably means we don't put the script for consuming the commit outputs into the collect com tx. Also putting the redeemers.

Transaction now fails because of scripts execution:

HardForkApplyTxErrFromEra S (S (S (S (Z (WrapApplyTxErr {unwrapApplyTxErr = ApplyTxError [
   UtxowFailure (
     MissingRequiredDatums (fromList [SafeHash "549485dcc8131ab64122a9163080943f55b83ae368bc55bec73e583f192f3080"]) (fromList [SafeHash "8392f0c940435c06888f9bdb8c74a95dc69f156367d6a089cf008ae05caae01e",SafeHash "f4b9d64e4725efc05d7d078bb19e952b288c8403f5a585a8a6ffe589a9851614"])),UtxowFailure (WrappedShelleyEraFailure (UtxoFailure (UtxosFailure (ValidationTagMismatch (IsValid True)
  (FailedUnexpectedly [
    PlutusFailure "
The 3 arg plutus script (PlutusScript PlutusV1 ScriptHash \"07204aff6885e57d799ee359f7e401d64ff3b54e0de7c4edc3f163d9\") fails.
CekError An error has occurred:  User error:
The provided Plutus code called 'error'.
The data is: Constr 0 [Constr 0 [I 100000000000000],List [I 10]]
The redeemer is: Constr 0 []
The context is:
Purpose: Spending (TxOutRef {txOutRefId = bc1eae5aa8e72d3f9e5c0cd725b49dc47954570b6087eb16b5cc3f4ce6daf4ea, txOutRefIdx = 0})
TxInfo:
  TxId: 91b8eae01ad75d635d8e925925195da468bc700c9a20286d2efdfd7957b3d3a8
  Inputs: [ a24d818e7fc823c61416a095f98b139fc8c520b9ee5365791245f8d9ec7efc6b!0 -> - Value (Map [(,Map [(\"\",3000000)])]) addressed to
                                              ScriptCredential: 6679d3c92844becb16c55161b60111336e1ba2f3d14bbb52b051c4db (no staking credential)
          ,a24d818e7fc823c61416a095f98b139fc8c520b9ee5365791245f8d9ec7efc6b!1 -> - Value (Map [(,Map [(\"\",899989536279)])]) addressed to
                                                       PubKeyCredential: f8a68cd18e59a6ace848155a0e967af64f4d00cf8acee8adc95a6b0d (no staking credential)
       , bc1eae5aa8e72d3f9e5c0cd725b49dc47954570b6087eb16b5cc3f4ce6daf4ea!0 -> - Value (Map [(,Map [(\"\",2000000)])]) addressed to
                                                           ScriptCredential: 07204aff6885e57d799ee359f7e401d64ff3b54e0de7c4edc3f163d9 (no staking credential)]
  Outputs: [ - Value (Map [(,Map [(\"\",5000000)])]) addressed to
               ScriptCredential: 07204aff6885e57d799ee359f7e401d64ff3b54e0de7c4edc3f163d9 (no staking credential)
           , - Value (Map [(,Map [(\"\",899986238279)])]) addressed to
               PubKeyCredential: f8a68cd18e59a6ace848155a0e967af64f4d00cf8acee8adc95a6b0d (no staking credential) ]
  Fee: Value (Map [(,Map [(\"\",3298000)])])
  Value minted: Value (Map [])
  DCerts: []
  Wdrl:[]
  Valid range: (-\8734 , +\8734)
  Signatories: []
  Datums: [ ( 8392f0c940435c06888f9bdb8c74a95dc69f156367d6a089cf008ae05caae01e

Seems like it's missing the datum witnesses for the commit outputs.

=> Adding commit datums

So I don't have the missing datum error anymore, only a script failure

The datum types are odd in the error, need to dump the transaction to see what's going on

It seems it's the head script which is failing:

The data is: Constr 0 [Constr 0 [I 10000000000000],List [I 10,I 20,I 30]]\nThe redeemer is: Constr 0 []

It's clear the datums are there:

HardForkApplyTxErrFromEra S (S (S (S (Z (WrapApplyTxErr {unwrapApplyTxErr = ApplyTxError [UtxowFailure (WrappedShelleyEraFailure (UtxoFailure (UtxosFailure (ValidationTagMismatch (IsValid True) (FailedUnexpectedly [PlutusFailure "
The 3 arg plutus script (PlutusScript PlutusV1 ScriptHash \"07204aff6885e57d799ee359f7e401d64ff3b54e0de7c4edc3f163d9\") fails.
CekError An error has occurred:  User error:
The provided Plutus code called 'error'.
The data is: Constr 0 [Constr 0 [I 10000000000000],List [I 10,I 20,I 30]]
The redeemer is: Constr 0 []
The context is:
Purpose: Spending (TxOutRef {txOutRefId = bd71f91ba872c4d79f45163dae877c1644a22527df47e1c49c25657acd10603c, txOutRefIdx = 0})
TxInfo:
  TxId: 35b789b90d675222ca720ec0edb167d3c80d8ea2505327e5a8a6154de39c8ef7
  Inputs: [ 0baa47ee668c4a9daf984ca29d2ada80224ea74ecae114f2b49c923252bd612f!0 -> - Value (Map [(,Map [(\"\",4000000)])]) addressed to
                                                                                    ScriptCredential: 6679d3c92844becb16c55161b60111336e1ba2f3d14bbb52b051c4db (no staking credential)
          , 0baa47ee668c4a9daf984ca29d2ada80224ea74ecae114f2b49c923252bd612f!1 -> - Value (Map [(,Map [(\"\",899984536279)])]) addressed to
                                                                                    PubKeyCredential: f8a68cd18e59a6ace848155a0e967af64f4d00cf8acee8adc95a6b0d (no staking credential)
          , 8d383a29a211578298143ab26b3b2e1c4406abe5d7a905c49b234fdccf2627c8!0 -> - Value (Map [(,Map [(\"\",2000000)])])addressed to
                                                                                    ScriptCredential: 6679d3c92844becb16c55161b60111336e1ba2f3d14bbb52b051c4db (no staking credential)
          , 97469573293f61bf761da1adc77c1ad207e4c65249ee78f18c336db0a137a7a8!0 -> - Value (Map [(,Map [(\"\",4000000)])]) addressed to
                                                                                    ScriptCredential: 6679d3c92844becb16c55161b60111336e1ba2f3d14bbb52b051c4db (no staking credential)
          , bd71f91ba872c4d79f45163dae877c1644a22527df47e1c49c25657acd10603c!0 -> - Value (Map [(,Map [(\"\",2000000)])]) addressed to
                                                                                    ScriptCredential: 07204aff6885e57d799ee359f7e401d64ff3b54e0de7c4edc3f163d9 (no staking credential) ]
  Outputs: [ - Value (Map [(,Map [(\"\",12000000)])]) addressed to
               ScriptCredential: 07204aff6885e57d799ee359f7e401d64ff3b54e0de7c4edc3f163d9 (no staking credential)
           , - Value (Map [(,Map [(\"\",899981238279)])]) addressed to
    PubKeyCredential: f8a68cd18e59a6ace848155a0e967af64f4d00cf8acee8adc95a6b0d (no staking credential) ]
  Fee: Value (Map [(,Map [(\"\",3298000)])])
  Value minted: Value (Map [])
  DCerts: []
  Wdrl: []
  Valid range: (-\8734 , +\8734)
  Signatories: []
  Datums: [ ( 8392f0c940435c06888f9bdb8c74a95dc69f156367d6a089cf008ae05caae01e
          , <> )
          , ( 9352b132cb8dcedbc4d1115321a357d32b538aa1ba57c4c958ee6ebae8f5d50c
          , <10,
          \"{\\\"9546383daca50c0c643abca09331c5e58cfef49fa899eb8d15bfb2347ba1b001#1\\\":{\\\"address\\\":\\\"addr_test1vru2drx33ev6dt8gfq245r5k0tmy7ngqe79va69de9dxkrg09c7d3\\\",\\\"value\\\":{\\\"lovelace\\\":2000000}}}\"> )
          , ( af586b80c5243d28f4c1e9d1984236d2934fecb6d3d0a3b7e50ba94f446c150f
          , <30, \"{}\">)
          , ( c2f7589a052854c8877e74b7ec3de892981766ef819fc03bc8c893daf66dd72e
          , <<10000000000000>, [10, 20, 30]> )
          , ( f5bcf944acb09ae13fcdec6517ad3ee23c03de7c6ac779dd4304ebfd38faeb44
          , <20,
          \"{\\\"ac6e8d41c8e11d7883b1a5f5b025494cde06c7cecd67d10d06d7627374cf81af#1\\\":{\\\"address\\\":\\\"addr_test1vqg9ywrpx6e50uam03nlu0ewunh3yrscxmjayurmkp52lfskgkq5k\\\",\\\"value\\\":{\\\"lovelace\\\":2000000}}}\"> ) ]
"

It's definitely the head script (with hash 07204aff6885e57d799ee359f7e401d64ff3b54e0de7c4edc3f163d9) that's failing as it's the only output beside change to the Tx.

Amount is correct, it equals the sum of inputs + 2 ADAs
I can see the 2 committed UTXO from Alice and Bob.
Trying to have the mockhead script always succeed does not help

2021-12-02

Trying to avoid my LSP displaying "ghost imports" which are a PITA to navigate the file. Also removing the annoying popups.

It's lsp-lens-mode which is enabled by default in lsp-mode. Adding
```
(use-package lsp-mode
   :custom
   (lsp-lens-enable nil))
```
per https://emacs-lsp.github.io/lsp-mode/page/settings/lens/

Pair Programming

Got the following error when running benchmark:

hydra-node: failed to submit tx: HardForkApplyTxErrFromEra S (S (S (S (Z (WrapApplyTxErr {unwrapApplyTxErr = ApplyTxError [UtxowFailure (WrappedShelleyEraFailure (UtxoFailure (ValueNotConservedUTxO (Value 2000000 (fromList [])) (Value 1493272799 (fromList []))))),UtxowFailure (WrappedShelleyEraFailure (UtxoFailure
(BadInputsUTxO (fromList [TxInCompact (TxId {_unTxId = SafeHash "26ecb3a06c1e32f63742f1f7836c42dc86184fd71e32988d0b4099382cf009d1"}) 0])))),UtxowFailure (WrappedShelleyEraFailure (UtxoFailure NoCollateralInputs)),UtxowFailure (WrappedShelleyEraFailure (UtxoFailure (InsufficientCollateral (Coin 0) (Coin 4947000))))]
})))))

Another (different) error:

hydra-node: failed to cover fee for transaction: ErrNotEnoughFunds {missingDelta = Coin 3589920368}, ValidatedTx {body = TxBodyConstr TxBodyRaw {_inputs = fromList [TxInCompact (TxId {_unTxId = SafeHash "852d11d73776b64a9416bbef7811cca03485a01691abc37751dcb866b1353a29"}) 0], _collateral = fromList [], _outputs = St
rictSeq {fromStrict = fromList [(Addr Testnet (ScriptHashObj (ScriptHash "07204aff6885e57d799ee359f7e401d64ff3b54e0de7c4edc3f163d9")) StakeRefNull,Value 2000000 (fromList []),SJust (SafeHash "67d8ed01e13f33438ea9059ac9be2e159f943cffe054283485e0300271e3e9f9")),(Addr Testnet (KeyHashObj (KeyHash "16601980e4ae7eb11e87
180d154cc44a9b24105a6ee7c592ca66329c")) StakeRefNull,Value 3604780403 (fromList []),SNothing),(Addr Testnet (KeyHashObj (KeyHash "542a8a32c2a56fc6081e784a2a0527803015922309eee9ff051f629e")) StakeRefNull,Value 2541227916 (fromList []),SNothing),(Addr Testnet (KeyHashObj (KeyHash "529b55087caf60a7251c68b38f480de1f3ad
d14561322447f25fcf20")) StakeRefNull,Value 1042096452 (fromList []),SNothing)]}, _certs = StrictSeq {fromStrict = fromList []}, _wdrls = Wdrl {unWdrl = fromList []}, _txfee = Coin 0, _vldt = ValidityInterval {invalidBefore = SNothing, invalidHereafter = SNothing}, _update = SNothing, _reqSignerHashes = fromList [],
 _mint = Value 0 (fromList []), _scriptIntegrityHash = SNothing, _adHash = SNothing, _txnetworkid = SNothing}, wits = TxWitnessRaw {_txwitsVKey = fromList [], _txwitsBoot = fromList [], _txscripts = fromList [(ScriptHash "07204aff6885e57d799ee359f7e401d64ff3b54e0de7c4edc3f163d9",PlutusScript PlutusV1 ScriptHash "07
204aff6885e57d799ee359f7e401d64ff3b54e0de7c4edc3f163d9")], _txdats = TxDatsRaw (fromList [(SafeHash "67d8ed01e13f33438ea9059ac9be2e159f943cffe054283485e0300271e3e9f9",DataConstr Constr 3 []),(SafeHash "ff5f5c41a5884f08c6e2055d2c44d4b2548b5fc30b47efaa7d337219190886c5",DataConstr Constr 2 [])]), _txrdmrs = RedeemersR
aw (fromList [(RdmrPtr Spend 0,(DataConstr Constr 3 [],WrapExUnits {unWrapExUnits = ExUnits' {exUnitsMem' = 0, exUnitsSteps' = 0}}))])}, isValid = IsValid True, auxiliaryData = SNothing}, using head utxo: fromList [(TxInCompact (TxId {_unTxId = SafeHash "852d11d73776b64a9416bbef7811cca03485a01691abc37751dcb866b1353
a29"}) 0,(Addr Testnet (ScriptHashObj (ScriptHash "07204aff6885e57d799ee359f7e401d64ff3b54e0de7c4edc3f163d9")) StakeRefNull,Value 2000000 (fromList []),SJust (SafeHash "ff5f5c41a5884f08c6e2055d2c44d4b2548b5fc30b47efaa7d337219190886c5")))], and wallet utxo: fromList [(TxIn "852d11d73776b64a9416bbef7811cca03485a01691
abc37751dcb866b1353a29" (TxIx 1),TxOut (AddressInEra (ShelleyAddressInEra ShelleyBasedEraAlonzo) (ShelleyAddress Testnet (KeyHashObj (KeyHash "30b49f3a89bb12e567cc21f749bbad9276e214eee6ffa63257bbcf30")) StakeRefNull)) (TxOutValue MultiAssetInAlonzoEra (valueFromList [(AdaAssetId,5378452288)])) TxOutDatumNone),(TxIn
 "8a927269eb6e203d189c5be935efc8c239721a58e1176ffab515bcc3ba69040d" (TxIx 1),TxOut (AddressInEra (ShelleyAddressInEra ShelleyBasedEraAlonzo) (ShelleyAddress Testnet (KeyHashObj (KeyHash "30b49f3a89bb12e567cc21f749bbad9276e214eee6ffa63257bbcf30")) StakeRefNull)) (TxOutValue MultiAssetInAlonzoEra (valueFromList [(Ada
AssetId,3601482403)])) TxOutDatumNone)]

Seems like all nodes try to post the fanout, which explains the invalid UTXO we observe -> leader only should try to fanout

Our BehaviorSpec test is passing, which is wrong. We need to observe the transactions posted on chain to ensure a single node posts it. Refactoring ConnectToChain type to have a history function exposed to observe it
Our unit test is still passing :( We confirm there's only one FanOutTx posted, by the party which decided to Close. Trying to change the closer in ETE test shows ETE test passes consistently even when we change the closing party, so there's probably something fishy in the transactions we generate in the benchmarks

tx =
  HardForkApplyTxErrFromEra
    S
    ( S
        ( S
            ( S
                ( Z
                    ( WrapApplyTxErr
                        { unwrapApplyTxErr =
                            ApplyTxError
                              [ UtxowFailure (WrappedShelleyEraFailure (UtxoFailure (ValueNotConservedUTxO (Value 2000000 (fromList [])) (Value 520403697 (fromList [])))))
                              , UtxowFailure (WrappedShelleyEraFailure (UtxoFailure (BadInputsUTxO (fromList [TxInCompact (TxId{_unTxId = SafeHash "5866ca5080edbe814c1a5d05d505b137c8648e4f885bc35b951dfaf82c1a969b"}) 0]))))
                              , UtxowFailure (WrappedShelleyEraFailure (UtxoFailure NoCollateralInputs))
                              , UtxowFailure (WrappedShelleyEraFailure (UtxoFailure (InsufficientCollateral (Coin 0) (Coin 4947000))))
                              ]
                        }
                    )
                )
            )
        )
    )

Seems like the wallet cannot find the input to pay/provide collateral. There's probably a race condition in the wallet whereby we keep a tx input that's been used until we observe it from an on-chain block

We should remove the inputs as soon as we post the transaction so they become unavailable
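
Roughly what that would mean for the wallet state, as a sketch with stand-in types (Wallet and markSpent are hypothetical names):

```haskell
-- Simplified stand-in for the real TxIn type (assumption).
type TxIn = String

newtype Wallet = Wallet {availableInputs :: [TxIn]}
  deriving (Eq, Show)

-- | Mark the inputs of a just-posted transaction as spent right away,
-- rather than waiting for the block that contains it.
markSpent :: [TxIn] -> Wallet -> Wallet
markSpent spent (Wallet inputs) = Wallet (filter (`notElem` spent) inputs)
```

One thing this sketch leaves open: if the submission fails, the inputs would have to be made available again.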

Trying to add a property to WalletSpec checking that: Our properties for covering fees are not relevant as they end up with 100% False cases, eg. the generated tx/outputs don't have enough ADA to pass the function

Trying to change the generators to produce TxOut with enough value and messing up with api/ledger discrepancies
Struggling to generate the right combination of UTXOs for the wallet and an arbitrary transactions to cover fees. Our calculation depends on PParams' maximum fees which are too high as we currently compute fees as an upper bound using maxTxExUnits
Provide trimmed down pparams to ensure most of the transactions successfully cover fees

We observe the close fails to submit because it does not have enough funds, which makes sense given the collectCom tx does not properly propagate the total funds committed.

Fixing the value in the CollectCom's output
Benchmarks now failing consistently because of CannotSpendInput error, which is probably caused by some node trying to post a CollectCom transaction concurrently with another node?

AB Solo

Looking at the errors reported by the benchmarks, feeling they could be clearer. Having the logs dumped to stdout as Haskell Show instances makes them somewhat less readable and parseable. Also thinking of a way to prove this is caused by concurrent attempts at posting the collectcom tx: There aren't any hints at this in the logs

Trying to replace the body of transactions in the DirectChain log with their ids

Turns out we also have a problem in commitTx:

commitTx party utxo (initialIn, pkh) =
  mkUnsignedTx body datums redeemers scripts
 where
  body =
    TxBody
      { inputs =
          Set.singleton initialIn
            <> maybe mempty (Set.singleton . toShelleyTxIn . fst) utxo
      , collateral = mempty
      , outputs =
          StrictSeq.fromList
            [ TxOut
                (scriptAddr commitScript)
                (inject $ Coin 2000000) -- TODO: Value of utxo + whatever is in initialIn
                (SJust $ hashData @Era commitDatum)
            ]

There too the values are incorrectly computed. This does not really explain why the benchmarks fail with CannotSpendInput though?

It seems we are consuming the wrong UTXO in the collectCom, eg. we are consuming the committed UTXOs instead of the result of the commit tx

In the observeCommitTx we return the committed UTXO:

observeCommitTx :: ValidatedTx Era -> Maybe (OnChainTx CardanoTx)
observeCommitTx tx@ValidatedTx{wits} = do
  txOut <- snd <$> findScriptOutput (utxoFromTx tx) commitScript
  dat <- lookupDatum wits txOut
  (party, utxo) <- fromData $ getPlutusData dat
  OnCommitTx (convertParty party) <$> convertUtxo utxo
 where
  commitScript = plutusScript MockCommit.validatorScript

  convertUtxo = Aeson.decodeStrict' . OnChain.toByteString

and the collectCom has no way to know the actual UTXO from the CommitTX itself.

2021-12-01

Hydra Engineering meeting on rollbacks

Quick intro why Rollbacks happen and which parts of the architecture are related to this
Arnaud presents the strategy:
- Each node just re-applies the events to recover the HeadState as it was
- When re-applying, the events are not reported back to the HeadLogic (really? why not?)
- But what happens when there is e.g. an Abort when re-applying / synchronizing with the chain.
- Any unexpected "replay" is deemed an adversarial action and we would be closing / aborting the head anyways -> anything that happened in the Head so far would be lost.
- We aim to expose "stability" to our users so they can decide whether they rely on the open Head.
Discussion starts
- PAB does this replaying as well and we are in danger of "re-inventing the wheel".
- Seeing an inconsistent transaction might not necessarily be an adversarial move though. Forking chains could result in this even with all honest parties.
- What is the actual problem here?
Running example: The txs establishing a Head are rolled back and cannot be re-applied -> the Head was never open.
- Is it only when opening the Head? But also when closing/contesting the Head?
- Simple re-submission might not be enough, it could also require the application to re-balance or even re-construct the transaction. (Three levels of reaction)
- "Whatever it takes" to re-establish the HeadState.
"Confidence" in Head should be visible to the users and they can decide on it
- Is this an individual decision or should it be known a priori? i.e. a parameter
- Users should decide
Other situations where rollbacks are bad:
- Contestation rollbacks!
- What happens with contestation period / validity?
- Need to adapt the timeout to the new situation / new slots?

Ensemble

We have several issues popping up following our changes in the commits/collectCom/fanout logic:

ETE test is failing intermittently to submit the fanout tx because of insufficient funds and also missing UTXO
Some properties are also failing on generating init/commit?

Checking the serialisation (Plutus) for ContestationPeriod

Trying to track where the PT1 error comes from. It only appears in the .pir code but not in the PLC, however the PIR for head is not generated.

Other option is to dump the splice with TH.
This is a dead end, what happens is that the script execution fails because of a mismatch in redeemers, because we are missing an input

Seems like we had a collision in the generators for TxIn which we import from Test.Cardano.Ledger.Shelley.Serialisation.EraIndepGenerators. The hash generator used is based on an Int:

genHash :: forall a h. HashAlgorithm h => Gen (Hash.Hash h a)
genHash = mkDummyHash <$> arbitrary

mkDummyHash :: forall h a. HashAlgorithm h => Int -> Hash.Hash h a
mkDummyHash = coerce . hashWithSerialiser @h toCBOR

...

instance CC.Crypto crypto => Arbitrary (TxId crypto) where
  arbitrary = TxId <$> arbitrary

instance CC.Crypto crypto => Arbitrary (TxIn crypto) where
  arbitrary =
    TxIn
      <$> (TxId <$> arbitrary)
      <*> arbitrary

Int has a much smaller domain than a 32-byte BS obviously and we fell into a case where two identical TxIn were generated, which led to the inputs being "merged".

How to evolve our code to handle that problem? What this means is that we must ensure the head input is unique and does not conflict with the initials input

Function is currently not total, it fails if this requirement is not met
We could return an Either or create a "smaller" input type with a smart constructor, but the latter is pretty much the same as the first.
Trying to filter the initials to remove the head input if it's there => does not work, because it then fails to validate the scripts because of discrepancies between utxo, tx and redeemers
If/when we move to using cardano-api, we know the makeTransactionXX functions can fail so we probably need to fail too.

Ended up returning Either which ripples all over the codebase

We have 59 calls to error in our codebase, which is not great

We are observing collision in the list of initials now:

Passing a Map TxIn Data as initialInputs to abortTx function makes it more explicit we don't want collisions there
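A minimal sketch of the "return an Either" option combined with the Map-based initials, assuming simplified stand-ins for the ledger types. TxIn, Datum, AbortInputs, AbortInputsError and mkAbortInputs are hypothetical names for illustration, not the actual Hydra code. The Map rules out duplicate initials by construction, and the smart constructor rejects a head input that also appears among the initials:

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- Hypothetical stand-ins for the ledger's TxIn and Plutus Data types.
data TxIn = TxIn Int Int
  deriving (Eq, Ord, Show)

type Datum = String

-- Hypothetical error type: the head input collides with one of the initials.
newtype AbortInputsError = HeadInputAmongInitials TxIn
  deriving (Eq, Show)

-- Validated inputs for an abort transaction (illustrative only).
data AbortInputs = AbortInputs
  { headInput :: (TxIn, Datum)
  , initialInputs :: Map TxIn Datum
  }
  deriving (Eq, Show)

-- Total replacement for a partial function: instead of calling 'error',
-- report the collision to the caller.
mkAbortInputs ::
  (TxIn, Datum) ->
  Map TxIn Datum ->
  Either AbortInputsError AbortInputs
mkAbortInputs hd@(headIn, _) initials
  | headIn `Map.member` initials = Left (HeadInputAmongInitials headIn)
  | otherwise = Right (AbortInputs hd initials)
```

Callers then have to handle the Left case explicitly, which is the "ripple" effect mentioned above.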

November 2021

2021-11-30

SN Solo on refactoring Direct chain

Start refactoring of using cardano-api in Hydra.Chain.Direct{.Tx, .Wallet}
Start from the "inside out" by creating Api.Tx and converting it to Ledger.ValidatedTx on demand to ensure we can create the transaction drafts / pass them to the Wallet as we do right now.
When creating a plutusScript pendant to the existing one to get a cardano-api script (to get a script address) -> where is the ToCBOR Plutus.Script instance coming from?
Turns out.. there is only a Serialise instance (from serialise package) and other parts (plutus-ledger package Ledger.Scripts module) define an orphan ToCBOR instance which does use encode from serialise package
observeInitTx logic could be directly translated via access to Ledger.TxBody, but can be made much simpler using the TxBodyContent
Rewriting observeInitTx was a bit more work than expected, but works.
If we use cardano-api types we might be able to drop the Data from OnChainHeadState triples as the Api.TxOut can carry the Datum to spend an output (in CtxTx) .. or maybe not as it's optional in the Api.TxOut type.
Next Step: ensure finalizeTx can work with only a TxBody and produce a fully balanced & signed ValidatedTx
- ideally using makeTransactionBodyAutoBalance eventually, as an alternative to coverFee_
Looking at the signature of makeTransactionBodyAutoBalance.. should initTx be producing a TxBodyContent BuildTx Era instead? and only the Hydra.Direct.Wallet make it a BalancedTxBody and then sign it? i.e.

initTx :: .. -> TxBodyContent BuildTx Era

-- rename to 'balance'?
coverFee :: .. -> TxBodyContent BuildTx Era -> TxBody Era

sign :: .. -> TxBody Era -> Tx Era

2021-11-29

Pairing Session

Today's goal: Fanout real UTXO

We need to add the UTXO to the output of the fanout TX
The fanout tx is currently incorrect as it outputs a UTXO for the state machine which should not be the case
- To detect the fanout tx, we can look at the inputs and check one of them uses the head script passing FanOut as redeemer

Adding an assert to the DirectChainSpec to observe there is a payment made to Alice after observing the FanOutTx

We fail to submit tx with a strange error about OutputTooSmall
Looks like we were tripped by SN's comment about closing with an arbitrary UTXO! The problem is that we cannot make this test pass keeping it as it is, there are still some kind of verifications done to ensure consistency of txs

Going to add more verification in EndToEndSpec

We want the same kind of assertion to be done in ETE test, eg. to check we correctly fan out the right UTXO for alice and bob
Refactoring check from DirectChainSpec into a waitForUtxo function. Had to extract \case function to named and typed function in where clause to make compiler happy

We cannot use the utxo in ETE directly as it's a mixed type TxIn/JSON -> converting to and from JSON to get a correct Utxo

The ETE test fails because the output we want to fanout is too small! It's 14 which is fine off-chain as our params are very lenient, but not quite so in Alonzo.
We commit just 1 ADA in the head which is enough to fanout

We have a failure on the /Hydra.Chain.Direct.Tx/fanoutTx/transaction size below limit for small number of UTXO/ test: With too many UTXOs the transaction becomes way too large, esp. as those UTXOs are pretty much arbitrary and can themselves be very large.

Trimming it down to filter UTXO > 10 items does not help much
Disabling the property for now

Discussion on rollbacks

Thinking about rollbacks and laying out a plan:

If we can resubmit a rolled-back tx, we do it
- How do we detect a transaction has been rolled back?
  - We need to record the block at which a transaction of interest is observed (from Chain Sync)
  - When we get a rollback message, we can check which transactions are past the rolled back index
  - Then we can resubmit them in order
- How do we know it can be resubmitted?
  - we always resubmit
- If resubmit fails on a supposedly rolled-back tx
  - Alert user
  - Act on failing tx in head state:
    - Init => Head vanished => discard state
    - Commit => User can try to recommit, the same or other utxo?? (we know which commit(s) have been rolled back)
    - CollectCom => one commit probably disappeared?
- Who does the resubmit?
  - the one who initially submitted it
  - if the rollback is adversarial ??
Could we kill OnChainHeadState and only do queries?
- we have 2 competing states => risk of desync is high
- we should stop observing the chain in the DirectChain
Each party's chain component should insulate the Head state from rollbacks and try to resubmit
If someone does not resubmit, there's no point in another party trying to resubmit
The onchain head state maintains a stream of events to post/to observe
- When a rollback happens, past events are rehandled in a "rollback mode" which means they do not propagate to the Head State
- New observations from the chain still entail notifications to the head state
- Head state must trust the chain component and any event coming from it resets the head state
All nodes need to follow the rollback protocol so there cannot be any "interleaved" event
- in this case we need to abort/close, but that can also be tricky

We need to design the Close sequence for rollbacks
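The detection part of the plan above (record where each transaction of interest was observed, find those past the rollback point, resubmit them in order) could be sketched as follows. This is a simplified illustration under stated assumptions: slots are plain Ints, transactions are opaque strings, and Observed and onRollback are hypothetical names, not part of the Direct chain component.

```haskell
import Data.List (partition, sortOn)

-- Hypothetical simplifications: a slot is an Int, a transaction an opaque id.
type SlotNo = Int

type TxId = String

-- A transaction of interest together with the slot of the block
-- in which it was observed (from chain sync).
data Observed = Observed
  { observedAt :: SlotNo
  , observedTx :: TxId
  }
  deriving (Eq, Show)

-- Given the point we rolled back to, split the record into the observations
-- that are still on chain and the transactions to resubmit, oldest first.
onRollback :: SlotNo -> [Observed] -> ([Observed], [TxId])
onRollback rollbackPoint observed =
  (kept, map observedTx (sortOn observedAt lost))
 where
  (kept, lost) = partition ((<= rollbackPoint) . observedAt) observed
```

For example, rolling back to slot 10 with observations at slots 12, 5 and 11 keeps the one at slot 5 and yields the other two for resubmission, slot 11 first. What to do when a resubmission fails (alert the user, discard or adapt the head state) is left out of this sketch.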

2021-11-26

Ensemble Session

How to deal with benchmarks?

We need to commit UTXO initially
We need to pass the keys for the initial UTxO to ensure the commits end up having the same ids between every run

Adding signingKey to Dataset type -> need to implement To/FromJSON also removing Eq instance

Adding a roundtrip JSON test for Dataset

We cannot use plain genDataset, got some errors trying to generate arbitrary transactions:

  src/Test/Aeson/Internal/RoundtripSpecs.hs:59:5:
  1) Test.Generator, JSON encoding of Dataset, allows to encode values with aeson and read them back
       uncaught exception: ErrorCall
       findPayKeyPairAddr: expects only Base or Ptr addresses
       CallStack (from HasCallStack):
         error, called at src/Test/Cardano/Ledger/Shelley/Generator/Core.hs:434:7 in cardano-ledger-shelley-test-0.1.0.0-827a00c3eaf868a9c6ed74e429f91efce6a3bea6c8e377f0e0d8dab608426e8b:Test.Cardano.Ledger.Shelley.Generator.Core
       (after 2 tests)
         Exception thrown while showing test case:
           findPayKeyPairAddr: expects only Base or Ptr addresses
           CallStack (from HasCallStack):
             error, called at src/Test/Cardano/Ledger/Shelley/Generator/Core.hs:434:7 in cardano-ledger-shelley-test-0.1.0.0-827a00c3eaf868a9c6ed74e429f91efce6a3bea6c8e377f0e0d8dab608426e8b:Test.Cardano.Ledger.Shelley.Generator.Core

Removed arbitrary dataset, now making sure we can commit generated UTXO

In the generatePayment we extract the UTXO for the initial funds, there is a function apparently for that in the cardano-api we could use to get deterministically the initial txIn from which we can construct the payment transactions and hence its initial UTXO
Adding a mkGenesisTx function to compute the transaction and output from initialFunds for a given key
Realising we don't need the initial Utxo but actually an initial payment tx => we still need to store signing key to prime the node

Wart: Would make sense to have the networkId in the CardanoNode, right now we expose a defaultNetworkId which is hard-coded

Implementing castHash to convert from a payment key to genesis utxo key

We don't have access to constructors so we need to serialise then deserialise to convert the values

GeneratorSpec tests are now failing because we use all UTXO from the initial funding transaction to compute the amount to send, we should only select the "commit UTXO" and pass this around -> writing a (partial) function to select the minimal UTXO.

This will work fine for the genesis transaction because all UTXO have the same TxId and different index, and the "commit" UTXO is the first one by construction.
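As an illustration of the selection described above, here is a total variant of "select the minimal UTXO", assuming a simplified UTXO represented as a Map from inputs to lovelace amounts. TxIn, Utxo and selectMinimalUtxo are hypothetical names for this sketch. With the derived Ord instance, inputs sharing a TxId are ordered by index, so the entry with index 0 comes first:

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- Hypothetical stand-in for the ledger TxIn: transaction id and output index.
-- The derived Ord compares the id first, then the index.
data TxIn = TxIn
  { txId :: String
  , txIx :: Int
  }
  deriving (Eq, Ord, Show)

-- Simplified UTXO: each input maps to an amount of lovelace.
type Utxo = Map TxIn Integer

-- Keep only the entry with the smallest TxIn, or Nothing if the set is empty.
selectMinimalUtxo :: Utxo -> Maybe Utxo
selectMinimalUtxo utxo = uncurry Map.singleton <$> Map.lookupMin utxo
```

Returning a Maybe avoids the partiality mentioned in the note, at the cost of one more case to handle at the call site.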

Benchmark compiles but fails with a strange error about "peers not connected"

There was a discrepancy in the value of initial funds leading to an error when committing on-chain -> unified into a hardcoded constant in CardanoCluster
We now only have a problem with paying the fees for the initial funding tx, so we need to use buildRaw and calculateMinFee to properly build the tx
We also have a logical problem with the way we generate and run datasets:
- The concurrency parameter defines how many datasets we generate
- We use the number of datasets generated to define the number of nodes to run

Now struggling to retrieve the ProtocolParameters needed to calculate fees.

We want to extract them from the genesis-shelley.json file, but apparently there's a discrepancy in the formats: The API has a ProtocolParameters format which is different from each Era's format
There is a JSON instance for genesisShelley, we can use that to read it from file and then convert to API's ProtocolParameters

Benchmark fails on submitting commit tx with a ValueNotConserved error: Seems like the UTXO we consume is not correct, probably unknown by the Wallet -> This is the one we construct from the initialFunds which is supposed to work thanks to a ledger function to produce a TxIn from initial funds

Managed to generate dataset with initial funding transaction but ran into a snag: The "leader" of the bench run does the InitTx which means it consumes its initial funding and produces a new transaction out of it, hence the initial funding transaction in the dataset does not exist anymore.

If we move the commit seeding before sending the init, we get another error: Commit transactions have more than one UTXO committed.
We forgot to filter the utxo returned by the initial funding transaction

It now fails in the finalizeTx: The wallet raises an exception saying it cannot find the input to spend or cover the fees.

MB Solo Programming

Fixed run of the benchmarks:

We attached the client to the datasets in the wrong order thus the keys and UTXO were not the right ones
Always select the maximum value UTXO from the wallet for change

2021-11-25

Ensemble Programming

Migrating to cardano-api:

need to materialise nix on MB's machine because of a weird error
some failing tests: generator test is failing with invalid witnesses, validation of TX also fails?

The generator is failing with NoRedeemer error -> probably generating tx with scripts and not passing the redeemers => Improving error reporting to see the actual tx generated

Transaction generated by alonzo generator are not necessarily valid because of the scripts or execution units or what not...

We should talk to ledger team on how to generate valid Alonzo txs, who does generate Alonzo txs for test?
In the meantime, would make sense to generate Mary txs and resign them because of the issue with body serialisation.
Trying to increase execution budget in the PParams does not work. Using freeCostModel in our generator to sidestep the issue of execution units. Switching makes the test pass, seems like there's a TODO in ledger code about using a non-free cost model for generating tx with scripts

Rebasing hail-cardano-api branch onto master in order to fix ETE test: We want to be able to commit 0 or 1 UTXO which has been fixed in master and is the last failing test

AB Solo Programming

Completing work on commits from L1.

Fixing ETE test to ensure it uses a properly committed UTXO. It's pretty straightforward thanks to the utxoToJSON function that converts the generated payment to the expected format.

Transaction fails to be submitted off-chain:

         seen messages:    {"transaction":{"witnesses":{"keys":["8200825820db995fe25169d141cab9bbba92baa01f9f2e1ece7df4cb2ac05190f37fcc1f9d5840dffeaeb16f1b23a76b1f038f835099c81aaaab7d1ac9e8c0fadc192e7593466810193a72d53c9402eeb7748e6b7eef19d287241b385976da929237f279d3d300"],"scripts":{}},"body":{"outputs":[{"address":"addr_test1vz35vu6aqmdw6uuc34gkpdymrpsd3lsuh6ffq6d9vja0s6s67d0l4","value":{"lovelace":1000000}}],"mint":{"lovelace":0},"auxiliaryDataHash":null,"withdrawals":[],"certificates":[],"inputs":["9fdc525c20bc00d9dfa9d14904b65e01910c0dfe3bb39865523c1e20eaeb0903#0"],"fees":0,"validity":{"notBefore":null,"notAfter":null}},"id":"4c69e0154cdc07ca752157ed6cf247fe449b3d21e40bab0c848a822ae5a54c85","auxiliaryData":null},"utxo":{"998eec9baf49ee66c1609157f00a31198621740226584ae0eb4f32c81ff700f0#1":{"address":"addr_test1vru2drx33ev6dt8gfq245r5k0tmy7ngqe79va69de9dxkrg09c7d3","value":{"lovelace":1000000}}},"validationError":{"reason":"ApplyTxError [UtxowFailure (UtxoFailure (ValueNotConservedUTxO (Value 0 (fromList [])) (Value 1000000 (fromList [])))),UtxowFailure (UtxoFailure (BadInputsUTxO (fromList [TxIn (TxId {_unTxId = SafeHash \"9fdc525c20bc00d9dfa9d14904b65e01910c0dfe3bb39865523c1e20eaeb0903\"}) 0])))]"},"tag":"TxInvalid"}

The input to use for the off-chain transaction was hardcoded -> replacing with the one we generate

Now with another error:

{"transaction":{"witnesses":{"keys":["8200825820db995fe25169d141cab9bbba92baa01f9f2e1ece7df4cb2ac05190f37fcc1f9d58405f14a0e9b7da0deca07529cd4d1fa8e59b5efe345afcc94c7dbf1eb7c2e8a485658e36715d04ea305f510291204c0450f4d7cbc119e2495d98430134a3b9c301"],"scripts":{}},"body":{"outputs":[{"address":"addr_test1vz35vu6aqmdw6uuc34gkpdymrpsd3lsuh6ffq6d9vja0s6s67d0l4","value":{"lovelace":1000000}}],"mint":{"lovelace":0},"auxiliaryDataHash":null,"withdrawals":[],"certificates":[],"inputs":["998eec9baf49ee66c1609157f00a31198621740226584ae0eb4f32c81ff700f0#1"],"fees":0,"validity":{"notBefore":null,"notAfter":null}},"id":"d87840c06e3e65d422ed9181273579dc82e6b471024d3899610b5c025a243442","auxiliaryData":null},"utxo":{"998eec9baf49ee66c1609157f00a31198621740226584ae0eb4f32c81ff700f0#1":{"address":"addr_test1vru2drx33ev6dt8gfq245r5k0tmy7ngqe79va69de9dxkrg09c7d3","value":{"lovelace":1000000}}},"validationError":{"reason":"ApplyTxError [UtxowFailure (MissingVKeyWitnessesUTXOW (WitHashes (fromList [KeyHash \"f8a68cd18e59a6ace848155a0e967af64f4d00cf8acee8adc95a6b0d\"])))]"},"tag":"TxInvalid"}

Probably because we are passing the wrong key?

=> Keys and addresses were generated and hardcoded

ETE test is now passing!

Fixing benchmark to work with on-chain commits:

we currently generate an arbitrary dataset from a seed random Utxo and then generating transactions with the right keys
in the case of the constant Utxo, things are fine because we generate key pair to produce a utxo so we could as well keep the (initial) keys around and then commit the initial utxo on chain when starting the benchmark

Struggling a bit with getting to/fromJSON instances right for KeyPair, current solution is to store the signing key only and regenerate the verification key from the serialised bytes; the bytestring is hex-encoded in a JSON object

Added a function to transform a Ledger.KeyPair into a (VerificationKey PaymentKey, SigningKey PaymentKey)

Feels like changing the benchmark is a bit more involved than merely adding keys and committing, as the whole logic of generating transactions beforehand is a bit borked. We should probably provide not a dataset but parameters for a dataset, like number of transactions to run and other things, and generate the txs on the go. This might skew the timings a bit but probably dwarfed by IOs anyway. Feels like the "right path" would be:

have a single way of generating dataset, eg. one utxo per client
generate keys for each participant
pass dataset parameters instead of actual dataset
do not store the dataset? Once we have keys defined then the UTxOs should be constant?
generate transactions in client from previous transaction, using the same keys
- transactions can be sent at random to some other party? but this might deplete the clients' funds and lead to a client not being able to post txs anymore
- fix the payment graph so that amounts stay constant and all parties can always keep generating txs

2021-11-24

AB Solo Programming

Still trying to properly commit actual UTXOs.

Managed to get TxOut transformed between an Alonzo one and a Mary one, without requiring full transformation of the internal ledger

Tests are failing still but with a different error, namely that we commit more than one UTxO which is odd... Actually not: We wait for all payments at some address and use the retrieved UTxO there to commit, but we should only commit one of them.

got another interesting error:

ErrNotEnoughFunds {missingDelta = Coin 2298000}

At least, I can see that the inputs are correctly set, with the committed UTxO as input and also in the datum:

failed to cover fee for transaction: ErrNotEnoughFunds {missingDelta = Coin 2298000},
  ValidatedTx {body = TxBodyConstr TxBodyRaw {
    _inputs = fromList [TxIn (TxId {_unTxId = SafeHash "bc1eae5aa8e72d3f9e5c0cd725b49dc47954570b6087eb16b5cc3f4ce6daf4ea"}) 1
                       ,TxIn (TxId {_unTxId = SafeHash "e50062182d5d401d13249a7f7e7e1ac73deec0170421e10bc7d9b346c284ebdd"}) 1],
    _collateral = fromList [],
    _outputs = StrictSeq {fromStrict = fromList [(Addr Testnet (ScriptHashObj (ScriptHash "6679d3c92844becb16c55161b60111336e1ba2f3d14bbb52b051c4db")) StakeRefNull,Value 2000000 (fromList []),SJust (SafeHash "549485dcc8131ab64122a9163080943f55b83ae368bc55bec73e583f192f3080"))]},
    _certs = StrictSeq {fromStrict = fromList []},
    _wdrls = Wdrl {unWdrl = fromList []},
    _txfee = Coin 0,
    _vldt = ValidityInterval {invalidBefore = SNothing, invalidHereafter = SNothing},
    _update = SNothing,
    _reqSignerHashes = fromList [],
    _mint = Value 0 (fromList []),
    _scriptIntegrityHash = SNothing,
    _adHash = SNothing,
    _txnetworkid = SNothing},
  wits = TxWitnessRaw {_txwitsVKey = fromList [],
                       _txwitsBoot = fromList [],
                       _txscripts = fromList [(ScriptHash "12c4f8ff8070f0d659bdb2ecf844190ddb1134ef821764bd2b4649b2",PlutusScript PlutusV1 ScriptHash "12c4f8ff8070f0d659bdb2ecf844190ddb1134ef821764bd2b4649b2")],
                       _txdats = TxDatsRaw (fromList [
                          (SafeHash "2502ff9c9c341dd1384724ae35eab0b19e394c90226892fcc8e7cc86342d324e",DataConstr B "\248\166\140\209\142Y\166\172\232H\NAKZ\SO\150z\246OM\NUL\207\138\206\232\173\201Zk\r"),
                          (SafeHash "549485dcc8131ab64122a9163080943f55b83ae368bc55bec73e583f192f3080",DataConstr Constr 0 [I 10,B "{\"e50062182d5d401d13249a7f7e7e1ac73deec0170421e10bc7d9b346c284ebdd#1\":{\"address\":\"addr_test1vru2drx33ev6dt8gfq245r5k0tmy7ngqe79va69de9dxkrg09c7d3\",\"value\":{\"lovelace\":1000000}}}"])]),
                        _txrdmrs = RedeemersRaw (fromList [(RdmrPtr Spend 0,(DataConstr Constr 0 [],WrapExUnits {unWrapExUnits = ExUnits' {exUnitsMem' = 0, exUnitsSteps' = 0}}))])}, isValid = IsValid True, auxiliaryData = SNothing},
using utxo: fromList [(TxIn (TxId {_unTxId = SafeHash "bc1eae5aa8e72d3f9e5c0cd725b49dc47954570b6087eb16b5cc3f4ce6daf4ea"}) 0,(Addr Testnet (ScriptHashObj (ScriptHash "07204aff6885e57d799ee359f7e401d64ff3b54e0de7c4edc3f163d9")) StakeRefNull,Value 2000000 (fromList []),SJust (SafeHash "f4b9d64e4725efc05d7d078bb19e952b288c8403f5a585a8a6ffe589a9851614"))),
                      (TxIn (TxId {_unTxId = SafeHash "bc1eae5aa8e72d3f9e5c0cd725b49dc47954570b6087eb16b5cc3f4ce6daf4ea"}) 1,(Addr Testnet (ScriptHashObj (ScriptHash "12c4f8ff8070f0d659bdb2ecf844190ddb1134ef821764bd2b4649b2")) StakeRefNull,Value 2000000 (fromList []),SJust (SafeHash "2502ff9c9c341dd1384724ae35eab0b19e394c90226892fcc8e7cc86342d324e")))]

Finally fixed the commit test:

The UTXO set maintained by the Wallet is right: When we find a block, we traverse the transaction list (topological ordering?), remove the txins we know from the map and add the txouts we found corresponding to our address of interest.
The problem was in the way we select the UTXO to use in coverFee: We take the maximum of the UTXO from our internal state but this maximum is just an ordering of txids and chances are we get a smaller UTXO. Just filtering the map to select UTXO with a value higher than some threshold makes the test pass.
The only remaining test that fails is the EndToEndSpec test as we still try to commit an arbitrary UTXO.
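The coverFee pitfall described above can be illustrated with a small sketch, assuming the wallet UTXO is a Map keyed by input with lovelace amounts as values. TxIn, Lovelace, largestKey and richestUtxo are hypothetical names. Note that the sketch selects by value rather than filtering by a threshold as the note describes, so it is an alternative, not the fix that was applied:

```haskell
import Data.List (maximumBy)
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import Data.Ord (comparing)

-- Hypothetical stand-in for the ledger TxIn: transaction id and output index.
data TxIn = TxIn String Int
  deriving (Eq, Ord, Show)

type Lovelace = Integer

-- The pitfall: the maximum of a Map is its largest key (ordered by tx id),
-- which says nothing about how much value the entry holds.
largestKey :: Map TxIn Lovelace -> Maybe (TxIn, Lovelace)
largestKey = Map.lookupMax

-- Selecting by value instead: the entry holding the most lovelace.
richestUtxo :: Map TxIn Lovelace -> Maybe (TxIn, Lovelace)
richestUtxo utxo
  | Map.null utxo = Nothing
  | otherwise = Just (maximumBy (comparing snd) (Map.toList utxo))
```

With a wallet holding 5000000 lovelace at input "aa"#0 and 10 lovelace at "ff"#1, largestKey picks the 10 lovelace entry while richestUtxo picks the 5000000 one.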

Ensemble Programming

Fixing TUI and the use of addresses, comparing by Show instance which is not great but needed because we keep them as keys in Map. Now implementing mkSimpleTx which is the function from TUI that creates actual transaction to be committed on chain

Completed implementation of mkSimpleTx, it now returns an Either with an error, because makeTransactionBody does: It checks well-formedness of the transaction.

Got a failure with invalid witnesses and an encoding problem for UTxOs, getting an invalid UTF-8 encoding error

Problem comes from AssetName. AssetName are encoded as Latin-1 in the cardano-api, why?
ToJSON/FromJSON instances in our version of cardano-api are wrong, they do not roundtrip properly. We want to upgrade our cardano-node dependency as it's been fixed recently

Fixed the dependencies and the few impacts from changes in API

Still have a few failures related to the UTXO generator
One test that's failing is the one about size of commits: What is this test about? Perhaps we only need to ensure a single UTXO would fit atm?

2021-11-22

Discussed usage of cardano-api in Hydra.Ledger.Cardano, MB is implementing it; that the Alonzo types are more complete / simpler to handle, but most of Alonzo features are not supported yet by our integration -> need to strip down generators in tests
"Fixed" benchmarks to only commit a single UTXO (current limitation)
Continued "committing real UTXO" in pairing session
- Expect postTx of committing arbitrary UTXO to fail when really spending the selected UTXO in commitTx
- To make it succeed though, we need to generate a payment tx such that the Hydra.Direct.Wallet "sees" the resulting UTXOs, knows about them and can spend them
- We are using cardano-api via CardanoClient to construct, sign and submit this transaction
- In order to pass the resulting UTxO to CommitTx (Utxo CardanoTx) we would either need to convert cardano-api UTxO to the cardano-ledger UTxO type, or utilize the refactored Hydra.Ledger.Cardano which uses cardano-api types

Getting a cardano-node dev environment

The shell.nix by default does also build local cluster scripts which cannot be disabled with an argument.
Also, this workbench is a bit confusing and it didn't seem to be giving me something I need.
The cabal in scope is actually wrapped and uses a different cabal.project and using the standard one is not working well with exactDeps = true
In summary, here is a diff I used to get into a nix-shell which can cabal test cardano-api:

diff --git a/shell.nix b/shell.nix
index b44ff6d99..c20294d26 100644
--- a/shell.nix
+++ b/shell.nix
@@ -89 +89 @@ let
-      cabalWrapped
+      pkgs.cabal-install
@@ -104,5 +103,0 @@ let
-    ## Workbench's main script is called directly in dev mode.
-    ++ lib.optionals (!workbenchDevMode)
-    [
-      cluster.workbench.workbench
-    ]
@@ -112,9 +106,0 @@ let
-    ]
-    ## Local cluster not available on Darwin,
-    ## because psmisc fails to build on Big Sur.
-    ++ lib.optionals (!stdenv.isDarwin)
-    [
-      pkgs.psmisc
-      cluster.start
-      cluster.stop
-      cluster.restart
@@ -125 +111 @@ let
-    exactDeps = true;
+    exactDeps = false;

2021-11-19

Ensemble Programming

Looking at PR to review, SN's PR fails to build on CI with the following error:

hydra-node: failed to submit tx: HardForkApplyTxErrFromEra S (S (S (S (Z (WrapApplyTxErr {unwrapApplyTxErr = ApplyTxError [UtxowFailure (WrappedShelleyEraFailure (UtxoFailure (MaxTxSizeUTxO 17634 16384)))]})))))
CallStack (from HasCallStack):
  error, called at src/Relude/Debug.hs:288:11 in relude-1.0.0.1-7lFjKp98vwwCeaGqBf05Yi:Relude.Debug
  error, called at src/Hydra/Chain/Direct.hs:329:34 in hydra-node-0.1.0-inplace:Hydra.Chain.Direct

Seems like we are already hitting some limits of the chain, in this case the size of Tx apparently (greater than 16K)

Bumping tx size to 50K and block size to 100K allows running the benchmark
committed Utxo is about ±100 Utxo in total which makes the datum quite large

Something we could do to reduce the size of the datum on chain would be to make the commit and collectCom transactions pass a datum made from a hash of the MT of the committed Utxos.

The committed Utxos would then be sent as a first message off-chain, signed by each party and verified thanks to the MT root hash:
each party send Committed message containing its Utxo
each party reconstruct the Utxo MT and verify its root hash
But is it really needed? We can reap the Utxo directly from the committed transactions, no need to pass it around
Plus, how is it verified on-chain? The ν_commit validator needs to verify, in the case of an abort, that the UTXO posted by the AbortTx are indeed the ones present in the datum of each of the aborted commits: This could be achieved by computing the MT root of the UTXOs committed by the abort transactions, as the validator has access to them but of course this could be computationally relatively expensive.

Another failure running the benchmark:

hydra-node: failed to submit tx: HardForkApplyTxErrFromEra S (S (S (S (Z (WrapApplyTxErr {unwrapApplyTxErr = ApplyTxError [UtxowFailure (WrappedShelleyEraFailure (UtxoFailure (FeeTooSmallUTxO (Coin 3365841) (Coin 3298000))))]})))))

Two problems:

We can't just pack arbitrary many Utxo because this would blow up the tx size -> limit a single Utxo per participant
We can only ever commit concrete CardanoTx not abstract ones on the direct chain -> removing parameterisation on concrete transactions handling in Direct chain
We can do things in 2 steps:
- first degenerify the tx when using DirectChain
- second limit commits to a single Utxo

Where to check we do not commit more than one TX?

We could do it in the CommitTx but this would require introducing some more type for representing a single TxOut -> large change
We do it at the DirectChain level throwing an exception if something goes wrong
We check the size of committed Utxo in the fromPostChainTx function -> we could do it in the commitTx function instead?

Still 2 tests failing:

1 test about the size of the commitTx -> should be fixed once we change the interface to the commitTx function
End to end test

Interestingly our current code does not allow committing no UTxO which explains why the ETE test is failing: node 2 and 3 do not commit anything. But the error reported says: MoreThanOneUtxoCommitteed which is certainly misleading.

Added a test to DirectChainSpec asserting we can commit an empty UTxO set, but this feels a bit too high-level for this kind of test. Perhaps there could be a more granular test module for the postChainTx function? There seems to be a separate responsibility here, which is the handling of the on-chain state
But the commitTx function accepts one and only one UTxO, so we cannot commit an empty UTxO set. Going down the easy route: pass Maybe Utxo to the commitTx function
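A rough sketch of the "zero or one UTxO" check that precedes a commitTx taking a Maybe, assuming the committed UTxO arrives as a Map. CommitError and singleCommit are hypothetical names for illustration. The point is that the empty set is a valid commit and only two or more entries are reported as an error:

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- Hypothetical error type, carrying the number of entries that were found.
newtype CommitError = MoreThanOneUtxoCommitted Int
  deriving (Eq, Show)

-- Narrow a UTxO set down to what a single commit can carry:
-- nothing, exactly one entry, or an error for anything larger.
singleCommit :: Map k v -> Either CommitError (Maybe (k, v))
singleCommit utxo =
  case Map.toList utxo of
    [] -> Right Nothing
    [entry] -> Right (Just entry)
    entries -> Left (MoreThanOneUtxoCommitted (length entries))
```

Keeping the empty case separate from the error case avoids the misleading report seen in the ETE test when a node commits nothing.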

Weekly Review

What we achieved?

Swapped Hydra node to use real cardano node on a devnet, removing Mock chain
Working on making master "green" again following big changes
Demo works again with cardano node

What we plan to do?

Properly commit and fanout Utxo from/to the real chain
Design and implement handling of rollbacks
Start implementing proper OCV in Plutus (again)
Follow-up meetings with potential Hydra early adopters

2021-11-18

Seems like haskell.nix puts an exe into the shell env when it's mentioned as a build-tool-depends in one of the local packages' .cabal
When showing the demo, ensure the devnet is wiped and cardano-node is restarted, otherwise the hydra-node (its wallet) could not find a "seed input" and crashes (for now at least)

2021-11-17

SN fixing docker setup

To have the cardano-node docker (entrypoint) not remove key arguments and indeed produce blocks it requires the environment variable CARDANO_BLOCK_PRODUCER=true
- https://github.com/input-output-hk/cardano-node/blob/master/nix/docker/context/bin/entrypoint seems to be the entrypoint in use
When getting NoLedgerView errors, updating genesis systemStart (byron + shelley) to be "within some time" helps
Investigating why the returned tx is not captured by observeInitTx
- We should really log why something is not an initTx etc. -> Either Reason OnChainTx
Found the reason:
- observeInitTx thought party is not elem of the converted parties
- convertParty creates un-aliased Party from the chain data
Possible solutions:
- alias of Party is not taken into account for Eq of Party
- strip alias from party before checking elem party parties
- Do not incorporate alias into party but wrap it with data AliasedParty = AliasedParty Text Party
Receive a CommandFailed when trying to commit
Ran into the problem that the hydra-tui was showing "Initializing" == InitialState, but in fact we were only in "ReadyState" -> this is because we violated "make impossible states unrepresentable" when managing the state in Hydra TUI! :(
Make HeadLogic "alias"-proof by adding an aliased party in HeadLogicSpec
Lots of repetition between README.md and demo/README.md, especially after introducing prepare-devnet.sh -> only explain in demo/?
Chain works, but seems like hydra-nodes are not connecting to each other (using direct cabal invocation of demo setup)
Nodes seem to be connected in the docker-compose setting
- No tx submission in an open head though with repeating ReqTx:

  {"message":{"node":{"by":{"alias":"alice","vkey":"000000000000002a"},"event":{"message":{"transaction":{"witnesses":{"scripts":{},"keys":["820082582060cdff1c5cd672fb7d8df7f60121fabd4416b2381df70d5c65cb1559af81599858406d08cd35336088575712d8f7fb5fc96a9e29fa6c89305a920aa41e2162a98b0daeb82a7696d14cd9ff6b308eebf71620f354a6820467d87ca5ff8ca383f10705"]},"body":{"outputs":[{"address":"addr_test1vre6wmj9zmh0fjfavedh6q9lq32lunnlseda4xk7t0cg47sal9qft","value":{"lovelace":1893670963}}],"mint":{"lovelace":0},"auxiliaryDataHash":null,"withdrawals":[],"certificates":[],"fees":0,"inputs":["ae85d245a3d00bfde01f59f3c4fe0b4bfae1cb37e9cf91929eadcea4985711de#93"],"validity":{"notBefore":null,"notAfter":null}},"id":"7071e48915eb9c3de986cef336544c24af8a01eedbf4721ddbef6fce0b591ad3","auxiliaryData":null},"party":{"alias":"alice","vkey":"000000000000002a"},"tag":"ReqTx"},"tag":"NetworkEvent"},"tag":"ProcessedEvent"},"tag":"Node"},"timestamp":"2021-11-17T18:21:07.164988302Z","namespace":"HydraNode-1","threadId":33}

Ensemble Programming

Just realised we have this section in shell.nix

  tools = [
    pkgs.pkgconfig
    pkgs.haskellPackages.ghcid
    pkgs.haskellPackages.hspec-discover
    pkgs.haskellPackages.graphmod
    pkgs.haskellPackages.cabal-plan
    pkgs.haskellPackages.cabal-fmt
    # Handy to interact with the hydra-node via websockets
    pkgs.ws
    # For validating JSON instances against a pre-defined schema
    pkgs.python3Packages.jsonschema
    pkgs.yq
    # For plotting results of local-cluster benchmarks
    pkgs.gnuplot
  ];

but actually it's not used and the tools listed are not available on the command-line! Seems like they are used in the shell based on haskell.nix but not in the cabal only shell

Problems on master:

Flaky test on DirectChain
Test checking conformance of logs with schema does not seem to catch undocumented ctors

Adding an item in the backlog for rollbacks which we should handle sooner rather than later

Looks like the flakiness of DirectChainSpec comes from the use of withCluster which starts 3 nodes and produces rollbacks

We need to add waitForSocket everywhere which is clumsy => refactor to move into withBFTNode

Last step before merge to master = make benchmark runnable again

We need to generate all key pairs for all nodes in the cluster and then write relevant files for each node
We need the (Cardano) keys to modify the entries in initialFunds -> move them before we start BFTNode
Passing the list of verification keys to makeNodeConfig so that we do the change to genesis-shelley.json inside the withBFT function => add empty list when not needed
Using lenses to update the initialFunds field, we can use the addField function from CardanoNode
We need to encode the VerificationKey PaymentKey we have into Hex-encoded thingy

We can run the benchmarks and got results 🎉

Seems like re-running benchmarks does not work correctly now

How to deal with rollbacks?

resubmit transaction when it gets rolled back? if it is valid
indicate to the user probability of a rollback -> %age of stability = paranoia level
making it a function of value committed? overridable with own settings
we could replay the sequence of events? => genuine rollbacks (non-adversarial)
if L1 can rollback so can L2
once enough time has passed no rollback can happen => only need to keep stream of events until $k$ slots has passed
the off-chain can start from where it was, the latest snapshot if the Head can be reopened with same UTXO set
if contestation period is shorter than rollback period this could be a security issue
we could not submit the exact same close because the validator would check the contestation period extends from the start of the close
txs in the mempool would be replayed automatically in case of rollbacks => we'll observe them in the ChainSync
user might want to introspect the on-chain state?
PAB does not do anything about rollbacks -> pushing it to users
what to expose to users? stability level, probability that there will be a rollback (99.99% is a few dozen blocks)
provide a HeadRollbacked output to users
practically, most rollbacks have been pretty small (< k/4)
https://plutus-apps.readthedocs.io/en/latest/plutus/howtos/handling-blockchain-events.html
https://plutus-apps.readthedocs.io/en/latest/plutus/explanations/rollback.html
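
The idea of keeping the stream of events only until $k$ slots have passed, and replaying what a rollback undid, could look roughly like the sketch below. Everything here is hypothetical (EventLog, record, rollbackTo, and using a plain slot window in place of the real stability rule); it only illustrates the bookkeeping, not how the Hydra node does or will do it.

```haskell
-- Slots are plain integers in this sketch.
type Slot = Integer

-- Observed events, newest first, each tagged with the slot it was seen in.
newtype EventLog e = EventLog [(Slot, e)]
  deriving (Eq, Show)

emptyLog :: EventLog e
emptyLog = EventLog []

-- Record a new event and forget everything that is at least `window` slots
-- older than it, assuming such events can no longer be rolled back.
record :: Slot -> Slot -> e -> EventLog e -> EventLog e
record window slot ev (EventLog evs) =
  EventLog ((slot, ev) : takeWhile (\(s, _) -> s > slot - window) evs)

-- Roll back to `point`: returns the undone events (oldest first, so they can
-- be replayed or resubmitted in order) together with the remaining log.
rollbackTo :: Slot -> EventLog e -> ([e], EventLog e)
rollbackTo point (EventLog evs) =
  let (undone, kept) = span (\(s, _) -> s > point) evs
   in (reverse (map snd undone), EventLog kept)
```

A rollback deeper than the retained window simply returns everything that is still in the log, which is the situation where a HeadRollbacked-style notification to the user would be the only remaining option.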

Some documentation on Settlement error

SettlementError(b, eps, g) = g * exp(-0.69 - b * [0.249 * eps^{2.5} + 0.221 * eps^{3.5}])

Parameters:
b: the number of blocks on top of the transaction in question;
eps = 1 - 2*[adversarial stake], where the adversarial stake is a real between 0 and ½;
g: the grinding power of the adversary. A single-CPU grinding would correspond to g=10^5;  a conservative default choice could be g=10^8 corresponding to a 1000-CPU grinding.

The resulting SettlementError(b, eps, g) is an estimate of the probability that a valid transaction appearing b blocks deep can be later invalidated. Here exp(X) refers to e^X where e is the base of the natural logarithm.

Note: given the crude estimate on grinding coming from the factor g, for small values of b the formula will produce outputs greater than 1 (until the exponential term becomes small enough to counter the effect of g). This simply means that for such small values of b this method does not provide any guarantees.
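
As a worked illustration, the formula above translates directly into code. The function names (settlementError, epsFromStake, blocksNeeded) are ours and not from any Cardano library; blocksNeeded is a naive linear search added only to show how a "paranoia level" could be turned into a block depth, assuming g is finite.

```haskell
import Data.List (find)

-- SettlementError(b, eps, g) exactly as given above.
settlementError :: Double -> Double -> Double -> Double
settlementError b eps g =
  g * exp (-0.69 - b * (0.249 * eps ** 2.5 + 0.221 * eps ** 3.5))

-- eps = 1 - 2 * [adversarial stake], with the stake between 0 and 1/2.
epsFromStake :: Double -> Double
epsFromStake adversarialStake = 1 - 2 * adversarialStake

-- Smallest block depth at which the estimate drops below `target`.
-- Returns Nothing when no depth can help (eps <= 0) or the target is not positive.
blocksNeeded :: Double -> Double -> Double -> Maybe Int
blocksNeeded target eps g
  | target <= 0 || eps <= 0 = Nothing
  | otherwise = find (\n -> settlementError (fromIntegral n) eps g < target) [0 ..]
```

For example, `blocksNeeded 1e-4 (epsFromStake 0.1) 1e8` asks how deep a transaction must be for a 99.99% stability level under the conservative g=10^8 default and a 10% adversarial stake.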

2021-11-16

SN fixing Demo instructions

Add instructions on how to start a local, single cardano-node devnet.
Generating a topology file feels annoying, but providing all peers as arguments (as we do) might scale worse?

  echo '{"Producers": [{"addr": "127.0.0.1", "port": 3001, "valency": 1}]}' > topology.json

Got a HandShakeError with VersionMismatch
- was led astray on updating to a newer cardano-node dependency in our code
- however, our code was already newer than the docker image so supporting the latest + one before was the solution

hydra-node can connect after fixing protocol version, initTx is created and submitted, but not observed

Also when re-trying / re-submitting the node crashes with

hydra-node: cannot find a seed input to pass to Init transaction
CallStack (from HasCallStack):
  error, called at src/Relude/Debug.hs:288:11 in relude-1.0.0.1-KWrPF7zdlFZ8gdnjuoSoUr:Relude.Debug
  error, called at src/Hydra/Chain/Direct.hs:359:13 in hydra-node-0.1.0-inplace:Hydra.Chain.Direct

After restarting the hydra-node and re-trying [i]nit this is the error

hydra-node: failed to submit tx: HardForkApplyTxErrFromEra S (S (S (S (Z (WrapApplyTxErr {unwrapApplyTxErr = ApplyTxError [UtxowFailure (WrappedShelleyEraFailure (UtxoFailure (ValueNotConservedUTxO (Value 0 (fromList [])) (Value 900000000000 (fromList []))))),UtxowFailure (WrappedShelleyEraFailure (UtxoFailure (BadInputsUTxO (fromList [TxIn (TxId {_unTxId = SafeHash "39786f186d94d8dd0b4fcf05d1458b18cd5fd8c6823364612f4a3c11b77e7cc7"}) 0]))))]})))))
CallStack (from HasCallStack):
  error, called at src/Relude/Debug.hs:288:11 in relude-1.0.0.1-KWrPF7zdlFZ8gdnjuoSoUr:Relude.Debug
  error, called at src/Hydra/Chain/Direct.hs:328:34 in hydra-node-0.1.0-inplace:Hydra.Chain.Direct

When adding all the block signing keys the node spams a weird log message (error?)

{"thread":"31","loc":null,"data":{"val":{"kind":"TraceNoLedgerView","slot":16370873857},"credentials":"Cardano"},"sev":"Error","env":"1.30.1:0fb43","msg":"","app":[],"host":"eiger","pid":"1","ns":["cardano.node.Forge"],"at":"2021-11-16T18:29:45.70Z"}
{"thread":"31","loc":null,"data":{"kind":"TraceStartLeadershipCheck","chainDensity":0,"slot":16370873858,"delegMapSize":0,"utxoSize":6,"credentials":"Cardano"},"sev":"Info","env":"1.30.1:0fb43","msg":"","app":[],"host":"eiger","pid":"1","ns":["cardano.node.LeadershipCheck"],"at":"2021-11-16T18:29:45.80Z"}

Re-using a db + cardano-node run command from e2e test works (hydra-node shows initializing!)
Manually invoking cardano-node keeps producing errors (and no blocks), this time:

{"thread":"32","loc":null,"data":{"val":{"kind":"TraceNodeNotLeader","slot":3462},"credentials":"Cardano"},"sev":"Info","env":"1.30.1:0fb43","msg":"","app":[],"host":"eiger","pid":"1","ns":["cardano.node.Forge"],"at":"2021-11-16T18:43:29.20Z"}

AB Solo Programming

Goal: remove a pendingWith statement in a test

Master does not compile, so reverting to the last green point, which is 21 days in the past. We should really take care not to break master in the future, as this prevents rapid intervention and branching when need be (not that I am a big fan of branching anyhow)

Trying to reactivate ServerSpec test, seems like there's a race condition.

I don't understand how the code works anymore so it's unclear to me why it's failing, this has to do with more messages coming than expected 🤔 ? Here is a trace I see

received "{\"me\":{\"vkey\":\"0000000000000001\"},\"tag\":\"Greetings\"}"
resp: Greetings {me = 0000000000000001}
received "{\"me\":{\"vkey\":\"0000000000000001\"},\"tag\":\"Greetings\"}"
resp: Greetings {me = 0000000000000001}
sending ReadyToCommit {parties = fromList []}

Trying to augment timeout does not work

Adding showLogsOnFailure to have the server's traces displayed

Of course, the client stops after receiving one message so it never waits for everything after the greetings....
That was easy :)

2021-11-15

Ensemble Programming

Goal: Use DirectChain in EndToEndSpec test

Added signing and verification keys for Cardano tx

We can see the initTx being submitted and added to the Mempool, but it does not become part of a block
Adding JSON instances to DirectChainLog in order to have better traces (we currently pass a nullTracer because we don't have those instances). Note that we don't write full instances because ValidatedTx is pretty complex so just use its show instance

We need to increase timeout for observing init and commit transactions

We see 1 node can commit but the other nodes are crashing with no ownInitial which says they cannot extract their own pkh from the initials Utxo => 🤦 We forgot to add our own pkh to the list of initials!
We fail to observe headIsOpen because of timeout again

Block length on our cluster is 2 seconds but we produce one block out of 3 because of our config which has 3 validators => we need 6 seconds to produce a block

It depends on the slotLength and activeSlotCoeff: 100ms * 20 = 2s
The "rule" is that $3k / f > epochLength$
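
The arithmetic behind the 2 and 6 second figures can be written down as a small sketch. It assumes an activeSlotCoeff of 0.05 (inferred from the "100ms * 20" above) and equal stake across the 3 nodes; the function names are hypothetical.

```haskell
-- Expected time between blocks across the whole cluster, in milliseconds:
-- a slot has a leader with probability activeSlotCoeff.
expectedBlockIntervalMs :: Rational -> Rational -> Rational
expectedBlockIntervalMs slotLengthMs activeSlotCoeff =
  slotLengthMs / activeSlotCoeff

-- With n producers of equal stake, a given node forges roughly every n-th
-- block, so its own expected interval is n times longer.
perNodeIntervalMs :: Int -> Rational -> Rational -> Rational
perNodeIntervalMs nodes slotLengthMs activeSlotCoeff =
  fromIntegral nodes * expectedBlockIntervalMs slotLengthMs activeSlotCoeff
```

These are expectations, not bounds: leader election is random, so individual gaps can be noticeably longer, which matters when picking test timeouts.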

Changed active slot coeff to 1.0 but to no avail, we still miss some commits and don't see the head being opened

Carol has no fund so she can't do anything...

We got a green ETE test with DirectChain 🎉

Tests are a bit slow though, even when we increase slot coefficient

We have a problem with the benchmarks: they run a cluster with an arbitrary number of nodes, so we need to generate n addresses and keys, one for each node. Plan is:

read genesis file as JSON (raw JSON, we don't care to use cardano-api)
generate right number of key pairs
need to store the files in the temporary directory where we run the cluster
convert to CBOR encoded address using buildAddress from CardanoClient
inject into initialFunds field in genesis-shelley.json

After switching to use withDirectChain in Bench.EndToEnd we have issues with overlapping instances on Utxo CardanoTx type which is an alias to the ledger's type

Cardano.Api provides ToJSON instances which we should probably use, but we know cardano-api is prone to breaking changes. Also it has no FromJSON for some of the types which is annoying as we rely on those in various tests
Got stuck down a rabbit hole with lot of tests failing => revert and use custom function to bypass overlapping instances + comment out some tests relying on Arbitrary instances for DirectChainLog

Ending the day with only DirectChainSpec tests failing, unclear why.

2021-11-09

Ensemble Session

Goal: Complete the OCV logic with Fanout

We have troubles with rollbacks in the chain: We observe some rollbacks even though there should not be any because we are on BFT nodes => Trying to run the network with a single BFT node makes the flakiness disappear

BFT nodes should not do rollbacks but we have rollbacks, hence it's probably not BFT nodes we are running.

While writing observeFanoutTx we are stuck with an issue: The test fails but there's no obvious reason why it's failing, seems like the redeemer (Fanout) cannot be decoded correctly.

Turns out it was a copy-paste issue 🤦

Ended up having the full test with mock fanout transaction (eg. one without actual committing txs) passing:

We have a single node to avoid rollbacks
The transition SM is still very simple, we probably want to ditch it altogether as we are handling the state threading by hand
Refactored CardanoCluster and CardanoNode to make it easy to run a single node cluster, also removing copying of keys which is unneeded in the cluster. We can just load the keys from where they are stored in the source tree

2021-11-01

SN Solo

Work on open & close via Direct chain (using mock validators)
Patterns start to arise: there are many things which could be DRYed
- observing state machine transitions are very similar
- keeping track of "interesting" utxo in OnChainHeadState is very similar
- constructing SM transition txs is very similar
When adding the closing SnapshotNumber it was interesting to observe that initially I kept it as Datum, but to observeCloseTx it was more appropriate to keep it as Redeemer and decode it from that
- There was no need to store it as Datum (right now)
- Also: Redeemers are more space efficient as we would only include them in the spending tx
Concerning OnChainHeadState
- Seeing the repetition in OnChainHeadState of (TxIn, TxOut, Data) triples really gives the hint that this state-tracking code could also be generalized and only keep track of "interesting" utxo + their data
- OTOH, a "head identifier" is currently implicitly encoded in the threadOutput TxOut address, while it would make perfect sense to add it to the PostChainTx/OnChainTx types to describe "which head to abort" etc.
- This makes me think that we could keep the whole state (head id + interesting utxo) abstractly in the HeadState, e.g. existentially quantified; That way, we would not need TVar in the Chain.Direct and make the whole component stateless!

October 2021

2021-10-29

SN Solo

Copy & extend e2e test to also cover posting & observing CollectComTx
This should be fairly easy, as we know the total utxo from the PostChainTx value
- OCV (not covered now) would "only" need to check all committed, i.e. all PTs present
- How far to go right now? collectCom could just ignore all the committed utxo?
- Is probably the smallest step, but there would be no real value in the head (or the ledger would not allow it)
When drafting collectCom I realize that we do not need HeadParameters, but rather Data Era
- it is enough to just keep the Datum around uninterpreted in the OnChainHeadState
When fixing TxSpec usage of construct functions (because changed signatures), I realize that "cover fee" test was more arbitrary than necessary
- It was "side-loading" initial inputs, instead of feeding the initTx outputs into abortTx
The more complex tests in TxSpec cry for some refactoring now
- Some DSL or operators to easily construct outputs, datums and "forward" them from one to the next tx would help
Continuing with implementing collectComTx via the canonical transaction size prop test
- also about 7kB transaction size for most arbitrary :: Utxo SimpleTx
Next: roundtrip-test with a newly created observeCollectComTx
- As the OnCollectComTx is actually holding no data, this should be quite trivial
observeCollectComTx is just the same as observeAbortTx and can be obviously DRYed
- deliberately holding back on it though
- it's possible that something comes up which makes it not as straight-forward
Unit tests pass now. Quick confusion about why it passes even though datum hash of provided output is SNothing
To make the e2e "open Head" test pass, I only needed to plug observeCollectComTx into the <|> sequence of runOnChainTxs .. that was easy!
- also, runOnChainTxs feels a bit off in Chain.Direct.Tx -> moving it to Chain.Direct
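
The way observers are chained with <|> can be illustrated on toy types. Tx, the string tags and the bodies of the observeXXX functions below are hypothetical simplifications (the real ones inspect ledger transactions, and the actual signature of runOnChainTxs may differ); only the composition pattern is the point.

```haskell
import Control.Applicative ((<|>))
import Data.Maybe (mapMaybe)

-- Toy transaction carrying just a tag; stands in for a ledger ValidatedTx.
newtype Tx = Tx {txTag :: String}

data OnChainTx = OnInitTx | OnCommitTx | OnCollectComTx | OnAbortTx
  deriving (Eq, Show)

-- Helper to build toy observers that recognise a transaction by its tag.
observeTag :: String -> OnChainTx -> Tx -> Maybe OnChainTx
observeTag tag result tx
  | txTag tx == tag = Just result
  | otherwise = Nothing

observeInitTx, observeCommitTx, observeCollectComTx, observeAbortTx :: Tx -> Maybe OnChainTx
observeInitTx = observeTag "init" OnInitTx
observeCommitTx = observeTag "commit" OnCommitTx
observeCollectComTx = observeTag "collectCom" OnCollectComTx
observeAbortTx = observeTag "abort" OnAbortTx

-- The first observer returning a Just wins; adding a new kind of transaction
-- means adding one more alternative to this chain.
observeTx :: Tx -> Maybe OnChainTx
observeTx tx =
  observeInitTx tx
    <|> observeCommitTx tx
    <|> observeCollectComTx tx
    <|> observeAbortTx tx

-- Keep only the transactions some observer recognised.
runOnChainTxs :: [Tx] -> [OnChainTx]
runOnChainTxs = mapMaybe observeTx
```

The flip side of this pattern is that an observer missing from the chain fails silently: its transactions are just dropped as "not interesting".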

2021-10-28

Pairing session

Made abortTx unit property tests pass by improving observeAbortTx, which requires to pass a Utxo to observeAbortTx now
When adding initials outputs to initTx, we need to store the PubKeyHash of each participant's Cardano credential!
- This is not yet kept around
- We need to add the Cardano credentials of all the participants to initTx construction
Discussing on DirectChainSpec where the cardano credentials for participants should go now
First try: Adding them to the HeadParameters analogously to parties = [alice, bob, carol]
- We know this is brittle and morally we would change Party to relate Hydra and Cardano (public) credentials to each other
Second try: Add it to InitTx for now as it's used in fewer places
Realize adding it to InitTx is already involved
- the lowest hanging fruit may be to pass it to the withDirectChain and thus make it "non-configurable"
- would not work as we want to open subsets of participants?
Third try: Start from bottom-up instead and work on initTx + observeInitTx for now
- not worry yet about where the keys come from
Seeing the "not observed if not invited" test raises the question whether we should determine being part of the Head using the Hydra credentials or Cardano credentials?
This is an interesting case where we could have used the Mikado method to safely and incrementally build a plan to make cardano credentials available in Party

SN Solo

Start putting credentials into withDirectChain and see where this gets me
Surprisingly, the Chain.Direct integration test passes with [] as cardano keys -> this should matter
Also interesting: only the "can commit" e2e test fails!

  1) Test.DirectChain can commit
       uncaught exception: ErrorCall
       no ownInitial: []
       CallStack (from HasCallStack):
         error, called at src/Relude/Debug.hs:288:11 in relude-1.0.0.1-KWrPF7zdlFZ8gdnjuoSoUr:Relude.Debug
         error, called at src/Hydra/Chain/Direct.hs:306:24 in hydra-node-0.1.0-inplace:Hydra.Chain.Direct

Passing [aliceCardanoVk] makes the commit test progress further

  src/Hydra/Chain/Direct.hs:275:15:
  1) Test.DirectChain can commit
       uncaught exception: ErrorCall
       failed to cover fee for transaction: ErrUnknownInput ...
       CallStack (from HasCallStack):
         error, called at src/Relude/Debug.hs:288:11 in relude-1.0.0.1-KWrPF7zdlFZ8gdnjuoSoUr:Relude.Debug
         error, called at src/Hydra/Chain/Direct.hs:275:15 in hydra-node-0.1.0-inplace:Hydra.Chain.Direct

Of course: initials are not in knownUtxo
Need to add TxOut to initials of OnChainHeadState
- this was a bit messy and the tuples are really crying for a refactor
"can commit" e2e test progresses, but times out
- log is not very conclusive
- try adding more cardano logs to debug there
For some reason I have not seen the TraceMempoolRejectedTx error before...

{"thread":"80","loc":null,"data":{"tx":{"txid":"txid: TxId {_unTxId = SafeHash \"7cda5fb5d5828c4cf1081f406cb1cc3d0241e2b8aa824b3682edfc4ea64c8138\"}"},"mempoolSize":{"numTxs":0,"bytes":0},"kind":"TraceMempoolRejectedTx","err":{"received":["fb5a425ee6b4da39fd9074006af88d7675e24acad19f252c0e133f379d1246c4"],"required":["2502ff9c9c341dd1384724ae35eab0b19e394c90226892fcc8e7cc86342d324e"],"kind":"MissingRequiredDatums","scripts":{"12c4f8ff8070f0d659bdb2ecf844190ddb1134ef821764bd2b4649b2":{"spending":"cf4be62b474fe3047bc8630f462c0e130cb7064872a4e417da70ba321faf34e2#1"}},"errors":[{"kind":"CollectError","scriptpurpose":{"spending":"cf4be62b474fe3047bc8630f462c0e130cb7064872a4e417da70ba321faf34e2#1"},"error":"NoRedeemer"}]}},"sev":"Info","env":"1.30.0:a7085","msg":"","app":[],"host":"eiger","pid":"1980729","ns":["cardano.node.Mempool"],"at":"2021-10-28T16:47:44.00Z"}

"Handling" the reject result in txSubmissionClient has the test fail right away and no need to dig into log files! 🎉
- For example

  src/Hydra/Chain/Direct.hs:289:38:
  1) Test.DirectChain can commit
       uncaught exception: ErrorCall
       failed to submit tx: HardForkApplyTxErrFromEra S (S (S (S (Z (WrapApplyTxErr {unwrapApplyTxErr = ApplyTxError [UtxowFailure (MissingRequiredDatums (fromList [SafeHash "2502ff9c9c341dd1384724ae35eab0b19e394c90226892fcc8e7cc86342d324e"]) (fromList [SafeHash "fb5a425ee6b4da39fd9074006af88d7675e24acad19f252c0e133f379d1246c4"]))]})))))
       CallStack (from HasCallStack):
         error, called at src/Relude/Debug.hs:288:11 in relude-1.0.0.1-KWrPF7zdlFZ8gdnjuoSoUr:Relude.Debug
         error, called at src/Hydra/Chain/Direct.hs:289:38 in hydra-node-0.1.0-inplace:Hydra.Chain.Direct

Reason: commitTx is not including an initial redeemer
- our unit tests would ideally balance, sign and validate txs against a ledger to catch this earlier
After adding datum/redeemer the tx submission does not fail anymore, but test times out
- enabling tracers in cardano-node to debug this
There are funny things in the logs like FetchDeclineChainNotPlausible

...declined","declined":"FetchDeclineChainNotPlausible","peer":{"remote":{"addr":"127.0.0.1","port":"25898"},"local":{"addr":"127.0.0.1","port":"41739"}}},{"length":"1","kind":"FetchDecision

Reading our own logs would help though.. seems we see the tx, but not the corresponding OnChainTx

FromDirectChain "alice" (ReceiveTxs {onChainTxs = [], receivedTxs = [ValidatedTx {...

Duh.. observeCommitTx was not even called from Direct chain / runOnChainTxs
- What kind of test would cover / improve this?
Commit e2e test passes!

2021-10-27

Pairing session

Continue with creating a MockCommit script to focus on off-chain parts for now
- the Commit script is not wrong per se, but we would not know as no test is covering them right now
- also, any change driven by the off-chain logic (i.e. observing a commit tx) would force us updating the validator logic (not being checked etc.)
How to identify a commit tx?
- looking at the outputs? a single pay to v_commit?
- using the PT?
After introducing a onchain Utxo type, we realize that convertUtxo :: OnChain.Utxo -> Utxo tx is tricky
- after a short discussion we decided to go via a binary representation
- for now ToJSON / FromJSON, later CBOR or so
- this allows us to use the Tx type class to convert to and from the on-chain Utxo
Observing a commit tx works
- we were surprised by the small size and consistent size (~7kB)
- adding a scale (*100) had the transaction size property fail
- so it likely is because the Gen (Utxo SimpleTx) is staying in reasonable orders of magnitude and it's just Integers that we store
The local-cluster test of committing via DirectChain still fails
- with a non-saying PostTxFailed
- led us to changing a Maybe function chain back to use error for better visibility
- obviously we should improve error handling!
The reason is that initials = []
- as a next step, creating outputs which pay to a MockInitial script could work

SN Solo

After introducing a MockInitial script some unit tests fail

seems like abortTx + observeAbortTx do not yield a Just anymore
could it be that the mocked redeemer type () messes with Head.Abort?
can't seem to spot the bug in observeAbortTx and how it would lead to Nothing.. too long of a day as it seems

Found the reason: observeAbortTx does think it sees a Just CollectCom when encountering DataConstr Constr 0 [] in the redeemers, although that was the plutus equivalent of () (the MockInitial redeemer type)

Next step: make observeAbortTx more robust
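
To make the confusion concrete, here is a toy model of the problem and of one possible fix. The Data type is cut down to constructor applications, the constructor indices and the ScriptHash strings are assumptions for illustration, and observeRobust is only one way the observer could be hardened (matching the redeemer to the script it is meant for before decoding).

```haskell
import Data.Maybe (listToMaybe, mapMaybe)

-- Minimal model of Plutus Data, restricted to constructor applications.
data Data = Constr Integer [Data]
  deriving (Eq, Show)

data HeadRedeemer = CollectCom | Abort
  deriving (Eq, Show)

-- Assumed encoding: CollectCom is constructor 0, Abort is constructor 1.
decodeHeadRedeemer :: Data -> Maybe HeadRedeemer
decodeHeadRedeemer (Constr 0 []) = Just CollectCom
decodeHeadRedeemer (Constr 1 []) = Just Abort
decodeHeadRedeemer _ = Nothing

-- The unit redeemer of the mock initial script has the same shape as CollectCom.
unitData :: Data
unitData = Constr 0 []

-- Redeemers paired with the (hypothetical) hash of the script they are for.
type ScriptHash = String

-- Fragile: take the first redeemer that happens to decode as a HeadRedeemer.
observeFragile :: [(ScriptHash, Data)] -> Maybe HeadRedeemer
observeFragile = listToMaybe . mapMaybe (decodeHeadRedeemer . snd)

-- More robust: only decode the redeemer that targets the head script.
observeRobust :: ScriptHash -> [(ScriptHash, Data)] -> Maybe HeadRedeemer
observeRobust headScript redeemers =
  lookup headScript redeemers >>= decodeHeadRedeemer
```

For redeemers `[("initial", unitData), ("head", Constr 1 [])]` the fragile variant reports CollectCom, while `observeRobust "head"` reports Abort.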

2021-10-26

Pairing Session

Discussion about how postTx would fail if a transaction is invalid (or could not be submitted)
- Synchronous failure via a return value or exception vs. asynchronous failure via the callback "back-channel"
- Easier to stick with the callback (OnChainTx) for now
- How to extend that type to accommodate a PostTxFailed -> wrap or additional data constructor?
- Decided for the latter
- Test assertion is then:
```
failAfter 10 $ takeMVar calledBackBob `shouldReturn` PostTxFailed
```
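
A self-contained sketch of the "additional data constructor" choice and the asynchronous back-channel, using only base. The OnChainTx constructors other than PostTxFailed, the postTx signature and awaitCallback are simplified assumptions; the real chain component submits through a queue to a tx submission client.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import System.Timeout (timeout)

-- Failure is just one more thing the chain can report back.
data OnChainTx = OnInitTx | OnAbortTx | PostTxFailed
  deriving (Eq, Show)

type ChainCallback = OnChainTx -> IO ()

-- Hypothetical submission: returns immediately, the outcome arrives later
-- through the callback, exactly like an observed transaction would.
postTx :: ChainCallback -> Bool -> IO ()
postTx callback isValid = do
  _ <- forkIO (callback (if isValid then OnAbortTx else PostTxFailed))
  pure ()

-- Wait for the next callback value, giving up after the given microseconds.
awaitCallback :: Int -> MVar OnChainTx -> IO (Maybe OnChainTx)
awaitCallback micros = timeout micros . takeMVar
```

A test in the spirit of the assertion above would create an MVar with newEmptyMVar, pass `putMVar calledBack` as the callback, and expect `awaitCallback` to yield `Just PostTxFailed` for an invalid transaction.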
Bob thinks he is part of the party now
- Sees the InitTx even though he is not invited
- Two ways:
  - Provide the hydra credential to the direct chain and check HeadParameters against it
  - Require to be paid (via a script) a participation token, spendable by our Cardano credentials
Take small steps: We go with checking Hydra credentials being in parameters or not in off-chain
Formulating a property test in TxSpec to facilitate "not being able to observe init tx when not invited"
- required adding Party everywhere in Chain.Direct
Prop tests pass, local-cluster test errs because can't post AbortTx in closed state
- this catches the invalid scenario even in a synchronous manner
- in contrast to knowing whether a posted tx failed (tx submission thread is asynchronously connected via a queue)
After wrestling a bit with the tx submission code, we get the DirectChainSpec test to pass!
- unfortunately, this is still not forcing us to do proper on-chain validation
- the stateful nature of the DirectChain component prohibits accidentally closing "other heads"

2021-10-25

AB Solo

Today's goals:

Finish reading Maladex's white-paper (they talk quite a lot about Hydra)
Timebox trying to upgrade dependencies to make it possible to construct Abort transaction with cardano-api
Draft new ADRs
Add PT minting, initial validator, and burning to init and abort txs

Trying to find a suitable set of dependencies to be able to build Hydra OCV transactions with cardano-api. Found recent commits in cardano-node and cardano-ledger-specs that work fine, but this leads to issues in plutus because of dependencies on stuff in networking. Removing all dependencies on PAB and off-chain Contract from hydra-node to try to get to a smaller set of dependencies in Plutus.

Sadly, the on-chain part of the StateMachine code is in the new repository plutus-apps, which depends on an older version of cardano-node: it is from more than one month ago and does not contain the changes we need, eg. allowing to store script as part of output's datum.

After upgrading dependencies for cardano-node, cardano-ledger-specs and plutus, and removing stuff related to the PAB, the local cluster test using cardano-api built transactions for init and abort passes:

Test.LocalCluster
  should produce blocks and provide funds

Finished in 6.8452 seconds
1 example, 0 failures
Test suite integration: PASS

I had to vendorize the StateMachine related code from plutus-apps repository in order to make it work though.

Master build failed following merge, investigating why and fixing it:

Timeout for observing transaction submission on-chain might be too low for slow CI, might want to increase it
Also, when tests fail, we cannot easily access the logs that were generated from the tests failure, we could upload the logs to some S3/google storage bucket when CI tests fail

Drafting ADRs related to cardano-api and direct chain interaction

Trying to write a test to introduce the need for Participation Tokens minting in order to make it possible to Commit funds to an opened Head and thus actually do useful stuff with an opened Head

The purpose of the PTs is to

ensure only identified Head participants can commit
ensure only Head participants can advance the Head, eg. post transactions for the OCV state machine

The question is: How does one observe the rejection of a transaction submission, given this process is asynchronous?

Seems like the "right way" to do this would be to add a constructor in the OnChainTx data type representing failed submissions, so that the node is notified when a submission fails?
When we post a transaction we actually need to evaluate the validators in order to be able to balance it and assign execution units, so we know whether or not the transaction is correct before submitting, even though a submitted transaction can still "fail", eg. being rolled back or rejected by the ledger because of double spending
There are quite a few error calls in Direct module anyways, that should be handled in one way or another

Direct chain test is still failing apparently randomly, might come from issues in handling of rollbacks in the wallet: https://github.com/input-output-hk/hydra-poc/runs/3998479889?check_suite_focus=true#step:7:2699

2021-10-22

Possible FAQ material from #ask-hydra on Discord

When Hydra (on mainnet)?
What does opening a Hydra Head actually mean?
Can a Hydra Head be opened from a mobile phone?
How can a Hydra Head (auto-)scale horizontally?
What are the limitations for Hydra when fully implemented?
Where does the DApp go in Hydra?
How would Hydra work with AMMs (automated market makers)?
Is Hydra a mixer?
How is Hydra different from (zk-)rollups?

AB on Cardano-API

Trying to execute init -> abort script sequence using CardanoClient, eg. only using cardano-api stuff
Currently lost in the maze of type wrappers for scripts...

Got a surprising error when trying to submit transaction:

       uncaught exception: CardanoClientException
       BuildException "TxBodyError TxBodyMissingProtocolParams"

The Protocol parameters are required in the TxBodyContent to build the body in Alonzo era

So the script got executed but it produces the infamous error:

       uncaught exception: CardanoClientException
       BuildException "TxBodyScriptExecutionError [(ScriptWitnessIndexTxIn 0,ScriptErrorEvaluationFailed (CekError An error has occurred:  User error:\nThe provided Plutus code called 'error'.))]"

Activated debugging output for scripts execution to try to see what's happening, but there are no logged errors, so it's a more general problem with the way the transaction is built.

Trying to add collateral input does not fix the problem.
MB spots (at least) one problem with the transaction: The datum is not part of it so the Head.validatorScript will fail because the state machine requires both datums (input and output) to be present in the transaction.
As observed by SN, there used to be no way in the cardano-api to include a Datum in a transaction, but this has changed "recently": One can now construct a TxOut passing either a TxOutDatumHash or a TxOutDatum which will then be included in the scripts' context

2021-10-21

Pairing session

Problem is: coverFee does not know what's the input value to balance, inputs only is a txref
- short discussion about what to do now
- SN mentions this is a "self-made problem" as there is a split between the chain component and the tiny wallet, where the chain component would know the relevant utxo
- By keeping the separation we might end up with a good interface of "doing it externally" later
Start pairing by adding a UTXO to lookup inputs to coverFee :: ValidatedTx Era -> STM m (Either ErrCoverFee (ValidatedTx Era))
After adding lookupUtxo to coverFee_ we introduce knownUtxo to get a set of known utxo from the Chain component's OnChainHeadState to provide to coverFee
- SN is not convinced by the OnChainHeadState in general
- But this provides for the right info at the right time without refactoring too much now: knownUtxo :: OnChainHeadState -> Map (TxIn StandardCrypto) (TxOut Era)
Consequently we also add TxOut values to the OnChainHeadState's Initial constructor
- This required more work in observeInitTx to provide the TxOut and made its implementation more complex
- But we also fixed the "bug" that its thread output might not always be the firstInput
When trying to fix observeAbortTx with the additional txout for the threadOutput...
- fixing it is smelly as it is not used at all in observeAbortTx and one would rather expect a single "Head identifier" or so
- Remove the unused types with intention to re-add what the observeXXX require further down the road to not "overgeneralize"
- Removing the OnChainHeadState made the signatures of observeAbortTx again a -> Maybe (OnChainTx, OnChainHeadState) and simplified implementation of observeInitTx again
WalletSpec now failing because we changed interface and semantics of coverFee
Need to also resolveInput in walletUtxo (besides lookupUtxo)
We now get an ErrorCall from the ledger for too big TxOut values
- scaling to reasonablySized generators helped
coverFee results in double the amount -> we counted the selected utxo twice
- include the selected utxo to cover fee to the inputs (use inputs' for resolvedInputs)
MissingScriptUTXOW remains an error
- we saw this in the past
- this time again, the ledger sees less needed than provided scripts -> confusing error
- in our case it was the abort tx not spending any 'initial' outputs (yet)
We get FeeTooSmallUTxO error now from the ledger
- after initial confusion about whether the fee was too low or whether minFeeA and minFeeB being 0 could be the problem
- we found out that the fee is in fact just too low to cover script execution etc.
- our cluster had too high executionPrices -> use realistic (mainnet) values for genesis-alonzo.json
The integration test passes!
- We have a full roundtrip of posting and observing Init -> Abort, i.e. the short Head lifecycle (with many simplifications)!
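
The coverFee changes discussed above (resolving inputs through a lookup, and counting the selected UTXO only once) can be sketched as follows. This is a minimal illustration, not the actual Wallet code: the types TxIn and Lovelace, the error constructors and the function coverFeeSketch are all hypothetical simplifications, and the "selection" simply picks the first wallet UTXO.

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

-- Hypothetical simplified types: an input is a label, values are plain lovelace.
type TxIn = String
type Lovelace = Integer

data ErrCoverFee
  = ErrNoWalletUtxo
  | ErrUnknownInput TxIn
  | ErrNotEnoughFunds Lovelace -- amount still missing
  deriving (Eq, Show)

-- | Balance a transaction described by its inputs, total output value and fee.
-- The first map plays the role of lookupUtxo (what the chain component knows),
-- the second one is the wallet's own UTXO.
coverFeeSketch ::
  Map TxIn Lovelace ->
  Map TxIn Lovelace ->
  [TxIn] ->
  Lovelace ->
  Lovelace ->
  Either ErrCoverFee ([TxIn], Lovelace) -- final inputs and change
coverFeeSketch lookupUtxo walletUtxo inputs outputs fee =
  case Map.lookupMin walletUtxo of
    Nothing -> Left ErrNoWalletUtxo
    Just (selected, _) -> do
      -- The selected UTXO becomes part of the inputs, so resolving inputs'
      -- counts its value exactly once.
      let inputs' = selected : inputs
          known = walletUtxo `Map.union` lookupUtxo
      resolved <- traverse (resolve known) inputs'
      let change = sum resolved - outputs - fee
      if change < 0
        then Left (ErrNotEnoughFunds (negate change))
        else Right (inputs', change)
 where
  resolve known i = maybe (Left (ErrUnknownInput i)) Right (Map.lookup i known)
```

Summing only over the resolved inputs' (instead of adding the selected value on top of the resolved inputs) is what avoids the "double the amount" result noted above.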

2021-10-20

Engineering Meeting

Another round of discussions over deposit-based Head allowing more participants off-chain than on-chain, e.g. variation on the idea of distinguishing between running a Hydra Head node and using a Hydra Head to make transactions.
There is some research in that direction, along the lines of the tail protocol but removing the requirement for large amount of collateral deposit from intermediaries
We discussed a related approach proposed by Matthias based on deposits and inspired by Lightning and drew it up on our Miro board

Direct Chain

Analysing transaction that fails validation to check our scripts execution logic and redeemers setting

Turns out the issue in the coverFee_ test came from missing coins in the abortTx output: In the initTx output, we add a fixed 2000000 lovelace, but in the output of the abortTx we set the fee to 0
Now more tests are failing, probably because of changes in coverFee. Also the previous abortTx property fails because of the missing 2000000.

Test checking abortTx transition was flaky because we were looking at the datums to identify the aborttx, but it could be the case that we decode the initial datum first -> Refactored code to look at the redeemers instead of the datums

Test is still failing because of the 0 execution units

Writing test to check coverFee_ updates and cover execution cost of scripts

Instead of calculating the exact execution cost for the redeemers, we take the maximum for a Tx and divide by the number of redeemers.
But it's not much more complicated to call the actual function for computing exunits... 🤔
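
A minimal sketch of the "maximum divided by number of redeemers" shortcut, assuming a hypothetical ExUnits record standing in for the ledger's type (memory and CPU steps); the function name splitMaxExUnits is made up for illustration.

```haskell
-- Hypothetical stand-in for the ledger's execution units (memory, CPU steps).
data ExUnits = ExUnits {exMem :: Integer, exSteps :: Integer}
  deriving (Eq, Show)

-- | Give each redeemer an equal share of the per-transaction maximum.
-- Returns Nothing when there is no redeemer to budget for.
splitMaxExUnits :: ExUnits -> Int -> Maybe ExUnits
splitMaxExUnits (ExUnits mem steps) n
  | n <= 0 = Nothing
  | otherwise = Just (ExUnits (mem `div` n') (steps `div` n'))
 where
  n' = toInteger n
```

This over-approximates the budget of every script, which is simple but means the fee is computed from the worst case rather than from the actual evaluation.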

We notice we are consuming the same UTXO in both transactions because the Wallet does not remove the UTXO it consumes when it covers fees

retry in STM for the UTXO set to change when retrieving it
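
A minimal sketch of that idea, assuming the wallet keeps its UTXO set in a TVar; awaitUtxoChange and exampleAwait are hypothetical names, and only primitives from GHC.Conc in base are used.

```haskell
import GHC.Conc (STM, TVar, atomically, newTVarIO, readTVar, retry, writeTVar)

-- | Block (via 'retry') until the stored UTXO set differs from the one the
-- caller has already seen, then return the new set.
awaitUtxoChange :: Eq utxo => TVar utxo -> utxo -> STM utxo
awaitUtxoChange var seen = do
  current <- readTVar var
  if current == seen then retry else pure current

-- | Usage: once the set has been updated, a reader that only knew the old
-- set gets the new one without blocking.
exampleAwait :: IO [String]
exampleAwait = do
  var <- newTVarIO ["utxo-a"]
  atomically (writeTVar var ["utxo-b"])
  atomically (awaitUtxoChange var ["utxo-a"])
```

With retry, the transaction is only re-run when the TVar is written to, so there is no busy polling while waiting for the next block to update the wallet.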

The error we now get is weird, it seems to be a mix of several errors:

It's unbalanced
It's using a UTXO which has already been spent before (39786f186d94d8dd0b4fcf05d1458b18cd5fd8c6823364612f4a3c11b77e7cc7)
The MissingScriptWitnessesUTXOW error should be accompanied with a list of missing hashes
=> We should improve error reporting in our Direct chain test to understand better why it fails...

2021-10-19

AB Solo

Continue working on replacing script with actual calls to cardano-api, part as educational work, part to build a proper CardanoClient we'll be able to use elsewhere when interacting with a node.

Struggling to extract the value from the output: Some functions are very recent and not available in our version of the API which is already 3 weeks old
Replaced more cardano-cli calls with custom function in CardanoClient module, could go with more but I would like to have the master build.

Working on making master green, e.g. have DirectChainSpec validate

Added query for PParams as it's needed for computing scriptIntegrityHash
Compute scriptIntegrityHash in coverFee_ -> InitTx passes correctly, now tackling AbortTx which is failing currently

Adding a property test checking coverFee_ does the right thing with execution units computation and setting redeemers pointers right should it add a UTXO

There's this annoying PParams lying around, going to add it to some fixture code so that we can use it in different modules.
Managed to get the property for coverFee to fail with the "right" error, e.g. covering of the transaction succeeds but validation fails because the rdptr is not correctly set, going to write code to fix that :fingers_crossed:

Adjusting the redeemer pointers, the idea is to compare the two sorted list of inputs and adjust those RdmrPtr for which the initial value is different from the final one, once we have added the input for balancing tx and paying fees.
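
The adjustment described above can be sketched like this. It is a simplified illustration under stated assumptions: pointers are plain Int indices into the sorted input list (the ledger's RdmrPtr also carries a purpose tag, ignored here), and adjustRedeemerPointers is a hypothetical name.

```haskell
import Data.List (elemIndex, sort)
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

-- | Re-key redeemers after the input set has changed. A pointer is the
-- position of the spent input in the sorted list of inputs, so adding an
-- input for balancing can shift the pointers of inputs sorting after it.
adjustRedeemerPointers ::
  Ord txin =>
  [txin] -> -- inputs before balancing
  [txin] -> -- inputs after balancing
  Map Int r -> -- redeemers keyed by their initial pointer
  Maybe (Map Int r)
adjustRedeemerPointers initialInputs finalInputs redeemers =
  Map.fromList <$> traverse move (Map.toList redeemers)
 where
  initial = sort initialInputs
  final = sort finalInputs

  -- Find which input the pointer referred to, then where it ended up.
  move (ptr, r) = do
    txin <- lookupAt ptr initial
    ptr' <- elemIndex txin final
    pure (ptr', r)

  lookupAt i xs
    | i < 0 = Nothing
    | otherwise = case drop i xs of
        (x : _) -> Just x
        [] -> Nothing
```

For example, with initial inputs ["b#0", "d#0"] and redeemers at pointers 0 and 1, adding "a#0" moves them to 1 and 2, while adding "c#0" leaves the first at 0 and moves the second to 2.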

It's not working, still got a script execution error for one of the redeemers but it seems the adjustment of redeemer pointers actually worked.

2021-10-18

AB Solo

Writing a cardano-cli "wrapper" module, something that would provide useful functions for common operations in order to remove the need to run scripts from the command-line, running into a couple snags:

The generators exposed in the sub-library for cardano-api use Hedgehog instead of QuickCheck, so we need to convert them but this loses shrinking capability
Of course, the cardano-cli uses types from cardano-api, but our Wallet uses cardano-ledger-specs types. Going to write the module using cardano-api types

While trying to replace address building from cardano-cli with Haskell code I got the following error:

  test/Test/LocalClusterSpec.hs:25:11:
  2) Test.LocalCluster should produce blocks and provide funds
       uncaught exception: ErrorCall
       InputTextEnvelopeError (TextEnvelopeTypeError [TextEnvelopeType "PaymentVerificationKeyShelley_ed25519"] (TextEnvelopeType "GenesisUTxOVerificationKey_ed25519"))
       CallStack (from HasCallStack):
         error, called at src/Relude/Debug.hs:288:11 in relude-1.0.0.1-KWrPF7zdlFZ8gdnjuoSoUr:Relude.Debug
         error, called at test/Test/LocalClusterSpec.hs:45:21 in main:Test.LocalClusterSpec
         assertCanSpendInitialFunds, called at test/Test/LocalClusterSpec.hs:25:11 in main:Test.LocalClusterSpec

although the underlying representation is the same, it fails to deserialise properly because of the envelope tag, so need to make address more robust. There is the castVerificationKey function which could be used for that?

In the genesis-shelley.json file we need a base16 encoding of the address but cardano-cli uses bech32

cardano-cli address build --testnet-magic 42 --payment-verification-key-file alice.sk | bech32

Direct Chain on Local Cluster

Managed to have a Direct component connected to the local cluster, had to tweak some parameters in the cardano-node.json to have protocols activated.

We can see the transaction submitted but it fails, first with minimum output value, then with not being balanced -> need to wire wallet for balancing
coverFee in Wallet takes a TxBody and not a ValidatedTx, so we need to adapt it
Still missing computation of scriptIntegrityHash which should be done as part of tx balancing, fee payment and signing because it requires the final set of inputs to be defined

2021-10-15

Changes in dependencies

Updated GHC to version 8.10.7
Updated haskell.nix to version fd4d10efe278ba9ef26229a031f2b26b09ed83ff
Removed nix dependency to cardanoPkgs
- It's probably what caused me a lot of trouble when I first tried to update haskell.nix a week ago
- local-cluster now uses build-tool-depends to pull cardano-node and cardano-cli in scope which guarantees we use the same version everywhere
Updated plutus version to 5ffcfa6c0451b3b937c4b69d2575cd55adebe88b
- Updated ledger and node packages to suit plutus' dependencies
- Note this implies we temporarily depend on a fork of cardano-ledger-specs: https://github.com/raduom/cardano-ledger-specs, which does not yet contain major changes to directory structure
Fixed minor changes in API following dependencies update
Updated cabal.project's index-state to 2021-08-14T00:00:00Z
- This needs to be older than haskell.nix's version but I am really unsure how they relate to each other
- Ideally, we should need to change only one of those to ensure pinned dependencies

Alonzo Local Cluster

Goal for today:

Have a working Alonzo cluster with funds in local-cluster package
Rename local-cluster -> hydra-cluster and rename modules

I can see the cluster tries to start but there's no logs, need to store the logs as we do in the EndToEndSpec tests

Added capture of logs to the CardanoNode process wrapper but seems like no logs are output => need to add backends to the node's configuration -> Now have logs activated for all cardano-nodes, in JSON
Nodes are starting up and apparently succeeding in their connection, not sure why they are not producing logs? Our waitForNewBlock function is a bit crude => making it a bit smarter actively waiting for a new block to be produced

Adding a test checking we can make a simple payment transaction using initial funds:

Test simply executes a script using cardano-cli, this sounded much easier than trying to replicate all commands within Haskell
Struggling to get the script to run correctly, it cannot find the cardano-cli executable in its PATH even though I pass the testing process' environment to it => I was incorrectly calling [ -x cardano-cli]
I can see the transaction submitted to the node-1's mempool but it seems to never end in a block?

The transaction appears to be rejected by the mempool:

{"thread":"73","loc":null,"data":{"tx":{"txid":"txid: TxId {_unTxId = SafeHash \"00943cd84146550c7162ba5fc9d2bdef940afddfc4712e916060af5373acefdb\"}"},"mempoolSize":{"numTxs":1,"bytes":234},"kind":"TraceMempoolRejectedTx","err":{"produced":{"policies":{},"lovelace":900000000000},"kind":"ValueNotConservedUTxO","consumed":{"policies":{},"lovelace":0},"badInputs":["2fec440b7b461450420820a57f913d17525bc915da37d86e0423775110a05683#0"],"error":"This transaction consumed Value 0 (fromList []) but produced Value 900000000000 (fromList [])"}},"sev":"Info","env":"1.30.0:7ff91","msg":"","app":[],"host":"haskell-","pid":"696941","ns":["cardano.node.Mempool"],"at":"2021-10-15T08:10:56.70Z"}

It takes time for the UTXO to be committed, going for an active loop in the script to check the newly created UTXO presence:

One problem in the script: the -e option fails the entire script as soon as 1 sub-command fails, so grep failing meant the script exited immediately
Another problem: I was using some syntax not available in plain /bin/sh so it was running somewhat differently inside the test and outside of it....
It actually takes a while to have the transaction correctly submitted => Reduced slot length to 100ms and it's definitely much faster :)

2021-10-14

PR for AbortTx is green 🎉 🍾

Plans for today:

merge PR to master
send head -> abort transaction sequence to a local testnet

Now trying again to submit an aborttx to the testnet manually

Note: We should make available soon a Hydra Test Cluster "framework" or package that will make it easy for people to start a local cluster and interact with it programmatically, eg. expose our HydraNode from the local-cluster package to be used downstream and not only the docker-compose file. When working on the cardano-node cluster, I have found that having scripts is great but having our local-cluster is even greater as it should now be straightforward to wrap any test within this cluster and connect our hydra-nodes to it. Exposing this feature to other developers would be super-useful.

Discussing with MB issues with running scripts on-chain:

configure local-cluster to start in Alonzo era with some funded UTXOs
- see https://github.com/input-output-hk/cardano-configurations to pull config
fix how we create transactions:
- correctly assign ex units => need to evaluate the Tx and update ex units in redeemers
- script integrity hash -> see hashScriptIntegrity in Ledger
- add collateral input -> should be done in the tinyWallet reusing the one and only input we have in the wallet
- beware of redeemer pointers logic: balancing the tx adds a new input which can change the pointer logic
- see Shelley/Transaction.hs in cardano-wallet
- beware of error in execution units -> seems like one needs to take a large margin (like 2x)
- script execution takes maximum 6 ADA
wire wallet in the Direct.Tx to cover fees
- should also assign redeemers?

To debug scripts failure need a cardano-node rebuilt with tweaked cardano-ledger-specs to provide logging output when evaluating scripts

Wrote a simple script to submit transactions for a head until the abort tx.

Managed to get some debugging output:

["L1","Ld","S5","PT5"]

L1 -> Output constraint failed

Ld is used for 2 things:

    MustSatisfyAnyOf xs ->
        traceIfFalse "Ld" -- "MustSatisfyAnyOf"
        $ any (checkTxConstraint ctx) xs

{-# INLINABLE checkScriptContext #-}
-- | Does the 'ScriptContext' satisfy the constraints?
checkScriptContext :: forall i o. ToData o => TxConstraints i o -> ScriptContext -> Bool
checkScriptContext TxConstraints{txConstraints, txOwnInputs, txOwnOutputs} ptx =
    traceIfFalse "Ld" -- "checkScriptContext failed"
    $ all (checkTxConstraint ptx) txConstraints
    && all (checkOwnInputConstraint ptx) txOwnInputs
    && all (checkOwnOutputConstraint ptx) txOwnOutputs

S5 -> "State transition invalid - constraints not satisfied by ScriptContext" from the StateMachine library
PT5 -> generic error when a check is false

Getting an even more puzzling error when trying to submit txs with MockHead script:

cardano-cli transaction build --alonzo-era --cardano-mode --testnet-magic 42 --change-address $(cat alice/payment.addr) --tx-in 0e2c5cb64ca7012dd235bad5b00fc8bf86662172e8af600a35aa5d42e761e5c3#1 --tx-in-collateral 0e2c5cb64ca7012dd235bad5b00fc8bf86662172e8af600a35aa5d42e761e5c3#0 --tx-out $(cardano-cli address build --payment-script-file mockHeadScript.plutus --testnet-magic 42)+1000 --tx-out-datum-hash ff5f5c41a5884f08c6e2055d2c44d4b2548b5fc30b47efaa7d337219190886c5 --tx-in-script-file mockHeadScript.plutus --tx-in-datum-file headDatum.data --tx-in-redeemer-file headRedeemer.data --out-file abort.draft --protocol-params-file example/pparams.json
Command failed: transaction build  Error: The transaction does not balance in its use of ada. The net balance of the transaction is negative: Lovelace (-21987700) lovelace. The usual solution is to provide more inputs, or inputs with more ada.

This shows I don't provide enough input to pay for the scripts' execution according to the result of evaluating execution units -> providing more input makes the transaction valid and submitted successfully.

L1 failure code happens in checkOwnOutputConstraint which checks the datum hash of the outputs of the transaction. It's as if it was missing the abort datum, but the hash is actually present in the transaction.

Trying to use the --tx-out-datum-file option which passes a hash extracted from datum file does not work either.
🎉 The trick is to use --tx-out-datum-embed-file which puts the data in the transaction!
This means transactions working with the state machine always require to have the datums of its output included and not only the hashes, which increases the tx size requirements esp. in the case of Hydra where the state of the SM can potentially be large

Configuring our local-cluster to start in alonzo and have initial funds

Extended LocalClusterSpec to check we can actually spend outputs from initial funds. Writing it all in Haskell would be a PITA so I will first try to do that using an embedded shell script.
Got an error when starting the cluster, so need to not delete the created directory if the tests fail.
LocalClusterSpec is failing with new configuration, but not sure if this is not just a timeout problem -> need to gather logs...

2021-10-12

Dependencies and Build

Managed to have the project build outside nix-shell, sidestepping some issue in the retrieval of blocks in the Wallet module. All but 3 tests pass:

TxSpec test for aborttx fails, I suspect this is caused by the change to validatorHash now that the dependencies are updated
it cannot find the cardano-node nor cardano-cli executables -> they should be provided in build-tool-depends as suggested by MPJ
prometheus monitoring fails, perhaps a side effect of the above

What I did:

Install ghcup and set versions to GHC 8.10.7
```
$ curl --proto '=https' --tlsv1.2 -sSf https://get-ghcup.haskell.org | sh
$ ghcup set ghc 8.10.7
```
I had some troubles with my path because I asked to append the ~/.ghcup/bin to PATH instead of prepend

Install various system dependencies

$ sudo apt install -y  build-essential curl libffi-dev libffi7 libgmp-dev libgmp10 libncurses-dev libncurses5 libtinfo5
$ sudo apt install -y  libz-dev liblzma-dev libzmq3-dev pkg-config libtool

Do not confuse lzma with liblzma-dev, those are 2 distinct existing packages

Install forked libsodium

git clone https://github.com/input-output-hk/libsodium
cd libsodium/
git checkout 66f017f16633f2060db25e17c170c2afa0f2a8a1
./autogen.sh
./configure
make && sudo make install

Build and test everything:
```
cabal build all && cabal test all
```

Trying to update haskell.nix at fd4d10efe278ba9ef26229a031f2b26b09ed83ff and using ghc8107 -> updated materialisation first -> nix-shell works fine and I can build hydra 🎉 locally on my VM.

Submitting Plutus tx

It appears the datum file which the cardano-cli requires must be the JSON representation of the Data and not the TextEnvelope. => making the changes in the inspect-script code...

Doing Aeson.encode $ toData mydata does not work, it generates some CBOR encoding and not JSON
Turns out one must not use serialiseToTextEnvelope for datums but scriptDataToJson ScriptDataJsonDetailedSchema. But scriptDataToJson operates on a ScriptData which is a mirror in cardano-api of Plutus' Data so first one must convert toData and then use Cardano.Api.Shelley.scriptDataFromPlutus

From Duncan on slack:

if you're starting from a type from the underlying ledger rather than starting with the API types, then yes you can use the conversion functions to/from the underlying ledger types. The general principle of the API is that Cardano.Api exports everything at the level of the API types and Cardano.Api.Byron or Cardano.Api.Shelley exports and exposes all the underlying representations and conversion functions. So since you want to "lift the lid" and see all the representations (i.e. using fromPlutusData) then you want to import Cardano.Api.Shelley.

The generated datums should be OK for now, rebuilding cardano-node in order to be able to start a local cluster and retry submitting my script transactions.

The generated TextEnvelope for scripts have an incorrect type but not sure why? -> Need to use a Ledger.Scripts.toCardanoApiScript on Plutus' scripts to "convert"

Trying to fix the last remaining important issue following up dependencies upgrade, namely the Point Block conversion problem: We have a Point (ShelleyBlock (AlonzoEra c)) and we want a Point (CardanoBlock c).

castPoint definition requires coercibility between 2 different HeaderHash type family instances:

castPoint :: Coercible (HeaderHash b) (HeaderHash b') => Point b -> Point b'

The concrete needed coercion looks like:

 ShelleyHash StandardCrypto
             -> OneEraHash
                  '[Ouroboros.Consensus.Byron.Ledger.Block.ByronBlock,
                    ShelleyBlock (Cardano.Ledger.Shelley.ShelleyEra StandardCrypto),
                    ShelleyBlock (Cardano.Ledger.Allegra.AllegraEra StandardCrypto),
                    ShelleyBlock (Cardano.Ledger.Mary.MaryEra StandardCrypto),
                    ShelleyBlock (Cardano.Ledger.Alonzo.AlonzoEra StandardCrypto)]

Following the chain of definitions and imports gives me:

newtype OneEraHash (xs :: [k]) = OneEraHash { getOneEraHash :: ShortByteString }

then

newtype ShelleyHash c = ShelleyHash {
       unShelleyHash :: SL.HashHeader c
     }
...
type instance HeaderHash (ShelleyBlock era) = ShelleyHash (EraCrypto era)

with SL.HashHeader being

newtype HashHeader crypto = HashHeader {unHashHeader :: Hash crypto (BHeader crypto)}

then

type Hash c = Hash.Hash (HASH c)

where module Hash is ultimately Cardano.Crypto.Hash.Class which does not export the constructor for newtype Hash

module Cardano.Crypto.Hash.Class
  ( HashAlgorithm (..)
  , sizeHash
  , ByteString
  , Hash(UnsafeHash)
...
newtype Hash h a = UnsafeHashRep (PackedBytes (SizeHash h))
...
pattern UnsafeHash :: forall h a. HashAlgorithm h => ShortByteString -> Hash h a
pattern UnsafeHash bytes <- UnsafeHashRep (unpackBytes -> bytes)
  where
  UnsafeHash bytes = UnsafeHashRep (packBytes bytes :: PackedBytes (SizeHash h))
{-# COMPLETE UnsafeHash #-}

So if I cannot coerce I could just pattern match and get the underlying ShortByteString and rewrap it.

Right tip -> do
  let blk = case tip of
        GenesisPoint -> GenesisPoint
        (BlockPoint slot h) -> BlockPoint slot (fromShelleyHash h)
      fromShelleyHash (Ledger.unHashHeader . unShelleyHash -> UnsafeHash h) = coerce h
      query = QueryIfCurrentAlonzo $ GetUTxOByAddress (Set.singleton address)
  pure $ LSQ.SendMsgQuery (BlockQuery query) (clientStQueryingUtxo blk)

Now I need to fix the TxSpec test which fails, probably because serialisation has been fixed in Plutus

✅ Replaced convoluted Initial.Dependencies hash computation with validatorHash and script validates

Rebasing ch1bo/aborttx branch over master as it does not have some changes improving over flakiness of monitoring tests

Back to submitting transactions, restarting a cluster and recreating a user:

mkdir alice
cd alice
cardano-cli address key-gen --verification-key-file payment.vkey --signing-key-file payment.skey
cardano-cli address build --testnet-magic 42 --payment-verification-key-file payment.vkey > payment.addr
cd ..
cardano-cli query utxo --testnet-magic 42 --address $(cat alice/payment.addr)

I managed to build the aborttx transaction except the script validation failed before submission:

cardano-cli transaction build --alonzo-era --cardano-mode --testnet-magic 42 --change-address addr_test1vqpfgdh6ldx73nypc5hkur2wm2hpt0kx240qlxvykhy8efc74sfu5 --tx-in 6bd74fd0e48e6a35c4fd59ba474b671866f115bc67fc8d6d84259e45e229bf15#1 --tx-in-collateral 6bd74fd0e48e6a35c4fd59ba474b671866f115bc67fc8d6d84259e45e229bf15#0 --tx-out addr_test1wp3urt44rzvpsj2fu696su9ee573m6ne0ce4uydhcdnwhkshjamur+1000 --tx-out-datum-hash ff5f5c41a5884f08c6e2055d2c44d4b2548b5fc30b47efaa7d337219190886c5 --tx-in-script-file headScript.plutus --tx-in-datum-file headDatum.data --tx-in-redeemer-file headRedeemer.data --out-file abort.draft --protocol-params-file example/pparams.json
Command failed: transaction build  Error: The following scripts have execution failures:
the script for transaction input 0 (in the order of the TxIds) failed with:
The Plutus script evaluation failed: An error has occurred:  User error:
The provided Plutus code called 'erro

🤦 It's perfectly possible all this dance on dependencies upgrade was unneeded to submit transactions manually. I thought the formats had changed over the past few weeks but turns out I was using the wrong serialisation functions...

2021-10-11

Today's goal:

Spin-up an Alonzo (local) test network
(optional) Send transactions from our Direct chain component to this network

The scripts directory in cardano-node repo contains mkfiles.sh script that does the necessary magic to create an Alonzo network, either transitioning all the way from Byron to Alonzo, or hardforking immediately at epoch 0.

Script needs to be run from top-level:

$ scripts/byron-to-alonzo/mkfiles.sh
~/cardano-node/example ~/cardano-node
scripts/byron-to-alonzo/mkfiles.sh: line 205: cardano-cli: command not found

and requires cardano-cli and cardano-node to be available in PATH

I need a recent version of cardano-node obviously to start an alonzo network, latest version in master is 1.30.0, but the version we have in scope in hydra-poc is 1.27.0

Activating nix inside cardano-node directory through echo use nix > .envrc and direnv allow .envrc -> perhaps we should upgrade our dependencies after all...

Other option suggested by MB: https://github.com/input-output-hk/cardano-wallet/blob/master/lib/shelley/exe/local-cluster.hs

Wallet uses this version of cardano-node: https://github.com/input-output-hk/cardano-node/commits/0fb43f4e3da8b225f4f86557aed90a183981a64f

Cardano-node (master) depends on this plutus version:

commit edc6d4672c41de4485444122ff843bc86ff421a0
Merge: 569f98402 63c6ca8ac
Author: Michael Peyton Jones <michael.peyton-jones@iohk.io>
Date:   Fri Aug 20 10:43:53 2021 +0100

    Merge pull request #3430 from input-output-hk/hkm/windows-cross

    windows cross compile

Running scripts/byron-to-alonzo/mkfiles.sh alonzo "works": I can see 3 nodes up and running. Now need to understand how to post a transaction to them...

In the cardano-node scripts there's an initialFunds field which sets some lovelaces to some address, but in the wallet there's none and it says there can't be as it needs to transition from Byron to Shelley, but if we hard fork at epoch 0 immediately this should work?

cardano-cli can talk to the node and get some information:

$ CARDANO_NODE_SOCKET_PATH=example/node-bft1/node.sock cardano-cli query tip --cardano-mode --testnet-magic 42
{
    "epoch": 5,
    "hash": "48c4b8c546a0a9ffd0649a77b0926881e6e8869d83cb6da70f1d32ac9f936878",
    "slot": 2700,
    "block": 182,
    "era": "Alonzo",
    "syncProgress": "42.68"
}

I can also get the UTXO set of the network:

$ CARDANO_NODE_SOCKET_PATH=example/node-bft1/node.sock cardano-cli query utxo --cardano-mode --whole-utxo --testnet-magic 42
                           TxHash                                 TxIx        Amount
--------------------------------------------------------------------------------------
61fa39c2f3e110850c741da3a0f978bcee0fd9abfc7b0bce4df3ea047d61e824     0        5010000000 lovelace + TxOutDatumNone
704e1dc2f4dfcc44c0ba90978a0d58371b5f7ee1d3c47b1cedb52e1b1cb37b18     0        5010000000 lovelace + TxOutDatumNone
f2d04cab14eefbb0571e6a74b64b49453ac3312c20ce8fcda9d125c9020bd267     0        900000000000 lovelace + TxOutDatumNone

Leveraging SN's previous experiments to learn how to create a transaction to send some ADAs between addresses in the testnet

How do I get the details of a transaction using cardano-cli?
Going through https://github.com/input-output-hk/cardano-node/blob/master/doc/stake-pool-operations/simple_transaction.md to create a transaction to send some ADAs from the genesis funding transaction to some other user's address but it fails as I am missing the right key
The key to use is example/shelley/utxo-keys/utxo1.skey which results in successfully submitting transactions. I can see the transaction is successfully submitted but it does not appear when querying the utxo set: Possibly because I set the validity interval too far in the future and I need to wait? The node is stale and does not make progress anymore => Restarting from scratch

I was able to submit a transaction!

$ cardano-cli query utxo --whole-utxo --testnet-magic 42
                           TxHash                                 TxIx        Amount
--------------------------------------------------------------------------------------
06a82e6521f8d88a9ffe082f66f9f2bb114c9145d3f13cbfb36a3facba8d4de9     0        5010000000 lovelace + TxOutDatumNone
6cafe0b8352fa6bf5c7433bb668bf675c220d27adb42e1da28ab25741290176e     0        899999999599 lovelace + TxOutDatumNone
c989e8557c10a2fee3de5d37bc3858e4a3f2629d07d897f39d8d9ddf631e0c0f     0        5010000000 lovelace + TxOutDatumNone

Now going to submit some plutus transactions and check how it goes... There are a bunch of examples in scripts/plutus that seem interesting

I was able to run successfully :

$ scripts/plutus/example-txin-locking-plutus-script.sh guessinggame
                           TxHash                                 TxIx        Amount
--------------------------------------------------------------------------------------
9fd2c741e9b582328269dcd1ee5282625be36215126ae2ce0edc24f48de82057     1        10000000 lovelace + TxOutDatumNone

It does not work twice though, needed to do a minor change to retrieve the first Tx for given address -> updating script to select first transaction found

Next step: Generating needed files for our own scripts and datums -> Reviving old executable from MB that outputs a script in serialised form, useful for manually testing SC on a network

Some interesting and useful documentation available here on genesis configuration for Shelley
Plutus provides ways to export data for consumption by cardano-cli: https://plutus.readthedocs.io/en/latest/plutus/howtos/exporting-a-script.html -> One needs to serialise with TextEnvelope apparently

Looking at what the plutus script in cardano-node does:

plutusscriptaddr=$($CARDANO_CLI address build --payment-script-file "$plutusscriptinuse"  --testnet-magic "$TESTNET_MAGIC")

it constructs an address from the script file's content which indeed is an "enveloped" serialised script

Wrote an inspect-script executable that output scripts, datums and redeemers for init and abort transaction given a currency and a token. These are written using cardano-node's custom TextEnvelope format which is "semi-readable", now going to try to submit head then abort transaction.

Restarting network from scratch, creating 3 utxos for Alice to use in the head txs

Estimating fees:

$ cardano-cli transaction calculate-min-fee --tx-body-file tx.draft --tx-in-count 1 --tx-out-count 4 --witness-count 1 --byron-witness-count 0 --testnet-magic 42 --genesis example/shelley/genesis.json

I want to send the change back to the genesis utxo, so I need its address: how do I get that?

$ cardano-cli -- shelley address build \
    --payment-verification-key-file example/shelley/utxo-keys/utxo1.vkey \
    --testnet-magic 42
addr_test1vqcvgup2qg3uf525ln7xyj5ymenupyzq6shrwcq08nanm2s2708jd

then

$ cardano-cli transaction build-raw --tx-in 837b43e0ce1da9aabe9794a4c5f8e3da5fde73e5f24927a97862c776357790b3#0 --tx-out $(cat alice/payment.addr)+10000000  --tx-out $(cat alice/payment.addr)+10000000 --tx-out $(cat alice/payment.addr)+10000000 --tx-out addr_test1vqcvgup2qg3uf525ln7xyj5ymenupyzq6shrwcq08nanm2s2708jd+$((900000000000 - 10000000 - 10000000 - 10000000 - 601)) --invalid-hereafter 10000 --fee 601  --out-file tx.draf

signing and submission:

$ cardano-cli transaction sign --tx-body-file tx.draft --signing-key-file example/shelley/utxo-keys/utxo1.skey  --testnet-magic 42 --out-file tx.signed
$ cardano-cli transaction submit --tx-file tx.signed --testnet-magic 42
Transaction successfully submitted.

I now have 3 UTXOs to spend in the scripts.

$ cardano-cli query utxo --whole-utxo --testnet-magic 42
                           TxHash                                 TxIx        Amount
--------------------------------------------------------------------------------------
8930182280603aab400a1856daf20c63a6376cae31f2be584f2493f13fba3b22     0        5010000000 lovelace + TxOutDatumNone
b4f31ac83344988b1cd7bcf8bb150b9e3b4aca519b7f6bc89bc09d545b343f6f     0        10000000 lovelace + TxOutDatumNone
b4f31ac83344988b1cd7bcf8bb150b9e3b4aca519b7f6bc89bc09d545b343f6f     1        10000000 lovelace + TxOutDatumNone
b4f31ac83344988b1cd7bcf8bb150b9e3b4aca519b7f6bc89bc09d545b343f6f     2        10000000 lovelace + TxOutDatumNone
b4f31ac83344988b1cd7bcf8bb150b9e3b4aca519b7f6bc89bc09d545b343f6f     3        899969999399 lovelace + TxOutDatumNone
f517a7081008aa3e658a7f88ad0458bda733d22307a6401a1881cc12ff199890     0        5010000000 lovelace + TxOutDatumNone
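
As a sanity check on the change arithmetic in the build-raw command above, here is a minimal Haskell sketch. `changeOutput` is a hypothetical helper written for this note, not part of cardano-cli or the Hydra code base.

```haskell
-- | Lovelace left for the change output once the other outputs and the fee
-- are paid from a single input. Hypothetical helper for illustration only.
changeOutput :: Integer -> [Integer] -> Integer -> Integer
changeOutput input outputs fee = input - sum outputs - fee
```

With the values from this session, `changeOutput 900000000000 [10000000, 10000000, 10000000] 601` evaluates to `899969999399`, which matches the amount shown at TxIx 3 in the UTxO listing.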

Trying to generate script's address fails with

$ cat alice/headScript.plutus
...
"description":"headScript","type":"PlutusV1Script"}
$ cardano-cli address build --payment-script-file alice/headScript.plutus --testnet-magic 42
Command failed: address build  Error: alice/headScript.plutus: Error decoding script: TextEnvelope type error:  Expected one of: SimpleScriptV1, SimpleScriptV2, PlutusScriptV1 Actual: PlutusV1Script

The descriptor type has been changed in recent cardano-node versions, so I need to update the cardano-node dependencies to get the proper tag. For now, trying to simply change the type in the plutus files directly.

$ cardano-cli address build --payment-script-file alice/headScript.plutus --testnet-magic 42
addr_test1wq2rv89vr2mtkfmcqqpzwz0f88sv86h05cw8mz74vcyd9gclj6lqt

🤦 Actually I need the datum hash to build the tx, not the datum of course.

Creating draft tx for Head init tx without outputting any PTs

$ cardano-cli transaction build --alonzo-era --cardano-mode --testnet-magic 42 --change-address $(cat alice/payment.addr) --tx-in b4f31ac83344988b1cd7bcf8bb150b9e3b4aca519b7f6bc89bc09d545b343f6f#0 --tx-out $(cardano-cli address build --payment-script-file alice/headScript.plutus --testnet-magic 42)+1000 --tx-out-datum-hash a6196b078239886432cc8bb0f981cb9f7df54bcf2fb8951b01c6639104a10640 --out-file head.draft

I was finally able to submit the transaction successfully:

$ cardano-cli query utxo --whole-utxo --testnet-magic 42
                           TxHash                                 TxIx        Amount
--------------------------------------------------------------------------------------
6147dae7ecb37fc1ea0c34e32419c1cc5916244dfb94f1239622d65d0be0d23d     0        9998733 lovelace + TxOutDatumNone
6147dae7ecb37fc1ea0c34e32419c1cc5916244dfb94f1239622d65d0be0d23d     1        1000 lovelace + TxOutDatumHash ScriptDataInAlonzoEra "a6196b078239886432cc8bb0f981cb9f7df54bcf2fb8951b01c6639104a10640"
...

Now checking I can actually consume the transaction! Unfortunately, serialisation formats definitely have changed for datums too:

$ cardano-cli transaction build --alonzo-era --cardano-mode --testnet-magic 42 --change-address $(cat alice/payment.addr) --tx-in 6147dae7ecb37fc1ea0c34e32419c1cc5916244dfb94f1239622d65d0be0d23d#1 --tx-in-collateral 6147dae7ecb37fc1ea0c34e32419c1cc5916244dfb94f1239622d65d0be0d23d#0 --tx-out $(cardano-cli address build --payment-script-file alice/headScript.plutus --testnet-magic 42)+1000 --tx-out-datum-hash 08090cf3024c750773519501c52bec72749c28d8732dcafc3690c2f77793f84e --tx-in-script-file alice/headScript.plutus --tx-in-datum-file alice/headDatum.plutus --tx-in-redeemer-file alice/headRedeemer.plutus --out-file abort.draft
Command failed: transaction build  Error: Error reading metadata at: "alice/headDatum.plutus"
JSON schema error within the script data: {"cborHex":"d8799fd8799f1b000000e8d4a51000ff80ff","description":"headDatum","type":"ScriptDatum"}
JSON object does not match the schema.
Expected a single field named "int", "bytes", "string", "list" or "map".
Unexpected object field(s): {"cborHex":"d8799fd8799f1b000000e8d4a51000ff80ff","description":"headDatum","type":"ScriptDatum"}

⚠️ Trying (again) to upgrade dependencies following what's in the plutus cabal.project

When upgrading dependencies I have run into more nix/cabal/hackage issues and was unable to upgrade cabal.project alone.
Now trying to build the project with updated plutus dependencies not using nix: downloaded and installed ghcup with ghc version 8.10.7

2021-10-08

Goal: Upgrade dependencies to more recent Plutus, Ledger and Cardano-node

Try upgrading hydra-poc dependencies following https://github.com/CardanoSolutions/ogmios/blob/5048fb6cd9eb245b4062191220ad96e945d66258/server/cabal.project

Hitting an issue with plutus-contract package not building, checking what the dependencies are in Plutus at this commit

Dependencies in ogmios are actually too old for plutus. The revision pointed at dates back 2 months:

commit edc6d4672c41de4485444122ff843bc86ff421a0
Merge: 569f98402 63c6ca8ac
Author: Michael Peyton Jones <michael.peyton-jones@iohk.io>
Date:   Fri Aug 20 10:43:53 2021 +0100

    Merge pull request #3430 from input-output-hk/hkm/windows-cross

    windows cross compile

Starting from plutus' master which might be a better choice for us

Interestingly, plutus depends on a fork of cardano-ledger-specs:

source-repository-package
  type: git
  location: https://github.com/raduom/cardano-ledger-specs
  tag: ef6bb99782d61316da55470620c7da994cc352b2

The pointed at commit (https://github.com/raduom/cardano-ledger-specs/commit/ef6bb99782d61316da55470620c7da994cc352b2) says:

Make the code compile with a newer plutus version
 raduom/plutus-exbudget-error

Now trying to update cardano-ledger-specs following changes in the directory structure. Looks like updating those dependencies will be a nice 🐰 🕳️
Build fails because of missing liblzma dependency, added
```
    pkgs.lzma
```
to the shell.nix file and now it's recompiling nix!
Reverting all my changes as it has become a mess. Starting over from the nix shell dependencies as it seems to be the root to be updated. I am in the situation I definitely would like to avoid: I need to update dependencies and for this I need to modify nix stuff, which means I need to understand what I am doing and what to update where. But I don't really know what I am doing, and SN, who has been the one updating dependencies and maintaining the nix infrastructure in the past, is away -> Bus factor = 1
When he last updated dependencies, SN used nix-shell -A cabalOnly to avoid haskell.nix, which seems to have helped him; will try this.

Adding lzma dependency to the shell.nix, also upgrading GHC to 8.10.7, haskell.nix archive to a more recent one and nixpkgs reference to 21.05

Needed to materialise nix plan and it's now compiling all base dependencies
Struggling with nix to get the update to 8.10.7 to pass, now I need to add some more libsodium configuration for some packages from https://github.com/input-output-hk/plutus/blob/master/nix/pkgs/haskell/haskell.nix#L256 => Just duplicated the libsodium-vrf declaration from shell.nix to default.nix and it now seems to work

Seems like upgrading Plutus won't be easy: MPJ failed to upgrade dependencies to cardano-ledger-specs in their repository, he had to create a separate branch to add some pending changes. It's probably safer to stay as we are now even with the issues we are having.

List of architectural katas: http://nealford.com/katas/list.html

2021-10-07

Goal: Fix Direct on-chain component abort transaction validation failure

Got failing test for init -> abort transaction logic, going to add traces to understand what's failing
Checking the scripts and datums hashes to make sure the transaction provide all of them
Modifying cardano-ledger-specs to add more verbose output when script validation fails. Trying to enter nix-shell in cardano-ledger-specs to be able to test my changes, took about 10 minutes to enter nix shell, now doing a cabal build all in the ledger specs directory to compile stuff
- Depending on external dependencies like the ledger increases turnaround time to insane levels: I now need to check it works in the original repo before changing the reference in the hydra-poc repo, because otherwise every commit will require a full recompilation, which is ridiculously expensive
- cardano-api depends on ValidationFailed error's structure so I need to also adapt the code there because I changed the error in alonzo => Rather than modifying the cardano-ledger-specs I am going to work at a lower level, namely testing the script directly with Plutus, as a unit test
- Just added Verbose logs with Debug.Trace.trace in ledger spec then update the dependency in hydra-poc
Abort Tx test fails with
```
["mustRunContract: script not found.","Pd"]
```
which is exactly what MB was seeing the last time, which seems to imply the script cannot be found, either because the hash is invalid or some other reason. => looking at the source of the error

Trying to write a test using https://github.com/input-output-hk/plutus/blob/master/plutus-ledger-api/src/Plutus/V1/Ledger/Api.hs#L262

This is hard because I need to build the ScriptContext, which requires a full transaction that is difficult to build by hand. Could build the transaction in the ledger and then use the functions to translate it to Plutus, but thought I might as well check first what the logs are saying.

Trying to uncomment the mustRunContract function, which is the one resolving the contract references we need to validate the output is correctly spent. This function fails to resolve the script; when replaced with a const True function, the test does not pass but the script execution succeeds.

Display the Dependencies content and compare it with what's in the transaction. Hashed dependencies show:

[581cf0bce8043dc5f9c32ebad31652e239a8f15d1bf01f4d8d1b9740f73f,581c2e95c0a89c450a245d3324d16260797b54f2010e2ea494e5214323c9]

but the script hashes in the transaction are:

5d8dd23697de989275a58ef20edeacb320994f590cf0e10a0163cf3a
f0bce8043dc5f9c32ebad31652e239a8f15d1bf01f4d8d1b9740f73f

Trying to simplify dependencies hash computation to use the validatorHash provided in the SC code. AFAICT The validatorHash ultimately uses the same hash function, the one from Cardano.Ledger.Era.hashScript

Interestingly replacing the hash computation yielded the same hashes

So I still have the same error... Now investigating what the hashes look like on both sides and trying to find how the Credential in the TxInInfo we are filtering is constructed on the ledger side

transCred :: Credential keyrole crypto -> P.Credential
transCred (KeyHashObj (KeyHash (UnsafeHash kh))) = P.PubKeyCredential (P.PubKeyHash (P.toBuiltin (fromShort kh)))
transCred (ScriptHashObj (ScriptHash (UnsafeHash kh))) = P.ScriptCredential (P.ValidatorHash (P.toBuiltin (fromShort kh)))

🍾 I managed to have the abortTx validate its scripts. The issue was indeed in the way we construct the hashes. It was unclear to me why we were seeing different hashes between Plutus.validatorHash and Ledger.hashScript but I finally found the reason: We are using an "old" version of Plutus.

Hash Computation

Here is the code that computes a `ValidatorHash` given a script

validatorHash = ValidatorHash . scriptHash . getValidator

scriptHash :: Script -> Builtins.BuiltinByteString
scriptHash =
    toBuiltin
    . Cardano.Api.serialiseToRawBytes
    . Cardano.Api.hashScript
    . toCardanoApiScript

toCardanoApiScript :: Script -> Script.Script Script.PlutusScriptV1
toCardanoApiScript =
    Script.PlutusScript Script.PlutusScriptV1
    . Cardano.Api.PlutusScriptSerialised
    . SBS.toShort
    . BSL.toStrict
    . serialise

Then the code for Cardano.Api.Script.hashScript :

hashScript :: Script lang -> ScriptHash
hashScript (SimpleScript SimpleScriptV1 s) =
...
hashScript (PlutusScript PlutusScriptV1 (PlutusScriptSerialised script)) =
    -- For Plutus V1, we convert to the Alonzo-era version specifically and
    -- hash that. Later ledger eras have to be compatible anyway.
    ScriptHash
  . Ledger.hashScript @(ShelleyLedgerEra AlonzoEra)
  $ Alonzo.PlutusScript script

Where Cardano.Ledger.Era.hashScript is a method of the ValidateScript typeclass with era-dependent implementations. The generic implementation says:

-- UNLESS YOU UNDERSTAND THE SafeToHash class, AND THE ROLE OF THE scriptPrefixTag
hashScript =
  ScriptHash . Hash.castHash
    . Hash.hashWith
      (\x -> scriptPrefixTag @era x <> originalBytes x)

but the implementation for Alonzo says:

instance (CC.Crypto c) => Shelley.ValidateScript (AlonzoEra c) where
  scriptPrefixTag script =
    if isPlutusScript script
      then "\x01"
      else nativeMultiSigTag -- "\x00"

So it seems it hashes not only the script's serialised content but also a prefix tag of 0x01!

In the https://github.com/input-output-hk/cardano-node/blob/master/cardano-api/src/Cardano/Api/Eras.hs#L336 file we have:

  ShelleyLedgerEra AlonzoEra  = Ledger.StandardAlonzo

with the latter being defined in https://github.com/input-output-hk/ouroboros-network/blob/master/ouroboros-consensus-shelley/src/Ouroboros/Consensus/Shelley/Eras.hs#L89 as

type StandardAlonzo = AlonzoEra StandardCrypto

So all in all the computed hash values should be equal!

It turns out it all makes sense: the hashes are actually consistent, but only in a more recent version of the Plutus code than the one we are using! The version of plutus we use is at commit 36dcbb9140af0c9b5b741b6f7704497d901c9c65 which contains this code for hashing scripts:

scriptHash :: Serialise a => a -> Builtins.BuiltinByteString
scriptHash =
    toBuiltin
    . Crypto.hashToBytes
    . Crypto.hashWith @Crypto.Blake2b_224 id
    . Crypto.hashToBytes
    . Crypto.hashWith @Crypto.Blake2b_224 id
    . BSL.toStrict
    . serialise
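
To make the mismatch concrete, here is a hedged sketch contrasting the two schemes quoted above. The hash function is passed in as a parameter (an assumption made to keep the example free of crypto dependencies; the real code uses Blake2b_224), and both function names are hypothetical.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS

-- | Stand-in for Blake2b_224; supplied by the caller in this sketch.
type HashFn = ByteString -> ByteString

-- | Ledger scheme for Alonzo Plutus scripts: hash the 0x01 prefix tag
-- followed by the serialised script bytes.
ledgerScriptHash :: HashFn -> ByteString -> ByteString
ledgerScriptHash h script = h (BS.singleton 0x01 <> script)

-- | Scheme of the old Plutus commit: hash the serialised script twice,
-- without any prefix tag.
oldPlutusScriptHash :: HashFn -> ByteString -> ByteString
oldPlutusScriptHash h = h . h
```

With a real cryptographic hash the two results will differ for the same script bytes, which is consistent with the ledger not finding the script under the hash we computed.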

2021-10-06

Engineering Meeting

Discussing so-called “star-shaped head network” protocol draft:
- There's one server which is part of a Head, or even is running a Head alone
- There are many clients connected to the server
- Client <-> Server are connected through 2-parties isomorphic channels, eg. "mini-Heads" that should be simpler than full multiparty head but with the same properties: Isomorphic, safe, requiring being online to ensure progress.
- Transactions can flow from one client to the others through the pairwise channels mediated by the Head which acts effectively as a bridge
- This channel construction is similar to Perun/lightning channels and can easily be leveraged to give a virtual channels network.
- On-line requirement is needed to ensure safety without collateral from the server like in T2P2: When offline, channel with the server is stale so client needs at least to be periodically online. This could be good fit for light/mobile clients provided there's a way to have a safe access to the chain's state (Watchtowers?)
- This implies some form of multi-protocol support inside a single node is needed (relaying, different protocols between different parties)
Discussing a potential collaboration with Perun researchers/engineers for an alternative way to inter-connect Hydra Heads (using virtual perun channels?)
Some more discussion about NFTs on Hydra Heads:
- analogy with "Scotty, beam me up" -> do you transport the matter or destruct/reconstruct it somewhere else?
- NFTs in Head -> MintingPolicy allowing to remint the NFT on-chain in the fanout

Other

Published Milestone report urbi et orbi. Highlights are:

good feedback from Summit and outcome of team Workshop in Berlin leading to refined understanding of short term goals and use case
while we managed to have a working demo in time for the summit, we did not "close the loop" and were not able to run Hydra Head cluster over an actual Cardano Alonzo testnet,
we are on track to provide a roadmap and implementation plan for S1 2022 by end of October 2021.

I want to clean up the PR backlog before tackling the Direct transaction submission problem, going to fix log-filter tests and process wrapper to ensure we can merge that today.

Monitoring test is flaky on CI, although I changed the way we allocate ports => Fuse the 2 unit tests into one because it does not make much sense to have 2 separate tests for the same "behavior"

Thinking it could be a "fun" side-project to implement Golomb-Rice set in Haskell: https://github.com/btcsuite/btcutil/blob/master/gcs/gcs.go

Added a section in demo/README.md to start the demo without docker as requested by someone on the discord channel.

18:19 (hydra-poc)

Also all PRs are merged and the only one left is the "direct chain interaction" https://github.com/input-output-hk/hydra-poc/pull/90 to have an Init -> Abort sequence working properly on-chain (or at least with transactions validated by the ledger)

2021-10-04

Working on Wallet PR KtorZ/ADP-919/sign-transaction to add more integration tests

https://cbor.me enables decoding base64-encoded CBOR data

Got stuck with testing transaction signing with withdrawal(s), got an error with CBOR decoding of TX on the server side. Wrote a unit test at the deserialisation level for sealedTxFromBytes, which led us to realise the roundtrip test was not covering much of the structure of the sealedTx.
Possible investigations: Cover more fields in the roundtrip, also try to get better error report at CBOR level
The problem was that we passed the serialised sealedTx into the quasiquoter constructing the payload to the sign transaction endpoint, instead of the value itself. Seems like there's an instance ToJSON ByteString in scope.
A SealedTx is just a wrapper around the raw bytes of a cardano transaction, not clear what the other fields are used for. It can be produced by parsing some bytes as it is just a cardano transaction in CBOR encoding. The question is: How is the bytestring encoded in user requests?
- In JSON, it is assumed to be a string containing the base64 encoding of the transaction, but cardano-cli outputs base16-encoded raw transactions for signing.
- So we should accept both encodings for an ApiT SealedTx in order to minimize friction for end users.

Not having haskell-language-server working is painful, tried to install HLS from nix and source but does not work.

Trying to use HLS in the cardano-wallet does not work out of the box, seems like one needs to do more work: using scripts/gen-hie.sh. Wallet has more than 100 modules in the core packages which leads to long compilation time esp. without immediate feedback from HLS, this is painful.

Not sure me helping the wallet for a couple of weeks is a very useful and productive use of my and Matthias' time: I won't be able to learn much of the codebase in the given time frame, so I won't be autonomous and will need to ask a lot of questions and get a lot of guidance, for a net ROI which is probably negative as I won't stay working on the wallet. I could pair and possibly contribute useful observations and a second pair of eyes, but this would require pairing most of the time with different people, so unsure if that's the goal.

September 2021

2021-09-29

Musig2 spike

Adding a shell.nix to get build tools like clang in scope
Libsodium: the musig2_compat branch is quite different to the one we use in the hydra-node -> make sure to rebase the necessary changes later
Got the musig2test working -> nix-shell --run "make && ./musig2test" on https://github.com/ch1bo/musig2/tree/ch1bo/build-via-nix
Plan: dump keys and signed message from ./musig2test, load them in haskell and run them through Plutus' verifySignature -> no FFI required for now
Writing keys and message was straight-forward
- quite nice to do C for a change!
- What is the format of the signed message / envelope?
The example uses libsodium's combined mode
- Separated signatures are more likely what we would be requiring when wrapping this into a library
- Split it by hand after the fact for now
- Signature length should be 64 bytes
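
A minimal sketch of the "split it by hand" step, assuming the dumped bytes follow libsodium's combined-mode layout (a 64-byte signature followed by the message). `splitSigned` is a hypothetical name, not something from the musig2 code.

```haskell
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS

-- | Length of a detached signature in bytes.
signatureLength :: Int
signatureLength = 64

-- | Split a combined-mode signed message into (signature, message).
-- Returns Nothing when the input is too short to contain a signature.
splitSigned :: ByteString -> Maybe (ByteString, ByteString)
splitSigned signed
  | BS.length signed < signatureLength = Nothing
  | otherwise = Just (BS.splitAt signatureLength signed)
```

The resulting signature and message could then be handed separately to a detached-signature check such as Plutus' verifySignature.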

2021-09-28

Some notes from the workshop

Realization: Protocol validators need to ensure that the tx spending the contract output and creating the next datum makes it possible for "others" to know the datum behind the datum hash. For example: the close tx validator needs to ensure that (at least) the snapshot number is included, such that other participants can re-construct the datum using the number + stored snapshots in order to spend the output / contest.
Making the datum simple helps in keeping the script output "spendable". In the end, the datum is a secret and if we want it easy to be able to contest, just having the snapshot number as datum is maybe enough?
Walking through the on-chain validators, datums and redeemers again
- Fanout is quite complex, it might get expensive with many outputs
- Good thing: it is deterministic and costs would be known up-front and can be avoided before acknowledging them in the Head
- Optimizations to keep the utxos in the head small could be worthwhile
Utxo set size and costs created by that need to be tractable and not hidden from users of hydra-node
- Applications / operators should be able to take action and decide on such things
- There are hard limits though based on main-chain protocol parameters

2021-09-23

AB Solo

Looking at the document about Plutus extensions provided by MPJ, which refers to Hydra in a few places. Proposals are:

Add reference inputs to transactions, eg. inputs which are not consumed by the transaction but whose datum/datum hash are available to scripts
Use inline datum instead of only datum hashes.
Provide script references which is a combination of the above 2 proposals, to remove the need to provide a script as witness in the consuming transaction every time it is used.

Found it difficult at first to understand what I had to do, and how to properly extend existing code to do what I want, eg. retrieve data from Init transaction so that it can be consumed by the Abort transaction. It's actually hard to shape one's thoughts along the lines of another person's thoughts, esp. when in "experimentation" mode and we take a lot of shortcuts, or drift away from the actual goal because it's too complicated to do in one step. In this situation, probably everyone would do a slightly different step and take a slightly different direction.

This sheds some light on the importance of pairing/mobbing to share the context of the code we write when it's not already obvious. An alternative is to be very explicit about the goals, the intermediate steps we are taking, the assumptions we make about the environment, and the shortcuts we take. They are here but scattered across different functions and files, which makes building a big picture hard.

Trying to add logs to the direct chain component, seems however we don't have JSON instances for ValidatedTx? -> using only Show for now but should be fine

How do I make a TxIn from a TxOut? => hash the TxBody and add the index

Got to the point where I have only one failing test, namely the one about init -> abort dance which now makes sense.

I need to properly observe the abort transaction from the chain and make sure it has the necessary inputs from the current head state.

Actually there's currently no way to link init to abort because:

Init produces a single output which is the address of the main (SM) script with the parameters as datum
Abort consumes an output for the validator Script with the pubkeyhash of the recipient
Abort should also consume the SM output and pass the parameters as datum which is what we need to verify first

Going back to basics, here are next steps to be able to do the init -> abort sequence correctly:

Add the output for the SM to the Init tx
Make sure this output updates the on-chain state
Have the Abort transaction consume this output
Add the thread token inferred from the seed txin
Add parties' verification keys to the head parameters
Mint the PTs (using Thread token) and create one output per party with the PTs and their verification keys
Consume those outputs in the abort tx

In parallel we need to write and check the validator scripts themselves as this is not really done in our tests because the mock ledger does not verify anything, of course.

Got the state change test green but now the abort tx unit test validation fails with:

Evaluation results: fromList [(RdmrPtr Spend 0,Left (ValidationFailed (CekError An error has occurred:  User error:
The provided Plutus code called 'error'.)))]

Recompiling ledger-specs setting the flag for evaluation to Verbose in order to get better logs
It seems there's another problem in the Head validator's state machine as we don't pass any ThreadToken, or rather we pass one but do not use it to instantiate the SM hence it's more than probable the evaluator fails to find the SM -> MB adapted the interface in another PR
Trying to switch the validator to a simpler one, and check I can build the aborttx, possibly also checking I can sequence the transactions and observe the aborttx. With a simple (parameterized) validator, script evaluation succeeds just fine even though the test fails because the count of results is incorrect, but next execution fails
With a single script reference to the MockHead validator it passes, so I must be doing something wrong with the RdmrPtr logic.

There is an issue related to how it resolves its redeemers.

I was using directly RdmrPtr passing an incremented counter but of course this does not make sense because inputs is a Set.
One needs to either sort the inputs in (TxId, TxIx) order or use the rdptr function from the ledger API, which does the right thing to associate the redeemer with the right input.
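
A minimal sketch of the sorting option, assuming a simplified `TxIn` made of a hex transaction id and an output index (the real ledger types differ, and `spendIndex` is a hypothetical name): the index carried by the `RdmrPtr` is the position of the input in the ordered set, not a counter incremented while building the transaction.

```haskell
import qualified Data.Set as Set

-- | Simplified stand-in for the ledger's TxIn: (transaction id, output index).
type TxIn = (String, Word)

-- | Position of an input within the ordered input set, which is the index
-- a spending redeemer pointer has to carry. Nothing if the input is absent.
spendIndex :: TxIn -> Set.Set TxIn -> Maybe Int
spendIndex = Set.lookupIndex
```

For inputs inserted as `("bb", 0)`, `("aa", 1)`, `("aa", 0)`, the set orders them as `("aa", 0)`, `("aa", 1)`, `("bb", 0)`, so `spendIndex ("bb", 0)` yields `Just 2` even though that input was added first.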

2021-09-22

Engineering Meeting

Topic: What to do about rollbacks?

Actually, these are not really rollbacks, it's just that a longer chain was found. It's important because this means we are not going back in time, eg. the new chain will be at the same "moment" in time as the old one.

CollectCom is the most sensitive transaction, if rolled back it's as if nothing happened in the head which might be quite annoying if people are expecting head transactions to be "final" or "settled" and do side-effects depending on it.
Close and Contest could also be problematic as they are time-bound (by the contestation period) and it could be the case some contests "disappear" for want of time to post them in case of a rollback
Other mainchain transactions are less problematic, they can simply be resubmitted

Hydra users need to be aware of the settlement time on the mainchain, for example there is a 600s limit on Kraken for payments on Cardano to be considered final. To be safe against adversarial nodes, one needs to wait for some number of blocks (there is a document available providing some simple tables relating expected probability of "failure" to adversarial stake and number of blocks to wait, eg. for 5% adv stake, 0.01% failure, one has to wait 73 blocks or ~20 minutes)

The contestation period needs to be set to some large enough value, eg. larger than expected time to get a rollback
Validators do not get absolute time slots, they only get the validity range of the transaction which, in the case of Close/Contest transactions includes the contestation period, and because scripts run after stage 1 validation, they can assume the range is valid. From there, the validator can check if the range falls within accepted bounds
Also, T_max should not be so large as to prevent the head from making progress, but this can be verified by the validator too, using the range

Consequences for Hydra Head:

HeadLogic needs to be aware that its state could be "rolled back", eg. an onchain transaction can reset the state to something else, even while the head is opened => This could be property tested, we had something similar at one point
The settlement time should be a parameter of the node set by users, depending on how long/what risk they are willing to take w.r.t. rollbacks
The contestation period should be set large enough, possibly in relationship to this settlement time?
The OnChain component could be the one doing the wait, retaining OnChainTx until enough blocks have passed before notifying node
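
The last point could be sketched as follows. This is only an illustration under assumptions: `Pending` and `release` are hypothetical names, block numbers are plain integers, and what happens when a retained transaction is itself rolled back is left out.

```haskell
import Data.List (partition)

-- | An on-chain transaction together with the block number it was seen in.
-- Hypothetical type for this sketch.
data Pending tx = Pending
  { observedAt :: Integer
  , pendingTx :: tx
  }

-- | Split pending transactions into those buried under at least `depth`
-- blocks at the given chain tip (safe to hand to the node) and those that
-- still have to wait.
release :: Integer -> Integer -> [Pending tx] -> ([tx], [Pending tx])
release depth tip pending = (map pendingTx settled, waiting)
  where
    (settled, waiting) = partition (\p -> tip - observedAt p >= depth) pending
```

Here `depth` plays the role of the user-chosen settlement parameter, so a more risk-averse operator would simply configure a larger value.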

2021-09-21

SN Solo

Plutus validators are also Blake2b_224, but why did the fromJust not work before? case solved it, hash conversion works now
Get a MissingScript error now

       Falsified (after 1 test):
         TxInCompact (TxId {_unTxId = SafeHash "03170a2e7597b7b7e3d84c05391d139a62b157e78786d8c082f29dcf4c111314"}) 180
         Utxo: UTxO (fromList [(TxInCompact (TxId {_unTxId = SafeHash "03170a2e7597b7b7e3d84c05391d139a62b157e78786d8c082f29dcf4c111314"}) 180,(Addr Testnet (ScriptHashObj (ScriptHash "1302a9a442fa86e8e836aa39d961ec3e71f500f21a633ae0cf2b60b1")) StakeRefNull,Value 0 (fromList []),SJust (SafeHash "faa51ea0059e04224cc13da34b53bba807fb2affd71ee401e85dfa3f769081fd")))])
         Tx: ValidatedTx {body = TxBodyConstr TxBodyRaw {_inputs = fromList [TxInCompact (TxId {_unTxId = SafeHash "03170a2e7597b7b7e3d84c05391d139a62b157e78786d8c082f29dcf4c111314"}) 180], _collateral = fromList [], _outputs = StrictSeq {fromStrict = fromList []}, _certs = StrictSeq {fromStrict = fromList []}, _wdrls = Wdrl {unWdrl = fromList []}, _txfee = Coin 0, _vldt = ValidityInterval {invalidBefore = SNothing, invalidHereafter = SNothing}, _update = SNothing, _reqSignerHashes = fromList [], _mint = Value 0 (fromList []), _scriptIntegrityHash = SNothing, _adHash = SNothing, _txnetworkid = SNothing}, wits = TxWitnessRaw {_txwitsVKey = fromList [], _txwitsBoot = fromList [], _txscripts = fromList [(ScriptHash "a893ca7be59f00c935c382fd8f8e515adcc9850f1ec5dbafbe99face",PlutusScript ScriptHash "a893ca7be59f00c935c382fd8f8e515adcc9850f1ec5dbafbe99face")], _txdats = TxDatsRaw (fromList []), _txrdmrs = RedeemersRaw (fromList [(RdmrPtr Spend 0,(DataConstr Constr 0 [],ExUnits {exUnitsMem = 0, exUnitsSteps = 0}))])}, isValid = IsValid True, auxiliaryData = SNothing}
         Evaluation results: fromList [(RdmrPtr Spend 0,Left (MissingScript (RdmrPtr Spend 0)))]

Reason: address script hash of utxo is different than the provided scripts

Apparently hashScript is different than hashFromBytes via ValidatorHash
- suspect: the double hashing in plutus' scriptHash is suspicious and maybe the ledger does not expect that?
- use the same technique in both places to see whether it's just the hashing or the serialized script -> will it run?

Pairing session

SN explained the current situation of transaction creation to AB
Started off with the MissingDatum error when running the initial validator against an abortTx
Solved it by providing the PubKeyHash to abortTx, but this is really only a quick fix. The abortTx should actually spend all the outputs which contain PTs.
By providing the PubKeyHash, abortTx validates now against initial!
Bad news: We realize that we need some "onchain state" which is tracking utxo + which datums would be able to spend these
- e.g. for the abortTx we would need to have something like [(TxIn, PTSpending)] with
```
data PTSpending
= FromInitial PubKeyHash
| FromCommit (UTxO Era)
```
  to keep track from where and how utxo's would need to be spent by this transaction
- doing things the "Direct" way is hard!

SN Solo

Give another shot on PAB, tests fail because of a missing nextTransactionsAt
- awaitUtxoProduced seems to be a good replacement, providing us with outputs and txs
- along with that there is txOutRefMapForAddr for filtering certain TxOut
- can't seem to find/import ChainIndexTx though
typeScriptTxOut won't decode the datum because that ChainIndexTxOut is not necessarily containing the tx and thus it can't know the Datum (but just the hash)
- txOutRefMapForAddr gives directly TxOuts which do only contain datum hashes
- re-combine into a ChainIndexTxOut with ChainIndexTx or use utxosTxOutTxFromTx as Plutus.Contract.StateMachine
Changed to just merge all utxos seen in watchInit together and try to decode the right datum from any output
- It turns out we do not need the address of the state machine validator anymore
- Maybe this is inefficient
The Head statemachine contract can't be observed for Final state as it does not produce a txOut
- re-defining isFinal Final = False works around this

2021-09-20

SN Solo

Idea: running the plutus validator in haskell against constructed transactions
Which serialization of plutus scripts?
- From TypedValidator we would use tvValidator / mkValidatorScript to get a Validator
- This is an instance of Serialise (from serialise package)
- However Validator is only a thin wrapper around Script, which has a ToCBOR where we can use cardano-binary's serialize? -> opted for this one
Debating on how to run the validator now
- evalScripts takes the TxInfo already as Data, so maybe use collectTwoPhaseScriptInputs - this is from the ledger
- there are other (more low-level) ways from plutus-ledger-api package
- evaluateTransactionExecutionUnits is used by cardano-api, maybe also fine for us / our tests?
- the constructValidated function looks promising for a template to collect+eval scripts, although this function seems not to be used anywhere
Call into plutus evaluateScriptCounting directly as we want to evaluate a specific script on Tx
Realized that validating the init tx with the initial contract does not make sense!
- rather the commit tx or abort tx would be governed and thus need to validate with the Initial script
- those txs would also include said script in order to spend the input, so using the plutus functions is too low level and we should be able to use the collect+eval scripts from ledger after all
- Refactor to use evaluateTransactionExecutionUnits from ledger
What is that ScriptIntegrityHash used for?
Converting from a plutus validatorHash/Address to the ledger's Addr is weird
- also could not find whether / where plutus is doing the same thing?
- BTW they also have exactly a mock chain sync client and server but this seems to be using their own Tx type (see Plutus.V1.Ledger.Tx)
~~It's not only weird, but it fails because plutus does not give us blake2b_224 hashes (likely sha256 instead)~~

2021-09-17

Direct Chain Component

We discussed the new approach for on-chain interaction as an alternative to the PAB.

We were able to complete a full round-trip using the Ouroboros mini-protocols, though the transactions being submitted and deserialized are not representative of the actual chain interactions. It still demonstrates how the setup works and that it's possible to fully test the approach in isolation with a mock server.
We agree that we want to keep "wallet concerns" inside the component, and not leak it through the abstraction to keep the solution as much as possible close to what the PAB gives us. Ideally, we could swap one implementation with the other once done.
The last point means that we have to provide the direct-chain component with credentials to be used for (a) signing Hydra on-chain transactions and (b) tracking users' funds to pay for those transactions. A simple pub/prv key pair would do, from which we can derive a change address and track the UTXO set easily. This means that the Hydra node would initially require users to move some funds into a specific address that they control, but gives "custody" to the Hydra-node for running a head. It's important that funds at the address are sufficient to cover a full head lifecycle (init, close and more importantly, contest) and we should warn users accordingly if not (or even, refuse to init a head).
The current implementation of the chain-sync client is wrong (and we know it) as it will synchronize blocks from the origin always. What we want however is to start at a much later point, for example, the current tip and onwards. This is easily achieved with the chain-sync protocol itself but, comes with a limitation: all participants have to be online and observing the chain before any init transaction is submitted to the network. While it is okay-ish for now, if we persist with that approach, we'll have to provide some synchronization mechanisms between peers.
We are also currently over-simplifying the problem by considering that participants are only members of a single head at a time. Thus, looking for on-chain transactions does not currently check whether a transaction is indeed about a given head, but only whether it involves "us" as a participant. Later, we'll want to also recognize which head instance is concerned by an on-chain transaction, which can be done by means of the state-machine thread token.
The question of rollbacks was raised again. In principle, in Praos, a node can rollback up to 3k/f slots (~18h) so, transactions only truly reach immutability after 18h. Yet, they reach a high enough probability (99.999%) well before that; still, depending on the adversarial stake in the system, this can vary from a few minutes to hours. We want to bring this question to the next engineering meeting with the consensus team. There are really two types of rollbacks:
1. 'organic' rollbacks, which can occur because of Praos and how the consensus sometimes elects two or more leaders for the same slot. This type is quite benign, and transactions lost by such rollbacks can simply be re-submitted if needed. Although this is annoying for the contestation, it can likely be managed gracefully.
2. 'adversarial' rollbacks, which are induced by an adversarial party trying to double-spend. For example, one head participant could commit to a head some funds he/she is also trying to double-spend at the other edge of the network. This would result in head participants thinking they're indeed inside a head, whereas in practice, nothing really happened.

SN Solo

Work stream on the "round-trip" of InitTx by posting and observing it again on chain - analogously to the ExternalPAB
Start with storing/recovering the HeadParameters as Datum
- before diving into minting the thread token, participation tokens etc.
Managed to create and add a script output, BUT
- while I could check that there is an output with some datum hash (even checking the hash)
- cardano-api seems not to include the Datum when creating an output (in contrast to the plutus framework)
- So while the test would need to also signShelleyTransaction to see script witnesses in the transaction, the ones for "creating outputs" are not present
- This would make it impossible to deserialize the HeadParameters from an initTx
- ... back to cardano-ledger-specs for more control?
Switching to cardano-ledger-specs was not too hard
- Could construct the TxBody with the single input
- Now the question is how to assert that the datum is present?
- Return Tx without signatures and update that later or introduce an intermediate type just for this? e.g.
```
data TxDraft = TxDraft
  { body :: TxBody (AlonzoEra StandardCrypto)
  , -- | Datums used by scripts in the body.
    dats :: TxDats (AlonzoEra StandardCrypto)
  }
```
- Opted for simply returning an unsigned (and unbalanced for that matter) ValidatedTx
Interesting observation: converting between HeadParameters and Initial / onchain representations is explicit now in the tests / not via JSON
observeTx could serve as the counterpart of constructTx
- Using Alternative Maybe we could provide for a nice interface
- Is this efficient?

2021-09-16

AB Solo

Trying to enhance log-filter program to be able to fork a hydra-node program and filter the logs it produces directly from its stdout, rather than filtering logs on disk. Ran into some problems:

children are not properly reaped when parent dies apparently
passing arguments to the log-filter program is problematic when invoking hydra-node through cabal run because I need to pass -- twice

Ensemble Session

Continue working on implementing a mock node in order to test Direct chain component that will build transactions from PostTx messages and send back OnChainTx messages from observed transactions in blocks.

Implementing mock TxSubmission server using a TQueue to hold the transactions, shortcutting the block construction with 1 block = 1 tx

Find intersect: Client sends some points which are supposed to be in the ledger, server will respond with the latest point of its chain among the sent points
Server maintains a cursor for each client and sends them updates on request, which could be backward/forward
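
A minimal sketch of the intersect lookup and the per-client cursor described above, in Python rather than the Haskell of the mock server. The names `find_intersect` and `Cursor` are hypothetical, chain points are plain strings, and only the roll-forward direction is modelled:

```python
from typing import Optional, Sequence


def find_intersect(chain: Sequence[str], points: Sequence[str]) -> Optional[int]:
    """Return the index of the most recent chain point that the client also sent."""
    known = set(points)
    for index in range(len(chain) - 1, -1, -1):
        if chain[index] in known:
            return index
    return None  # no common point: the client has to start from the origin


class Cursor:
    """Per-client read position on the server's chain (1 block = 1 tx here)."""

    def __init__(self, position: int = -1) -> None:
        self.position = position

    def next_update(self, chain: Sequence[str]):
        """Advance by one block if possible, otherwise report nothing new."""
        if self.position + 1 < len(chain):
            self.position += 1
            return ("roll_forward", chain[self.position])
        return None  # the client is at the tip and has to wait
```

A client that found an intersection at index `i` would get a `Cursor(i)` and receive the following blocks one request at a time.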

Design discussion about next steps for "mock node", or how to write server and client in order to test transactions are observed on both sides:

We need one peer per client per protocol
The client (production) and server-side (test) code should both be ouroboros applications
We can build a pair of Channels to the client/server? => the Mux can use a Channel
- Or we need to bite the bullet and use a Snocket

Next steps:

Complete network layer
Add smart constructors to create transactions for Hydra protocol
Which API to use? Cardano-api or ledger-api? => Which one is easier
- We used the cardano-api in the tests to decouple from the hydra-node library code, and went for using the JSON schema API instead

SN solo

Goal: smart constructors for protocol txs, e.g. PostChainTx -> Tx AlonzoEra
Start working on the "chain tx" constructors, created Hydra.Chain.Direct.Tx as separate module as it provides for a good "seam" for unit tests
Which Tx type to use? There are at least:
- GenTx (CardanoBlock StandardCrypto) from ouroboros-consensus?
- ValidatedTx (AlonzoEra StandardCrypto) from cardano-ledger-specs?
- Tx AlonzoEra from cardano-api?
Discussion with KtorZ on whether we'll have a Tx or a TxBody and "who signs transactions"?
- Conclusion: Signing / keeping keys at the client would be morally the right thing, but we want to do it the same as we expected the PAB to work, i.e. "the system" has the keys and does sign txs / spend
- The smart constructors though, should be producing TxBody and the withDirectChain would have access to some SigningKey
Started with using the cardano-api types as they were easy to handle in the e2e test
All of these functions will be something like .. TxIn -> Either TxBodyError (TxBody AlonzoEra)
For initTx we can use a single TxIn for paying fees & as the parameter of minting thread token & participation tokens
Problem: No Arbitrary instance for cardano-api's TxIn
Shift to using cardano-ledger-specs as it has Arbitrary TxIn in one of their test packages, BUT
- it's easier to shoot yourself in the foot with this API. i.e. nothing prevents you from creating a TX without inputs
- not sure if this is good!?
Was pointed at hedgehog Gen TxIn for cardano-api -> switch back
Down a rabbit hole how to run a hedgehog Gen in a QuickCheck Gen
Was pointed at hedgehog-quickcheck, where I missed the hedgehog -> QuickCheck direction by being blind
Back in the flow of creating tests against initTx :: ... -> Either TxBodyError TxBody -> success with a generated TxIn
Adding TxOut parameter to calculate and add a change output
- Fails if fees /= 0 because there might not be enough in the generated txOut
- create a ===> implication to ensure enough lovelace
- hedgehog Gen TxIn seems not to be scale-able from QuickCheck
Seeing the complexity of makeTransactionBodyAutoBalance had me pivot to changing initTx to create a TxBodyContent instead and have the balancing, change and fee calculation be done by this function

2021-09-15

Engineering Meeting

Presentation of early benchmark results
- Evolution of plots and performance over time
- Effect of various dimensions (number of nodes, generator type)
- We have 2 generators currently, a "standard" one which uses ledger's genTx producing "large" transactions and a growing UTXO set, and a "constant" one that produces micro-payment-like transactions (one input, one output) and keeps the UTXO set constant
Discussion:
- Suggestion: Do a heap profile of each node, does not require recompilation with profiling enabled
- "Fractal benchmark": SN can't reproduce behaviour of simple txs over ouroboros network, so perhaps an artifact of a single run on a specific machine?
- Could be good to run on bare metal to get some real benchmarking, but virtualised environments are somewhat consistent with probable deployment model
- We should take some time with Marcin to check we are not doing something stupid in the protocol layer
- Question: Why does the performance seem to degrade over time? => we don't know (yet), requires more investigation
- Extract some info from RTS from the nodes and use embedded Prometheus server to get data at regular interval
Discussion about PAB vs. direct interaction with the chain:
- Review with Marcin how to integrate with NodeClient protocol for the purpose of testing direct submission of txs to the chain
- Also have a look at how Plutus implements Mock chain server
- We want to get something running sooner, it's a tactical decision to unblock us in the short term
- PAB has taken a while to build because it's actually complex, and deals with a lot of intrinsic complexities of the chain (and node was also a moving target...)
- Once PAB is ready we will have an easier time because dev will be user-driven
- Even if we get it running now with the "direct" approach, it does not mean we won't be using the PAB (as-a-library) for production later

2021-09-14

AB Solo

Finish reading ACE Paper

2021-09-13

AB - Benchmarks

I would like to count the UTXO set size after each transaction, then correlate that with the transaction confirmation time (or ReqTx time?) to check if this shows any correlation with the latency increase.

I now have a list of txId/utxo size pairs, now need to correlate that with the time the transaction was confirmed. Need to have a table of list of tx ids with confirmation time from the SnapshotConfirmed message

Struggled a bit with jq syntax to extract the timestamp for each transaction in the confirmedTransactions field of a snapshot confirmed message but it turns out to be pretty simple:

$ cat log.1 | grep ProcessedEffect | grep SnapshotConfirmed | jq -cr '{ts:.timestamp, txs: .message.effect.serverOutput.snapshot.confirmedTransactions[]}' | jq -cr '[.txs, .ts] | @csv'

The confirmedTransactions array is actually "flattened" and one record is generated per element.

Trying to map UTXO over time, I messed up with the logic: I need to apply the list of transactions in the order of their confirmation to keep track of the growth of the UTXO set over time; instead I simply concatenated the list of transactions in the dataset which does not make sense. The confirmation time for each tx is good however, need to sort the file on this.

Wrote a small haskell program to extract the information I need from the dataset.json file and the confirmed-txs.json, namely the size of UTXO per transaction confirmed ordered by time.

Then I can produce a time series for the size of UTXO set with:

cat utxo-size.json | jq -cr '.[] | @csv' > utxo-size.csv
join -t ',' utxo-size.csv confirmed-txs.csv  | cut -d ',' -f 2- | tr -d \" | awk -F ',' '{ print $2, $1 }' | sort > utxo-time
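
For reference, the join above can also be sketched in Python, which avoids the requirement of `join` that both inputs are sorted on the key. This is only an illustration: `utxo_time_series` is a hypothetical name and the inputs are assumed to be already parsed into (tx id, size) and (tx id, timestamp) pairs:

```python
def utxo_time_series(utxo_sizes, confirmations):
    """Pair each confirmed tx's timestamp with the UTXO set size after it.

    utxo_sizes: iterable of (tx_id, size); confirmations: iterable of (tx_id, timestamp).
    Returns (timestamp, size) pairs sorted by timestamp; unconfirmed txs are dropped.
    """
    confirmed_at = dict(confirmations)
    series = [
        (confirmed_at[tx_id], size)
        for tx_id, size in utxo_sizes
        if tx_id in confirmed_at
    ]
    return sorted(series)
```

ISO-8601 timestamps in the same timezone sort correctly as plain strings, which is what the final `sort` in the shell pipeline relies on as well.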

Now trying to run a benchmark with a different transaction set, namely one where the UTXO set does not grow over time.

Struggling to get the API right to generate a sequence of transactions that cycle among a UTXO set.
In the Cardano module we use Crypto.Ledger.Keys but this is not in the cardano-api which has different types, ended up exposing a function to extract verification key from keypair as I need the former to generate UTXO, and the latter to generate addresses in transactions.
Trying to leverage what @KtorZ did for the TUI, generating keys, addresses and specific UTXO set, but working with the API to eg. select a random UTXO from a list of UTXOs is very annoying
Trying another approach: Start with a single UTXO, a single key pair, and then consume the UTXO producing the same UTXOs, sending to a new key pair. It seems to work but I still have errors when running the property test, seems like the values are not correctly conserved...
Turns out the error was I was trying to apply the transactions in the wrong order using foldr instead of foldl'
Now I get another error with some UTXO being too large, trying to trim that down to some manageable size.
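
To illustrate why the fold direction matters for the chained transactions above, here is a small sketch in Python with a deliberately simplified model: a UTXO set is a set of (tx id, output index) pairs and a transaction is a (tx id, inputs, number of outputs) triple. All names and the sample data are hypothetical. A left fold applies the first transaction first, whereas Haskell's `foldr applyTx utxo txs` would apply the last one first:

```python
from functools import reduce


def apply_tx(utxo, tx):
    """Spend the inputs of tx and add its outputs; fail on unknown inputs."""
    tx_id, inputs, output_count = tx
    missing = inputs - utxo
    if missing:
        raise ValueError(f"{tx_id} spends unknown outputs: {sorted(missing)}")
    outputs = {(tx_id, ix) for ix in range(output_count)}
    return (utxo - inputs) | outputs


def apply_all(utxo, txs):
    """Left fold: transactions are applied in list order."""
    return reduce(apply_tx, txs, utxo)


# A chain where each tx consumes the single output of the previous one,
# so the UTXO set keeps a constant size of one.
genesis = frozenset({("genesis", 0)})
txs = [
    ("tx1", frozenset({("genesis", 0)}), 1),
    ("tx2", frozenset({("tx1", 0)}), 1),
    ("tx3", frozenset({("tx2", 0)}), 1),
]
```

Applying `txs` in list order succeeds, while applying `reversed(txs)` raises because `tx3` tries to spend an output that does not exist yet.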

Finally managed to run a benchmark with a constant UTXO set size:

Writing results to: test-bench-constant/results.csv
Confirmed txs: 18000
Average confirmation time: 6.724038065066666e-2
Confirmed below 1 sec: 100.0%

Catchup w/ Team

What's missing for demo?

Sending native assets
Showing the 3 clients opened in the head
Explaining use of test faucet + use of addresses representing each party
2 lists of things in the Head: Connected nodes + hydra keys in hydra heads
- use address instead of peer to send UTXO to? + alias
We send only a part of the UTXO
- Pb: There are 4 addresses instead of 3
Show list of head participants using Hydra public keys w/ colors
Showing the final UTXO set after closing
Identifying parties, passing key pair to the exec for the head
Initial addresses for the head are generated from the port only (which are all the same which is weird...)

SN - Working on TUI

Picked up on the "aliased Party" branch and implemented an optionally aliased Party type
- Ord instance problems + tests
- Show instance which prefixes alias@ and uses hex encoding on the verification key
- ToJSON dropping alias when Nothing
Have the node alias me and otherParties when loading keys from file
- Heuristic to only alias when files start with a letter
- Leaves Party in end-to-end tests unaliased (otherwise we would need to carry around the names in our instrumentation)
Show the list of Party using the new format in the TUI now
Noticed that the TUI was refactored into data State = State { ..., clientState :: ClientState} with data ClientState = Connected | Disconnected, which is exactly what I avoided initially -> need to discuss this
How to get the me :: Party on the client side?
- Thought about different ways to do this:
  - Simply add a me :: Party to ReadyToCommit {parties :: Set Party, me :: Party} -> this provides us with enough info at the right point in time, which we can keep around while the head is open
  - Add a new ServerOutput NodeInfo { me :: Party } .. maybe also including a version :: Version etc.
    - This could be sent as some kind of "greeting" to each connected client as the first output in history
    - A corresponding GetNodeInfo input which fetches this information
- Initially thought of option one being easiest, but changing ReadyToCommit to be node-specific is a PITA for tests
- Option 2 with the latched greeting was easy to achieve and has a taste of old-school protocols
- API server tests are obviously failing, could fix most.. but one was puzzling me and no time to fix now (sorry)
Finally, the client-side can now store the public key of the connected node and we can generate Utxos and addresses from that, instead of a port or other info
- This essentially means we would want to Party -> CardanoKeyPair and Party -> Utxo
- Moved the "fauceting" and "credential conversion" into Hydra.Ledger.Cardano as it requires now to deconstruct the Party's VerKeyMockDSIGN to get our hands on a suitable seed for crafting cardano credentials -> hydra-node seemed to be a better place for this than the TUI
Handling the Greeting was trivial, but the fact that me :: Maybe Party is messing up quite a lot of the code
Some final more polishing / experiments in highlighting "us" and "own addresses"

2021-09-12

AB - More analysis of benchmark run

Extracting size of UTXO set from reduced log:

$ cat log.1 | grep HeadIsFinalized | jq '.message.effect.serverOutput.utxo | keys' | wc -l

Extracting length of processed snapshots:

cat log.1 | grep ReqSn | grep NetworkEvent | grep ProcessedEvent | jq -r '[.timestamp, (.message.event.message.transactions| length)] | @csv' | tr -d \" > snapshot-length.csv

Extracting timings for ReqSn start and stop:

$ cat log.1 | grep ReqSn | grep NetworkEvent | grep 'ProcessingEvent' | jq -r '[.message.event.message.snapshotNumber, .timestamp] | @csv' | tr -d \" > snapshot-processing.csv
$ cat log.1 | grep ReqSn | grep NetworkEvent | grep 'ProcessedEvent' | jq -r '[.message.event.message.snapshotNumber, .timestamp] | @csv' | tr -d \" > snapshot-processed.csv
$ join -t ',' snapshot-processing.csv snapshot-processed.csv > snapshot-time.csv

Extracting times for `AckSn`.

The idea is to compute the time span between the first processing event for an ack and the last processed:

$ for i in {1..6}; do cat log.1 | grep AckSn | grep NetworkEvent | grep ProcessingEvent | jq -r "select (.message.event.message.party == $i) | [.message.event.message.snapshotNumber, .timestamp] | @csv" | tr -d \" > processing-ack-$i.csv ; done
$ for i in {1..6}; do cat log.1 | grep AckSn | grep NetworkEvent | grep ProcessedEvent | jq -r "select (.message.event.message.party == $i) | [.message.event.message.snapshotNumber, .timestamp] | @csv" | tr -d \" > processed-ack-$i.csv ; done
join -t ',' processing-ack-1.csv processing-ack-2.csv | join -t ',' - processing-ack-3.csv

What I want is the ack-sn.csv file that looks like:

2021-09-10T16:33:28.664Z,118
2021-09-10T16:33:28.912Z,194
2021-09-10T16:33:29.141Z,136
2021-09-10T16:33:29.347Z,108
2021-09-10T16:33:29.598Z,149
2021-09-10T16:33:29.816Z,59
2021-09-10T16:33:30.066Z,145
2021-09-10T16:33:30.424Z,349
2021-09-10T16:33:30.629Z,186
2021-09-10T16:33:30.867Z,130

To produce it from the logs was somewhat painful though:

Aggregate timings for AckSn events processed for each node, grouped by ProcessingEvent and ProcessedEvent, in a JSON array,
Load the array (in my case using nodejs) and extract the minimum of start time and the maximum of stop time,
Compute the difference between the two and produce a file with x value being the end time, and y value being the total latency for Acking a snapshot.

Joining all AckSn timings:

join -t ',' processed-ack-1.csv processed-ack-2.csv | join -t ',' - processed-ack-3.csv | join -t ',' - processed-ack-4.csv | join -t ',' - processed-ack-5.csv| join -t ',' - processed-ack-6.csv > processed-ack-all.csv
join -t ',' processing-ack-1.csv processing-ack-2.csv | join -t ',' - processing-ack-3.csv | join -t ',' - processing-ack-4.csv | join -t ',' - processing-ack-5.csv| join -t ',' - processing-ack-6.csv > processing-ack-all.csv
join -t ',' processing-ack-all.csv processed-ack-all.csv > ack-all.csv

Then I manually transformed this file back to JSON in Emacs :( before processing its content in node. Here is the JS script to produce a JSON that contains the above data:

const fs = require('fs');
const ack = fs.readFileSync('ack-sn.json');
const d = JSON.parse(ack);
const ts = d.map(arr => [arr[0]].concat(arr.slice(1).map(d => new Date(d).getTime())));
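// With 6 nodes, columns 1-6 hold the ProcessingEvent times and columns 7-12 the ProcessedEvent times.
const minmax = ts.map(arr => [arr[0],Math.min(...arr.slice(1,7)), Math.max(...arr.slice(7))]);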
const sntime = minmax.map(arr => [new Date(arr[2]),arr[2] - arr[1]]);
fs.writeFileSync('ack-sn.json',JSON.stringify(sntime));
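
The same min/max computation can be sketched in Python, working directly on a row of the joined data without the manual JSON step. The function names are hypothetical, and the row layout (snapshot number, then one start time per node, then one stop time per node) is assumed from the joins above:

```python
from datetime import datetime, timedelta


def parse_ts(text):
    """Parse timestamps such as 2021-09-10T16:33:28.664Z."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ")


def ack_latency(row, node_count):
    """row = [snapshot_number, start_1..start_n, stop_1..stop_n].

    Returns (time of the last ProcessedEvent, milliseconds elapsed since the
    first ProcessingEvent).
    """
    starts = [parse_ts(t) for t in row[1:1 + node_count]]
    stops = [parse_ts(t) for t in row[1 + node_count:1 + 2 * node_count]]
    first_start, last_stop = min(starts), max(stops)
    millis = (last_stop - first_start) // timedelta(milliseconds=1)
    return last_stop, millis
```

Slicing by `node_count` instead of fixed indices keeps the computation correct when the number of nodes in the benchmark changes.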

Then producing a CSV for plotting with gnuplot amounts to:

cat ack-sn.json | jq -rc '.[] | @csv' | tr -d \" > ack-sn.csv

gnuplot is a bit quirky to work with if the data is not in the right format, doing computation and transformations on data is awkward, eg. like computing the moving average for receiving all AckSn in the following transcript:

set xdata time
set format x "%H:%M:%S"
set xtics out rotate
set title 'Snapshot acknowledgement time (ms) - 1389 snapshots'
samples(x) = $0 > 9 ? 10 : ($0+1)
avg10(x) = (shift10(x), (back1+back2+back3+back4+back5+back6+back7+back8+back9+back10)/samples($0))
shift10(x) = (back10 = back9, back9 = back8, back8 = back7, back7 = back6, back6 = back5, back5 = back4, back4 = back3, back3 = back2, back2 = back1, back1 = x)
init(x) = (back1 = back2 = back3 = back4 = back5 = back6 = back7 = back8 = back9 = back10 = sum = 0)
plot sum =init(0),\
 'ack-sn.csv' u (timecolumn(1,"%Y-%m-%dT%H:%M:%SZ")):($2)  w l t 'AckSn Processing time (ms)', \
 'ack-sn.csv' u (timecolumn(1,"%Y-%m-%dT%H:%M:%SZ")):(avg10($2)) w l t 'Moving average (10 points)', \
 'ack-sn.csv' u (timecolumn(1,"%Y-%m-%dT%H:%M:%SZ")):(sum = sum + $2, sum/($0+1)) w l t 'Cumulative mean'

Plotting moving average with gnuplot is clunky: http://skuld.bmsc.washington.edu/~merritt/gnuplot/canvas_demos/running_avg.html
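
For comparison, a sketch of the same two series (moving average over the last 10 points and cumulative mean) computed outside gnuplot, in Python. Like the gnuplot version, the moving average divides by the number of samples seen so far until the window is full. Function names are hypothetical:

```python
from collections import deque


def moving_average(values, window=10):
    """Average of the last `window` values, or of all values seen so far if fewer."""
    recent = deque(maxlen=window)
    averages = []
    for value in values:
        recent.append(value)
        averages.append(sum(recent) / len(recent))
    return averages


def cumulative_mean(values):
    """Mean of all values up to and including each position."""
    total = 0
    means = []
    for count, value in enumerate(values, start=1):
        total += value
        means.append(total / count)
    return means
```

The resulting columns could then be written next to the raw values in the CSV so that gnuplot only has to plot, not compute.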

What I would like to do tomorrow is to map the UTXO set over time, checking if there's a correlation between the UTXO set and the time it takes to produce snapshots after a while. This does not seem to be the case as we can see the ReqSn processing time does not significantly change over time. Note this could also be tested experimentally by running a benchmark with a synthetic transaction sequence that does not increase the UTXO size, something like ping-pong style transactions which keep sending the same amount back and forth.

2021-09-10

Plan for today:

MB: Got simulation work to do for researchers, fixing "bug" in UTXO display and polishing TUI
AB: Add more concurrency and parallelism to benchmark (run n clients per node, run more than 3 nodes)

AB Solo

Going to work on having multiple clients per node first, so that we can see the effect of submitting parallel non conflicting transactions to same node

Also need to find a way to make the contestationPeriod dynamic because otherwise the test can fail as it times out waiting for finalisation
There is a single registry for confirmation time of all transactions on all sequences but this does not really make sense and increases contention as all threads compete for the same piece of data -> split registry among all threads and combine them at end of run
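
A rough sketch of the "one registry per thread, combine at the end" idea, in Python rather than the benchmark's Haskell. `run_clients` and `confirm` are hypothetical names; `confirm` stands in for submitting a transaction and measuring its confirmation time:

```python
import threading


def run_clients(datasets, confirm):
    """datasets: one list of tx ids per client; confirm: tx_id -> confirmation time."""
    registries = [dict() for _ in datasets]

    def worker(registry, tx_ids):
        for tx_id in tx_ids:
            registry[tx_id] = confirm(tx_id)  # only this thread writes to this registry

    threads = [
        threading.Thread(target=worker, args=(registry, tx_ids))
        for registry, tx_ids in zip(registries, datasets)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    # Combine the per-thread registries once all threads are done.
    combined = {}
    for registry in registries:
        combined.update(registry)
    return combined
```

Since no registry is shared while the threads run, there is nothing to contend on until the final merge.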

Adding the ability to increase concurrency above number of nodes, eg. have more than one client per node

Solution is rather simple: Just extract the startConnect function to the toplevel, it's the one responsible for opening the connection to the hydra node.

Introduce withCluster function that "folds" withHydraNode over an arbitrary non-zero expected number of nodes

Got an error at startup so it seems there's some 3 hardcoded somewhere...

I can see the 4 hydra nodes starting up but the test fails on an expectation:

      waitFor... timeout!
         nodeId:           4
         expected:         {"parties":[1,2,3,4],"tag":"ReadyToCommit"}

         seen messages:    {"parties":[4,1,2,3],"tag":"ReadyToCommit"}

Turning the list of parties into a Set does the trick -> ordering is guaranteed as Set needs Ord instances and maintains a deterministic ordering of nodes
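
The failure and the fix can be reproduced with the two messages from the test output above. This Python sketch uses a hypothetical `normalise` helper that plays the role of the Set: it removes duplicates and fixes the order before comparing:

```python
expected = {"parties": [1, 2, 3, 4], "tag": "ReadyToCommit"}
seen = {"parties": [4, 1, 2, 3], "tag": "ReadyToCommit"}


def normalise(message):
    """Return a copy of the message with its parties as a sorted, duplicate-free list."""
    return {**message, "parties": sorted(set(message["parties"]))}
```

Compared as lists the two messages differ, whereas `normalise(expected) == normalise(seen)` holds.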

Running a benchmark with 6 clients, 4 nodes gives me:

Confirmed txs: 481
Average confirmation time: 0.4695817474137214
Confirmed below 1 sec: 100.0%

With 3 nodes:

Confirmed txs: 571
Average confirmation time: 0.4129197640157618
Confirmed below 1 sec: 100.0%

with 2 nodes:

Confirmed txs: 533
Average confirmation time: 0.3144588079249531
Confirmed below 1 sec: 100.0%

Spent some time fixing a mistake I pushed to master: Changed the type of parties committing from [Party] to Set Party and this had ripples I did not notice

Trying to run a 10 nodes simulation => waitFor startup times out, need to increase it

Trying with smaller number of nodes, say 6

Got a 6-nodes benchmark running, I guess startup timeout should be increased to something like 20 seconds per node
Benchmark still running after 1.5 hours, it's really hard to say how much has been done -> need better reporting than dumping transaction ids and snapshots

Trying to run a benchmark with more "reasonable" values: 10 concurrent clients, 6 nodes, scaling factor of 50, and added some progress report:

Client 5 (node 2): 17/249 (6.83)
Client 1 (node 6): 13/1366 (0.95)
Client 3 (node 4): 15/1112 (1.35)
Client 4 (node 3): 14/642 (2.18)
Client 2 (node 5): 14/901 (1.55)
Client 6 (node 1): 17/1081 (1.57)
Client 7 (node 6): 13/361 (3.60)
Client 8 (node 5): 14/154 (9.09)
Client 9 (node 4): 15/571 (2.63)
Client 10 (node 3): 14/1452 (0.96)

Also set the number of transactions to be the same for all clients, otherwise we might get artifacts in the numbers we extract from the run if some clients stop sending transactions before others.

2021-09-09

Team Sync

Benchmark TODOs:

Run the benchmark over some period of time to get steady state behaviour => trim down the logs?
Add validation time and throughput
Add metrics internal to nodes (event queue size, confirmation time)
Increase the number of nodes
Spread the load on different clients
We have the worst case => increase the parallelism of transactions generated

TUI TODOs:

commit from faucet
new transaction
UTXO set visualisation

Note: We can easily DoS nodes spamming them with invalid transactions, as demonstrated by performance drop when submitting a lot of invalid transactions. We need some rate limiting and/or caching of validation to reduce the load on the nodes

AB Solo

Enhancing plot:

Looking at https://torbiak.com/post/histogram_gnuplot_vs_matplotlib/#gnuplot to plot throughput of txs in benchmark
Trying to plot a histogram of transactions throughput, this link is actually simpler and works: http://www.gnuplotting.org/calculating-histograms/
Managed to plot confirmation time and throughput on the same graph. Obviously confirmation time follows an inverse trend with throughput which follows from Little's Law
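
As a reminder of the relation behind that observation: Little's Law states L = λ·W, where L is the average number of transactions in flight, λ the throughput and W the average time a transaction spends in the system. A tiny Python sketch with a hypothetical function name; any numbers passed to it are illustrative, not benchmark results:

```python
def average_confirmation_time(in_flight, throughput):
    """Little's Law rearranged: W = L / lambda.

    in_flight: average number of transactions submitted but not yet confirmed.
    throughput: confirmed transactions per second.
    """
    return in_flight / throughput
```

For a fixed number of in-flight transactions, doubling the throughput halves the average confirmation time, which is the inverse trend visible on the plot.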

Now adding more parallelism in the data set so that we can observe how the nodes behave with non conflicting transaction sets

First step is generating n non-conflicting transaction sequences that will each be handled by one client thread sending to one node
Got caught in a rabbit hole transforming the way we store and pass data to EndToEnd in order to introduce potential concurrency. Added a parameter to the benchmark and created a Dataset type to map a UTXO set and the sequence of transactions generated for it, so that we can have multiple sequences and distribute them among many clients
Implemented parallel submission and confirmation of transactions, with one thread per generated dataset and the threads distributed over the various clients available. The run deadlocks pretty quickly and TxInvalid messages show up which shouldn't be the case: It seems the first transaction submitted in the second thread is not the first transaction from the dataset?
- Of course: There is a single submission queue for all the threads -> removing it from the registry and creating it in each thread should work better. Each pair of submitter/confirmer now has its own queue but I am still running into a deadlock with concurrency > 2
Trying to unblock the submitter when the transaction is invalid, so this works in the sense that the process does not deadlock but transactions stay invalid "forever"
I have got the explanation: All the UTXO sets generated have the same TxId as reference, so the transaction consumes the wrong txout and everything goes awry => Need to make sure the UTXO sets are completely different...
- Updated genUtxo function to use an arbitrary genesis TxId which hopefully should be fine?
When trying to increase concurrency level over 3, eg. one client per node, I am running into troubles of course because all clients for same node share a connection to the node which does not work! => Need to make sure each submitter has its own connection, which might be slightly annoying given the way we structured HydraNode
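
The clash between generated UTXO sets noted above can be shown with a simplified model in Python, where a TxIn is just a (tx id, output index) pair. `gen_utxo` here is a hypothetical stand-in for the real generator, and `pairwise_disjoint` is the property we would want to hold for the datasets:

```python
def gen_utxo(genesis_tx_id, output_count):
    """Build a UTXO set whose entries all reference the given genesis tx id."""
    return {(genesis_tx_id, ix) for ix in range(output_count)}


def pairwise_disjoint(utxo_sets):
    """True if no TxIn appears in more than one of the given UTXO sets."""
    seen = set()
    for utxo in utxo_sets:
        if seen & utxo:
            return False
        seen |= utxo
    return True
```

With a shared genesis id every dataset contains the same TxIns, so transactions of one sequence can consume outputs another sequence depends on; with one genesis id per dataset the sets are disjoint.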

All nodes are now busy, each with its own dedicated client:

  28553 curry     20   0 1024.8g  95716  23184 S 101.0   0.6   0:56.91 hydra-node
  28585 curry     20   0 1024.8g  85788  23120 S  98.7   0.5   0:55.95 hydra-node
  28574 curry     20   0 1024.8g  92324  23104 S  95.0   0.6   0:57.10 hydra-node

Tip: Counting the number of transactions in a dataset:

cat bench-parallel-test-2/dataset.json | jq -c .[].transactionsSequence[].id | wc -l

Just realised it's not possible to run 2 benchmarks in parallel: I killed a long running one because I was oblivious to this :( Now adding validation time to the plot so that we can check how this evolves over time. I suspect validation time accounts for a significant fraction of the time spent processing a transaction as the size of UTXO set grows

It's slightly annoying but due to the way the reader is coded for DiffTime one has to add an s suffix to the number of seconds for timeout:

$ cabal bench local-cluster --benchmark-options '--scaling-factor 20 --concurrency 3 --output-directory bench-test --timeout 1000s'

Trying to run the benchmark with nullTracer just to make sure the output of the logs is not impacting the performance of the nodes significantly: I suspect JSON (de)serialisation might contribute significantly to bad performance of nodes

Final plot of the day, with a null tracer:

2021-09-08

Goal for today:

Write the snapshot decider "by the book", eg. independently from the rest of the protocol and as described by the paper
Remove the current ReqSn production and wire the newSn

Removing SnapshotStrategy which is really not used

Tried to wire the new snapshot decision logic into the Node, simply enhancing the effects with a ReqSn when we decide we should snapshot:

Lot of tests are failing/hanging now -> Trying to debug the tests one by one as we have a lot of them around
Also removed RequestedSnapshot from the ADT
Removed all snapshot emission tests from HeadLogicSpec -> Those are now covered by SnapshotStrategySpec

NodeSpec tests are failing with rather obvious reasons => We don't want to emit a ReqSn if seenTxs is empty while we ShouldSnapshot

Problem now is that we emit a ReqSn upon every ReqTx so one NodeSpec test fails because we add more ReqSn than expected

emitSnapshot now changes the state so that we don't emit multiple snapshots while processing a batch of events
We also make newSn work on CoordinatedState directly instead of HeadState to remove some cases
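The decision logic described above can be sketched as a pure function. This is only a sketch under assumptions: the types below are simplified stand-ins (the real CoordinatedState has more fields), the leader check is passed in as a predicate, and the reasons for not snapshotting are invented names for illustration.

```haskell
-- Hypothetical, simplified stand-ins for the real Hydra types.
type Tx = String

data CoordinatedState = CoordinatedState
  { confirmedSn :: Integer -- number of the last confirmed snapshot
  , snapshotInFlight :: Bool -- a snapshot was requested but is not yet confirmed
  , seenTxs :: [Tx] -- transactions seen since the last snapshot
  }

-- Reasons are made-up names, only there to make test failures readable.
data NoSnapshotReason = NotLeader | SnapshotInFlight | NoTransactionsToSnapshot
  deriving (Eq, Show)

data Decision
  = ShouldSnapshot Integer [Tx]
  | ShouldNotSnapshot NoSnapshotReason
  deriving (Eq, Show)

-- | Decide whether to request a new snapshot, independently of which
-- event is currently being processed.
newSn :: (Integer -> Bool) -> CoordinatedState -> Decision
newSn isLeaderFor st
  | not (isLeaderFor next) = ShouldNotSnapshot NotLeader
  | snapshotInFlight st = ShouldNotSnapshot SnapshotInFlight
  | null (seenTxs st) = ShouldNotSnapshot NoTransactionsToSnapshot
  | otherwise = ShouldSnapshot next (seenTxs st)
 where
  next = confirmedSn st + 1

-- | Record that a snapshot was requested, so that a second call to
-- 'newSn' within the same batch of events does not request another one.
emitSnapshot :: CoordinatedState -> CoordinatedState
emitSnapshot st = st{snapshotInFlight = True}
```

Keeping the decision separate from the state change is what lets the node call newSn after every event without producing one ReqSn per ReqTx.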

Got all tests to pass but benchmarks are still livelocking so something's still wrong in the snapshotting/confirmation logic:

AB going to finish the log-filter program to be able to analyse logs more easily
MB to troubleshoot issues with bench

AB Solo

Back working on the log-filter process, got an issue with transforming the list of confirmedTransactions: what I get is not a list of ids but a single id, so I assume my traversal is not working

Interesting question is: How to test this log filter and ensure it stays in sync with the structure of the log entries?

MB suggests to generate random logs, and check compression achieved by log-filter against some expected threshold
Wrote unit tests to assert LogFilter properly transforms an Array of transactions into an Array of TxIds. Wasted 10 minutes troubleshooting the test which failed for the wrong reason because a keyword used was invalid => test is useful!
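For illustration, the transformation under test can be sketched without any dependency. This is an assumption-laden sketch: the Json type below is a tiny stand-in for aeson's Value, txsToIds is a hypothetical name, and the real log-filter uses lens-aeson traversals instead.

```haskell
-- A tiny JSON model, only to keep the sketch dependency-free.
data Json
  = JString String
  | JArray [Json]
  | JObject [(String, Json)]
  deriving (Eq, Show)

-- | Replace each transaction object in the array stored under the given
-- key by the value of its "id" field, anywhere in the log entry.
txsToIds :: String -> Json -> Json
txsToIds key (JObject fields) = JObject (map rewrite fields)
 where
  rewrite (k, JArray txs) | k == key = (k, JArray (map txId txs))
  rewrite (k, v) = (k, txsToIds key v)
txsToIds key (JArray vs) = JArray (map (txsToIds key) vs)
txsToIds _ v = v

-- | Keep only the "id" of a transaction; leave anything else untouched.
txId :: Json -> Json
txId tx@(JObject fields) = maybe tx id (lookup "id" fields)
txId other = other
```

The point of the unit test is the shape of the result: an array in, an array of the same length out, with one id per transaction.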

Impact of log-filter, before:

-rw-rw-r-- 1 curry curry 184567326 Sep  8 10:12 /run/user/1001/bench-67fb0dac6531c6bf/1

after:

-rw-rw-r--  1 curry curry 5595672 Sep  8 13:40 filtered

2021-09-07

AB Solo

Trying to extract a test case from the failed benchmark run of yesterday

Started writing a "log-filter" program whose purpose is to filter and trim down the logs, removing unessential details at this stage in order to better understand the flow of transactions and messages. It currently replaces full tx with txid and removes long map of UTXO from a few messages, still some work to do to have it usable and remove most of the noise of the logs. Note that it also removes all log entries which are not from the Node (eg. ouroboros network messages). I have used lens-aeson but unsure if I am really harvesting the full power of the library and lenses in general, but it seems to work fine so far

16:31

Discussing issues about snapshots with MB

Our current approach seems to be flawed: It's the 3rd time we are having issues with snapshots while we are trying to produce them synchronously with other events
In the original formulation of the paper, the decision to create a new snapshot is independent from processing of txs and snapshots, and has been translated in hydra-sim as a SnapshotStrategy that drives a snapshot thread which injects the ReqSn

Plans for tomorrow:

Write a better tester for our protocol, possibly using some kind of events generator. One approach would be to consider individual transactions "validation journey" and then compose the needed events/messages to produce an arbitrary interleaving and test that
Another solution would be to just randomly generate possible messages coming from the network, eg. consider the network abstractly without taking into account the other nodes and just observe messages coming and act on them. Then we could generate sequences of messages from the point of view of the network to try to trigger unexpected behaviour and interleaving (if it sounds unclear it's because it is...)
Remove the snapshotting logic from the HeadLogic's function update and write a Snapshotter independently, with proper tests and specified behaviour, then plug it in the node

2021-09-06

AB Solo

Recreating Dev VM, trying to see if I could get something faster to improve turnaround time for compilation, but seems like C2 machines are the fastest available CPU wise

I have managed to get the autotest.sh script to work again, thus will see over the next weeks/month whether this has an impact on the cabal time. I expect it should because I should be able to run the tests using autotest.sh rather than cabal. Ideally I would need a small wrapper program to tally timings between compilation cycles within ghcid, which means I should probably look into ghcid's source code

When compiling master, I am having problems with missing git reference from dependencies, which are addressed by SN's PR:

$ cabal build all & cabal test all
[1] 5621
fatal: reference is not a tree: 09433fe537a4ab57df19e70be309c48d832f6576
fatal: reference is not a tree: 09433fe537a4ab57df19e70be309c48d832f6576

Working on merging this PR, then trying to fix compiler errors stemming from upgrade in dependencies.

ContractTest is failing with

                          Contract instance stopped with error: ConstraintResolutionError (DatumNotFound ca54c8836c475a77c6914b4fd598080acadb0f0067778773484d2c12ae7dc756)
        src/Plutus/Contract/Test.hs:241:
        Init > Commit > CollectCom: CollectCom is not allowed when not all parties have committed

The collectCom test is expected to fail but it seems this error is not expected? Actually it fails in the commit call.

Rebasing update deps PR on master before pushing, contract tests are now pending but we should revisit Plutus anyway so 🤷

Ensemble Session

Deciding to work on bench as MB has not touched it a lot and it's broken. Writing a function to submit transactions in parallel with confirmation

When we get a TxInvalid we want to wait for the next confirmed snapshot to resubmit the transaction

Turning the Registry into a record that contains a queue of txs to resubmit

Slight problem for pairing: AB has not the missing reference from master anymore so cannot work on it, but MB has not built the update-deps branch so has no dependencies and the build is not finished yet so we don't have a cache available -> we don't rotate for the moment and wait for PR to finish building

MB got troubles compiling after merge of update deps PR: Problems with happy dependency from plutus-core -> program could not be found ??!!

There were 2 different installations of happy and alex, removing one of them fixed the issue

We now put transactions in a queue and then repush them when they are invalid, then we resubmit a transaction only if the snapshot number has changed

Our transaction submitter gets stuck reenqueueing transactions and never stops, which hogs the process at 100% CPU
Trying to simplify code by extracting a "decision function" running in STM that returns an Outcome which says what to do with the function
Got a successful run with 5 snapshots and 10 txs so I suspect there is a race condition
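A minimal sketch of such a decision function, written as a pure function over the registry rather than in STM to keep it self-contained. All names here (Registry fields, Outcome constructors, decide) are assumptions made for illustration, not the benchmark's actual code.

```haskell
type TxId = String

-- Hypothetical shape of the registry: the last confirmed snapshot number
-- we observed, and rejected txs paired with the snapshot number that was
-- current when they were rejected.
data Registry = Registry
  { latestSnapshot :: Integer
  , resubmitQueue :: [(TxId, Integer)]
  }
  deriving (Eq, Show)

data Outcome
  = Resubmit TxId -- a newer snapshot was confirmed, try this tx again
  | WaitForSnapshot -- nothing changed since rejection, do not spin
  | NothingToDo -- the queue is empty
  deriving (Eq, Show)

-- | Decide what to do next and return the updated registry.
decide :: Registry -> (Outcome, Registry)
decide reg = case resubmitQueue reg of
  [] -> (NothingToDo, reg)
  ((tx, rejectedAt) : rest)
    | latestSnapshot reg > rejectedAt -> (Resubmit tx, reg{resubmitQueue = rest})
    | otherwise -> (WaitForSnapshot, reg)
```

In an STM version, the WaitForSnapshot case could presumably map to retry, which blocks until the registry changes instead of re-enqueueing in a busy loop.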

I think the snapshot process "loses" transactions:

A tx received through a ReqTx should be snapshotted by node 2 which will be the leader, but there is still a snapshot in progress so no new snapshot is started
The snapshot is emitted, but then no new tx can be submitted because of the "blocked tx" which is not confirmed -> All subsequent txs that depend on it will fail to submit
As no new tx can be submitted, no new ReqTx is produced which prevents the production of a new snapshot

This is exactly the problem for which we introduced DoSnapshot initially: We cannot link the production of a new snapshot to ReqTx because if we don't produce it immediately, and there is no more ReqTx to trigger the check, the transaction gets "lost" and never confirmed.

2021-09-03

Updating project dependencies (SN)

Originally not planned to do this today, but I realized that our master cannot be built from scratch in a fresh working copy because we seem to be referring a now gone cardano-ledger-specs commit (we took that one from the plutus we tracked so far).

So I set out to update the source-repository-package in cabal.project to match the most recent plutus master.

Using a nix-shell -A cabalOnly this was quite rapid to set up and then later materialize into a "proper" haskell.nix shell.

Three API changes seem to have happened:

utxoAt -> utxosAt renamed and does return a ChainIndexTxOut now instead of a type UtxoMap = Map TxOutRef TxOutTx. Was not much of a problem to us, but I changed the code slightly to use more often TxOut instead of the Tx-referencing TxOutTx.
ScriptLookups is now taking ChainIndexTxOut for slTxOutputs which can be created from a TxOut using fromTxOut :: TxOut -> Maybe ChainIndexTxOut and slOtherScripts is now taking ValidatorHash instead of addresses, this was also easy to change.
nextTransactionsAt seems to have been deleted! This is more of a problem and was not yet solved. Was it replaced? None of the functions in Request.hs seem to be doing this.

Until the last API change has been fixed, these things can be found on this branch: https://github.com/input-output-hk/hydra-poc/tree/ch1bo/update-deps

AB Solo

Having a look at glean from FB, a tool to explore code bases as advertised on the web site

Looks like there are only indexers (eg. parsers that generate predicates and facts from source code) for JS/Flow and Hack, the 2 "proprietary" languages owned by FB. Shouldn't be too hard to make some for Haskell code based on HLS provided tooling?

Ensemble Session

Discussing in details the Hydra demo and improvements to TUI we want to make

Use UTxO and TX, do not try to abstract away the details of the ledger (eg. use values only)
Show the TxRef + the address the values are sent to by the UTxO
- Detail: How do you know which output is owned by whom? Can we derive the pubkey of owner from the address used?
- when using Ledger generation machinery, keys are set as part of Constants, we could use the same key for everyone
How do you create a transaction?
- Select from your unspent outputs
- Accumulate available values from selected outputs
- One dialog screen per step: Select inputs, create outputs, confirm

Planning and aligning on work to do in the next couple of weeks:

Flesh out the TUI
Adding metrics to the benchmark and making it more useful (run with more nodes and different configurations). Also scrape internal metrics from each node and output them as part of the benchmark's results

Using the innovation/learning budget to see how to use Plutus w/o PAB => Run a spike to craft txs by hand and use compiled validators

Updated Logbook

2021-09-02

Ensemble Session

Worked on 2 PRs still in flight: Log API documentation and schema testing, and round-robin leader "election" in the protocol.

Log API Docs

Design discussion about how the prop_specIsComplete property is defined and how to use it:

It currently takes a SpecificationSelector which is really a lens selecting some part of the provided schema
This lens should point to a schema fragment which is an array of objects having a title field, which we use to compare to the list of constructors extracted from arbitrary data and find discrepancies
Unfortunately, the lens or some other kind of expression is needed, and not only the name of the field we are interested in because of differences in structures in the schemas
We should document this test and property a bit more as they are not really obvious, and also help users by letting the common parts out of the provided lens (the part extracting a list of titles from a Value)
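The core comparison behind the property can be sketched as follows, assuming the titles have already been extracted from the schema fragment by the selector and the constructor names from the generated values. The function names are hypothetical; the real prop_specIsComplete works on the schema Value and on QuickCheck-generated data.

```haskell
import Data.List (nub, (\\))

-- | Constructor names observed in generated data which have no matching
-- title in the selected schema fragment.
missingFromSpec :: [String] -> [String] -> [String]
missingFromSpec titles constructors = nub constructors \\ titles

-- | The specification is complete when every observed constructor is
-- documented by some title.
specIsComplete :: [String] -> [String] -> Bool
specIsComplete titles constructors = null (missingFromSpec titles constructors)
```

Reporting the list of missing names, rather than just a Bool, is what makes a failure of the property actionable.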

There was also a meta-discussion about whether or not it's ok for someone to add commits to someone else's PR to "fix" it. Seems like we agree this is all fine and good as long as the changes are motivated, but then one could ask: Why not simply add more commits on master directly, either pairing or ensembling, or discussing them at start of ensemble session?

Round-robin Leader

Worked together on the PR to "complete" it as we agreed the test written in the HeadLogicSpec was not satisfying as it is: It would better fit as a NodeSpec test which is better suited to express the expected output of a Node given a sequence of events, without having to care about the details of the state.

Writing the NodeSpec test was not straightforward but led us to uncover an issue:

If a node receives a ReqTx while a snapshot is being acknowledged, but before it's confirmed, and this node would be the next leader, then we should trigger a snapshot emission otherwise we run the risk of losing the transaction if no other tx is submitted => no snapshot is triggered until another transaction appears.
We added a unit test in HeadLogic to ensure a leader emits a ReqSn when its turn comes
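As an illustration of round-robin leadership, here is a small sketch. The indexing convention (snapshot number modulo number of parties, in list order) is an assumption, as is the leaderOf name; the actual protocol code may count differently.

```haskell
type Party = String

-- | Leader for a given snapshot number: parties take turns in list order.
-- Returns Nothing when there are no parties at all.
leaderOf :: [Party] -> Integer -> Maybe Party
leaderOf [] _ = Nothing
leaderOf parties sn =
  Just (parties !! fromIntegral (sn `mod` fromIntegral (length parties)))
```

With three parties, snapshots 0, 3, 6... go to the first party, 1, 4, 7... to the second, and so on, so every node can tell locally whether it is the next leader when a ReqTx arrives.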

2021-09-01

Ensemble Session

Discussing PRs in flight:

https://github.com/input-output-hk/hydra-poc/pull/69:
- Need to move Enveloppe type to Logging module as its use in tests makes api doc and code inconsistent
- There's something fishy going on with the tests as they should not pass because we don't have the files in the data-files, plus there's a namespace which should be used
Implement ADTArbitrary as orphan instances in tests to make sure we cover constructors in aeson's roundtrip tests
https://github.com/input-output-hk/hydra-poc/pull/70 => merging, using gnuplot is fine and simple enough
- SN made some changes to make the script more portable => use /usr/bin/env to find bash executable
https://github.com/input-output-hk/hydra-poc/pull/72
- Discussion about the use of contestationDeadline in the OnCloseTx
- Seems like we need the deadline anyway in various places, not only in the client
- We store the "on-chain" transactions in the mock chain because we want to calculate time at posting time and not at consumption time

Trying to display the IP address of connected hosts in the TUI

The Heartbeat answers a Party but we really need a Host. We can simply encode the Host in the Data constructor of the Heartbeat.
Got into troubles with the APISpec saying Committed is wrong -> we had a comment saying it should be a Party but it was really a Peer

Engineering Meeting

Demoing the TUI:

Mock chain is confusing name as it's already used by Plutus -> Stub chain or Proxy chain
Make it clear what the limits of the demo are, what's available or not (crypto primitives, main chain, contracts...)
Would be great to have PAB with (actual) mock-chain as its release is due mid-Sep but seems unrealistic

Demoing the benchmark:

Does not work out-of-the-box as we made some breaking changes

AB Solo Programming

Puzzled by the behaviour of the APISpec and LoggingSpec, esp. how the namespace is used to check some properties. The classify function only works on a specific structure, namely one where we have the following tree from the root:

properties:
  <namespace>:
    type: array
    items:
      oneOf:
      - title: <property>
      ...

but the utxo and txs are defined as:

properties:
  utxo:
    type: array
    items:
      $ref: "#/definitions/Utxo"

  txs:
    type: array
    items:
      $ref: "#/definitions/Transaction"

and of course logs is not defined anywhere.

The intent of the property is pretty clear, namely to check the completeness of the specification against generated values, but it is very inflexible, tied to the precise structure of the api.yaml file and not suited for anything but having top-level properties with a specific structure

Rewriting the property to accept some arbitrary selector which makes it possible to adapt to specific structure of schema and tested data type.

August 2021

2021-08-31

SN Solo work on TUI

Current goal: be able to iterate the full life-cycle of a Head
- but keep commands static and only later make the client aware which is possible when.
Committing some value in Hydra-TUI could act as some kind of "faucet", but I opted for simply committing mempty
- Maybe we could have a brick dialog to ask users how much ADA (or other assets) they would like to commit?
Adding command and server output handlers was really easy and quickly done. Although I refrained from rendering Utxo sets.
When head was closed, client does not really know when the contestation period ends and this felt very unresponsive
- The ServerOutput should provide a point in time when this (roughly) ends
- The UI can then show a countdown or so
- So if HeadIsClosed should hold a contestationDeadline, the OnChainTx handling in Hydra.HeadLogic needs to know the current time.. or is given the deadline as well.
- The latter seems to be easier as the chain client would also know best about "what time really means" on the respective blockchain

2021-08-30

AB Solo Programming

Completing the Transaction description in the api.yaml

...And then add some tests for it

Got failures in the APISpec tests, unsurprisingly. Seems like AuxiliaryData produces a None when not present which is unexpected?
Do StrictMaybe fields whose value is SNothing generate None instead of null? StrictMaybe's ToJSON instance is defined here: https://github.com/input-output-hk/cardano-base/blob/eb58eebc16ee898980c83bc325ab37a2c77b2414/strict-containers/src/Data/Maybe/Strict.hs#L91 and it's defined in terms of Maybe's instance which must produce a null if Nothing: https://github.com/haskell/aeson/blob/master/src/Data/Aeson/Types/ToJSON.hs#L1244
Trying to generate a transaction and check manually the validity against api.json => Surprisingly, generated transaction is valid against schema. Trying to generate more but if I try to validate the generated tx from encode it works fine.
Trying to save the input file to see if there's a discrepancy. The list of CardanoTx in the temporary directory is empty, seems like its content is not correctly updated upon shrinks perhaps?
Saw https://github.com/Julian/jsonschema/issues/623: Json schema outputs the cryptic None: None is not of type 'string' when a string field has a null value, which is really not obvious from the output. Seems like a PR fixed it but unsure if we have it in our version
=> Found the solution to allow null values for auxiliaryData and auxiliaryDataHash:
- For the hash, it's simply an enumeration of possible types:
```
auxiliaryDataHash:
  type: [ "string", "null"]
  description: >-
    Hex-encoding of the hash of auxiliary data section of the transactions.
  examples:
  - "9b258583229a324c3021d036e83f3c1e69ca4a586a91fad0bc9e4ce79f7411e0"
```
- For the data, I had to resort to use oneOf keyword to either have a Cbor value or null:
```
auxiliaryData:
  description: >-
    Hex-encoding of CBOR encoding of auxiliary data attached to this transaction. Can be null if
    there's no auxiliary data
  oneOf:
  - type: "null"
  - $ref: "#/definitions/Cbor"
```
Got bitten by the fact jsonschema is implemented in Python and actually relies on a mapping between the JSON schema specification and the Python type system. The value null is mapped to the value and type (?) None in Python leading to some cryptic error messages. I am mildly convinced by the model-first approach especially if the tooling trips us. Also, tests are somewhat intricate as we need to pass through a layer of transformation from YAML to JSON, then call an external process to validate a schema.

Added documentation for log entries.

While working on adding a validation test of log entries against JSON schema, I am hitting a snag: Importing both the Logging module and the Cardano module leads to conflicting JSON instances on UTxO, which has a JSON instance defined in Cardano.Api.Orphans.

Instead of custom types for string encoding, we should use media-types to represent various encoded pieces of data: https://json-schema.org/understanding-json-schema/reference/non_json_data.html

2021-08-27

Pair Programming

Start documenting in more details the structure of the Cardano transactions as exposed by Hydra node API.

Got a bit puzzled by how to represent dynamic keys which are needed for assets' representation.

Playing with better formatting of errors in the benchmark, using hspec => We can use runSpec to run the bench, making it a Spec simply using it

Goals for today:

Validate NewTx against confirmed snapshot and not submit ReqTx if it fails
Let the client (benchmark) handle resubmission

We simply drop the transaction if it cannot be submitted by the benchmark => if it happens early then a lot of transactions will be dropped later

We see the error message for the TxInvalid and the benchmark keeps running but we don't see any snapshot confirmed
Node 1 sends an AckSn for its signature but it does not get processed. Seems like we don't process our own ReqSn ??
We need to improve our tooling for exploring the logs

Trying to modify wait so that we don't throw away messages => we can simply consume messages and dump them

We still don't see snapshot confirmed messages
Just happens we forgot to loop in the TxInvalid case 🤦

Benchmark succeeds but only 16 out of 526 transactions succeeded

When having an InvalidTx we simply resubmit it. Resubmitting transaction immediately hits hard on the node, so trying to increase the delay between initial submission => much better, see a lot of snapshots

We managed to run benchmark to completion with all the transactions by delaying submission time => now plotting and interpreting the results

Week's progress:

We are getting closer to a real ledger, no real crypto but a real ledger
It's not about TPS but about latency => we need to plot distribution of latencies providing some kind of guarantees
We also want to test with more nodes, seeing how the cluster behaves with more participants. Load testing = saturate resources (CPU/Memory) and observe response time => need to be able to tune throughput to saturate the nodes
Need to trim down the logs:
- remove some network logs which are very verbose ? => need to confirm if the network logs are actually a problem
- do not log full event/effect, log the end events/effects using ids
- Logs are written in a tmpfs now, we should parameterize it to be able to store more of them. tmpfs is limited in size. Later on, use some cloud storage or log ingestion system

There are infinitely many possibilities with the logs, what do we really need now?

Confirmation of simulation?
Is latency increasing when adding more nodes in an exponential/quadratic way?
Keep a transaction set around that we can use as reference, rather than generating one on the fly every time. We need 2 different tools, we can have 2-3 different scenarios to benchmark
Extension to load tester: make the number of nodes dynamic, submit transactions to multiple nodes instead of a single one
We also want to check CPU/RAM load of each node to ensure they are saturated (also network bandwidth?)

Solo SN

Do some cleanup work and make tests green again
Debugging APISpec failures is somehow possible by temporarily adding more specs for sub-types and corresponding top-level properties to the schema, e.g. "utxo"

  specify "Utxo" $ \(specs, tmp) ->
    property $ prop_validateToJSON @(Utxo CardanoTx) specs "utxo" (tmp </> "Utxo")

and

  utxo:
    type: array
    items:
      $ref: "#/definitions/Utxo"
    additionalItems: false

I kept these ☝️ entries for Utxo CardanoTx and CardanoTx to differentiate test failures
Realized that a HUnitFailure is not properly formatted in BehaviorSpec -> red bin
Found the bug in NewTx for the failing BehaviorSpec:

case canApply ledger utxo tx of
 Valid -> [ClientEffect $ TxValid tx, NetworkEffect $ ReqTx party tx]
 Invalid err -> [ClientEffect $ TxInvalid{utxo = utxo, transaction = tx, validationError = err}]

We had been validating against the confirmed ledger, but not reporting it being invalid using the seenUTxo, so the expectation was wrong.

This now also requires the test to wait for a SnapshotConfirmed before re-submitting the secondTx.

And the benchmark should likely do the same.

2021-08-26

ToJSON should not contain empty objects, e.g. assets in a value
- We did remove it for the assets, but there are others
- Maybe tackle this when also documenting the API format for txs
Is the benchmark really a load-test?
- Using hspec / runSpec would also handle and render HUnitFailures properly
- This turns out to be quite simple: runSpec (it "some context" action) defaultConfig >>= evaluateSummary (although this kills the process)
Created a log capture template to easily capture entries like this

(setq org-capture-templates
        '(("l" "Log" entry
           (file+headline org-default-notes-file "Log")
           "* %? %T\n%a"))
           ;; other templates
           )

Pair Programming

We start writing a "missing" unit test for broadcast-to-self in NodeSpec but we realise this is not possible as it's not directly observable -> just remove the comment about the implementation details and rely on indirect observations

We switch to complete serialisation of cardano transactions, working on adding minted values to the JSON format

Added mint but it seems transactions are not generated with minted value in genTx for Mary => check what's going on to Alonzo

Running the benchmark we got errors in the validation of transactions. First error is about wrong script witness, then other errors about invalid key witnesses. Some transactions are valid, and we see 18 being processed as TxSeen, with 12 reported as TxValid

Error reporting in the benchmark is painful:

we should stop as soon as we get a TxInvalid report
We are missing some information when we get a validation error, namely the details of the transaction that failed and the UTxo set to which the transaction was applied -> add it to TxInvalid and then we can use it as regression tests when we get a failure

Adding unit test harvesting output from the Benchmark (need some love to be more usable), got the following failures:

ApplyTxError [UtxowFailure (MissingScriptWitnessesUTXOW (fromList [])),
 UtxowFailure (UtxoFailure (ValueNotConservedUTxO (Value 55938162 (fromList [])) (Value 107981334 (fromList [])))),
 UtxowFailure (UtxoFailure (BadInputsUTxO (fromList [TxInCompact (TxId {_unTxId = SafeHash \"d2635419a791eef0ba694bbcb66de7c7e76a865a493e7d2cc46f5c6b1ecb7b8d\"}) 3])))]")

MissingScriptWitnessesUTXOW shows an empty difference between needed and provided

Looking at how transactions are generated, we replace the property test on single transactions with one on a sequence of transactions -> The property fails, reusing the example to check why it fails

Trying to shrink the examples we have -> works for txs but not for utxos because the shrinking is not done in relationship with the UTXO

How can we generate a valid sequence of transactions that then fails to validate against the very same UTXO set used for generating the transactions? What is the shrinker for lists doing by default?

Answer: The reason applying several transactions at once vs. applying one by one fails is that we throw away the delegation pool state between each application when we generate them, we only keep the changed UTxO set. This is not the case in the applyTxsTransition which carries over both the UTxO set and the DPState, so transactions can now fail.

Not applying the transactions as a list but one by one works!

Seems there's something wrong with the way we are applying the transactions to the ledger?
=> changing the interface of the Ledger to only apply one transaction at a time
It's probable the failures we are seeing in the benchmark are caused by reordering of transactions?

We see more InvalidWitnessesUTXOW failure, with a list of public keys

Issue probably comes from serialisation of keys (and possibly script) witnesses, investigating from a failure, using the Haskell show instance to compare how it's serialised to JSON and back.

Managed to have a WitVKey constructed as:

key :: CryptoFailable (WitVKey 'Witness StandardCrypto)
key = do
  pubkey <- publicKey @ByteString "\150\f[\192l\179\v\136%\182%\137 \STX\215\229up\228$V\157?F\151i\236\144\SI;e\142"
  sig <- signature @ByteString "\160r\240\221\191\ACK\221*\193\178>\SUB\USL\252HAID0\DC1\NUL~\131\&0\DLEy\188\187\197u\236\&8\201\175aNK\150\141\224\190\EM\141\129\STX\155\231\226N'E\DLEZ\249\131,ao\156\156\CANA\t"
  pure $ Cardano.WitVKey (VKey $ VerKeyEd25519DSIGN pubkey) (SignedDSIGN $ SigEd25519DSIGN sig)

Writing a ToJSON/FromJSON instance for WitVKey, unpacking what we had in Witnesses before => The WitVKey is correctly encoded and decoded

Trying to chase the source of the error we are seeing from a failing transaction, deserialising the JSON witnesses and checking if they match the input transaction's => they do

However, this transaction contains minted values with a ScriptHash value as policy ID, could be the case that we get a missing script witness because we don't pass down the mints in the body?

We are minting the value

mint = Value 0 (fromList [(PolicyID {policyID = ScriptHash "42c7a014a4cd5537f64e5ae8ec7349db3d8603e16765dc37f8fb6e67"},fromList [("yellow0",134392),("yellow5",368980)])])}

which matches a script hash provided as part of the witnesses. Could it be we get a witness error because there are too many witnesses? OR a script provided is not matched in the body of the transaction? => yes

The verification of script hashes checks that all script witnesses are used, and all required scripts are present
Added JSON instance for assets in Value so that we complete the TxBody, bar the PP updates

Still having an error in the benchmark

ApplyTxError [UtxowFailure (UtxoFailure (ValueNotConservedUTxO (Value 309051813 (fromList [])) (Value 333277734 (fromList [])))),UtxowFailure (UtxoFailure (BadInputsUTxO (fromList [TxInCompact (TxId {_unTxId = SafeHash \"5e2921b6a85257bcdb0f2c5e9d96f0e5ed7cf199a646ce4d5d8961fa939bb126\"}) 2])))]")

It's perfectly possible for a submitted transaction to not be applicable at NewTx time, but in the HeadLogic we still submit it as a ReqTx and report a TxInvalid to the client

In the original paper, transactions are required to apply to the confirmed UTxO set before being propagated

Changing the logic of NewTx to:

Validate transaction against confirmed set (from latest snapshot)

Not send a ReqTx if the transaction does not apply

The BehaviorSpec test now fails

       FailureException (HUnitFailure (Just (SrcLoc {srcLocPackage = "main", srcLocModule = "Hydra.BehaviorSpec", srcLocFile = "test/Hydra/BehaviorSpec.hs", srcLocStartLine = 248, srcLocStartCol = 15, srcLocEndLine = 248, srcLocEndCol = 31})) (Reason "Test timed out after 1s seconds"))

which is to be expected

Two things for tomorrow/next session:

Change in the HeadLogic and adapt the tests
Adapt the benchmark to use hspec to run it so that we get better error reporting. It's not really a benchmark anyway, it's more a load test.

2021-08-25

Pair Programming

We should change the names of the witnesses fields:

scripts is fine
addresses -> keys (and break it down into a vkey and a signature part)

We discussed the perceived awkwardness of the NodeSpec test as it is now:

We should test at the boundaries and stub the effects, no more, and use the same createHydraNode function for all tests
This means we should move the BroadcastToSelf wrapper into the node and not configure it outside as it is an integral part of the behaviour of the node
Alternative would be to bake reinjection of Event from Effect into the HeadLogic protocol itself
Writing a test exposing Wait of some event: We inject out of order AckSn/ReqSn and expect to see our own AckSn. But we see the AckSn with a weird signature... => Refactoring Node code to have a dedicated createHydraNode function that does the wrapping
There is a problem in our createHydraNode function: withHeartbeat and withBroadcastToSelf require and produce NetworkComponents which contain both sending and receiving part of the network. Solution is to refactor createHydraNode to withHydraNode as a with pattern => Let's take a step back and not focus too much on this refactoring => keep the test pending for now and refactor later in solo mode

We then turn our attention towards benchmark errors again:

Fixing the test output to be less verbose so that we get better error reporting. Pondering whether we should write the messages into a file, but it makes things more complicated; going for the simple thing of truncating the list of messages when displaying the error

We were waiting for snapshotConfirmed and we changed the serialisation format to use Generic which emits SnapshotConfirmed with a capital C 🤦

Scaling bench again we have the timeout on waiting for confirmations again, but with more snapshots produced => Extracting the snapshot number from the confirmed ones and reporting it, instead of throwing an error and giving information, let's report progress from the benchmark

We should use our Tracer to show progress in the benchmark

Investigating another failure, we see that node-3 gets a ReqSn that it drops because it is still processing another one. This case is actually incorrect, we should Wait if we get a ReqSn that could be valid in the future: Writing a unit test in HeadLogic:

Send 2 ReqSn in a row, should wait the 2nd one
Receiving a ReqSn which is from the past should fail
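
The two cases above can be sketched as a small pure decision function. This is a hypothetical simplification and not the actual HeadLogic code: the names `ReqSnOutcome` and `onReqSn`, and the reduction of the head state to a confirmed snapshot number plus an "in flight" flag, are assumptions made for illustration.

```haskell
import Numeric.Natural (Natural)

-- | Hypothetical outcome of handling a ReqSn message.
data ReqSnOutcome
  = Process -- the request is for the next snapshot and nothing is in flight
  | Wait    -- the request could become valid later, so it is retried
  | Reject  -- the request is for a snapshot from the past
  deriving (Eq, Show)

-- | Decide what to do with a ReqSn, given the number of the latest confirmed
-- snapshot, whether another snapshot is currently being processed, and the
-- snapshot number carried by the request.
onReqSn :: Natural -> Bool -> Natural -> ReqSnOutcome
onReqSn confirmed inFlight requested
  | requested <= confirmed = Reject
  | requested == confirmed + 1 && not inFlight = Process
  | otherwise = Wait
```

With this shape, the second of two ReqSn sent in a row yields Wait instead of being dropped, and a ReqSn from the past is the only case that fails.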

Added some tests to cover the case of snapshots "from the future", re-running the benchmarks works on AB's machine, yielding 50-70 snapshots whereas it fails on SN's machine which is faster

Node 1 ends up not emitting snapshots; it seems it is not emitting a network effect to send the ReqSn message. Trying to drop the null (seenTxs ) condition works => the condition should be at the level of the guard so that we don't "consume" the DoSnapshot event without either waiting or emitting a ReqSn

There is a property waiting to be written there, expressing snapshot strategy invariants in terms of variation of state/sequence of events.

Benchmark now runs to completion without failing 🎉

Discussing the snapshot strategy as it's getting somewhat cumbersome now

2021-08-24

AB Solo Programming

(Cont'ed work on Tx generator)

Setting most values to 0 led to an error in the generator, with all frequencies set to 0. QC.frequency is used in different sections of the generator:

To generate credentials registration => some XXCred must be non 0
To generate delegation => frequencyKeyCredDelegation or frequencyScriptCredDelegation must be non 0, or there must not exist a stake pool to delegate to (in DPState ?)
To generate withdrawals => we use defaults so should be fine?

Got another error now:

ApplyTxError [UtxowFailure (MissingScriptWitnessesUTXOW

So the transactions are generated sometimes with script addresses which obviously require a script witness, which we simply drop when creating the witnesses... => Going to add handling of script witnesses in the CardanoWitness data structure

The script witnesses field is actually a Map ScriptHash Script, trying to serialise it as an object? Interestingly there's a ToJSON instance for ScriptHash but no ToJSONKey which is somewhat sad
=> Added scripts witnesses to the ToJSON instance for witnesses, and ToJSONKey/FromJSONKey instances for ScriptHash, now testing to see if I get failures in golden and roundtrip for witnesses, which should be the case...
Added JSON instances for Timelock so that the JSON instance for witnesses is simpler

All serialisation tests now pass, now having an even more complex error with several issues:

Left (ValidationError {reason = "ApplyTxError [
UtxowFailure (MissingScriptWitnessesUTXOW (fromList [])),
UtxowFailure (InvalidWitnessesUTXOW [VKey (VerKeyEd25519DSIGN (PublicKey \"\\177g\\EM@R\\DC2\\251\\129\\GS\\175\\211+t\\146\\161\\205\\174\\138\\247\\154S\\244>\\r\\f%\\195U\\141\\166\\234\\&9\")),VKey (VerKeyEd25519DSIGN (PublicKey \"\\252]=\\212&\\139\\138\\240\\185\\ESC\\185\\GS\\186Dk\\164\\ESC`\\249I\\186\\163\\224K\\r\\SI\\192KT\\204\\160\\SO\")),VKey (VerKeyEd25519DSIGN (PublicKey \"\\150\\f[\\192l\\179\\v\\136%\\182%\\137 \\STX\\215\\229up\\228$V\\157?F\\151i\\236\\144\\SI;e\\142\")),VKey (VerKeyEd25519DSIGN (PublicKey \"o9\\174\\133\\251V\\252\\247\\210j\\187\\DC4\\178\\223@\\225\\182\\&9\\148\\a\\229\\\"4{\\185XR\\210<\\245\\154\\255\")),VKey (VerKeyEd25519DSIGN (PublicKey \"J\\219\\247.<\\203\\238\\216\\162\\EMhY{\\ESCk#\\214\\155\\170\\206J\\210\\FS\\206\\130\\209\\158s\\255\\&4\\255\\ETB\")),VKey (VerKeyEd25519DSIGN (PublicKey \"\\EOT0\\ETBo\\183\\n\\138\\182\\143\\192#\\172\\183\\243\\245\\215Sp\\201\\220\\DLE)\\SYNQ\\167\\ETB\\251\\218e\\ETX\\132\\196\")),VKey (VerKeyEd25519DSIGN (PublicKey \"h\\239\\210sTVfp\\NAK2-\\130\\STX\\253a\\DC2\\209\\204n\\245\\188\\213\\138cG\\136\\186I\\r\\249\\173\\143\"))]),
UtxowFailure (UtxoFailure (ValueNotConservedUTxO (Value 230733318 (fromList [])) (Value 230733318 (fromList [(PolicyID {policyID = ScriptHash \"42c7a014a4cd5537f64e5ae8ec7349db3d8603e16765dc37f8fb6e67\"},fromList [(\"yellow1\",729252),(\"yellow2\",901652),(\"yellow4\",871114),(\"yellow5\",127109)])]))))]"})

The transaction is relatively small:

{
  "witnesses": {
    "scripts": {
      "42c7a014a4cd5537f64e5ae8ec7349db3d8603e16765dc37f8fb6e67": "820181820181820180",
      "a3e84983320841577ac20d77058e440d7fb7e17e98659e921b1274a3": "83030383820282820182820181820518208200581c733aea10df2a2bb1d3019a7337b240ad64a174c919fc034fb372fdc9820182820181820418208200581c733aea10df2a2bb1d3019a7337b240ad64a174c919fc034fb372fdc9820282820182820181820518598200581c571758200680b643781738e0436291811be83c1707fc66edd4982b0e820182820181820418598200581c571758200680b643781738e0436291811be83c1707fc66edd4982b0e820282820182820181820518358200581cb9acb4b5682ddb6980f2471bbd13a3765e54d79ebf46417c850a609c820182820181820418358200581cb9acb4b5682ddb6980f2471bbd13a3765e54d79ebf46417c850a609c"
    },
    "addresses": [
      "8200825820b16719405212fb811dafd32b7492a1cdae8af79a53f43e0d0c25c3558da6ea395840b8de36f9836332743d8068478fd5a1e93aeff12dfade0dedf86c74a252e23c1f7903b81d43a6a8b21e42b08fb531c2e9f6e78080aa71bf234e5117a7a1328a0f",
      "8200825820fc5d3dd4268b8af0b91bb91dba446ba41b60f949baa3e04b0d0fc04b54cca00e5840ff802a5358b84e2110f981a697b60141dce0925f251a954c3c04877ea061083b9ddbfd40ce96a72e3600950bd4b866a49965480d70f0f45e186f8bcc8f9d130d",
      "8200825820960c5bc06cb30b8825b625892002d7e57570e424569d3f469769ec900f3b658e5840cb9aa3267c3c7a05aabb7e3f57b62b238590922ff9e2b4d5965d4eab5a3fee92ccb1fd095c87ee7685f18a8704b04234ba56236adfb037ea157988aa8605e902",
      "82008258206f39ae85fb56fcf7d26abb14b2df40e1b6399407e522347bb95852d23cf59aff58402e639dd813c0a9879366f7c9491ea95d70134be90687b7687308551488a556811c4fc0aced07af841f2e5cc0248af747cf3dfd506d5158d71592878800bce709",
      "82008258204adbf72e3ccbeed8a21968597b1b6b23d69baace4ad21cce82d19e73ff34ff1758404c94c33d62fd704369c110cd010a4d6ea04eb002ae6c3fbdadb0d62a84b03c0e2be84696631d82e82eb3deec595c72e5b2d810f3a95909058bc2e82549a0a104",
      "82008258200430176fb70a8ab68fc023acb7f3f5d75370c9dc10291651a717fbda650384c45840135bcc2f58a21a9273e8e7ca481a744aefadf12b48e9e8b5b3e5e6820e04bf74113d696cd45f5b5d17ade8d23e38522902dd4463852d17c9ce4c818e61c2c107",
      "820082582068efd2735456667015322d8202fd6112d1cc6ef5bcd58a634788ba490df9ad8f5840fa9f0adb12e0ccea8ef31c656af30b473334c026228f223940e08f2ec344c6c9cc0d3a9b4ba0f597f79fb84bb885fd59089fc12890c3563b96a4121e68fb2701"
    ]
  },
  "body": {
    "outputs": [
      {
        "address": "addr_test1qqzfllufs42yh9tz3j5zeqeh8v789hvzvz57kd4n5xez0pnc0554t3aspn7xrc0ekfq7he5gwwx935kc8yzx00znxr3shu4jta",
        "value": {
          "lovelace": 46146663
        }
      },
      {
        "address": "addr_test1qryc674js99w50kjf30heds8eqqe0vre3d8487swgrmd7q5a8uwp3k06h9vg32z7lrnzjvpey9eymx7zq8atvz755sjqcguqss",
        "value": {
          "lovelace": 46146663
        }
      },
      {
        "address": "addr_test1qz2y2kkjyhz4d5957msrgtesv85aerhynpgp29fnrxq676zpgz2ae0fmt5rr6kzr97c4e0qg8jvvnx4ktjnlx7unu4wsaad457",
        "value": {
          "lovelace": 46146663
        }
      },
      {
        "address": "addr_test1xzd9pvulknjv8x5fq6d9pluz3kccxw3l8rzvjk4tzdndug2900ddnvzdkregx3scav6qjjc0vq0l9apfh9sd8983zcesws0shy",
        "value": {
          "lovelace": 46146663
        }
      },
      {
        "address": "addr_test1xqs6qrhy9hu77wms0xcryy9tcnv32340gy63ejdz7zxeqj8kf86jrtuhuy6recsnpsn9gfen2u6uueqdljnsqlvu5kpsaldxc9",
        "value": {
          "lovelace": 46146663
        }
      },
      {
        "address": "addr_test1qqh8gkdmj6d8exd4tq65hql93dclhkfamh7dgr6etf8l4k5akymz4y4zwhufvwrymmy08acmy2ujkllln7jcs43k87ys04zqf8",
        "value": {
          "lovelace": 3
        }
      }
    ],
    "inputs": [
      "03170a2e7597b7b7e3d84c05391d139a62b157e78786d8c082f29dcf4c111314#20",
      "03170a2e7597b7b7e3d84c05391d139a62b157e78786d8c082f29dcf4c111314#58",
      "03170a2e7597b7b7e3d84c05391d139a62b157e78786d8c082f29dcf4c111314#61",
      "03170a2e7597b7b7e3d84c05391d139a62b157e78786d8c082f29dcf4c111314#80"
    ]
  },
  "id": "f207b15e2b5691ce237d0b28e6e26cfc5f933281eb16c363458652d663b3dd29",
  "auxiliaryData": null
}

So it is missing outputs for non-ADA tokens obviously => We drop the mint field from the generated body and a lot of other fields, which is a remnant from previous attempt at dumbing down the Body. Just passing the body encoded as is should be fine hopefully, except we are probably dropping a lot of fields in the serialisation process => Test for applying transactions succeeds, now going to wire that in the benchmark

ETE test is now failing which is expected as the format has changed.

Network is wrong, should unify to TestNet for all components. Funnily, the Testnet data constructor from cardano-api is different from the one in the Shelley ledger as it takes a Magic argument, but in the conversion process it is simply dropped.

As expected, when converting the benchmark to use CardanoTx, it fails to validate the transactions emitted because our serialisation is missing quite a lot of fields from the TxBody. We should either filter those, or complete the serialisation to handle more body fields => Covering JSON serialisation of missing fields in TX, in order to ensure we can properly encode/decode all kind of transactions, we'll deal with rejecting irrelevant transactions later on.

2021-08-23

AB Solo Programming

Continued working on generating transactions and checking roundtrip/golden serialisation.

Need to tweak the max transaction size parameter to find the right one:

Default maximum tx size is set to 2048 (bytes?): https://github.com/input-output-hk/cardano-ledger-specs/blob/nil/shelley/chain-and-ledger/executable-spec/src/Shelley/Spec/Ledger/PParams.hs#L320. Setting it to 1MB for the moment

Got a different error this time:

ApplyTxError [UtxowFailure (UtxoFailure (UpdateFailure (NonGenesisUpdatePPUP (fromList [KeyHash \"099f27f2d9bc901017518ee78b9b12a52ce658142e255666e2ce0b9d\",KeyHash \"859e3a86e34626df256a84ee03d813819aa731e854b6e4034e7024e0\",KeyHash \"a01f063c96ada95334fcdc7beb3a8fb2d0ff4ee8d206be17fa1becae\",KeyHash \"a94e6fffab278ffef8092918bc3ae6ac47d3cf8d9f4b923ecbfd8236\",KeyHash \"f90e541ed22c517263ab0885721c02f08a313b21de009efd3672afed\"]) (fromList []))))]"})

Trying to strip the update-related parts from the generated transactions' bodies, but it turns out it's not as simple as that: simply stripping from the body the parts we are not interested in leads to more errors:

[UtxowFailure (InvalidWitnessesUTXOW [VKey (VerKeyEd25519DSIGN (PublicKey \"J\\219\\247.<\\203\\238\\216\\162\\EMhY{\\ESCk#\\214\\155\\170\\206J\\210\\FS\\206\\130\\209\\158s\\255\\&4\\255\\ETB\")),VKey (VerKeyEd25519DSIGN (PublicKey \"h\\239\\210sTVfp\\NAK2-\\130\\STX\\253a\\DC2\\209\\204n\\245\\188\\213\\138cG\\136\\186I\\r\\249\\173\\143\"))]),UtxowFailure (MissingTxBodyMetadataHash (AuxiliaryDataHash {unsafeAuxiliaryDataHash = SafeHash \"8493f75f77f6f02b5342998e180be02fa132c26fba140bdb51026d9f1a2f6bce\"}))]"})

As pointed out by Jared, we can tweak the generator by setting various parameters in the Constants argument

Pairing Session

Reviewing TUI code written by SN

Trying to fix NodeSpec test by adding a Wait to replace error -> of course, unit tests pass but the benchmark still fails; note we have to revert to using SimpleTx in node and mock-chain

One problem is that we can emit 2 times the same snapshot

Writing a NodeSpec test to expose the problem: we should not emit a ReqSn twice for the same snapshot

We don't see any ReqSn after injecting a bunch of txs -> node is not the leader
Also we do not handle effects, so we want to create the node with a list of events to prepopulate the queue and then process events until completion or quiescence, e.g. when the queue is empty
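
Processing a pre-populated queue until it is empty can be sketched as a pure loop. This is only an illustration under assumptions: `runToQuiescence`, the shape of the step function and the `countdown` example are hypothetical and are not the node's actual API.

```haskell
-- | Feed events to a step function until the queue is empty (quiescence).
-- Events produced by a step are appended at the back of the queue.
-- Note: this does not terminate if steps keep producing new events forever.
runToQuiescence :: (s -> e -> (s, [e])) -> s -> [e] -> s
runToQuiescence _ st [] = st
runToQuiescence step st (e : queue) =
  let (st', newEvents) = step st e
   in runToQuiescence step st' (queue ++ newEvents)

-- | Toy step function: the state counts processed events, and an event
-- @n > 0@ re-injects the event @n - 1@.
countdown :: Int -> Int -> (Int, [Int])
countdown processed n = (processed + 1, [n - 1 | n > 0])
```

In a test, the final state returned by `runToQuiescence` can then be inspected, for example to check which ReqSn have been recorded.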

Starting at ed8eeaba94efffca8596e2339e03b1852d3ce4aa BehaviorSpec tests hang:

We spent some time troubleshooting an issue which ended up being caused by the following code:
```
runHydraNode ::
  Tracer m (HydraNodeLog tx) ->
  HydraNode tx m ->
  m ()
runHydraNode tracer node =
  forever . stepHydraNode
```
It happens that forever has type Applicative f => f a -> f b, and ((->) a) is itself an Applicative, so the expression type-checks: forever is applied to a function rather than to an action, and endlessly evaluates a thunk which reduces to a function that cannot be evaluated further, which locks the process.
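
A minimal, self-contained reproduction of this pitfall using only base. The names `step`, `buggyLoop` and `fixedLoop` are made up for this sketch and are not the hydra-node code; the step throws after three iterations only so that the fixed loop can be observed to terminate.

```haskell
import Control.Exception (ErrorCall (..), throwIO, try)
import Control.Monad (forever, when)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- | One step of a toy node: bump a counter and stop after 3 steps.
step :: IORef Int -> IO ()
step ref = do
  modifyIORef' ref (+ 1)
  n <- readIORef ref
  when (n >= 3) $ throwIO (ErrorCall "done")

-- | Type-checks, but 'forever' is instantiated at the function Applicative
-- ((->) (IORef Int)): applying this to a reference never produces an IO
-- action to run, it just loops. Do not call it.
buggyLoop :: IORef Int -> IO ()
buggyLoop = forever step

-- | Here 'forever' is instantiated at IO, which is what was intended.
fixedLoop :: IORef Int -> IO ()
fixedLoop ref = forever (step ref)

-- | Run the fixed loop until 'step' throws and return the counter.
runFixed :: IO Int
runFixed = do
  ref <- newIORef 0
  _ <- try (fixedLoop ref) :: IO (Either ErrorCall ())
  readIORef ref
```

Both definitions have the same type signature, which is why the compiler cannot catch the mistake.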

Ended up having a finer grained description of the SeenSnapshot than a Maybe to distinguish the situation from the point of view of the leader and the followers so that we don't make too many snapshots.

There is an interesting micro-pattern here that also is prominent in cardano API which is to not have Maybe blindness: Use expressive and "domain relevant" ADTs to express the state.
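
As an illustration of avoiding Maybe blindness, the sketch below replaces a `Maybe` field with a domain-specific sum type. The constructor names and their payloads are assumptions chosen for this example and may differ from what the HeadLogic actually uses.

```haskell
import Numeric.Natural (Natural)

-- | Hypothetical replacement for a 'Maybe Natural' "seen snapshot" field.
data SeenSnapshot
  = NoSeenSnapshot            -- no snapshot in flight
  | RequestedSnapshot Natural -- leader: a ReqSn was sent for this number
  | SeenSnapshot Natural      -- a ReqSn for this number was seen and is being signed
  deriving (Eq, Show)

-- | The leader should only request a new snapshot when none is in flight.
-- With a plain 'Maybe', the "requested but not yet seen" case would be
-- indistinguishable from 'Nothing' and lead to duplicate requests.
shouldRequestSnapshot :: SeenSnapshot -> Bool
shouldRequestSnapshot NoSeenSnapshot = True
shouldRequestSnapshot (RequestedSnapshot _) = False
shouldRequestSnapshot (SeenSnapshot _) = False
```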

Re-running the benchmark (replacing ledger type) still fails in the followers with an out-of-order ReqSn but it fails much later, like in snapshot number 39 which is some progress: We get ReqSn for 40 but we are still processing 39 in node 2. We don't have any InvalidEvent in the node 3 though.

2021-08-22

Work on the hydra-tui

Added command line parameter to decide to which node to connect using --connect
- Uses the Host type of hydra-node module Hydra.Network
- Needed to add more details to ClientJSONDecodeError to realize that the hydra-nodes in docker containers were still using the previous API JSON instances
- Rebuilt the docker images and re-started nodes using docker-compose build and docker-compose up -d in demo/
Start adding commands to drive the Head lifecycle
- Rendering [i]nit and handling the KeyEvent is quite easy with brick
- Now the tricky part is implementing Client{sendInput} for a not necessarily connected websocket
Switching to CardanoTx and adding [a]bort was a breeze
I would like to test handleEvent as it gets quite complex, but I don't know how?
At some point I realized that the hydra-node containers were all at full CPU utilization - are we busy looping?
- Yes, in the recently introduced logging rewrite when flushing the queue: https://github.com/input-output-hk/hydra-poc/pull/63

2021-08-20

Small note on the APISpec which tests our types against api.yaml:

  test/Hydra/APISpec.hs:35:7:
  1) Hydra.API, Validate JSON representations with API specification, ServerOutput
       Assertion failed (after 1 test and 7 shrinks):
         [PeerDisconnected {peer = VerKeyMockDSIGN 0}]
         {'peer': 0, 'tag': 'PeerDisconnected'}: {'peer': 0, 'tag': 'PeerDisconnected'} is valid under each of {'required': ['tag', 'peer'], 'title': 'PeerDisconnected', 'properties': {'peer': {'$ref': '#/definitions/Peer'}, 'tag': {'enum': ['PeerDisconnected'], 'type': 'string'}}, 'type': 'object'}, {'required': ['tag', 'peer'], 'title': 'PeerConnected', 'properties': {'peer': {'$ref': '#/definitions/Peer'}, 'output': {'enum': ['PeerConnected'], 'type': 'string'}}, 'type': 'object'}

Does not mean necessarily that the PeerDisconnected is implemented wrong, but in this case it was indeed the specification in api.yaml of PeerConnected!

It's a bit confusing that no PeerConnected was in the failing list, although this might come from the batch-wise invocation of jsonschema and shrinking?

Pair programming

Discussing strategies and options for roadmap of Hydra. Could it be interesting to frame this using Real Options?

Merging PR about bech32 addresses, seems like TH is a bit overkill but OTOH it's safer and less ugly than handling a Left impossible. Also, handling of various addr types is cumbersome and could be removed if we used cardano-api's functions -> put a red bin to refactor that later

Got a failure in the mock-chain serialisation of UTxO: It's because the mock-chain is using SimpleTx and not CardanoTx. Could it be made more polymorphic and agnostic in the type of transactions it transports? => Not easily

We don't report the error to the InvalidInput which is annoying -> moving the InvalidClientInput wrapper to the HeadLogic module to have it available there, we realise this representation is actually too complex and does not roundtrip properly in JSON. We fixed the encoding of invalid input as just Text and display it to the end user (in ServerOutput).

Looks like we did not do the right thing the first time, namely making sure we are reporting error to the user properly

We now have a proper error message with ETE test: The transaction fails to be deserialised properly, and it points to the witnesses not being properly encoded.

Writing unit test with the faulty transaction as JSON
Comparing ETE encoding with what we have in the node, seems like the ETE is encoding the witness set as a CBOR list and not a list of CBOR?
We encode a list of KeyWitness on the client side, which seems ok, but the encoding of KeyWitness is weird, depending on the type of witness it encodes it as a 2-elements list with the first element being a discriminator: We should do something symmetric in the ToJSON/FromJSON in the Cardano ledger

Serialisation of TX is working fine, and now it fails on applying the transaction to the ledger: We'll reuse existing functions from MaryTest

Previous code for applying transactions directly used Cardano.Tx but we have wrapped it in our type. Need to convert a CardanoTx to a Cardano.Tx
We hit the same problem AB had for tx generation about TxBody: The one available in the API is not the correct one -> Changing for the proper one in ShelleyMA

We see the transaction is sent and injected into the ledger but it is invalid: We need to improve the reporting of errors about invalidity of transactions

Make ValidationError more verbose, which showed us our addresses were incorrect: we sent Mainnet addresses while the ledger expected Testnet

{"transaction":{"witnesses":["8200825820db995fe25169d141cab9bbba92baa01f9f2e1ece7df4cb2ac05190f37fcc1f9d58400599ccd0028389216631446cf0f9a4b095bbed03c25537595aa5a2e107e3704a55050c4ee5198a0aa9fc88007791ef9f3847cd96f3cb9a430d1c2d81c817480c"],"body":{"outputs":[{"address":"addr1vx35vu6aqmdw6uuc34gkpdymrpsd3lsuh6ffq6d9vja0s6spkenss","value":{"lovelace":14}}],"inputs":["9fdc525c20bc00d9dfa9d14904b65e01910c0dfe3bb39865523c1e20eaeb0903#0"]},"id":"56b519a4ca907303c09b92e731ad487136cffaac3bb5bbc4af94ab4561de66cc"},"output":"transactionInvalid","validationError":{"reason":"ApplyTxError [UtxowFailure (UtxoFailure (WrongNetwork Testnet (fromList [Addr Mainnet (KeyHashObj (KeyHash \"a346735d06daed73988d5160b49b1860d8fe1cbe929069a564baf86a\")) StakeRefNull])))]"}}

Node now fails because of missing ToCBOR/FromCBOR instances for CardanoTx which prevents proper communication with other nodes.

We realise our CardanoTx type is problematic as the TxId should always stay in sync with the TxBody -> remove it from the data structure and recompute it every time
We got stuck in decoding the Annotator TxBody as there is no FromCBOR TxBody instance: The FromCBOR class provides a Decoder but the runAnnotator requires access to the underlying ByteString, which is annoying and prevents us from using a FromCBOR (Annotator a) within a FromCBOR instance needing a FromCBOR a.
Getting dragged into the weeds of how transactions get serialised inside a node....
Solution is to decodeBytes then use those bytes as input to the decodeAnnotator function. We write the ToCBOR/FromCBOR instances of CardanoTx using the underlying FromCBOR (Annotator Tx) instances and reconstruct the txId using the body

Finally got a green ETE test! 🎉

AB Solo Programming

Implemented a basic Arbitrary instance for CardanoTx to have a proper roundtrip and golden test. I have made the instance use a genCardanoTx function that will come in handy once we want to generate sequences of valid transactions, for example in the benchmark or end-to-end tests.

Now adding a ToCBOR/FromCBOR roundtrip test for completeness' sake.

Got failures when trying to quickcheck application of transactions to the Cardano ledger as the generated transactions run on the Testnet and not the Mainnet which changes the addresses used. -> Settle on using Testnet everywhere

Then another failure: The problem is that generated transactions are more complex than what we cope with in the ledger apparently.

BTW, it's really not obvious how to meaningfully shrink a transaction!
The transactions generated are actually using the auxiliary data so when we simply drop those in the generator, we make the transaction possibly invalid -> Add the auxiliary data as a field to the CardanoTx so that we have all the information for a real transaction.

2021-08-19

AB Solo

Looking at writing a generator for our CardanoTx. There is a genTx function provided in ledger-specs: https://github.com/input-output-hk/cardano-ledger-specs/blob/master/shelley/chain-and-ledger/shelley-spec-ledger-test/src/Test/Shelley/Spec/Ledger/Generator/Utxo.hs#L103 which produces valid transactions. Trying to salvage work I have done before on model based tester

Ensemble Session

Follow-up on update/abort in PAB: We can't both wait for updates and expose endpoints because the update resumption always is done from the tip of the chain apparently, which means that by the time the abort is done, the update will miss the abort transaction. => activate 2 different contracts in 2 different threads

We add more tickets to the backlog on Miro, filling in some gaps we perceive in what's needed for the milestone. We also agree on making fewer tickets "ensemble-only" to allow team members to pick more stuff when working alone.

We end the session closing the loop with the "real" cardano ledger in the Head:

AB had prepared To/FromJSON instances for most of our types, so we could start by wiring up the Hydra.Ledger.Cardano
When simply using cardanoLedger in hydra-node, the new Tx and associated Utxo types were used
We had still an error when deserializing a Commit client input and added the aeson error to the APIInvalidInput error trace
Finally we realized that it was not parsing because the address format we used in our e2e fixture is using bech32, while we were still de-/serializing a raw hex serialization for the address

2021-08-18

Engineering Meeting

Two solutions had been researched, but not audited yet
- One is implemented now by Inigo
- Both would be transparently supported by the node (when verifying)
Some little addition required in libsodium to make it possible to construct a multisig signature so that, on the verifier side no change would be needed and classic Ed25519 verification can be used.
Generating and not reusing nonces is a vital part (of engineering)
While it works in practice at the moment, the theory behind the MuSig2 needs to be validated from a mathematical standpoint.
Next steps
- Have it theoretically defined -> will yield a formal definition
- Implementation / changes to libsodium are checked against that
Really the most complicated part is managing nonces
- To produce a partial signature the signing party needs to have the nonce for it from all other signers
- Signers need to produce those nonces and need to keep a state of already produced nonces because reusing a nonce means disclosing private key. This implies keeping state on nodes for generated nonces
- Aggregation can be done by anyone, it does not handle any secret
- Aggregation of public keys and conversion from prime order group can be done once, at startup time
Other projects at IOHK will be using a non-interactive multi-signature scheme which requires changes on the verifier
- They will require a hard-fork then?
Code is located at https://github.com/input-output-hk/musig2, not production ready but something we can iterate on.

Then discussion about non-custodial head deployment model with some delegation, something like lightning with watchtowers. Conclusion is that it seems feasible as long as lawyers agree this is indeed a non-custodial solution.

Ensemble Session

There's no generally agreed upon JSON representation of a cardano transaction, which is somewhat annoying: In the cardano-api there are ToJSON instances but no FromJSON, and only for some parts of the API.

Switch to representing the utxo as a map from TxIn to TxOut instead of an object with a ref field and an output field: This means we will be encoding the TxIn as it's done in the cardano-api, namely as a string with transaction id plus index.
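
A rough sketch of such a textual key, using only base. The `TxIn` record here is a simplified stand-in (a hex string and an index), not the cardano-api type, and `renderTxIn`/`parseTxIn` are hypothetical helper names; the `id#index` shape follows the inputs shown in the JSON examples of this logbook.

```haskell
import Data.Char (isDigit)
import Text.Read (readMaybe)

-- | Simplified stand-in for a transaction input: transaction id and output index.
data TxIn = TxIn {txId :: String, txIx :: Word}
  deriving (Eq, Ord, Show)

-- | Render as "<transaction id>#<index>", usable as a JSON object key.
renderTxIn :: TxIn -> String
renderTxIn (TxIn i ix) = i ++ "#" ++ show ix

-- | Parse the textual form back, rejecting anything without a non-empty id
-- and an index made only of digits.
parseTxIn :: String -> Maybe TxIn
parseTxIn s = case break (== '#') s of
  (i, '#' : ix)
    | not (null i), not (null ix), all isDigit ix -> TxIn i <$> readMaybe ix
  _ -> Nothing
```

Since JSON object keys are always strings, the index has to be parsed out of the key text rather than decoded as a JSON number.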

We should write roundtrip serialisation tests for TxId, using the Gen TxId available in cardano-api perhaps? There are actually 2 interesting properties:

conformity with cardano-api
roundtrip ToJSON/FromJSON

Quick reflection on the session: We lost track this morning of our TDD principles, did not run our ETE test once and got lost in the weeds of implementing serialisation code for the full transaction whereas we would only need the UTXO to make some progress (eg. be able to send commit commands).

Discussing demo:

what kind of audience are we targeting? devs/enthusiasts/SPOs.. -> interested in technical stuff
summit is also a wider audience so having something graphical to show?
the message is about scaling to global use cases
is it the first time in Cardano we are "locking" fund?
commits are mandatory, and an integral part of the head's capabilities

Do we really need a "lay people" demo? => Probably not, it might blur the message, letting people think this is all done and packaged where it really is not => something to leave for marketing people to work on, to carve the right message

No need to have the 3 of us working on finishing serialisation, AB is going to wrap up the Cardano head ledger so that we can have ToJSON/FromJSON instances working and tested, then we pick up the ensemble session working on the validation and integration of actual ledger.

AB Solo Programming

Managed to get a reduced out-of-order snapshot test case, after extracting an immutable prefix from the events stream so that keys and committed UTXOs are right. Trying to plug this reduced test case in the shrinker does not lead to more shrinks so it seems in a sense to be minimal.

Struggling with getting the ToJSON/FromJSON instances right. Wrote a roundtrip test for UTxO and then the JSON instances would seem easy enough, but there's this crypto and era type parameters which are pretty annoying.

Solution is: Make Crypto crypto a constraint for all typeclasses and a parameter for types and this will make testing easier

Got everything to compile and test to run, but roundtrip is failing. Going to troubleshoot.

Trying to reproduce test failure in the REPL, but it does not work as easily: Several instances of FromJSON are in scope as the cabal repl command loads all the files from the component
The issue is in the parsing of TxIn as key in maps
```
Left "Error in $: parsing Natural failed, expected Number, but encountered String"
```
As expected, working with encoded txIn is a PITA...

Making progress, can now properly serialise UTXO and witnesses:

Hydra.Ledger.Cardano
  Cardano Head Ledger
    JSON encoding of (UTxO (ShelleyMAEra 'Mary TestCrypto))
      allows to encode values with aeson and read them back
        +++ OK, passed 100 tests.
    JSON encoding of (UTxO (ShelleyMAEra 'Mary TestCrypto))
      produces the same JSON as is found in golden/UTxO (ShelleyMAEra 'Mary TestCrypto).json
    JSON encoding of (WitnessSetHKD Identity (ShelleyMAEra 'Mary TestCrypto))
      allows to encode values with aeson and read them back
        +++ OK, passed 100 tests.
    JSON encoding of (WitnessSetHKD Identity (ShelleyMAEra 'Mary TestCrypto))
      produces the same JSON as is found in golden/WitnessSetHKD Identity (ShelleyMAEra 'Mary TestCrypto).json

2021-08-17

Pairing Session

Working again on signing a transaction to submit to the head, relevant function is signShelleyTransaction from cardano-api, but unsure if this is the right way to go as it seems a bit hard to work with.

We start trying to salvage what we did for MaryTest in order to build the txBody but give up after some efforts building the transaction: The ledger API used for MaryTest is too low level, we should really start from what we need in cardano-api, namely signShelleyTransaction and resolve issues from there.

Looking at how to build a transaction from the cardano-cli, what kind of data it provides. -> it uses TxBodyContent passing the different bits of information. Seems like a good strategy is to use exclusively stuff from Cardano.Api module which exposes the full node API.

For the moment, we use the getTxBody and getTxWitnesses to extract the data from the signed tx and just shove it as encoded strings into the JSON, but we want to have more details in the transactions.

We managed to build a complete transaction and print it in encoded form, now trying to format a transaction in JSON to send to the node. It's mildly annoying the cardano API does not provide default/empty values for a TxBodyContent so that we could just update the parts we are interested in, but it's nice it provides explicit types (not Maybe x) for all fields.

Also, we created a minimal JSON serialization of a Cardano Tx by "viewing" the TxBodyContent and using (partially) available ToJSON instances for TxIn and TxOut.

{
  "witnesses": [
    "8200825820db995fe25169d141cab9bbba92baa01f9f2e1ece7df4cb2ac05190f37fcc1f9d58400599ccd0028389216631446cf0f9a4b095bbed03c25537595aa5a2e107e3704a55050c4ee5198a0aa9fc88007791ef9f3847cd96f3cb9a430d1c2d81c817480c"
  ],
  "body": {
    "outputs": [
      {
        "address": "addr1vx35vu6aqmdw6uuc34gkpdymrpsd3lsuh6ffq6d9vja0s6spkenss",
        "value": {
          "lovelace": 14
        }
      }
    ],
    "inputs": [
      "9fdc525c20bc00d9dfa9d14904b65e01910c0dfe3bb39865523c1e20eaeb0903#0"
    ]
  },
  "id": "56b519a4ca907303c09b92e731ad487136cffaac3bb5bbc4af94ab4561de66cc"
}

Now need to make the test pass!

2021-08-16

Ensemble session

Discussion on user interfaces and how or whether to split between a high-level "wallet" and more lower-level management UI
Set off to "use the cardano ledger"
- Revisited codebase and see what's in MaryTest, what would be missing and where we likely need to change things
What's our goal? Start from the outside! We want to have our end-to-end test be using cardano transactions
We use our "own" json format, but do intend to support the "serialized cbor" format for accepting transactions later on
- Rationale being, that some ServerOutput is showing full transactions and clients likely are interested in comparing sent / seen transaction outputs etc.
We stopped at signing the transaction, which in particular is not provided by the cardano-ledger-core based API we had been using for constructing addresses, so we think about switching to using cardano-api for constructing / signing a transaction
- Also, it seems to be the most "blessed" and somewhat high-level API for dealing with Cardano transactions
Few tools and documents mentioned that are quite useful when working with Cardano data:
- bech32: very simple yet powerful for converting strings to/from bech32
- cardano-addresses: handy command-line for creating, hashing and inspecting addresses and scripts. It has a nice(r than cardano-cli) interface.
- cbor.me: simple tool for inspecting hex-encoded CBOR content
- Mary CDDL & Alonzo CDDL: CBOR specifications for Cardano binary types.

SN Working on the TUI

Creating a first draft for a terminal user interface using brick to "manage a hydra node"
- This will be a Hydra client, which connects to a (local) hydra-node
- Will focus on introspecting the hydra-node and the Head state, as well as opening and closing
Start with static Brick UI which only shows version of the TUI and can be quit
Attaching it to the hydra-node using a Client component (see ADRs) which opens a websocket connection to the hydra-node
- For now hard-coded host and port
- Deserialize ServerOutput and handle them as "application-specific events" using customMain
- For example PeerConnected updates a list of connectedPeers in the State and draw paints them
Making the hydra-node connection robust is non-trivial though
- Connectivity should ideally be known to the UI
- Changing the State to Disconnected | Connected {...} to make "invalid states unrepresentable"
- Extend event type to be something like ClientConnected | ClientDisconnected | Update ServerOutput
- Retry connection upon ConnectionException of websockets is not enough, need to catch and retry also on IOException (initially)
Next steps:
- Testing, any interesting properties in handling events / drawing?
- Command line parsing for picking hydra-node to connect to
- Adding commands and conditional rendering on HeadState -> How to infer it and which command possible from ServerOutput?
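
The connection handling described above can be sketched as a pure state transition. This is only an illustration and not the actual TUI code: ServerOutput is reduced to two constructors (PeerDisconnected is a hypothetical addition), peers are plain strings, and handleEvent is an assumed name.

```haskell
-- Hypothetical, simplified stand-in for the real ServerOutput type.
data ServerOutput
  = PeerConnected String
  | PeerDisconnected String
  deriving (Eq, Show)

-- The UI either has no connection, or is connected and knows its peers.
data State
  = Disconnected
  | Connected {connectedPeers :: [String]}
  deriving (Eq, Show)

-- Application-specific events fed into the UI loop.
data Event
  = ClientConnected
  | ClientDisconnected
  | Update ServerOutput
  deriving (Eq, Show)

-- Pure transition function (assumed name). Updates that arrive while
-- disconnected are ignored, so no peer list can exist without a connection.
handleEvent :: State -> Event -> State
handleEvent _ ClientConnected = Connected{connectedPeers = []}
handleEvent _ ClientDisconnected = Disconnected
handleEvent Disconnected (Update _) = Disconnected
handleEvent (Connected peers) (Update (PeerConnected p)) =
  Connected{connectedPeers = p : filter (/= p) peers}
handleEvent (Connected peers) (Update (PeerDisconnected p)) =
  Connected{connectedPeers = filter (/= p) peers}
```

Keeping the transition pure would also make it easy to test event handling independently of drawing.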

2021-08-13

AB on Logging

Seems like the JSON logger we are using is actually unreliable, some messages appear truncated in the output, like:

{"thread":"41","loc":null,"data":{"network":{"data":{"trace":[["event","receive"],["agency","ClientAgency TokIdle"],["send",{"contents":{"tran\sactions":[{"outputs":[1797,1798,1799,1800,1801,1802,1803,1804,1805,1806],"id":329,"inputs":[1785,1795]},{"outputs":[1807,1808,1809],"id":330,\"inputs":[1801,1802,1803,1804,1806]},{"outputs":[1810,1811,1812,1813,1814,1815],"id":331,"inputs":[1798,1805,1807,1808]},{"outputs":[1816,1817\,1818],"id":332,"inputs":[1796,1800,1810,1813,1815]},{"outputs":[1819,1820,1821],"id":333,"inputs":[1797,1799,1809,1811,1812,1814,1816,1818]},\{"outputs":[1822,1823,1824,1825],"id":334,"inputs":[1819,1820,1821]},{"outputs":[1826,1827,1828,1829],"id":335,"inputs":[1817,1823,1824,1825]}\,{"outputs":[1830],"id":336,"inputs":[1826,1828]},{"outputs":[1831,1832,1833,1834,1835,1836,1837],"id":337,"inputs":[1822,1830]},{"outputs":[1\838,1839,1840,1841],"id":338,"inputs":[1829]},{"outputs":[1842,1843,1844,1845,1846,1847,1848],"id":339,"inputs":[1827,1831,1832,1834,1835,1836\,1837,1840]},{"outputs":[1849,1850,1851,1852,1853],"id":340,"inputs":[1833,1841,1843,1844,1847,1848]},{"outputs":[1854,1855],"id":341,"inputs"\:[1838,1839,1842,1846,1849,1850,1851,1852,1853]},{"outputs":[1856,1857,1858,1859,1860],"id":342,"inputs":[1845,1854,1855]},{"outputs":[1861,18\62,1863,1864,1865,1866,1867,1868,1869],"id":343,"inputs":[1858,1860]},{"outputs":[1870,1871,1872,1873,1874,1875,1876,1877,1878,1879],"id":344,\"inputs":[1856,1857,1862,1865,1866,1868]},{"outputs":[1880,1881],"id":345,"inputs":[1859,1870,1871,1874,1875,1876,1878,1879]},{"outputs":[1882\,1883,1884,1885,1886,1887],"id":346,"inputs":[1861,1864,1877]},{"outputs":[1888,1889,1890,1891,1892,1893,1894,1895,1896],"id":347,"inputs":[18\63,1873,1880,1883,1884,1885,1887]},{"outputs":[1897,1898,1899,1900,1901,1902],"id":348,"inputs":[1869,1872,1882,1886,1888,1890,1891,1894]},{"o\utputs":[1903],"id":349,"inputs":[1867,1889,1892,1893,1898,1899,1900,1901,1902]},{"outputs":[1904,1905],"id":350,"inputs":[1881,1895,1896]},{
"\outputs":[1906,1907,1908,1909,1910,1911,1912],"id":351,"inp]},{"outp

Simplifying logs using a simple queue to which log messages are written and which is read from another thread responsible for dumping them as JSON to stdout, adding a timestamp and various metadata. Ended up not using Katip as it still adds some cruft on top of what we really need; just wrote a simple thread-based logger that pumps from a queue and writes to stdout.
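
A minimal sketch of such a queue-based logger, using only base. The name withQueueLogger is an assumption, and formatting (JSON, timestamp, metadata) is left to the write function passed in; this is not the code that ended up in Logging.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (newChan, readChan, writeChan)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (finally)

-- Run an action with a logging function. Messages are pushed onto a queue
-- and written by a single worker thread, so entries are never interleaved.
-- 'Nothing' on the queue tells the worker to stop.
withQueueLogger :: (msg -> IO ()) -> ((msg -> IO ()) -> IO a) -> IO a
withQueueLogger write action = do
  queue <- newChan
  done <- newEmptyMVar
  -- Always signal completion, even if 'write' throws in the worker.
  _ <- forkIO (worker queue `finally` putMVar done ())
  -- Even if the action throws, drain the queue before returning.
  action (writeChan queue . Just)
    `finally` (writeChan queue Nothing >> takeMVar done)
 where
  worker queue = do
    next <- readChan queue
    case next of
      Nothing -> pure ()
      Just msg -> write msg >> worker queue
```

Usage would look like withQueueLogger putStrLn (\logMsg -> logMsg "starting node"), with putStrLn replaced by whatever encodes and prints one complete entry.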

Added some simple test as I noticed this code is never tested directly.

Logging format is simpler, extracting events becomes:

$ cat /run/user/1001/bench-f531fd03b79aa8ca/1 | jq -c 'select((.message.tag == "Node") and (.message.node.tag|test("ProcessingEvent"))) | .message.node.event'

Rerunning the benchmark I still have an incorrectly formatted log entry for the first node, which seems to be the one generating the error, but it's unclear from the error message.

So the logs are truncated because of the error sent, which is ok but it's unclear why an entry in the middle of the file could be incorrect. Could be caused by the flushing of the logs I have added to Logging as we can still write more logs even when the inner action is interrupted by an exception that prevents proper evaluation of the JSON data?

Dumped events from node 3, whose last action is a ReqSn and which seems to crash too as its output is incomplete, trying to reproduce the failure using those logs. But I am still unable to reproduce the error thrown from update in HeadLogic, even though feedEvents now just discards LogicErrors, probably because the logs are truncated when the exception is thrown.

Trying to remove the error call and replace with a standard LogicError specialised for InvalidSnapshot. Still got a benchmark failure as some messages are not received.

I probably should give up for now... Cannot reproduce the failure using the logs which is really annoying, will try again later.

Ensemble Session

Back to work on the External PAB

What we are really interested in observing are the transactions that will be reflected to the Node as ChainTx values. Can we observe the "redeemer", or we don't need to, we just need to observe the inputs of the transactions (eg. the AbortTx Utxos come from the inputs)

We don't need much information from the txs coming back from the chain, because by definition they have been validated so they are correct. This implies we need to split the OnChainTx in 2, one for sending txs and one for receiving them. By separating the OnChainTx in 2 types, we add more logic to the head code but remove logic from the PAB/Contracts which is good.

For OnCloseTx we only need the snapshot number, and then we can verify the number against our latest confirmed snapshot:

If it's same => OK
If it's lower => post a contestTx
If it's greater => We have a problem
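
The three cases above amount to a single comparison. A minimal sketch, where CloseReaction and reactToClose are names made up for illustration:

```haskell
import Numeric.Natural (Natural)

type SnapshotNumber = Natural

-- What to do after observing a close transaction on chain (hypothetical type).
data CloseReaction
  = CloseIsUpToDate -- same snapshot: OK, nothing to do
  | PostContest -- closed with an older snapshot: post a contestTx
  | UnknownNewerSnapshot -- closed with a snapshot we never confirmed: a problem
  deriving (Eq, Show)

-- First argument: number of our latest confirmed snapshot.
-- Second argument: number carried by the observed close transaction.
reactToClose :: SnapshotNumber -> SnapshotNumber -> CloseReaction
reactToClose confirmed closed =
  case compare closed confirmed of
    EQ -> CloseIsUpToDate
    LT -> PostContest
    GT -> UnknownNewerSnapshot
```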

Adapting MockChain to convert between on and posted transactions, so far it was only forwarding what it received.

In the PAB we now want to convert whatever we observe from the current state to an OnChainTx which we'll send to the client. We cannot use the same types in the PAB and the client (Node) because that introduces coupling, and the PAB shares types with the on-chain contract and we don't want to tie our Haskell code to Plutus code.

We are stuck on the abort test, the endpoint is called but server returns an error 500 saying it's not there, which might come from incorrect body (but not the case here) or more simply from the fact the endpoint promise is not called in the contract.

select in plutus says that:

-- | @select@ returns the contract that makes progress first, discarding the
--   other one

So if the first "progressing" contract is waiting on the chain for something, then you're stuck. We could use 2 different contracts activated, one for listening to transactions, another one for endpoints and interacting with the head, but plutus team is working on another solution: passing a Promise to the waitForupdate function that makes it possible to have endpoints active while waiting, leading to a Timeout in the waiter code.

We part ways for the week setting our goal for next week: Integrate real ledger into the head. We'll leave the PAB and Plutus stuff aside for the moment and focus a couple weeks on the Hydra node itself, possibly adding some frontend (wallet for end users, TUI for admins).

2021-08-11

Hydra Engineering Meeting

Discussing issue with ν_initial validator:
- We initially thought the PTs would be paid to public key of participants, but actually this does not work because we need to be able to post an abort transaction which implies any participant must be able to consume the multiple UTxO from the Init transaction.
- We need to have the output containing the PTs to be actually paid to a script, which is the ν_initial script, and have the verification key be passed as datum so that the commit transactions are valid iff their signing key matches verification key.
- Tricky thing about this - in order to "discover a Hydra head" the address to which the PT is paid is ideally known in advance
- Having a "single" validator for all Hydra head instances should be fine (according to researchers)
We currently rely on the fact that the datum of the statemachine validator ν_SM is included in the init transaction
- Manuel: This is not always the case! Datums do not need to be included in the transaction producing [the outputs holding] them.
How are we going to get the committed UTXO to pass to the collectCom endpoint to build the transaction? Right now in the test, they are known in advance but that won't be the case in real code, because the poster only has access to the chain? Or maybe not and just pass around the UTxO off-chain.
We need to pass in the ν_initial datum the parameter to reconstruct the address of the ν_sm state machine validator.

Ensemble session

Added and vetted new coding standards
Started to consolidate master with contract-sm and origin/KtorZ/experiment-move-lift-params-to-datums work streams on our plutus contracts
- Individual modules for each of our three contracts (head statemachine, initial and commit validators)
- Offchain / PAB glue code into the Hydra.Contract.PAB module
SN discovered that doom emacs has a feature for yanking (and browsing) github at point using SPC g y

2021-08-10

Updating plutus and dependencies to continue investigation of weird behavior of smart contracts on semantically equivalent changes
- New version of plutus changes how endpoint work, this function now takes a continuation
- The default port and HTTP api paths have also changed
After having it compile again, the changes which made it pass before do fail the test now whereas what was passing before does fail now!?
Plutus team suspects it's ordering issues

2021-08-09

AB

Back to work after 2 weeks vacations, catching up.

Recreated yubikey after the first one got destroyed when I dropped the laptop on the wrong side, need to reorder a new one as spare. Fortunately, I had an encrypted volume containing the secret keys as backup so I was able to restore the keys on the new card relatively easily following Dr.Duh's guide https://github.com/drduh/YubiKey-Guide/#configure-smartcard. The only snag I had was that gpg keeps the state of the key in its store, so reimporting it does not change the flag saying the key is on the card. I had to --delete-secret-keys manually to remove the key completely from the store, then reimport it, then move it to the smartcard.

Also recreating development VM. For some reason, the disks and FW rules still existed and were not completely destroyed when I used terraform destroy so I had to remove them from the console directly. Also, need to gcloud config configurations activate default to use the correct account settings. It's annoying gcloud does not allow per-directory configurations... Took about 1h8m to recreate haskell dev VM from scratch.

Reading layer2 market survey document coordinated by Ahmad. Seems like isomorphic transactions are really a distinguishing feature of Hydra compared to all other proposals, which either are limited to special transactions, eg. payments, or rely on specialised contracts.

Checking Red bin to see if there's some useful employment of my time to do:

Working on cleaning up working directory for tests
Exposing a test prelude in a new package.

2021-08-05

MB

Done some more testing and exploration with MPT, in particular, playing around with different alphabet's sizes.
Refined the test suite a bit with a more precise formula for the computation of the 'average' proof size.
Read up on Verkle Trees and vector hashes
More discussions and meeting on the Adrestia's side.

2021-08-06

MB

Continued the work on getting static addresses for init and commit contracts. This also included reworking a bit the existing contractSM so that it'd distribute the participation tokens to static init contracts, which can be observed from the watchInit endpoint. Doing so, I've also refactored a bit the module structure to more clearly separate on-chain from off-chain code.
Opened a PR on Plutus to add an extra MustSatisfyAnyOf primitive for the TxConstraints, necessary for expressing some of the conditions we have in our various Hydra contracts. (https://github.com/input-output-hk/plutus/pull/3706)
Investigated Plutus' contract size, and why they are so large. Opened an issue with some findings, and discussions with MPJ: https://github.com/input-output-hk/plutus/issues/3702

2021-08-04

MB

Mostly busy with Slack conversations and document reviews on various topics, but mainly, Adrestia, Plutus-core and the use of datum-parameterized contracts vs compilation-parameterized contracts.

2021-08-03

MB

Stumbled upon https://vitalik.ca/general/2021/06/18/verkle.html, which I haven't read beside the intro which has enough to keep me captivated:

[Verkle Trees] serve the same function as Merkle trees. [...] The key property that Verkle trees provide, however, is that they are much more efficient in proof size.
Also discussed with Matthias Fitzi some possible improvements of the MPTs proofs:
- Each node could store their children hashes as Merkle Trees, this allow to reduce the overall proof size by a factor of 3.
- We may want to try shorter alphabet, to create slightly longer proofs but with less neighbors on each levels.
- We've seen in previous simulations that even with short concurrency factors (~20), head networks would still perform reasonably well. So there's a right limit to find which leads to satisfactory performance.
I also had a go at inspecting the sizes of our Hydra contracts. They are rather big: 11KB for the state-machine, 8KB for the close and initial. We may want to consider optimizations to make scripts smaller.

2021-08-02

MB

Worked on an implementation of Merkle-Patricia Tree, including already a few of the necessary optimizations w.r.t. the storage of the prefixes.
Writing a few QuickCheck properties revealed that we may have a proof size problem. While it is true that the size of proofs is in log(|U|) (for U the UTxO set), each element in the proof may embed up to 15 hashes, so for a reasonably large UTxO set, we end up with proofs carrying 35/40 hashes! Since a proof is needed per input and per output for each transaction, we may rapidly consume all the available space.
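
To get a feel for the numbers, here is a crude worst-case model, not the formula used in the test suite. It assumes a balanced trie in which every level of a proof carries all sibling hashes (alphabet size minus one), so it over-estimates the averages observed above. Both function names are made up for this sketch.

```haskell
-- Number of trie levels needed to tell apart n keys with the given
-- alphabet size, i.e. the smallest d such that alphabet ^ d >= n.
-- Assumes alphabet >= 2 (otherwise this does not terminate).
trieDepth :: Integer -> Integer -> Integer
trieDepth alphabet n = go 0 1
 where
  go d capacity
    | capacity >= n = d
    | otherwise = go (d + 1) (capacity * alphabet)

-- Worst case: every level contributes all of its sibling hashes.
worstCaseProofHashes :: Integer -> Integer -> Integer
worstCaseProofHashes alphabet n = trieDepth alphabet n * (alphabet - 1)
```

With a 16-symbol alphabet and 1000 entries this bound gives 3 levels of 15 hashes, i.e. 45 hashes per proof.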

July 2021

2021-07-29

We have been observing weird / blocking issues with Plutus: https://github.com/input-output-hk/plutus/issues/3648

Pairing

Continued the work on the Hydra-Plutus-PAB integration to remove the hard-coded contestation period and have it part of the init transaction. Like a few other types (e.g. HeadParameters, Party) we have duplicating type-definitions for similar concepts between the Plutus on-chain code and the rest of the application.
We discussed (and rejected) the idea of removing that duplication in favor of a single type definition at the root of the dependency tree; yet this is rather unsatisfactory because:
1. Although data-types have the same names and represent similar concept on-chain and in the application, they aren't necessarily an exact overlap. Thus, we would end up with types that are more complex than they need to, because they need to satisfy more downstream consumers.
2. More importantly, the Plutus machinery is more restrictive on what primitive types can be used. For example, there's no Word in Plutus, only Integer. No DiffTime, only Integer. These restrictions force data types defined for Plutus to lose a lot in expressiveness and safety compared to what we would normally do for a Haskell program. Thus, while a little duplication is unfortunate, it actually helps to get a nicer design for the main, off-chain, parts of the application while providing a minimal API surface for the on-chain code.
As a bridge between on-chain and off-chain types, we rely on JSON serialization (which fits nicely in the PAB design). Thus, PAB clients submits parameters as plain JSON, which gets deserialized into their on-chain compatible version using a more restricted set of primitives. Since this approach introduces some duplication in type-definitions, it now becomes utterly important to ensure it works as expected through tests for which property-based roundtrip tests are a good fit. Our first property actually even immediately caught what was a legitimate mistake of ours on its first execution.

2021-07-28

Hydra Engineering meeting

Somewhat ad-hoc agenda on the Close-OCV again:

MPT helps only for fan out txs .. but not on Close / Contest
MPT would move the "space burden" into redeemers instead of datum, but still requires non-bound size for validating "hanging transactions", right?
Applicability of Hanging transactions should be encodable by MPT insert / remove operations
- Sandro will check again
- We can/will walk through that with him
Coordinated protocol (= no hanging transactions) might not be affected by this
- Signed snapshots would suffice to validate in close/contest
- Only fanout needs to check presence of utxos in datum

Pairing

Refactored HeadState and HeadStatus back into a single data type to make invalid states unrepresentable: ReadyState won't ever have a valid list of parties as no InitTx, which announces the list of participants, was observed before that state
Reviewed status quo of ExternalPAB and discussed various aspects of it (like paying PTs to pubkeys or some quirks of the PAB)

SN Solo

As https://github.com/input-output-hk/plutus/issues/3546 was resolved, I set off to update plutus dependencies
This time a simple bump of plutus and cardano-ledger-specs was enough to satisfy cabal
Two additional changes required in the code though:
- IsData is now three type classes ToData, FromData and UnsafeFromData
- Boilerplate/gluecode for PAB is now using the HasDefinitions type class
The reproducer does now work!
We cannot update our code statemachine to use Just threadToken though as we do (currently) rely on forging our own tokens (including the PTs sharing the same currency symbol) and recent changes made Plutus.Contract.Statemachine forge thread token automagically.. which is not what we want 😿

2021-07-27

Talked about Tail simulation results, what would be a representative experiment for "general applicability" of the tail protocol, as well as future extensions / adaptations and how they would compare to something like Lightning network on Hydra Heads. Which also seems to be a good avenue to payment use cases for Hydra Heads.

2021-07-22

Read about and looked into the Raiden network
- This onboarding document seems to be a good "introduction" to their tech (stack)
- The spec is quite heavy-weight, but feels a bit "ad-hoc" or engineered rather than backed by research
- The raiden services seem to be "adding value" by short/cheap path finding and offline-capability (like LN watchtowers?) in exchange of some RDN token fee (their ROI?)

Ensemble Programming

Solving together the issue with snapshots not being emitted for transactions once we run out of transactions to submit. Wrote a failing behavioural test; solutions proposed:

1. have a concurrent snapshot thread like in hydra-sim
2. making sure we can have more than one snapshot in flight

Trying 2. as having multiple threads is unappealing. Test is still failing after the change seenSnapshot :: Maybe Snapshot -> seenSnapshots :: Map SnapshotNumber Snapshot. We need to debug what's going on as the failure message is unhelpful => dump IOSim traces when something goes wrong in BehaviorSpec

Taking a step back and thinking how we should solve the snapshot number problem. We need to add a snapshot number in the state, storing the nextSnapshotNumber and updating it in 2 places: when one emits a ReqSn and when one receives it, the former happening only when the node is a leader. It's a monotonically increasing counter but it's redundant. The other solution is to store a Maybe Snapshot in the index, instead of a snapshot, so that the leader can use the index without constructing the snapshot.

Test is failing because snapshot 2 contains 2 confirmed txs instead of one => we should probably update the seenTxs as soon as we emit a ReqSn?

If we remove the snapshotted txs from the seenTxs as soon as we emit a ReqSn => it does not work

Adding more edge cases for leader handling in snapshot emission, code seems more and more complicated

We could also simply not handle ReqTx when there is a snapshot in flight in the leader?

Reverting back to where we had failing tests (And traces from IOSim logs), fiddling with merge/revert conflicts

Trying a mixed approach, not having a separate thread but having a separate event for requesting a new snapshot. The idea is that as we enqueue an event for each transaction anyway, we don't lose anyone of them, the new snapshot will be created with whatever exists at the time of its processing, and if there is another snapshot in flight, we will wait/discard it.

We managed to get the "parallel" benchmark working by using a NewSn message that decorrelates the request for a new snapshot from the actual creation of the snapshot. The NewSn message is enqueued and waited for if there's already a snapshot in flight, and discarded if there aren't any transactions to snapshot (seenTxs is empty). This alleviates the need to have a separate thread running to trigger the snapshot, and it also works if we want a finer grained snapshot policy, like after N txs.
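
The decision taken when a NewSn event is processed can be sketched as follows. LeaderState, Outcome and onNewSn are hypothetical simplifications, not the types in HeadLogic; checking for a snapshot in flight before checking for an empty seenTxs is also an assumption.

```haskell
-- Hypothetical, simplified view of the leader's state.
data LeaderState tx = LeaderState
  { seenTxs :: [tx]
  , snapshotInFlight :: Bool
  }

-- Possible results of processing a NewSn event.
data Outcome tx
  = Wait -- a snapshot is in flight: put NewSn back and retry later
  | Discard -- nothing to snapshot
  | RequestSnapshot [tx] -- emit a ReqSn for everything seen so far
  deriving (Eq, Show)

onNewSn :: LeaderState tx -> Outcome tx
onNewSn st
  | snapshotInFlight st = Wait
  | null (seenTxs st) = Discard
  | otherwise = RequestSnapshot (seenTxs st)
```

Because the snapshot is built from whatever seenTxs holds when NewSn is processed, transactions submitted in a burst end up in a later snapshot instead of being left without one.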

Then fixing HeadLogicSpec unit tests which are now failing because the snapshot logic has changed. Push to master was a bit too hasty...

2021-07-21

Read and discussed recent Hydra research
Extended the visual roadmap in discussions with researchers and product manager

AB Solo

An interesting minor suggestion for improving code reviews, and commit messages: https://ncjamieson.com/conventional-comments/ If we insist on doing reviews that is...

An interesting scalability paper co-authored by C.Decker, the guy behind eltoo.

Writing a unit test exposing the problem we are seeing with our parallel benchmark, namely that we get a signature for a snapshot we have not seen yet:

  (OpenState headState@CoordinatedHeadState{seenSnapshot, seenTxs}, NetworkEvent (AckSn otherParty snapshotSignature sn)) ->
    case seenSnapshot of
      Nothing -> error "TODO: wait until reqSn is seen (and seenSnapshot created)"
      Just (snapshot, sigs)

Reverting back to when we had parallel confirmations to try to load test the cluster leads to another failure -> Try providing a more helpful message when a waitMatch fails in ETE tests, as the current one is not very useful

Wait function was missing HasCallStack => no stack trace, wrong information from the failing tests
The wait times out and there aren't any messages received, this is puzzling. I can see the snapshot being confirmed in the node's log and the ClientEffect trace, so could it be an artefact of deserialisation?
It seems we never get more than one snapshot when submitting txs in parallel, which looks like an issue in the way we are doing the protocol
There aren't any Wait effect in the logs, so this means we never get into the situation where a tx or snapshot cannot be handled

I think I understand what's going on: We only request a snapshot when processing a transaction and there's no snapshot currently being processed, but given we have a single queue, we end up submitting all txs, then doing one snapshot, but other transactions do not trigger a snapshot request because there is still an unconfirmed snapshot in flight. Then the snapshot ends up being confirmed, but there is no longer any transaction to trigger ReqSn.

2021-07-20

Managed to edit the API documentation file with some descriptions, now trying to generate human-readable documentation from it. I can transform the document to JSON using

$ yq . hydra-node/api.yaml

which is a thin wrapper over jq and takes the same kind of expressions. Added nix expressions to the shell.nix for jq and yq, I guess only the latter is necessary as jq is a dependency of it. Trying to add a python3 package called [https://github.com/coveooss/json-schema-for-humans] but it does not work in nix: The package is not part of the nix database. Rodney and others have some pointers on how to add a python package to nix.

To install non-standard python packages, follow instructions here: https://nixos.wiki/wiki/Python. This basically mean writing a nix derivation that install the package and invoking it in the shell...
There's also mach-nix which provides tooling to produce nix derivations from python requirements.

Don't know why but I got a déjà vu feeling with those JSON Schemas, like I was back in the days of XML processing where XML was everything and everything was described in XML, with complex tools to parse, analyze, validate, merge/split/transform XML documents. It's not as bad with JSON but still feels quite similar. I guess the question is, as always, what's the best format for specifying interfaces and APIs: A pivot format from which to generate or verify code, or code from which to generate doc?

Having a quick look at the generic Schema generator for Swagger in Haskell: It does not extract fields comments from data types records which is annoying as this means we'll need to repeat the same information twice.

Pairing Session

Working on the ETE benchmark test, generating more transactions to input. We move the generator from the test package to the code package, which introduces a dependency on QC in hydra-node. This is probably fine as we already depend on it in the Prelude => add more stuff from QC to the Prelude?

We probably want to separate submitting of transactions from confirmation in 2 different threads in order to make sure we observe confirmation as soon as possible, while loading the server with more transactions.

We are struggling to get a Set from a Vector of Value, until we realise the solution is simple: there is a Foldable toList method!

After parallelizing submission and confirmation of txs, we get an error in the waitForPeersConnected function when running the test. We spend some time troubleshooting it:

This is weird as the error seems to be happening at the beginning of the test but we can see the nodes get transactions and messages so this means there is a thread that keeps running that timeout somewhere.
Something's fishy in the waitForAll function, adding traces to understand what's going on
Adding more traces around the timeout call: Could it be triggered asynchronously somehow?
Node 1 is starting to wait for peers connected again at some point after the initial head is open
Adding more traces around various waitXXX functions
It looks as if it was calling waitForNodesConnected a second time after the first round, like there was some thread running in parallel doing that only for node1
Trying to reformat the code and use concurrently instead of concurrently_ explicitly discarding the result
Turns out the action is actually run twice (or more): When we are disconnected, it throws an IOException and this is handled by tryConnect always as a connection failure, which triggers re-running the action. We should only catch exceptions thrown by runClient and not the other ones, but this is not possible as failure to connect and disconnection seems to be represented as the same exception type.
There is a race condition in using the race_ function between detecting the process failed with exit code <> 0 and failing to connect to it, which leads to non-deterministic test results

We work around the issue by ensuring we don't retry to connect once the action has started running, which is clunky and uses a Bool flag but works well

Converting the SimpleTx generator to use getSize to be able to generate more transactions -> we see the process crashing and the error about reqSn not being properly implemented

AB Solo Programming

Switching to upgrading dependencies, making sure we can get the latest plutus stuff from SN's branch. Plutus tests are failing and unfortunately the error message is not very informative:

[WARNING] Slot 3: W1: Validation error: Phase2 3ffcc708303460d9cb6871495ae3391ad855745bcec9d5af02c662705eb29c74: ScriptFailure (EvaluationError [])

The Init transaction is failing so the commit is also failing as it does not have the participation token to spend. Following luigi's advice in issues we raised, all the tests in the upgrade dependencies branch are now passing. The issue was a mismatch in the type of the monetary policy validator: A new parameter was added in a PR recently, like 1 month ago.

2021-07-19

Ensemble Session

Trying to troubleshoot our close contract again. Removing the call to close endpoint still shows collectCom failure. master branch is passing the test, so perhaps the issue comes from our types? Trying to add small changes to the types to see if tests still pass here. Might be an issue with INLINABLE but this should break at compile time.

Test still fails with only a change to the types, trying to just add a simple no-arg constructor => still fails

Changing the order of ctors with a no-arg constructor passes the test, but not with Closed Snapshot. There seems to be an interaction between the order of constructors and the order of the case branches in the validator ??

Adding Close/Closed to state/transition at the end of datatypes make the test fail on close... which is weird

Adding traceIfFalse statements to check what exactly is failing (not obvious from Plutus' emulator messages) -> not very conclusive either

We should probably try with a more recent version of Plutus and check if we have same errors/better error reporting. Plutus SC is on our critical path anyway so no point in side-stepping it, but the Plutus team is drowning under pressure and deadlines. Looks like we are hitting a wall, next step is:

upgrade dependencies and see if we can move forward
circle back with Plutus team for some help

Moving to implementing benchmark

Goal is to have a simple benchmark, running a number of nodes and hitting those nodes with transactions through their clients. Dimensions of the bench are: number of nodes, concurrency level, also structure of transaction.

Discussing the respective merits of monotonic time, clock time, Data.Time or System.time packages... io-sim-classes uses a DiffTime to represent differences and also to represent monotonic time. monotonic time starts at undefined moment in the past (start of system) but is a Word64 in Haskell => No need to care about all this right now

Got JSON output of each transaction submission time and confirmation duration, in the form of a list. Now refactoring tx submission to actually confirm all txs that are returned by the snapshot confirmed message

Got benchmark compiling and outputting the confirmation time of a single transaction, extracting txs and txid from the JSON values we get from the server. Next step is to send more transactions from a single node, then send transactions in parallel, and finally send them to several nodes.

2021-07-16

Planning Session

See Miro board

AB Revisiting Smart contracts testing

Viewing Testing smart contracts by John Hughes

Trying to start again implementing a proper model for the head smart contracts, based on https://alpha.marlowe.iohkdev.io/doc/plutus/tutorials/contract-testing.html and John Hughes video. I think this should be our very first next step because it will help us get a complete picture of the smart contracts we need to implement and guide the implementation whatever form it can take, individual validators or state machine based. I want to get back to the paper and formalise the SM specification there in code.

Meeting w/ Ahmad

Defining value dimensions for a Layer 2 solution:
- Speed
- Transaction cost
- Security model, custodial vs. non-custodial, level of trust required
- Decentralisation
- Ledger capability
- Scale of participants
- ...
Map different solutions on a spider web chart (aka. radar chart)
Do the same thing for technical parts/components needed, defining how feasible they are:
- The dimensions are the technical components of possible solution(s)
- The scale of the dimension is the maturity level of that particular technical part
Solutions are composed of components
- We can then relate the "desirability"/value of a solution in front of its "feasibility"/maturity
- This should be done collaboratively with various stakeholders in order to foster discussions on values, solutions, dimensions

Idea: Create a Hydra testnet with several Hydra nodes connected together, that expose an API that can be used by clients, eg. a dedicated wallet for experimenting.

2021-07-15

Team Retrospective

Detailed notes here along with link to Miro board.

AB Learning about Lightning

Added notes on eltoo to Lightning network page.
Fun fact: The LND implementation of lightning network daemon has over 1000 files of Go source code

These slides from Orfeos are pretty much useless without accompanying talk

Following links from eltoo site, I found Yet another micropayment network proposal.

2021-07-14

Working on the Plutus reproducer

Updating dependencies in cabal.project to build the minimal example from 2 days ago with most recent plutus version
This is non-trivial! When simply bumping all the source-repository-package tags to the ones plutus is using, I get a conflict when resolving dependencies
Disabling hydra-node (and local-cluster) helps, as the conflict is somehow caused by our use of cardano-node/ouroboros-network vs. its usage via plutus in the hydra-pab exe?
To get the nix-shell updated, this is also involving a lot of updating sha256 sums and taking some things from cardano-node and others from plutus, e.g.
- if we use the same haskell.nix rev as plutus, but stick with ghc8104, this would have the compiler be built along with ALL the packages (plutus seems to be using a custom ghc)
- using the haskell.nix rev from cardano-node, requires us to be using a slightly older hackage index-state and use ghc8105 to get at least SOME of the dependencies and not need to compile ghc
An alternative is to use an ad-hoc shell to get my hands on ghc and cabal, e.g. nix-shell -p ghc -p cabal-install -p pkgconfig -p zlib
- the packages are not enough and we need the patched libsodium
- so I updated shell.nix with a "cabal-only" / "non-haskell.nix" variant of a shell derivation accessible by nix-shell -A cabalOnly
Seems like multiple things have changed in recent plutus
- MonetaryPolicy is now named MintingPolicy, wrapping / compiling it is different, Several constraints have been renamed, BlockChainActions seems to have been removed etc.
- Instead of fixing all this, I removed non-repro code from the cabal module list
Finally, I needed to update the SM.hs code to use the new Plutus.Contract.StateMachine.getThreadToken function and could re-run the minimal reproducer example -> runStep still fails with InsufficientFunds
Reported this as an issue

Tail simulations

Added analysis of how much of the confirmed transactions are within 1, 10 and 0.1 slots and discussed that with researchers. Here are the previously recorded results for the 1000 slot compression (25323 slots) results with s=10800, with pro-active snapshotting at p=0.8:

txs-1000clients-25323slots-10800s-0.8p.csv
Analyze
    { numberOfConfirmedTransactions = 17999
    , averageConfirmationTime = 625.6537533845784
    , percentConfirmedWithin1Slot = 0.9408300461136729
    , percentConfirmedWithin10Slots = 0.9408856047558197
    , percentConfirmedWithinTenthOfSlot = 0.9245513639646648
    }

and without pro-active snapshotting

txs-1000clients-25323slots-10800s.csv
Analyze
    { numberOfConfirmedTransactions = 19761
    , averageConfirmationTime = 747.7579325380825
    , percentConfirmedWithin1Slot = 0.9313293861646678
    , percentConfirmedWithin10Slots = 0.9313799908911492
    , percentConfirmedWithinTenthOfSlot = 0.9168058296644906
    }

I also ran a simulation of the data with 100 slot compression (253227 simulated slots) to see how a ~3h settlement delay s=10800 would fare

txs-1000clients-253227slots-10800s-0.8p.csv
Analyze
    { numberOfConfirmedTransactions = 46952
    , averageConfirmationTime = 980.7909634163951
    , percentConfirmedWithin1Slot = 0.9134861134775941
    , percentConfirmedWithin10Slots = 0.9135500085193389
    , percentConfirmedWithinTenthOfSlot = 0.8959362753450332
    }

Surprisingly, the performance is not as good as on the 1000 slot compression data. We would have expected that the settlement delay would fall "in between" transactions more often with a data set with less traffic. In discussions we speculated that the data might be biased due to the nature of the simulation, where all txs are "fast" until s and in the end all txs require snapshots (as no funds are incoming)
- Proposed solution: Add an option to --discard-edges on analysis, i.e. keep it for measuring confirmation times (influence), but not calculate it into the results.
- This required a breaking change in how the results are written to disk: Previously the txref was "node + slot + .." packed as string and lexical sorting was not honoring "when" a transaction happened. So I do record now the slot + this label as a TxRef, making the type effectively ordered by slot, but factually invalidating all previous results (they were not ordered), so I just added the slot as another column in the CSV to avoid confusion
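
To illustrate the ordering problem, here is a small sketch with a hypothetical TxRef record (the real type in hydra-sim may look different). Putting the slot first makes the derived Ord instance sort by slot, whereas a packed string sorts lexically and puts slot 10 before slot 2.

```haskell
import Data.List (sort)

-- | Hypothetical transaction reference: the slot comes first so that the
-- derived 'Ord' instance orders by slot before falling back to the label.
data TxRef = TxRef
  { slot :: Integer
  , label :: String
  }
  deriving (Eq, Ord, Show)

-- | Approximation of the previous scheme: everything packed into one string.
packed :: TxRef -> String
packed (TxRef s l) = show s <> "-" <> l

-- | Two references, deliberately listed out of slot order.
refs :: [TxRef]
refs = [TxRef 10 "a", TxRef 2 "b"]

-- | Sorted by slot: slot 2 comes before slot 10.
bySlot :: [TxRef]
bySlot = sort refs

-- | Sorted as strings: "10-a" comes before "2-b".
byString :: [String]
byString = sort (map packed refs)
```
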
Given another simulation run on the 20000 slot compression (1266 slots) set with s=100, we expect that the "edges" in the range of s with slots 0-100 and 1166-1266 (or even more) are biased in one way or the other (i.e. all txs until s are "fast"). The non-discarded results are:

λ cabal exec hydra-tail-simulation -- analyze --discard-edges 0 txs-1000clients-1266slots-100s.csv
Analyze
    { numberOfConfirmedTransactions = 37448
    , averageConfirmationTime = 10.018317012686664
    , percentConfirmedWithin1Slot = 0.9032258064516129
    , percentConfirmedWithin10Slots = 0.9038666951506088
    , percentConfirmedWithinTenthOfSlot = 0.879619739371929
    }

Discarding edges of 100 and 200 slots confirms that it "settles" in the "center" of the simulation run, although only slightly in this example:

λ cabal exec hydra-tail-simulation -- analyze --discard-edges 100 txs-1000clients-1266slots-100s.csv
Analyze
    { numberOfConfirmedTransactions = 30787
    , averageConfirmationTime = 10.968831519346255
    , percentConfirmedWithin1Slot = 0.8934940072108357
    , percentConfirmedWithin10Slots = 0.8942735570208205
    , percentConfirmedWithinTenthOfSlot = 0.8709845064475266
    }
λ cabal exec hydra-tail-simulation -- analyze --discard-edges 200 txs-1000clients-1266slots-100s.csv
Analyze
    { numberOfConfirmedTransactions = 26721
    , averageConfirmationTime = 10.80831723318664
    , percentConfirmedWithin1Slot = 0.8944276037573444
    , percentConfirmedWithin10Slots = 0.8953257737360129
    , percentConfirmedWithinTenthOfSlot = 0.872609558025523
    }
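
A minimal sketch of how such an analysis could be computed from recorded results, assuming each CSV row boils down to the slot a tx was issued in and its confirmation time in slots. The Confirmation type and all function names are hypothetical, and whether the real --discard-edges treats the boundary slots as inclusive is an assumption here.

```haskell
-- | One confirmed transaction: the slot it was issued in and how long it took
-- to confirm (in slots).
data Confirmation = Confirmation
  { issuedAt :: Integer
  , confirmationTime :: Double
  }
  deriving (Eq, Show)

-- | Drop confirmations issued within @edge@ slots of either end of the run.
-- The discarded txs still took part in the simulation; they are only excluded
-- from the statistics.
discardEdges :: Integer -> Integer -> [Confirmation] -> [Confirmation]
discardEdges edge lastSlot =
  filter (\c -> issuedAt c > edge && issuedAt c < lastSlot - edge)

-- | Fraction (0 to 1) of confirmations that took at most @bound@ slots.
fractionWithin :: Double -> [Confirmation] -> Double
fractionWithin bound cs
  | null cs = 0
  | otherwise = fromIntegral (length within) / fromIntegral (length cs)
 where
  within = filter ((<= bound) . confirmationTime) cs

-- | Mean confirmation time, or 0 for an empty result set.
averageConfirmationTime :: [Confirmation] -> Double
averageConfirmationTime cs
  | null cs = 0
  | otherwise = sum (map confirmationTime cs) / fromIntegral (length cs)
```

With this shape, fractionWithin 1, fractionWithin 10 and fractionWithin 0.1 applied to discardEdges 100 1266 results would correspond to the three "percent confirmed" figures, which are fractions rather than percentages.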

2021-07-13

SN Reading up on lightning

Two-party payment channels secured via hashes + time locks
Networking effect comes from routing payments using layered hashes/secrets (similar to Tor / onion routing)
- natural consequence: Lightning only works with fungible tokens!
- great for privacy as each party only knows previous and next hop
Liquidity is a big problem
- "can't receive until sent", Wallets tackle this by providing channels on-demand, e.g. Phoenix's pay-to-open
- by default, each payment channel needs to be liquid enough to forward the transaction's value -> hard to pay $1M through lightning
- spreading payments over multiple channels is getting researched (and implemented?) recently
Lightning nodes need to be online to be safe .. right?
- there is this game-theoretic way of punishing if peer broadcasts old commitment txs (LN-penalty)
- Watchtowers
  - it seems like these are used to allow lightning nodes to be offline for a longer time without losing much safety
  - they detect and ensure that no old states are posted on chain and do even dispute with more recent states of the payment channel
  - typically implemented as a third party service to which lightning nodes send encrypted data with the tx id triggering the dispute being the encryption key
- eltoo
  - a not-yet-implemented way to enforce continuity of states without incorporating all of them (?)
  - drop-in replacement of the penalty mechanism
- My thoughts on this: Are watchtowers a way to make the Head protocol somewhat offline capable as well? i.e. backup multi-signed snapshots for potential contestation before you go offline .. obviously this is a trade-off for privacy (the watchtower sees all intermediate snapshots) unless we can also make this in an encrypted fashion?
How are non-custodial lightning wallets possible?
- They seem to have a lightning node integrated / running
- But only use a "light" bitcoin node to interact with the main chain (primarily as the storage required is huge), e.g. neutrino
(AB) took some notes on Lightning Network paper

2021-07-12

Talked to a developer of Marlowe as I found out that they are looking into "Merkelization" of the interpreter AST, which seems to be quite similar to our (researchers) ideas of using MPTs for not needing to store the whole UTxO set in the Hydra mainchain tx.

SN Solo

Besides multiple organisational things, I started to look into the issue of Plutus.Contract.StateMachine's runStep erroring with InsufficientFunds when using Just threadToken
I started by creating a minimal reproducer example which consists of a very simple plutus state machine with two states data State = First | Second and a single data Input = Step as well as this trivial transition oldState _ = Just (mempty, oldState{stateData = Second})
The following code is then printing SMCContractError (WalletError (InsufficientFunds \\\"Total: Value (Map [(,Map [(\\\\\\\"\\\\\\\",99985508)])]) expected: Value (Map [(858535eed6775064eed795dd9261d258dd97ad51983877cc3df52e3a10ed6108,Map [(\\\\\\\"thread token\\\\\\\",1)])])\\\"))

contract :: Contract () BlockchainActions String ()
contract = do
  threadToken <- mapError show Currency.createThreadToken
  logInfo @String $ "Forged thread token: " <> show threadToken

  let client = stateMachineClient threadToken
  void $ mapSMError $ SM.runInitialise client First mempty
  logInfo @String $ "Initialized state machine"

  res <- mapSMError $ SM.runStep client Step
  case res of
    SM.TransitionFailure (SM.InvalidTransition os i) -> logInfo @String $ "Invalid transition: " <> show (os, i)
    SM.TransitionSuccess s -> logInfo @String $ "Transition success: " <> show s
 where
  mapSMError = mapError (show @String @SM.SMContractError)

Next: updating cabal.project to a newer version of plutus and all our transitive dependencies

2021-07-08

Ensemble Programming

Working on PAB again, trimming down what we have done so far to the bare minimum. The goal is to have the InitTx transaction with PTs posted and observed from the mainchain:

setup contract creates the thread token and starts the SM
We need to have the cardano pubkeys available to post the init transaction, because we want to pay PTs to those pubkeys
We see the transaction that creates a UTXO for token creation purpose, then the transaction that starts the state machine, with the thread token being added to the initiator's wallet.
We got stuck with explicit threadToken threading, seems like it's not actually implemented right now so passing the thread token to the mkStateMachine function does not work and makes the transactions unbalanced.
- https://github.com/input-output-hk/plutus/pull/3452 is the PR that introduces auto-forging of ST
- We want to reuse the same CurrencySymbol for the ST and the PTs but this won't be possible after that PR is merged.
- Using Currency.forgeContract to forge both ST and PTs but ignoring the former at the moment because it makes the transition fail.

Added wallet identifier to run withExternalPAB and have a test with 2 parties so that we can actually be sure the party that checks the transaction has been posted is different from the one that actually initiates the head.

To observe the init transaction being posted, we listen to outputs paid to our pubkeyhash with some arbitrary currency symbol (the "unique id" of our head) and our pubkeyhash as a token name (and amount of 1).

The fact we have a Party in the node and another one in the contracts code is annoying => we have the same wire format so it's fine for the moment, but a Party should not be tied to specific types in Crypto module, it should just provide material to build keys which are just bytestrings

Working on tail simulations

Produced plots for multiple scenarios with high settlement times (as started yesterday)
Obviously, transactions are confirmed either very fast or just after the settlement delay, but there is also a noticeable set of txs at 2s
Investigating why this could be:
The tail server does decide when a client NeedSnapshot, based on the results of matchBlocked
There are multiple withTMVar blocks -> suspecting a race condition
Created a reproducer events.csv:

slot,clientId,event,size,amount,recipients
0,1,pull,,,
0,2,pull,,,
0,1,new-tx,297,53964900,2
1,1,pull,,,
1,1,new-tx,297,53964900,2
2,1,pull,,,
2,1,new-tx,297,53964900,2

results in these times with s=3600:

txId,confirmationTime
1053964900297[2],1.1441610433e-2
1153964900297[2],7200.044700244114
1253964900297[2],1.1418798826e-2

After adding some traces and above minimal events.csv, it can be found that the second tx is reenqueued, but the third transaction actually gets handled before, thus delaying the second tx again -> double settlement delay on the confirmation time
Refactoring the code to not re-enqueue, but handle the message directly on SnapshotDone improves this particular situation, but due to concurrency in the server there is still a chance of "new" NewTx being handled before the "re-handled" NewTx and even a confirmation of around 3s was seen

Pro-active-snapshotting

Implementing the --pro-active-snapshot was not trivial due to the lack of tests, but with a small / constructed set of events and some trace it could be done (.. should've really written test myself this time :/)
Preliminary results of the 1000 compression data set (24320 compressed slots) with s=3600 and pro-active-snapshot limit of p=0.8 show a 242.50993413052075 slot average confirmation time, which is quite a bit better than the 313.5453135554798 slots average confirmation without the pro-active snapshotting.
There is a different number of confirmedTransactions when running the same dataset just with --pro-active-snapshot on and off, why?
Likely caused by the way the simulation is structured:
- Threads are forked for running the server and client loops, each client processing [Event] and reacting on messages from the server (e.g. blocking the loop while doing a snapshot)
- Each client is having a local notion of currentSlot and gets delayed when blocking the event processing loop
- The simulation is stopped after a pre-calculated time, currently lastSlot of events + 2 * settlementDelay converted to seconds
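
The stop time from the last bullet, as a tiny worked sketch (the function name and argument order are made up; slot length is given in seconds):

```haskell
-- | Pre-calculated simulation duration in seconds:
-- (lastSlot of events + 2 * settlementDelay) converted using the slot length.
simulationDuration :: Integer -> Integer -> Double -> Double
simulationDuration lastSlot settlementDelay slotLength =
  fromIntegral (lastSlot + 2 * settlementDelay) * slotLength

-- | Example for the 1000 compression data set: 24320 slots, s=3600, 1 s slots.
-- 24320 + 2 * 3600 = 31520 seconds.
exampleDuration :: Double
exampleDuration = simulationDuration 24320 3600 1
```

If a client's local currentSlot lags behind because its loop was blocked on snapshots, some of its events may not be processed before this cut-off, which would plausibly explain a different confirmedTransactions count between runs.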

2021-07-07

Engineering meeting notes

Topic: Discuss Custodial Hydra Head or whatever we should rather aim for as our MVP

Provide some context and back story to Duncan
There is always "something custodial"
It's not like "this" or "that"
Custodial systems raise regulatory obligations to the operating parties
- Hydra Pay (Tail) is also in this situation, the server is also a custodian
- If you are processing other people's payments, you need to register (in many jurisdictions)

Are we facing these issues with all of our variants?
Lightning is not having this problem? It's on a smaller scale (really?)

Besides "custodial issue" the tail has some more risks
- It involves creation of a client as an additional component
- Research is still in very early stages, contracts seem complex
- Faces the same problem of "finding the right server" (vs. "finding the right head")

Updating tail simulations

After having a cardano-node and ogmios server running and fully synched on the main chain, I could finally use npm run pipeline in hydra-sim/scripts/tail to download blocks and construct an events.csv
I opted for npm run pipeline 1000 1000 24320 to try to re-produce the 1000 node with 1000 slot compression dataset as it is also checked in to the repo, which has events of slot 24320
Although I see (some of) the same events be produced, I realized that the current cardano-node-ogmios instance I am using is still (or again) having issues to synchronize fully:

[f30db018:cardano.node.DnsSubscription:Error:6539] [2021-07-07 07:41:28.21 UTC] Domain: "relays-new.cardano-mainnet.iohk.io" Application Exception: 18.158.202.103:3001 InvalidBlock (At (Block {blockPointSlot = SlotNo 28173266, blockPointHash = e42dacc3f7406a85b2e561fffc84118c63a5b71d05d3cb0272dbc2d11c235d2c})) (ValidationError (ExtValidationErrorLedger (HardForkLedgerErrorFromEra S (S (S (Z (WrapLedgerErr {unwrapLedgerErr = BBodyError (BlockTransitionError [LedgersFailure (LedgerFailure (DelegsFailure (WithdrawalsNotInRewardsDELEGS (fromList [(RewardAcnt {getRwdNetwork = Mainnet, getRwdCred = KeyHashObj (KeyHash "47edd4e27aa5ef468603ded3c3250b3fd53ac196d9009c3a189e3f2a")},Coin 14393994)])))),LedgersFailure (LedgerFailure (DelegsFailure (WithdrawalsNotInRewardsDELEGS (fromList [(RewardAcnt {getRwdNetwork = Mainnet, getRwdCred = KeyHashObj (KeyHash "aac051310c2760fae362766ab5e7dd27404da3f72732d68ea7ec0c2a")},Coin 1296084)])))),LedgersFailure (LedgerFailure (DelegsFailure (WithdrawalsNotInRewardsDELEGS (fromList [(RewardAcnt {getRwdNetwork = Mainnet, getRwdCred = KeyHashObj (KeyHash "e7a92b469d4af1e2b70efc3638f084757655e99a954d48aae232d488")},Coin 12093106)])))),LedgersFailure (LedgerFailure (DelegsFailure (WithdrawalsNotInRewardsDELEGS (fromList [(RewardAcnt {getRwdNetwork = Mainnet, getRwdCred = KeyHashObj (KeyHash "4ddc9a17c1e23a56f1e01718387f45e646b3bf9f83c0ba285b04e347")},Coin 1781746)])))),LedgersFailure (LedgerFailure (DelegsFailure (WithdrawalsNotInRewardsDELEGS (fromList [(RewardAcnt {getRwdNetwork = Mainnet, getRwdCred = KeyHashObj (KeyHash "ee9345b6e27716c48d68abd805aaca347ba2a65060c47f3e46904320")},Coin 1297137)]))))])})))))))

Having another go with a 1.25.1 cardano-node and a separate ogmios instance to get it fully synchronized while I continue running tail simulations on the part of the dataset that I have
After confirming that I have somewhat similar data, I ran the simulation cabal exec hydra-tail-simulation run -- --payment-window 100 --settlement-delay 120 datasets/events-clients:1000-compression:20000.csv to see whether I get also somewhat similar results as in the paper
- I got these results and the average confirmation time seems to somewhat correspond to the graph in the T2P2 paper for 1000 nodes (12.4 seconds)

RunOptions
    { slotLength = 1 s
    , paymentWindow = Just
        ( Ada 100 )
    , settlementDelay = SlotNo 120
    , verbosity = Verbose
    , serverOptions = ServerOptions
        { region = LondonAWS
        , concurrency = 16
        , readCapacity = 102400 KBits/s
        , writeCapacity = 102400 KBits/s
        }
    }
SimulationSummary
    { numberOfClients = 1000
    , numberOfEvents = 1035738
    , numberOfTransactions = NumberOfTransactions
        { total = 517869
        , belowPaymentWindow = 259512
        , belowHalfOfPaymentWindow = 207355
        , belowTenthOfPaymentWindow = 123539
        }
    , averageTransaction = Ada 24
    , lastSlot = SlotNo 1184
    }
[...]
Analyze
    { numberOfConfirmedTransactions = 28346
    , maxThroughput = 218.99746835443037
    , actualThroughput = 23.92067510548523
    , averageConfirmationTime = 10.123735995981168
    }

Just realized that if I run the simulations with an events file I got from MB for 1000 nodes and 20000 compression, I get the same 12.4 seconds average confirmation time.
- I wonder though why that events-clients_1000-compression_20000.csv only has 519 compressed slots, while my recreated 20000 compression of a shorter block chain has 1184 compressed slots?
Possible next steps:
- Store all confirmation times and plot them (likely will show that the average is strongly biased by a small number of slow txs which required a snapshot)
- Increase settlement delay to 3h and re-run a simulation
- Add pro-active snapshotting when reaching a certain window limit (without lookahead)
- Only do pro-active snapshotting when sender knows not to have another tx anytime soon (> settlement delay)
Using the same data as above, but with a 500 slot settlement delay, we get double on the average confirmation time and about half of confirmedTransactions:

Analyze
    { numberOfConfirmedTransactions = 14048
    , maxThroughput = 218.99746835443037
    , actualThroughput = 11.854852320675105
    , averageConfirmationTime = 22.912008787073507
    }

Compression 1000 and settlement delay 3600 slots (~1h):

Analyze
    { numberOfConfirmedTransactions = 22672
    , maxThroughput = 11.628016940092923
    , actualThroughput = 0.9321985115743596
    , averageConfirmationTime = 278.11186776079404
    }

Compression 1000 and settlement delay 10800 slots (~3h):

Analyze
    { numberOfConfirmedTransactions = 12842
    , maxThroughput = 11.628016940092923
    , actualThroughput = 0.5280210517659636
    , averageConfirmationTime = 484.0768607394298
    }

Complete PAB Interactions

Trying (again) to complete the first test of PAB, passing some parties' keys to the Init transaction and have it recorded and observed on the chain through smart contracts and PAB.

We are making too many shortcuts in the PAB/Main thing so things don't make sense to me...

There's confusion between the contract activation logic which basically instantiate a contract and returns a contract identifier that can later be used to invoke endpoints on it, and the actual endpoints handling. To watch transactions from the state machine one needs to run a contract that waits for state changes which requires the thread token (or a state machine client which is created through the thread token)

There is a logical problem: We cannot start the state machine until we have the initTx command, that's what got me confused. Also, how does the endpoint mapping work through the webserver? Seems like we are using Builtin and calling the init endpoint but it does not seem to be declared anywhere => This is the case in our present incarnation, as we have not declared any endpoint so this can't possibly work.

How can I observe the state of a SM if I don't know its thread token? And then how do I know its thread token if I did not create the SM in the first place?

As shown in the Auction contract's test, the buyer needs an external way of getting the thread token to observe the SM progression:

auctionTrace1 :: Trace.EmulatorTrace ()
auctionTrace1 = do
    sellerHdl <- Trace.activateContractWallet w1 seller
    _ <- Trace.waitNSlots 3
    currency <- extractAssetClass sellerHdl
    hdl2 <- Trace.activateContractWallet w2 (buyer currency)
    _ <- Trace.waitNSlots 1
    Trace.callEndpoint @"bid" hdl2 trace1WinningBid
    void $ Trace.waitUntilTime $ apEndTime params
    void $ Trace.waitNSlots 2

In all instances of StateMachine I could find, this is done through forging a currency which can then be used as a unique identifier that's either used directly or as part of some larger initial state.

Here in the TokenSale example from week08 of PPP, the token is part of the TokenSale initialiser.

tsStateMachine :: TokenSale -> StateMachine (Maybe Integer) TSRedeemer
tsStateMachine ts = mkStateMachine (Just $ tsNFT ts) (transition ts) isNothing

This implies that a node that's not initiating a head won't be able to know what the head identifier is if there's no way to get it through other means: Either out-of-band, through the Head members network, or by watching a specific contract's address which is only dependent on the HeadParameters and hence knowable by all parties. Could also be some other address where the participation tokens are forged, with the PTs being defined with a currency symbol which is exactly the thread token.

Init for SM should then:

Create a unique thread token for the head -> this will be the head identifier
Post a transaction with outputs containing PTs for each head participant sent to their HeadParameter's pub keys -> like what we do now

All parties observe this known address and retrieve the PT sent to them to know what is the SM thread token -> then they can start monitoring the SM and observe its state changes specifically.

We need to pass both HydraKey and CardanoKey in the InitTx so that listeners can retrieve Participation tokens and then the state machine's token. Listening should happen in 2 stages:

listen to the InitTx by listening to PTs being paid to one's pubkey
listen to the state machine's changes using the PT's currency symbol as key to the state machine's instance
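
The two stages can be sketched as a pure model over simplified, made-up types (plain strings instead of the Plutus PubKeyHash, CurrencySymbol and TokenName; none of this is PAB API). It assumes a PT is an output paid to our pubkeyhash carrying exactly 1 token whose name is our pubkeyhash, and that its currency symbol identifies the head.

```haskell
import Data.Maybe (listToMaybe)

type PubKeyHash = String
type CurrencySymbol = String
type TokenName = String

-- | Simplified transaction output: recipient plus the tokens it carries.
data TxOut = TxOut
  { paidTo :: PubKeyHash
  , tokens :: [(CurrencySymbol, TokenName, Integer)]
  }

-- | Stage 1: look for a participation token paid to us. Its currency symbol
-- is taken as the head identifier.
findHeadId :: PubKeyHash -> [TxOut] -> Maybe CurrencySymbol
findHeadId me outs =
  listToMaybe
    [ cs
    | TxOut recipient ts <- outs
    , recipient == me
    , (cs, name, 1) <- ts
    , name == me
    ]

-- | Stage 2: keep only the state changes that belong to the discovered head.
stateChangesFor :: CurrencySymbol -> [(CurrencySymbol, s)] -> [s]
stateChangesFor headId events = [s | (cs, s) <- events, cs == headId]
```

A party that never receives a PT gets Nothing from stage 1 and therefore never starts stage 2, which matches the intent that only head members monitor the state machine.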

2021-07-06

Research meetings

Had multiple meetings with researchers today
First it was about doing some additional Tail simulations:
- Using shelley data
- Focus on (optimistic) latency and ‘window recycling’
- Do some kind of "pro-active" snapshotting when reaching a certain window watermark, e.g. 0.8[-w,w]
- Ideally when the sender knows it is offline for at least s (settlement delay)
- Increase s to a more realistic length of ~500 slots / ~3h
- We are most interested in confirmation times (not really throughput) -> plot each tx individually as pointcloud? with time and value as axes?
But originally, this is motivated by "prioritization" of Hydra Tail / Head, which was then discussed in the full Research Meeting
- Pointed out that the Hydra Head is more realistic to be implemented any time soon
- Maybe "prioritization" issue is about just the wrong appearance of Tail being the only solution to (micro-)payments?
- Eventually pitched our MVP idea for a delegated hydra head
- Was somewhat well received and also incremental approach on creating this made sense to most
After Aggelos talked to Charles though, the "delegated hydra head" seemed to be a non-solution because of regulatory obligations being implied by it being actually a "custodial hydra head"

Re-doing some tail simulations

In order to be able to redo some of the simulations, I started by following these instructions
First I was using the combined Docker image for cardano-node-ogmios against a somewhat old db of a cardano-node and invoking npm run pipeline against the ogmios server running with that state
- For this I had to add nodejs-14_x to the shell.nix of hydra-sim
- Also the download first fails because of TypeError: reader.end is not a function, but restarting the pipeline picks up the downloaded blocks.json
- Contrary to the README, there is a third parameter which seems to be limiting the maxSlot (after compression?)
When seeing that the data is not complete (slot no is way too low), I realized my cardano-node had problems extending the chain
After trying several different tags and also re-synching completely from scratch, I found that there seems to be an issue with recent cardano-node versions and the allegra hard fork (in retrospect) - This was also observed by others on slack
Synching with the 1.25.1 node (using a docker image) seems to work now

2021-07-05

Pairing

While fixing PR review's issues, I noticed one problem with the use of JUnit formatter: It does not output anything anymore, the output is sent to the XML file but not to the console which is annoying. Trying to find a way to configure format and get both outputs. Managed to combine the 2 formatters, JUnit XML file generator and console reporter. Seems like there could be a generic function to define there, something like:

both :: Monad m => (a -> m ()) -> (a -> m ()) -> (a -> m ())
both one two a = one a >> two a

This is aptly defined as tee.
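
A small self-contained sketch of the idea, with two in-memory sinks standing in for the JUnit XML writer and the console reporter. The names collect and runBoth are invented for the example; the real hspec formatter types are more involved.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- | Send the same value to two consumers, first one, then the other.
tee :: Monad m => (a -> m ()) -> (a -> m ()) -> a -> m ()
tee one two a = one a >> two a

-- | Hypothetical sink that appends every event it sees to a log.
collect :: IORef [a] -> a -> IO ()
collect ref x = modifyIORef' ref (++ [x])

-- | Feed a list of events to two independent sinks and return both logs.
runBoth :: [a] -> IO ([a], [a])
runBoth events = do
  xmlLog <- newIORef []
  consoleLog <- newIORef []
  mapM_ (tee (collect xmlLog) (collect consoleLog)) events
  (,) <$> readIORef xmlLog <*> readIORef consoleLog
```

Both sinks end up with the same events in the same order, which is the behaviour wanted from combining the two formatters.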

Trying to understand why our close contract fails to validate properly, and at which points validation fails. We get an error about some signature not being done but it seems we add the constraints and lookups that are needed. Trying to dump the generated transactions using logInfo to see what they look like.

Trying to use ownPubKey to sign the transaction instead of the pub key we pass in the contract but to no avail, it's still failing.
Trying to add traceIfFalse in the OnChain code => now fails in the collectCom transaction.
Trying to trim down the close validator to bare minimum.
Turnaround time is long: > 1 minute per compilation cycle which is horribly slow
Why is collectCom sometimes failing while we are changing seemingly unrelated code?

Going through Plutus code that constructs a transaction, trying to understand where it's validated and what each part is doing.

Trying to trace the transactions that are posted. It's not possible to get the ones that are not validated by the wallet, seems like only the failing on ledger ones are dumped.

Plan for afternoon:

Remove mock-chain -> complete PAB with lightweight contract logic so that we get a complete OnChain client talking to PAB
Good to have a look at the plutus pioneer program again

AB Solo Programming

Keep troubleshooting close contract failure: Seems pretty clear the failure is in the amounts but what's unclear is why adding traces can have side effects that make the collectCom transaction validation fail. Going to try to validate the transaction and then investigate the effects of traces.

Just adding the single validation mustBeSignedByOneOf makes the test fail at the collectCom call which does not really make sense.

Trying to remove the call to and together with the list of constraints makes the test pass!

Adding check on amounts with && operator (which is supposed to fail) makes the test fail and the error message is cryptic
Trying to add && True makes the test pass -> Seems like it's not the operator the problem, but the operand?

It's hard to troubleshoot errors when one cannot print/trace values: traceIfFalse takes a String but one cannot pass show from values on chain apparently?

Trying to remove check on equality of committed values/closed values which seems to break things a lot, focusing on the correct transitioning to Closed state
Trying to replace the amount computed from the inputs with constant lovelace value of 1 and pass that off-chain

With a constant adaLovelaceValue 1 the transaction successfully completes but the test now fails with the wallet's balance not being the one expected. Alice's wallet should have changed by -1000 but it actually changed by 999 which probably comes from the fact it submitted the close transaction that only output 1 lovelace and the rest of the inputs went to Alice's wallet. Changing the test to have the transaction posted by Bob gives the same result, plus the state is not changed!!
The payToTheScript constraint in off-chain correctly generates a value that contains the committed ADAs and the participation tokens. Need to add that constraint in the on-chain validator which seems to be what we were doing before but maybe not?

Putting an incorrect value in validator's verification raises an error in the close contract as expected.
Ended up submitting a Plutus Bug to try to have a better understanding of what's going on with our failures in the close contract's invocation.

Going to beef up Dev VM with C2 instance to have a faster CPU => does not significantly change turnaround time.

Changing the way the amount is computed in the validator changes the outcome of the test: Now I can see the transaction validation failing on-chain and the transaction is dumped to the console. The dump does not help much with troubleshooting because it is huge, but this is progress nevertheless.

It seems the transaction has no input with value, I can see only a single input which is the script address with datum and redeemer.

The inputs and outputs of the CollectCom transaction are:

                   {inputs:
                                  - 268bf918b954642de3e4a1b2d108dee48f2ed4a0f9c974b35c6291b60070ab54!1
                                    Redeemer: <>
                                  - 3832e5b62e1bf8df95054f42d522ec24388b407652dd8564281a30367dcac0ad!1
                                    Redeemer: <>
                                  - 63fcd1840fb27fa8eef570f6f8ea42f1309518c17d4e252f3ee2ddc4c4492848!1
                                    Redeemer: <>
                               collateral inputs:
                               outputs:
                                 - Value (Map [(,Map [("",2000)]),(1dd1049cbd0ff6c47602ac8ce76d9d0558edec1c8f417ad6fd4a2111d8b10f10,Map [(0x21fe31dfa154a261626bf854046fd2271b7bed4b6abe45aa58877ef47f9721b9,1),(0x39f713d0a644253f04529421b9f51b9b08979d08295959c4f3990ee617f5139f,1)])])
                                   addressed to ScriptCredential: 43071a376b64c7e305c2fedff9125e3d498f2d00ab0f486fdcaad81baa99dbe1 (no staking credential)

The I/Os for the close transaction are:

                              {inputs:
                                  - 4fb4fd36491388958f3299ea539b574fdca790e8b5668599e99cac6ba5a39fac!1
                                    Redeemer: <<1,
                                    [<<<"2\130\148\246\255\SUB;X\235\236\191j\224\fZj\137\221\SOs i\EM\235\137\FS\199H(\192\178\178">,
                                    <>>,
                                    [<"", [<"", 1480>]>],
                                    <>>,
                                    <<<"\FS|\224\211\DC3\244\DEL\141R\144\229^i\171B\170E\DC2+E\180\186k\171\154>]T\252\171ub">,
 ....
                         outputs:
                                 - Value (Map [(,Map [("",2000)]),(1dd1049cbd0ff6c47602ac8ce76d9d0558edec1c8f417ad6fd4a2111d8b10f10,Map [(0x21fe31dfa154a261626bf854046fd2271b7bed4b6abe45aa58877ef47f9721b9,1),(0x39f713d0a644253f04529421b9f51b9b08979d08295959c4f3990ee617f5139f,1)])])
                                   addressed to ScriptCredential: 43071a376b64c7e305c2fedff9125e3d498f2d00ab0f486fdcaad81baa99dbe1 (no staking credential)

Logging the utxoAt result in the close endpoint that gives the TxOutTx attached to the script's address:

    Contract log: String "State machine UTxO: fromList [
    (TxOutRef {txOutRefId = 4fb4fd36491388958f3299ea539b574fdca790e8b5668599e99cac6ba5a39fac, txOutRefIdx =  1}
    ,TxOutTx {txOutTxTx = Tx {
    txInputs = fromList [ TxIn {txInRef = TxOutRef {txOutRefId = 268bf918b954642de3e4a1b2d108dee48f2ed4a0f9c974b35c6291b60070ab54, txOutRefIdx = 1}
                               , txInType = Just (ConsumeScriptAddress Validator { <script> } (Redeemer {getRedeemer = Constr 0 []}) (Datum {getDatum = Constr 0 [Constr 0 [Constr 0 [B \"9\\247\\DC3\\208\\166D%?\\EOTR\\148!\\185\\245\\ESC\\155\\b\\151\\157\\b)YY\\196\\243\\153\\SO\\230\\ETB\\245\\DC3\\159\"],Constr 1 []],List [Constr 0 [B \"\",List [Constr 0 [B \"\",I 1000]]]],Constr 1 []]}))}
                          ,TxIn {txInRef = TxOutRef {txOutRefId = 3832e5b62e1bf8df95054f42d522ec24388b407652dd8564281a30367dcac0ad, txOutRefIdx = 1}
                               , txInType = Just (ConsumeScriptAddress Validator { <script> } (Redeemer {getRedeemer = Constr 0 []}) (Datum {getDatum = Constr 0 []}))}
                          ,TxIn {txInRef = TxOutRef {txOutRefId = 63fcd1840fb27fa8eef570f6f8ea42f1309518c17d4e252f3ee2ddc4c4492848, txOutRefIdx = 0}
                               , txInType = Just ConsumePublicKeyAddress}
                          ,TxIn {txInRef = TxOutRef {txOutRefId = 63fcd1840fb27fa8eef570f6f8ea42f1309518c17d4e252f3ee2ddc4c4492848, txOutRefIdx = 1}
                                , txInType = Just (ConsumeScriptAddress Validator { <script> } (Redeemer {getRedeemer = Constr 0 []}) (Datum {getDatum = Constr 0 [Constr 0 [Constr 0 [B \"!\\254\\&1\\223\\161T\\162abk\\248T\\EOTo\\210'\\ESC{\\237Kj\\190E\\170X\\135~\\244\\DEL\\151!\\185\"],Constr 1 []],List [Constr 0 [B \"\",List [Constr 0 [B \"\",I 1000]]]],Constr 1 []]}))}]
, txCollateral = fromList [TxIn {txInRef = TxOutRef {txOutRefId = 63fcd1840fb27fa8eef570f6f8ea42f1309518c17d4e252f3ee2ddc4c4492848, txOutRefIdx = 0}, txInType = Just ConsumePublicKeyAddress}]
, txOutputs = [
    TxOut {txOutAddress = Address {addressCredential = PubKeyCredential 21fe31dfa154a261626bf854046fd2271b7bed4b6abe45aa58877ef47f9721b9
                  , addressStakingCredential = Nothing}
         , txOutValue = Value (Map [(,Map [(\"\",99915670)])])
         , txOutDatumHash = Nothing}
    , TxOut {txOutAddress = Address {addressCredential = ScriptCredential 43071a376b64c7e305c2fedff9125e3d498f2d00ab0f486fdcaad81baa99dbe1, addressStakingCredential = Nothing}
       , txOutValue = Value (Map [(,Map [(\"\",2000)]),(1dd1049cbd0ff6c47602ac8ce76d9d0558edec1c8f417ad6fd4a2111d8b10f10,Map [(0x21fe31dfa154a261626bf854046fd2271b7bed4b6abe45aa58877ef47f9721b9,1),(0x39f713d0a644253f04529421b9f51b9b08979d08295959c4f3990ee617f5139f,1)])])
       , txOutDatumHash = Just 2b0b15c43ac83cb6d7a68f7ed516e3017964b30f44d2d828408dd9559c8df82d
       }]
, txForge = Value (Map [])
, txFee = Value (Map [(,Map [(\"\",52304)])])
, txValidRange = Interval {ivFrom = LowerBound NegInf True, ivTo = UpperBound PosInf True}
, txForgeScripts = fromList []
, txSignatures = fromList [(d75a980182b10ab7d54bfed3c964073a0ee172f3daa62325af021a68f707511a,2dba3d2cc78c83aef5e080be8dbf85645f90a44edf596913abe466b8cd0634a4250239a127792629702cd8cc4178360999699590e05b38ad2cee9eed12d9bb01)]
, txData = fromList [(2b0b15c43ac83cb6d7a68f7ed516e3017964b30f44d2d828408dd9559c8df82d,Datum {getDatum = Constr 1 ...[]]]]})
                     ,(2cdb268baecefad822e5712f9e690e1787f186f5c84c343ffdc060b21f0241e0,Datum {getDatum = Constr 0 []}),(d38a1142ade90b55793912774ec6b633b03b810ce2f7513b9776d628a5387aa5,Datum {getDatum = Constr 0 [....]]})
                    ,(f37dfa2dac3e68fad98162f5fe2db3ea5e253dccad695ba540b16cbcdc486ece,Datum {getDatum = Constr 0...]})]}
, txOutTxOut = TxOut {txOutAddress = Address {addressCredential = ScriptCredential 43071a376b64c7e305c2fedff9125e3d498f2d00ab0f486fdcaad81baa99dbe1
                                             , addressStakingCredential = Nothing}
                     , txOutValue = Value (Map [(,Map [(\"\",2000)]),(1dd1049cbd0ff6c47602ac8ce76d9d0558edec1c8f417ad6fd4a2111d8b10f10,Map [(0x21fe31dfa154a261626bf854046fd2271b7bed4b6abe45aa58877ef47f9721b9,1),(0x39f713d0a644253f04529421b9f51b9b08979d08295959c4f3990ee617f5139f,1)])])
                     , txOutDatumHash = Just 2b0b15c43ac83cb6d7a68f7ed516e3017964b30f44d2d828408dd9559c8df82d}})]"

Submitted TX in the close:

Tx {
  txInputs = fromList [ TxIn  {txInRef = TxOutRef {txOutRefId = 4fb4fd36491388958f3299ea539b574fdca790e8b5668599e99cac6ba5a39fac, txOutRefIdx = 0}
                             , txInType = Just ConsumePublicKeyAddress}
                      , TxIn { txInRef = TxOutRef {txOutRefId = 4fb4fd36491388958f3299ea539b574fdca790e8b5668599e99cac6ba5a39fac, txOutRefIdx = 1}
                             , txInType = Just (ConsumeScriptAddress Validator { <script> } (Redeemer {getRedeemer = Constr 1 [Constr...}))}]
 , txCollateral = fromList [ TxIn {txInRef = TxOutRef {txOutRefId = 4fb4fd36491388958f3299ea539b574fdca790e8b5668599e99cac6ba5a39fac, txOutRefIdx = 0}
                                 , txInType = Just ConsumePublicKeyAddress}]
 , txOutputs = [ TxOut {txOutAddress = Address {addressCredential = PubKeyCredential 21fe31dfa154a261626bf854046fd2271b7bed4b6abe45aa58877ef47f9721b9, addressStakingCredential = Nothing}
                      , txOutValue = Value (Map [(,Map [(\"\",99894975)])])
                      , txOutDatumHash = Nothing}
                ,TxOut {txOutAddress = Address {addressCredential = ScriptCredential 43071a376b64c7e305c2fedff9125e3d498f2d00ab0f486fdcaad81baa99dbe1, addressStakingCredential = Nothing}
                , txOutValue = Value (Map [(,Map [(\"\",2000)]),(1dd1049cbd0ff6c47602ac8ce76d9d0558edec1c8f417ad6fd4a2111d8b10f10,Map [(0x21fe31dfa154a261626bf854046fd2271b7bed4b6abe45aa58877ef47f9721b9,1),(0x39f713d0a644253f04529421b9f51b9b08979d08295959c4f3990ee617f5139f,1)])])
                , txOutDatumHash = Just 98f5b7eed56b55ca67fb14a2f90708dc7e4939bdc424af280ad934d8343388fb}]
 , txForge = Value (Map [])
 , txFee = Value (Map [(,Map [(\"\",20695)])])
 , txValidRange = Interval {ivFrom = LowerBound NegInf True, ivTo = UpperBound PosInf True}
 , txForgeScripts = fromList []
 , txSignatures = fromList [(d75a980182b10ab7d54bfed3c964073a0ee172f3daa62325af021a68f707511a,bd8f627fef117528a32c8a48c0fa7992e2bbd03fee8b219c72b3a1f95ea8ec97140875009f4bfd7f1e9da7f263c619ff556a202a1d0d2cc9173b10a2445b8b01)]
 , txData = fromList [(2b0b15c43ac83cb6d7a68f7ed516e3017964b30f44d2d828408dd9559c8df82d
                      ,Datum {getDatum = Constr 1 [...]]})
                    ,(98f5b7eed56b55ca67fb14a2f90708dc7e4939bdc424af280ad934d8343388fb
                     ,Datum {getDatum = Constr 2 [Constr 0 [...]})]}"

Transaction seems correct, though!

2021-07-02

Ensemble Programming

Deciding what to do next in the aftermath of first milestone meeting and update:

Protocol is still not complete: There are a bunch of TODOs and Contest is not implemented at all. This is rather straightforward, so better do it in solo mode
PAB integration: we'll wait for SN to do this together
Continuing smart contracts: We are not handling all transitions in the Head SCs and there's still a failing test (commented out)

Fixing the commented-out test in ContractTest: Adding a tryCallEndpoint that returns something if an error is thrown within the contract endpoint, so that we can assert on the return value. However, there is an assertContractError function that asserts a predicate over a ContractError supposedly thrown by a contract instance.

Implemented basic endpoints logic for close

Add a Snapshot type containing a list of UTxO and a snapshot number. We should add the multisig later.
Add OnChain validator: It's pretty straightforward as there's not much to check.
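A minimal sketch of what such a Snapshot type could look like, written in plain Haskell rather than Plutus. The UTxO stand-in, the field names, and initialSnapshot (with the assumption that numbering starts at 0) are hypothetical and not taken from the actual code; the multisig is left out, as noted above.

```haskell
import Numeric.Natural (Natural)

-- | Hypothetical stand-in for a UTxO entry: a reference and a lovelace amount.
data UTxO = UTxO
  { utxoRef :: String
  , utxoLovelace :: Integer
  }
  deriving (Eq, Show)

-- | A snapshot: a list of UTxO and a snapshot number (no multisig yet).
data Snapshot = Snapshot
  { snapshotNumber :: Natural
  , snapshotUTxO :: [UTxO]
  }
  deriving (Eq, Show)

-- | Assumption: the first snapshot of a head carries number 0.
initialSnapshot :: [UTxO] -> Snapshot
initialSnapshot = Snapshot 0
```

Using Natural here only works off-chain; an on-chain representation would need a different number type.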

Add OffChain code to submit transaction for the close -> test fails with a mysterious

      [WARNING] Slot 7: 00000000-0000-4000-8000-000000000000 {Contract instance for wallet 1}:
                        Contract instance stopped with error: WalletError (ValidationError (ScriptFailure (EvaluationError ["Missing signature","checkScriptContext failed"])))

Note: In the OCV algorithms, there's no mention of checking equality between the amount(s) initially committed and the amount of each transition in the SM, nor with the UTxO decommitted. This is implicit in the fact that the committed snapshot is valid and signed, hence has been produced by a valid ledger, yet it would probably be better to check it in the OCV?
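For illustration, the explicit check discussed in this note could be as simple as comparing totals. This is a sketch in plain Haskell over lovelace amounts only; valuePreserved is a hypothetical name, and a real validator would compare full multi-asset values.

```haskell
-- | Lovelace amounts only; participation tokens and other assets are ignored here.
type Lovelace = Integer

-- | Hypothetical check: the total committed to the head must equal
-- the total held by the UTxO of the closing snapshot.
valuePreserved :: [Lovelace] -> [Lovelace] -> Bool
valuePreserved committed closed = sum committed == sum closed
```

With the two commits of 1000 lovelace seen in the logs above, valuePreserved [1000, 1000] [1480, 520] holds (the 520 is a made-up second output), whereas valuePreserved [1000, 1000] [1] does not.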

2021-07-01

Some Coordinated protocol simulation results

Bandwidth is fixed to 2000MB/s, all nodes are colocated in same DC, transactions are assumed to be always non-conflicting.

Nr. nodes	concurrency	tps (snapshot)	snap size
20	10	685	100
20	1	259	10
50	10	709	250
50	1	296	25
100	10	717	500
100	1	314	50

To compare with Simple Protocol's results.

Ensemble Programming Session

We should make an ADR for hiding technical layers behind modules, eg. Hydra.Network encapsulates and re-exports everything network-related.

Implementing Abort

What happens when CollectComTx and AbortTx happen concurrently?

We should not observe both coming back from the chain, but this assumes the chain is safe
We have this property stating we can receive messages in any state, but it's probably wrong
We should guard the OnChainEvent handlers too with the state

We should take care of mainchain rollbacks at some point:

Chain can be rolled back up to 36 hours in the past
This means our whole state could disappear, with the rug pulled under our feet while we run the head
=> we need to wait for opening the head until we get sufficient confidence it cannot be rolled back?
Also relevant for contestation => if you have enough stake you could succeed in forcing a rollback which means you could cheat...
=> delays full finalization even further
Heads us in the direction of long-running heads, with incremental commits/decommits (with same problem of rollbacks)
Ouroboros Genesis should solve part of the issue

There's a tradeoff between acceptable risk and head duration -> TINSTAAFL

There is an error in our code: We transition Abort to ClosedState which is wrong, but we cannot really observe that.

The only way is to make sure there are some actions we can or cannot do. Also, state naming is not consistent with the paper -> renaming InitState to ReadyState
Also, there is no need for a FinalState as there's only one head. When we Abort or finalize the head, we move back to ReadyState, which means we can test the correctness of the abort by sending Init again.
Test fails because we had some uniqueness requirement on the txs in mocked chain in BehaviorSpec so sending InitTx [1,2] twice fails -> remove the uniqueness requirements

Removing our property stating we should handle on-chain transactions in all states -> it's not true anymore. Then adding some unit tests in HeadLogicSpec to assert CollectComTx and AbortTx are exclusive of each other.
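The state guarding described in this session can be sketched as a small transition function. This is a simplified model, not the actual HeadLogic code: CollectingState and OpenState, the observe and observeAll functions, and the parameterless transactions are all assumptions made for illustration.

```haskell
import Control.Monad (foldM)

-- | Head states. ReadyState and ClosedState are named above; the others are assumed.
data HeadState = ReadyState | CollectingState | OpenState | ClosedState
  deriving (Eq, Show)

-- | On-chain transactions observed by the node (parameters omitted).
data OnChainTx = InitTx | CollectComTx | AbortTx
  deriving (Eq, Show)

-- | Guarded transition: Nothing means the transaction is not handled in this state.
observe :: HeadState -> OnChainTx -> Maybe HeadState
observe ReadyState InitTx = Just CollectingState
observe CollectingState CollectComTx = Just OpenState
observe CollectingState AbortTx = Just ReadyState -- abort goes back to ReadyState
observe _ _ = Nothing

-- | Apply a sequence of observations, stopping at the first unhandled one.
observeAll :: HeadState -> [OnChainTx] -> Maybe HeadState
observeAll = foldM observe
```

In this model, observeAll ReadyState [InitTx, AbortTx, InitTx] yields Just CollectingState, which mirrors testing the abort by sending Init again, while observeAll ReadyState [InitTx, CollectComTx, AbortTx] yields Nothing, since CollectComTx and AbortTx exclude each other.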

Older entries

See Logbook-2021-H1