Logbook 2022 H1

June 2022

2022-06-30

Mithril session

We have reviewed and merged one of the latest stage of the GCP Aggregator stabilization for this issue Stabilize GCP Aggregator #273
We have paired on the Retrieve SD from cardano-cli in Aggregator/Signer #275 issue and we have merged the real epoch retrieval from the Cardano node. There is still a problem with the Docker images that crash because of a version issue of glibc (version 2.29 is expected but not available on the debian:buster-slim image used). We will fix the problem by using a different base image for the Docker files.
Also we have paired on preparing the demo path:
- Road map to Open Sourcing
- Workshop summary presentation (see miro board)
- End to end demo on devnet with real Epoch Retrieval from Cardano node:

# Setup demo

## Checkout correct commit
cd mithril/
cd mithril-client && make build && cp mithril-client ../../ && cd ..
rm -rf mithril

--- 
# Demo: Bootstrap and start a Mithril/Cardano devnet

## Change directory
cd mithril-test-lab/mithril-devnet

## Run devnet with 1 BTF and 2 SPO Cardano nodes
NUM_BFT_NODES=1 NUM_POOL_NODES=2 ./devnet-run.sh

## Watch devnet logs
watch -n 1 LINES=5 ./devnet-log.sh

## Watch devnet queries
watch -n 1 ./devnet-query.sh

## Visualize devnet topology
./devnet-visualize.sh

## Stop devnet
./devnet-stop.sh

# Client
## Get Latest Snapshot Digest
LATEST_DIGEST=$(curl -s http://localhost:8080/aggregator/snapshots | jq -r '.[0].digest')
echo $LATEST_DIGEST

## List Snapshots
NETWORK=devnet AGGREGATOR_ENDPOINT=http://localhost:8080/aggregator ./mithril-client list -vvv

## Show Latest Snapshot
NETWORK=devnet AGGREGATOR_ENDPOINT=http://localhost:8080/aggregator ./mithril-client show $LATEST_DIGEST -vvv

## Download Latest Snapshot (Optional)
NETWORK=devnet AGGREGATOR_ENDPOINT=http://localhost:8080/aggregator ./mithril-client download $LATEST_DIGEST -vvv

## Restore Latest Snapshot
NETWORK=devnet AGGREGATOR_ENDPOINT=http://localhost:8080/aggregator ./mithril-client restore $LATEST_DIGEST -vvv

2022-06-29

Mithril session

We have re synchronized ourselves in order to progressively roll out the new features related to retrieving the real epoch and stake distribution from the Cardano node:
- First: real epoch
  - Wiring the new chain observer to the Aggregator and the Signer (only for the epoch retrieval code)
  - Merging the new test lab that works directly with the devnet
  - Fixing the rights issues between Cardano and Mithril nodes on GCP
- Second: real stake distribution
  - Activating the real stake distribution retrieval in the chain observer
  - Activating of the PoolId handling in the evnet and the test lab
  - Update the configuration of the Mithril nodes on GCP
We have also talked about the enhancements to be done on the documentation website as described in improve UI/UX of documentation site #245. These will be done shortly. We also discussed about including live information gathered from the Aggregator

2022-06-24

Mithril session

We have reviewed and merged the issue Add Single Signatures store to aggregator #282. It has raised some questions regarding some optimizations in the single signature registration process. We have opened a new issue to follow their implementation: Optimize single signature in Mithril Aggregator/Signer #296
We have talked about PRs related to the switch of the epochretrieval from the Cardano node:
- Devnet retrieves 'PoolId' artifacts from Cardano node #292
- Greg/250/chain observer #294
- A new PR is in progress for updating the Docker images so that they embed the cardano-cli
- A new PR is in progress for wiring the cardano_cli_observer in the aggregator and the signer instead of the fake_observer
- The merge of these PR will unlock the creation of the snapshots on GCP that is currently blocked
We also had talks about the finalization of the end to end test runner working on top of the devnet. We will merge shortly

2022-06-23

Mithril session

We have reviewed and paired on the almost finished work on the Use the devnet in rust e2e tests #274:
- The end to end test runner is now able to interact with the devnet
- It raised some issues with single signature registrations that take too long and make the tests fail with more than 1 pool node on the Cardano network. Some optimizations are being done in the issue Add Single Signatures store to aggregator #282
- This should drastically reduce the average time to run the tests on the CI (from 9 minutes to less that 3 minutes) 🥳
We have noticed some issues with the configuration cascade that is not working properly because of the default values of clap that always override any value passed with environment vars. We will need to work specifically on this problem shortly.
In the mean time, some optimizations have been done in the PR Enhance runtime configuration of Mithril Aggregator #291 that has been merged in order to Stabilize GCP aggregator #273. There are still issues related to epoch transitions (fake at the moment) that are occuring too often and that trigger unwanted transitions in the runtime of the Aggregator. We will add a quickfix on this in order to resume the snapshots production on GCP and at the same time we are implementing the real epoch retrieval from the cardano-cli (see Retrieve SD from cardano-cli in Aggregator/Signer #275)
We have also planned our developments for the switch to the real stake distribution of the Cardano network in Retrieve SD from cardano-cli in Aggregator/Signer #275:
- We will deploy in two phases:
  1. Use real epoch but fake stake distribution
  2. Use real epoch and real stake distribution
- A difficulty is to used dynamic PoolId from Cardano network as party_id in the Mithril network. This implies some modifications on the Docker images production (so that they embed a cardano-cli binary), the end to end test runner, as well as the devnet

2022-06-22

Mithril session

The incoming PR related to the stake distribution interface have been reviewed:
- Add chain observer interface #281 (and merged)
- Greg/250/chain observer #286 (and merged)
- Use offsetted epochs in Mithril Aggregator/Signer #287 (ready but not merged yet)
We had discussions about:
- The second step of the end to end tests runner which should use the new devnet in order to launch Cardano nodes (related to Use the devnet in rust e2e tests #274). We must take great care to the execution time of the tests as they are ran in the CI. This will imply fine tuning of the devnet Cardano nodes (epoch length for example)
- The need to implement and use the cardano-cli in the Mithril nodes (related to Retrieve SD from cardano-cli in Aggregator/Signer #275). We will pair on this tomorrow
- The need to clearly define scenarios to implement in the integration tests vs those run on the end to end tests (related to Add integration tests in Mithril Aggregator #284)
- The evolution that we should do on the Aggregator runtime in order to better manage the epoch transition vs the immutable file number transition in the beacon
- The runtime of the Signer that should be implemented with a state machine (as done in the Aggregator)

2022-06-21

Mithril session

We have reviewed the incoming issues and their respective PRs:
- Provide interface for retrieving SD from external source in Aggregator/Signer #250: should be merged shortly
- Retrieve SD from cardano-cli in Aggregator/Signer #275: the parsing of the datas from the stake distribution is almost done, and the cal to the cardano-cli is in progress
- Stabilize GCP aggregator #273: the developments are in progress
- Use the devnet in rust e2e tests #274:
  - The first PR has been merged: Add devnet choose nodes to start #278
  - The adaptation of the end to end test runner is in progress
- Remove Mithril error codes in aggregator #283 has been merged 🥳
- Handle the poolId along with party_id in Core/Aggregator/Signer: #276: work is blocked by #250 and will resume once it is merged

2022-06-20

Mithril session

We had talks about the GCP aggregator stabilization #273. Among issues, there seems to be some trouble with user rights on the file system. We will investigate them shortly. We now have multiple snapshots that are properly produced at each new beacon and that can be restored with the Mithril Client. We will continue focusing on this issue in the next few days. Also, we need to determine what is the target topology of the Cardano nodes we want to run on that environment (SPO or not).
We have reviewed the draft Add chain observer interface PR #281 that will be completed shortly. We have also talked about the first implementation of this interface using the Cardano cli related to Retrieve SD from cardano-cli in Aggregator/Signer#275. This will require some modifications of the Mithril nodes (Docker development containers, CI generated docker) and it will also some adaptation work on the end to end test runner and on the devnet.
A few tickets have been added to the board:
- Add Single Signatures store to aggregator #282
- Remove Mithril error codes in aggregator #283
We have talked about the need for integration tests, particularly to increase test coverage of the Aggregator runtime (which is very well unit tested and correctly end to end tested). We will start to work on that point shortly.

Technical discussions Mithril/Light Wallets

We had a meeting with the Light Wallets team in order to start understanding:
- How we can use the Mithril protocol for light wallets
- What technical difficulties we might encounter
We could use Mithril on two sides:
- On the backend side by enabling fast bootstrapping of full nodes used by light wallets (aligned with the current use case we are working on)
- On the frontend side, which could be a very interesting use case:
  - Provide a SPV mechanism for light wallets (as described in this documentation from Mithril website)
  - It may be possible to use Mithril certificates and embed more information about the Cardano chain than what we currently do such as Utxo set (which should be possible with the current Mithril network architecture). A light wallet would be able to certify the chain (including stake distribution) up to the previous epoch by following the Mithril certificate chain. From this point, as stake distribution is fixed up to the next epoch, a 'classical' SPV mechanism a la Bitcoin could be possible. However this assumption should be validated at first
  - Maybe we could run a full node on the client side?
A light wallet would need to run the mithril-core library to verify the multi signatures:
- An iOS or Android app, that would use it as a static library
- A browser plugin, that would also embed it as a static library
- A browser web page, with WASM compiled from Rust (however, we don't know at this stage if it will be complicated due to the underlying cryptographic backend)
- In any case, there should a single library (audited and trusted) on top of which to build applications
We will carry on investigating on these use cases with regular meetings and workshops

2022-06-17

Mithril session

We have prepared the tickets for the next iteration
We have also discussed about the best way to handle the Handle the poolId along with party_id in Core/Aggregator/Signer issue #276. Here is the strategy we agreed on:
- Remove the party_id in mithril-core interface (and recompute it inside the lib with custom sort based on stake and verification_key)
- Switch the party_id(u64) with pool_id(string) in mithril-aggregator
- Switch the party_id(u64) with pool_id(string) in mithril-signer

2022-06-16

Mithril session

We have paired on fixing the last issues with Implement state machine runtime in Mithril Aggregator #221. The runtime state machine PR #261 has been merged 🥳
We have also prepared the demo path for this iteration:

Mithril Multi Signatures End To End: On new devnet with Real Evolving Snapshots / new Aggregator runtime state machine

# Setup demo

## Download source (if needed)
git clone https://github.com/input-output-hk/mithril.git
#or
git clone git@github.com:input-output-hk/mithril.git

## Checkout correct commit
cd mithril/
git checkout 5fef1c7427cc7f11fad0e9fcc7f550e259768185

--- 
# Demo: Bootstrap and start a Mithril/Cardano devnet

## Change directory
cd mithril-test-lab/mithril-devnet

## Run devnet with 1 BTF and 2 SPO Cardano nodes
MITHRIL_IMAGE_ID=main-f9a51d8 NUM_BFT_NODES=1 NUM_POOL_NODES=2 ./devnet-run.sh

## Watch devnet logs
watch -n 1 LINES=5 ./devnet-log.sh

## Watch devnet queries
watch -n 1 ./devnet-query.sh

## Visualize devnet topology
./devnet-visualize.sh

## Stop devnet
./devnet-stop.sh

Mithril Test Lab: Rust End to end tests

# Setup demo

## Download source (if needed)
git clone https://github.com/input-output-hk/mithril.git
#or
git clone git@github.com:input-output-hk/mithril.git

## Make build (if needed)
mkdir bin
cd mithril/
git checkout 5fef1c7427cc7f11fad0e9fcc7f550e259768185
cargo build --release
cp target/release/mithril-{aggregator,client,signer,end-to-end} ../bin/
#cp target/release/mithril-{aggregator,client,signer,end-to-end} ~/.cabal/bin
cd ..
rm -rf mithril

--- 
# Demo: Bootstrap a Cardano node from a testnet Mithril snapshot

# Launch test end to end (error timeout)
./bin/mithril-end-to-end --db-directory ./db.timeout/ --bin-directory ./bin

# Launch test end to end (success)
./bin/mithril-end-to-end --db-directory ./db/ --bin-directory ./bin

Mithril Restore from GCP: On testnet snapshots (if it works 🤞)

# Setup demo

## Download source (if needed)
git clone https://github.com/input-output-hk/mithril.git
#or
git clone git@github.com:input-output-hk/mithril.git

## Make build (if needed)
cd mithril/
git checkout 5fef1c7427cc7f11fad0e9fcc7f550e259768185
cd mithril-client && make build && cp mithril-client ../../ && cd ..
cd ..
rm -rf mithril

--- 
s
# Demo: Bootstrap a Cardano node from a testnet Mithril snapshot

# Aggregator
## GCP logs
ssh curry@aggregator.api.mithril.network -- docker-compose logs -f mithril-aggregator

## Show pending certificate
watch -n 1 "curl -s -X 'GET' 'http://aggregator.api.mithril.network/aggregator/certificate-pending' -H 'accept: application/json' | jq ."

## Show snapshots
watch -n 1 "curl -s -X 'GET' 'http://aggregator.api.mithril.network/aggregator/snapshots' -H 'accept: application/json' | jq ."

# Client
## Get Latest Snapshot Digest
LATEST_DIGEST=$(curl -s http://aggregator.api.mithril.network/aggregator/snapshots | jq -r '.[0].digest')
echo $LATEST_DIGEST

## List Snapshots
NETWORK=testnet AGGREGATOR_ENDPOINT=http://aggregator.api.mithril.network/aggregator ./mithril-client list -vvv

## Show Latest Snapshot
NETWORK=testnet AGGREGATOR_ENDPOINT=http://aggregator.api.mithril.network/aggregator ./mithril-client show $LATEST_DIGEST -vvv

## Download Latest Snapshot (Optional)
NETWORK=testnet AGGREGATOR_ENDPOINT=http://aggregator.api.mithril.network/aggregator ./mithril-client download $LATEST_DIGEST -vvv

## Restore Latest Snapshot
NETWORK=testnet AGGREGATOR_ENDPOINT=http://aggregator.api.mithril.network ./mithril-client restore $LATEST_DIGEST -vvv

## Launch a Cardano Node
docker run -v cardano-node-ipc:/ipc -v cardano-node-data:/data --mount type=bind,source="$(pwd)/data/testnet/$LATEST_DIGEST/db",target=/data/db/ -e NETWORK=testnet inputoutput/cardano-node

2022-06-15

Mithril session

We have reviewed and merged the following PR:
- Rust test lab #260
- Mithril/Cardano devnet with SPO nodes #262
- Use stake store in Mithril Signer #263
We have talked about the demo path for this iteration
We have paired on the Mithril Aggregator runtime state machine #221:
- Implementation of the /certificate-pending route with the certificate pending store
- End to end tests to make sure that the new runtime is working properly
- We have noticed a bug with digest encoding in the certificate that will be solved in the PR
- We will merge it shortly 💪

2022-06-14

Mithril session

We have reviewed the Implement state machine runtime in Mithril Aggregator issue #221. The issue is almost completed and should be merged shortly 😄
We have also reviewed and paired on the Deploy test network w/ nodes in SPO mode issue #249. It is still impossible to launch a working Cardano private devnet in Docker Compose 🤔 Although we fixed networking issues that triggered warnings regarding the IP Subscriptions, we still receive TraceNoLedgerView errors and we can see that the nodes don't produce any blocks. We will carry on investigating this issue. However, the devnet behaves properly when launched with the shell
Finally, we have reviewed and paired on the Migrate test-lab to Rust issue #248. We will merge the code shortly 🚀. Once this issue and the previous one are done, we will followup with uncommission the "legacy" version of the end to end test and we will also work on improving the coverage of the tests

2022-06-13

Mithril session

The bug Fix digester hash is not deterministic #241 has been closed: the bug does not reproduce anymore on GCP 🥳
We have reviewed and merged the PRs related to issue Use "local" SD in Mithril Aggregator #252:
- Use stake store in Mithril Aggregator multi signer #257
- Remove stakes in Mithril Aggregator pending certificate #258
We have also reviewed the PR (in progress) related to issue Implement state machine runtime in Mithril Aggregator #221
Also we have talked and reviewed works on the rewritten end to end test:
- Migrate test-lab to Rust issue #248
  - Handling of return code in mithril-client
  - Macro implementation for retries/delays
- Deploy test network w/ nodes in SPO mode issue #249
  - The devnet is now working with SPO nodes
  - The code is being cleaned-up
  - There are still issues with running Cardano nodes when launched with Docker Compose

2022-06-10

Mithril session

We have reviewed the following PRs:
- Setup Devnet with nodes in SPO mode launched with Docker, related to issue #249. There are still some instablity on the nodes and investigations are in progress to solve the issue
- Rust adaptation of the end to end test of the issue #248
- Implementation of the aggregator state machine runtime related to #221

2022-06-09

Mithril session

The Flaky Tests CI bug #207 has been apparently fixed by switching the crypto backend of the mithril-core library to the one used by zcash 🥳. There is a configuration available to switch back to blst. In the mean time a bug has been created on the repository of the blst backend.
We have reviewed the development (in progress) of the Devnet with nodes in SPO mode launched with Docker, that is related to issue #249
We have reviewed the Rust adaptation of the end to end test of the issue #248
We have also reviewed the implementation of the aggregator state machine runtime linked to #221. We have done some pairing on this issue in order to refine the state machine transitions

2022-06-07

Mithril session

The PR Use verification key/beacon stores in Mithril Aggregator multi signer #247 has been reviewed and merged
We have prepared the tickets for the sprint
We have talked about the Flaky Tests CI issue #207 and tried to find ways to fix it (cargo-valgrind, signal_hook, using other backend crypto libraries using the same curve). Some tests are in progress 🤞
We have paired on the Mithril Aggregator runtime state machine (linked with Use store for Pending Certificates #221)
We have also talked about the end to end tests migration to Rust and we have stated that:
- The first development will be iso with the current version in Haskell (checking that the current scenario is working: snapshots/certificates are produced by querying the aggregator REST API)
- It will be integrated as a new dependency in the Cargo Workspace of the repository (and will not embed any other dependency from it)
- It should be easy to add new scenario in it when required
- It will bootstrap itself from a previously created snapshot (if possible, should require keeping previous state in a new run)

2022-06-03

Mithril session

We have concentrated our efforts on pairing to fix the bug #244 that froze the API service of the Mithril Aggregator during the snapshot digest computation and the snapshot archive creation. The problem has been fixed in the PR #246 and is deployed on the GCP environment 🥳
We also discussed about how to implement the end to end test in Rust

2022-06-02

Mithril session

We have discussed about the Flaky Tests CI issue #207 and the tests that we have been running yesterday. Here are our findings:
- We have not succeeded in outputting a stack trace: we are completely blind and have no clue of what causes the crash
- The issue has started at job run #801 just after the release of the new blst crypto backend
- We have tried to reproduce the flaky behavior before this release by relaunching multiple times the job run #766 but it never failed
- The SIGILL error is happening multiple times in the same job run with the same test executable as in #1022
- We have downloaded the test binary file computed as an artifact by the CI in #1024:
  - It causes the SIGILL in the CI
  - It does not cause the SIGILL on 2 computers out of the CI
- It appears that the problem could be located between:
  - The machine allocated by Github actions that is not consistent
  - The blst backend or its unsafe implementation in mithril-core
We have prepared the demo path that will showcase a full end to end snapshot creation/certification/verification/restoration:

# Mithril Multi Signatures End To End
# With Real Snapshot

# Resources

## Github
google-chrome https://github.com/input-output-hk/mithril

## Architecture
google-chrome mithril-mvp-architecture.jpg

## Interact with the aggregator through the OpenAPI UI
google-chrome  http://mithril.network/openapi-ui/

---

# Setup demo

## Download source (if needed)
git clone https://github.com/input-output-hk/mithril.git
#or
git clone git@github.com:input-output-hk/mithril.git

## Make build (if needed)
cd mithril/
git checkout 4f387911f7747b810758b3a4783134856307c13a
make build
cp ./target/release/{mithril-aggregator,mithril-client,mithril-signer} ../
cd ..
rm -rf mithril/

---
# Demo Step 1: Create a real certificate/multisignature from a real snapshot

## Prepare store
rm -rf ./stores

## Launch Aggregator
NETWORK=testnet SNAPSHOT_STORE_TYPE=local SNAPSHOT_UPLOADER_TYPE=local PENDING_CERTIFICATE_STORE_DIRECTORY=./stores/aggregator/pending-cert_db CERTIFICATE_STORE_DIRECTORY=./stores/aggregator/cert_db URL_SNAPSHOT_MANIFEST= ./mithril-aggregator -vvvv --db-directory=./db --runtime-interval=30

## Launch Signer #0
PARTY_ID=0 RUN_INTERVAL=30000 NETWORK=testnet DB_DIRECTORY=./db AGGREGATOR_ENDPOINT=http://localhost:8080/aggregator ./mithril-signer -vvvv

## Launch Signer #1
PARTY_ID=1 RUN_INTERVAL=30000 NETWORK=testnet DB_DIRECTORY=./db AGGREGATOR_ENDPOINT=http://localhost:8080/aggregator ./mithril-signer -vvvv

## Display pending certificate
curl -s "http://localhost:8080/aggregator/certificate-pending" | jq .

---

# Demo Step 2: Restore a real snapshot with a real certificate/multisignature validation

## Get Latest Snapshot Digest
LATEST_DIGEST=$(curl -s http://localhost:8080/aggregator/snapshots | jq -r '.[0].digest')
echo $LATEST_DIGEST

## List Snapshots
NETWORK=testnet AGGREGATOR_ENDPOINT=http://localhost:8080/aggregator ./mithril-client list -vvv

## Show Latest Snapshot
NETWORK=testnet AGGREGATOR_ENDPOINT=http://localhost:8080/aggregator ./mithril-client show $LATEST_DIGEST -vvv

## Download Latest Snapshot (Optional)
NETWORK=testnet AGGREGATOR_ENDPOINT=http://localhost:8080/aggregator ./mithril-client download $LATEST_DIGEST -vvv

## Restore Latest Snapshot
NETWORK=testnet AGGREGATOR_ENDPOINT=http://localhost:8080/aggregator ./mithril-client restore $LATEST_DIGEST -vvv

## Launch a Cardano Node
docker run -v cardano-node-ipc:/ipc -v cardano-node-data:/data --mount type=bind,source="$(pwd)/data/testnet/$LATEST_DIGEST/db",target=/data/db/ -e NETWORK=testnet inputoutput/cardano-node

2022-06-01

Mithril session

We have reviewed and closed multiple PRs:
- Aggregator uses a Certificate Store PR #227 that closes the issue #179
- Fix: URL for snapshot location is wrong PR #224 for bug #223
- Make ImmutableFile public PR #228
- Enhanced configs for GCP deployments PR #234
We have opened a complimentary issue to the bug #223 in order to handle Resolvable snapshots url in Mithril Aggregator #232
We have discussed about the Flaky Tests CI issue #207 in order to find ways to understand and solve the issue 🤔
Also we have paired on the creation of a Verification Store that will help implement smoothly the runtime of the aggregator. A PR should be created shortly for the issue Use verification key store #233

May 2022

2022-05-31

Mithril session

We have talked about the bug #223 and we have reviewed the proposed solution
We have paired on the state machine specification of the aggregator runtime and we have produced the following diagram that summarize the way it works:
We have reviewed and merged the PR #219 related to signing real snapshot digest in the client
We have also reviewd the PR #227 that closes the work on creating/using real certificates in the aggregator and the client. We have paired on fixing the end to end test that was not working anymore due to the underlying changes. This PR will be merged shortly.

2022-05-30

Mithril session

We have talked about the fix on the CI warnings #213:
- One fix is temporary (with an associated TODO in the code)
- The long term fix is to create a config for the AggregatorRuntime that implements the From trait with the general Config struct of the node
The aggregator runtime needs more tests than what is available currently. After the current work on the data stores is merged into it, we will start working on its integration tests 💪
We talked about the cryptographic library recent updates:
- Everything works fine with the latest merged implementation so far
- An update has been added regarding the serde configuration for the Aggregate Verification Keys
- We agreed that the panics should be replaced by errors for better error handling
- Some investigations will be lead regarding the flaky tests #207, specifically in the unsafe parts of the code to check if they are responsible for this behavior
We have also paired on implementing a more testable version of the snapshot digester:
- Use of mockall in a common library and using it only as dev-dependency
- The PR #218 has been merged
- This new version is currently implemented in PR #214 that will be merged once it is done
- It is also used in the issue #178 that is under development
The data stores PR #211 has been merged 🥳 We have also paired on implementing a declination store for the Certificate Store #222 that will be merged shortly
Now that the data stores are available, we will pair on using them:
- In the runtime
- In the http handlers
- In the multi signer

2022-05-25

Mithril session

We have reviewed and merged the Cargo Workspace PR #210 🥳
We have reviewed the latest version of the Real Certificate Production in Mithril Aggregator PR #209:
- It has been merged
- A more robust hash computation is done in a new PR #212
We have also reviewed the add generic adapters PR #211 that is being finalized and will be merged shortly. Once it is merged, we will pair on the wiring implementation of the stores in the aggregator

2022-05-24

Mithril session

We have reviewed the PR #209 that takes care of the Real Certificate Production in the Mithril Aggregator. We have paired on implementing the fixed crate in order to handle the Hash and PartialEq traits implementation for the ProtocolParameters struct
We have also reviewed the PR #211 in relation with the data stores implementation in the store
These two PRs will be merged shortly, and then the stores will be wired in the Aggregator Runtime and MultiSigner
We have also discussed in details about the implementation of the data stores and the link between the Beacons and the CertificatePendings (with some Miro charts)
Also, we have reviewed the PR #210 that enables a Cargo Workspace along with an enhanced CI workflow

2022-05-23

Mithril session

We have reviewed the PR #203 related to the Local Snapshot Store in the Mithril aggregator and we have merged it 🥳
A difficulty that was met during this PR is to define which parameters should be used in the clap Args struct and those in the Config struct. It appears that we should:
- Use the Args for setting the run mode and the verbosity level
- Use the following order when setting a parameter in the config:
  1. Value in config file first
  2. Then in env var if exists
  3. Then in cli args
- An ADR will be written describing this rule
We have paired on improving the end to end test so that it waits for the aggregator to be up and ready before running the tests on this PR #208 which has also been merged
We have discussed about the data stores of the aggregator and we have decided to use a simple approach. We will save the different certificate pending versions and embed directly the verification keys in them. A PR is in progress that will follow this approach
Also we have talked about a possible headless design in which the aggregator would be only responsible to produce static files (snapshot list, snapshot details, certificate details that would be the only information needed to restore a snapshot ; without any direct access to the aggregator). This would allow to use these materials to self-bootstrap a end to end test.

2022-05-20

Technical Retrospective

We have done our first technical retrospective and have decided the following actions:
- Action 1: Reschedule the Mithril Sessions in the morning and focus them on pairing
- Action 2: Start working on the Rust version of the Test Lab
- Action 3: Fix flaky tests in CI #207

Mithril session

We have reviewed the doc optimization PR #205. Minor modifications will be made before merging it. We will work on it in an iterative manner until we have a satisfactory result.
We have paired on the Aggregator local snapshot #206 and fixed some issues regarding the serving of static files with warp.
We have prepared the next iteration with goal: Produce/verify multi-signature & certificates for evolving snapshot (with fake stake distribution)
An important point that we discussed is related to the data storage of the aggregator node:
- The data stores can be described as beaconized key value stores
- They must provide access to data depending on a beacon, as the value associated to a key may evolve from one beacon to another (e.g. the signer verification key of a party that is registered during an epoch and available to use in the next one)
- They must expose a function to list all values for one beacon
- They must expose a function to list the n latest values inserted for one beacon
- They must handle pruning of the data with a retention parameter on the beacon

2022-05-19

Mithril session

We have merged the following PRs:
- Create multi signature in Mithril Aggregator runtime #204
- Pass on stm module #199
We have talked about some difficulties due to the fact that we are moving from fake to real data. We will add corresponding tasks to the next iteration in order to be able to produce/verify real certificates from real Cardano node data
We have taken some time to review the documentation PR #205 and to make some modifications. It will shortly be ready for merging.
We have prepared the demo path for this iteration:
- Showcase of:
  - The Mithril Aggregator node producing an evolving beacon following the database of the Cardano node
  - The Mithril Signers proceeding to their registration with the Mithril Aggregator
  - The production of the associated multi signatures by the nodes accordingly to the pending certificate broadcast
- Presentation of the updated doc related to the open sourcing of the repository

# Mithril Multi Signatures End To End
# With Real Beacon / Real Signer Registration

# Resources

## Github
google-chrome https://github.com/input-output-hk/mithril

## Architecture
google-chrome mithril-mvp-architecture.jpg

## Interact with the aggregator through the OpenAPI UI
google-chrome  http://mithril.network/openapi-ui/

---

# Setup demo

## Download source (if needed)
git clone https://github.com/input-output-hk/mithril.git
#or
git clone git@github.com:input-output-hk/mithril.git

## Make build (if needed)
cd mithril/
git checkout d896b557b69b3db120dcbece784611671141a635
cd mithril-aggregator && make build && cp target/release/mithril-aggregator ../../mithril-aggregator && cd .. 
cd mithril-signer && make build && cp target/release/mithril-signer ../../mithril-signer && cd .. 
cd mithril-client && make build && cp target/release/mithril-client ../../mithril-client && cd .. 
cd ..
rm -rf mithril/

---

# Demo Step 1: Display real beacon

## Prepare immutables
rm -f ./db/immutable/{00011,00012}.{chunk,primary,secondary} && tree -h ./db

## Launch Aggregator
NETWORK=testnet URL_SNAPSHOT_MANIFEST=https://storage.googleapis.com/cardano-testnet/snapshots.json ./mithril-aggregator -vvvv --db-directory=./db --snapshot-interval=30

## Display pending certificate
curl -s "http://localhost:8080/aggregator/certificate-pending" | jq .

---

# Demo Step 2: Display real beacon with new `immutable_file_number`

## Copy next immutables
cp -f ./db/immutable.next/00011.{chunk,primary,secondary} ./db/immutable/ && tree -h ./db

## Display pending certificate
curl -s "http://localhost:8080/aggregator/certificate-pending" | jq .

---

# Demo Step 3: Register signers and show signers field updated in aggregator pending certificate route
# Then signers sends single signatures that are aggregated in a multi signature by the aggregator
# At this point, there is no more pending certificate

## Launch Signer #0
PARTY_ID=0 RUN_INTERVAL=30000 NETWORK=testnet AGGREGATOR_ENDPOINT=http://localhost:8080/aggregator ./mithril-signer -vvvv

## Launch Signer #1
PARTY_ID=1 RUN_INTERVAL=30000 NETWORK=testnet AGGREGATOR_ENDPOINT=http://localhost:8080/aggregator ./mithril-signer -vvvv

## Display pending certificate
curl -s "http://localhost:8080/aggregator/certificate-pending" | jq .

## Display pending certificate (204)
curl -v -s "http://localhost:8080/aggregator/certificate-pending" | jq .

---

# Demo Step 4: Display real beacon with another `immutable_file_number`
# Then signers run another round of signatures for the pending certificate

## Copy next immutables
cp -f ./db/immutable.next/00012.{chunk,primary,secondary} ./db/immutable/ && tree -h ./db

## Display pending certificate
curl -s "http://localhost:8080/aggregator/certificate-pending" | jq .

2022-05-18

Mithril session

We have fixed the very long test end to end execution time which is now back to nominal work 🚀
We have reviewed the following PRs, that will be merged shortly:
- Local snapshot store of the aggregator #203
- Create multi signature in Mithril Aggregator runtime #204

2022-05-17

Mithril session

We have decided to gather in a Technical Retrospective every Friday at the end of the sprint:
- Manage the technical debt and code quality (Review of TODOs)
- Talk/Redact ADRs with at least:
  - Expose internal libs
  - Errors handling and propagating
  - Containerization and CI integration
  - Configuration with env vars
We have continued pairing on fixing the end to end test lab for the Realish Beacon issuance PR:
- The issue is more difficult to fix than expected
- In order to move forward, we have temporarily removed the multi signature in the Mithril Client (and accordingly commented the code)
- We will put it back, when the digest/certificate hash computation is final
- The end to end test is very long to run in the CI. This issue will be fixed shortly: Cardano node db files will be embedded as artifacts
- The PR has been merged 🥳
We have also reviewed the Real Signers registration PR:
- We have dealt with the numerous conflicts created after merging the Realish Beacon issuance PR
- Some optimizations have been proposed in this PR. We will review it and merge it shortly
- The PR has been merged 🥳

2022-05-16

Mithril session

We have tried to test the struct mocking with the mockall crate, but we didn't achieved a good result. We will try again during another session
We have reviewed the Real Beacon Producer PR:
- The code is ok and is ready to be merged
- This PR will be merged first as the digest computation is needed for remaining tickets of the iteration
- The end to end tests is red in the CI
We have paired on fixing the end to end test lab that is red, but we have not found a solution yet. We suspect a de synchronization of the digests between the nodes. We will carry on our investigation tomorrow in order to fix the test and proceed to the merge.

2022-05-12

Mithril session

We talked about the penultimate immutable number and its consequences regarding security. Apparently, the ledger state is computed with the latest immutable. We will continue our work by using a digest computed from the penultimate immutable number and include in the snapshot the latest immutable number associated files, as well as the ledger state. In parallel, we will investigate this issue and assess the security risks
We still have some troubles with the CI that is very flaky from time to time. As there is no clear patter, we will carry on our investigation and re-trigger the failed jobs manually in the mean time
Adding a cargo workspace looks like a good idea, but will require to modify the CI pipeline. As it is not a priority, we will work on it later
We have reviewed the matrix build PR of the CI and it will be merged shortly, after some minor fixes. It looks very good 👍
We have also reviewed the PRs in progress:
During these reviews, we have talked about some difficulties on how to better synchronize our developments and avoid going in different directions. One of the issues was to better understand/define the role of the stores in the current implementation of the nodes. We have stated that the business components (Snaphotter, MultiSigner, ... ):
- should embed the stores in their local implementations and determine where and how to use them
- the store should be injected at startup time in the main.rs dependencies init sequence
An idea would be to implement alphabetical order of the dependencies in the Cargo.toml files for better readability. The cargo-sort tool does this and we could implement it the make check calls and as a new step in the CI build jobs

2022-05-11

Mithril session

We have reviewed and talked about the implementation of the upcoming VerificationKeyStore
We have also reviewed the in progress work on the Snapshotter and its digest/fingerprint computation feature
⚠️ Question: What data do we need to certify and to embed in the snapshot ?
- If we work with an immutable number that is the penultimate, it means that we can't certify the latest immutable numberassociated files
- But maybe these files are used by the Cardano node to compute the ledger states
- Is it a security hole to embed the latest immutable number files even if they are not certified?
- If we can't embed them, can we still use the latest ledger state (which could considered as tampered by the Cardano node and/or could be a security hole if tampered and used jointly with tampered latest immutable number files)?
We have noticed that some tests are flaky and fail from time to time. For example this one. We need to investigate further to understand if it is due to the recent changes of the CI made to analyze the Rust test results or if this is linked to the modifications done in the mithril-core library.

2022-05-10

Mithril session

We have reviewed and merged the crypto_helper module in mithril-common and talked about issues when importing #[cfg(test)] modules from the common library (and how to bypass them)
Some modifications on the mithril-core library are in progress and will need to be adapted in the crypto_helper (mainly types renaming at this time)
We have paired on the computation of the real beacon based on the immutable files of the Cardano node:
- We have noticed that the latest immutable number associated files are updated until the following immutable number files are created
- Thus we have decided to rely on the penultimate immutable number to compute the beacon
- The Open API specifications and corresponding entities type will be modified accordingly by removing the now deprecated block field and replacing it by an immutable_number field

2022-05-09

Mithril session

We have created the tickets for the next sprint on the Kanban board and we have talked about what tasks they require
Apart from these tickets, we have included some optimization/cleanup tasks that we will work on during the sprint in order to lower the technical debt
We have talked about the freshly redesigned mithril-core library and the remaining questions/issues:
- StmSig and StmMultiSig are not yet serde compliant because of the hash blake2. A solution is under progress
- We will rely on the StmInitializer serialization instead of the signer secret key (and thus the secret_key accessor will be removed in the core library)
- Two tests from the protocol demonstrator tool are not working anymore as is (a protocol parameters modification was necessary, by changing the phi_f from 0.50 to 0.65). An investigation is under progress in order to understand what happened (https://github.com/input-output-hk/mithril/blob/79f0c5f48d7f30ef1821782a8777c9302ab7a612/demo/protocol-demo/src/demonstrator.rs#L604 and https://github.com/input-output-hk/mithril/blob/79f0c5f48d7f30ef1821782a8777c9302ab7a612/demo/protocol-demo/src/demonstrator.rs#L634)
We have made some pairing in order to include an internal lib with a lib.rs that exposes (and re-exposes) the internal modules. A PR and an ADR are in progress.
We have also reviewed and merged the PR of the new mithril-common library that will be used to implement shared features among the nodes. In order to simplify the CI, it may be a good idea to implement matrix builds with parameters inside the jobs definition, instead of simply duplicating the code.

2022-05-05

Mithril session

We talked about the mithril-core library:
- How to replace ark_ff FromBytes and ToBytes traits implementation. The preferred option is to use serde as a backbone for these tasks as it is very convenient to use and will offer possibility to import/export from different formats (json, yaml, ...)
- Next step to handle the backend update to blst: once all the work has been done on the core library, we will rebase the main branch on it and fix the code that uses it
- We should use these RNG out of the tests (that can still use a seeded one): OsRng or ThreadRng
- We should not need to use re exported modules from the core lib as it should fully wrap what's under the hood. If this needs arise, maybe we will have to modify the core library accordingly.
In order to facilitate E2E tests, we shoud implement a verify only or dry run mode in the Mithril Client
An idea would be to add a UI to the client by adding a http server attached to it
The newly bootstrapped Mithril Signer node should be included in the Terraform deployment of the CI: this will allow to produce live multi signatures certificates. We will do it during the next iteration.
There is a question about how to enforce that values passed to the aggregator are correctly formatted (e.g. base64 certificate hashes):
- Add explicit 400 http errors in OpenAPI specification and in the http handlers implementation
- Investigate further on how this issue has been elegantly addressed by the Hydra team
The security alerts thrown by dependabot have been screened:
- We will run a cargo update command at the end of every sprint in to use latest versions of the crates
- We could take advantage of the [cargo-audit](https://lib.rs/crates/cargo-audit) tool that is also available as a GitHub action
- Maybe the CI could turn red when a Critical level vulnerability is found? We need to see if it is a good option.
We have made a test showcase of the sprint demo path and everything worked as expected 😅

Sprint Demo

We have created the following demo path for the sprint demo:

# Mithril Multi Signatures End To End

## Github
google-chrome https://github.com/input-output-hk/mithril

## Architecture
google-chrome mithril-mvp-architecture.jpg

## Interact with the aggregator through the OpenAPI UI
google-chrome  http://mithril.network/openapi-ui/

## Download source (if needed)
git clone https://github.com/input-output-hk/mithril.git
# or
git clone git@github.com:input-output-hk/mithril.git
# optional
git switch ensemble/fix-client-certificate-hash-encoding

## Make build (if needed)
cd mithril/
cd mithril-aggregator && cargo build && cp target/debug/mithril-aggregator ../../mithril-aggregator && cd .. 
cd mithril-client && cargo build && cp target/debug/mithril-client ../../mithril-client && cd .. 
cd mithril-signer && cargo build && cp target/debug/mithril-signer ../../mithril-signer && cd .. 
cd ..

# Signer #0
PARTY_ID=0 RUN_INTERVAL=120000 NETWORK=testnet AGGREGATOR_ENDPOINT=http://localhost:8080/aggregator ./mithril-signer -vvvv

# Signer #1
PARTY_ID=1 RUN_INTERVAL=120000 NETWORK=testnet AGGREGATOR_ENDPOINT=http://localhost:8080/aggregator ./mithril-signer -vvvv

# Aggregator
NETWORK=testnet URL_SNAPSHOT_MANIFEST=https://storage.googleapis.com/cardano-testnet/snapshots.json ./mithril-aggregator -vvv

# Client
## Get Latest Snapshot Digest
LATEST_DIGEST=$(curl -s http://aggregator.api.mithril.network/aggregator/snapshots | jq -r '.[0].digest')
echo $LATEST_DIGEST

## List Snapshots
NETWORK=testnet AGGREGATOR_ENDPOINT=http://localhost:8080/aggregator ./mithril-client list -vvv

## Show Latest Snapshot
NETWORK=testnet AGGREGATOR_ENDPOINT=http://localhost:8080/aggregator ./mithril-client show $LATEST_DIGEST -vvv

## Download Latest Snapshot (Optional)
NETWORK=testnet AGGREGATOR_ENDPOINT=http://localhost:8080/aggregator ./mithril-client download $LATEST_DIGEST -vvv

## Restore Latest Snapshot
NETWORK=testnet AGGREGATOR_ENDPOINT=http://localhost:8080/aggregator ./mithril-client restore $LATEST_DIGEST -vvv

## Launch a Cardano Node
docker run -v cardano-node-ipc:/ipc -v cardano-node-data:/data --mount type=bind,source="$(pwd)/data/testnet/$LATEST_DIGEST/db",target=/data/db/ -e NETWORK=testnet inputoutput/cardano-node

2022-05-04

Mithril session

We have paired on the Mithril Signer single signatures and finished the work; the PR should be merged shortly #167
We have also reviewed the Mithril Client multi signature verification in the PR #166. Some minor fixes are in progress before merging.
We will have to prepare a demo path tomorrow morning for tomorrow afternoon sprint demo (once all PRs are merged)
The mithril-network folder has been merged into the root of the repository 🥳
Here are some points that we should address shortly:
- Implement typed error with thiserror crate in all nodes where it is not done yet, and redact an ADR for error handlig
- Implement a common library in a mithril-common folder
- Use the new version of the mithril-core library and handle the breaking changes
- Move the deterministic RNGs used in the code and use non deterministic ones
- Reexport libraries from the mithril-core library in order to simplify the dependencies in the Cargo.toml files

2022-05-03

Mithril session

We have paired on the Mithril Signer and followed up the work started yesterday
We have talked about the simplification of the Mithril Verifier without using a Clerk. It could work if we embed these information in the Stake Distribution certificate as the Merkle Tree root of the stake distribution (party_id, stake, verification_key). There are still many open questions.
We should open source the repository in mid June in order to be ready for the Cardano Summit

2022-05-02

On-boarding Grégoire on Mithril:

Intros
Explanation of Mithril
Explanation of the POC and MVP architecture and scope
Discovery of the Github repository
Demonstration of the first version of the Mithril Network that has been showcased last sprint
Q&A session
Plan next days/weeks

Mithril session

Rust Workspace setup

We will setup a Rust workspace for mithril-core and all the mithril-network nodes as a first step
We will also move the folders from mithril-network to the root of the repository

Review Multi Signer Aggregator

We have reviewed and merged the PR #163
Some issues with dependencies appeared with incompatible versions between the core library and the aggregator. To fix the issue, the jsonschema was roll-backed to an earlier compatible version. We think it is a good idea to import the core library as a crate instead of a path, but this will be possible only once the repository is open sourced.
- It's possible to have a file-based registry to/from which crates can be retrieved but this seems not worth the hassle. It's unfortunate GitHub does not support Cargo registry out of the box
The code should make a better use of errors and avoid the use of unwrap. There is an ADR in progress regarding a standardization of the error handling and propagation that will help achieve this target.
- Need to properly type all errors, instead of using String
- Use foo? construct to automatically propagate and wrap errors to pass to caller
- Rest layer = Imperative shell where errors are catched and formatted for consumption by client
- Business layer = "functional" core where errors are part of return type, either using Result or dome more specific type
The certificate_certificate_hash handler should make a call on a CertificateStorer in order to not need to modify the handler code when updating its implementation.
The create_multi_signature function should be called on the fly when a certificate is requested instead of each time a single signature is registered. This function should return an enum type (for clarity instead of a Result<Option>>). Also the function should be pure.
- Example of specific return type incorporating error: create_multi_signatures should return
```
enum MultiSig {
  QuorumNotReached,
  MultiSig(Multisignature),
  MultiSigError(..)
}
```

Pairing on Mithril Signer

We paired on implementing the retrieve pending certificate and register signatures features and started to write tests. We will continue the pairing during tomorrow session.

April 2022

2022-04-29

Mithril session

We have paired on the structured logs and found a way to filter the log level with a cli argument. We also investigated further multiple drains implementation
We noticed a bug on the CI that showed green status when a Rust test failed. The bug has been fixed and deployed to the main branch
We have investigated the security alerts that show up in the repository and we will talk about how to fix/ignore them in a next Mithril Session. As the security fix are generally delivered in more recent versions of the libraries, we think that we should run dependency updates regularly with cargo update (once a sprint/month)
We noticed a flaky test in the Mithril Client and a fix is in progress

2022-04-28

Mithril session

Structured Logs ADR

We have paired on implementing the slog in the Mithril Signer
We implemented a drain that will asynchronously (with (slog-async)) produce structured json logs (with slog-bunyan) and implemented custom vars inside them
We found a way to use a global logger (which would not require referencing a log object everywhere in the code). However, this approach is not recommended in case of a library. This means that we should shortly create a mithril-network/mithril-common library that would be used by the nodes instead of using the aggregator both as a binary and a lib
We still need to find a way to use the VerbosityLevel cli flag to be able to control the log level that is displayed
Question: Do we need to add a terminal specific drain? In that case, do we need to log jsons to a different place that stdout?

Mithril Core

We have reviewed the PR in progress for creating multi-signatures in the aggregator #163
A first live demonstration of the REST API of the aggregator MultiSigner in local environment was done:
- create a multi-signature from two single signatures
- retrieving it and its certificate
We noticed that there is no Verifier entity available in the cryptographic library: it needs to use a Clerk to verify signatures and thus uses the full stake distribution, whereas a verifier would only need the root of the associated Merkle tree. An issue has been created for this: #162. This will have consequences on the structure of the certificate that we will need to address.
The current PR #155 that implements a production ready backend library will not be merged during this sprint in order to avoid conflicts with the current developments.
This PR will be completed with some optimizations in the mean time:
- Make the implementation thread-safe
- Add native import/export functions for keys and signatures
When the cryptographic code is done, we will have to make some adaptations to the code of the Mithril Network nodes that use it before merging.
We have also talked about separating the Mithril Core library from the Mithril Network (by using a versioned crate of the core library). This will be possible as soon as the repository is open sourced.

2022-04-27

Mithril session

We have reviewed the PR that updated the CI workflows in order to retrieve save/compile the Rust test results. The PR has been merged #142
We have also paired on bootstrapping the Mithril Signer node #143

2022-04-26

Mithril session

Mithril core

Some dependencies conflicts occurred while importing the core library in the aggregator code base:
- We will update the dependencies of the core library and make sure they don't introduce any regression
- It could be a good idea to use a Rust Workspace to handle the dependencies smoothly. We'll investigate that option shortly
Some crates used in the core library are not production ready and need to be replaced:
- arf_ff should be replaced by blst that has been audited and formally verified
- This will have some impact on the current code that uses the core library
- An issue has been created for that matter #155
Nice features to have in the core library are:
- type aliasing for readability and ease of use
- import/export of hex values of verification keys, single and multi signatures

ADRs

We have reviewed an implementation attempt of anyhow in the SnapshotStore of the aggregator for clean error propagation (see #152)
We decided to use thiserror instead as it is better to get typed error. Anyhow could be used to take advantage of its Context feature that helps tracking the context in which the error occurred
The structured logging ADR has been adopted unanimously and we will implement it during the Mithril Signer Node bootstraping #143
A mithril-common library will be created in the mithril-network folder in order to provide a shared source for components that are needed in multiple nodes (such as Snapshotter)

End to end tests

The end to end tests that are currently coded in Haskell could be migrated to Rust for easier maintenance. We need to find a way to bootstrap the cryptographic materials needed by the local cardano nodes used during the test.
The current tests relies on external data hosted on Google Cloud. In order to get clean tests, we will need to decouple the snapshotter:
- separate the SnapshotProducer (produces the snapshot archive, computes the digest)
- from one/multiples SnapshotReplicator (replicates the archive file to CDN, IPFS and/or BitTorrent)
- implement a configurable version of the aggregator that would init specific dependencies for these e2e tests

Minimal snapshot digest source

Some tests have been conducted in order to get a first verification that the Cardano node only needs valid immutables to bootstrap a node #137
This will allow us to create snapshot from unmodified Cardano node and providing the fastest bootstrap at the same time
Further investigations need to be done with the Consensus team to get a final validation (scheduled for next sprint)

2022-04-25

Design session

We had a thorough discussion about what the REST API should look like as the current resources and verbs are not very consistent, and not very RESTish. The discussion devolves into investigating what the structure of certificates would look like and how chaining worked.

Stakes:

GET /stakes/0 -> signed by genesis key
GET /stakes/{epoch number}?limit=&from=&to -> Retrieve the (signed) stake distribution for some epoch -> link to previous epoch's stakes distribution

Snapshots:

GET /snapshots?limit=1&from=123&to=345 -> retrieve a list of snapshots, latest first?
GET /snapshots/{mithril round} -> return snapshots for given round -> contains:
- beacon information (what are we signing, at what epoch/block/slot)
- pparams
- list of signers -> with a link
- digest of message for some beacon (mithril round)
- multisignature -> link to signatures?
- stake distribution hash + link to /stakes/{epoch number}

Signers:

PUT /signers/{mithril round}/{party id} = register some signer's key for next "round" -> needs to be signed! by the skey associated with this party's identification vkey -> body: verification key (mithril specific key) + signature

Signatures:

(later) GET /signatures/{mithril round}
- 200: list signatures for given round
POST /signatures/{mithril round}
- contain signature(message digest || stake distribution hash)
- 400 -> hash exists but you're not allowed to sign now
- 404 -> no message with that hash to sign

scenario from signer perspective:

Assumption: We have already registrered for signing epoch 40000

PUT /signers/40000/123456789 -> register signer 123456789 to sign certificate at round 40001
GET /snapshots/40000 -> 404 while epoch has not changed
threshold => mithril round changes, list of signers is closed
GET /certificates/40000 -> gives me parameters for signing the certificate
POST /signatures/40000
epoch change -> sign new stake distribution
GET /stakes/{epoch} -> gives me parameters for signing the stake dsitribution
POST /stakes/40000

scenario from client perspective:

GET /snapshots
GET /snapshots/40000 -> verify multisignature

Question:

What if signature fails? Esp. problematic for signing the stake distribution
Previous model was assuming we would sign a certificate at epoch n + 2, using stake at epoch n, and containing stake at epoch n + 1

2022-04-22

We have created the tickets for the next sprint on the Kanban board and we have sliced them into multiple tasks
On top of the new increment tickets, we have included some optimization/cleanup tasks that we will work on today

2022-04-21

Mithril session

We have paired on improving the dependency manager in the aggregator #131:
- We added a SnapshotStoreHTTPClient (an implementation of a SnapshotStorer) as a new dependency
- We improved the API Spec in order to better handle default status code
- We have implemented a with_snapshot_storer used by the handlers of the http server
- The configuration implemented relies only on a config file, which is a problem with the test lab for example
- In order to fix this issue, we will implement the config crate so that we can easily substitute file configuration with env var configuration
The legacy POC Node has been removed from the CI (the associated Docker registry should be erased)
The aggregator snapshotter has been slightly modified in order to produce snapshots at startup (so that CD redeployment does not cancel them)
A test will be conducted during next sprint in order to understand if the digest could be computed only from the immutables:
- What happens if the ledger state distributed with the snapshot is tampered (but the immutables are genuine)?
- Try to restore a Cardano node with a past or future ledger state
- Issue created to for this test #137
Next week Mithril sessions will be used to talk about:
- Optimization of the aggregator REST API
- ADRs definition #128
- Error typology to put in place (codes, meaning, ...)

2022-04-20

Mithril session

All the PRs have been merged and we are now able to make a full end to end demo from snapshot creation to restoration of a Cardano Node with a cloud hosted aggregator 🤓
Here is the path for the sprint demo:

## Download source
git clone https://github.com/input-output-hk/mithril.git

## Go to Mithril Client directory
cd mithril/mithril-network/mithril-client

## Build Mithril Client
make build

## Get Latest Snapshot Digest
LATEST_DIGEST=$(curl -s http://aggregator.api.mithril.network/aggregator/snapshots | jq -r '.[0].digest')

## List Snapshots
./mithril-client -c ./config/testnet.json list

## Show Latest Snapshot
./mithril-client -c ./config/testnet.json show $LATEST_DIGEST

## Download Latest Snapshot
./mithril-client -c ./config/testnet.json download $LATEST_DIGEST

## Explore Data Folder
tree -h ./data

## Restore Latest Snapshot
./mithril-client -c ./config/testnet.json restore $LATEST_DIGEST

## Explore Data Folder
tree -h ./data

## Launch a Cardano Node
docker run -v cardano-node-ipc:/ipc -v cardano-node-data:/data --mount type=bind,source="$(pwd)/data/testnet/$LATEST_DIGEST/db",target=/data/db/ -e NETWORK=testnet inputoutput/cardano-node

We have a few cache issues with the snapshot manifest file that is stored on Google Cloud which we will try to fix asap
We have talked about the aggregator REST API optimizations that we could do with a more resources oriented interface
We have paired on implementing a dependency manager in the aggregator #131

Speed-up CI

CI takes way too much time, eg. > 20 minutes
Reduced full execution time to 10-12 minutes by:
- Moving mithril-core test execution to a different job so that docker-mithril-aggregator and docker-mithril-client do not depend on it
- Reuse executables produced by build-mithril-aggregator and -client jobs in the docker-xxx steps as they are binary compatible and it's pointless to do a rebuild that takes several minutes
Initially made terraform step only depend on docker-xx jobs but then added a dependency on tests, which might or might not be a good idea

2022-04-19

Mithril session

We have paired on many PRs in order to be ready for a end to end demo at the end of the sprint:
- Reviewing/finalizing the SnapshotStorer implementation of the aggregator
- Reviewing the Snapshotter implementation of the aggregator and the Terraform deployment to Google Cloud
- Reviewing/finalizing the Unpack snapshot archive implementation of the client
We have also talked about how to showcase the e2e demo (given the delays needed to download & restore for the testnet)
During the reviews we talked about architectural question that arose during the latest developments:
- Which pattern(s) to adopt to handle best errors?
- Which implementation(s) to use for a clean architecture?
- How to keep code readable, concise and not too complex?
In order to keep track of these decisions, we will add an ADR topic to the documentation website that will help us keep track of them #128
As it is not necessary at the moment, and as it adds complexity in the CI pipelines, we will remove the triggering of the CI for PRs #121
We have also talked about separating the Aggregator component from the Snapshotter component, which would be only responsible of creating the archive of the snapshot and serving/uploading it

Aggregator service

Provide a Cloud DNS configuration for aggregator service. It can now be accessed at http://aggregator.api.mithril.network/
This required setting up a zone managed by cloud DNS and update the NS entries in the mithril.network DNS zone definition
Had a look at configuring a HTTPS load balancer to "guard" access to the mithril aggregator service

2022-04-18

Run snapshotter w/in aggregator service

Goal is to automate the snapshotting process, emulating what the aggregator should be doing when it receives enough aggregated signatures, and upload the corresponding archives to a publicly available Gcloud storage bucket that will allow mithril-client to retrieve, verify and use them to bootstrap a cardano-node.

Initially thought I would use a simple cron-based process running the mithril-snapshotter-poc scripts. Trying to write some simple code in the aggregator that invokes mithril-snapshot.sh script to build an archive and then uploads the archive to gcloud storage. This won't work though as the container running the mithril-aggregator does not have much installed and thus cannot simply run the script. The simplest solution right now seems to rely on an external (cron driven) script that will do the necessary magic, perhaps based on some magic file that when present triggers the snapshot build and upload?
Then decided to rewrite the scripts in Rust and run a dedicated thread alongside the mithril-aggregator server
- We will need to be able to do it anyway, both for signing and aggregating, so better learn early how this works in Rust
snapshotter.rs runs in a separate thread alongside the main server, needed to create a Stopper structure that can be used to stop the thread passing it a poison pill. Assembling the various pieces in Rust:
- There is a crate for interacting with gcloud storage: https://docs.rs/cloud-storage/latest/cloud_storage/. There's a way to stream the file content: https://docs.rs/cloud-storage/latest/cloud_storage/client/struct.ObjectClient.html#method.create_streamed
- Rust for compressing a directory to an archive: https://rust-lang-nursery.github.io/rust-cookbook/compression/tar.html#compress-a-directory-into-tarball
- This SO answer explains how to compute sha256 of a file in rust: https://stackoverflow.com/a/69790544/137871
- Then listing all files in a direcotry: https://natclark.com/tutorials/rust-list-all-files/#listing-files-but-not-folders
Archive generation is somewhat tricky, the example above does not highlight the fact one needs to call finish() or into_inner() on the archive to ensure everything's flushed, which took me quite a while to get right. Also took me a while to compute correctly the size of the file...
Uploading the file to gcloud storage was also somewhat involved:
- Need to pass an environment variable contianing the credentials correctly formatted
- Then struggled to find how to stream a file so that we don't load the full 10+ GB of the archive into RAM before sending it to storage, turns out it relies tokio::fs module and tokio-utils crate but it's only two lines
the whole upload process is fully async based but the toplevel thread isn't: I had to pass down a tokio::Runtime to be able to block_on the result of the async tasks, which does not seem quite right Should probably use tokio spawn to fork the snapshotter thread instead of fiddling with Runtime::new and the likeâ€¦
Struggling to get snapshotter to work w/in docker, turnaround time is a PITA to work with as every docker build takes ages. Found this trick to speed things up by caching dependencies.
The snapshotter works within the container but fails to write the archive because access is denied. The problem is that the executable runs with a local user appuser which does not have rights to write to its current directory /app. Changing the directory's ownership within the build should be fine though:
```
WORKDIR /app/
RUN chown -R appuser /app/
```
Tried to use volumes but it did not work either as volumes are created as the user running the docker container which by default is root
Archive creation is now working in the container but it seems the container crashes when it tries to upload the file. Some testing inside the container shows that the executable segfaults when it tries to connect to gcloud which points at issues with system-level dependencies, eg. crypto or tls libraries in alpine.
Replaced the based docker images to be debian:buster-slim, and same for builder image (rust:1.60 which is debian-based)

2022-04-16

Deploy mithril aggregator service

Create mithril-network/mithril-infra terraform project to host configuration of mithril aggregator stack
- A virtual machine with a static address
- Running a docker-compose stack with a cardano-node (testnet) and a mithril-aggregator process
Added terraform step in CI to automate deployment: terraform apply is run when pushing to main branch
Configure secrets in github for GOOGLE_APPLICATION_CREDENTIALS_JSON containing service account's key that will be passed to the mithril-aggregator for uploading snapshots, and GCLOUD_SECRET_KEY to be used by the GitHub action running terraform
- Struggled to get escaping right both when passing the arguments to the github action's steps and to the terraform process
- In github Yaml, enclose the reference to the variable's content in single quotes: the variables are interpolated before being passed to the shell and their content is thus not interpreted
```
terraform apply -auto-approve  -var "image_id=${{ env.BRANCH_NAME }}-${{ steps.slug.outputs.sha8 }}" \
   -var 'private_key=${{ env.GCLOUD_PRIVATE_KEY }}' \
   -var 'google_application_credentials_json=${{ env.GOOGLE_CREDENTIALS }}'
```
- Same goes for the terraform configuration file:
```
provisioner "remote-exec" {
  inline = [
    "IMAGE_ID=${var.image_id} GOOGLE_APPLICATION_CREDENTIALS_JSON='${var.google_application_credentials_json}' docker-compose -f /home/curry/docker-compose.yaml up -d"
  ]
}
```
- The GCLOUD_SECRET_KEY contains the secret key associated with a public key stored in ssh_keys file that can be used by the terraform process to ssh into the VM, it's a RSA private in ascii armor format, including the newlines. I tried to remove or replace the newlines and it did not work, it seems the single quote escaping is just fine with newlines
- The GOOGLE_APPLICATION_CREDENTIALS_JSON contains JSON key for service account, with newlines removed

2022-04-15

Mithril session

There is an issue with the Docker images build pipelines that are not very reliable. After investigation, it appears that they should be fixed/enhanced. We have created an issue for this #121. The problem was apparently due to not using the latest version of the github actions: we have released a fix and monitor to see if this happens again
As the snapshot creation and storage on Google Cloud is now alive, we have created a task to implement a SnapshotStore in the Mithril Aggregator #120
A pairing session was done to work on this task and we achieved to retrieve the list of remote snapshots from the aggregator 🎉. The corresponding PR is #123

2022-04-14

Mithril session

We have paired in order to fix multiple issues:
- The branch of the Init Mithril Client was corrupted after a conflict resolution #110
- After merging the PR, we found out that the documentation workflow of the CI did not work. It was due to publishing build artifacts from the cargo doc processes to the gh-pages branch which reached its limit of 50MB. The fix was published in #118
We had talks about what is the best way to deploy the Mithril Aggregator server to public cloud:
- Several solutions were investigated (Kubernetes, Custom deployment Terraform, Google Cloud Run, ...)
- As we need to embed a Cardano Node with the Aggregator that has access to its local storage, we decided to start with a Terraform deployment of a Docker Compose setup
- This hosting setup will also allow us to produce easily and regularly snapshot archives (scheduled by a cron)
- At each merge of the main branch a workflow will trigger a deployment of the latest Docker image to the cloud

2022-04-13

Mithril session

We talked about the questions raised by Inigo about yesterday's meeting with Galois. A Q&A meeting session will be organized in that matter
The documentation job needs to be changed. This will be done in #104
- The documentation should be produced in the builds jobs (for more efficiency)
- Artifacts (such as compressed archive of the generated files) should be uploaded as artifacts
- The documentation job should use the previously produced artifacts to prepare the new release of the documentation website
We decided not to use jobs templates for now in the CI jobs definition
We worked on a Feature Map for the increments of the components of the Mithril network
We have reviewed the graceful shutdown fix for the Mithril Aggregator #116
We have also reviewed the code of the Mithril Client #110
We had some thoughts on how to produce snapshots and use the remote storage as a database at first and have the Mithril Aggregator make this information available through its API

2022-04-12

Mithril session

The aggregator has a few issues and fixes/optimizations that will be done in #109
We have worked on the CI pipeline architecture in order to get a clear view of it
We should publish the executable files and make them available on Github (Linux only, except for the Mithril Client that needs to be on macOS, Linux and Windows)
Question: Do we need to do the end to end testing on the executable files or the Docker images?
We have decided to work with the executable files at first, and then with Docker. We may need to change the way the Docker images are built and embed the previously built executable file.
We have successfully created a first ETE test scenario where we check that a freshly started aggregator effectively produces a snapshot and makes it available at its /snapshots route in assertClientCanVerifySnapshot. It will not work with the current aggregator implementation until the number of fake snasphots displayed is reduced to 1

Galois meeting

We had a fruitful discussion detailing the state machine view of the signer's registration process that will be formalised in the monitor framework.
The signer should be essentially stateless regarding the certificates, and retrieve all data from the aggregator, eg. pending certificate and certificates chain. If there are multiple aggregators it becomes possible to have multiple chains, which is probably not a big deal but could become quite messy
Aggregators are essentially infrastructure provider: They setup some infrastructure to efficiently distribute snapshots and they have an incentive to provide as up-to-date as possible snapshots to their "Customers"
In the long run, both signers and aggregators could use the blockchain itself as a communication medium in order to synchronise and ensure consensus over the certificates chain: The tip of the certificate could be part of a UTxO locked by a script enforcing single chain production, possibly with time bounds in order to prevent arbitrary locking of the certificate chain and thus DoS?
- The amount of transactions involved is not huge: 1000-2000 signatures + 1 aggregation per epoch (5 days) would be enough but this implies signers and aggregators spend money for posting the txs
- The problem of rewarding mithril network is still open
The previous certificate hash is missing from the /certificate-pending route. An issue has been created for this task: #112

2022-04-11

Mithril session

Pairing

We have setup Live Sharing on VS Code for pairing sessions (using Live Share extension)

Documentation

We have worked on some refinements of the openapi.yaml file so that the documentation generated by Docusaurus does not truncate the description sentences
In order to keep the generation process not too cumbersome, we have decided to keep the source .md files in the docs/root folder, mixed with the website files. When we need to refer to an external README.md file (for example mithril-network/mithril-aggregator/REAME.md) the easiest way to do it is to provide a direct Github link such as (https://github.com/input-output-hk/mithril/blob/main/mithril-network/mithril-aggregator/README.md)
The documentation is now live and accessible at (https://mithril.network/) and the CI updates it along the way
As releasing production Github actions can be tricky (especially when specific operations are launched only on the *main branch), it looks like a good idea to do it in pairing

Mithril Client

We have reviewed the first code base of the Mithril Client available at jpraynaud/mithril-client

2022-04-07

Mithril session

We have created the tickets/tasks on the project Kanban board following the sprint planning that took place yesterday
In order to facilitate the pairing sessions, we will setup Live Sharing on VS Code next week
Question: Do we move the openapi.yaml file from the root to the mithril-network/mithril-aggregator folder?
Question: In order to have as fast as possible CI workflows completion, what are the best parallelization/caching strategies of the jobs?

2022-04-06

Mithril session

Demo

We have talked about the sprint demo and reviewed the demo path
Questions asked during the demo:
- Can we cache the digest computation on the signer side? (Yes, good idea. It will lower the load on the signer side)
- Can we pipeline the download and the decompression of the snapshot? (Yes, good idea. We can do that as an optimization later)
- What happens with the Genesis certificate when the Genesis keys of the node are updated? (We need to investigate this)
- How long does it take to verify the certificate chain? (This could be done in parallel of the snapshot download/uncompress to avoid further delays)
- How do we handle the discovery of the aggregators? (This could be done on-chain. We will need to investigate this later)

How to produce deterministic ledger state

It appears that the tool db-analyser provides a good way to create deterministic snapshots of the ledger state:
- It will be much easier to use than modifying the Cardano Node snapshot policy
- We will investigate further in that direction
- We need to check if we can rely on the slot number or the block number to produce consistent snapshots

Documentation / Open sourcing

We have worked on the documentation and reviewed the first version of the docusaurus website
We still need to define precisely the final documentation structure
A session with the communication department will be setup to review the documentation and prepare open sourcing

CI / Docker registry

A new CI workflow has been released that better separates the steps of the process
We still struggle to get access to the Docker registry. IT department is working on it
Question: How can we get delete previously created packages? (mithril package now and mithril-node-poc later)

2022-04-06

Mithril session

Mithril Aggregator

The OpenAPI specification validation unit tests in Rust are almost done
They allow to validate that the request and the response sent to the aggregator server are compliant with the specification (for the routes we decide to test)
There are a few modifications/upgrade under development that should be finalized shortly:
- Create a separate module for the APISpec
- Create a Docker image of the aggregator
- Create a workflow for the aggregator server in the CI
- Make some refinements on the specification (include 400 errors for POST method, and handle more precisely the size of the string params)
Next steps are:
- create an integration test for the server
- implement the server in the ETE test lab
- optimize the tests code if necessary
Question: do we need to move the openapi.yaml file to the mithril-network/mithril-aggregator folder?

Documentation

We have now a better understanding of how Docusaurus works
As a start, we will try to reproduce the Hydra documentation website (when it applies)
We will update the docs directory structure and create Under construction pages where it makes sense

How to produce snapshots

Here is a new breakdown of the timings to produce/certify/restore snapshots on the mainnet:

Mithril

2022-04-05

Mithril session

Mithril Aggregator

We have finally found a way to test that the actual implementation of the Mithril Aggregator is conform to the OpenAPI specification
We have worked in order to make the testing work for a response object and with explicit error messages when tests fail in rust tests
We are currently trying to implement the validation of the requestBody according to the specification
We have also discussed about the several layers of test and how to implement them:
- At a upper level, we'd like to have an integration (or several) that would:
  - test the global conformity of the aggregator server implementation to the specification
  - use the specification as the source to generate tests: fuzzy tests and/or stress tests
  - we still need to investigate and find the best tooling for this and work on a CI implementation
- At a lower level, in the Rust server, we will need to test that each route is correctly implemented in unit tests:
  - use the actual implementation and test it to validate that it matches the specification
  - each route should be tested for every possible response status code (this will be easier when mocks will be implemented)
  - each of these tests must also validate the conformity of the request/response to the OpenAPI specification
  - an other unit test should also check that all the routes implemented in the server match exactly all the routes defined in the specification (we still need to find a way to list all these routes on the server - we will investigate further warp documentation)

Some more tools that could be useful for testing/checking REST APIs:

https://github.com/s-panferov/valico
There's a Rust implementation of Pact CDCT tool: https://github.com/pact-foundation/pact-reference/tree/master/rust
https://github.com/apiaryio/dredd/ also provides a mechanism for "hooks" implemented in Rust
Pact general approach seems to make sense in the long run but might be cumbersome to setup

Documentation

We have started to review the way the documentation is implemented on Hydra and the integration that is made with Docusaurus and the CI worklows
We will continue this investigation tomorrow
We will design a tree structure for the website that will be hosted on (https://mithril.network) and work on restructuring the docs folder of the repository

Snapshot Generation

Working on snapshot generation in the cardano-node, adding parameter to be able to generate snapshot every X blocks. Not sure how to test this though, as the test infrastructure of consensus is quite hard to grok

I was unable to build cardano-node using nix-shell -A dev nor nix develop as instructed in build instructions, but managed to build it installing the needed OS dependencies (forked libsodium and libsecp256k essentially). Adding the ouroboros-network packages as local packages in cardano-node's cabal.project enables incremental build whereby changes to the former entail rebuild of the later.

2022-04-04

On-boarding Denis on Mithril

Intros
Explanation of Mithril
Explanation of the POC and MVP architecture and scope
Demonstration of the cryptographic library
Plan next days/weeks
- Dig into repository codebase and documentation
- Development environment setup
- Setup of daily stand-up and added Denis to other meetings

Mithril Aggregator

First version of the server is pending review (in its fake version)
We need to enhance the tests and make sure the server implements correctly the OpenAPI specification
For these tests we are scouting some libraries and utilies: OpenAPI Fuzzer crate and/or OpenAPI crate
We cannot use openapi-fuzzer as a lib so we try to get inspiration from it instead. Package openapiv3 seems mature enough to be used to read the specification and use that to write simple tests to check output of various routes, and we can later generate arbitrary data to input.

Created an APISpec structure that will help us define what we test and what to expect

We are trying to use jsonschema to check conformance of document against schema
https://tarquin-the-brave.github.io/blog/posts/rust-serde/ fiddling with serde between Yaml and Json as our spec is written in Yaml but JSONSchema requires a JSON Value

Struggling with validating our dummy aggregator server against OpenApi specification using rust tooling, seems like jsonschema-rs does not work as easily as expected. Turns out things are more complicated than we would want: The OpenAPI specification is not really a proper JSONSchema, only some fragments of it are (eg. types in requests/responses) so we would need to find the correct part of the schema to check our code against.

Possible options from there:

Stick to use openapi-fuzzer as an external binary, running basic integration tests against a running server. Should be fine at first but will quickly break as we flesh out the internals of the server and need to tighten what's valid input
Understand how jsonschema-rs library works and make sure we check requests against it
Use another tool like https://github.com/Rhosys/openapi-data-validator.js/blob/main/README.md. This one is in node so this would require us to run it as an integration test. Hydra uses a similar approach based on anothoer jsonschema executable in node.

Test Lab

Question: Should we develop the ETE tester of the test lab in Rust instead of Haskell (for simplicity and maintainability)?

How to produce snapshots

New tests have been run and it outcomes that they are ~2x faster than the previous ones:

Mainnet

Data	Node	Full	Archive	Snapshot	Upload	Download	Restore	Startup
Immutable Only	standard	43GB	24GB	~28m	~45m	~25m	~12m	~420m
With Ledger State	modified	45GB	25GB	~28m	~45m	~25m	~12m	~65m

Testnet

Data	Node	Full	Archive	Snapshot	Upload	Download	Restore	Startup
Immutable Only	standard	9.5 GB	3.5 GB	~7m	~5m	~3m	~2m	~130m
With Ledger State	modified	10 GB	3.5 GB	~7m	~5m	~3m	~2m	~6m

Host: x86 / +2 cores / +8GB RAM / +100GB HDD Network: Download 150Mbps / Upload 75Mbps Compression: gzip

2022-04-01

Mithril's fool day 😆

The network is now live on the mainnet 🚀

Wiki

The Project Charter page is now available here and open to comments

AB on ETE Tests

Added basic ETE test to spin up a cardano-node cluster of 3 nodes, now adding the test to CI probably in another parallel job as it will take a while to build...

Having trouble building on CI

trace: WARNING: No sha256 found for source-repository-package https://github.com/input-output-hk/hydra-poc 60d1e3217a9f2ae557c5abf7c8c3c2001f7c2887 download may fail in restricted mode (hydra)
fatal: couldn't find remote ref refs/heads/60d1e3217a9f2ae557c5abf7c8c3c2001f7c2887

Probably should try adding the sha256 field:

the nar executable in https://github.com/input-output-hk/nix-archive/ will do it
```
nar  git-nar-sha --git-dir ../whatever --hash XXXXXX
```
Alternative command to compute sha256 for nix:
```
nix-prefetch-git <repo> <git-hash>
```
When ran nix just tells you the missing sha256's value 🤦

Got a weird error when building: Cabal cannot find commit 60d1e3217a9f2ae557c5abf7c8c3c2001f7c2887 for hydra-poc which is weird as it's definitely available on github. Removing dependency and rerunning cabal again works 🤔 🤷

Created an archive for testnet and uploaded it to google storage bucket

Not sure how to handle authentication properly, I had to upload a service account key file to impersonate the hydra-poc-builder service account, I should have been able to do it using the credentials/auths attached to the VM which is a resource

March 2022

2022-03-31

OpenAPI specification

A first version of the OpenAPI specification has been created
An OpenAPI UI allowing us to interact with the routes is accessible and updated by the CI
We will work on a basic REST server in Rust that implements this specification

Repository reorganization

The rust folder containing the cryptographic library has been moved to mithril-core

Mithril session

How to produce snapshots

We don't need to copy the files somewhere when we create a snapshot (or when a signer needs to compute the digest of the snapshot)
We will make some tests with an uncompressed archive (and/or slice the archive in smaller chunks) in order to see if this can make the restoration faster
We will make a test to see what's happening when we restore a snapshot with only ledger state (last time it worked 1 out of 2 times)
The following diagram shows a breakdown of the timing for the snapshots creation/restoration on the mainnet:

Mithril Snapshot timings

Cryptography

We need to conduct an external audit of the crypto library (it should take ~3 months)
The library will be published on crates.io once it is open sourced
How to sign genesis Mithril certificate:
- we will use Cardano genesis private key to sign it for the mainnet (manually)
- we will use test genesis keys published on the repository for the testnet

Next steps

The repository is now getting ready to being open sourced. Few verifications/update must be completed before going public #92
We can start working on a fake aggregator server that follows the OpenAPI specification
We have talked about the Project Charter page content (that will be completed soon)
The CI should be optimized/reorganized in order to better reflect the new structure of the repository and maybe operate faster

End-to-End Test

Starting to build an ETE test infrastructure based on the work we did for Hydra project

Moving mithril-test-lab to mithril-monitor package and adding a new mithril-end-to-end package to host the needed code for managing ETE test infra. Compilation of zlib dependency is failing due to missing zlib-dev deps -> adding it to shell.nix
Toplevel mithril-test-lab is now a cabal.project containing 2 packages: One for the monitor and one for the end-to-end test that will depend on cardano nodes et al.
Trying to get the dependencies right reusing Hydra's nix files so that we can benefit from iohk's nix caching too, which implies using haskell.nix 🤷
Tried to remove use of hydra-cardano-api but getting the types right now becomes way too complicated, so I just added hydra-poc as a dependency for the moment

2022-03-30

How to produce snapshots

Finished tests to create a snapshot from mainnet Cardano Node and restoring the snapshot. Currently running same tests on testnet:

Mainnet

Data	Node	Full	Archive	Snapshot	Upload	Download	Restore	Startup
Immutable Only	standard	41GB	25GB	~47m	~45m	~25m	~18m	~420m
With Ledger State	modified	43GB	26GB	~47m	~45m	~25m	~18m	~65m

Host: x86 / +2 cores / +8GB RAM / +100GB HD Network: Download 150Mbps / Upload 75Mbps Compression: gzip

A simple cli has been developped in order to conduct the tests
Here are the informations needed to create a working snapshot:
- the whole immutable folder of the database (required)
- the protocolMagicId file (required)
- the latest ledger state snapshot file in the ledger folder (optional)
Question: Do we need to stop the node when the snapshot is done? (the test cli currently makes a copy of the files to snapshot in a separate folder)
In order to create a deterministic digest, it appears that we will need to rely on the binary content of the snapshotted files (digest calculation from snapshot archive file is not enough)

Repository reorganization

The rust-node folder has been moved to mithril-prototype/test-node
The monitor folder has been renamed mithril-test-lab

2022-03-29

Worked on CEO update presentation: https://drive.google.com/file/d/19_Lrr5sYAhVatxdiMws6OGwvsVRn8p0Q/view?usp=sharing

Mainnet machine is still synchronizing the network, currently at slot 53M while tip's slot is 57M, hopefully should be done by noon.

Mainnet node crashed when it ran out of disk space... Current disk space usage:

3.2M    cardano-configurations
4.0K    configure-mainnet.sh
4.0K    docker-compose-mainnet.yaml
4.0K    install-nix-2.3.10
4.0K    install-nix-2.3.10.asc
4.0K    ipc
4.0K    ledger.gpg
30G     mainnet
8.4G    mainnet.bak
4.8G    mainnet.tar.bz2

Compression is about 60% with bzip2 which is onpar with pkzip, and mainnet directory is now at 30G with nearly completed sync up. Going to remove the backup to regain some space

cardano-node_1  | [d77879a4:cardano.node.ChainDB:Error:5] [2022-03-29 08:21:59.82 UTC] Invalid snapshot DiskSnapshot {dsNumber = 54107040, dsSuffix = Nothing}InitFailureRead (ReadFailed (DeserialiseFailure 400482271 "end of input"))

This error is annoying: This means the snapshot needs to be reconstructed :( The error does not say which file failed to be deserialised though

It's reconstructing the ledger but from the latest snapshot file:

cardano-node_1  | [d77879a4:cardano.node.ChainDB:Info:5] [2022-03-29 08:25:44.52 UTC] Replaying ledger from snapshot at 2e90104a43cd8ecfbd8d16f03ce17ac3e46ffdff0546f93079e4b3a9e298f8ed at slot 53558101
cardano-node_1  | [d77879a4:cardano.node.ChainDB:Info:5] [2022-03-29 08:25:44.72 UTC] Replayed block: slot 53558151 out of 54107040. Progress: 0.01%
cardano-node_1  | [d77879a4:cardano.node.ChainDB:Info:5] [2022-03-29 08:26:15.85 UTC] Replayed block: slot 53567929 out of 54107040. Progress: 1.79%
cardano-node_1  | [d77879a4:cardano.node.ChainDB:Info:5] [2022-03-29 08:27:27.14 UTC] Replayed block: slot 53589561 out of 54107040. Progress: 5.73%

Seems like this should happen relatively fast as the gap is not huge?

Interestingly there's no concurrency in the ledger replay logic: Everything happens sequentially hence CPU is at 100%, eg. 1 core is occupied

So node starts up by:

checking immutable DB and extracting the last known blocks from there
opening volatile DB at the point where immutable DB is
opening ledger DB and restoring it from snapshot
- in case there's a Δ between the ledger state and the immutable DB, replay the blocks to udpate ledger's state -> That's the part that takes a while (about 29 minutes for 548889 slots)
updating ledger DB with "new" blocks
- when node crashed it was at slot 54162434, so ledger is also updated from volatile DB's stuff passed the immutable one
then it connects to relays and start pulling new blocks and extend its state

current rate of slot validations is 95 slots/second, would be interesting to provide that information in the logs? => That could be extracted from parsing the logs obviously, or as a dashboard

Seems I will still need about 7 hours to catch up with tip, which is not exactly true because tip keeps growing 😬

Closer to the tip here are the current disk size:

~$ du -sh mainnet/node.db/*
33G     mainnet/node.db/immutable
2.8G    mainnet/node.db/ledger
0       mainnet/node.db/lock
4.0K    mainnet/node.db/protocolMagicId
175M    mainnet/node.db/volatile

Mithril session

Options:

Work on the "restore" side using testnet archive
Work on the consensus code to be able to generate snapshot reliably at some block X
how about epoch boundaries? Can we know when change epoch in the consensus DB?
modify cardano-node to emit snapshot at fixed intervals
run cardano-node on a private testnet

Acceptance test:

run a cardano-node on a private testnet or a cluster of cardano-nodes using withCardanoCluster from Hydra
- the cardano-node is forked and depends on a forked ouroboros-consensus containing snapshot policy
- we need to add some CLI arguments there: https://github.com/input-output-hk/cardano-node/blob/master/cardano-node/src/Cardano/Node/Parsers.hs#L300
we feed the private testnet a bunch of transactions
- generate n arbitrary transactions in a sequence, possibly not depending on each other
we should end up with identical snapshots on every node of the cluster

Discussion w/ Galois

What properties can we express and test?
We could start w/ high-level property stating that:

given signatures of a snapshot then aggregation produces a valid certificate of the expected snapshot

Once an aggregator receives valid signatures, it should produce valid certificate ultimately

Decentralised vision:

several aggregators
signers produce signatures in a decentralised manner

2022-03-28

Mithril Sessions

Morning session:

We updated the Story Mapping (added details for 1st, 2nd and 6th increments)
First investigation on the Cardano node database snapshots structure: Snapshots are located at mainnet/node.db/ledger/{block_no}. Need to review code to be able to produce snapshots at epoch boundary or at regular block number (this afternoon)
We need to get access to a project on a public cloud (GCP preferred or AWS). Arnaud asks Charles for credentials
We need 2 separate credentials on the public repository: Read/Write for producing snapshots and Read for restoring them
Later, we will also host a Mithril Aggregator on the project (Compute resources will be needed). Containers could be orchestrated by Nomad/Hashicorp (preferred) or Kubernetes. We will also need Devops resources at this time
In order to get a clear project organization, a specific "Project Charter" page will be created on the Wiki. It will list all the resources (rituals, links, ...)
In order to get the genesis state verification (first certificate in the Mithril chain), we will need to get it signed by someone who has access to the private key (the public key is packaged with the Cardano Node)
First step in repository reorganization has been completed: the go-node has been moved to mithril-proto/mithril-node-poc
Target repository structure is :
- .github
- demo
- - protocol-demo
- mithril-core < rust (TBD by JP)
- mithril-network (TB Created during MVP)
- - mithril-aggregator
- - mithril-client
- - mithril-signer
- mithril-proto
- - mithril-poc-node
- - test-node < rust-node (TBD by James)
- mithril-test-lab < monitor (TBD by James)

Afternoon pairing session, dedicated at investigating the production of Mithril compatible snapshots from a modified Cardano Node:

LedgerDB Disk Policy
Committed Code: Start hacking on disk snapshot
When we will produce the digest of the snapshot, we need to make sure that the file will have deterministic names or rely on the binary content
The minimum information required to bootstrap a Cardano Node is the immutable folder
- We have tried to bootstrap with the ledger snapshot only (no immutable data) and it did not work. The node started from the first block
- A faster bootstrap is possible with the ledger snapshot in the ledger folder. (If not, the node needs to recompute the ledger state first)
- A rapid test showed that it took ~20 minutes to recompute ~12,000,000 slots
- The blocks in the volatile folder are uncommitted to the immutable state (commit will occur after 2,160 blocks, the security parameter)
Currently, the snapshot policy of the node is to take snapshots at regular time intervals (take block_number < chain_tip - security_parameter)
New snapshot policy selected is to take snapshot at regular block intervals (e.g. every 10,000 blocks): (tip - security_parameter) % block_interval == 0
We will need some support from the consensus team to validate the new snapshot policy

In-memory LedgerDB works like a sliding window of k ledger states so that it can be rolled-back at most k blocks in the past.

Snapshots written is the oldest (aka. anchor) state of a given in-mem ledger state
Snapshotting on epoch boundary might prove difficult because it's not an easy information to compute in the snapshot policy
We might want to produce snapshots at some fixed block number

Mithril Galois discussion

Talked about new API for monitors which allow to express properties in a more succinct and legible way
everywhere combinator is reminiscnet of cooked-validators approach, but that's perhaps only by analogy?
All in all, everything that improves the expressiveness of properties goes in the right direction

what's next

We want to define a first or a handful of interesting properties and test them
We need to make sure the formalism makes sense even in the context of drastically simplified network layer whereby mithril signers would communicate with a single "certifier" or through on-chain transactions posting
at an abstract level, the information a node sends and receives from the certifier or the chain could be modelled as broadcast channel because we expect all nodes to see the same information. Then we can model and test interesting behaviour where signers have different world views

2022-03-24

Mithril Session

Discussions/thoughts about MVP and +:

In order to simplify the registration process, each signer must (re)register its keys at epoch n-1 in order to be able to sign at epoch n
The aggregator server could later keep track of the operations taking place and thus be able to produce reports & statistics
How can we verify the genesis state? The genesis state (first certificate in chain) must be signed by a genesis key. It will be verified by the associated public key delivered on the Cardano node (TBV)
Signers could store their single signatures on-chain
Aggregator could store the protocol parameters on-chain
Certificates (or hash of them) could be stored on-chain
Verification could use the on-chain storage (after restoring it on a node) to proceed to full verification (all the previous certificates chain)
The drawback of using on-chain transactions, is that it has a cost. We could maybe think of a "cashback" mechanism where the aggregator could refund (+reward) these transactions later by using the treasury
Question: should we hard code the protocol parameters or allow them to be modified, and thus include them into the certificate?
Question: do we need to prepare different snapshots for each targeted OS ?

2022-03-22

Mithril Session

We talked about the project organization with Roy:

We need to work on an estimated roadmap for the Mithril project:
- This will help check if we are on track with the goals
- For now let's focus on the MVP, from the end goal: allow people who use Cardano to bootstrap a node fast and securely
- The MVP is expected at +6 months on the real testnet (+3 months on the development testnet)
- Let's focus on priorities instead of deadlines
- We can work as for Hydra with a matrix of Features vs Priorities (must have, should have, could have, nice to have)
- Short term goal: have something working for people to start using (even though it is not full features yet)
- We will focus on the layout of the plan during the next session on Thursday
Agile organization:
- Let's work with agile sprints and ceremonies (duration 2 weeks)
- Ceremonies: 30 min for the demo (together with Galois for the Test Lab), 1h for the backlog grooming and sprint planning (on Thursday)
- First sprint would start on Thursday, March 24
- First demo would be on Thursday, April 07
Target features for the MVP would be:
- Create a snapshot from the Cardano node
- Create a certificate for a snapshot
- Store a certificate and a snapshot (centralized on a server)
- Verify a certificate and download a snapshot
- Bootstrap a Cardano node from a downloaded & verified snapshot

Mithril architecture meeting

Questions:

what is the thing we want to certify? => that's the thing a node would need to bootstrap
- point on the chain + corresponding ledger state at some epoch boundary
- node.db does contain stuff we don't need -> volatile db
- it contains the blocks up to some point + ledger state

Right now node assumes it has all blocks => can't run without previous blocks

In principle, it could "forget" past of the chain

Need to select parts of the node.db:

Node can already take a snapshot whenever we like -> current policy is snapshot 10K blocks, could be adapted Takes in memory state in a non-blocking way and serialises it to disk: contains only the ledger state ledger-state directory in the node.db
ledger state is internal format but block format is not block format is guaranteed to be portable

SPOs could post their signatures on the chain -> follow the chain to produce the

TODO:

look at consensus code (LedgerDB) for the snapshot policy, need to understand what are the data structures stored on need
details on how to make sure what does the node need to set up faster
use Mithril certificate to sign just segment from the previous certificate -> could download/verify segments in parallel
Sign point = (slot nr, hash) -> enough to ensure we have the valid chain
include list of hashes for all epoch files?
Index files can be reconstructed from the epoch files -> needed for fast bootstrap
tradeoff between speed and robustness
sign db immutable files independently from the ledger state

Q: epoch files?

Praos => large chunks (X nr of slots) per file
Need to check alignment on the epoch boundary -> https://hydra.iohk.io/build/13272534/download/1/report.pdf We could use a 10K or whatever limit as long as it includes the epoch boundary -> it's ok to have more blocks worst case = epoch boundary + 10k ?

2022-03-21

First Session with JP, some Q&A:

What's the role of the TestLab?
What about the repository's organisation? It feels a bit messy right now, with lot of inconsistencies here andther
- JP should feel free to reorganise the code to be cleaner

Work from past week:

Understanding how it works by Looking at the go node, it's interesting to see a first stab at the node, even though it's definitely not suited for production and there are quite a few problems
Developed a simple CLI utility in rust-demo to be an easy to use executable for mithril primitives
- currently, it can "run" mithril protocol with several parties, replicating what's done in the integration tests but more configurable
- Could be the basis for a more elaborate tool to be used by SPOs without needing a full blown network

Big question: What's a "signable state" from a cardano-node?

The node's DB might not be identical between nodes even at the same block no.
How do we ensure all potential signers sign the same "thing"?
We need to add stake distribution to the signed "thing" as this is part of the validity of the certificate?

How does bootstrapping would work:

Sign certificate with genesis keys at epoch X => Then parties can produce signature and certificates starting from epoch X + 1
Issuing a new genesis certificate could be a "version boundary". A new genesis certificate is issued with metadata for versions, that makes it clearer when protocol changes, nodes get upgraded...

What if nodes come and go?

Mithril nodes need Stake + Keys to produce signatures => Nodes that come within an epoch boundary need to wait next epoch
Each epoch boundary the parameters are recomputed and redistributed to all participants

2022-03-18

TestLab Demo

Had a first demo integrating the monitor framework with a primitive rust-node: The test lab spins up a number of nodes (actual processes), connects them and intercepts all messages between them, and checks basic properties.
- Tried verifying a simple fault injection during registration process
- We noted it's perfectly fine to "extend" the node's API in order to make testing easier, esp. as there aren't any node right now, as long as we take care of not breaking the "black box" abstraction: The test lab should talk to the node through an API that regular clients could make sense of
We decided to postpone next meeting till 2022-03-28 in order for everyone to have time to regroup, and think about what the next steps would look like

2022-03-10

On-boarding JP on Mithril

Intros
Explanation of Mithril
- details about the certificate validation -> chain de validation
- We should be able to run some program representing the protocol working -> Rust CLI program?
- local registration -> register all stake owners, all possible lottery winners
- signatures need to be broadcast or sent to aggregator -> could be anyone
- Q.: Certificates can be different?
- Test Lab
Q&A
- Incentives? paying for snapshot is valid only for large amount of sync
Plan next days/weeks
- Codebase
- Goal: Opensource repo end of march
- Goal: Write CLI simulating Mithril
- Have a recurring meeting -> 2hours block 3 times a day
- have an @iohk.io address -> Roy Nakakawa

Mithril Product Catch-up

Talking to Exchanges about Mithril
- Vitor talking to them abouyt Scientia
- Discussing problems about node bootstrap (run db-sync, lot of issues), exploring solutions
- Hard to find "friendly" exchanges
How about wallet providers?
Trying to talk to ops to be able to deploy a Mithril node on testnet/mainnet?
DApp developers could make use of a snapshot-enabled wallet
We could have a progressive strategy to increase %age of trust
- Maybe be cautious for mainnet?
Start product development from the consumption side: How would people use the snapshots/certificates, generate the certiciates "manually"
- We don't need a full fledged mithril node and network to be able to produce signatures and certificates
Certificates are signed using specific keys
We need to link the stake keys with the mithril keys
Cold keys -> sign -> KES keys
- Cold keys validates new KES every 2^6 epochs
- KES blocks are put on-chain

Should not be too hard to certify Mithril signing keys https://docs.cardano.org/core-concepts/cardano-keys