Logbook 2023 H1

Jean-Philippe Raynaud edited this page Dec 21, 2023 · 5 revisions

Newer Entries

Older Entries

June 2023

2023-06-30

2023-06-29

2023-06-28

2023-06-27

2023-06-26

2023-06-23

2023-06-22

2023-06-21

2023-06-20

2023-06-19

2023-06-16

2023-06-15

2023-06-14

# Setup demo

## Download source (if needed)
git clone https://github.com/input-output-hk/mithril.git
#or
git clone git@github.com:input-output-hk/mithril.git

## Checkout correct branch
cd mithril/
git checkout 2bc2a383765c9ae98b6fcfa8896d4b1de203b09d

## Build docker images
cd mithril/
docker rmi $(docker images -q) --force
docker rm -vf $(docker ps -a -q)

### Build docker images distribution-1 (Thales era only)
rm -f mithril-aggregator/mithril-aggregator && rm -f mithril-signer/mithril-signer && rm -f mithril-client/mithril-client
make build && cp target/release/mithril-aggregator mithril-aggregator/mithril-aggregator && cp target/release/mithril-signer mithril-signer/mithril-signer && cp target/release/mithril-client mithril-client/mithril-client
docker image rm mithril/mithril-signer-demo mithril/mithril-client-demo mithril/mithril-aggregator-demo --force
docker build -t mithril/mithril-aggregator-demo -f mithril-aggregator/Dockerfile.ci .
docker build -t mithril/mithril-signer-demo -f mithril-signer/Dockerfile.ci .
docker build -t mithril/mithril-client-demo -f mithril-client/Dockerfile.ci .

---

# Demo: Run demo 
cd ../devnet-demo

## Create functions
function stop_devnet {
	./devnet-stop.sh
	docker stop $(docker ps -a -q)
	rm -rf artifacts/node-bft1/mithril/aggregator
	rm -rf artifacts/node-pool1/mithril/signer
	rm -rf artifacts/node-pool2/mithril/signer
	rm -rf artifacts/node-pool3/mithril/signer
}

function start_devnet {
  ./devnet-stop.sh && NODES=cardano SLOT_LENGTH=0.35 EPOCH_LENGTH=120 NUM_POOL_NODES=3 ./devnet-run.sh
}

function monitor_devnet {
  watch -c "NODES=cardano ./devnet-query.sh"
}

function epoch_devnet {
	CARDANO_NODE_SOCKET_PATH=artifacts/node-bft1/ipc/node.sock ./artifacts/cardano-cli query tip --cardano-mode --testnet-magic 42 | jq '.epoch'
}

function containers_list {
  watch -c "docker ps --format '{{.Names}} - {{.Status}}' | sort"
}

function container_up {
  DISTRIBUTION_VERSION=$1 ERA_READER_ADAPTER_PARAMS=$(cat era-markers/config.json) docker-compose -f docker-compose-demo.yaml --profile $2 up --remove-orphans --force-recreate -d --no-build
}

function container_down {
  docker stop $2-$1
}

function container_run {
  DISTRIBUTION_VERSION=$1 ERA_READER_ADAPTER_PARAMS=$(cat era-markers/config.json) docker-compose -f docker-compose-demo.yaml run $2 $3 $4 $5 $6 $7 $8 $9
}

function container_exec {
  docker exec -it $2-$1 $3 $4
}

function container_logs {
  docker logs -f $2-$1 2>/dev/null
}

function era_activate_thales {
	cat > era-markers/markers.json << EOF
[
	{"name": "thales", "epoch": 1}
]
EOF
	cat era-markers/markers.json | jq .
}

## Reset demo if needed
stop_devnet

## Start Cardano network
start_devnet

## Start Mithril network
era_activate_thales
container_up demo mithril-aggregator
container_up demo mithril-signer-1
container_up demo mithril-signer-2
container_up demo mithril-signer-3
container_run demo mithril-aggregator-genesis
containers_list

## Query Aggregator database
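# watch_query_aggregator_db_file is not defined in the functions above, so the query is run directly
watch -c "sqlite3 -table ./artifacts/node-bft1/mithril/aggregator/stores/aggregator.sqlite3 < watch.sql"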

## Client
## Config
AGGREGATOR_ENDPOINT=http://localhost:8080/aggregator

### Commands Help
container_run demo mithril-client --help
container_run demo mithril-client snapshot --help
container_run demo mithril-client mithril-stake-distribution --help

## Snapshot Command
SNAPSHOT_DIGEST=$(curl -sL $AGGREGATOR_ENDPOINT/artifact/snapshots | jq -r '.[0].digest')
container_run demo mithril-client snapshot list
container_run demo mithril-client snapshot list --json > _ && cat _ | jq .
container_run demo mithril-client snapshot show $SNAPSHOT_DIGEST
container_run demo mithril-client snapshot show $SNAPSHOT_DIGEST --json > _ && cat _ | jq .
rm -rf ./download/db && container_run demo mithril-client snapshot download $SNAPSHOT_DIGEST --download-dir=/data/download
tree ./download/db

## Mithril Stake Distribution Command
MSD_HASH=$(curl -sL $AGGREGATOR_ENDPOINT/artifact/mithril-stake-distributions | jq -r '.[0].hash')
container_run demo mithril-client mithril-stake-distribution list
container_run demo mithril-client mithril-stake-distribution list --json > _ && cat _ | jq .
container_run demo mithril-client mithril-stake-distribution download $MSD_HASH --download-dir=/data/download
cat download/mithril_stake_distribution-$MSD_HASH.json | jq .

2023-06-13

  • We have worked on the following issues:

  • We have also started brainstorming on the epic Benchmark performances of Mithril Aggregator #904:

    • We need to run aggregator on one machine and the signers on another
    • We probably need more traces
    • How can we simulate 3K signers?
      • We could spoof real SPOs from a Cardano network, but it would create bias with the signer registration and would be hard to implement
      • A better idea is to use a fake Cardano cli:
        • It would behave deterministically on multiple machines (epoch, and immutable files could be based on time)
        • We would create in advance the cryptographic materials needed to register the signers
        • The stake distribution would be computed from these pre-generated signers
      • We can test:
        • Real signers: memory usage would be higher but the network would behave as in real life (with faster epoch)
        • Simulated signers: another program that would simulate the calls to the aggregator in a less realistic way, but with finer control over some calls
        • We will probably experiment with both (depending on the needs/time)
        • A nice to have feature is to increase the number of signers during the test (in order to gradually reach the limit of the system)
    • What do we measure?
      • Is the service fulfilled?
        • Check if the certificates/artifacts are created at expected pace
        • Is the aggregator working properly when clients are retrieving artifacts/certificates
      • Spot bottlenecks?
        • Monitor aggregator physical resources (load, memory, i/o)
        • Monitor real curves under stress vs expected nominal curves
        • Keep logs and resources records out in a centralized repository for further analysis
        • Explore the tools that we will probably have to develop in order to analyze and extract information along the way
    • Other questions:
      • How to automate the stress tests (fully or partially)?
      • When to run the stress tests?
      • How to keep the tests up to date with newly developed features of the networks?
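
The fake Cardano cli idea above relies on deriving the epoch and the immutable file number from the wall clock, so that every machine computes the same values. A minimal sketch of that derivation follows. The function names (`fake_epoch`, `fake_immutable_file_number`), the variables and the durations are all hypothetical and only illustrate the principle; nothing here is part of the Mithril code base.

```shell
#!/usr/bin/env bash
# Hypothetical parameters of the fake chain clock (illustrative values only).
GENESIS_TIME=${GENESIS_TIME:-0}                # start of the fake network, in Unix seconds
EPOCH_DURATION=${EPOCH_DURATION:-120}          # seconds per fake epoch
IMMUTABLE_DURATION=${IMMUTABLE_DURATION:-30}   # seconds per fake immutable file

# Print the fake epoch for a given Unix timestamp (defaults to now).
fake_epoch() {
  local now=${1:-$(date +%s)}
  echo $(( (now - GENESIS_TIME) / EPOCH_DURATION ))
}

# Print the fake immutable file number for a given Unix timestamp (defaults to now).
fake_immutable_file_number() {
  local now=${1:-$(date +%s)}
  echo $(( (now - GENESIS_TIME) / IMMUTABLE_DURATION ))
}
```

Because both values depend only on the clock and on shared parameters, signers running on different machines would agree on them as long as their clocks are synchronized.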

2023-06-12

2023-06-09

2023-06-08

2023-06-07

2023-06-06

2023-06-05

  • We have reviewed and merged the PR Upgrade Rust 1.70.0 #959 that closes the issue CI tests fail with Rust 1.70.0 #958

  • We have created and groomed the following issues:

  • During our team session, we talked about:

    • Stake Distribution new computation and performances: we asked if the fix on the performance will be released with the new Cardano 8.1.0 and this will be the case 🎉 (see https://github.com/input-output-hk/ouroboros-consensus/pull/92#issuecomment-1576848835)
    • New P2P configuration seems to not be completely deployed on the mainnet: this requires a different configuration for mainnet vs preview/preprod. We are waiting for a confirmation from the Cardano team
    • Rolling update strategy: we discussed how we could test breaking changes with a rolling update strategy early in the testing process. An idea is to use an on-demand test in the GitHub Actions that would run an end to end test with multiple signers of 2 different distributions and would make sure that the aggregator is able to produce certificates/artifacts (with specific protocol parameters to make sure all signers contribute in order to produce a valid multi signature). Another option that we explored is a blue-green strategy, but given the epoch duration it does not look very efficient for rapidly testing upgrades
    • Hosting of mainnet Aggregator: the best option seems to be to keep ops on the dev side at first, with the same SLO/SLA as for the test networks, and to work hand in hand with ops to implement best practices for monitoring/alerting and prepare for a higher SLO
    • We also talked about testing the implementation of a client compiled in WASM that would run in the browser (but we need to test that this is possible with the current cryptographic backend)

2023-06-02

2023-06-01

May 2023

2023-05-31

2023-05-30

2023-05-26

2023-05-25

  • We have noticed some problems on the 2321.0-pre pre-release:

    • A breaking change has been introduced without creating a new era (bump of Open API version): this has prevented the signers running the previous version from communicating with the aggregator on the pre-release-preview network
    • To fix the problem, the 2321.1-pre pre-release has been created quickly, and the signers running the previous versions were back up shortly after its deployment
    • Another issue has been identified: the signature of a new type should also have been included in a new era. Given the current protocol parameters of the network, until 3 signers were running the new version, the open message certifying the Mithril Stake Distribution at the beginning of each epoch was stuck and blocked the queue of open messages. A manual intervention enabled the network to resume certifying.
    • We have decided to postpone the release 2321 until the service is operating properly
    • We have created the issue Switch to Pythagoras era #941 and created the associated PR Create new Pythagoras era #942
    • Also it appears that from time to time, the open message queue is stuck and that the new immutable files are not detected properly. A restart of the aggregator fixes the problem. We are investigating this problem and we will also create an issue.
  • We have also created a PR to update the Mithril logo on our websites Update Mithril logo #940
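
The incident above comes down to which era is active at a given epoch. The sketch below shows one way to resolve the active era from a list of markers. It assumes the semantics suggested by the era marker files in the demo scripts on this page: a marker with an epoch is active from that epoch on, and a marker with a `null` epoch is only announced. The function name `active_era` and the `name:epoch` argument format are hypothetical, and this is not the Mithril era reader.

```shell
#!/usr/bin/env bash
# Print the active era for `current_epoch`, given markers as "name:epoch"
# arguments listed in activation order.
# Assumption: a marker with an empty or "null" epoch is announced but not active.
active_era() {
  local current_epoch=$1; shift
  local marker name epoch active=""
  for marker in "$@"; do
    name=${marker%%:*}
    epoch=${marker#*:}
    if [ -n "$epoch" ] && [ "$epoch" != "null" ] && [ "$epoch" -le "$current_epoch" ]; then
      active=$name
    fi
  done
  # Fail when no marker is active yet.
  [ -n "$active" ] && echo "$active"
}
```

For example, `active_era 5 thales:1 pythagoras:null` prints `thales`, since the second era is only announced.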

2023-05-24

2023-05-23

2023-05-22

2023-05-17

2023-05-16

  1. Introduction
  2. Explanation of the framework for signing multiple types of data
  3. Showcase of a Mithril network signing 2 types of data on a devnet
  4. Next steps
  5. Q&A
  6. Conclusion

Mithril

# Demo: Sign multiple types of data

---

# Setup demo

## Download source (if needed)
git clone https://github.com/input-output-hk/mithril.git
#or
git clone git@github.com:input-output-hk/mithril.git

## Checkout correct branch
cd mithril/
git switch jpraynaud/920-upgrade-cardano-node-8.0.0 
git cherry-pick 180f137b4ad5be6fdaaaebbfb7dc09049e1f24e9
cargo build --release

## Build docker images
cd mithril/
docker rmi $(docker images -q) --force
docker rm -vf $(docker ps -a -q)

### Build docker images distribution-1 (Thales era only)
rm -f mithril-aggregator/mithril-aggregator && rm -f mithril-signer/mithril-signer && rm -f mithril-client/mithril-client
make build && cp target/release/mithril-aggregator mithril-aggregator/mithril-aggregator && cp target/release/mithril-signer mithril-signer/mithril-signer && cp target/release/mithril-client mithril-client/mithril-client
docker image rm mithril/mithril-signer-demo mithril/mithril-client-demo mithril/mithril-aggregator-demo --force
docker build -t mithril/mithril-aggregator-demo -f mithril-aggregator/Dockerfile.ci .
docker build -t mithril/mithril-signer-demo -f mithril-signer/Dockerfile.ci .
docker build -t mithril/mithril-client-demo -f mithril-client/Dockerfile.ci .

---

# Demo: Run demo 
cd ../devnet-demo

## Create functions
function stop_devnet {
	./devnet-stop.sh
	docker stop $(docker ps -a -q)
	rm -rf artifacts/node-bft1/mithril/aggregator
	rm -rf artifacts/node-pool1/mithril/signer
	rm -rf artifacts/node-pool2/mithril/signer
	rm -rf artifacts/node-pool3/mithril/signer
}

function start_devnet {
  ./devnet-stop.sh && NODES=cardano SLOT_LENGTH=0.35 EPOCH_LENGTH=120 NUM_POOL_NODES=3 ./devnet-run.sh
}

function monitor_devnet {
  watch -c "NODES=cardano ./devnet-query.sh"
}

function epoch_devnet {
	CARDANO_NODE_SOCKET_PATH=artifacts/node-bft1/ipc/node.sock ./artifacts/cardano-cli query tip --cardano-mode --testnet-magic 42 | jq '.epoch'
}

function containers_list {
  watch -c "docker ps --format '{{.Names}} - {{.Status}}' | sort"
}

function container_up {
  DISTRIBUTION_VERSION=$1 ERA_READER_ADAPTER_PARAMS=$(cat era-markers/config.json) docker-compose -f docker-compose-demo.yaml --profile $2 up --remove-orphans --force-recreate -d --no-build
}

function container_down {
  docker stop $2-$1
}

function container_run {
  DISTRIBUTION_VERSION=$1 ERA_READER_ADAPTER_PARAMS=$(cat era-markers/config.json) docker-compose -f docker-compose-demo.yaml run $2 $3 $4
}

function container_exec {
  docker exec -it $2-$1 $3 $4
}

function container_logs {
  docker logs -f $2-$1 2>/dev/null
}

function monitor_versions {
  watch -c "./sqlite3 -table -batch artifacts/node-bft1/mithril/aggregator/stores/monitoring.sqlite3 < stake_signer_version.sql | head -n 50"
}

function era_activate_thales {
	cat > era-markers/markers.json << EOF
[
	{"name": "thales", "epoch": 1}
]
EOF
	cat era-markers/markers.json | jq .
}

function era_announce_pythagoras {
	cat > era-markers/markers.json << EOF
[
	{"name": "thales", "epoch": 1},
	{"name": "pythagoras", "epoch": null}
]
EOF
	cat era-markers/markers.json | jq .
}

function era_activate_pythagoras {
	EPOCH_ERA_SWITCH=$(( $(epoch_devnet) + 1))
	cat > era-markers/markers.json << EOF
[
	{"name": "thales", "epoch": 1},
	{"name": "pythagoras", "epoch": $EPOCH_ERA_SWITCH}
]
EOF
	cat era-markers/markers.json | jq .
}

function era_remove_thales {
	EPOCH_ERA_SWITCH=$(( $(epoch_devnet) - 1))
	cat > era-markers/markers.json << EOF
[
	{"name": "pythagoras", "epoch": $EPOCH_ERA_SWITCH}
]
EOF
	cat era-markers/markers.json | jq .
}

function query_aggregator_db {
	sqlite3 ./artifacts/node-bft1/mithril/aggregator/stores/aggregator.sqlite3 $1
}

function query_aggregator_db_file {
	sqlite3 -table ./artifacts/node-bft1/mithril/aggregator/stores/aggregator.sqlite3 < $1
}

function watch_query_aggregator_db_file {
	watch -c "sqlite3 -table ./artifacts/node-bft1/mithril/aggregator/stores/aggregator.sqlite3 < $1"
}

function list_certificates {
	curl -s http://localhost:8080/aggregator/certificates | jq '.[0:5]'
}

function get_certificate {
	curl -s http://localhost:8080/aggregator/certificate/$1 | jq .
}

function list_artifact_cardano_immutable_files_full_snapshots {
	curl -s http://localhost:8080/aggregator/artifact/snapshots | jq '.[0:5]'
}

function get_artifact_cardano_immutable_files_full_snapshot {
	curl -s http://localhost:8080/aggregator/artifact/snapshot/$1 | jq .
}

function list_artifact_mithril_stake_distributions {
	curl -s http://localhost:8080/aggregator/artifact/mithril-stake-distributions | jq '.[0:5]'
}

function get_artifact_mithril_stake_distribution {
	curl -s http://localhost:8080/aggregator/artifact/mithril-stake-distribution/$1 | jq .
}

## Reset demo if needed
stop_devnet

## Start Cardano network
start_devnet

## Start Mithril network
era_activate_thales
container_up demo mithril-aggregator
container_up demo mithril-signer-1
container_up demo mithril-signer-2
container_up demo mithril-signer-3
container_run demo mithril-aggregator-genesis
containers_list
container_logs demo mithril-signer-1 

## Query Aggregator database
query_aggregator_db ".tables"
query_aggregator_db ".schema"
query_aggregator_db_file entity_types.sql
watch_query_aggregator_db_file watch.sql

## List Signed Artifacts / Mithril Stake Distribution
list_artifact_mithril_stake_distributions
LAST_MITHRIL_STAKE_DISTRIBUTION=$(list_artifact_mithril_stake_distributions | jq -r '.[0]')
get_artifact_mithril_stake_distribution  $(echo $LAST_MITHRIL_STAKE_DISTRIBUTION | jq -r '.hash')
get_certificate $(echo $LAST_MITHRIL_STAKE_DISTRIBUTION | jq -r '.certificate_hash')

## List Signed Artifacts / Cardano Immutable Files Full Snapshot
list_artifact_cardano_immutable_files_full_snapshots
LAST_CARDANO_IMMUTABLE_FILES_FULL_SNAPSHOT=$(list_artifact_cardano_immutable_files_full_snapshots | jq -r '.[0]')
get_artifact_cardano_immutable_files_full_snapshot $(echo $LAST_CARDANO_IMMUTABLE_FILES_FULL_SNAPSHOT | jq -r '.digest')
get_certificate $(echo $LAST_CARDANO_IMMUTABLE_FILES_FULL_SNAPSHOT | jq -r '.certificate_hash')

2023-05-15

2023-05-12

2023-05-11

2023-05-10

2023-05-09

2023-05-05

2023-05-04

2023-05-03

2023-05-02

April 2023

2023-04-28

2023-04-27

2023-04-26

2023-04-25

2023-04-24

  • The bug Verification key discrepancy between signer and aggregator #873 occurred during the weekend and created a gap in the certificate chain of the testing-preview network. We had to re-genesis the network and expect it to resume signing tomorrow

  • We have mainly worked on fixing the tests and the bugs in the PR Certifier service #866 of the issue Implement Certifier service in aggregator #850 and on making the network sign certificates correctly. We have made good progress as the end to end tests are working: so far only a few unit/integration tests need to be fixed or are flaky, and some cleanup will be done tomorrow. We will be able to merge the PR shortly

  • During our team session, we have discussed the following topics:

    • How Mithril can help Daedalus Turbo:
      • Using another compression algorithm for the archive: good idea, we will try to implement it as described in this issue Use zstandard compression for snapshot archives #876
      • Incremental snapshots: this is already part of our road-map. The idea is to certify all immutable files independently and to provide range restoration
      • Using BitTorrent with Mithril: this does not sound relevant as 1/ we would build a second P2P network to synchronize blocks (which the Cardano network already does), and 2/ users downloading an archive to bootstrap a node do not want to share their node/wallet data on such a network (which would considerably limit the efficiency)
    • Signer deployment model: we will check which team we need to get in touch with to validate our design
    • Full node verifier: this is a Mithril verifier running inside a full node (i.e. which is aware of the stake distribution). It would be able to run a lighter verification process
    • Smart contracts: we have decided to design and PoC a smart contract for signer registration. This will be the subject of our next team session

2023-04-21

2023-04-20

  • We have worked on the demo path:
  1. Introduction
  2. Architecture design for signing multiple types of data
  3. Explanation of the new relational database design of the aggregator
  4. Showcase of the new database of the aggregator in the devnet
  5. Explanation of the new services design of the aggregator
  6. Next steps
  7. Q&A
  8. Conclusion

2023-04-19

2023-04-18

2023-04-17

2023-04-14

  • We have released the new distribution 2315.0 🎉

  • Additionally, we have kept pairing on these issues today, and we will continue next week:

    • Implement Certifier service in aggregator #850: The trait that the Certifier service must implement is defined. We will start working on its implementation next week. We have also modified the design of the state machine of the aggregator so that it can support the new Certifier service and make dynamic calls to the SignableBuilder and ArtifactableBuilder given the signed entity type.
    • Define the interface of the generic entity service #847: We have almost completed the definition of the interfaces and we will complete the schema next week
    • We have decided to create a new Signed Entity Service whose responsibilities will be:
      • Call the artifacts builder adapters
      • Create signed entities from artifacts
      • Store signed entities

2023-04-13

2023-04-12

2023-04-11

2023-04-06

2023-04-05

2023-04-04

2023-04-03

  • We have reviewed and paired on the current issues:

  • We have started grooming the epic Design and implement generic actors #780 and we will continue in the following days

  • During our team session we talked about:

    • The problem that an SPO encountered last week and that prevented them from signing on the pre-release-preview network because the immutable files produced by the block producer were corrupted
    • This led us to dive deeper into the mechanism of committing blocks to the immutable files and of the security parameter k of the Cardano protocol
    • We talked about the possibility:
      • To sign a bigger part of the latest immutable files that are currently being built
      • To deliver a snapshot that does not embed the ledger state and the latest immutable files
    • Overall, the Cardano node has a security mechanism that avoids using corrupted files so this should not be a problem to embed the latest immutable files
    • We discussed the dev SPO that was created on the preview Cardano network and that is referenced as root peer: we will try to retire the SPO
    • Also, we talked about deploying to the mainnet and the a priori unnecessary SLA to put in place during the ramp up period

March 2023

2023-03-31

  • We have released the 2313.0 distribution on the release-preprod network 🚀. The documentation has been rotated as well with the merge of the PR Update current documentation #842

  • We have fixed a panic on the aggregator node of the testing-preview network because a function was not implemented yet. The problem has been fixed with the merge of the PR Fix Epoch Setting store panic #841

  • Additionally, we have kept working on the implementation of the issues:

  • Finally, we have proceeded to the rotation of the KES keys for some signers of the testing-preview and release-preprod networks

2023-03-30

2023-03-29

2023-03-28

  • Today, we have kept working on the issues:

  • We have also investigated a weird behavior on an SPO signer node which was unable to sign the messages correctly although its block producer Cardano node was working properly and producing new blocks:

    • The error message received is core error: 'A provided signature is invalid'
    • We noticed that one set of immutable files 02705 was different between the block producer and:
      • The associated relay on the SPO infrastructure
      • The block producers and relays of the testing-preview and pre-release-preview Cardano nodes that we operate
    • After restarting its block producer, it appears that the node found out that there was a discrepancy with the immutable files and fixed it
    • We are waiting for feedback from the SPO to check that its signer node is now able to sign snapshots
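
The comparison described above (one immutable file number differing between two nodes) can be sketched as follows. This is a hypothetical helper, not a Mithril tool: the function names are made up, and it assumes the usual Cardano node database layout where each immutable file number is stored as `.chunk`, `.primary` and `.secondary` files under `immutable/`.

```shell
#!/usr/bin/env bash
# Print a SHA-256 digest covering the chunk, primary and secondary files
# of one immutable file number in a Cardano node database directory.
immutable_digest() {
  local db_dir=$1 number=$2 ext file
  for ext in chunk primary secondary; do
    file="$db_dir/immutable/$number.$ext"
    [ -f "$file" ] || { echo "missing $file" >&2; return 1; }
  done
  cat "$db_dir/immutable/$number".{chunk,primary,secondary} | sha256sum | cut -d' ' -f1
}

# Succeed only if both databases hold identical files for that immutable file number.
# Returns 2 when one of the files is missing.
same_immutable_files() {
  local left right
  left=$(immutable_digest "$1" "$3") || return 2
  right=$(immutable_digest "$2" "$3") || return 2
  [ "$left" = "$right" ]
}
```

A call such as `same_immutable_files /path/to/producer/db /path/to/relay/db 02705` (paths are placeholders) would then tell whether the two nodes agree on that immutable file number.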

2023-03-27

  • We have worked on the following issues:

  • A bug has been reported regarding static builds: Debian package does not install cleanly on older ubuntu versions #834. We will try to fix it shortly

  • During our team sessions we discussed:

    • The PR Add flake.nix #811:
      • This will help us build static binaries
      • It is normal that the build does not work before it is merged
      • We lack some documentation: it will be added to the PR and then some explanation on how to use the nix shell will be added on the documentation website
      • We keep it for Linux and macOS at the moment. We will work on Windows adaptation later if it makes sense
      • We will also implement the nix shell in the CI to benefit from the static builds
    • We have also talked about the new use case which will certify the Cardano stake distribution:
      • We need to make the difference between the Mithril stake distribution (that needs to be signed prior to being used and that represents only the SPOs running signer nodes) and the Cardano stake distribution (that lists all the SPOs)
      • We will be able to sign the most recent version of the stake distribution (end of previous epoch)
      • We will provide a JSON map of the stake distribution (pool id <-> stake in lovelace) and sign a Merkle tree representation of it:
        • We need to see which pool id needs to be used (hash or bech32 versions) in order to make sense for the widest usage (or maybe provide both)
        • We will provide an interface so that some internal developers can start using it and build some PoC
        • Some use cases could be a very light node that verifies incoming blocks with certified stake distribution or network relays

2023-03-24

2023-03-23

2023-03-22

2023-03-21

2023-03-20

  • We have reviewed the issues of the sprint and created the next issues:

  • We have noticed that an epoch gap occurred on the pre-release-preview network this weekend:

    • The machine went OOM which led the Cardano node used by the aggregator to be stuck in epoch 144
    • We have manually upgraded the machine (which should have happened automatically with the next pre-release)
    • We have prepared the re-genesis of the network, which is scheduled for tomorrow, with certificate production expected to resume on Wednesday (issue Re-genesis pre-release-preview #818)
    • The release-preprod network which is running the same distribution is not impacted and is running as expected
    • We have also noticed that the participation rate dropped suddenly last week. This is probably due to the expiration of the KES keys on the signer nodes. We have created the issue Panic signer when KES keys expired #820 in order to panic the signer node with a Critical error when that happens
  • Additionally, we have paired on the refactoring of the multi-signer of the aggregator by implementing the Stake Distribution service to refactor the stake distribution usage in it. This is a bit complicated as it breaks a lot of things. We will continue tomorrow

  • Finally, we have completed the PR Handle API version with Era Switch #812, which will be merged shortly and will close the issue Handle API Version with Era switch #727. We have found a solution to handle the era switch (when a breaking change in the API occurs) by implementing a retry mechanism (with version update)
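
The retry mechanism with version update mentioned above can be pictured with the following sketch. It is not the implementation of the PR: `request_with_version_retry` and the `send_request` callback are hypothetical names, and the sketch only assumes that a request is rejected when it carries an API version the aggregator does not accept.

```shell
#!/usr/bin/env bash
# Try a request with each API version known to the caller, in the given order,
# until one is accepted. Prints the accepted version.
# `send_request` is a hypothetical callback: it receives the version to use and
# must return 0 when the aggregator accepts the request.
request_with_version_retry() {
  local send_request=$1; shift
  local version
  for version in "$@"; do
    if "$send_request" "$version"; then
      echo "$version"
      return 0
    fi
  done
  # No known version was accepted.
  return 1
}
```

A caller would pass its preferred version first, so that the fallback is only used around an era switch.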

2023-03-17

  • We have reviewed the PR Handle API version with Era Switch #812:

    • The PR is almost ready to merge and close issue Handle API Version with Era switch #727
    • We have found a way to implement a route that receives different request bodies before/after era switch (see comments at https://github.com/input-output-hk/mithril/pull/812#issue-1627961199)
    • We have found a way to handle an era switch with breaking changes of the API version without having access to the on chain era reader: for this we will implement a mechanism that checks which API version is run by the aggregator (when multiple are available) and uses this version if it is available. We will implement it shortly
  • We have also paired on the issue Create a Stake Distribution service #799 with the PR stake pool service #800. We have completed the implementation of the stake pool service and we will use it in the state machine and the multi-signer

  • We have noticed that very few signers are registered on the pre-release-preview network since the problem that we had on the machine earlier this week. We will investigate this issue further with the SPOs next week if it keeps happening

  • We have also completed the issue Qualify new stake distribution computation #810

  • Finally, we have created the genesis certificate for the testing-mainnet network and we expect to create the first certificates at the next epoch 401

2023-03-16

2023-03-15

2023-03-14

  • We have noticed that the Cardano nodes running on the preview network suddenly were not responding through the socket. The result is that the service has been very erratic on the testing-preview and pre-release-preview networks today. We hope that a gap in the certificate chain will not occur. The release-preprod is working as expected. We will keep monitoring the networks tomorrow. It appears that the way we compute the stake distribution is very problematic:

    • We need to compute the stake distribution only once per epoch (this will be addressed in the issue Create a Stake Distribution service #799)
    • The computation of the stake distribution (without the unreleased optimization of the cardano-cli) is taking ~2 hours and is very compute intensive
  • We have reviewed and merged the issue Migrate/adapt stake_pool tables #787. The migration worked as expected on the testing-preview network 🎉 Following this issue we have created the issue Create a Stake Distribution service #799

  • We have worked on the issue Add Docker image in Mithril Client multi-platform test #794 and a PR is ready to be merged Update Mithril Client multi-platform test workflow #793. We have noticed that the usage of the cache on the digest computation in the client brings many problems with rights on the creation of the cache directory, while it does not bring a lot of added value. We will probably remove it in the near future. In the meantime we have implemented the Docker client test with the --disable-digests-cache option to overcome this problem on the runners in the CI
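
Computing the stake distribution only once per epoch, as noted above, amounts to caching the result under the epoch number. A minimal sketch, assuming a file based cache: the function name, the cache directory variable and the file naming are hypothetical, and this is not the Stake Distribution service planned in issue #799.

```shell
#!/usr/bin/env bash
# Directory holding one cached result per epoch (hypothetical location).
STAKE_DISTRIBUTION_CACHE_DIR=${STAKE_DISTRIBUTION_CACHE_DIR:-$(mktemp -d)}

# Print the stake distribution for an epoch. The expensive command, passed as
# the remaining arguments, only runs if no cached result exists for that epoch.
cached_stake_distribution() {
  local epoch=$1; shift
  local cache_file="$STAKE_DISTRIBUTION_CACHE_DIR/stake-distribution-$epoch.json"
  if [ ! -s "$cache_file" ]; then
    # Write to a temporary file first so a failed run never leaves a partial cache entry.
    "$@" > "$cache_file.tmp" && mv "$cache_file.tmp" "$cache_file" || return 1
  fi
  cat "$cache_file"
}
```

With a computation that takes hours, every call after the first one in a given epoch then costs a single file read.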

2023-03-13

  • We have reviewed the work on the issue Handle API Version with Era switch #727: in order to handle multiple versions of the Open API, we need to create a build script that pre-compiles a Rust file responsible for gathering the API versions available in each spec file. This is the first foundation used to handle the API version depending on the era. This issue is a bit more difficult than expected and we will keep working on it tomorrow

  • We have made a review and paired on the issue Migrate/adapt stake_pool tables #787. We have noticed some weird behavior on the store migration and we will investigate that matter. In the meantime we have completed the migration script for the stake store. During this pairing session we have identified that:

  • During our team session, we have discussed the following subjects:

    • Advertise the usage of Mithril for internal and/or external developers to restore Cardano nodes on preview and preprod
    • This will be probably done during the Dapp and SPO calls
    • Put in place some API usage statistics (basic data regarding API routes calls + snapshots downloads)
    • New computation of the Stake Distribution is OK and execution delay is acceptable for mainnet
    • We need to work on specifying the Signer node deployment on the Cardano SPO infrastructure for mainnet
    • We also need to make sure there is no security impact: the signer node should not be able to write in the Cardano database folder, and access to the KES keys must be considered as well

2023-03-10

2023-03-09

2023-03-08

  • Following the test that we ran on the unreleased cardano-cli 1.36.0, which does not have the 128 bytes size limitation for the bytes fields of a datum submitted in a transaction, we have decided to implement a temporary mechanism that automatically chunks the bytes fields generated by the era generate-tx-datum command of the aggregator cli, and that reads all the bytes fields available in the datum from the Cardano chain era reader adapter. Once the new cardano-cli is released, we will remove the chunking mechanism. Full compatibility is nevertheless guaranteed, which is a mandatory requirement for rolling out the feature. Also, we have decided to keep the JSON serialization format instead of switching to CBOR. The PR Fix Datum generation for era markers #788 has been reviewed and merged. It fixes the issue Enhance Datum generation for Era Markers #786. The era markers have then been deployed to the era addresses of the testing-preview, pre-release-preview and release-preprod networks

  • A new distribution pre-release 2310.0-prerelease that activates the Era Switch on the networks has been created. Also, a dev blog post Mithril Era Switch has been released. We expect to release the distribution by the end of the week or early next week

  • Additionally, we have reviewed the PR in progress for migrating the stake distribution of the aggregator add stake_pool provider #789 that should be merged shortly
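
As an illustration of the temporary chunking mechanism mentioned in the first item of this entry, here is a minimal shell sketch. It is not the aggregator implementation: the function names chunk_hex and join_hex are hypothetical, the payload is assumed to be hex encoded (2 characters per byte), and the default chunk size of 64 bytes is an assumption borrowed from the 2023-03-06 entry.

```shell
# Hypothetical helpers, not taken from the Mithril code base.

# Split a hex-encoded payload ($1) into chunks of at most $2 bytes
# (assumed default: 64 bytes). Prints one chunk per line.
chunk_hex() {
  printf '%s\n' "$1" | fold -w "$(( ${2:-64} * 2 ))"
}

# Concatenate the chunks read from stdin (one per line), in order,
# to rebuild the original payload.
join_hex() {
  tr -d '\n'
}
```

On the writing side, each line printed by chunk_hex would become one bytes field of the datum. On the reading side, joining all bytes fields in order gives back the payload, including when the payload fits in a single field.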

2023-03-07

2023-03-06

  • We have reviewed the following PRs:

  • During our team session, we have talked about the following topics:

    • A possible new use case for Mithril could be to certify the stake distribution only (which could be helpful for side chains). This is perfectly possible and will be the case when the signing of generic data is implemented. We will need to add a new route that helps retrieve the full stake distribution and a client command to verify the associated certificates
    • Some other teams also need to use a pub/sub mechanism in a peer to peer context, and as the Cardano network layer is probably not the best option, libp2p seems to be a good candidate. There are still questions regarding the vulnerability to some attacks
    • We have talked about the security audit for Mithril that should probably be done only on the cryptographic primitives. We expect a formal approval of this strategy soon
    • In order to get ready for the release to the mainnet, we will prepare an Operational Plan for first release, +6 months and +1 year perspective. This will help us define the scope of the resources that we need to have allocated to the project from the release to the end of the ramp up phase
    • Also, we have investigated the problem that was raised last week regarding the limits of the bytes fields size in the transaction datum. A possible long term fix is to implement an encoding in CBOR indefinite bytes (chunks of 64 bytes). We have created an issue for this subject Enhance Datum generation for Era Markers #786

2023-03-03

  • We have mainly worked on the issue Run a mainnet test Mithril network #777:
    • Created a PR [Fix test Docker image workflow #781](https://github.com/input-output-hk/mithril/pull/781) that fixes the Docker test images workflow. It should be merged shortly and will allow us to build test Docker images that can be used without having to merge branches into main
    • Created a PR 🔥 Test mainnet setup #782 that must not be merged. This PR includes a test implementation of the network that runs on the unreleased Cardano node/cli 1.36.0 and that makes use of the new query stake-snapshot --all-stake-pools command to compute the stake distribution efficiently. The artifacts built from this PR will be used to create test Docker images that will be run on a test mainnet Mithril network
    • Tested the new query stake-snapshot --all-stake-pools command on the Cardano mainnet: it took 1h05min to compute the stake distribution instead of multiple days 🎉

2023-03-02

  • We have worked on the following issues:

    • Add context to errors #665: We have paired on fixing a bug that occurred when multiple tests are created in an integration test (because of a shared logger). The PR handling errors better #776 is almost complete and should be merged shortly
    • Deploy Era Behavior Switch #752: We encountered a problem with the size of fields inside datum files, which are limited to 128 bytes. The era markers that we originally prepared could not fit (as the signature is already 128 bytes long). We made a fix and stored the era markers in one bytes field and the signature in another. It worked, but we will try to challenge the implementation shortly. Also, the PR Deploy era reader on chain #775 is under review and should be merged very shortly. Once this is the case, we will create a new distribution and we will coordinate with the SPOs to update their signer nodes configuration
  • We have also continued grooming the signing of generic data in Mithril networks. We have created the following epic issues:

2023-03-01

  • We have worked on the following topics:

  • We have also started grooming the new feature "Signing generic data":

    • Our strategy is to roll out the new aggregator stores in 3 phases:
      1. Migrate/adapt stake_pool, signer, epoch_settings tables
      2. Migrate/adapt certificate, signed_entity and signed_entity_type
      3. Migrate/adapt open_message, signer_registration, single_signature
    • This progressive rollout will attempt to have the minimum impact to the existing code at first:
      • This will be a strong foundation on which we will build the new usecases
      • At first, we will process open messages sequentially, but the system will evolve to a setup with parallel runtimes (one for each type of message)
    • We will determine what interfaces need to be implemented when signing new types of data (like how to sign a message and from which beacon it can be computed deterministically). With this design, the mechanism of signing/verifying the message will not change with the type of message
    • We will also determine what optional implementation needs to be done (e.g. adding routes to the aggregator for retrieving proofs, and new features to handle them in the client)
    • The aggregator REST API will also be modified in order to host new types of data (e.g. list the certificates by type of data)
    • The explorer will also need to be adjusted to handle this new REST API
    • The certificate chain also needs to be slightly modified in order to handle these new types of data
    • We will start with the following types of data:
      • Sign the stake distribution once per epoch (as the first certificate of the epoch)
      • Then, sign the immutable snapshots when new immutable files are produced
    • We will keep working on this grooming tomorrow and we will then create the associated epics and issues

February 2023

2023-02-28

  • We have kept pairing on the issue Add context to errors #665:

    • We have managed to gracefully kill the aggregator when a critical error is encountered from the runtime
    • We have kept working on bringing business errors instead of technical errors
    • The current PR handling errors better #776 should be merged tomorrow
  • We have created the issue Run a mainnet test Mithril network #777:

    • A first PR has been created Prepare run mainnet Mithril test network #778 that leverages the infrastructure and the CI to run custom versions of Mithril and Cardano on a network. It should be merged tomorrow
    • A testing mainnet network is being synchronized
    • The next step is to use the new command of the Cardano cli to compute the stake distribution all at once and try to produce the first test snapshots
  • Also, we have kept working on the issue Deploy Era Behavior Switch #752 by creating all the cryptographic material needed for the deployment

2023-02-27

  • We have paired on the issue Add context to errors #665 and a PR has been created handling errors better #776 that should be merged shortly

  • We have reviewed and merged the following PRs:

  • During our team session, we have discussed:

    • The issue Add context to errors #665 and the best way to catch a failure from a tokio thread and abort all other threads with a JoinSet
    • How we could compute deterministically the Utxo set:
      • We will not be able to rely on the cardano CLI (as it gives real time results)
      • We could use the kupo chain indexer
      • We could use the oura chain indexer or directly the pallas library and its chainsync mini protocol implementation
      • These solutions could work, but would require to store a lot of data and handle the chain rollbacks that happen quite often
      • Another option is to use the immutable files to read the blocks directly from them and reconstruct the Utxo set once the transactions are final (i.e. stored in an immutable file). With this solution we would just need to read from the chunk files, whose format is available here, and to use the pallas capability to parse blocks from cbor

2023-02-23

  • We have worked on the demo path for showcasing the era switch behavior:

01-demo-path

# Setup demo

## Download source (if needed)
git clone https://github.com/input-output-hk/mithril.git
#or
git clone git@github.com:input-output-hk/mithril.git

## Build docker images
cd mithril/

### Build docker images distribution-1 (Thales era only)
git switch distribution_1
rm -f mithril-aggregator/mithril-aggregator && rm -f mithril-signer/mithril-signer && rm -f mithril-client/mithril-client
make build && cp target/release/mithril-aggregator mithril-aggregator/mithril-aggregator && cp target/release/mithril-signer mithril-signer/mithril-signer && cp target/release/mithril-client mithril-client/mithril-client
docker image rm mithril/mithril-signer-distribution-1 mithril/mithril-client-distribution-1 mithril/mithril-aggregator-distribution-1 --force
docker build -t mithril/mithril-aggregator-distribution-1 -f mithril-aggregator/Dockerfile.ci .
docker build -t mithril/mithril-signer-distribution-1 -f mithril-signer/Dockerfile.ci .
docker build -t mithril/mithril-client-distribution-1 -f mithril-client/Dockerfile.ci .

### Build docker images distribution-2 (Thales & Pythagoras eras)
git switch distribution_2
git rebase distribution_1
rm -f mithril-aggregator/mithril-aggregator && rm -f mithril-signer/mithril-signer && rm -f mithril-client/mithril-client
make build && cp target/release/mithril-aggregator mithril-aggregator/mithril-aggregator && cp target/release/mithril-signer mithril-signer/mithril-signer && cp target/release/mithril-client mithril-client/mithril-client
docker image rm mithril/mithril-signer-distribution-2 mithril/mithril-client-distribution-2 mithril/mithril-aggregator-distribution-2 --force
docker build -t mithril/mithril-aggregator-distribution-2 -f mithril-aggregator/Dockerfile.ci .
docker build -t mithril/mithril-signer-distribution-2 -f mithril-signer/Dockerfile.ci .
docker build -t mithril/mithril-client-distribution-2 -f mithril-client/Dockerfile.ci .

### Build docker images distribution-3 (Pythagoras era only)
git switch distribution_3
git rebase distribution_2
rm -f mithril-aggregator/mithril-aggregator && rm -f mithril-signer/mithril-signer && rm -f mithril-client/mithril-client
make build && cp target/release/mithril-aggregator mithril-aggregator/mithril-aggregator && cp target/release/mithril-signer mithril-signer/mithril-signer && cp target/release/mithril-client mithril-client/mithril-client
docker image rm mithril/mithril-signer-distribution-3 mithril/mithril-client-distribution-3 mithril/mithril-aggregator-distribution-3 --force
docker build -t mithril/mithril-aggregator-distribution-3 -f mithril-aggregator/Dockerfile.ci .
docker build -t mithril/mithril-signer-distribution-3 -f mithril-signer/Dockerfile.ci .
docker build -t mithril/mithril-client-distribution-3 -f mithril-client/Dockerfile.ci .

---

# Demo: Run demo 
cd ../devnet-demo

## Create functions
function stop_devnet {
	./devnet-stop.sh
	docker stop $(docker ps -a -q)
	rm -rf artifacts/node-bft1/mithril/aggregator
	rm -rf artifacts/node-pool1/mithril/signer
	rm -rf artifacts/node-pool2/mithril/signer
	rm -rf artifacts/node-pool3/mithril/signer
}

function start_devnet {
  ./devnet-stop.sh && NODES=cardano EPOCH_LENGTH=60 NUM_POOL_NODES=3 ./devnet-run.sh
}

function monitor_devnet {
  watch -c "NODES=cardano ./devnet-query.sh"
}

function epoch_devnet {
	CARDANO_NODE_SOCKET_PATH=artifacts/node-bft1/ipc/node.sock ./artifacts/cardano-cli query tip --cardano-mode --testnet-magic 42 | jq '.epoch'
}

function containers_list {
  watch -c "docker ps --format '{{.Names}} - {{.Status}}' | sort"
}

function container_up {
  DISTRIBUTION_VERSION=$1 ERA_READER_ADAPTER_PARAMS=$(cat era-markers/config.json) docker-compose -f docker-compose-demo.yaml --profile $2 up --remove-orphans --force-recreate -d --no-build
}

function container_down {
  docker stop $2-$1
}

function container_run {
  DISTRIBUTION_VERSION=$1 ERA_READER_ADAPTER_PARAMS=$(cat era-markers/config.json) docker-compose -f docker-compose-demo.yaml run $2 $3 $4
}

function container_exec {
  docker exec -it $2-$1 $3 $4
}

function container_logs {
  docker logs -f $2-$1 2>/dev/null
}

function monitor_versions {
  watch -c "./sqlite3 -table -batch artifacts/node-bft1/mithril/aggregator/stores/monitoring.sqlite3 < stake_signer_version.sql | head -n 50"
}

function era_activate_thales {
	cat > era-markers/markers.json << EOF
[
	{"name": "thales", "epoch": 1}
]
EOF
	cat era-markers/markers.json | jq .
}

function era_announce_pythagoras {
	cat > era-markers/markers.json << EOF
[
	{"name": "thales", "epoch": 1},
	{"name": "pythagoras", "epoch": null}
]
EOF
	cat era-markers/markers.json | jq .
}

function era_activate_pythagoras {
	EPOCH_ERA_SWITCH=$(( $(epoch_devnet) + 1))
	cat > era-markers/markers.json << EOF
[
	{"name": "thales", "epoch": 1},
	{"name": "pythagoras", "epoch": $EPOCH_ERA_SWITCH}
]
EOF
	cat era-markers/markers.json | jq .
}

function era_remove_thales {
	EPOCH_ERA_SWITCH=$(( $(epoch_devnet) - 1))
	cat > era-markers/markers.json << EOF
[
	{"name": "pythagoras", "epoch": $EPOCH_ERA_SWITCH}
]
EOF
	cat era-markers/markers.json | jq .
}

## Reset demo if needed
stop_devnet

## Start Cardano network
start_devnet
monitor_devnet
containers_list

## Start network with version distribution-1
era_activate_thales
container_up distribution-1 mithril-aggregator
container_up distribution-1 mithril-signer-1
container_up distribution-1 mithril-signer-2
container_up distribution-1 mithril-signer-3
container_run distribution-1 mithril-aggregator-genesis
container_logs distribution-1 mithril-signer-1 | grep "Current Era"
container_logs distribution-1 mithril-signer-2 | grep "Current Era"
container_logs distribution-1 mithril-signer-3 | grep "Current Era"

## Update network partially with version distribution-2
container_up distribution-2 mithril-aggregator
container_up distribution-2 mithril-signer-1
era_announce_pythagoras
container_logs distribution-1 mithril-signer-1 | grep "Upcoming Era"
container_logs distribution-1 mithril-signer-2 | grep "Upcoming Era"
container_logs distribution-1 mithril-signer-3 | grep "Upcoming Era"
container_up distribution-2 mithril-signer-2
container_logs distribution-2 mithril-signer-2 | grep "Upcoming Era"
container_logs distribution-1 mithril-signer-3 | grep "Upcoming Era"
era_activate_pythagoras
container_logs distribution-2 mithril-signer-1 | grep "Current Era"
container_logs distribution-2 mithril-signer-2 | grep "Current Era"
container_logs distribution-1 mithril-signer-3 | grep "UnsupportedEraError"
container_up distribution-2 mithril-signer-3
container_logs distribution-2 mithril-signer-3 | grep "Current Era"

2023-02-22

  • We have reviewed and merged the PR Add dynamic matrix in CI for end to end tests #761 that activates the dynamic runs of the end to end tests in the CI depending on the supported eras

  • We have noticed some problems with the monitoring on the testing-preview network:

  • Finally, we have also paired on preparing the demo for the review that will take place tomorrow and that will showcase the era switch mechanism that allows Mithril hard forks

2023-02-21

2023-02-20

  • We have paired on:

  • We have also reviewed the PR Add era command in aggregator cli #756. It should be merged tomorrow

  • During our technical team session, we have discussed:

    • Cardano-cli 1.36.0 and the issue about the computation of the total stakes:
      • We are now able to build the node and cli from source. We will implement them in custom Docker images and start running a mainnet test Mithril network, which will help us check if there are issues when we scale to compute large snapshots
      • The problem with the stakes total is not critical for us as we don't rely on those fields to compute the total stakes
    • New feature for signing new types of data:
      • We will need to update slightly the structure of the certificate chain: there will probably be one certificate for the stake distribution (the first of an epoch, the only one required for an epoch) and one certificate for each new message that has been signed (that will have a specific type) linking to the first of the epoch. This will allow us to avoid breaking changes when introducing new types of data: certificates for these new types will be produced as soon as enough signers run the new version (and are able to sign) so that a multi signature can be produced
      • A good example to start working with is signing the Utxo set (which will help us design the interface needed to add a new type of data): we will work on this subject with the wallet team soon to get a better understanding of their needs. In the meantime, we know that we will have to retrieve the Utxo set from the Cardano node:
        • First with the Cardano cli which is able to produce this set for small networks (e.g. preview and preprod)
        • Then with a better suited way of retrieving it for the mainnet: we will investigate if it is possible to do it with some internal tools used for accessing the internal database of the Cardano node or with third party tools such as https://github.com/sierkov/daedalus-turbo
    • Testing strategy:
      • In order to get an even better testing strategy, we will try to investigate the possibility of doing model based testing with the Mithril protocol implementation
      • This will help us test some edge-case scenarios like network partition. We will see what Rust can bring to the table, probably with the crate madsim. We will try to pair on these subjects during the next team sessions

2023-02-16

  • We have discussed:

    • The event store and the fact that we need to add metadata along with the content of the event. We agreed that this will be done by the event creator that will take another parameter "headers" and that will wrap them with the actual content of the event
    • The way to easily implement the SupportedEra enum with the use of macros, to avoid modifying the code in multiple places each time we add/remove an era. The first custom-made draft implementation is a bit cumbersome, so we have decided to use the macros from the strum crate
  • We have worked on the following PRs:

2023-02-15

2023-02-14

2023-02-13

  • We have paired on the issue Define the structure of an event #739 and closed it. The outcome of the session is the following diagram that describes the design of the event store and of the events themselves:

(Diagram: design of the event store and of the events)

  • We have also paired on the issue Implement an event producer/consumer via channel #738 to complete the first draft on which we worked last week, and to confirm that we could produce an event and receive it on a different thread

  • We have reviewed and merged the following PRs:

  • During our team session, we have discussed the following subjects:

    • Strategy to migrate the store of the aggregator: it seems like a good idea to prepare and maintain a road book to keep track of best practices when administering an aggregator (such as not upgrading too close to an era transition)
    • Testing Mithril on the mainnet:
      • We could prepare a build of the cardano-cli, host it somewhere, and use it to test the new stake distribution computation
      • If we keep having glibc errors when building the cardano cli on the master branch, we will ask for support from the Cardano team
      • We have validated that the solution we have designed to stress test the network by implementing a fake cardano cli that will avoid PoolId spoofing is a good option
    • Displaying the SPO ticker on the explorer: we could do it by querying the API from cexplorer, for example, or by retrieving it from the pool metadata when the signer registers (which seems much more complicated)
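
The fake cardano cli idea validated above could start from a stub like the following sketch. Everything in it is hypothetical: the function name fake_cardano_cli, the FAKE_EPOCH variable, and the reduced JSON output, which only carries the epoch field that the demo scripts in this logbook read with jq. A usable stub would also have to answer the stake distribution queries issued by the signer and aggregator nodes.

```shell
# Hypothetical stub standing in for cardano-cli during stress tests.
# FAKE_EPOCH (assumed variable) controls the epoch reported by `query tip`.
fake_cardano_cli() {
  case "$1 $2" in
    "query tip")
      # Only the epoch field is emitted; the real command returns more fields.
      printf '{"epoch": %s}\n' "${FAKE_EPOCH:-1}"
      ;;
    *)
      # Any other sub-command is rejected so that missing stubs are noticed.
      echo "fake_cardano_cli: unsupported command: $*" >&2
      return 1
      ;;
  esac
}
```

For instance, `FAKE_EPOCH=42 fake_cardano_cli query tip --testnet-magic 42` prints a JSON object whose epoch field is 42, and extra flags are ignored.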

2023-02-10

2023-02-09

2023-02-08

2023-02-07

  • We have kept monitoring the issue Signer can't sign on testing-preview network #730: the re-genesis of the testing-preview worked as expected and new certificates are being produced on testing-preview (with the signer that did not have signing troubles). We will see tomorrow if the other signer is back in the signatures. We have not identified the source of the problem yet and we keep investigating

  • We have paired on the issue Define relational design of stores #476 and we have achieved a first version of the aggregator store relational design:

(Diagram: first version of the aggregator store relational design)

2023-02-06

  • We have paired on the issue Implement an EraReader trait that gathers era activation data #709 and added the documentation on the updates done on the state machine of the signer. The PR is ready to merge, which should be done tomorrow 💪

  • We have merged the following PRs:

  • We also have created the following issues:

    • Signer can't sign on testing-preview network #730: a bug that prevents some signers from signing on the testing-preview network with a ProtocolInitializerNotRegistered(CoreRegister(UnregisteredInitializer)) error. This error created a gap in the certificate chain of the testing-preview network, which has gone through a re-genesis. We will closely monitor the problem and see if it is reproduced in the following days. Also, we may postpone the distribution that we have scheduled to prepare this week if we are not able to fix the problem
    • Add a new SPO on testing-preview network #729: this will add a third SPO on the testing-preview network which will help debugging and reduce the need for re-genesis
  • Finally, we have discussed the following subjects during our team session:

    • UtxoHD compatibility with Mithril: at first sight, it appears that it is not a problem for Mithril: we currently don't sign the ledger state that is not deterministically produced on the cardano nodes. However, we will review the specs and dive deeper in order to validate that this doesn't break anything on the snapshot creation/restoration process
    • Possibilities to create a decentralized setup of the Mithril network:
      • We just have the constraint that the solution can easily be switched to another with an adapter mechanism for example
      • Use of libp2p Rust implementation to create a peer to peer network (between Mithril signer relays and/or aggregators) that would implement the pub/sub gossip protocol. We could also use the Kademlia implementation to provide peer to peer discovery of the nodes (this would help to bootstrap an aggregator with data from another aggregator)
      • Draft an implementation on the Cardano node with mini protocols
      • Use of IPFS only (if that alleviates the need to maintain a peer to peer network?)
      • Use a tool like wireguard to connect the nodes

2023-02-02

  • We have mainly paired on the issue Implement an EraReader trait that gathers era activation data #709 that should be completed and merged shortly

  • We have also discussed some edge cases that we have identified concerning seamless updates:

    • When releasing a new era switch enabled version, we will need to do a hot switch of the API version as well. This is something that is not currently supported. We have created an issue to address this problem: Handle API Version with Era switch #727
    • When the underlying Cardano cli used by the signer and aggregator nodes is changing its interface, we will need to be able to handle both versions in a soft update (with a switch based on the cardano cli version)
  • Finally, we have worked on the issue Upgrade Cardano node to 1.35.5 #725 and created a PR that should be merged shortly. It also fixes a few bugs related to building developer Docker images and timeout management when running the devnet

2023-02-01

  • The fix published with the 2304.1-prerelease distribution has fixed the problem on the pre-release-preview network and signers are able to sign again. We have thus released the 2304.1 distribution 🎉. We will keep closely monitoring the pre-release-preview and release-preprod networks in the following days

  • We have paired on defining the new relational design of the stores of issue Define relational design of stores #476. This is a work in progress and we will produce a database schema as an outcome of our work

  • Finally, we have continued pairing on the implementation of the issue Implement an EraReader trait that gathers era activation data #709 with implementing the dummy and bootstrap adapters and wiring the era reader in the dependencies of the signer and aggregator nodes

January 2023

2023-01-31

2023-01-30

  • We have reviewed and merged a fix for the issue Signers are unable to sign with 2304.0-prerelease #716: a new pre-release version of the distribution has been created, 2304.1-prerelease. We need to wait for 2 epochs before we can confirm that the fix worked as expected, and we will release the distribution then

  • We have also reviewed and merged the following PRs:

  • Also, we have worked on the Era mechanism:

  • Finally, we had a team session during which we addressed the following topics:

    • Need for signing the era activation marker stored in an on chain transaction, so that we can filter it out of the utxos of an address
    • We will probably use a new secret/verification keypair with the same format as the Genesis keypair
    • In the long run, it would be nice to add a multi signature for this transaction (same as a hard fork in the Cardano node)
    • We have noticed that the deployment model is not completely defined and that some SPOs are running their test signer nodes on the relay nodes (probably copying their KES keys on the relay node). The next sprint review will be a good time to talk about this topic with the SPOs
    • We have also talked about the possibility of relying on a different decentralized network than the Cardano node network layer

2023-01-27

2023-01-26

  • There seems to be a problem with the new 2304.0-prerelease: Many signers are receiving a ProtocolInitializerNotRegistered(CoreRegister(UnregisteredInitializer)) error when they are trying to sign. This means that the verification key they are using is not the one that was registered 2 epochs ago (i.e. before activation of the new release). The signer node we are maintaining does not have the problem, and the certificate chain is regularly appended at the moment. However, if we had used the full security parameters there would not have been enough signers to create new multi-signatures and the chain would have been broken. We have not seen such a problem on the testing-preview network. We will keep investigating the issue and probably postpone the release of the distribution until we understand the source of the problem

  • We have prepared the demo path of this iteration:

    1. Introduction
    2. Showcase of the backward compatibility of messages
    3. Showcase of the API version enforcement
    4. Presentation of the PoC for era activation with on chain transaction
    5. Conclusion/Next steps
    6. QA
  • Showcase of the backward compatibility of messages 01-demo-backward-compatibility

  • Showcase of the API version enforcement 02-demo-api-version-enforcement

  • Presentation of the PoC for era activation with on chain transaction 03-demo-era-activation-on-chain

  • Showcase path for backward compatibility and version enforcement:

# Demo: Backward compatibility and API version enforcement

---

# Setup demo

## Download source (if needed)
git clone https://github.com/input-output-hk/mithril.git
#or
git clone git@github.com:input-output-hk/mithril.git

## Build docker images
cd mithril/

### Build docker images 0.1.1
git switch api_0.1.1
rm -f mithril-aggregator/mithril-aggregator && rm -f mithril-signer/mithril-signer && rm -f mithril-client/mithril-client
make build && cp target/release/mithril-aggregator mithril-aggregator/mithril-aggregator && cp target/release/mithril-signer mithril-signer/mithril-signer && cp target/release/mithril-client mithril-client/mithril-client
docker image rm mithril/mithril-signer-0.1.1 mithril/mithril-client-0.1.1 mithril/mithril-aggregator-0.1.1 --force
docker build -t mithril/mithril-aggregator-0.1.1 -f mithril-aggregator/Dockerfile.ci .
docker build -t mithril/mithril-signer-0.1.1 -f mithril-signer/Dockerfile.ci .
docker build -t mithril/mithril-client-0.1.1 -f mithril-client/Dockerfile.ci .

### Build docker images 0.1.2
git switch api_0.1.2
git rebase api_0.1.1
rm -f mithril-aggregator/mithril-aggregator && rm -f mithril-signer/mithril-signer && rm -f mithril-client/mithril-client
make build && cp target/release/mithril-aggregator mithril-aggregator/mithril-aggregator && cp target/release/mithril-signer mithril-signer/mithril-signer && cp target/release/mithril-client mithril-client/mithril-client
docker image rm mithril/mithril-signer-0.1.2 mithril/mithril-client-0.1.2 mithril/mithril-aggregator-0.1.2 --force
docker build -t mithril/mithril-aggregator-0.1.2 -f mithril-aggregator/Dockerfile.ci .
docker build -t mithril/mithril-signer-0.1.2 -f mithril-signer/Dockerfile.ci .
docker build -t mithril/mithril-client-0.1.2 -f mithril-client/Dockerfile.ci .

### Build docker images 0.2.0
git switch api_0.2.0
git rebase api_0.1.2
rm -f mithril-aggregator/mithril-aggregator && rm -f mithril-signer/mithril-signer && rm -f mithril-client/mithril-client
make build && cp target/release/mithril-aggregator mithril-aggregator/mithril-aggregator && cp target/release/mithril-signer mithril-signer/mithril-signer && cp target/release/mithril-client mithril-client/mithril-client
docker image rm mithril/mithril-signer-0.2.0 mithril/mithril-client-0.2.0 mithril/mithril-aggregator-0.2.0 --force
docker build -t mithril/mithril-aggregator-0.2.0 -f mithril-aggregator/Dockerfile.ci .
docker build -t mithril/mithril-signer-0.2.0 -f mithril-signer/Dockerfile.ci .
docker build -t mithril/mithril-client-0.2.0 -f mithril-client/Dockerfile.ci .

---

# Demo: Run demo
cd ../devnet-demo

## Reset demo if needed
docker stop $(docker ps -a -q)
rm -rf artifacts/node-bft1/mithril/aggregator
rm -rf artifacts/node-pool1/mithril/signer
rm -rf artifacts/node-pool2/mithril/signer

## Start Cardano network
./devnet-stop.sh && NODES=cardano EPOCH_LENGTH=60 ./devnet-run.sh
watch -c "NODES=cardano ./devnet-query.sh"
watch -c "docker ps --format '{{.Names}} - {{.Status}}' | sort"


## Create functions
function container_up {
  API_VERSION=$1 docker-compose -f docker-compose-demo.yaml --profile $2 up --remove-orphans --force-recreate -d --no-build
}

function container_down {
  docker stop $2-$1
}

function container_run {
  API_VERSION=$1 docker-compose -f docker-compose-demo.yaml run $2 $3 $4
}

function container_exec {
  docker exec -it $2-$1 $3 $4
}

function container_logs {
  docker logs -f $2-$1
}

## Backward compatibility

### Start network with version 0.1.1
container_up 0.1.1 mithril-aggregator
container_up 0.1.1 mithril-signer-1
container_up 0.1.1 mithril-signer-2
container_run 0.1.1 mithril-aggregator-genesis
container_logs 0.1.1 mithril-signer-1 2>/dev/null | grep "register_signer" | grep "new_field"
container_logs 0.1.1 mithril-signer-2 2>/dev/null | grep "register_signer" | grep "new_field"
container_logs 0.1.1 mithril-aggregator 2>/dev/null | grep "register_signer" | grep "new_field"
container_run 0.1.1 mithril-client list

### Update network partially with version 0.1.2
container_up 0.1.2 mithril-aggregator
container_up 0.1.2 mithril-signer-1
container_logs 0.1.2 mithril-signer-1 2>/dev/null | grep "register_signer" | grep "new_field"
container_logs 0.1.1 mithril-signer-2 2>/dev/null | grep "register_signer" | grep "new_field"
container_logs 0.1.2 mithril-aggregator 2>/dev/null | grep "register_signer" | grep "new_field"
container_run 0.1.1 mithril-client list
container_run 0.1.2 mithril-client list

## API version enforcement

### Update network fully with version 0.2.0
container_up 0.2.0 mithril-aggregator
container_up 0.2.0 mithril-signer-1
container_up 0.2.0 mithril-signer-2
container_run 0.1.1 mithril-client list
container_run 0.1.2 mithril-client list
container_run 0.2.0 mithril-client list
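
The behavior this demo is meant to show (presumably 0.1.1 and 0.1.2 interoperating, and 0.2.0 being enforced as a breaking version) can be approximated by a Semver-style check. The following is a sketch under an assumed rule, not the check implemented in Mithril: two versions are considered compatible when they share the same major number and, while the major number is 0, the same minor number. The function name api_versions_compatible is hypothetical.

```shell
# Hypothetical helper: exits with 0 when the two API versions given as
# MAJOR.MINOR.PATCH strings are compatible under the assumed rule.
# The body runs in a subshell so that no variable leaks to the caller.
api_versions_compatible() (
  a_major=${1%%.*}; a_rest=${1#*.}; a_minor=${a_rest%%.*}
  b_major=${2%%.*}; b_rest=${2#*.}; b_minor=${b_rest%%.*}

  # Different major numbers are never compatible.
  [ "$a_major" = "$b_major" ] || exit 1

  # Assumption: in the 0.x range, a minor bump is treated as breaking.
  if [ "$a_major" = "0" ]; then
    [ "$a_minor" = "$b_minor" ] || exit 1
  fi
  exit 0
)
```

With this rule, `api_versions_compatible 0.1.1 0.1.2` succeeds while `api_versions_compatible 0.1.2 0.2.0` fails, which matches the split between the two parts of the demo.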

2023-01-25

  • We have mainly paired on the issue Implement an EraChecker that checks if an era is active #708. We have decided to re-challenge the static implementation to make testing more straightforward. In the meantime, we have also made an attempt at coding a breaking change that will probably happen when we add a new part in the message to sign (the Epoch in this illustrative example). We have used the Either pattern implemented by the crate either to implement both behaviors (Left is for the new era, and Right is for the legacy era). We have published the modifications made to the mithril-common and mithril-aggregator crates on this branch ensemble/708-create-era-checker-test-new-era. We will keep publishing the mithril-signer adaptation on this branch and we will use this example as a real test case for the overall implementation of the era. It also gives us some track record and experience on how to handle breaking changes and what type of modification we should not do (e.g. modifying the stored entities in a non backward compatible fashion, which could crash the nodes at era transition)

  • We have also created the pre-release of the new 2304 distribution, which is available at 2304.0-prerelease. It is being qualified and should be ready to be released by the end of the week.

  • Finally, a problem has been identified on the API version enforcement that should be fixed by this PR Fix API version sent in wrong header #712
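
As an illustration only, the Either-based handling of the two eras described in the first item above could look like the following minimal sketch. The `Either` enum is a local stand-in for the type provided by the either crate, and `LegacyMessage`, `NewEraMessage`, `message_to_sign`, `signed_bytes` and the byte encoding are hypothetical names and choices made for this example, not the code published on the branch.

```rust
// Local stand-in for the `Either` type of the `either` crate, so that the
// sketch compiles without any external dependency.
#[derive(Debug, Clone, PartialEq)]
pub enum Either<L, R> {
    Left(L),
    Right(R),
}

// Hypothetical legacy message: only the digest is signed.
#[derive(Debug, Clone, PartialEq)]
pub struct LegacyMessage {
    pub digest: String,
}

// Hypothetical new era message: the epoch is added to the signed content.
#[derive(Debug, Clone, PartialEq)]
pub struct NewEraMessage {
    pub digest: String,
    pub epoch: u64,
}

/// Builds the message to sign: Left is the new era, Right is the legacy era.
pub fn message_to_sign(
    new_era_active: bool,
    digest: &str,
    epoch: u64,
) -> Either<NewEraMessage, LegacyMessage> {
    if new_era_active {
        Either::Left(NewEraMessage {
            digest: digest.to_string(),
            epoch,
        })
    } else {
        Either::Right(LegacyMessage {
            digest: digest.to_string(),
        })
    }
}

/// Turns the message into the bytes that would be signed
/// (the encoding is an assumption made for this example).
pub fn signed_bytes(message: &Either<NewEraMessage, LegacyMessage>) -> Vec<u8> {
    match message {
        Either::Left(m) => format!("{}:{}", m.digest, m.epoch).into_bytes(),
        Either::Right(m) => m.digest.clone().into_bytes(),
    }
}
```

Because both variants flow through the same `match`, a signer and an aggregator that disagree on the active era would produce different bytes, which is exactly the kind of breaking change an era transition has to coordinate.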

2023-01-24

2023-01-23

2023-01-20

  • We have discussed the separation of the messages from the entities for sub fields of the message for issue Implement backward/forward compatible API messages #688. We will probably have a fully separated approach: the messages will not make any reference to the entities in their definition. We will resume this work next week. We have also noticed that the current implementation of API version enforcement is not compatible with the backward compatibility we are implementing: we have created a new issue Update enforcement of API version with Semver #705 in order to refine the compatibility check with Semver

  • We have talked about the PoC on which we worked for issue PoC Read/Write transaction on chain (for version activation) #672 and decided to rechallenge it with a second implementation that does not use a smart contract. This has been completed and works well, with a much simpler setup. Some drawbacks / pain-points still exist and we will brainstorm about them during our team session next week

  • Finally, we have discussed the possibility of removing the allow_non_certified_registration Rust feature and all the dead code that we maintain with it. The only remaining point was to find a way of running stress tests on a mainnet like environment without spoofing the pool ids (which this feature enables). It appears that a solution could be to rely on a synthetic mainnet (same number of SPOs and same type of immutable database, i.e. number and size of immutable files):

    • A fake cardano node would be responsible for creating, at regular intervals, new immutable files in the database folder that is used by the aggregator/signers
    • A fake cardano cli would be responsible for answering requests on the epoch, stake distribution, ...
    • These fake cardano node & cli would communicate with a (remote?) service that keeps all nodes synchronized and that would be in charge of gathering the verified pool ids of the signers in order to create a usable stake distribution
    • Given the average memory usage of signer / aggregator nodes (~500MB/~1,000MB), and the number of nodes in a mainnet like network (~3000), it is unlikely that we can host them all at once on a single computer (~3000 signers alone would need roughly 1.5 TB of memory)
    • We will keep thinking about the design of such an infrastructure to host our stress tests and decide shortly whether to get rid of the allow_non_certified_registration feature
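
To make the Semver-based refinement of the API version check mentioned in the first item above more concrete, here is a minimal sketch. The rule it applies (same major version, and same minor version while the major version is 0) is an assumption borrowed from the usual Cargo convention, not the rule decided in issue #705, and `Version`, `parse_version` and `is_compatible` are hypothetical names.

```rust
/// A parsed `major.minor.patch` version.
/// Pre-release and build metadata are deliberately not handled in this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Version {
    pub major: u64,
    pub minor: u64,
    pub patch: u64,
}

/// Parses a strict `major.minor.patch` string, returning `None` otherwise.
pub fn parse_version(s: &str) -> Option<Version> {
    let mut parts = s.trim().split('.');
    let major = parts.next()?.parse::<u64>().ok()?;
    let minor = parts.next()?.parse::<u64>().ok()?;
    let patch = parts.next()?.parse::<u64>().ok()?;
    if parts.next().is_some() {
        return None; // more than three components
    }
    Some(Version { major, minor, patch })
}

/// Assumed compatibility rule:
/// - for 0.x versions, the minor version must match;
/// - otherwise, the major version must match.
/// The patch version never affects compatibility.
pub fn is_compatible(server: Version, client: Version) -> bool {
    if server.major == 0 {
        client.major == 0 && client.minor == server.minor
    } else {
        client.major == server.major
    }
}
```

Under this assumed rule, a 0.1.1 client and a 0.1.2 aggregator can talk to each other, while a 0.1.2 client is rejected by a 0.2.0 aggregator.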

2023-01-19

2023-01-18

2023-01-17

2023-01-16

  • We have talked about the usage of the portable feature. We have decided to use it by default when building the nodes in order to maximize coverage. This will avoid the crashes encountered by some SPOs when they are building the binaries on a computer and deploying them on a different computer (with a different CPU). In the long run, SPOs will be able to build without the portable feature as an optimization of their node. The associated PR has been merged Fix make build portable #685. Also, a fix has been merged in order to avoid SPOs receiving error messages when they run the make build command from the root of the repository with PR Fix make errors from root #684

  • We have discussed the PoC under development for the issue PoC handle backward compatibility of API messages #673. It appears that the added value of libraries such as avro and protobuf is not as high as we could have expected. In our opinion, we will be able to handle backward compatible messages by using some serde annotations as well as a golden test strategy when updating our models. We will conduct a PoC for an in house development shortly

  • We have merged the issue Update Run Signer Node documentation #681 related to the incomplete documentation for building a signer node as an SPO

  • During our team session:

    • We have talked about the issue PoC Read/Write transaction on chain (for version activation) #672:
      • So far the PoC is working in a very basic setup and we can retrieve some Datum from the utxo of the script address with the Cardano cli
      • Possible alternatives to writing smart contracts are:
      • A good entry point for using Plutus TX without nix is this repository: https://github.com/abailly/black-jack
      • We have 2 options for deploying the script address that will be used to retrieve the era activation markers:
        • Burn the address at compile time (depends on the Mithril network being used)
        • Use a configuration option for this address. In order to avoid possible attacks, it looks reasonable to sign this script address with the Mithril Genesis Keys (preferred solution)
      • Regarding the security of the smart contract:
        • We will conduct at least an internal audit
        • As the features are very simple, we shouldn't need any formal verification
      • A solution that relies on a utxo carrying a Datum exists, but it would require changing the requested address after each update. We will probably not use this technique.
    • Regarding the stress testing of the network in a mainnet like setup, we will probably work with synthetic data. However, we will get in touch with the Ledger/Performance team of the Cardano node to get some insights/ideas on how they perform their load tests

2023-01-13

  • We have released the new 2303.0 distribution:

    • It has been successfully deployed to the release-preprod network 🚀
    • However, an SPO has encountered an issue with a SIGILL error due to an "old" CPU (Q3'14). The quick-fix was to activate the portable feature at compilation. We may also need to force this feature in the make build script used by the SPOs to build their node. This will ensure that we have a higher adoption rate, even though there is a small impact on the performance
  • We have reviewed the following PRs:

2023-01-12

  • We have prepared the demo path of this iteration:

    1. Introduction
    2. Presentation of the new Batch Verification of Mithril multi-signature
    3. Presentation of the strategy for Mithril Network Update
    4. Showcase of the Mithril Client multi-platform test workflow
    5. Conclusion/Next steps
    6. QA
  • We have talked about the solution implemented in the PR Upgrade devnet to Cardano 1.35.4 #667 to fix the flakiness occurring at protocol parameters transition:

    • We agreed on using 1 more offset when recording the new protocol parameters. This avoids broadcasting 2 different versions of the epoch settings during the same epoch (happens when the node restarts and the protocol parameters are updated in the aggregator)
    • We decided to activate the matrix end to end tests with 3 runs
    • The PR should be merged shortly
  • We have also discussed and exchanged about the PoC under development of the issue PoC handle backward compatibility of API messages #673:

    • We probably don't need many of the features provided by protobuf and avro, but it is worth assessing the feasibility of these solutions
    • We will probably work on a more advanced PoC with an in house development based on serde and default values handling

2023-01-11

2023-01-10

2023-01-09

  • We have paired on the issue Upgrade Cardano devnet to 1.35.4 #523 and tried to make the devnet work on all the developers' computers. We have also kept on fixing the flakiness that occurs

  • During our team session, we have reviewed the draft ADR for the issue Write ADR for graceful updates #671:

    • We could use a 2 phase commit in order to announce an upcoming era:
      • First step: announce the new era not activated yet
      • Second step: activate the new era
    • All the signers should be able to verify the adoption rate (which means that they should also be able to compute the stake-based era adoption rate given the node versions). This will help avoid manual errors and also avoid some attacks where the transitioned era does not exist (leading to all signer nodes being down). The threshold used in that case would be hard-coded in the nodes and would be different from the activation threshold. Also, we would need to embed in the node the correspondence table of the eras included in the node versions.
    • We could use the Mithril certificate to provide the era marker and have the signers create automatically the transaction when the threshold is reached. But it is probably a better idea to keep a manual activation for the era transition
    • We definitely miss an incentive mechanism to avoid being stuck with not reaching the threshold and therefore not being able to activate an era
    • How do we handle rollbacks? What if something goes wrong?
    • Maybe a dual mode would be the solution, where there would be 2 mainnet networks (one for preview and the other for real use), and the preview would be activated one or more epochs in advance on the enrolled nodes. In a decentralized setup, we would need to be able to discriminate all the messages by Mithril network (and also to target a Mithril network with an era activation transaction)
    • This means that every information that leaves the node should probably be labeled by the version of the node it comes from
  • Finally, we discussed the possible implementations for the issue PoC Read/Write transaction on chain (for version activation) #672:

    • Transaction Metadata: probably not the best option
    • Transaction Data: we create a Plutus address for a script and we use the TxOutDatum as a placeholder for the era activation information. This information could be read by the cardano cli
    • The chain could also be read with:
      • Reading the database immutable files themselves
      • Using oura or scrolls from txpipe to follow the chain
    • Regarding the secrets management, the SRE team is probably able to provide insights
    • In the long run, we will probably delegate the era activation to the governance mechanism of Cardano (Voltaire)
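
The two step announcement discussed during the ADR review above (first announce the new era, then activate it) can be sketched as follows. This is only an illustration under stated assumptions: `EraMarker`, its fields, `active_era` and `upcoming_eras` are hypothetical names, and the era names used in any example are placeholders.

```rust
/// Hypothetical era marker, as it could be read from the chain.
#[derive(Debug, Clone, PartialEq)]
pub struct EraMarker {
    pub name: String,
    /// `None`: the era is only announced (first step).
    /// `Some(epoch)`: the activation is scheduled at that epoch (second step).
    pub activation_epoch: Option<u64>,
}

/// Returns the era active at `current_epoch`: the marker with the highest
/// activation epoch that is not after the current epoch.
pub fn active_era(markers: &[EraMarker], current_epoch: u64) -> Option<&str> {
    markers
        .iter()
        .filter_map(|m| m.activation_epoch.map(|e| (e, m.name.as_str())))
        .filter(|(e, _)| *e <= current_epoch)
        .max_by_key(|(e, _)| *e)
        .map(|(_, name)| name)
}

/// Returns the eras that are announced or scheduled but not active yet,
/// so that a node can warn its operator ahead of the transition.
pub fn upcoming_eras(markers: &[EraMarker], current_epoch: u64) -> Vec<&str> {
    markers
        .iter()
        .filter(|m| m.activation_epoch.map_or(true, |e| e > current_epoch))
        .map(|m| m.name.as_str())
        .collect()
}
```

Keeping the announcement (`None`) separate from the scheduling (`Some(epoch)`) lets nodes learn about an era before any activation epoch is committed to.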

2023-01-06

  • We have paired on the issue Upgrade Cardano devnet to 1.35.4 #523:
    • We have aligned the use of the portable Rust feature that is now used in the unit tests and release builds (used by the test lab, the docker images, the debian packages). This will avoid the flakiness of the test lab
    • We have investigated the last flakiness observed, which occurs at epoch transition. After investigation, it appears that this could be due to the offset applied when updating the protocol parameters: it should probably be set to one more epoch (in order to avoid fluctuation of the next protocol parameters during an epoch). We will attempt to fix this problem shortly

2023-01-05

  • We have mainly worked on writing the ADR for the issue Write ADR for graceful updates #671. We have created a PR Mithril Network Update ADR #676. Next steps are:

    • Have it reviewed by the whole team
    • Prepare tickets for its implementation
    • Answer remaining open questions such as:
      • What value do we need for the stake share threshold to be reached before activating a new era?
      • How to accurately compute this threshold given the evolution of the stakes on the Cardano chain (retiring/new pools)?
  • We have also tried to activate the babbage era on the devnet. Unfortunately this is not working properly: the Cardano nodes stop producing new slots at hard fork activation. We will thus complete the issue Upgrade Cardano devnet to 1.35.4 #523 with running only on the alonzo era. Running the babbage era is not mandatory at the moment, but we will probably need it when we implement the new Mithril network update strategy.
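
The open questions above revolve around a stake share threshold. As a rough sketch of how a stake-weighted adoption rate could be computed and compared against such a threshold, assuming a snapshot of signer stakes is available (`SignerRecord`, `adoption_rate` and `threshold_reached` are hypothetical names, and the threshold value itself is left open, as in the entry):

```rust
/// Hypothetical snapshot of one registered signer.
#[derive(Debug, Clone)]
pub struct SignerRecord {
    /// Stake of the pool operated by the signer.
    pub stake: u64,
    /// Whether the node version run by the signer supports the new era.
    pub supports_new_era: bool,
}

/// Share of the total stake held by signers that support the new era.
/// Returns `None` when the total stake is zero (the rate is undefined).
pub fn adoption_rate(signers: &[SignerRecord]) -> Option<f64> {
    // Sums are accumulated in u128 to avoid overflowing u64.
    let total: u128 = signers.iter().map(|s| u128::from(s.stake)).sum();
    if total == 0 {
        return None;
    }
    let adopted: u128 = signers
        .iter()
        .filter(|s| s.supports_new_era)
        .map(|s| u128::from(s.stake))
        .sum();
    Some(adopted as f64 / total as f64)
}

/// True when the adoption rate is defined and at least `threshold` (between 0.0 and 1.0).
pub fn threshold_reached(signers: &[SignerRecord], threshold: f64) -> bool {
    adoption_rate(signers).map_or(false, |rate| rate >= threshold)
}
```

The snapshot has to be taken at a well-defined epoch, since retiring and new pools change both the numerator and the denominator.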

2023-01-04

  • We have reviewed and merged the following PRs:

  • We have mainly focused on drafting the ADR for the issue Write ADR for graceful updates #671:

    • We will handle breaking changes with an era mechanism similar to what is used to activate a hard fork on Cardano
    • An era (which could be named differently if this creates confusion for the SPOs) is a sortable value (that will probably be represented with variants of an enum in Rust)
    • New eras are incrementally created
    • When we need to get a minimum threshold of the stakes to activate a breaking change update, an era is used
    • When we need to make a soft update, there is no change in an era
    • A new version of the signer/aggregator node will embed code that can act differently depending on the active era
    • The era is activated for the next (or later) epoch whenever the penetration rate threshold of the version is reached
    • The era transition will not be associated to a database upgrade for simplicity (we will soft update the database first if needed)
    • The era activation is set up in an on chain transaction
    • An updated node will activate the new era once the associated epoch is transitioned to
    • A non updated node will detect era transitions once they are scheduled:
      • Before it is activated: it will display warning in logs, asking for update
      • After it is activated: it will crash, and display a clear error message asking for update
    • If we need to add new configuration parameters, it will be done in the first version of the node that embeds the new era (and the configuration will be used when the era is activated)
    • This mechanism will be used in situations such as:
      • Signing a new type of message (to avoid not reaching the Mithril protocol quorum)
      • Whenever a new version of the cryptographic library requires that all signers use the same version for the network to produce valid multi-signatures/certificates
      • Previous case, including the case where a re-genesis of the certificate chain is needed (to avoid failed re-genesis)
      • When we will change from a centralized setup to a decentralized setup:
        • Transition to on-chain signer key registration
        • Switch to Cardano network backend
    • Regarding the monitoring:
      • We will use the versions of the signer/aggregator nodes to monitor the adoption rate of era-enabled nodes (expressed as a percentage of the Mithril stakes)
      • Each version of the nodes will be mapped to their associated enabled era(s) in a dedicated wiki page of the repository
      • We will rely on the signer registration call made by the signer to the aggregator to record the versions of the node:
        • By adding an HTTP header with the version to the request sent in the centralized setup
        • By appending the version to the transaction written on chain in the decentralized setup
      • We will probably need to use a relational database model to compute accurately the adoption rate (and eventually display it on the explorer)
    • We will probably need to think about the way we could update the certificate chain transparently to be more resistant to cryptographic updates (will be done separately)
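
The draft above describes an era as a sortable value and spells out how updated and non updated nodes react to a scheduled transition. A minimal sketch of that behavior is shown below. The `Era` variants, `NodeAction` and `evaluate` are hypothetical names chosen for this example, and a real node would crash with the error rather than return it.

```rust
/// Eras as a sortable value: the derive order follows the declaration order,
/// so `Era::Thales < Era::Next`. Variant names are placeholders.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Era {
    Thales,
    Next,
}

/// What a node does after reading the scheduled era transition.
#[derive(Debug, PartialEq)]
pub enum NodeAction {
    /// Keep running; `new_era_active` tells which code path to use.
    Run { new_era_active: bool },
    /// Keep running on the current era, but log a warning asking for an update.
    WarnUpdateRequired,
}

/// Decides the behavior of a node given the highest era its version embeds
/// and the era transition scheduled on chain.
pub fn evaluate(
    highest_supported: Era,
    scheduled_era: Era,
    activation_epoch: u64,
    current_epoch: u64,
) -> Result<NodeAction, String> {
    let activated = current_epoch >= activation_epoch;
    if scheduled_era <= highest_supported {
        // Updated node: switch behavior once the activation epoch is reached.
        Ok(NodeAction::Run { new_era_active: activated })
    } else if !activated {
        // Non updated node, before activation: warn only.
        Ok(NodeAction::WarnUpdateRequired)
    } else {
        // Non updated node, after activation: stop with a clear message.
        Err(format!(
            "era {:?} is active since epoch {} but is not supported by this node version, please update",
            scheduled_era, activation_epoch
        ))
    }
}
```

Relying on the derived ordering keeps the check to a single comparison, provided new eras are only ever appended to the enum.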

2023-01-03

  • We had discussions about the issues regarding flakiness of the test lab in the CI. There are 3 options:

    • Compile with portable feature all the binaries and use GitHub runners:
      • We lose ~20% of performance on the STM computations
      • We avoid ~30% of flakiness of the test lab
      • We could add a matrix test lab with 5/10 runs in order to early detect flakiness
    • Compile without portable feature all the binaries and use GitHub runners (current):
      • We have optimum performance on the STM computations
      • We have ~30% of flakiness of the test lab
      • No matrix test lab in this situation
    • Compile without portable feature all the binaries and use hosted runners (possible?):
      • We have optimum performance on the STM computations
      • We avoid ~30% of flakiness of the test lab
      • We could add a matrix test lab with 5/10 runs in order to early detect flakiness
  • We have created some new issues related to the Graceful Updates on Mithril Network:

    • Write ADR for graceful updates #671:
      • We have paired on this issue and have discussed the design of the ADR
      • We will allocate 1-2 h a day to discuss on this topic during this iteration and probably the next
      • We have already come to the conclusion that we should use a feature flag mechanism in order to silently deploy a new breaking change version of the nodes that will be activated later, once enough signers have installed it. Soft updates will be deployed as they are today, since different versions will be compatible
      • We will not implement an automatic update feature as it brings too many security issues
      • This means that we will have to rely on an incentive mechanism for the SPOs in order to reach the adoption rate required to activate a new version. If not, we could get stuck in a position where the threshold is never reached and the new version cannot be activated
      • We will probably have the signer and aggregator state machines perform an update of the version activation data when an epoch transition is detected (or at node startup). This will allow us to activate a new version for the very next epoch easily (even though there could be a transitory period during which activation occurs)
      • We will need to trigger database migration as well at new version activation
      • We still have to determine which version will be used to provide feature activation
      • Also the testing strategy still needs to be defined for the test lab
    • PoC Read/Write transaction on chain (for version activation) #672
    • PoC handle backward compatibility of API messages #673
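
To put numbers on the matrix test lab option listed above: assuming each run fails independently with the same probability (a simplifying assumption, not a measured property of the test lab), the chance that at least one run of the matrix exposes the flakiness is 1 - (1 - p)^n. With the ~30% flakiness reported above, this gives about 83% for 5 runs and about 97% for 10 runs. The function name below is hypothetical.

```rust
/// Probability that at least one of `runs` independent runs fails,
/// given a per-run failure probability `flake_rate` (between 0.0 and 1.0).
pub fn detection_probability(flake_rate: f64, runs: u32) -> f64 {
    // All runs pass with probability (1 - p)^n; the complement is the
    // probability that the matrix catches at least one failure.
    1.0 - (1.0 - flake_rate).powi(runs as i32)
}
```

The same formula shows why a single run is a weak signal: with a 30% flake rate, one green run out of one still leaves a 70% chance that the flakiness went unnoticed.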

2023-01-02

  • We have reviewed and discussed the issues that we created before the holiday break

  • We have dedicated our efforts to debugging some flakiness on the CI that were discovered during the implementation of issue Upgrade Cardano devnet to 1.35.4 #523:

    • 30% of the runs were blocked on the signer nodes at the "signer registration" step: nothing happened after this step. We have found the problem, which was related to the portable feature that should be used when we run pre-compiled binaries in the test lab
    • We still have some flakiness due to many immutable files being created at very close timestamps around epoch 10. We will try fine tuning the Cardano node genesis configuration and the Mithril nodes in order to avoid this effect
    • We have also investigated the problem that prevents the end to end test to run on some local development environments