Logbook 2023 H1
-
We have reviewed the following PRs:
- Sqlite upgrade and bundling #1024: it has been merged and is the first step of the issue SQLite compatibility in aggregator #837
- nix: bump nixpkgs to get newer rustc #1025: it has been merged and should fix the failing jobs in the Hydra CI
-
Additionally, we have worked on the following issues:
- Add infrastructure monitoring #987: a PR has been created and is still in draft
- Aggregator does not detect certificate chain epoch gap #952: a PR will be created shortly
- testing-preview network does not create certificates #1015: the problem probably originates from the legacy implementation of the verification key store. We will shortly migrate it to a database provider to fix the problem
- Factorize protocol crypto operations #669: a PR will be created shortly
-
Finally, we have also verified if the KES keys of the SPOs of the Mithril networks should be rotated: the operational certificate has been renewed for the Mithril Signer 1, Mithril Signer 2, Mithril Signer 3 in the testing-preview network
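As a rough illustration of what such a KES check involves, the sketch below computes how many KES periods remain before an operational certificate expires. The function name and the sample values are assumptions made for illustration; on a real node the inputs would come from the Shelley genesis file, the chain tip and the operational certificate.

```shell
# Hypothetical helper: number of KES periods left before an operational
# certificate expires. All four inputs are assumptions supplied by the caller.
kes_periods_left() {
  current_slot=$1          # slot of the current chain tip
  slots_per_kes_period=$2  # "slotsPerKESPeriod" in the Shelley genesis file
  max_kes_evolutions=$3    # "maxKESEvolutions" in the Shelley genesis file
  opcert_start_period=$4   # KES period at which the certificate was issued

  current_period=$(( current_slot / slots_per_kes_period ))
  expiry_period=$(( opcert_start_period + max_kes_evolutions ))
  echo $(( expiry_period - current_period ))
}

# Example with made-up values: tip in KES period 100, certificate issued at period 90
kes_periods_left 12960000 129600 62 90
```

A small or negative result would mean that the keys should be rotated and the operational certificate renewed.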
-
We have worked on the following PRs:
- Signer deployment model dev blog post #1016: the PR has been reviewed and merged. It closed the issue Announce the new signer deployment model in a dev blog post #1017
- Aggregator Recompute cert hash command #1013: the PR has been reviewed and merged, and closed the issue Add recompute-certificates-hash command to aggregator #1001
- Implement Mithril Relay in infra #1021: the PR has been reviewed and merged, and closed the issue Adapt infrastructure to use Mithril Relay #1018
- retry when artifact creation failed #1019: the PR has been reviewed and merged. It closed the issue refactor download code in client #1010
- Update openapi spec examples #1020: the PR has been reviewed and merged, and closed the issue Update OpenAPI spec examples #1000
- Use cargo-nextest to run test on CI and reenable tests report #1022: the PR has been created and will be reviewed shortly
-
We have also paired on the Factorize protocol crypto operations #669: a PR will be created shortly
-
Additionally, we have created the bug issue E2E tests are flaky in CI #1023
-
Finally, we have prepared the next iteration 💪
-
We have reviewed the following PRs:
- refactor client snapshot download #1012: the PR has been merged which closes the issue
- Update dependencies #1014: the PR has been merged and fixes a vulnerability of openssl in Rust
- Aggregator Recompute cert hash command #1013: the PR needs some changes which have been implemented and will be reviewed tomorrow. We expect the PR to be merged tomorrow as well
- retry when artifact creation failed #1019: the PR has been created and will be reviewed shortly
- Signer deployment model dev blog post #1016: the PR has been created and will be reviewed shortly
-
Additionally, we have worked on the following issues:
- Design & implement basic stress test tool for aggregator #991: we have paired on this issue and pushed some updates on the branch https://github.com/input-output-hk/mithril/tree/ensemble/991-stress-test-aggregator. We will keep on working on this first draft of the benchmark tool next week
- testing-preview network does not create certificates #1015: this bug has been created and investigated. We understand what prevents the network from signing but we are still investigating why this occurred. A re-genesis of the network has been done and we expect to have it resume the production of certificates tomorrow
- Add infrastructure monitoring #987: a PR is in progress
- Adapt infrastructure to use Mithril Relay #1018: a PR is in progress
-
We have reviewed and merged the following PRs:
- Clean pending_snapshot directory of aggregator #1011: this closes issue Clean pending_snapshot directory of aggregator #983
- Fix darwin nix build #1008: this fixes the problem that we had since yesterday where some hydra builds were not working
-
We have also kept working on the issues:
- refactor download code in client #1010: the PR refactor client snapshot download #1012 has been created and is ready to be reviewed. It will be merged shortly
- Add recompute-certificates-hash command to aggregator #1001: a PR will be created shortly, and once it is merged, we will be able to create a new distribution
- Add infrastructure monitoring #987
-
We have worked on the following PRs:
- Update dependencies #1009: it has been merged and fixed a vulnerability of openssl, but another one got opened. We will create another PR shortly
- Fix master certificate retrieval #1007: merged, and closes the critical bug Computation of master certificate of an epoch is incorrect #1006
- Simplify aggregator integration tests #1003: merged and brings some enhancement to the integration tests of the aggregator
- refactoring client #998: merged and closes the issue Refactoring client #982
- Fix flaky end to end tests #977: merged and closes the issue End to end tests are flaky #954
-
We have paired on troubleshooting the testing-preview aggregator that is unable to sign following the merge of the issue Dates format is not standardized #946 and the re-genesis of the certificate chain that followed. We have understood the problem and created a critical bug for it in the issue Computation of master certificate of an epoch is incorrect #1006. The aggregator should be able to sign again tomorrow
-
We have also reviewed the work on the issue Refactoring client #982
-
Additionally, we have verified if the KES keys of the SPOs of the Mithril networks should be rotated: this was not needed this week
-
We have reviewed and merged the PRs:
- Uniformise datetime usage #994: this PR closes the issue Dates format is not standardized #946. A re-genesis of the testing-preview network has been done as there were breaking changes on the certificate chain
- Fix rfc3339 datetime migration #1005: this PR fixes the migration from the PR #994, which inverted 2 fields when reimporting them in the signed entity table
- nix: bump crane to fix build. Add basic hydraJobs. #1002
-
We have also groomed and sliced the issue Factorize protocol crypto operations #669 on which we will pair next week
-
Finally, we have kept working on the issues:
-
We have reviewed the following PRs:
- Signer deployment model #999: the PR has been merged and it closes the issue Design recommended deployment model for SPOs on mainnet and preview/preprod #961
- Uniformise datetime usage #994: the PR is ready to be merged and will be merged tomorrow (as it is a breaking change)
- refactoring client #998: the PR should be completed shortly
-
The production signer deployment model is now ready to be tested by some SPOs: we will create a post on the discord channel inviting them to test the unstable version of the signer on the pre-release-preview network
-
We have also created the issue Add recompute-certificates-hash command to aggregator #1001: this feature will help us avoid breaking changes such as those introduced in Uniformise datetime usage #994 that require a re-genesis of the certificate chain. We will postpone the new distribution until the feature is implemented (expected by EOW)
-
We have reviewed and merged the following PRs:
- Upgrade Cardano node to 8.1.1 #997: this closes the issue Upgrade Cardano node to 8.1.1 #973
- fix hanging critical error #996: this closes the critical bug issue Aggregator does not exit on critical error #993
- Add log with node version at startup in aggregator/signer #995: this closes the issue Log node version at startup in Aggregator/Signer #944
-
We also have created the following PRs that will be reviewed shortly:
-
We have groomed the issue Refactoring client #982, on which we will start working shortly, and created the issue Update OpenAPI spec examples #1000
-
We have reviewed and approved the PR stake distribution artifact #980 which should be merged shortly
-
We have talked about the issue Certificate dates in metadata are not on the same timezone #946 and the difficulty that arose from migrating the database version table. We have decided to attempt it and to postpone it if it causes too much trouble
-
During our team session, we have worked on the issue Design & implement basic stress test tool for aggregator #991:
- We have designed a load test architecture:
- We have started implementing some functionalities in the branch ensemble/991-stress-test-aggregator. We will continue working on it on Friday
-
We have worked on the following issues:
- Refactor MithrilStakeDistribution entity #967: the PR Greg/967/stake distribution artifact #980 has been created and is under review. It will be merged shortly
- Certificate dates in metadata are not on the same timezone #946: the PR Uniformise datetime usage #994 has been drafted
- End to end tests are flaky #954: we are still working on fixing the multiple sources of flakiness
-
The following issues have been created:
- Aggregator does not exit on critical error #993: a critical bug has been created that blocks the production of certificates. We will fix it very shortly
- PoC Stress test tool with E2E test #991: a simple version of the stress test tool, derived from the existing end to end test, to be tested in a PoC
-
We have kept working on the issues:
- Certificate dates in metadata are not on the same timezone #946: we have decided to work with datetimes generated only by the Rust code in order to get consistent UTC datetimes. We will shortly create an ADR to make this choice explicit
- Refactor MithrilStakeDistribution entity #967: we fixed some tests that were broken and we will have a PR ready for review shortly
- End to end tests are flaky #954: a first source of flakiness has been identified and is in the process of being fixed. We have also identified a couple of other sources of flakiness that we will also try to fix
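To illustrate the kind of normalization discussed for #946, here is a minimal shell sketch that converts a datetime carrying a timezone offset into a UTC RFC 3339 string. It is only an illustration (the decision above is to generate datetimes in the Rust code); the helper name is hypothetical and the -d option assumes GNU date.

```shell
# Hypothetical helper: normalize a datetime with an offset to UTC (RFC 3339).
# Assumes GNU date, whose -d option parses ISO 8601 style inputs.
to_utc_rfc3339() {
  date -u -d "$1" +"%Y-%m-%dT%H:%M:%SZ"
}

# Made-up example: 12:30 at UTC+2 is 10:30 in UTC
to_utc_rfc3339 "2023-06-14T12:30:00+02:00"
```

Producing every datetime through a single code path like this one is what keeps the stored values comparable.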
-
Also we have reviewed and merged the following PRs:
-
Additionally, we have created the following issues:
-
Finally, we have discussed the performance benchmark of the aggregator, and it seems that the end to end test that we are using could be a good starting point to provide a convenient first simple stress test tool. We will investigate this further shortly
-
We have kept working on the issues:
- Refactor MithrilStakeDistribution entity #967: a PR will be created shortly
- Certificate dates in metadata are not on the same timezone #946: a PR will be created shortly
- End to end tests are flaky #954: we are still experimenting and investigating in order to understand where the flakiness comes from
- Aggregator does not always detect new immutable file #953: we have closed the issue once again as the problem does not occur anymore
-
We have also created a bug issue on the client that has been discovered while preparing the demo path: Client snapshot download command fails with option --download-dir #979
-
Finally, we have worked on the demo path that we will showcase tomorrow:
Sign & restore multiple types of data on the devnet
# Setup demo
## Download source (if needed)
git clone https://github.com/input-output-hk/mithril.git
#or
git clone git@github.com:input-output-hk/mithril.git
## Checkout correct branch
cd mithril/
git checkout 2bc2a383765c9ae98b6fcfa8896d4b1de203b09d
## Build docker images
cd mithril/
docker rmi $(docker images -q) --force
docker rm -vf $(docker ps -a -q)
### Build docker images distribution-1 (Thales era only)
rm -f mithril-aggregator/mithril-aggregator && rm -f mithril-signer/mithril-signer && rm -f mithril-client/mithril-client
make build && cp target/release/mithril-aggregator mithril-aggregator/mithril-aggregator && cp target/release/mithril-signer mithril-signer/mithril-signer && cp target/release/mithril-client mithril-client/mithril-client
docker image rm mithril/mithril-signer-demo mithril/mithril-client-demo mithril/mithril-aggregator-demo --force
docker build -t mithril/mithril-aggregator-demo -f mithril-aggregator/Dockerfile.ci .
docker build -t mithril/mithril-signer-demo -f mithril-signer/Dockerfile.ci .
docker build -t mithril/mithril-client-demo -f mithril-client/Dockerfile.ci .
---
# Demo: Run demo
cd ../devnet-demo
## Create functions
function stop_devnet {
./devnet-stop.sh
docker stop $(docker ps -a -q)
rm -rf artifacts/node-bft1/mithril/aggregator
rm -rf artifacts/node-pool1/mithril/signer
rm -rf artifacts/node-pool2/mithril/signer
rm -rf artifacts/node-pool3/mithril/signer
}
function start_devnet {
./devnet-stop.sh && NODES=cardano SLOT_LENGTH=0.35 EPOCH_LENGTH=120 NUM_POOL_NODES=3 ./devnet-run.sh
}
function monitor_devnet {
watch -c "NODES=cardano ./devnet-query.sh"
}
function epoch_devnet {
CARDANO_NODE_SOCKET_PATH=artifacts/node-bft1/ipc/node.sock ./artifacts/cardano-cli query tip --cardano-mode --testnet-magic 42 | jq '.epoch'
}
function containers_list {
watch -c "docker ps --format '{{.Names}} - {{.Status}}' | sort"
}
function container_up {
DISTRIBUTION_VERSION=$1 ERA_READER_ADAPTER_PARAMS=$(cat era-markers/config.json) docker-compose -f docker-compose-demo.yaml --profile $2 up --remove-orphans --force-recreate -d --no-build
}
function container_down {
docker stop $2-$1
}
function container_run {
DISTRIBUTION_VERSION=$1 ERA_READER_ADAPTER_PARAMS=$(cat era-markers/config.json) docker-compose -f docker-compose-demo.yaml run $2 $3 $4 $5 $6 $7 $8 $9
}
function container_exec {
docker exec -it $2-$1 $3 $4
}
function container_logs {
docker logs -f $2-$1 2>/dev/null
}
function era_activate_thales {
cat > era-markers/markers.json << EOF
[
{"name": "thales", "epoch": 1}
]
EOF
cat era-markers/markers.json | jq .
}
## Reset demo if needed
stop_devnet
## Start Cardano network
start_devnet
## Start Mithril network
era_activate_thales
container_up demo mithril-aggregator
container_up demo mithril-signer-1
container_up demo mithril-signer-2
container_up demo mithril-signer-3
container_run demo mithril-aggregator-genesis
containers_list
## Query Aggregator database
watch_query_aggregator_db_file watch.sql
## Client
## Config
AGGREGATOR_ENDPOINT=http://localhost:8080/aggregator
### Commands Help
container_run demo mithril-client --help
container_run demo mithril-client snapshot --help
container_run demo mithril-client mithril-stake-distribution --help
## Snapshot Command
SNAPSHOT_DIGEST=$(curl -sL $AGGREGATOR_ENDPOINT/artifact/snapshots | jq -r '.[0].digest')
container_run demo mithril-client snapshot list
container_run demo mithril-client snapshot list --json > _ && cat _ | jq .
container_run demo mithril-client snapshot show $SNAPSHOT_DIGEST
container_run demo mithril-client snapshot show $SNAPSHOT_DIGEST --json > _ && cat _ | jq .
rm -rf ./download/db && container_run demo mithril-client snapshot download $SNAPSHOT_DIGEST --download-dir=/data/download
tree ./download/db
## Mithril Stake Distribution Command
MSD_HASH=$(curl -sL $AGGREGATOR_ENDPOINT/artifact/mithril-stake-distributions | jq -r '.[0].hash')
container_run demo mithril-client mithril-stake-distribution list
container_run demo mithril-client mithril-stake-distribution list --json > _ && cat _ | jq .
container_run demo mithril-client mithril-stake-distribution download $MSD_HASH --download-dir=/data/download
cat download/mithril_stake_distribution-$MSD_HASH.json | jq .
-
We have worked on the following issues:
- Aggregator does not always detect new immutable file #953: the problem does not occur anymore on the testing-preview network. We keep watching whether it happens again: if it does, we will keep investigating, otherwise we will close the issue
- End to end tests are flaky #954: after investigation, we have noticed that the aggregator is verifying single signatures against the AVK of the epoch preceding the expected one. We are still trying to fix the problem
- Certificate dates in metadata are not on the same timezone #946: we have started working on the issue and we will likely align the usage of all datetimes with a PR to be created shortly. This will be a breaking change and will require a re-genesis of the certificate chains
- Refactor MithrilStakeDistribution entity #967: we have worked on this issue and a PR will be created shortly
- Design recommended deployment model for SPOs on mainnet and preview/preprod #961: the idea that we had to use a relay seems to be harder to implement than we expected and we will likely implement a proxy instead (which will require an update of the signer)
-
We have also started brainstorming on the epic Benchmark performances of Mithril Aggregator #904:
- We need to run the aggregator on one machine and the signers on another
- We probably need more traces
- How can we simulate 3K signers?
  - We could spoof real SPOs from a Cardano network, but it would create bias with the signer registration and would be hard to implement
  - A better idea is to use a fake Cardano cli:
    - It would behave deterministically on multiple machines (epoch, and immutable files could be based on time)
    - We would create in advance the cryptographic materials needed to register the signers
    - The stake distribution would be computed from these pre-generated signers
- We can test:
  - Real signers: memory usage would be higher but the network would behave as in real life (with faster epoch)
  - Simulated signers: another program that would simulate the calls to the aggregator in a less realistic way, but with a finer control on some calls
  - We will probably experiment with both (depending on the needs/time)
  - A nice to have feature is to increase the number of signers during the test (in order to gradually reach the limit of the system)
- What do we measure?
  - Is the service fulfilled?
    - Check if the certificates/artifacts are created at expected pace
    - Is the aggregator working properly when clients are retrieving artifacts/certificates
  - Spot bottlenecks?
    - Monitor aggregator physical resources (load, memory, i/o)
    - Monitor real curves under stress vs expected nominal curves
  - Keep logs and resources records out in a centralized repository for further analysis
  - Explore the tools that we will probably have to develop in order to analyze and extract information along the way
- Other questions:
  - How to automate the stress tests (fully or partially)?
  - When to run the stress tests?
  - How to keep the tests up to date with new developments and features of the networks?
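The idea of gradually increasing the number of signers during a test can be sketched as a simple ramp-up schedule. The function name and the step values below are assumptions for illustration only; they are not part of any existing stress test tool.

```shell
# Hypothetical helper: print the number of signers to run at each step of a
# ramp-up, from a starting count up to a maximum, by fixed increments.
ramp_schedule() {
  start=$1
  max=$2
  step=$3
  # A non-positive step would never reach the maximum
  if [ "$step" -le 0 ]; then
    echo "step must be positive" >&2
    return 1
  fi
  count=$start
  while [ "$count" -lt "$max" ]; do
    echo "$count"
    count=$(( count + step ))
  done
  echo "$max"
}

# Made-up example: ramp from 500 signers up to the 3K signers target
ramp_schedule 500 3000 1000
```

Each printed value would be the number of real or simulated signers to run before taking the next set of measurements.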
-
We have reviewed and merged the following PRs:
- Fix Mithril Client multi-platform test workflow #965 which closes the issue Fix Mithril Client multi-platform test with new client interface #956
- add blog article about new client API #975: this PR and the PR below close the issue Update client documentation #897 and the epic Handle signed entity types in client #894
- Update Mithril Client documentation #974
- update client architecture documentation #976 which closes the issue Update architecture documentations for new types of data #898 and epic Design and implement generic signing/verification entity services #780 🥳
-
We have kept working on the following issues:
- Certificate dates in metadata are not on the same timezone #946: we are assessing the problems linked to the way dates are handled and we will create a PR that fixes the problem. We will probably have to re-genesis the certificate chains with these modifications
- End to end tests are flaky #954: We have noticed that there exists a discrepancy between the epoch used to sign the messages in the signers and the epoch that is used to verify the signature received on the aggregator. A draft PR that partially fixes the problem has been created Fix flaky end to end tests #977
-
We have also created the issue Reactivate Publish Results job in CI #978
-
We have reviewed and merged the following PRs:
- refacto digester #972: this closes the issue Refactoring client #960
- Enhance ImmutableDigesterError::NotEnoughImmutable error #971: this closes the issue Enhance ImmutableDigesterError::NotEnoughImmutable error #969
- Enhance terraform infrastructure #957 which closes the issue Enhance terraform infrastructure #930. All the networks have been upgraded
-
Additionally, we have noticed that the issue Aggregator does not always detect new immutable file #953 is not fixed after applying the fix: it has been reopened and we will investigate further on the problem shortly
-
Finally, we have also verified if the KES keys of the SPOs of the Mithril networks should be rotated: this was not needed this week
-
We have paired and merged the PR Stake distribution client subcommand #966 that closes the issue Create the sub-command for Mithril Stake Distribution in client #896 🎉. We have also merged the PR ete msd command #968 which closes the issue Adapt end to end tests to handle new types of data #899
-
The issue Aggregator does not always detect new immutable file #953 has also been closed following the merge of the PRs Fix retrieve open message from correct epoch #964 and Fix clean epoch Certifier service #970
-
We have also reviewed and validated the PR Enhance terraform infrastructure #957 which will be merged tomorrow
-
The issue Upgrade Cardano node to 8.1.1 #973 was created following the pre-release of the Cardano node 8.1.0. We will work on it shortly.
-
Additionally, we have kept working on the issue Fix Mithril Client multi-platform test with new client interface #956. We are fixing some rights problems on the Docker images when they restore/verify artifacts
-
Finally, we have started working on the issue Design recommended deployment model for SPOs on mainnet and preview/preprod #961. We have opted for a design with a relay instead of a proxy and we will use the socat tool that is widely distributed with Linux: this will help us keep a very light setup procedure for the mainnet
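For illustration, a relay built with socat could look like the sketch below, which only prints the command so that it can be inspected before being run. The helper name, port and host are hypothetical; the actual relay setup is not described in this entry.

```shell
# Hypothetical helper: print a socat command that listens on a local TCP port
# and forwards every incoming connection to a remote host and port.
relay_command() {
  listen_port=$1
  target_host=$2
  target_port=$3
  echo "socat TCP-LISTEN:${listen_port},fork,reuseaddr TCP:${target_host}:${target_port}"
}

# Made-up example: relay local port 8080 to a placeholder aggregator host
relay_command 8080 aggregator.example 443
```

Since socat forwards the TCP stream as-is, such a relay would need no application-level configuration, which is what keeps the setup light.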
-
We have paired on the PR Stake distribution client subcommand #966 that relates to issue Create the sub-command for Mithril Stake Distribution in client #896. The PR will be ready to merge shortly
-
We have also worked on the issues:
- Enhance terraform infrastructure #930: PR is ready to be reviewed
- Fix Mithril Client multi-platform test with new client interface #956: PR has been created
- Aggregator does not always detect new immutable file #953: PR has been created
- End to end tests are flaky #954: under investigation
- Refactor MithrilStakeDistribution entity #967: created and we will work on it tomorrow
-
We have paired on, reviewed and merged the PR refactoring client #963 which closes one task of the issue Refactoring client #960
-
We have also paired on the issue Create the sub-command for Mithril Stake Distribution in client #896 for which a PR will be created shortly
-
We have also worked on and reviewed the PR Enhance terraform infrastructure #957. We will merge it shortly
-
We have also done a performance test of the computation of the stake distribution on the mainnet following the improvements done by the Cardano team that will be released with the new 8.1.0 version: we have noticed a drastic improvement that made the computation much faster to execute (<1s when it used to take ~1h) 🎉 This closes the issue Check performance impact of new stake distribution command on the mainnet #962 and epic Compute Stake Distribution for mainnet #880. We will rollout the Cardano 8.1.0 version on the Mithril networks as soon as it is released
-
We have reviewed and merged the Upgrade Rust 1.70.0 #959 that closes the issue CI tests fail with Rust 1.70.0 #958
-
We have created and groomed the following issues:
-
During our team session, we talked about:
- Stake Distribution new computation and performances: we asked if the fix on the performance will be released with the new Cardano 8.1.0 and this will be the case 🎉 (see https://github.com/input-output-hk/ouroboros-consensus/pull/92#issuecomment-1576848835)
- New P2P configuration seems to not be completely deployed on the mainnet: this requires a different configuration for mainnet vs preview/preprod. We are waiting for a confirmation from the Cardano team
- Rolling update strategy: we discussed how we could test breaking changes with a rolling update strategy early in the testing process. An idea is to use an on demand test in the Github Actions that would run an end to end test with multiple signers of 2 different distributions and would make sure that the aggregator is able to produce certificates/artifacts (with specific protocol parameters to make sure all signers contribute in order to produce a valid multi signature). Another option that we explored is a blue-green strategy, but given the epoch duration it does not look very efficient to rapidly test the upgrades
- Hosting of mainnet Aggregator: the best option seems to keep ops on dev side at first with same SLO/SLA as for test networks, and work hand in hand with ops to implement best practices for monitoring/alerting and prepare for higher SLO
- We also talked about testing the implementation of a client compiled in WASM that would run in the browser (but we need to test that this is possible with the current cryptographic backend)
-
We have noticed that the CI is broken and fails on Rust tests following upgrade to version 1.70.0. The issue CI tests fail with Rust 1.70.0 #958 has been created and we will fix the problem shortly
-
We have worked on the issue Enhance terraform infrastructure #930 and a PR Enhance terraform infrastructure #957 has been created
-
Finally, we have also verified if the KES keys of the SPOs of the Mithril networks should be rotated: the operational certificate has been renewed for the Mithril Signer 2 in the release-preprod network
-
We have paired, reviewed and merged the PR mithril client snapshot commands #951 which closes the issues:
-
We have also created the following issues for the next iteration(s):
-
We have reviewed and merged:
-
We have also kept pairing on the issue Create the sub-command for Cardano Immutable Files Full in client #895: we have created the PR Ensemble/895/mithril client snapshot commands #951 which will be merged shortly
-
We have released the new distribution 2321.1 on the release-preprod network, and we will monitor that everything works as expected in the coming days (especially at next epoch transition)
-
We have merged the following PRs:
-
Additionally, we have paired on the issues:
- Create the sub-command for Cardano Immutable Files Full in client #895: the PR is almost ready and should be created shortly
- Add export path in Client CLI #512: done in the same PR
- Adapt end to end tests to handle new types of data #899: partially done in the same PR
- Update client documentation #897: partially done in the same PR
- Update architecture documentations for new types of data #898: partially done in the same PR
-
Finally, we have also reviewed the issues:
- Adapt the aggregator REST API to list certificates #892: we reviewed and applied some review comments on the PR Add Certificate list route to aggregator REST API #949. It should be ready for being merged tomorrow
- Switch to Pythagoras era #941: we will postpone this issue as we probably need a broader scope for it (and also it is no longer required to release the 2321 distribution)
-
We have noticed that there was a breaking change of the ledger state between Cardano node versions 1.35.7 and 8.0.0 which makes the node replay all blocks to rebuild the ledger state when being restored from an archive that was created by another version of the Cardano node. In order to mitigate the problem, we have created the following issues:
- Make Cardano node version part of the Mithril network configuration #947
- Add aggregator Cardano node version in snapshot artifact #948
- We also think that the Cardano node update should warn about such breaking changes
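The mitigation above can be sketched as a version check performed before restoring a snapshot. This is a minimal sketch under assumptions: the function name is hypothetical, and it supposes that the Cardano node version used to create the snapshot is available alongside the artifact.

```shell
# Hypothetical check: warn when a snapshot was created by a different
# Cardano node version than the one that will restore it.
check_node_version() {
  snapshot_version=$1  # version recorded with the snapshot (assumed field)
  local_version=$2     # version of the node restoring the snapshot
  if [ "$snapshot_version" = "$local_version" ]; then
    echo "ok: snapshot and local node both use Cardano node $local_version"
  else
    echo "warning: snapshot created with Cardano node $snapshot_version, local node is $local_version: the ledger state may have to be rebuilt" >&2
    return 1
  fi
}

check_node_version 8.0.0 8.0.0
```

The exact-match comparison is deliberately strict: not every version change breaks the ledger state format, so a real check could be more lenient.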
-
We have noticed some flakiness on the CI end to end tests, we are investigating the problem and we will create a fix shortly
-
We have worked on the issue Adapt the aggregator REST API to list certificates #892 and a draft PR Add Certificate list route to aggregator REST API #949 has been created
-
We have created the following optimization issues:
-
We have created a PR Fix curl commands for latest snapshot digest in docs #943 that fixes some commands of the documentation that were not working since our latest pre-release
-
Finally, we have also verified if the KES keys of the SPOs of the Mithril networks should be rotated: this was not needed this week
-
We have noticed some problems on the 2321.0-pre pre-release:
- A breaking change has been introduced without creating a new era (bump of Open API version): this has led to the signers running the previous version being prevented from communicating with the aggregator on the pre-release-preview network
- In order to fix the problem, a fix has been released shortly after with the 2321.1-pre, and the signers running the previous versions were back up again shortly after deployment
- Another issue has been identified: the signature of a new type should also have been included in a new era. Given the current protocol parameters of the network, until 3 signers were running the new version the open message certifying the Mithril Stake Distribution at the beginning of epochs got stuck and blocked the queue of open messages. A manual intervention has enabled the network to resume certifying.
- We have decided to postpone the release 2321 until the service is operating properly
- We have created the issue Switch to Pythagoras era #941 and created the associated PR Create new Pythagoras era #942
- Also it appears that from time to time, the open message queue is stuck and that the new immutable files are not detected properly. A restart of the aggregator fixes the problem. We are investigating this problem and we will also create an issue.
-
We have also created a PR to update the Mithril logo on our websites Update Mithril logo #940
-
We have reviewed and merged the following issues:
- Enhance state machines Aggregator/Signer #933: the PR Update state machines runtime in Aggregator/Signer #934 has been reviewed and merged
- Upgrade Cardano node to 8.0.0 #920: the PR Upgrade Cardano node to 8.0.0 #922 has been reviewed and merged
- Replace current_beacon with current_epoch in aggregator runtime #935: the issue has been created
-
We have also continued pairing on the issue Create the sub-command for Cardano Immutable Files Full in client #895 and worked on the refactoring of the commands. A PR will be created shortly.
-
Finally, we have created a new distribution pre-release 2321.0-pre that has been deployed on the pre-release-preview network
-
We have continued pairing on the issue Create the sub-command for Cardano Immutable Files Full in client #895 which requires a refactoring of the client. We will create the associated PR tomorrow. In the mean time, we have pushed the WIP work on this branch
-
We have also worked on the issues:
- Enhance state machines Aggregator/Signer #933: created the issue and the PR Update state machines runtime in Aggregator/Signer #934 which is ready to be reviewed
- Upgrade Cardano node to 8.0.0 #920: the PR Upgrade Cardano node to 8.0.0 #922 is ready to be reviewed
- Update architecture documentations for new types of data #898: in progress
-
We have reviewed and merged the PR rename signed entity service #929 which closes the issue Enhance MessageAdapter for Artifact in aggregator REST API #925
-
The issue Remove certificate hash from Snapshot #932 has been created and we will attempt to work on it during this iteration
-
We have also paired on the issues:
- Create the sub-command for Cardano Immutable Files Full in client #895: we started adapting the client to handle multiple commands and a PR will be created shortly
- Upgrade Cardano node to 8.0.0 #920: we have worked on the modification of the aggregator state machine required for this update, and started implementing it
-
We have paired on the issue Enhance MessageAdapter for Artifact in aggregator REST API #925 and the PR rename signed entity service #929 is almost ready to be merged
-
We have created a fix for the backward compatibility of the stake distribution computation with the new cardano cli command in the PR Fix stake distribution computation for zero stake pools #931 which has been reviewed and merged: the zero stake pools must be removed from the computation or it could lead to a discrepancy of the AVK between new and legacy computations
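The filtering rule described above can be illustrated with a minimal sketch. It assumes a hypothetical plain-text input with one `pool_id stake` pair per line (this is not the cardano-cli output format) and a hypothetical helper name; the actual fix in #931 lives in the stake distribution computation and may differ.

```shell
# Hypothetical helper: drop pools whose stake is zero from a stake distribution.
# Assumed input format (for illustration only): one "pool_id stake" pair per line.
remove_zero_stake_pools() {
  awk '$2 + 0 > 0 { print $1, $2 }'
}

# Example usage with made-up pool ids: the zero stake pool is filtered out
printf 'pool1abc 1000\npool1def 0\npool1ghi 250\n' | remove_zero_stake_pools
```

Applying the same filter on both the new and the legacy computation keeps the set of pools identical, which is what the AVK depends on.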
-
We have also created the issue Enhance terraform infrastructure #930
-
We have worked on the following issues:
- We have fixed a problem with the handling of legacy snapshot routes in the aggregator with the PR Fix legacy snapshot routes redirect #926 which was reviewed and merged
- We have created and paired on the technical issue Enhance MessageAdapter for Artifact in aggregator REST API #925. The PR rename signed entity service #929 has been created and we will keep working on it tomorrow
- We have created and worked on the issue Adapt the explorer to handle new aggregator /artifact routes #927, the PR Add mithril stake distribution to explorer #928 has been created, reviewed and merged. The new version of the explorer is now live on https://mithril.network/explorer 🚀
-
We have also prepared the demo path for the iteration:
- Introduction
- Explanation of the framework for signing multiple types of data
- Showcase of a Mithril network signing 2 types of data on a devnet
- Next steps
- Q&A
- Conclusion
# Demo: Sign multiple types of data
---
# Setup demo
## Download source (if needed)
git clone https://github.com/input-output-hk/mithril.git
#or
git clone git@github.com:input-output-hk/mithril.git
## Checkout correct branch
cd mithril/
git switch jpraynaud/920-upgrade-cardano-node-8.0.0
git cherry-pick 180f137b4ad5be6fdaaaebbfb7dc09049e1f24e9
cargo build --release
## Build docker images
cd mithril/
docker rmi $(docker images -q) --force
docker rm -vf $(docker ps -a -q)
### Build docker images distribution-1 (Thales era only)
rm -f mithril-aggregator/mithril-aggregator && rm -f mithril-signer/mithril-signer && rm -f mithril-client/mithril-client
make build && cp target/release/mithril-aggregator mithril-aggregator/mithril-aggregator && cp target/release/mithril-signer mithril-signer/mithril-signer && cp target/release/mithril-client mithril-client/mithril-client
docker image rm mithril/mithril-signer-demo mithril/mithril-client-demo mithril/mithril-aggregator-demo --force
docker build -t mithril/mithril-aggregator-demo -f mithril-aggregator/Dockerfile.ci .
docker build -t mithril/mithril-signer-demo -f mithril-signer/Dockerfile.ci .
docker build -t mithril/mithril-client-demo -f mithril-client/Dockerfile.ci .
---
# Demo: Run demo
cd ../devnet-demo
## Create functions
function stop_devnet {
./devnet-stop.sh
docker stop $(docker ps -a -q)
rm -rf artifacts/node-bft1/mithril/aggregator
rm -rf artifacts/node-pool1/mithril/signer
rm -rf artifacts/node-pool2/mithril/signer
rm -rf artifacts/node-pool3/mithril/signer
}
function start_devnet {
./devnet-stop.sh && NODES=cardano SLOT_LENGTH=0.35 EPOCH_LENGTH=120 NUM_POOL_NODES=3 ./devnet-run.sh
}
function monitor_devnet {
watch -c "NODES=cardano ./devnet-query.sh"
}
function epoch_devnet {
CARDANO_NODE_SOCKET_PATH=artifacts/node-bft1/ipc/node.sock ./artifacts/cardano-cli query tip --cardano-mode --testnet-magic 42 | jq '.epoch'
}
function containers_list {
watch -c "docker ps --format '{{.Names}} - {{.Status}}' | sort"
}
function container_up {
DISTRIBUTION_VERSION=$1 ERA_READER_ADAPTER_PARAMS=$(cat era-markers/config.json) docker-compose -f docker-compose-demo.yaml --profile $2 up --remove-orphans --force-recreate -d --no-build
}
function container_down {
docker stop $2-$1
}
function container_run {
DISTRIBUTION_VERSION=$1 ERA_READER_ADAPTER_PARAMS=$(cat era-markers/config.json) docker-compose -f docker-compose-demo.yaml run $2 $3 $4
}
function container_exec {
docker exec -it $2-$1 $3 $4
}
function container_logs {
docker logs -f $2-$1 2>/dev/null
}
function monitor_versions {
watch -c "./sqlite3 -table -batch artifacts/node-bft1/mithril/aggregator/stores/monitoring.sqlite3 < stake_signer_version.sql | head -n 50"
}
function era_activate_thales {
cat > era-markers/markers.json << EOF
[
{"name": "thales", "epoch": 1}
]
EOF
cat era-markers/markers.json | jq .
}
function era_announce_pythagoras {
cat > era-markers/markers.json << EOF
[
{"name": "thales", "epoch": 1},
{"name": "pythagoras", "epoch": null}
]
EOF
cat era-markers/markers.json | jq .
}
function era_activate_pythagoras {
EPOCH_ERA_SWITCH=$(( $(epoch_devnet) + 1))
cat > era-markers/markers.json << EOF
[
{"name": "thales", "epoch": 1},
{"name": "pythagoras", "epoch": $EPOCH_ERA_SWITCH}
]
EOF
cat era-markers/markers.json | jq .
}
function era_remove_thales {
EPOCH_ERA_SWITCH=$(( $(epoch_devnet) - 1))
cat > era-markers/markers.json << EOF
[
{"name": "pythagoras", "epoch": $EPOCH_ERA_SWITCH}
]
EOF
cat era-markers/markers.json | jq .
}
function query_aggregator_db {
sqlite3 ./artifacts/node-bft1/mithril/aggregator/stores/aggregator.sqlite3 $1
}
function query_aggregator_db_file {
sqlite3 -table ./artifacts/node-bft1/mithril/aggregator/stores/aggregator.sqlite3 < $1
}
function watch_query_aggregator_db_file {
watch -c "sqlite3 -table ./artifacts/node-bft1/mithril/aggregator/stores/aggregator.sqlite3 < $1"
}
function list_certificates {
curl -s http://localhost:8080/aggregator/certificates | jq '.[0:5]'
}
function get_certificate {
curl -s http://localhost:8080/aggregator/certificate/$1 | jq .
}
function list_artifact_cardano_immutable_files_full_snapshots {
curl -s http://localhost:8080/aggregator/artifact/snapshots | jq '.[0:5]'
}
function get_artifact_cardano_immutable_files_full_snapshot {
curl -s http://localhost:8080/aggregator/artifact/snapshot/$1 | jq .
}
function list_artifact_mithril_stake_distributions {
curl -s http://localhost:8080/aggregator/artifact/mithril-stake-distributions | jq '.[0:5]'
}
function get_artifact_mithril_stake_distribution {
curl -s http://localhost:8080/aggregator/artifact/mithril-stake-distribution/$1 | jq .
}
## Reset demo if needed
stop_devnet
## Start Cardano network
start_devnet
## Start Mithril network
era_activate_thales
container_up demo mithril-aggregator
container_up demo mithril-signer-1
container_up demo mithril-signer-2
container_up demo mithril-signer-3
container_run demo mithril-aggregator-genesis
containers_list
container_logs demo mithril-signer-1
## Query Aggregator database
query_aggregator_db ".tables"
query_aggregator_db ".schema"
query_aggregator_db_file entity_types.sql
watch_query_aggregator_db_file watch.sql
## List Signed Artifacts / Mithril Stake Distribution
list_artifact_mithril_stake_distributions
LAST_MITHRIL_STAKE_DISTRIBUTION=$(list_artifact_mithril_stake_distributions | jq -r '.[0]')
get_artifact_mithril_stake_distribution $(echo $LAST_MITHRIL_STAKE_DISTRIBUTION | jq -r '.hash')
get_certificate $(echo $LAST_MITHRIL_STAKE_DISTRIBUTION | jq -r '.certificate_hash')
## List Signed Artifacts / Cardano Immutable Files Full Snapshot
list_artifact_cardano_immutable_files_full_snapshots
LAST_CARDANO_IMMUTABLE_FILES_FULL_SNAPSHOT=$(list_artifact_cardano_immutable_files_full_snapshots | jq -r '.[0]')
get_artifact_cardano_immutable_files_full_snapshot $(echo $LAST_CARDANO_IMMUTABLE_FILES_FULL_SNAPSHOT | jq -r '.digest')
get_certificate $(echo $LAST_CARDANO_IMMUTABLE_FILES_FULL_SNAPSHOT | jq -r '.certificate_hash')
-
We have reviewed and merged the PR Implement new Stake Distribution computation #921 which closes the issue Implement new stake distribution computation in the ChainObserver #919
-
We have also reviewed, paired and merged the PR Implement Artifact routes #924 which closes the issue Adapt the aggregator REST API to retrieve list/detail of signed entities by signed entity type #893
-
Additionally, during our team session we have discussed the following topics:
- Cardano cli command performance issue for computing the stake distribution: even though the new command is much faster than the previous computation we used, it still takes 1 h on the mainnet to be computed with 2 cores used at 100%. This performance should be fixed when we release for the mainnet as we want to minimize the performance impact of a Mithril signer on a Cardano block producing node. We will file an issue on the Cardano node repository
- We have also talked about the assumption that we can make regarding honest Cardano stakes and we will probably keep a 60-65% assumption
- Regarding KYC for the SPOs, we think that:
- SPOs that will be whitelisted at launch should benefit from a Pioneer badge that identifies them as "trusted" parties
- We could probably also provide a badge for them to display on their website
- We could work with Pool Tool, Cexplorer and Blockfrost to implement tools that provide visibility for the SPOs involved in Mithril
-
Finally, we have reviewed and merged the PR Update dependencies #923
-
We have worked on the following issues:
- Implement new stake distribution computation in the ChainObserver #919: the PR Implement new Stake Distribution computation #921 is ready to be reviewed
- Upgrade Cardano node to 8.0.0 #920: the end to end test has been accelerated and now takes ~1'30", when it used to take 2'30" 💪. However, there are some modifications that need to be implemented in the signer and aggregator state machines so that the current immutable beacon is not used any more to detect state transitions. An issue will be created shortly with the required update
- Adapt the aggregator REST API to retrieve list/detail of signed entities by signed entity type #893: the PR Implement Artifact routes #924 has been drafted and we will keep working on it next week
-
Also, we have created a new PR Update dependencies #923 for updating the dependencies of the repository that is ready to be reviewed and merged
-
Finally, we have also verified if the KES keys of the SPOs of the Mithril networks should be rotated: this was not needed this week
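As an illustration of this kind of check, here is a minimal sketch that computes how many KES periods remain before an operational certificate expires. The helper name is hypothetical and the two constants are assumptions taken from the mainnet Shelley genesis; they should be checked against the genesis file of the network being verified.

```shell
# Hypothetical helper: number of KES periods left before an operational certificate expires.
# Arguments: current slot, KES period at which the certificate was issued.
# Assumed genesis values (mainnet Shelley genesis; verify them for your network):
SLOTS_PER_KES_PERIOD=129600
MAX_KES_EVOLUTIONS=62

kes_periods_remaining() {
  current_slot=$1
  opcert_start_period=$2
  # Integer division gives the KES period the chain tip is currently in
  current_period=$(( current_slot / SLOTS_PER_KES_PERIOD ))
  echo $(( opcert_start_period + MAX_KES_EVOLUTIONS - current_period ))
}

# Example with made-up numbers: certificate issued at KES period 700, tip at slot 95000000
kes_periods_remaining 95000000 700
```

A result close to zero means the keys should be rotated and a new operational certificate issued.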
-
We have kept pairing on the issue Handle multiple signed entity types in aggregator runtime #907 and we have merged the PR Handle multiple Signed Entity Types in aggregator runtime #908. We have seen the first certificates and signed entities created on the testing-preview network with the first types of data to be signed: Mithril Stake Distribution and Cardano Immutable Files Full Snapshot 🥳
-
We have also worked on the epic Compute Stake Distribution for mainnet #880:
- Implement new stake distribution computation in the ChainObserver #919: issue was created and the associated PR Implement new Stake Distribution computation #921 is ready to be reviewed
- Upgrade Cardano node to 8.0.0 #920: issue was created and the associated PR Upgrade Cardano node to 8.0.0 #922 is ready to be reviewed
- We have mainly paired on the issue Handle multiple signed entity types in aggregator runtime #907 for which we have created the PR Handle multiple Signed Entity Types in aggregator runtime #908:
- The PR is almost complete, and should be merged tomorrow
- The end to end tests are green (which means that we are now able to sign multiple types of data end to end 🎉)
- We are still in the process of:
- Adding missing tests in the runner of the aggregator to make sure we compute correctly the next open message to sign
- Fixing some integration tests that need to be adapted to reflect the usage of multiple types of data
-
Following the request from a SPO, we have created a poll on discord in order to check if the community is OK to modify the nomenclature of the pre-release tags by suffixing with -pre instead of -prerelease. We will follow up with this poll and create an issue if we decide to implement it
-
We have worked on the following issues:
- Adapt runtime to use signable builder service in signer #854: we have reviewed and merged the associated PR Use Signable Builder Service in signer #903, which closes the issue
- Adapt runtime to use artifact builder service in aggregator #869: we have reviewed, paired and merged the PR Use Artifact Builder Service in aggregator #906, which closes the issue
- Handle multiple signed entity types in aggregator runtime #907: we have created this issue that was missing and we have started working on it
-
We have released the new 2318.0 distribution and deployed it on the release-preprod network. Everything is behaving as expected and the network has already appended new certificates to the chain
-
We have also created (or updated existing) epics for the mainnet release:
-
Additionally, we have worked on the issue Adapt runtime to use artifact builder service in aggregator #869 and created the PR Use Artifact Builder Service in aggregator #906 that is ready to be reviewed, and that should be merged shortly
-
Finally, we have also verified if the KES keys of the SPOs of the Mithril networks should be rotated: we have rotated the keys and created a new operational certificate for the signer-2 of the pre-release-preview network
-
We have reviewed and merged the PR Cardano Immutable Files Full Artifact builder in aggregator #900 which closes the issue Implement artifact builder for Full Immutables Snapshot #871
-
We have also paired on the following issues:
- Adapt runtime to use signable builder service in aggregator #853: we reviewed and merged the PR Use Signable Builder Service in aggregator #901
- Adapt runtime to use signable builder service in signer #854: we created the PR Use Signable Builder Service in signer #903 which is ready to be reviewed
-
Additionally, we had a workshop about making the Signer Registration process on-chain:
- We will use the Cardano chain as a decentralized broadcast channel for the signer registration
- The signer and aggregator nodes will have to keep track of the latest OpCert counter for SPOs at each epoch (as explained in the issue Verify that the OpCert used for registration is the latest #872) and enforce the latest counter was used at registration epoch
- The data will be stored in TxDatum/Metadata of a transaction that will occur through a smart contract (only needed to have a common address for the Utxos)
- The storage requirement is at most 2KB of compressed data per epoch to store: the Mithril Signer Verification Key, its KES Signature and the associated Operational Certificate
- This translates to a cost of ~300K Lovelace which is ~0.10 USD per epoch, i.e. ~1 USD per month
- We will need to have a chain observer (using txpipe crate) to observe transactions taking place for such signer registrations in the Mithril nodes
- Here is a schema of the design:
-
Finally, we have kept monitoring the new pre-release 2318.0-pre-release: everything is still working as expected and we will release the new 2318 distribution tomorrow
-
Today, we have created the following items:
- Beta Release Mainnet Milestone: a milestone that will help us work toward releasing to the mainnet
- Adapt the aggregator REST API to list certificates #892: groomed the issue
- Adapt the aggregator REST API to list signed entities by signed entity type #893: groomed the issue
- Handle signed entity types in client #894: groomed the epic and sub issues
-
We have also merged the PR Mithril Stake Distribution Signable builder #885 which has closed the issue Implement signable builder for Mithril Stake Distribution #851
-
Additionally, we have paired on the issue Implement artifact builder for Full Immutables Snapshot #871 and created the PR Cardano Immutable Files Full Artifact builder in aggregator #900 which is ready to be merged
-
Finally, we have monitored the new pre-release 2318.0-pre-release: everything is working as expected and we will probably release the new 2318 distribution tomorrow or by end of week at most
-
We have worked on the following PRs:
- immutable signable #886: it has been reviewed and merged
- Mithril Stake Distribution Artifact builder in aggregator #887: it has been reviewed and merged
- Make doc ci reports & fail if there's an error #890: it has been reviewed and merged
- add signable builder in signer #891: we have paired, reviewed and merged it
- Make Artifact usable as trait object #889: it is ready to be reviewed and merged
- Mithril Stake Distribution Signable builder in signer #885: it is ready to be reviewed and merged
-
We have also created a new distribution pre-release: 2318.0-prerelease. We have also run the clients test workflow with success: https://github.com/input-output-hk/mithril/actions/runs/4860119493. We expect to release it by the end of the week
-
During our team session, we discussed the following subjects:
- Q&A: we will organize a meeting with the Q&A of the Core tech tribe and conduct a sanity check with them
- Delivery board: it is accessible here
- Snapshot the ledger state as a new type of data (for this we will need to be able to compute ledger state deterministically, but it looks feasible)
- Relevancy of using BitTorrent or IPFS as a peer to peer (and less expensive) artifact delivery method
- SPO deployment model: we will organize a meeting with the Cardano node team to explain the model we have currently designed:
- Security impact of the Mithril Signer/Relay deployment on the SPO standard infrastructure
- Performance impact on the Cardano Block Producing node (especially, when the stake distribution is computed)
- Infrastructure for mainnet and the tasks we need to accomplish in order to get ready (and support we need from SRE)
- Security audit and plans to get ready on time for release
-
We have created some new issues:
- Upgrade Cardano node to 1.35.7 #881
- Append Next AVK to all protocol messages in Signable Builder Services #888: this will be implemented next week and will help minimize the occurrences of a gap in the certificate chain
-
We have worked on the following PRs:
- Implement Artifact Restorer #875: it has been reviewed and merged
- Update Cardano node to 1.35.7 #882: it has been reviewed and merged
- Update dependencies #883: it has been reviewed and merged
- Cleanup legacy single signature store #884: it has been reviewed and merged
- Mithril Stake Distribution Signable builder in aggregator #867: it has been reviewed and merged
- Greg/852/immutable signable #886: it has been reviewed and is ready to be merged
- Mithril Stake Distribution Artifact builder in aggregator #887: it has been reviewed and is ready to be merged
-
We have made a final review of the PR Certifier service #866 and merged it 🎉 The issue Implement Certifier service in aggregator #850 has been closed
-
We have also paired on and merged the issues:
- Add OpenMessage domain type #878: the PR Add open message domain type #879 has been merged
- Verification key discrepancy between signer and aggregator #873: the PR Fix Signer Registration Discrepancy #877 has been merged
-
Additionally, we have created:
- A Delivery Mainnet Board that will help us track issues that must be completed on our path to mainnet deployment
- A Bugs Board that will help us track bugs
- A new issue Use new cardano-cli stake distribution command #880 that will be implemented after the release of the next version of the Cardano node
-
Finally, we have discussed the difficulties that we have in implementing the signable builder trait for the Mithril Stake Distribution in the signer. We think that we will probably need to store signers registrations with their stakes on the signer node. This means that the epoch settings route will need to include the signers registrations for this epoch and the next epoch; thus they will not be required in the pending certificate (same for the protocol parameters) and they would be kept only for compatibility at first. We have also talked about creating a new era once we have implemented the epic Design and implement generic signing/verification entity services #780 in order to clean up some legacy code
-
We have paired and reviewed the following issues:
- Implement Certifier service in aggregator #850: the associated PR Certifier service #866 is now ready to be merged 🎉. We will merge it tomorrow as GitHub actions workflows were very busy this afternoon
- Verification key discrepancy between signer and aggregator #873: a PR is being implemented that will fix the problem
-
Also, when closing the issue #850, we will close:
-
The re-genesis of the testing-preview network has worked as expected and new certificates are appended to the chain. Also we have understood the origin of the problem of the issue Verification key discrepancy between signer and aggregator #873 which has been edited with a solution for a long term fix. We will implement it shortly
-
We have mainly reviewed and paired on the issue Implement Certifier service in aggregator #850: we have almost finished the unit tests of the certifier service, and all the other tests (unit, integration, end to end) are green. We expect to merge the PR Certifier service #866 tomorrow
-
The bug Verification key discrepancy between signer and aggregator #873 has occurred during the weekend and it created a gap in the certificate chain of the testing-preview network. We had to re-genesis the network and expect it to resume signing tomorrow
-
We have mainly worked on fixing the tests and the bugs in the PR Certifier service #866 of the issue Implement Certifier service in aggregator #850 and making the network sign certificates correctly. We have made good progress as the end to end tests are working: so far there are a few unit/integration tests that need to be fixed or that are flaky, and some cleanup that will be done tomorrow. We will be able to merge the PR shortly
-
During our team session, we have discussed the following topics:
- How Mithril can help Daedalus Turbo:
- Using another compression algorithm for the archive: good idea, we will try to implement it as described in this issue Use zstandard compression for snapshot archives #876
- Incremental snapshots: this is already part of our road-map. The idea is to certify all immutable files independently and to provide range restoration
- Using BitTorrent with Mithril: this does not sound relevant as 1/ we would build a second P2P network to synchronize blocks (which the Cardano network already does), and 2/ users downloading an archive to bootstrap a node do not want to share their node/wallet data on such a network (which would considerably limit the efficiency)
- Signer deployment model: we will check which team we need to get in touch with to validate our design
- Full node verifier: this is a Mithril verifier running inside a full node (i.e. which is aware of the stake distribution). It would be able to process a lighter verification process
- Smart contracts: we have decided to design and PoC a smart contract for signer registration. This will be the subject of our next team session
-
We have reviewed and merged the PR Add flake.nix #811. We have created a new issue for implementing the nix build in the CI workflow Build static binaries in CI with nix #874
-
We have worked on the following PRs:
- Certifier service #866: we have fixed all the unit tests and the simple integration test. Also we have started fixing the remaining integration tests, which will be fixed next week
- Implement Artifact Restorer #875: we have created the PR that is related to issue Define the interface of the generic entity service for verification #868 and that will be reviewed next week
-
Additionally, we have noticed a bug that occurs from time to time on the testing-preview network as described in issue Verification key discrepancy between signer and aggregator #873. We have started the investigation and we will keep investigating next week
-
Finally, we have also verified if the KES keys of the SPOs of the Mithril networks should be rotated: this was not needed this week
- We have worked on the demo path:
- Introduction
- Architecture design for signing multiple types of data
- Explanation of the new relational database design of the aggregator
- Showcase of the new database of the aggregator in the devnet
- Explanation of the new services design of the aggregator
- Next steps
- Q&A
- Conclusion
-
We have also paired on the issue Implement Certifier service in aggregator #850 and are almost ready to merge the PR Certifier service #866
-
Following a conversation with the cryptographers, we have created an issue for enhancing the security of the signer registration Verify that the OpCert used for registration is the latest #872
-
We have merged the PR Implement Signable & Artifact Builders #865 of the issue Define the interface of the generic entity service #847. We will keep working on this issue and implement the Artifact Restorer shortly
-
We have also kept pairing on the issue Implement Certifier service in aggregator #850 for which we have made good progress. We expect to merge the PR Certifier service #866 by end of week
-
We have paired on the following issues:
- Implement Certifier service in aggregator #850: we have made good progress and we should be able to merge the PR Certifier service #866 tomorrow
- Define the interface of the generic entity service #847: the PR Implement Signable & Artifact Builders #865 is ready to merge and implements the Signable and Artifact builders in the signer and aggregator. It will be merged tomorrow
- Implement signing entity service for Mithril Stake Distribution #851: we have started implementing the Signable builders for this message type. A PR will be created shortly
-
We have also discussed the signer deployment model on the mainnet. We are waiting for an analysis from the cryptographers to better understand the risk associated with the cryptographic material that we use (KES keys)
-
We have performed a re-genesis of the testing-preview network as a gap in the certificate chain occurred this weekend. It appears that there was a discrepancy between the verification keys in the aggregator and in the signer. We will investigate to find out why this problem happened. In the meantime the network has started creating new certificates
-
We have reviewed and merged the issue Migrate/adapt single_signature table #829, which closes the epic Implement relational store in aggregator #779 🎉
-
Finally, we have paired on the issues:
- Implement Certifier service in aggregator #850: the PR Certifier service #866 has been created. We will keep pairing on it tomorrow
- Define the interface of the generic entity service #847: the PR Implement Signable & Artifact Builders #865 has been created as a first draft. We'll also keep pairing on it tomorrow
-
We have released the new distribution 2315.0 🎉
-
Additionally, we have kept pairing on these issues today, and we will continue next week:
- Implement Certifier service in aggregator #850: The trait that the Certifier service must implement is defined. We will start working on its implementation next week. We have also modified the design of the state machine of the aggregator so that it can support the new Certifier service and make dynamic calls to the SignableBuilder and ArtifactableBuilder given the signed entity type.
- Define the interface of the generic entity service #847: We have almost completed the definition of the interfaces and we will complete the schema next week
- We have decided to create a new Signed Entity Service whose responsibilities will be:
- Call the artifacts builder adapters
- Create signed entities from artifacts
- Store signed entities
-
We have prepared the pre-release of the new 2315 distribution: 2315.0-prerelease that is currently tested on the pre-release-preview network. Everything is working as expected and we should be able to release the new distribution by end of the week
-
We have reviewed and merged the PR Create/Migrate signer store #861 which closes the issue Create signer table #814
-
Additionally, we have kept pairing on the issues:
- Implement Certifier service in aggregator #850: we have completed the definition of the scope of the service and we will continue working on it tomorrow
- Define the interface of the generic entity service #847: we have refined the definition of the interfaces and here is a new draft
-
We have completed the issue Implement Tick service in aggregator #849 by reviewing and merging the PR Add ticker service #860
-
We have also paired on the issue Implement Certifier service in aggregator #850 and we started defining the responsibility of the Certifier and the associated trait. We will keep working on it tomorrow
-
Finally, we have paired on defining the scope of the entity services as described in issue Define the interface of the generic entity service #847. Here is a draft that we have produced for the design of these services, that we will complete tomorrow:
-
We have worked on preparing the PI2 next steps:
- Finalize signing generic data
- Releasing to mainnet (infrastructure, testing, documentation)
- Implementing the signing of Cardano stake distribution
-
We have also reviewed and merged the following issues:
- Extend SignedEntityType to hold beacon or epoch #848: we have paired on the PR embed beacon in open_message type #857 and we have merged it with the new design
- Migrate/adapt signed_entity table #816: we have made the modifications required by the updated aforementioned PR #857, and we have merged it
-
We have started working on new issues:
- Create signer table #814: it should be merged tomorrow
- Implement Tick service in aggregator #849: it should be merged tomorrow
-
We will then work on the following topics in pairing sessions:
-
We have worked on the review of this iteration, and prepared:
- A high level architecture diagram for signing generic data:
- A summary of the aggregator structure
- A timeline for the implementation
-
Additionally, we have merged the following PRs:
-
Finally, we have worked on the following issues:
-
We have worked on the following PRs:
- Create/Migrate signer_registration store #838: it has been merged, but there was a problem occurring during the database migration at aggregator startup
- Remove store retention limit in aggregator infra #845: a fix to the aforementioned problem which is due to the pruning being incompatible with the use of foreign keys has been merged. The pruning has been deactivated on the test networks and will be handled with cascade delete in the SQLite database and handled at a higher level in the node
- Update Cardano node to 1.35.6 #844: it has been merged
-
We have completed the first version of the design of the architecture that will provide signature for generic types of data for epic Design and implement generic actors #780, and we have created the following issues:
- Define the interface of the generic actors #847
- Extend SignedEntityType to hold beacon or epoch #848
- Implement Tick service in aggregator #849
- Implement Certifier service in aggregator #850
- Implement actor for Mithril Stake Distribution #851
- Implement actor Full Immutables Snapshot #852
- Adapt runtime to use actors in aggregator #853
- Adapt runtime to use actors in signer #854
-
We have discussed an optimization that we will have to implement before going to mainnet and created an issue for it, Signer/Aggregator nodes sign only when Cardano node is synchronized #846, which will alleviate the burden of bootstrapping a Cardano node and a Mithril signer node at the same time
-
Today, we have completed our work on the issues:
- Create open_message table #827: it has been merged
- Migrate/adapt signer_registration table #828: it is ready to be merged, but GitHub actions are slowed down today
-
We have also continued working on grooming the next phase of signing generic data on top of the Mithril infrastructure
-
Additionally, we have prepared the upgrade of the Cardano node to `1.35.6` on the test networks and the `devnet` as described in issue Upgrade Cardano node to 1.35.6 #843
-
We have reviewed and paired on the current issues:
- Create open_message table #827: it should be merged tomorrow
- Migrate/adapt signer_registration table #828: it should be merged tomorrow
-
We have started grooming the epic Design and implement generic actors #780 and we will continue in the following days
-
During our team session we talked about:
- The problem that an SPO met last week and that prevented them from signing on the `pre-release-preview` network because the immutable files produced by the block producer were corrupted
  - This led us to dive deeper into the mechanism of committing blocks to the immutable files and of the security parameter `k` of the Cardano protocol
  - We talked about the possibility:
    - To sign a bigger part of the latest immutable files that are currently being built
    - To deliver a snapshot that does not embed the ledger state and the latest immutable files
  - Overall, the Cardano node has a security mechanism that avoids using corrupted files so it should not be a problem to embed the latest immutable files
- We discussed the dev SPO that was created on the `preview` Cardano network and that is referenced as root peer: we will try to retire the SPO
- Also, we talked about deploying to the `mainnet` and the a priori unnecessary SLA to put in place during the ramp up period
-
We have released the `2313.0` distribution on the `release-preprod` network 🚀. The documentation has been rotated as well with the merge of the PR Update current documentation #842
-
We have fixed a panic on the aggregator node of the `testing-preview` network because a function was not implemented yet. The problem has been fixed with the merge of the PR Fix Epoch Setting store panic #841
-
Additionally, we have kept working on the implementation of the issues:
- Create open_message table #827: first review done, should be ready to merge early next week
- Migrate/adapt signer_registration table #828: first review done, should be ready to merge early next week
-
Finally, we have proceeded to the rotation of the KES keys for some signers of the `testing-preview` and `release-preprod` networks
-
We have reviewed and merged the following issues:
-
We have also worked on the following issues:
-
During the test phase of the issue Migrate/adapt certificate table #817, we have noticed that the SQLite version running on the GitHub runners is not the same as the minimum version that we advertise (i.e. `3.40+`). An issue has been created to address this problem: SQLite compatibility in aggregator #837
-
The `2313.0-prerelease` is running as expected on `88%` of the stakes of the `pre-release-preview` network and will be released tomorrow
-
Today, we continued working on the issues:
- Refactor Dependency Injection in Aggregator #823 is ready to be merged, but will be merged tomorrow because of an incident on GitHub Actions
- Migrate/adapt epoch_setting table #813 will be merged tomorrow after Refactor Dependency Injection in Aggregator #823 is merged
- Create signed_entity_type table #815 will be merged tomorrow after Refactor Dependency Injection in Aggregator #823 is merged
- Migrate/adapt certificate table #817 will be reviewed and probably merged tomorrow after Refactor Dependency Injection in Aggregator #823 is merged
-
We have additionally created the new distribution pre-release `2313.0-prerelease`:
- It is deployed on the `pre-release-preview` network and SPOs are currently in the process of upgrading their nodes
- We have verified that the client works as expected (including the Docker version) in this run of the workflow: https://github.com/input-output-hk/mithril/actions/runs/4551634889
- We expect the distribution to be released by end of the week
-
Also, the SPO that had a hard time signing the snapshots because of a discrepancy in the immutable files is now signing snapshots 🎉
-
Today, we have kept working on the issues:
- Refactor Dependency Injection in Aggregator #823 is almost completed and expected to be merged tomorrow
- Migrate/adapt certificate table #817: the migration of the certificates has been reviewed and the development of the provider is in progress
-
We have also investigated a weird behavior on an SPO signer node which was unable to sign the messages correctly although its block producer Cardano node was working properly and producing new blocks:
- The error message received is `core error: 'A provided signature is invalid'`
- We noticed that one set of immutable files `02705` was different between the block producer and:
  - The associated relay on the SPO infrastructure
  - The block producers and relays of the `testing-preview` and `pre-release-preview` Cardano nodes that we operate
- After restarting its block producer, it appears that the node found out that there was a discrepancy with the immutable files and fixed it
- We are waiting for the feedback from the SPO to check that its signer node is now able to sign snapshots
-
We have worked on the following issues:
- Cleanup multi-signer in Aggregator #824 has been reviewed and merged
- Refactor Dependency Injection in Aggregator #823 is almost ready for final review and merge 🎉
- Migrate/adapt epoch_setting table #813 has been reviewed. The wiring into the new dependency injector will be done once Refactor Dependency Injection in Aggregator #823 is merged
- Create signed_entity_type table #815 is ready for review. The wiring into the new dependency injector will be done once Refactor Dependency Injection in Aggregator #823 is merged
- Migrate/adapt certificate table #817 is in progress and should be ready to be reviewed tomorrow
- We have fixed a security issue on OpenSSL by merging the PR Update Rust dependencies #832
-
A bug has been reported regarding static builds: Debian package does not install cleanly on older ubuntu versions #834. We will try to fix it shortly
-
During our team sessions we discussed:
- The PR Add flake.nix #811:
  - This will help us build static binaries
  - It is normal that the build does not work before it is merged
  - We lack some documentation: it will be added to the PR and then some explanation on how to use the nix shell will be added on the documentation website
  - We keep it for Linux and macOS at the moment. We will work on Windows adaptation later if it makes sense
  - We will also implement the nix shell in the CI to benefit from the static builds
- We have also talked about the new use case which will certify the Cardano stake distribution:
  - We need to make the difference between the Mithril stake distribution (that needs to be signed prior to being used and that represents only the SPOs running signer nodes) and the Cardano stake distribution (that lists all the SPOs)
  - We will be able to sign the most recent version of the stake distribution (end of previous epoch)
  - We will provide a JSON map of the stake distribution (`pool id <-> stake in lovelace`) and sign a Merkle tree representation of it:
    - We need to see which pool id needs to be used (hash or bech32 versions) in order to make sense for the widest usage (or maybe provide both)
  - We will provide an interface so that some internal developers can start using it and build some PoC
  - Some use cases could be a very light node that verifies incoming blocks with certified stake distribution or network relays
-
We have reviewed and merged the PR Update dependencies #830
-
We have worked and talked about the following issues:
-
We have reviewed and merged the PR stake pool service #800 that closes the issue Create a Stake Distribution service #799 🎉
-
We have also reviewed and merged the PR Enhance Key Registration errors #822 that closes the issue Panic signer when KES keys expired #820
-
Additionally, we have worked on the following issues:
- Refactor Dependency Injection in Aggregator #823 which should be ready to be merged early next week
- Migrate/adapt epoch_setting table #813 which is ready to be reviewed and that will be merged once the new dependency injection is available
- Migrate/adapt signed_entity_type table #815 on which we started to work
-
Cleanup multi-signer in Aggregator #824 in which we have removed some dead code as well as some data from the multi-signer state. We will also work on simplifying the `MultiSigner`, replace beacon state with an epoch state and try to remove the current message from the state in the following days
-
Finally, we have created the following issues for the next iteration:
-
We have reviewed the PR stake pool service #800 and we have decided to rollback a part of it:
- There were too many modifications that made the PR too unstable
- We have kept the modifications that did not modify the multi-signer
- We will merge it shortly but it will not use the Stake Pool service yet
- We have created new issues:
- Refactor Dependency Injection in Aggregator #823 which will facilitate the addition of new dependencies
- Cleanup multi-signer in Aggregator #824 which will strip as much unused code and state as possible from the multi-signer. A draft PR Cleanup multi-signer #825 has been created
-
We have completed the issue Run a mainnet test Mithril network #777:
- We were able to compute a snapshot, certify it and restore it later with the client 🎉
- Database: Compressed ~60GB / Uncompressed ~115GB
- Snapshot restored in 1h15 (download in 15 min, restore in 15 min and Cardano node restart in 45 min)
-
Finally, we have reviewed the PR Enhance Key Registration errors #822 which will be merged shortly
-
We have reviewed and merged the PR Handle API version with Era Switch #812 which closes the issue Handle API Version with Era switch #727 and the epic Implement eras behavior switch #707
-
We have mainly paired on the issue Create a Stake Distribution service #799 and we are facing some difficulties with the refactoring. We will try to use a slightly different approach tomorrow on how to refactor the multi-signer
-
Additionally, we have run the re-genesis of the `pre-release-preview` network which should resume certificate production tomorrow. Also the participation rate has increased since yesterday. However, we are still in the process of understanding what could explain the sudden drop: KES expiration is a good candidate, and we have also noticed some signer nodes that got stuck for some SPOs. We will keep on investigating the problem
-
We have also created the following PRs:
- Enhance Key Registration errors #822 which should be merged shortly
- Clean Genesis Verification Key #821 which should be merged shortly
-
We have reviewed the issues of the sprint and created the next issues:
-
We have noticed that an epoch gap occurred on the `pre-release-preview` network this weekend:
- The machine went OOM which led the Cardano node used by the aggregator to be stuck in epoch `144`
- We have manually upgraded the machine (which should have happened automatically with the next pre-release)
- We have prepared the re-genesis of the network which is scheduled tomorrow, with certificate production expected to resume on Wednesday (issue Re-genesis pre-release-preview #818)
- The `release-preprod` network which is running the same distribution is not impacted and is running as expected
- We have also noticed that the participation rate has dropped suddenly last week. It is probably due to expiration of the KES keys on the signer nodes. We have created the issue Panic signer when KES keys expired #820 in order to panic the signer node with a Critical error when that happens
-
Additionally, we have paired on the refactoring of the multi-signer of the aggregator by implementing the Stake Distribution service to refactor the stake distribution usage in it. This is a bit complicated as it breaks a lot of things. We will continue tomorrow
-
Finally, we have completed the PR Handle API version with Era Switch #812, which will be merged shortly and will close the issue Handle API Version with Era switch #727. We have found a solution to handle the era switch (when a breaking change in the API occurs) by implementing a retry mechanism (with version update)
-
We have reviewed the PR Handle API version with Era Switch #812:
- The PR is almost ready to merge and close issue Handle API Version with Era switch #727
- We have found a way to implement a route that receives different request bodies before/after era switch (see comments at https://github.com/input-output-hk/mithril/pull/812#issue-1627961199)
- We have found a way to handle era switch with breaking changes of API version without having access to the on chain era reader: for this we will implement a mechanism that checks which API version is run by the aggregator (when multiple are available) and uses this version if it is available. We will implement it shortly
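The version check and retry described in the last item can be sketched as follows. This is only an illustration under stated assumptions, not the code of the PR: the `Aggregator` trait, the `Response` enum, `VersionedClient`, `FixedVersionAggregator` and the version strings are hypothetical names introduced for the example, and the real nodes talk to the aggregator through its REST API.

```rust
/// Outcome of a request, reduced to what this sketch needs (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Response {
    Ok,
    /// The aggregator refused the request and reports the API version it runs.
    VersionMismatch { aggregator_version: String },
}

/// Hypothetical abstraction over the HTTP call to the aggregator.
pub trait Aggregator {
    fn send(&mut self, api_version: &str) -> Response;
}

/// Client that knows several API versions and switches between them on demand.
pub struct VersionedClient {
    supported_versions: Vec<String>,
    current_version: String,
}

impl VersionedClient {
    pub fn new(supported_versions: &[&str], current_version: &str) -> Self {
        Self {
            supported_versions: supported_versions.iter().map(|v| v.to_string()).collect(),
            current_version: current_version.to_string(),
        }
    }

    pub fn current_version(&self) -> &str {
        &self.current_version
    }

    /// Sends a request and retries once with the aggregator version if it is supported.
    pub fn send_with_retry(&mut self, aggregator: &mut dyn Aggregator) -> Result<(), String> {
        match aggregator.send(&self.current_version) {
            Response::Ok => Ok(()),
            Response::VersionMismatch { aggregator_version } => {
                if !self.supported_versions.contains(&aggregator_version) {
                    return Err(format!("unsupported API version: {aggregator_version}"));
                }
                // Version update, then a single retry.
                self.current_version = aggregator_version;
                match aggregator.send(&self.current_version) {
                    Response::Ok => Ok(()),
                    Response::VersionMismatch { .. } => {
                        Err("version mismatch after retry".to_string())
                    }
                }
            }
        }
    }
}

/// Test double standing in for an aggregator that runs exactly one API version.
pub struct FixedVersionAggregator(pub String);

impl Aggregator for FixedVersionAggregator {
    fn send(&mut self, api_version: &str) -> Response {
        if api_version == self.0 {
            Response::Ok
        } else {
            Response::VersionMismatch { aggregator_version: self.0.clone() }
        }
    }
}
```

With a client built as `VersionedClient::new(&["0.1.0", "0.2.0"], "0.1.0")` and an aggregator running `0.2.0`, the first call is refused, the client switches to `0.2.0` and the retry succeeds. The client keeps its current version when the aggregator reports a version it does not support.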
-
We have also paired on the issue Create a Stake Distribution service #799 with the PR stake pool service #800. We have completed the implementation of the stake pool service and we will use it in the state machine and the multi-signer
-
We have noticed that very few signers are registered on the `pre-release-preview` network since the problem that we had on the machine earlier this week. We will investigate this issue further with the SPOs next week if it keeps happening
-
We have also completed the issue Qualify new stake distribution computation #810
-
Finally, we have created the genesis certificate for the `testing-mainnet` network and we expect to create the first certificates at the next epoch `401`
-
We have worked on the following PRs:
- Update KES #783 has been merged
- 799/pruning #808 has been reviewed and merged
- 799/refactor sqlite connection #809 has been reviewed and merged
- 799 stake pool service #800: we have paired on it
- Handle API version with Era Switch #812 has been created
-
The `testing-preview` network has created new snapshots after the re-genesis and works correctly
-
We have created and worked on the issue Qualify new stake distribution computation #810
-
The `testing-mainnet` network created for issue Run a mainnet test Mithril network #777 has now been able to compute stake distribution and to register a signer. We need to wait `2` epochs until we can test the signature of a snapshot
-
Following the problems encountered yesterday with the `testing-preview` and `pre-release-preview` networks:
- It appears that the machines were OOM
- The startup time is very long, which did not give enough time for the nodes to register and sign a message
- We have created the following issues to fix the problem:
  - Re-genesis testing-preview #803: it has been completed
  - Fix Stake Distribution done multiple times per epoch #804: the PR Avoid compute stake distribution multiple times per epoch #805 has been merged
  - Upgrade VM testing-preview & pre-release-preview #801: the PR Upgrade VM testing-preview & pre-release-preview networks #802 has been merged
-
We have also:
- Completed the issue Add Docker image in Mithril Client multi-platform test #794
- Created and paired on the issue Fix SQLite connection deadlock #807 which should be merged tomorrow
- Worked on issue Create a Stake Distribution service #799
- Worked on issue Handle API Version with Era switch #727
-
We have noticed that the Cardano nodes running on the `preview` network suddenly were not responding through the socket. The result is that the service has been very erratic on the `testing-preview` and `pre-release-preview` networks today. We hope that a gap in the certificate chain will not occur. The `release-preprod` is working as expected. We will keep monitoring the networks tomorrow. It appears that the way we compute the stake distribution is very problematic:
- We need to compute the stake distribution only once per epoch (this will be addressed in the issue Create a Stake Distribution service #799)
- The computation of the stake distribution (without the unreleased optimization of the cardano-cli) is taking ~2 hours and is very compute intensive
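Computing the stake distribution only once per epoch amounts to caching the result keyed by the epoch. The sketch below illustrates the idea and is not the implementation of the Stake Distribution service: `StakeDistributionCache`, the `Epoch` and `StakeDistribution` aliases and the injected `compute` closure are assumptions made for the example (in the nodes, the expensive computation goes through the cardano-cli).

```rust
use std::collections::BTreeMap;

/// Hypothetical aliases: an epoch number and a map of pool id to stake.
pub type Epoch = u64;
pub type StakeDistribution = BTreeMap<String, u64>;

/// Caches the result of an expensive computation for the current epoch.
pub struct StakeDistributionCache<F>
where
    F: FnMut(Epoch) -> StakeDistribution,
{
    compute: F,
    cached: Option<(Epoch, StakeDistribution)>,
}

impl<F> StakeDistributionCache<F>
where
    F: FnMut(Epoch) -> StakeDistribution,
{
    pub fn new(compute: F) -> Self {
        Self { compute, cached: None }
    }

    /// Returns the stake distribution of `epoch`, computing it only when the
    /// cached value belongs to another epoch (or when nothing is cached yet).
    pub fn get(&mut self, epoch: Epoch) -> &StakeDistribution {
        let is_fresh = matches!(&self.cached, Some((cached_epoch, _)) if *cached_epoch == epoch);
        if !is_fresh {
            let distribution = (self.compute)(epoch);
            self.cached = Some((epoch, distribution));
        }
        &self.cached.as_ref().expect("the cache was filled above").1
    }
}
```

Calling `get` several times with the same epoch runs the closure once. The closure runs again only when the epoch changes, which is the behavior wanted when the underlying computation takes hours.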
-
We have reviewed and merged the issue Migrate/adapt stake_pool tables #787. The migration worked as expected on the `testing-preview` network 🎉 Following this issue we have created the issue Create a Stake Distribution service #799
-
We have worked on the issue Add Docker image in Mithril Client multi-platform test #794 and a PR is ready to be merged Update Mithril Client multi-platform test workflow #793. We have noticed that the usage of cache on the digest computation in the client brings many problems with rights on the creation of the cache directory while it does not bring a lot of added value. We will probably remove it in the near future. In the meantime we have implemented the Docker client test with the `--disable-digests-cache` option to overcome this problem on the runners in the CI
-
We have reviewed the work on the issue Handle API Version with Era switch #727: in order to handle multiple versions of the Open API, we need to write a build script that pre-compiles a Rust file responsible for gathering the API versions available in each spec file. This is the first foundation that is used to handle API version depending on the era. This issue is a bit more difficult than what was expected and we will keep working on it tomorrow
-
We have made a review and paired on the issue Migrate/adapt stake_pool tables #787. We have noticed some weird behavior on the store migration and we will investigate on that matter. In the meantime we have completed the migration script for the stake store. During this pairing session we have identified that:
- We will need to refactor further errors as explained in the created issue Errors refactoring #798
- We will probably need to enhance the dependency injection mechanism that is getting a bit cumbersome. A first PoC has been worked on https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=46d228db0f33adaae9adb899e791474a
-
During our team session, we have discussed the following subjects:
- Advertise the usage of Mithril for internal and/or external developers to restore Cardano nodes on `preview` and `preprod`
  - This will be probably done during the Dapp and SPO calls
- Put in place some API usage statistics (basic data regarding API routes calls + snapshots downloads)
- New computation of the Stake Distribution is OK and execution delay is acceptable for `mainnet`
- We need to work on specifying the Signer node deployment on the Cardano SPO infrastructure for `mainnet`
  - We also need to make sure there is no security impact: signer node should not be able to write in the Cardano database folder and also regarding the access to the KES keys
-
The new distribution `2310.0` has been released 🎉
-
We have merged the following PRs:
-
We have also continued working on the following issues:
-
We have worked on the following issues:
- Mithril client fails to restore snapshot. #791: the bug has been fixed with the merged PR Fix Docker Client crash #792. We have updated the documentation to detail the usage of the Docker client image (available in the next version currently): Mithril Client Node developers doc and Bootstrap a Cardano Node user guide. Also a new issue Add Docker image in Mithril Client multi-platform test #794 has been created in order to prevent this problem from occurring again
- Migrate/adapt stake_pool tables #787: We reviewed and discussed the PR add stake_pool provider #789
- We created the PR Fix clippy warnings from Rust 1.68.0 #795 that should be merged shortly
- We also created the PR Update dependencies #796 that should be merged shortly
- Run a mainnet test Mithril network #777: we found that there is a problem with the fact that the stake distribution gets recomputed many times on the aggregator when the certificate chain is not (yet) valid. This makes the signer registration closed most of the time: the result is that the signer never succeeds in getting registered and the network is stuck. We will work on a fix shortly
-
The distribution pre-release `2310.0-prerelease` is working as expected on the `pre-release-network`. We will proceed to the release of the `2310.0` distribution tomorrow
-
Following the test that we operated on the unreleased cardano-cli `1.36.0` that does not have the `128` bytes size limitation for the `bytes` fields of datum submitted in a transaction, we have decided to implement a temporary mechanism that automatically chunks the `bytes` fields generated by the `era generate-tx-datum` command of the aggregator cli and that reads all the `bytes` fields available in the datum from the Cardano chain era reader adapter. Once the new cardano-cli is released, we will remove the chunking mechanism. However the full compatibility is guaranteed, which is a mandatory requirement for rolling out the feature. Also we have decided to keep the JSON serialization format instead of switching to CBOR. The PR Fix Datum generation for era markers #788 has been reviewed and merged. It fixes the issue Enhance Datum generation for Era Markers #786. The era markers have then been deployed to the era addresses of the `testing-preview`, `pre-release-preview` and `release-preprod` networks
-
A new distribution pre-release `2310.0-prerelease` that activates the Era Switch on the networks has been created. Also a dev blog post Mithril Era Switch has been released. We expect to release the distribution by end of week or early next week
-
Additionally, we have reviewed the PR in progress for migrating the stake distribution of the aggregator add stake_pool provider #789 that should be merged shortly
-
We have worked on the following PRs:
- handling errors better #776 has been reviewed and merged 🎉
- Fix test Docker image workflow #784 has been reviewed and merged
-
Fix Datum generation for era markers #788 which fixes the problem due to the limited size of the `bytes` fields in datum from issue Enhance Datum generation for Era Markers #786. The unreleased version `1.36.0` of the Cardano cli does not raise this error with the same datum file that is rejected by `1.35.5`. The fix consists in chunking the `bytes` fields in `128` bytes chunks. This will provide full compatibility of the era reader on the Cardano chain. The PR should be merged shortly
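The chunking idea can be illustrated with a short sketch. It only shows the split and the reverse concatenation on raw bytes: the function names and the constant are hypothetical, and the datum JSON layout handled by the aggregator cli and the era reader is deliberately left out.

```rust
/// Maximum size of one `bytes` field, taken from the 128 bytes limit mentioned above.
pub const MAX_BYTES_FIELD_SIZE: usize = 128;

/// Splits a payload into consecutive chunks of at most `MAX_BYTES_FIELD_SIZE` bytes.
pub fn chunk_payload(payload: &[u8]) -> Vec<Vec<u8>> {
    payload
        .chunks(MAX_BYTES_FIELD_SIZE)
        .map(|chunk| chunk.to_vec())
        .collect()
}

/// Rebuilds the payload by concatenating all the chunks in order,
/// which is what a reader of all the `bytes` fields has to do.
pub fn join_chunks(chunks: &[Vec<u8>]) -> Vec<u8> {
    chunks.concat()
}
```

A 300 bytes payload is split into chunks of 128, 128 and 44 bytes, and joining them gives back the original payload. A payload that already fits in one field produces a single chunk, so a reader that concatenates every field handles both chunked and unchunked data.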
-
We have also continued grooming the new `generic-data` feature. We have decided to implement the simplification of the multi signer in the epic Implement relational store in aggregator #779. We have defined a methodology for implementing the migration/adaptation of the tables in the store of the aggregator. We have created the first issue of the epic Migrate/adapt stake_pool tables #787 to validate the methodology. Once this is done, we will create the other issues
-
We have reviewed the following PRs:
- Fix test Docker image workflow #781 that attempted to fix the test Docker images workflow. The PR was merged, but there were still some glitches that needed to be addressed. This is the case in the PR Fix test Docker image workflow #784
- Deploy era reader on chain #775 that deploys the first version of the era reader on chain has been merged
- handling errors better #776 is approved and will be merged tomorrow
-
🔥 Test mainnet setup #782 was used to create artifacts used to build test Docker images that were used to compute the stake distribution of a test network running on the Cardano `mainnet`. The computation of the stake distribution took `~1h`. Some complementary tests will be conducted to validate that there is no blocking issue to move forward to `mainnet` deployment
-
During our team session, we have talked about the following topics:
- A possible new use case for Mithril could be to certify the stake distribution only (which could be helpful for side chains). This is perfectly possible and will be the case when the signing of generic data is implemented. We will need to add a new route that helps retrieve the full stake distribution and a client command to verify the associated certificates
- Some other teams also need to use a pub/sub mechanism in a peer to peer context, and as the Cardano network layer is probably not the best option, libp2p seems to be a good candidate. There are still questions regarding the vulnerability to some attacks
- We have talked about the security audit for Mithril that should probably be done only on the cryptographic primitives. We expect a formal approval of this strategy soon
- In order to get ready for the release to the `mainnet`, we will prepare an `Operational Plan` for first release, +6 months and +1 year perspective. This will help us define the scope of the resources that we need to have allocated to the project from the release to the end of the ramp up phase
- Also, we have investigated the problem that was raised last week regarding the limits of the `bytes` fields size in the transaction datum. A possible long term fix is to implement an encoding in CBOR indefinite bytes (chunks of `64` bytes). We have created an issue for this subject Enhance Datum generation for Era Markers #786
-
We have mainly worked on the issue Run a mainnet test Mithril network #777:
- Created a PR [Fix test Docker image workflow #781](https://github.com/input-output-hk/mithril/pull/781) that fixes the Docker test images workflow. It should be merged shortly and will allow us to build test Docker images that can be used without having to merge branches into main
- Created a PR 🔥 Test mainnet setup #782 that must not be merged. This PR includes a test implementation of the network that runs on the unreleased Cardano node/cli `1.36.0` and that makes use of the new `query stake-snapshot --all-stake-pools` command to compute efficiently the stake distribution. The artifacts built from this PR will be used to create test Docker images that will be run on a test `mainnet` Mithril network
- Tested the new `query stake-snapshot --all-stake-pools` command on the Cardano `mainnet`: it took `1h05min` to compute the stake distribution instead of multiple days 🎉
-
We have worked on the following issues:
- Add context to errors #665: We have paired on fixing a bug that occurred when multiple tests are created in an integration test (because of a shared logger). The PR handling errors better #776 is almost complete and should be merged shortly
- Deploy Era Behavior Switch #752: We encountered a problem with the size of fields inside datum files which are limited to 128 bytes. The era markers that we originally prepared could not fit (as the signature is already 128 bytes long). We made a fix and stored the era markers in one byte field and the signature in another. It worked, but we will try to challenge the implementation shortly. Also the PR Deploy era reader on chain #775 is under review and should be merged very shortly. Once this is the case, we will create a new distribution and we will coordinate ourselves with the SPOs to update their signer nodes configuration
-
We have also continued grooming the signing of generic data in Mithril networks. We have created the following epic issues:
- Implement relational store in aggregator #779
- Design and implement generic actors #780
- We will work on them sequentially and slice the first tickets out of them next week
-
We have worked on the following topics:
- Issue Add context to errors #665: Separating the critical errors from the recoverable ones and implementing the same mechanism in the signer
- Reviewed and merged the PR Prepare run mainnet Mithril test network #778. The workflow is not working yet and we are working on a fix
- Issue Deploy Era Behavior Switch #752
-
We have also started grooming the new feature "Signing generic data":
- Our strategy is to roll out the new aggregator stores with 3 phases:
  - Migrate/adapt `stake_pool`, `signer`, `epoch_settings` tables
  - Migrate/adapt `certificate`, `signed_entity` and `signed_entity_type`
  - Migrate/adapt `open_message`, `signer_registration`, `single_signature`
- This progressive rollout will attempt to have the minimum impact to the existing code at first:
  - This will be a strong foundation on which we will build the new use cases
  - At first, we will process open messages sequentially, but the system will be evolved to a parallel runtimes setup (one for each type of message)
- We will determine what interfaces need to be implemented when signing new types of data (like how to sign a message and from which beacon it can be computed deterministically). With this design, the mechanism of signing/verifying the message will not change with the type of message
- We will also determine what optional implementation needs to be done (e.g. adding routes to the aggregator for retrieving proofs, and new features to handle them in the client)
- The aggregator REST API will be also modified in order to host new types of data (e.g. list the certificates by type of data)
- The explorer will also need to be adjusted to handle this new REST API
- The certificate chain needs also to be slightly modified in order to handle these new types of data
- We will start with the following types of data:
  - Sign the stake distribution once per epoch (as the first certificate of the epoch)
  - Then, sign the immutable snapshots when new immutable files are produced
- We will keep working on this grooming tomorrow and we will then create the associated epics and issues
-
We have kept pairing on the issue Add context to errors #665:
- We have managed to gracefully kill the aggregator when a critical error is encountered from the runtime
- We have kept working on bringing business errors instead of technical errors
- The current PR handling errors better #776 should be merged tomorrow
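The distinction between critical and recoverable errors in a runtime loop can be sketched as follows. This is a simplified illustration and not the code of the PR: the `RuntimeError` enum, the `run_cycles` function and the bounded number of cycles are assumptions made for the example.

```rust
/// Hypothetical error type separating the two kinds of failures.
#[derive(Debug, PartialEq)]
pub enum RuntimeError {
    /// The cycle failed but the next one may succeed.
    Recoverable(String),
    /// The node cannot continue and must shut down.
    Critical(String),
}

/// Runs `cycle` at most `max_cycles` times.
/// Recoverable errors are recorded and the loop goes on with the next cycle.
/// A critical error stops the loop at once and is returned to the caller,
/// which can then shut the node down gracefully.
pub fn run_cycles<F>(mut cycle: F, max_cycles: usize) -> Result<Vec<String>, String>
where
    F: FnMut(usize) -> Result<(), RuntimeError>,
{
    let mut recovered = Vec::new();
    for index in 0..max_cycles {
        match cycle(index) {
            Ok(()) => {}
            Err(RuntimeError::Recoverable(message)) => recovered.push(message),
            Err(RuntimeError::Critical(message)) => return Err(message),
        }
    }
    Ok(recovered)
}
```

Returning the critical error to the caller, instead of exiting inside the loop, leaves the decision of how to stop (logging, cleanup, exit code) in a single place.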
-
We have created the issue Run a mainnet test Mithril network #777:
- A first PR has been created Prepare run mainnet Mithril test network #778 that leverages the infrastructure and the CI to run custom versions of Mithril and Cardano on a network. It should be merged tomorrow
- A testing `mainnet` network is being synchronized
- The next step is to use the new command of the Cardano cli to compute the stake distribution all at once and try to produce the first test snapshots
-
Also, we have kept working on the issue Deploy Era Behavior Switch #752 by creating all the cryptographic material needed for the deployment
-
We have paired on the issue Add context to errors #665 and a PR has been created handling errors better #776 that should be merged shortly
-
We have reviewed and merged the following PRs:
- Fix client Docker image crash #770 which closes the bug Docker image of client crashes #769
- Update dependencies #771 which fixes some security warnings
-
During our team session, we have discussed:
- The issue Add context to errors #665 and the best way to catch a failure from a tokio thread and abort all other threads with a `JoinSet`
- How we could compute deterministically the Utxo set:
  - We will not be able to rely on the cardano CLI (as it gives real time results)
  - We could use the `kupo` chain indexer
  - We could use the `oura` chain indexer or directly the `pallas` library and its `chainsync` mini protocol implementation
  - These solutions could work, but would require to store a lot of data and handle the chain rollbacks that happen quite often
  - Another option is to use the immutable files to read directly the blocks from them and reconstruct the Utxo once the transactions are final (i.e. stored in an immutable file). With this solution we would just need to read from the chunk files whose format is available here and to use pallas capability to parse blocks from cbor
- We have worked on the demo path for showcasing the era switch behavior:
# Setup demo
## Download source (if needed)
git clone https://github.com/input-output-hk/mithril.git
#or
git clone git@github.com:input-output-hk/mithril.git
## Build docker images
cd mithril/
### Build docker images distribution-1 (Thales era only)
git switch distribution_1
rm -f mithril-aggregator/mithril-aggregator && rm -f mithril-signer/mithril-signer && rm -f mithril-client/mithril-client
make build && cp target/release/mithril-aggregator mithril-aggregator/mithril-aggregator && cp target/release/mithril-signer mithril-signer/mithril-signer && cp target/release/mithril-client mithril-client/mithril-client
docker image rm mithril/mithril-signer-distribution-1 mithril/mithril-client-distribution-1 mithril/mithril-aggregator-distribution-1 --force
docker build -t mithril/mithril-aggregator-distribution-1 -f mithril-aggregator/Dockerfile.ci .
docker build -t mithril/mithril-signer-distribution-1 -f mithril-signer/Dockerfile.ci .
docker build -t mithril/mithril-client-distribution-1 -f mithril-client/Dockerfile.ci .
### Build docker images distribution-2 (Thales & Pythagoras eras)
git switch distribution_2
git rebase distribution_1
rm -f mithril-aggregator/mithril-aggregator && rm -f mithril-signer/mithril-signer && rm -f mithril-client/mithril-client
make build && cp target/release/mithril-aggregator mithril-aggregator/mithril-aggregator && cp target/release/mithril-signer mithril-signer/mithril-signer && cp target/release/mithril-client mithril-client/mithril-client
docker image rm mithril/mithril-signer-distribution-2 mithril/mithril-client-distribution-2 mithril/mithril-aggregator-distribution-2 --force
docker build -t mithril/mithril-aggregator-distribution-2 -f mithril-aggregator/Dockerfile.ci .
docker build -t mithril/mithril-signer-distribution-2 -f mithril-signer/Dockerfile.ci .
docker build -t mithril/mithril-client-distribution-2 -f mithril-client/Dockerfile.ci .
### Build docker images distribution-3 (Pythagoras era only)
git switch distribution_3
git rebase distribution_2
rm -f mithril-aggregator/mithril-aggregator && rm -f mithril-signer/mithril-signer && rm -f mithril-client/mithril-client
make build && cp target/release/mithril-aggregator mithril-aggregator/mithril-aggregator && cp target/release/mithril-signer mithril-signer/mithril-signer && cp target/release/mithril-client mithril-client/mithril-client
docker image rm mithril/mithril-signer-distribution-3 mithril/mithril-client-distribution-3 mithril/mithril-aggregator-distribution-3 --force
docker build -t mithril/mithril-aggregator-distribution-3 -f mithril-aggregator/Dockerfile.ci .
docker build -t mithril/mithril-signer-distribution-3 -f mithril-signer/Dockerfile.ci .
docker build -t mithril/mithril-client-distribution-3 -f mithril-client/Dockerfile.ci .
---
# Demo: Run demo
cd ../devnet-demo
## Create functions
function stop_devnet {
./devnet-stop.sh
docker stop $(docker ps -a -q)
rm -rf artifacts/node-bft1/mithril/aggregator
rm -rf artifacts/node-pool1/mithril/signer
rm -rf artifacts/node-pool2/mithril/signer
rm -rf artifacts/node-pool3/mithril/signer
}
function start_devnet {
./devnet-stop.sh && NODES=cardano EPOCH_LENGTH=60 NUM_POOL_NODES=3 ./devnet-run.sh
}
function monitor_devnet {
watch -c "NODES=cardano ./devnet-query.sh"
}
function epoch_devnet {
CARDANO_NODE_SOCKET_PATH=artifacts/node-bft1/ipc/node.sock ./artifacts/cardano-cli query tip --cardano-mode --testnet-magic 42 | jq '.epoch'
}
function containers_list {
watch -c "docker ps --format '{{.Names}} - {{.Status}}' | sort"
}
function container_up {
DISTRIBUTION_VERSION=$1 ERA_READER_ADAPTER_PARAMS=$(cat era-markers/config.json) docker-compose -f docker-compose-demo.yaml --profile $2 up --remove-orphans --force-recreate -d --no-build
}
function container_down {
docker stop $2-$1
}
function container_run {
DISTRIBUTION_VERSION=$1 ERA_READER_ADAPTER_PARAMS=$(cat era-markers/config.json) docker-compose -f docker-compose-demo.yaml run $2 $3 $4
}
function container_exec {
docker exec -it $2-$1 $3 $4
}
function container_logs {
docker logs -f $2-$1 2>/dev/null
}
function monitor_versions {
watch -c "./sqlite3 -table -batch artifacts/node-bft1/mithril/aggregator/stores/monitoring.sqlite3 < stake_signer_version.sql | head -n 50"
}
function era_activate_thales {
cat > era-markers/markers.json << EOF
[
{"name": "thales", "epoch": 1}
]
EOF
cat era-markers/markers.json | jq .
}
function era_announce_pythagoras {
cat > era-markers/markers.json << EOF
[
{"name": "thales", "epoch": 1},
{"name": "pythagoras", "epoch": null}
]
EOF
cat era-markers/markers.json | jq .
}
function era_activate_pythagoras {
EPOCH_ERA_SWITCH=$(( $(epoch_devnet) + 1))
cat > era-markers/markers.json << EOF
[
{"name": "thales", "epoch": 1},
{"name": "pythagoras", "epoch": $EPOCH_ERA_SWITCH}
]
EOF
cat era-markers/markers.json | jq .
}
function era_remove_thales {
EPOCH_ERA_SWITCH=$(( $(epoch_devnet) - 1))
cat > era-markers/markers.json << EOF
[
{"name": "pythagoras", "epoch": $EPOCH_ERA_SWITCH}
]
EOF
cat era-markers/markers.json | jq .
}
## Reset demo if needed
stop_devnet
## Start Cardano network
start_devnet
monitor_devnet
containers_list
## Start network with version distribution-1
era_activate_thales
container_up distribution-1 mithril-aggregator
container_up distribution-1 mithril-signer-1
container_up distribution-1 mithril-signer-2
container_up distribution-1 mithril-signer-3
container_run distribution-1 mithril-aggregator-genesis
container_logs distribution-1 mithril-signer-1 | grep "Current Era"
container_logs distribution-1 mithril-signer-2 | grep "Current Era"
container_logs distribution-1 mithril-signer-3 | grep "Current Era"
## Update network partially with version distribution-2
container_up distribution-2 mithril-aggregator
container_up distribution-2 mithril-signer-1
era_announce_pythagoras
container_logs distribution-1 mithril-signer-1 | grep "Upcoming Era"
container_logs distribution-1 mithril-signer-2 | grep "Upcoming Era"
container_logs distribution-1 mithril-signer-3 | grep "Upcoming Era"
container_up distribution-2 mithril-signer-2
container_logs distribution-2 mithril-signer-2 | grep "Upcoming Era"
container_logs distribution-1 mithril-signer-3 | grep "Upcoming Era"
era_activate_pythagoras
container_logs distribution-2 mithril-signer-1 | grep "Current Era"
container_logs distribution-2 mithril-signer-2 | grep "Current Era"
container_logs distribution-1 mithril-signer-3 | grep "UnsupportedEraError"
container_up distribution-2 mithril-signer-3
container_logs distribution-2 mithril-signer-3 | grep "Current Era"
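
The demo above drives the era switch by rewriting `era-markers/markers.json`. As a minimal sketch of how a node could interpret such a list, assuming that a marker with `"epoch": null` only announces an era and that the current era is the one with the latest activation epoch that is not in the future (the `EraMarker` type and both functions are hypothetical names, not the Mithril API):

```rust
/// Hypothetical mirror of one entry of `markers.json`.
#[derive(Debug, Clone, PartialEq)]
struct EraMarker {
    name: String,
    /// `None` corresponds to `"epoch": null` (era announced, not yet scheduled).
    epoch: Option<u64>,
}

impl EraMarker {
    fn new(name: &str, epoch: Option<u64>) -> Self {
        Self { name: name.to_string(), epoch }
    }
}

/// Era active at `current_epoch`: the marker with the latest activation
/// epoch that is less than or equal to `current_epoch`.
fn current_era(markers: &[EraMarker], current_epoch: u64) -> Option<&EraMarker> {
    markers
        .iter()
        .filter(|m| matches!(m.epoch, Some(e) if e <= current_epoch))
        .max_by_key(|m| m.epoch)
}

/// Upcoming era: the first marker that is either unscheduled or scheduled
/// after `current_epoch`.
fn upcoming_era(markers: &[EraMarker], current_epoch: u64) -> Option<&EraMarker> {
    markers
        .iter()
        .find(|m| m.epoch.map_or(true, |e| e > current_epoch))
}
```

With the markers written by `era_announce_pythagoras`, this sketch reports `thales` as current and `pythagoras` as upcoming; once `era_activate_pythagoras` has set an epoch and that epoch is reached, `pythagoras` becomes current.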
-
We have also paired on the following subjects:
- Merged the PR make signer warn when coming era is unsupported #768 that adds a missing warning message when a node is not supporting the upcoming era
- Discussed the issue Add a pruning mechanism for events #744, which we have decided to postpone: it is not mandatory at the moment, as we are only expecting `<250K` events per year on the `mainnet`, which is very low
- Closed the epic Implement signer versions deployment monitoring #718
-
Finally, we have met with the Quviq team in order to give them some explanations about the protocol and its current implementation, so that they can evaluate what the scope of model based testing for Mithril could be
-
We have reviewed and merged the PR Add dynamic matrix in CI for end to end tests #761 that activates the dynamic runs of the end to end tests in the CI depending on the supported eras
-
We have noticed some problems with the monitoring on the `testing-preview` network:
- We created the bug issue Fix SQLite query bug #762 that was patched with the PR fix sqlite sprintf bug #763
- We also noticed that the version of SQLite needed to be upgraded on the aggregator. We created the issue Update SQLite version to 3.40 in aggregator infra #765 and the PR Upgrade SQLite to 3.40 in aggregator infra #766. However, when we merged the PR and ran the terraform application, the VM of the `testing-preview` network was recreated and failed to start. This incident resulted in the loss of some data (we were able to recover the rest from a snapshot taken automatically in the morning). We also managed to restart the service on the network after some manual intervention. We have created an issue to make the infra more robust: Enhance Mithril infra #767. Given these circumstances, we have decided to postpone the release of a new distribution
-
Finally, we have also paired on preparing the demo for the review that will take place tomorrow and that will showcase the era switch mechanism that allows Mithril hard forks
-
We have worked on the following issues:
- Implement Era cli command in aggregator #755: Reviewed and merged the PR Add era command in aggregator cli #756. We now have the ability to retrieve the list of eras implemented in the nodes and to generate the `TxDatum` that will be stored when submitting a transaction on-chain
- Create a query to extract the node versions stakes distribution #743: Paired and merged the PR monitor stake distribution Vs Signer version #759. This allows us to monitor the penetration of the signer node versions (with stakes shares by epoch for each version) and decide when we can switch eras
- Make dynamic matrix in CI end to end #760: Created this PR, which has been reviewed and will be merged shortly. It allows us to dynamically compute the test lab matrix of cases depending on the available eras given by the era command of the aggregator cli
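
To illustrate the kind of figure the version monitoring gives us (stake share by epoch for each signer version), here is a minimal in-memory sketch. The actual monitoring is a SQL query run against the recorded events; the `SignerRegistration` type and the function below are hypothetical and only show the computation:

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified view of a signer registration event.
struct SignerRegistration {
    epoch: u64,
    version: String,
    stake: u64,
}

/// Stake share (in percent) of each signer version for the given epoch.
/// Returns an empty map when no stake is registered for that epoch.
fn stake_share_by_version(
    registrations: &[SignerRegistration],
    epoch: u64,
) -> BTreeMap<String, f64> {
    // Sum the stake registered by each version for this epoch.
    let mut stakes: BTreeMap<String, u64> = BTreeMap::new();
    for registration in registrations.iter().filter(|r| r.epoch == epoch) {
        *stakes.entry(registration.version.clone()).or_insert(0) += registration.stake;
    }

    let total: u64 = stakes.values().sum();
    if total == 0 {
        return BTreeMap::new();
    }

    // Convert absolute stakes into percentages of the epoch total.
    stakes
        .into_iter()
        .map(|(version, stake)| (version, 100.0 * stake as f64 / total as f64))
        .collect()
}
```

An era switch could then be considered once the share of the era-compatible versions exceeds a threshold chosen by the team.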
-
We have also met with the Quviq team to start elaborating with them a strategy to test the implementation of the Mithril protocol by our nodes (and eventually other implementations in the future) with Model Based Testing. This will help us get a higher guarantee regarding the safety of the implementation that will keep evolving
-
We have paired on:
- Fixing a bug with the events recorded and created the issue Fix unsupported unixepoch() function #757. The fix has been merged in the PR replace unixepoch with strftime #758
- The issue Create a query to extract the node versions stakes distribution #743, which required some modifications to the event that is recorded at signer registration (to include the stake value as well). This issue should be finished shortly
-
We have also reviewed the PR Add era command in aggregator cli #756. It should be merged tomorrow
-
During our technical team session, we have discussed:
- Cardano-cli `1.36.0` and the issue about the computation of the total stakes:
  - We are now able to build the node and cli from source. We will implement them in custom Docker images and start running a `mainnet` test Mithril network, which will help us check if there are issues when we scale to compute large snapshots
  - The problem with the stakes total is not critical for us as we don't rely on those fields to compute the total stakes
- Signing new types of data (new feature):
  - We will need to slightly update the structure of the certificate chain: there will probably be one certificate for the stake distribution (the first of an epoch, and the only one required for an epoch) and one certificate for each new message that has been signed (that will have a specific type), linking to the first of the epoch. This will allow us to introduce new types of data without breaking changes: certificates for these new types will be produced as soon as enough signers run the new version (and are able to sign) so that a multi signature can be produced
  - A good example to start working with is signing the Utxo set (which will help us design the interface needed to add a new type of data): we will work on this subject with the wallet team soon to get a better understanding of their needs. In the meantime, we know that we will have to retrieve the Utxo set from the Cardano node:
    - First with the Cardano cli, which is able to produce this set for small networks (e.g. `preview` and `preprod`)
    - Then with a better suited way of retrieving it for the `mainnet`: we will investigate if it is possible to do it with some internal tools used for accessing the internal database of the Cardano node or with third party tools such as https://github.com/sierkov/daedalus-turbo
- Testing strategy:
  - In order to get an even better testing strategy, we will try to investigate the possibility of doing model based testing with the Mithril protocol implementation
  - This will help us test some edge-case scenarios like network partition. We will see what Rust can bring to the table, probably with the crate `madsim`. We will try to pair on these subjects during the next team sessions
-
We have discussed:
- The event store and the fact that we need to add metadata along with the content of the event. We agreed that this will be done by the event creator, which will take another parameter "headers" and wrap them with the actual content of the event
- The way to easily implement the `SupportedEra` enum with the use of macros, to avoid modifying the code in multiple places each time we add/remove an era. The first custom-made draft implementation is a bit cumbersome, so we have decided to use the macros from the `strum` crate
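
As a minimal sketch of the "headers" idea discussed for the event store, assuming headers are simple key/value string pairs (the `WrappedEvent` type and `create_event` function are hypothetical names, not the final design):

```rust
use std::collections::BTreeMap;

/// Hypothetical stored event: metadata headers wrapped around the content.
#[derive(Debug, PartialEq)]
struct WrappedEvent {
    headers: BTreeMap<String, String>,
    content: String,
}

/// Event creator taking the extra "headers" parameter and wrapping the
/// headers together with the actual content of the event.
fn create_event(content: &str, headers: &[(&str, &str)]) -> WrappedEvent {
    WrappedEvent {
        headers: headers
            .iter()
            .map(|(key, value)| (key.to_string(), value.to_string()))
            .collect(),
        content: content.to_string(),
    }
}
```

A signer registration event could then carry the node version as metadata, for example `create_event("signer registered", &[("NODE_VERSION", "0.2.0")])`, without changing the shape of the content itself.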
-
We have worked on the following PRs:
- send event message on signer registration #753: the PR has been reviewed and merged and closes the following issues Create the signer registration event #742 and Add the signer node version in a header NODE_VERSION #737
- Add era command in aggregator cli #756: the PR is in draft as it is not completed yet. It should be ready to merge shortly
-
We have merged the following PRs:
- Load era reader adapters from config #751 which closes issue Load era reader adapters from config in signer and aggregator #732
- create a producer/consumer event channel to monitor signers version #750 which closes the following issues: Create events and send them on the channel on the producer side #741, Implement an event producer/consumer via channel #738 and Create database and configuration to save the events on the consumer side #740
-
We have created the following issue Update doc for Era Reader signer config #752 in order to update the documentation for the SPOs not too early
- We have worked on the following PRs:
- For epic Implement signer versions deployment monitoring #718: [wip] first POC #750. We have paired on fixing a problem with building thread-safe code that uses a SQLite connection for both reads and writes. We should merge this PR shortly
- For issue Load era reader adapters from config in signer and aggregator #732. The PR Load era reader adapters from config #751 is almost finished and should be merged tomorrow. We have added a default configuration for the era reader adapter so that we don't need to reconfigure the signer nodes until the era switch based on an on-chain transaction is completed. This PR also includes a modification of the CI so that it can handle multiple eras in the end to end test
- We have paired on the issue Define the structure of an event #739 and closed it. The outcome of the session is the following diagram that describes the design of the event store and of the events themselves:
-
We have also paired on the issue Implement an event producer/consumer via channel #738 to complete the first draft on which we worked last week, and to confirm that we could produce an event and receive it on a different thread
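
The producer/consumer idea can be sketched with the standard library only. This is an illustration under assumptions: the `Event` type and `produce_and_consume` function are hypothetical, and the real implementation may rely on an async channel rather than `std::sync::mpsc`:

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical event payload.
#[derive(Debug, PartialEq)]
struct Event {
    action: String,
    content: String,
}

/// Sends `events` from the calling thread and returns what a consumer
/// running on a different thread has received, in order.
fn produce_and_consume(events: Vec<Event>) -> Vec<Event> {
    let (sender, receiver) = mpsc::channel::<Event>();

    // Consumer: runs on its own thread and drains the channel until
    // every sender has been dropped.
    let consumer = thread::spawn(move || receiver.into_iter().collect::<Vec<Event>>());

    // Producer: here simply the calling thread.
    for event in events {
        sender.send(event).expect("consumer thread should still be alive");
    }
    // Closing the channel lets the consumer iteration end.
    drop(sender);

    consumer.join().expect("consumer thread panicked")
}
```

In a real node the consumer would persist each event to the store instead of collecting them in memory.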
-
We have reviewed and merged the following PRs:
-
During our team session, we have discussed the following subjects:
- Strategy to migrate the store of the aggregator: it seems like a good idea to prepare and maintain a road book to keep track of best practices when administering an aggregator (e.g. not upgrading too close to an era transition)
- Testing Mithril on the `mainnet`:
  - We could prepare a build of the cardano-cli, host it somewhere, and use it to test the new stake distribution computation
  - If we keep having `glibc` errors when building the cardano cli on the master branch, we will ask for support from the Cardano team
  - We have validated that the solution we have designed to stress test the network by implementing a fake cardano cli that will avoid PoolId spoofing is a good option
- Displaying the SPO ticker on the explorer: we could do it by querying the API from `cexplorer` for example or retrieve it from the pool metadata when the signer registers (seems much more complicated)
-
We have released the new distribution `2306.0` which has been successfully deployed on the `release-preprod` network
-
We have merged the following PRs:
-
The following PRs are ready to be reviewed:
- Add always restart Cardano nodes in infra #748: this will avoid having a crashing cardano node not attempting to restart in one of our Mithril networks
- Fix 'UnregisteredInitializer' error on signer #749: this PR fixes an error that occurs when a signer tries to sign and is not registered for the next epoch
-
We have groomed the epic Implement signer versions deployment monitoring #718 and created the following issues:
- Add the signer node version in a header NODE_VERSION #737
- Implement an event producer/consumer via channel #738: we have started pairing on a draft implementation of this issue
- Define the structure of an event #739
- Create events and send them on the channel on the producer side #741
- Create database and configuration to save the events on the consumer side #740
- Create a query to extract the node versions stakes distribution #743
-
We have monitored the next distribution pre-release `2306.0-prerelease`: everything is working as expected. We have scheduled the release of the distribution for tomorrow
-
We have also worked on some fixes/optimizations:
- Fix CI flakiness with ImmutableFileObserver(Missing) error #733: fixed the flakiness of the CI and merged the PR
- display better errors #746: created a PR that enhances error messages displayed by signer and aggregator when they fail. It is ready to be merged
- Enhance errors catching in end to end test #747: created a PR that uses the status code from shell commands and stops the end to end tests early when a command fails. Should be merged tomorrow
- Update dependencies #736: created a PR to include project dependencies updates and fixes. It will be merged once the new distribution is released
- Update current documentation #735: created a PR to update `next` documentation to `current`. It will be merged once the new distribution is released
-
We have paired on these subjects:
- Handle API Version with Era switch #727: this is a little bit more tricky than what we expected. This raises some questions for which we need to find answers. We will keep working on this issue in the following days
- Define relational design of stores #476: reviewed and closed
- Implement an EraReader adapter with on chain transaction as source #710: reviewed and merged
- Fix CI flakiness with ImmutableFileObserver(Missing) error #734: we created this issue as we have noticed a high error rate on the end to end tests in the CI. We are working on a fix that will be merged shortly
-
Also, we have created a new pre-release for the next distribution `2306.0-prerelease`. We will test it with the SPOs registered on our `pre-release-preview` network and we expect a release by the end of the week
-
We have kept monitoring the issue Signer can't sign on testing-preview network #730: the re-genesis of the `testing-preview` network worked as expected and new certificates are being produced on `testing-preview` (with the signer that did not have signing troubles). We will see tomorrow if the other signer is back in the signatures. We have not yet identified the source of the problem and we keep investigating
-
We have paired on the issue Define relational design of stores #476 and we have achieved a first version of the aggregator store relational design:
-
We have reviewed the PR Implement Era Reader on chain adapter #721 which should be completed and merged tomorrow
-
We have merged the following PRs:
- add era reader #720 which closes the issue Implement an EraReader trait that gathers era activation data #709
- Add new signer in 'testing-preview' #731 which closes the issue Add a new SPO on testing-preview network #729
-
We have paired on the issue Implement an EraReader trait that gathers era activation data #709 and added the documentation on the updates done on the state machine of the signer. The PR is ready to merge, which should be done tomorrow 💪
-
We have merged the following PRs:
-
We have also created the following issues:
- Signer can't sign on testing-preview network #730: a bug that prevents some signers from signing on the `testing-preview` network with a `ProtocolInitializerNotRegistered(CoreRegister(UnregisteredInitializer))` error. This error created a gap in the certificate chain of the `testing-preview` network, which required a re-genesis. We will closely monitor the problem and see if it is reproduced in the following days. Also, we may postpone the distribution that we have scheduled to prepare this week if we are not able to fix the problem
- Add a new SPO on testing-preview network #729: this will add a third SPO on the `testing-preview` network, which will help debugging and reduce the need for re-genesis
-
Finally, we have discussed the following subjects during our team session:
- UtxoHD compatibility with Mithril: at first sight, it appears that it is not a problem for Mithril: we currently don't sign the ledger state, which is not deterministically produced on the cardano nodes. However, we will review the specs and dive deeper in order to validate that this doesn't break anything on the snapshot creation/restoration process
- Possibilities to create a decentralized setup of the Mithril network:
  - We just have the constraint that the solution can easily be switched to another one, with an adapter mechanism for example
  - Use of the `libp2p` Rust implementation to create a peer to peer network (between Mithril signer relays and/or aggregators) that would implement the `pub/sub gossip` protocol. We could also use the `Kademlia` implementation to provide peer to peer discovery of the nodes (this would help to bootstrap an aggregator with data from another aggregator)
  - Draft an implementation on the Cardano node with mini protocols
  - Use of IPFS only (if that alleviates the need to maintain a peer to peer network?)
  - Use a tool like `wireguard` to connect the nodes
-
We have mainly paired on the issue Implement an EraReader trait that gathers era activation data #709 that should be completed and merged shortly
-
We have also discussed about some edge cases that we have identified concerning seamless updates:
- When releasing a new era switch enabled version, we will need to do a hot switch of the API version as well. This is something that is not currently supported. We have created an issue to address this problem: Handle API Version with Era switch #727
- When the underlying Cardano cli used by the signer and aggregator nodes is changing its interface, we will need to be able to handle both versions in a soft update (with a switch based on the cardano cli version)
-
Finally, we have worked on the issue Upgrade Cardano node to 1.35.5 #725 and created a PR that should be merged shortly. It also fixes a few bugs related to building developer Docker images and timeout management when running the `devnet`
-
The fix published with the `2304.1-prerelease` distribution has fixed the problem on the `pre-release-preview` network and signers are able to sign again. We have thus released the `2304.1` distribution 🎉. We will keep closely monitoring the `pre-release-preview` and `release-preprod` networks in the following days
-
We have paired on defining the new relational design of the stores of issue Define relational design of stores #476. This is a work in progress and we will produce a database schema as an outcome of our work
-
Finally, we have continued pairing on the implementation of the issue Implement an EraReader trait that gathers era activation data #709 with implementing the dummy and bootstrap adapters and wiring the era reader in the dependencies of the signer and aggregator nodes
-
We have followed up with the issue Signers are unable to sign with 2304.0-prerelease #716:
- We have seen in the aggregator logs that the expected behavior is occurring: signers can no longer register with an epoch discrepancy during the update of the stake distribution
- Some SPOs have updated their node yesterday and should therefore be able to sign as early as tomorrow. Others should be able to sign again the day after tomorrow
-
We have mainly focused our efforts in pairing on the epic Implement eras behavior switch #707:
- Implement an EraReader trait that gathers era activation data #709: we have created a draft PR add era reader #720 that should be merged shortly
- Implement an EraReader adapter with on chain transaction as source #710: we have created a draft PR Implement Era Reader on chain adapter #721
-
We have reviewed and merged a fix for the issue Signers are unable to sign with 2304.0-prerelease #716: a new pre-release version of the distribution has been created, `2304.1-prerelease`. We need to wait for a delay of `2` epochs before we can confirm that the fix worked as expected, and we will release the distribution then
-
We have also reviewed and merged the following PRs:
-
Also, we have worked on the Era mechanism:
- We have paired, reviewed and will merge shortly the issue Implement an EraChecker that checks if an era is active #708
- We have started grooming the epic Implement signer versions deployment monitoring #718 which relates to monitoring the stake share adoption rate of an era compatible version of the signer nodes
-
Finally, we had a team session during which we addressed the following topics:
- Need for signing the era activation marker stored in an on chain transaction, so that we can filter it out of the utxos of an address
- We will probably use a new secret/verification keypair with the same format as the Genesis keypair
- In the long run, it would be nice to add a multi signature for this transaction (same as hard fork in Cardano node)
- We have noticed that the deployment model is not completely defined and that some SPOs are running their test signer nodes on the relay nodes (probably copying their KES keys on the relay node). The next sprint review will be a good time to talk about this topic with the SPOs
- We have also talked about the possibility of relying on a different decentralized network than the Cardano node network layer
- We have mainly worked on the problem with almost all signer nodes not able to sign on the `pre-release-preview` network after updating to the `2304.0-prerelease` distribution:
  - After investigation, we may have found the origin of the problem which is described in this issue Signers are unable to sign with 2304.0-prerelease #716
  - A PR with the proposed fix has been created and will be reviewed and merged early next week: Fix signer unable to sign with 2304.0 prerelease #717
  - We will postpone the release of the `2304` distribution until the problem is solved
-
There seems to be a problem with the new `2304.0-prerelease`: many signers are receiving a `ProtocolInitializerNotRegistered(CoreRegister(UnregisteredInitializer))` error when they are trying to sign. This means that the verification key they are using is not the one that was registered `2` epochs ago (i.e. before activation of the new release). The signer node we are maintaining does not have the problem, and the certificate chain is regularly appended at the moment. However, if we had used the full security parameters there would not have been enough signers to create new multi-signatures and the chain would have been broken. We have not seen such a problem on the `testing-preview` network. We will keep investigating the issue and probably postpone the release of the distribution until we understand the source of the problem
-
We have prepared the demo path of this iteration:
- Introduction
- Showcase of the backward compatibility of messages
- Showcase of the API version enforcement
- Presentation of the PoC for era activation with on chain transaction
- Conclusion/Next steps
- QA
-
Showcase of the backward compatibility of messages
-
Showcase of the API version enforcement
-
Presentation of the PoC for era activation with on chain transaction
-
Showcase path for backward compatibility and version enforcement:
# Demo: Backward compatibility and API version enforcement
---
# Setup demo
## Download source (if needed)
git clone https://github.com/input-output-hk/mithril.git
#or
git clone git@github.com:input-output-hk/mithril.git
## Build docker images
cd mithril/
### Build docker images 0.1.1
git switch api_0.1.1
rm -f mithril-aggregator/mithril-aggregator && rm -f mithril-signer/mithril-signer && rm -f mithril-client/mithril-client
make build && cp target/release/mithril-aggregator mithril-aggregator/mithril-aggregator && cp target/release/mithril-signer mithril-signer/mithril-signer && cp target/release/mithril-client mithril-client/mithril-client
docker image rm mithril/mithril-signer-0.1.1 mithril/mithril-client-0.1.1 mithril/mithril-aggregator-0.1.1 --force
docker build -t mithril/mithril-aggregator-0.1.1 -f mithril-aggregator/Dockerfile.ci .
docker build -t mithril/mithril-signer-0.1.1 -f mithril-signer/Dockerfile.ci .
docker build -t mithril/mithril-client-0.1.1 -f mithril-client/Dockerfile.ci .
### Build docker images 0.1.2
git switch api_0.1.2
git rebase api_0.1.1
rm -f mithril-aggregator/mithril-aggregator && rm -f mithril-signer/mithril-signer && rm -f mithril-client/mithril-client
make build && cp target/release/mithril-aggregator mithril-aggregator/mithril-aggregator && cp target/release/mithril-signer mithril-signer/mithril-signer && cp target/release/mithril-client mithril-client/mithril-client
docker image rm mithril/mithril-signer-0.1.2 mithril/mithril-client-0.1.2 mithril/mithril-aggregator-0.1.2 --force
docker build -t mithril/mithril-aggregator-0.1.2 -f mithril-aggregator/Dockerfile.ci .
docker build -t mithril/mithril-signer-0.1.2 -f mithril-signer/Dockerfile.ci .
docker build -t mithril/mithril-client-0.1.2 -f mithril-client/Dockerfile.ci .
### Build docker images 0.2.0
git switch api_0.2.0
git rebase api_0.1.2
rm -f mithril-aggregator/mithril-aggregator && rm -f mithril-signer/mithril-signer && rm -f mithril-client/mithril-client
make build && cp target/release/mithril-aggregator mithril-aggregator/mithril-aggregator && cp target/release/mithril-signer mithril-signer/mithril-signer && cp target/release/mithril-client mithril-client/mithril-client
docker image rm mithril/mithril-signer-0.2.0 mithril/mithril-client-0.2.0 mithril/mithril-aggregator-0.2.0 --force
docker build -t mithril/mithril-aggregator-0.2.0 -f mithril-aggregator/Dockerfile.ci .
docker build -t mithril/mithril-signer-0.2.0 -f mithril-signer/Dockerfile.ci .
docker build -t mithril/mithril-client-0.2.0 -f mithril-client/Dockerfile.ci .
---
# Demo: Run demo
## Reset demo if needed
docker stop $(docker ps -a -q)
rm -rf artifacts/node-bft1/mithril/aggregator
rm -rf artifacts/node-pool1/mithril/signer
rm -rf artifacts/node-pool2/mithril/signer
## Start Cardano network
cd ../devnet-demo
./devnet-stop.sh && NODES=cardano EPOCH_LENGTH=60 ./devnet-run.sh
watch -c "NODES=cardano ./devnet-query.sh"
watch -c "docker ps --format '{{.Names}} - {{.Status}}' | sort"
## Create functions
function container_up {
API_VERSION=$1 docker-compose -f docker-compose-demo.yaml --profile $2 up --remove-orphans --force-recreate -d --no-build
}
function container_down {
docker stop $2-$1
}
function container_run {
API_VERSION=$1 docker-compose -f docker-compose-demo.yaml run $2 $3 $4
}
function container_exec {
docker exec -it $2-$1 $3 $4
}
function container_logs {
docker logs -f $2-$1
}
## Backward compatibility
### Start network with version 0.1.1
container_up 0.1.1 mithril-aggregator
container_up 0.1.1 mithril-signer-1
container_up 0.1.1 mithril-signer-2
container_run 0.1.1 mithril-aggregator-genesis
container_logs 0.1.1 mithril-signer-1 2>/dev/null | grep "register_signer" | grep "new_field"
container_logs 0.1.1 mithril-signer-2 2>/dev/null | grep "register_signer" | grep "new_field"
container_logs 0.1.1 mithril-aggregator 2>/dev/null | grep "register_signer" | grep "new_field"
container_run 0.1.1 mithril-client list
### Update network partially with version 0.1.2
container_up 0.1.2 mithril-aggregator
container_up 0.1.2 mithril-signer-1
container_logs 0.1.2 mithril-signer-1 2>/dev/null | grep "register_signer" | grep "new_field"
container_logs 0.1.1 mithril-signer-2 2>/dev/null | grep "register_signer" | grep "new_field"
container_logs 0.1.2 mithril-aggregator 2>/dev/null | grep "register_signer" | grep "new_field"
container_run 0.1.1 mithril-client list
container_run 0.1.2 mithril-client list
## API version enforcement
### Update network fully with version 0.2.0
container_up 0.2.0 mithril-aggregator
container_up 0.2.0 mithril-signer-1
container_up 0.2.0 mithril-signer-2
container_run 0.1.1 mithril-client list
container_run 0.1.2 mithril-client list
container_run 0.2.0 mithril-client list
-
We have mainly paired on the issue Implement an EraChecker that checks if an era is active #708. We have decided to rechallenge the static implementation to make testing more straightforward. In the meantime we have also made an attempt at coding a breaking change that will probably happen when we add a new part in the message to sign (the Epoch in this illustrative example). We have used the `Either` pattern implemented by the crate `either` to implement both behaviors (`Left` is for the new era, and `Right` is for the legacy era). We have published the modifications made to the `mithril-common` and `mithril-aggregator` crates on this branch ensemble/708-create-era-checker-test-new-era. We will keep publishing the `mithril-signer` adaptation on this branch and we will use this example as a real test case for the overall implementation of the era. It also gives us some track record and experience on how to handle breaking changes and what type of modification we should not do (e.g. modifying the stored entities in a non backward compatible fashion, which could crash the nodes at era transition)
-
We have also created the pre-release of the new `2304` distribution that is available at `2304.0-prerelease`. It is being qualified and should be ready to be released by end of week.
-
Finally, a problem has been identified on the API version enforcement that should be fixed by this PR Fix API version sent in wrong header #712
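The `Either`-based experiment described in this entry can be sketched as follows. This is only an illustration under stated assumptions: the `Either` enum is defined locally as a stand-in for the one provided by the `either` crate, and `LegacyMessage`, `NewEraMessage`, `message_to_sign` and `signable_bytes` are hypothetical names that do not come from the Mithril code base.

```rust
/// Stand-in for `either::Either`, defined locally so the sketch has no dependencies.
#[derive(Debug, PartialEq)]
enum Either<L, R> {
    Left(L),
    Right(R),
}

/// Hypothetical legacy message: only the digest is signed.
#[derive(Debug, PartialEq)]
struct LegacyMessage {
    digest: String,
}

/// Hypothetical new-era message: the epoch becomes part of the signed content.
#[derive(Debug, PartialEq)]
struct NewEraMessage {
    digest: String,
    epoch: u64,
}

/// Builds the message to sign: `Left` for the new era, `Right` for the legacy era.
fn message_to_sign(
    new_era_active: bool,
    digest: &str,
    epoch: u64,
) -> Either<NewEraMessage, LegacyMessage> {
    if new_era_active {
        Either::Left(NewEraMessage {
            digest: digest.to_string(),
            epoch,
        })
    } else {
        Either::Right(LegacyMessage {
            digest: digest.to_string(),
        })
    }
}

/// Produces the bytes that would be signed, depending on the era (illustrative encoding).
fn signable_bytes(message: &Either<NewEraMessage, LegacyMessage>) -> Vec<u8> {
    match message {
        Either::Left(m) => format!("{}:{}", m.epoch, m.digest).into_bytes(),
        Either::Right(m) => m.digest.clone().into_bytes(),
    }
}
```

Keeping both variants behind one return type lets the calling code stay identical across eras; only the `match` arms differ.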
- We have completed the grooming of the epic Implement eras behavior switch #707 and created the following issues:
- Implement an EraChecker that checks if an era is active #708: we have paired on this issue and a PR should be ready shortly
- Implement an EraReader trait that gathers era activation data #709
- Implement an EraReader trait implementation with on chain transaction as source #710
-
We have reviewed the updated PoC for issue PoC Read/Write transaction on chain (for version activation) #672. The second solution, which does not rely on a smart contract, seems to be the best solution in terms of simplicity and maintenance. We will probably add a signature to the data embedded in the datum to be able to authenticate them
-
We have reviewed and merged the issues (this closes the epic Implement backward/forward compatible API messages #688):
- Implement the mechanism for snapshots list #698
- Implement the mechanism for certificate pending #696
- Update enforcement of API version with Semver #705
- We have discussed the proper way of handling the entities in the messages definition: we agreed that we will keep using entities in the messages and rely on their golden tests to detect problems when (if) we make changes to the entities used in the messages. This will simplify the maintenance and avoid code duplication
-
We have also started grooming the epic Implement eras behavior switch #707. We will finish slicing the issues tomorrow and start working on their implementation
-
We have discussed the separation of the messages from the entities for the sub fields of the messages for issue Implement backward/forward compatible API messages #688. We will probably have a fully separated approach: the messages will not make any reference to the entities in their definition. We will resume this work next week. We have also noticed that the current implementation of API version enforcement is not compatible with the backward compatibility we are implementing: we have created a new issue Update enforcement of API version with Semver #705 in order to refine the compatibility check with Semver
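A Semver-based compatibility check could look like the sketch below. The exact rule retained for #705 is not stated in this logbook, so the rule used here is an assumption: two versions are compatible when they share the same major number, or the same minor number while the major is `0`. `ApiVersion` and its methods are hypothetical names, and the parser is hand-written only to keep the example free of dependencies.

```rust
/// Minimal `major.minor.patch` version, parsed by hand to keep the sketch dependency-free.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ApiVersion {
    major: u64,
    minor: u64,
    patch: u64,
}

impl ApiVersion {
    /// Parses strings such as "0.1.2"; returns `None` on malformed input.
    fn parse(text: &str) -> Option<Self> {
        let mut parts = text.trim().split('.').map(|p| p.parse::<u64>().ok());
        let version = ApiVersion {
            major: parts.next()??,
            minor: parts.next()??,
            patch: parts.next()??,
        };
        // Reject trailing components such as "0.1.2.3".
        if parts.next().is_some() {
            return None;
        }
        Some(version)
    }

    /// Assumed rule: same major when major > 0, otherwise same minor.
    /// The patch number never affects compatibility.
    fn is_compatible_with(&self, other: &ApiVersion) -> bool {
        if self.major != other.major {
            return false;
        }
        self.major > 0 || self.minor == other.minor
    }
}
```

Under this assumed rule, `0.1.1` and `0.1.2` nodes can talk to each other, while a `0.1.2` client is rejected by a `0.2.0` aggregator.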
-
We have talked about the PoC on which we worked for issue PoC Read/Write transaction on chain (for version activation) #672 and decided to rechallenge it with a second implementation that does not use a smart contract. This has been completed and works well, with a much simpler setup. Some drawbacks / pain-points still exist and we will brainstorm about them during our team session next week
-
Finally, we have discussed the possibility of removing the `allow_non_certified_registration` Rust feature and all the dead code that we maintain with it. The only remaining point was to find a way of running stress tests on a `mainnet`-like environment without spoofing the pool ids (which this feature enables). It appears that a solution could be to rely on a synthetic `mainnet` (same number of SPOs and same type of immutable database, i.e. immutable files numbers and size):
- A fake cardano node would be responsible for creating new immutable files at regular intervals in the database folder that is used by the aggregator/signers
- A fake cardano cli would be responsible for answering requests on the epoch, stake distribution, ...
- These fake cardano node & cli would communicate with a (remote?) service to get all nodes synchronized, and that service would be in charge of gathering the verified pool ids of the signers in order to create a usable stake distribution
- Given the average memory usage of signer / aggregator nodes (~500MB/~1,000MB), and the number of nodes in a `mainnet`-like network (~3000), it is not likely possible to host them all at once on a single computer
- We will keep thinking about the design of such an infrastructure to host our stress tests and decide shortly whether to get rid of the `allow_non_certified_registration` feature
-
We have talked about the issue PoC Read/Write transaction on chain (for version activation) #672 and the problem we have noticed regarding the possibility to have multiple utxo for the same script address that would be in conflict. We will keep investigating in order to find a solution
-
We have mainly paired on the implementation of the epic Implement backward/forward compatible API messages #688:
- Implement the mechanism for the signature registration #693: reviewed and merged
- Implement the mechanism for epoch settings #695: created, reviewed and merged
- Implement the mechanism for certificate pending #696: created and waiting for review
- Implement the mechanism for certificate #697: created, reviewed and ready to be merged
- Implement the mechanism for snapshots list #698: created and pending review
- Implement the mechanism for snapshot #699: created, reviewed and ready to be merged
- We need to work on a way to efficiently remove the `entities` dependency from the `messages` module in `mithril-common` in order to properly handle the fields of messages that are currently entities
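One possible shape for such a separation is to give each message its own type and convert at the boundary. The sketch below is an illustration only: `Signer`, `RegisterSignerMessage` and their fields are hypothetical and do not reproduce the actual `mithril-common` definitions.

```rust
/// Hypothetical domain entity used inside the node.
#[derive(Debug, Clone, PartialEq)]
struct Signer {
    party_id: String,
    verification_key: String,
}

/// Hypothetical API message: a separate type that can evolve independently of the entity.
#[derive(Debug, Clone, PartialEq)]
struct RegisterSignerMessage {
    party_id: String,
    verification_key: String,
    /// Field that only newer nodes send; older senders leave it empty.
    new_field: Option<String>,
}

/// Incoming messages are converted to the entity; unknown extras are dropped.
impl From<RegisterSignerMessage> for Signer {
    fn from(message: RegisterSignerMessage) -> Self {
        Signer {
            party_id: message.party_id,
            verification_key: message.verification_key,
        }
    }
}

/// Outgoing messages are built from the entity; optional fields default to `None`.
impl From<Signer> for RegisterSignerMessage {
    fn from(signer: Signer) -> Self {
        RegisterSignerMessage {
            party_id: signer.party_id,
            verification_key: signer.verification_key,
            new_field: None,
        }
    }
}
```

The trade-off is some duplication between the two types in exchange for the freedom to change an entity without changing the wire format.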
-
We have paired on implementing the epic issue Implement backward/forward compatible API messages #688:
- Implement the mechanism for the signer registration #689: The issue has been merged and closed and gave us a clear path on how to implement the following adaptation of the entities to messages
- Implement the mechanism for the signature registration #693: The issue has been created, and a PR has been created that will be merged shortly
- We will create new issues for the remaining entities that need to be converted
-
We have discussed the next epic we need to groom, which is related to how we will implement the eras activation and behavior switch. We will resume the grooming shortly and slice the epic into tickets
-
We have closed the issue PoC Read/Write transaction on chain (for version activation) #672. There are a few tricky questions that we need to investigate related to the fact that multiple utxos can be created for a script address but that it is difficult to discriminate them. We will keep investigating this problem and see if it is a blocker for the implementation of the era activation marker data source
-
Also we have merged an optimization on the execution time of the end to end test with PR Accelerate end to end test execution #692
-
We have debriefed on the PoC done with `protobuf` and `avro`:
- We have agreed that we will work on an in-house development that will allow us to leverage backward/forward compatibility for the API messages
- This is the outcome that we expected from issue PoC handle backward compatibility of API messages #673 which is now closed
- We have worked on a Rust playground in order to test some hypothesis with `serde` here, thus we don't need to make a specific PoC for this
- We have decided to move forward with the implementation of this backward/forward compatibility with the epic Implement backward/forward compatible API messages #688
- We have paired on the first issue of this epic Implement the mechanism for the signer registration #689
-
We have also discussed the issue PoC Read/Write transaction on chain (for version activation) #672, which is still under development and should be completed shortly
-
Finally, we have worked on an optimized solution for fixing the errors with the `make build` and `make test` commands when run from the root of the repository, and a PR has been created: Fix make errors from root v2 #687
-
We have talked about the usage of the `portable` feature. We have decided to use it by default when building the nodes in order to maximize coverage. This will avoid the crashes encountered by some SPOs when they are building the binaries on a computer and deploying them on a different computer (with a different CPU). In the long run, SPOs will be able to build without the `portable` feature as an optimization of their node. The associated PR has been merged Fix make build portable #685. Also, a fix has been merged in order to avoid SPOs receiving error messages when they run the `make build` command from the root of the repository with PR Fix make errors from root #684
-
We have discussed the PoC under development for the issue PoC handle backward compatibility of API messages #673. It appears that the added value from libraries such as `avro` and `protobuf` is not as high as we could have expected. In our opinion, we will be able to handle backward compatible messages with some `serde` annotations as well as a golden test strategy when updating our models. We will conduct a PoC for an in-house development shortly
-
We have merged the issue Update Run Signer Node documentation #681 related to the incomplete documentation for building a signer node as a SPO
-
During our team session:
- We have talked about the issue PoC Read/Write transaction on chain (for version activation) #672:
- So far the PoC is working in a very basic setup and we can retrieve some Datum from the utxo of the script address with the Cardano cli
- Possible alternatives to writing smart contracts are:
- A good entry point for using Plutus TX without nix is this repository: https://github.com/abailly/black-jack
- We have 2 options for deploying the script address that will be used to retrieve the era activation markers:
- Burn the address at compile time (depends on the Mithril network being used)
- Use a configuration option for this address. In order to avoid possible attacks, it looks reasonable to sign this script address with the Mithril Genesis Keys (preferred solution)
- Regarding the security of the smart contract:
- We will conduct at least an internal audit
- As the features are very simple, we shouldn't need any formal verification
- A solution that makes use of the possibility for an utxo to have Datum exists but would require to change the address requested after each update. We will probably not use this technique.
- Regarding the stress testing of the network in a `mainnet`-like setup: we will probably work with synthetic data. However, we will get in touch with the Ledger/Performance team of the Cardano node to get some insights/ideas on how they perform their load tests
-
We have released the new `2303.0` distribution:
- It has been successfully deployed to the `release-preprod` network 🚀
- However, an SPO has encountered an issue with a `SIGILL` error due to an "old" CPU (Q3'14). The quick fix was to activate the `portable` feature at compilation. We may also need to force this feature in the `make build` script used by the SPOs to build their node. This will ensure that we have a higher adoption rate, even though there is a small impact on the performance
-
We have reviewed the following PRs:
- Upgrade devnet to Cardano 1.35.4 #667, which has been merged
- Update PR template #680, which has been merged
- Update build documentation #682 which will be merged shortly
-
We have prepared the demo path of this iteration:
- Introduction
- Presentation of the new Batch Verification of Mithril multi-signature
- Presentation of the strategy for Mithril Network Update
- Showcase of the Mithril Client multi-platform test workflow
- Conclusion/Next steps
- QA
-
We have talked about the solution implemented in the PR Upgrade devnet to Cardano 1.35.4 #667 to fix the flakiness occurring at protocol parameters transition:
- We agreed on using `1` more offset when recording the new protocol parameters. This avoids broadcasting 2 different versions of the epoch settings during the same epoch (happens when the node restarts and the protocol parameters are updated in the aggregator)
- We decided to activate the matrix end to end tests with `3` runs
- The PR should be merged shortly
-
We have also discussed the PoC under development for the issue PoC handle backward compatibility of API messages #673:
- We probably don't need many of the features provided by `protobuf` and `avro`, but it is worth assessing the feasibility of these solutions
- We will probably work on a more advanced PoC with an in-house development based on `serde` and default values handling
-
We have prepared the pre-release version of the new distribution: `2302.0-prerelease`, which has been deployed to the `pre-release-preview` network successfully. The distribution `2203` should be released by the end of week
-
We have noticed that some SPOs are building their signer nodes from the main branch, which leads them to not being able to sign as expected (as they are using a version that has not been released):
- We have created the issue Update Run Signer Node documentation #681 in order to provide build instructions aligned with the release process
- It was tricky to debug as the versions of the nodes are updated just before creating the distribution. We have decided that it is better to update the versions along the way in each PR to avoid confusion. Thus we have created a PR Update PR template #680 that adds this step in the verification checklist
-
We have also discussed and kept working on the PoCs in progress:
-
We have reviewed the ADR created for the issue Write ADR for graceful updates #671 and made adjustments following the team remarks from yesterday's meeting
-
We have also talked about the issue PoC handle backward compatibility of API messages #673:
- We have clearly defined the perimeter of the PoC with the scenario we want to test
- The outcome we expect from the PoC
- Created the first sub issues to test different implementations:
-
We have paired on the issue Upgrade Cardano devnet to 1.35.4 #523 and tried to make the `devnet` work on all the developers' computers. We have also kept on fixing the flakiness that occurs
-
During our team session, we have reviewed the draft ADR for the issue Write ADR for graceful updates #671:
- We could use a 2 phase commit in order to announce an upcoming era:
- First step: announce the new era not activated yet
- Second step: activate the new era
- All the signers should be able to verify the adoption rate (which means that they should also be able to compute the stake-weighted era adoption rate given the nodes versions). This will help avoid manual errors and also avoid some attacks where the transitioned era does not exist (leading to all signer nodes being down). The threshold used in that case would be hard-coded in the nodes and would be different from the activation threshold. Also, we would need to embed in the node the correspondence table of the eras included in the node versions.
- We could use the Mithril certificate to provide the era marker and have the signers create automatically the transaction when the threshold is reached. But it is probably a better idea to keep a manual activation for the era transition
- We definitely miss an incentive mechanism to avoid being stuck with not reaching the threshold and therefore not being able to activate an era
- How do we handle rollbacks? What if something goes wrong?
- Maybe a dual mode would be the solution where there would be 2 `mainnet` networks (one for preview and the other for real use), and the preview would be activated one or more epochs in advance on the enrolled nodes. In a decentralized setup, we would need to be able to discriminate all the messages by Mithril network (and also to target a Mithril network with an era activation transaction)
- This means that every piece of information that leaves the node should probably be labeled with the version of the node it comes from
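The adoption-rate verification mentioned in this list can be illustrated with a small stake-weighted computation. This is a sketch under assumptions: `RegisteredSigner`, `era_adoption_rate` and the shape of the version-to-eras table are hypothetical, and the era names are plain strings chosen for the example.

```rust
use std::collections::HashMap;

/// Hypothetical record of a registered signer: its stake and the node version it reported.
struct RegisteredSigner {
    stake: u64,
    node_version: String,
}

/// Returns the share of the total stake (between 0.0 and 1.0) held by signers whose
/// node version supports `era`, according to the embedded version-to-eras table.
fn era_adoption_rate(
    signers: &[RegisteredSigner],
    eras_by_version: &HashMap<String, Vec<String>>,
    era: &str,
) -> f64 {
    let total: u64 = signers.iter().map(|s| s.stake).sum();
    if total == 0 {
        return 0.0;
    }
    let ready: u64 = signers
        .iter()
        .filter(|s| {
            eras_by_version
                .get(&s.node_version)
                .map_or(false, |eras| eras.iter().any(|e| e == era))
        })
        .map(|s| s.stake)
        .sum();
    ready as f64 / total as f64
}
```

A node could compare the returned rate with its hard-coded threshold before accepting an era transition; signers reporting a version missing from the table count as not ready.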
-
Finally, we discussed the possible implementations for the issue PoC Read/Write transaction on chain (for version activation) #672:
- Transaction Metadata: probably not the best option
- Transaction Data: we create a Plutus address for a script and we use the TxOutDatum as a placeholder for the era activation information. This information could be read by the cardano cli
- The chain could also be read with:
- Reading the database immutable files themselves
- Using oura or scrolls from txpipe to follow the chain
- Regarding the secrets management, the SRE team is probably able to provide insights
- On the long run, we will probably delegate the era activation to the governance mechanism of Cardano (Voltaire)
- We have paired on the issue Upgrade Cardano devnet to 1.35.4 #523:
- We have aligned the use of the `portable` Rust feature that is now used in the unit tests and release builds (used by the test lab, the docker images, the debian packages). This will avoid the flakiness of the test lab
- We have investigated the last flakiness observed that occurs at epoch transition. After investigation, it appears that this could be due to the offset applied when updating the protocol parameters: it should probably be set to one more epoch (in order to avoid next protocol parameters fluctuation during an epoch). We will attempt to fix this problem shortly
-
We have mainly worked on drafting the ADR for the issue Write ADR for graceful updates #671. We have created a PR Mithril Network Update ADR #676. Next steps are:
- Have it reviewed by the whole team
- Prepare tickets for its implementation
- Answer remaining open questions such as:
- What value do we need for the stake share threshold to be reached before activating a new era?
- How to accurately compute this threshold given the evolution of the stakes on the Cardano chain (retiring/new pools)?
-
We have also tried to activate the `babbage` era on the devnet. Unfortunately this is not working properly: the Cardano nodes stop producing new slots at hard fork activation. We will thus complete the issue Upgrade Cardano devnet to 1.35.4 #523 with running only on the `alonzo` era. Running the babbage era is not mandatory at the moment, but we will probably need it when we implement the new Mithril network update strategy.
-
We have reviewed and merged the following PRs:
-
We have mainly focused on drafting the ADR for the issue Write ADR for graceful updates #671:
- We will handle breaking changes with an era mechanism similar to what is used to activate a hard fork on Cardano
- An era (which could be named differently if this creates confusion for the SPOs) is a sortable value (that will probably be represented with variants of an enum in Rust)
- New eras are incrementally created
- When we need to get a minimum threshold of the stakes to activate a breaking change update, an era is used
- When we need to make a soft update, there is no change in an era
- A new version of the signer/aggregator node will embed code that can act differently depending on the active era
- The era is activated for the next (or later) epoch whenever the penetration rate threshold of the version is reached
- The era transition will not be associated to a database upgrade for simplicity (we will soft update the database first if needed)
- The era activation is set up in an on chain transaction
- An updated node will activate the new era once the associated epoch is transitioned to
- A non updated node will detect era transitions once they are scheduled:
- Before it is activated: it will display warning in logs, asking for update
- After it is activated: it will crash, and display a clear error message asking for update
- If we need to add new configuration parameters, it will be done in the first version of the node that embeds the new era (and the configuration will be used when the era is activated)
- This mechanism will be used in situations such as:
- Signing a new type of message (to avoid not reaching the Mithril protocol quorum)
- Whenever a new version of the cryptographic library requires that all signers use the same version for the network to produce valid multi-signatures/certificates
- Previous case, including the case where a re-genesis of the certificate chain is needed (to avoid failed re-genesis)
- When we will change from a centralized setup to a decentralized setup:
- Transition to on-chain signer key registration
- Switch to Cardano network backend
- Regarding the monitoring:
- We will use the versions of the signer/aggregator nodes to monitor the adoption rate of the era-enabled nodes (expressed as a percentage of the Mithril stakes)
- Each version of the nodes will be mapped to their associated enabled era(s) in a dedicated wiki page of the repository
- We will rely on the signer registration call made by the signer to the aggregator to record the versions of the node:
- By adding a HTTP header with the version to the request sent in the centralized setup
- By appending the version to the transaction written on chain in the decentralized setup
- We will probably need to use a relational database model to compute accurately the adoption rate (and eventually display it on the explorer)
- We will probably need to think about the way we could update the certificate chain transparently to be more resistant to cryptographic updates (will be done separately)
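The era rules listed above (sortable eras, activation at a given epoch, warning then failure on a node that has not been updated) can be sketched as follows. This is a hedged illustration: the `Era` variants, `EraMarker`, `EraStatus`, `parse_era` and `era_status` are hypothetical names, and the marker format is an assumption since the on-chain encoding is not defined in this entry.

```rust
/// Eras are sortable: deriving `Ord` orders the variants by declaration order.
/// The variant names are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Era {
    Legacy,
    NewEra,
}

/// Hypothetical era marker, as it could be read from the on chain transaction.
/// `activation_epoch` is `None` while the era is announced but not yet scheduled.
struct EraMarker {
    name: String,
    activation_epoch: Option<u64>,
}

#[derive(Debug, PartialEq)]
enum EraStatus {
    /// The node supports everything it has seen.
    Active(Era),
    /// An era unknown to this node is announced: keep running, log a warning.
    ActiveWithUpdateWarning(Era),
    /// An era unknown to this node is already active: the node must stop.
    UnsupportedEraActive(String),
}

/// Maps a marker name to an era supported by this node version.
fn parse_era(name: &str) -> Option<Era> {
    match name {
        "legacy" => Some(Era::Legacy),
        "new_era" => Some(Era::NewEra),
        _ => None,
    }
}

/// Computes what the node should do at `current_epoch` given the markers it has read.
fn era_status(markers: &[EraMarker], current_epoch: u64) -> EraStatus {
    let mut active = Era::Legacy;
    let mut warn = false;
    for marker in markers {
        let activated = marker
            .activation_epoch
            .map_or(false, |epoch| epoch <= current_epoch);
        match (parse_era(&marker.name), activated) {
            // The highest supported era already activated wins.
            (Some(era), true) => active = active.max(era),
            (Some(_), false) => {}
            (None, true) => return EraStatus::UnsupportedEraActive(marker.name.clone()),
            (None, false) => warn = true,
        }
    }
    if warn {
        EraStatus::ActiveWithUpdateWarning(active)
    } else {
        EraStatus::Active(active)
    }
}
```

A state machine could call `era_status` at each epoch transition (or at startup) and turn `UnsupportedEraActive` into a clear error message asking for an update.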
-
We had discussions about the issues regarding flakiness of the test lab in the CI. There are 3 options:
- Compile all the binaries with the `portable` feature and use GitHub runners:
  - We lose ~20% of performance on the STM computations
  - We avoid ~30% of flakiness of the test lab
  - We could add a matrix test lab with 5/10 runs in order to early detect flakiness
- Compile all the binaries without the `portable` feature and use GitHub runners (current):
  - We have optimum performance on the STM computations
  - We have ~30% of flakiness of the test lab
  - No matrix test lab in this situation
- Compile all the binaries without the `portable` feature and use hosted runners (possible?):
  - We have optimum performance on the STM computations
  - We avoid ~30% of flakiness of the test lab
  - We could add a matrix test lab with 5/10 runs in order to early detect flakiness
-
We have created some new issues related to the Graceful Updates on Mithril Network:
-
Write ADR for graceful updates #671:
- We have paired on this issue and have discussed the design of the ADR
- We will allocate 1-2 h a day to discuss on this topic during this iteration and probably the next
- We have already come to the conclusion that we should use a mechanism of feature flag in order to silently deploy a new breaking change version of the nodes that will be activated later, once enough signers have installed it. Soft updates will be deployed similarly as today as different versions will be compatible
- We will not implement an automatic update feature as it brings too many security issues
- This means that we will have to rely on an incentive mechanism of the SPOs in order to reach the adoption rate mandatory to activate a new version. If not, we could get stuck in the position where the threshold is never reached and the new version can not be activated
- We will probably have the signer and aggregator state machines perform an update of the version activation data when an epoch transition is detected (or at node startup). This will allow us to activate a new version for the very next epoch easily (even though there could be a transitory period during which activation occurs)
- We will need to trigger database migration as well at new version activation
- We still have to determine which version will be used to provide feature activation
- Also the testing strategy still needs to be defined for the test lab
- PoC Read/Write transaction on chain (for version activation) #672
- PoC handle backward compatibility of API messages #673
-
We have reviewed and discussed about the issues that we have created before the holiday break
-
We have dedicated our efforts to debugging some flakiness on the CI that were discovered during the implementation of issue Upgrade Cardano devnet to 1.35.4 #523:
- 30% of the runs were blocked on the signer nodes at the "signer registration" step: nothing happened after this step. We have found that the problem was related to the `portable` feature, which should be used when we run pre-compiled binaries in the test lab
- We still have some flakiness due to many immutable files being created at very close timestamps around epoch `10`. We will try fine tuning the Cardano node genesis configuration and the Mithril nodes in order to avoid this effect
- We have also investigated the problem that prevents the end to end test from running on some local development environments