Logbook 2022 H2
-
We have talked about the issue Upgrade Cardano devnet to 1.35.4 #523. The upgrade to the latest version of the Cardano node has introduced flakiness in the end-to-end test. We are currently fine-tuning the genesis block of the `devnet` to fix these hiccups. We have also talked about using a custom environment variable that will allow us to update the URL from which the Cardano node is downloaded without modifying the workflow
-
We have paired and merged the issue Refactoring Crypto test helpers #663:
- It introduces a more versatile & clear way of preparing protocol fixtures to feed our unit & integration tests 💪
- During our work we have identified other points that need refactoring:
-
We have discussed how we could remove the 'allow_non_certified_registration' feature and completely remove the uncertified part of the code. To do this, we need to investigate how we can avoid having to spoof the Pool Ids of the signer nodes when we simulate stress tests in conditions as close as possible to `mainnet` (i.e. 3K+ SPOs and a 100GB+ database). We will work on this subject shortly
-
We have paired on writing a document that prepares our work for Handling Graceful Updates on Mithril Network:
- We have raised many questions that we need to answer
- We will proceed with writing an ADR
- We will PoC:
- Interaction with the Cardano chain (to activate a new version): read & write transactions
- Handle backward compatibility of API messages (with protobuf, AVRO, in house development etc.)
- Once these steps are completed, we will move forward with the implementation
-
We have continued pairing on the issue Refactoring Crypto test helpers #663 for which a PR should be ready shortly
-
We have paired and merged the issue Deactivate uncertified signer registration #621:
- We have fixed the difficulties we faced yesterday regarding the usage of Rust features when artifacts are built from the workspace. For this, we have removed a feature flag that was meant to be activated on only one crate: when building from the workspace, it ends up activated for all crates at once. In our case, we have decided to simply not use one anymore, which led us to refactor the protocol demo tool and make it use its own types (including direct access to `mithril-stm` types in order to keep it chain agnostic)
- We have also deactivated the uncertified signers from the Mithril networks
-
We have paired on the issue Refactoring Crypto test helpers #663 and we have started implementing a PR. We will continue working on it tomorrow
-
We have also discussed how to implement the upgrade strategy that we talked about yesterday during our team session
-
Finally, we have created an issue Add context to errors #665 in which we will try to provide better debugging information by adding context to errors and by providing less technical error messages
-
We have closed the following issues and PRs:
-
Re-genesis Mithril test networks #651: following the re-genesis of its certificate chain, the `release-preprod` network is producing new certificates, as expected, since yesterday
-
Optimize Snapshot Digest Computation #510: the cache is now available on the `testing-preview` network and we already notice a speedup 💪
- Enforcement of API Protocol versions in Client/Signer/Aggregator #633: the nodes now embed a verification layer that enforces the usage of compatible versions of the nodes
-
We have created a new issue Refactoring Crypto test helpers #663 to refactor the cryptographic test helpers used in the tests to provide easy access to protocol ready to use signers (key registration with Cardano certification, certificate chain, ...)
-
We have also paired on an issue with the PR Decommission signer registration with declarative PoolId #653 for which tests that were broken locally were still succeeding on the CI. After investigating the caches, we verified that they were not the source of the problem. The problem is related to the usage of features in the context of a Rust workspace (and feature unification): when we build (or test) by calling the `cargo` command from the root of the workspace, the features used are different from the ones used when we run the command from the crate directory. We were actually building tests and release binaries with unwanted features. We will think about how to solve this issue in the following days, as no perfect solution seems to exist, and we will probably create an ADR to set rules on how to use features in the future to avoid this pitfall
-
During the team session, we discussed:
- How to handle upgrades of the signer as smoothly as possible when we reach `mainnet`:
- We must limit the usage of the re-genesis of the certificate chain to what is strictly necessary
- When a new version of the signer is released we need to reach the quorum at least once per epoch. This means that we can't afford to have the signers split into 2 populations that would not be able to create multi-signatures together
- We will adopt a strategy that is close to the one used by Cardano: the idea is to deploy silently a new "big" version that gets activated once the deployment of the version is high enough (a la hard fork). This means that we need to monitor the deployment by using for example the single signatures that are regularly sent to the aggregator
- We will use a transaction on chain that will be read by the signer nodes to proceed to a synchronous upgrade
- Also, we will work in order to provide backward compatibility for "small" model updates:
- We need to version all the messages exchanged (protocol version + agent version)
- We need to provide golden tests to make sure that we can handle previous versions of the models in the newer versions
- We have decided to postpone the work on issue Add Stake Shares in Certificate #636 as we are not completely ready to move forward on this subject
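The "activate once the deployment is high enough" rule discussed above could be sketched as follows. This is only a minimal illustration in Rust, under several assumptions that are not part of the discussion: adoption is weighted by stake, each single signature lets the aggregator know which version the signer runs, and the names `SignerSignal`, `adoption_rate` and `should_activate` are hypothetical.

```rust
/// Hypothetical view of what the aggregator learns from one single signature:
/// the stake of the signer and the node version it advertises.
#[derive(Debug, Clone, Copy)]
pub struct SignerSignal {
    pub stake: u64,
    pub version: u32,
}

/// Share of the total stake (between 0.0 and 1.0) held by signers that
/// already run `target_version` or a newer one.
/// Assumption: adoption is measured by stake, not by number of signers.
pub fn adoption_rate(signals: &[SignerSignal], target_version: u32) -> f64 {
    let total: u128 = signals.iter().map(|s| u128::from(s.stake)).sum();
    if total == 0 {
        return 0.0;
    }
    let adopted: u128 = signals
        .iter()
        .filter(|s| s.version >= target_version)
        .map(|s| u128::from(s.stake))
        .sum();
    adopted as f64 / total as f64
}

/// The new version is activated only when the adoption rate reaches the
/// threshold (the actual threshold value is still to be determined).
pub fn should_activate(signals: &[SignerSignal], target_version: u32, threshold: f64) -> bool {
    adoption_rate(signals, target_version) >= threshold
}
```

In the approach discussed, the activation itself would still be triggered by a transaction on chain; a computation like this one would only tell us when it is safe to send it.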
-
We have reviewed and merged the PRs:
- Fix clippy warnings from Rust 1.66.0 #657: the new version of Rust created warnings that prevented the CI from being successful
- Update dependencies #659: the fortnightly update of the dependencies of the repository
-
We have reviewed the final adjustments to the PR Optimize snapshot digest computation #652 and talked about the robustness of the timed tests if we compile them for release (where the optimization is less obvious on small files). It should be merged very shortly
-
We have talked about some CI improvements that we need to address:
- Find a way to optimize the use of the cache as we have a hard limit of `10GB` that is reached very often and that leads to higher computation delays of the Rust jobs
- Find a way to add more tags to an existing Docker image on the registry instead of rebuilding them from scratch for Pre-Release and Release
-
We have also created a new issue Delete test lab monitor #658 to clean the code base and to avoid some build issues for some SPOs
-
Finally, we have released a new distribution `2250.1` 💪
-
We have prepared the demo path of this iteration:
- Introduction
- Presentation of the optimization of the single signature (and why we need a re-genesis of the certificate chain)
- Showcase of the optimization of the snapshot digest computation
- Showcase of a protocol parameters transition on a `devnet` network
- Presentation of the road map
- Conclusion/Next steps
- QA
-
We have also:
- Merged the PR Upgrade instances capacity infrastructure #656 that increases the memory of the VM instances running on the `testing-preview` and `pre-release-preview` networks
- Merged the PR Extract signer registration from multi-signer in Aggregator #655
- Reviewed the PR Optimize snapshot digest computation #652 which will be ready to be merged by end of week
- Reviewed the PR check API version #641 which will be merged by end of week
-
We have prepared a pre-release for the next distribution: `2250.0-prerelease`. We have also made a re-genesis of the `pre-release-preview` network for which we should see new certificates produced tomorrow, as described in issue Re-genesis Mithril test networks #651
-
We have reviewed the following issues:
- Enforcement of API Protocol versions in Client/Signer/Aggregator #633: Some minor adjustments in progress and once done, it will be ready to be merged
-
Protocol parameters transition is not working #627: It has been merged and we will proceed to a test update of the protocol parameters on `testing-preview` soon
- Optimize Snapshot Digest Computation #510: Some minor adjustments in progress and once done, it will be ready to be merged
-
Finally, we have paired on the issue Extract the signer registration from multi-signer #642. It is completed and it will be merged tomorrow. We encountered some difficulties when working on the tests and it appears that `mithril_common::crypto_helper::tests_setup::setup_signers` could probably be refactored in order to avoid them. We will pair on this subject while working on the issue Deactivate uncertified signer registration #621 for which tests added in #642 will break after merging and rebasing
- We have started working on moving toward mainnet. We have tried to assess the subjects that need to be addressed first:
- The storage of the keys & signatures is currently done with a hex encoding in the database stores (especially in the certificate chain) and in the messages exchanged by the nodes, and also in the Genesis verification key file for tests. We should be ready to handle multiple types of encoding in order to:
- Avoid breaking changes (e.g. not being able to validate the certificate chain after a change of encoding)
- Optimize the size of the data (e.g. the size of a certificate) (this should be benchmarked)
- The solution that we have identified is to create a codec that would be able to:
- Serialize in the default (or a specific) encoding (which can evolve in the future)
- Deserialize the data by attempting to parse a list of maintained decoding formats
- Activate the Mithril nodes only when the attached Cardano node is (almost) fully synced (threshold to be determined). This will avoid unnecessary computations when they are not appropriate (e.g. compute stake distribution, snapshot digest and archive)
- Separate the objects used for communication between the nodes and the business objects they use
- We have also discussed adaptations that will be needed in order to handle new types of certified data (not final):
- Associate a type to the certificates so that they can accurately represent certified data
- Make the signer sign `2` messages for each signing round (the next stake distribution and the message associated with the signing round)
- Let the aggregator select which message it needs to aggregate first (the next stake distribution if it has not already created a certificate for the epoch, the message of the signing round otherwise). This could also be an efficient strategy in a decentralized context
- We will keep thinking on other features and we will also need to get a share of the iteration velocity dedicated to refactoring/technical debt
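The codec idea described above (serialize with a default encoding, deserialize by trying a list of maintained decoding formats) could be sketched as follows. This is a minimal Rust sketch and not the actual implementation: the `Codec`, `Encoding` and `DecodeError` names are hypothetical, and the `dec:`-prefixed decimal format is a made-up stand-in for whatever encoding would be added next to hex.

```rust
/// Encodings known to this sketch. `Hex` mirrors the current storage format;
/// `DecimalList` is a made-up stand-in for any future encoding.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Encoding {
    Hex,
    DecimalList,
}

/// Returned when no maintained decoding format can parse the input.
#[derive(Debug, PartialEq, Eq)]
pub struct DecodeError;

impl Encoding {
    fn encode(self, bytes: &[u8]) -> String {
        match self {
            Encoding::Hex => bytes.iter().map(|b| format!("{:02x}", b)).collect(),
            // The "dec:" prefix keeps this format distinguishable from hex.
            Encoding::DecimalList => {
                let parts: Vec<String> = bytes.iter().map(|b| b.to_string()).collect();
                format!("dec:{}", parts.join(","))
            }
        }
    }

    fn decode(self, text: &str) -> Option<Vec<u8>> {
        match self {
            Encoding::Hex => {
                if text.len() % 2 != 0 || !text.bytes().all(|b| b.is_ascii_hexdigit()) {
                    return None;
                }
                (0..text.len())
                    .step_by(2)
                    .map(|i| u8::from_str_radix(&text[i..i + 2], 16).ok())
                    .collect()
            }
            Encoding::DecimalList => {
                let body = text.strip_prefix("dec:")?;
                if body.is_empty() {
                    return Some(Vec::new());
                }
                body.split(',').map(|part| part.parse::<u8>().ok()).collect()
            }
        }
    }
}

/// Writes with one default encoding, reads every maintained encoding.
pub struct Codec {
    default: Encoding,
    decoders: Vec<Encoding>,
}

impl Codec {
    pub fn new(default: Encoding, decoders: Vec<Encoding>) -> Self {
        Self { default, decoders }
    }

    /// Serialize with the default encoding (which can evolve over time).
    pub fn encode(&self, bytes: &[u8]) -> String {
        self.default.encode(bytes)
    }

    /// Serialize with a specific encoding.
    pub fn encode_with(&self, encoding: Encoding, bytes: &[u8]) -> String {
        encoding.encode(bytes)
    }

    /// Try each maintained decoder in order and keep the first success.
    pub fn decode(&self, text: &str) -> Result<Vec<u8>, DecodeError> {
        self.decoders
            .iter()
            .find_map(|encoding| encoding.decode(text))
            .ok_or(DecodeError)
    }
}
```

With such a design, data already stored in hex would remain readable after the default encoding changes, provided the formats cannot be confused with one another (hence the prefix in this sketch).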
-
We have reviewed the draft implementations of:
-
We have merged the issue Remove VerificationKey and Stake from individual signature #619. As there are some breaking changes on the encoding of the multi-signatures, we are compelled to proceed to a re-genesis of the certificate chains of the Mithril networks:
- We have defined a short-term plan (to be reproduced whenever we have a re-genesis on the tests networks):
- `testing-preview` re-genesis has been done. New certificates should show up tomorrow
- `pre-release-preview` re-genesis scheduled on Wednesday with new distribution pre-release. New certificates should be up on Thursday
- `release-preprod` re-genesis scheduled on Friday with new distribution release. New certificates should be up on Sunday
- Communications will be done with SPOs on the Discord channel when we proceed to re-genesis of `pre-release-preview` and `release-preprod`
- We have also upgraded the version of `mithril-stm` to `0.2.0`
- We have also talked about how we could handle the breaking changes in `mithril-stm` in the future:
- when working on test networks, we simply re-genesis the certificate chain
- when working on `mainnet` in `beta` version (when we have not reached a high enough adoption rate), we simply re-genesis the certificate chain
- when working on `mainnet`: no more breaking changes, which means that the library should take care of handling compatibility as in other Cardano cryptographic libraries. The idea that we had to embed multiple versions of the library is not acceptable because of the high risk of embedding security vulnerabilities
-
We have also paired on the issue Extract the signer registration from multi-signer #642. We extracted the signer registration responsibility to a `Signer Registerer` module last week, which we have wired to the HTTP server and the state machine of the Aggregator. The last step will be to clean the multi-signer
-
Our team session has mainly been dedicated to discussing the Security Indicator of the certificates:
- Maybe we just need an "Unsafe" warning to be displayed in the UX (explorer and client) when the security is not full
- We could only rely on the percentage of stakes for this (as long as the full security protocol parameters are used)
- Using the signers list of the certificate might not be enough to guarantee security by checking that a well-known signer (or multiple) are listed. We could probably embed this list in the message that is signed, but this would only be interesting while we have not reached the 90% threshold of participation rate
- An important piece of information is the adoption rate, for which we could provide an evolution graph in the explorer
- Another idea, would be to have an external process (IOG hosted) that continuously checks the validity of the certificate chain produced by the aggregator, and in case of discrepancy with the actual Cardano chain, would revoke the genesis verification key used by clients to prevent them from restoring the snapshots
- We have agreed that we will add a "Security" page to the documentation website that will explain how the ramp up (aka beta) phase on the `mainnet` will work and what security will be provided. We will dedicate a team session to writing this page.
-
We have reviewed the code in progress and discussed the issue Optimize Snapshot Digest Computation #510:
- We have decided to use a `CacheProvider` trait that will be responsible for providing a cache of the immutable files given its (their) `Immutable File Number`
- This will allow us to provide the following implementations:
- In memory at first, for being able to provide a minimal working implementation (for testing and that could also be used in the Client)
- In memory with state stored in the SQLite database (for Signer and Aggregator nodes that already have a store)
- In memory with state stored in a file with JSON format (that could be used in the Client)
- We still wonder how we can test the trait efficiently:
- Use a mock to test behavior of the digester
- Benchmark the time gained with/without cache
- Maybe both approaches should be implemented
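The trait described above, together with its first in-memory implementation, could look like the following. This is a minimal Rust sketch under assumptions: only the `CacheProvider` name comes from our discussion, while the method signatures, the `MemoryCacheProvider` type, the type aliases and the `digests_with_cache` helper are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical aliases: an immutable file is identified by its number and
/// its cached value is the digest of its content.
pub type ImmutableFileNumber = u64;
pub type Digest = String;

/// Provides cached digests of immutable files (method signatures are assumptions).
pub trait CacheProvider {
    fn get(&self, number: ImmutableFileNumber) -> Option<Digest>;
    fn store(&mut self, number: ImmutableFileNumber, digest: Digest);
}

/// Minimal in-memory implementation, enough for tests or for the Client.
#[derive(Default)]
pub struct MemoryCacheProvider {
    entries: HashMap<ImmutableFileNumber, Digest>,
}

impl CacheProvider for MemoryCacheProvider {
    fn get(&self, number: ImmutableFileNumber) -> Option<Digest> {
        self.entries.get(&number).cloned()
    }

    fn store(&mut self, number: ImmutableFileNumber, digest: Digest) {
        self.entries.insert(number, digest);
    }
}

/// Returns one digest per requested file, calling `compute` only for the
/// files that are missing from the cache and storing the new digests.
pub fn digests_with_cache<C: CacheProvider>(
    cache: &mut C,
    numbers: &[ImmutableFileNumber],
    mut compute: impl FnMut(ImmutableFileNumber) -> Digest,
) -> Vec<Digest> {
    numbers
        .iter()
        .map(|&number| {
            if let Some(digest) = cache.get(number) {
                return digest;
            }
            let digest = compute(number);
            cache.store(number, digest.clone());
            digest
        })
        .collect()
}
```

Counting the calls to `compute` in a test is one way to check the behavior of the digester with a cache without relying on timings, which complements the benchmark approach.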
-
We have also prepared the issue Deactivate uncertified signer registration #621 by deploying test SPOs on the `pre-release-preview` and `release-preprod` networks that will be able to sign in `2` epochs and that should thus be ready when we decommission the declarative signer registration
-
We have reviewed and merged the issue Add signature of binaries in the artifacts released #587. This was the last issue of the epic issue Implement Release process #500 that is now finalized 💪 🎉
-
We have continued pairing on the issue Extract the signer registration from multi-signer #642 and we will keep our pairing sessions on the issue Simplify the Multi Signer in Aggregator #398 next week
-
We have taken some time to debug the PR check API version #641 for which the end-to-end test is always failing
-
Finally we have started designing a consistent way of handling compatibility between the Mithril nodes:
- We want to deal as efficiently as possible with situations where:
- We are introducing breaking changes that make nodes versions incompatible (avoid them if backward compatibility is possible or provide a way to dodge them. This is critical as we will need to get a very high level of participation of SPOs in order to provide full security for the certificates and also to avoid epoch gaps in the certificate chain)
- We are introducing breaking changes that make validation of a part of the certificate chain impossible (new version of nodes would not be able to validate previously generated certificates and reciprocally)
- We will create an ADR once our design is final
- Here are some ideas that we have talked about:
- We could use multiple versions of the `mithril-stm` crate and switch to the correct version to proceed to the certificate verification depending on the version embedded in the certificate. This solution is interesting but has some caveats: it is a bit cumbersome and raises questions on how to handle security issues that would be fixed in recent versions only, for example. We will probably try to PoC this solution soon.
- We could use a shift mechanism that would activate versions later at a defined epoch transition: we would embed 2 versions (current + next) in the nodes and make an announcement to the SPOs that a new critical version must be installed before the epoch transition. This would give time to upgrade the signers and maximize our chances to avoid epoch gaps. This would also be a convenient way to prepare for new use cases that involve new types of data to certify. We will probably try to PoC this solution soon.
- We need to make some adjustments on the way we handle the detection of incompatible versions of the nodes:
- Our current `MITHRIL_API_VERSION`, which is the OpenAPI specification version, does not fully reflect incompatibility between nodes, which can occur when the content of fields of the data exchanged is modified (e.g. in Optimize Snapshot Digest Computation #510 where the way digests are computed changes, or in Remove VerificationKey and Stake from individual signature #619 where single and multi-signature formats change)
- We could extend the "meaning" of the `MITHRIL_API_VERSION` version that would be updated when:
- OpenAPI specification is updated
- Encoding or values computation is modified
- Breaking changes in the certificate chain occur (such that a version of the node is not able to validate it completely)
- We could rely on the crates nodes versions to establish compatibility tables (e.g. this version of the aggregator is compatible with these versions of the signer node and these versions of the client node)
- We could also rely on a baked minimum version of the distribution acceptable for a given node (e.g. aggregator running `2248.1` is compatible with signer not older than `2244` distribution)
- Some drawbacks exist with all the solutions. Relying on the distribution looks interesting even though it will require more work
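The "baked minimum distribution" option mentioned above could be sketched as follows. This is a minimal Rust sketch under assumptions: the `Distribution` type and the `is_compatible` function are hypothetical, and we assume that a distribution name is a number optionally followed by `.patch` and by a suffix such as `-prerelease`, with larger numbers meaning more recent distributions.

```rust
/// Hypothetical parsed form of a distribution name such as `2248.1`,
/// `2244` or `2250.0-prerelease`. Ordering compares `number`, then `patch`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Distribution {
    pub number: u32,
    pub patch: u32,
}

impl Distribution {
    /// Parses `number[.patch][-suffix]`; a missing patch is read as 0 and
    /// the suffix is ignored (an assumption made for this sketch).
    pub fn parse(text: &str) -> Option<Self> {
        let core = text.split('-').next()?;
        let mut parts = core.split('.');
        let number: u32 = parts.next()?.parse().ok()?;
        let patch: u32 = match parts.next() {
            Some(patch) => patch.parse().ok()?,
            None => 0,
        };
        if parts.next().is_some() {
            return None;
        }
        Some(Self { number, patch })
    }
}

/// A node accepts a peer when the peer distribution is not older than the
/// minimum distribution baked into the node.
pub fn is_compatible(minimum: Distribution, peer: Distribution) -> bool {
    peer >= minimum
}
```

For instance, an aggregator with a baked minimum of `2244` would accept a signer running `2248.1` and reject one running an older distribution.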
-
We have reviewed and merged the dev blog post that describes the release process in the PR Start blog post describing release process #533
-
We have paired on the issue Simplify the Multi Signer in Aggregator #398:
- Reviewed and merged the issue Extract the Certificate creation from the multi-signer #638
- Started working on the issue Extract the signer registration from multi-signer #642 on which we will continue pairing tomorrow
-
We have made a test run of the manually triggered workflow that has just been merged: Mithril Client multi-platform test. We have agreed that we would use this manual workflow at least once when a pre-release distribution is created, and whenever it is needed by the ongoing developments (as it is possible to target a commit from any branch)
-
We have talked about how to handle the breaking changes of issue Remove VerificationKey and Stake from individual signature #619:
- The breaking changes require a re-genesis of the certificate chains of the `3` existing Mithril networks as soon as they are updated (which will not occur at the same time)
- We will establish a short term plan in order to have a minimal impact and to communicate accordingly with the Pioneer SPOs
- This will be a good opportunity to structure a deployment plan that will be re-used when a re-genesis is required
- We will also organize a dedicated session in order to work on possible solutions to avoid/limit the re-genesis in the future
-
We have published a new distribution `2248.1` and we have also published the first version of the `mithril-stm` library on `crates.io` automatically with the CI/CD 💪
-
We have also created the first ticket associated to the issue Simplify the Multi Signer in Aggregator #398: issue Extract the Certificate creation from the multi-signer #638 for which we have paired and finished a PR that will be merged shortly. We will add new sub issues in the next days and keep our efforts on this simplification.
-
In order to finalize issue Implement Release process #500:
- We have reviewed and merged the issue Create manually triggered workflow to test Client binaries of all platforms (Windows, macOS, Linux) against testing-preview network #601
- We have reviewed issue Add signature of binaries in the artifacts released #587 that will be merged shortly
- We have also updated the PR Start blog post describing release process #533
- Once all of the issues/PR are closed, we will close issue #500
-
We have also attended a presentation of the `ΔQSD` paradigm for quality and started applying it to the Mithril protocol. We will keep working on this in the next weeks
-
Finally, we have done some cleaning on the repository and deleted the stale branches
-
We have groomed the following issues for this iteration:
- Optimize Snapshot Digest Computation #510
- Enforcement of API Protocol versions in Client/Signer/Aggregator #633
- Compute Security Level in Mithril Explorer #513: needs more refinement from Product/Research
- Add Stake Shares in Certificate #636
- Protocol parameters transition is not working #627
- Deactivate uncertified signer registration #621
-
We have reviewed and merged the following PRs:
- Fix Cardano bin download URL #635: A change of the download location of the Cardano binaries that prevented the CI from working
- Update dependencies #634: An update done at the end of each iteration to use the latest versions of the dependencies of the project
-
We have created the pre-release version `2248.1-prerelease` for the `2248` distribution. It has been qualified and a final `2248.1` release has been created. It is under deployment as the GitHub actions are currently very slow
-
During our team session, we have made a final review of the issue Prepare CIP/CPS for Mithril piggybacked on Cardano network #586. A draft PR Mithril Decentralized Network CIP #637 has been created, for which we are expecting feedback from the Cardano network team shortly. If we ask SPOs to register their signers on the Cardano chain at each epoch, it means that we need to find a way to incentivize their contribution as well
-
We have also discussed the Compute Security Level in Mithril Explorer #513:
- We will probably use pre-computed values for the Security Level of the multi-signatures as we are already using the full security parameters on the `testing-preview` network
- We could use only the Mithril Stake Share in order to get a reliable Security Indicator (if the full security protocol parameters are used and use 0 if not)
- We have also mentioned that displaying the Pool Ids (and/or tickers) of the SPOs that have signed a certificate could be a good way to leave the choice of trusting a certificate based on who signed it (at least during the ramp up phase on the `mainnet`)
-
We have prepared the demo path of this iteration:
- Introduction
- Presentation of the first draft of the "CIP Mithril Decentralized Network"
- Showcase of the Store Automatic Migration second milestone for Signer and Aggregator
- Video demo of benchmark bootstrap of Daedalus on mainnet with/without Mithril
- Finalization/optimizations of the release process
- Announcement of deprecation of declarative Pool ID signer registration and next steps
- Conclusion/Next steps
- QA
-
We have prepared the pre-release of the next distribution: `2248.0-prerelease`. It is currently being tested and should be released tomorrow
-
We have also been working on the issue CI does not trigger for PR from forks #597. We are now able to correctly run the CI for a PR that comes from a fork. We agreed that it could be a good idea to split the CI workflow in 2 parts and put the Docker build/push and Terraform deployment steps in a new Testing workflow
-
We have created the following issues:
- Protocol parameters transition is not working #627: A bug is preventing correct transition when updating the protocol parameters
- Deactivate uncertified signer registration #621: Decommission of the deprecated declarative signer registration mode
-
We have discussed the issue CI does not trigger for PR from forks #597, which is very tricky. We have decided to roll back so that the artifacts recording, Docker registry and Terraform deployments are triggered on the CI only when there is a push on the main branch. In other cases, only the build and testing part will run. This means that we will have to create tags for new distributions on commits merged by collaborators of the repository. We will investigate further and try to find a better option. We have also had many difficulties with the CI being very slow for the last few days, with some delays of more than 2 hours
-
The issue Implement Mithril SPO on testing/pre-release environments #563 has been merged and some test SPOs are being set up on the `testing-preview` network
-
We have also reviewed and merged the PR make SQL entities to create their projection #625
-
We have paired on the issue Prepare CIP/CPS for Mithril piggybacked on Cardano network #586
- We made final adjustments to the recently written sections Abstract, Motivation, Specification/Overview, Rationale, Path to Active and Further Reading
- We had a meeting with researchers regarding the issue that we have with achieving consensus on the signer registration:
- The best option that we have at this time is to make a transaction on chain to reach the consensus (for every signer registration at each epoch)
- We could probably have a KES like evolution mechanism for the Mithril keys in order to reduce the transaction frequency to once every few epochs
- Researchers will keep on reviewing our DIP draft and trying to find other solutions
-
We have reviewed the PR Add Mithril SPO on testing/pre-release environments #589 that will be ready to merge shortly after the documentation is updated. It will allow the creation/maintenance of SPOs on the Mithril test networks
-
We have reviewed and paired on the SQL automatic migration #600 that has been merged and will be embedded in the next distribution `2248`
-
We have also reviewed and merged the Add versioning to documentation #555 issue that separates the documentation website in 2 separate versions (accessible via the drop-down top right menu on the website):
- Current version: that has been merged with the latest distribution
- Next version: the under construction version that will be shipped with the next distribution
-
We have paired on the CI does not trigger for PR from forks #597 for which we are still having some troubles with the management of the build caches. We will keep on investigating on this issue in the next days
-
We have continued working on the issue Prepare CIP/CPS for Mithril piggybacked on Cardano network #586 which is close to reaching a decent first draft status. In the next days, we will:
- Make a full review of the document
- Enhance the schema overview to make it closer to the final specifications
- Enhance the description of the handling of the several aggregators certificate chains (regarding the genesis certificate) in this decentralized setup
- Work on dedicated sessions with researchers in order to find answers and solutions to the signer registration consensus problem that we have identified
-
Regarding the publication of the `mithril-stm` crypto library to `crates.io`, we will proceed as follows:
- First publish the crate with a crates.io `API Token` from Inigo
- He will then invite other members of the team as co-owners of the crate
- Finally, a team will be created in the IOHG GitHub organization that will also be added as owner of the crate (name of the team to be confirmed, e.g. `Core`, `Crypto`, `Rust`, `Mithril`, and will depend on the strategy defined regarding grouping of the published crates)
-
We have talked about the issue CI does not trigger for PR from forks #597. We will probably have to trigger the CI only when a PR is created/updated/merged in order to avoid duplicate triggers. We need to make sure that this is not a problem when we retrieve the produced artifacts from other workflows. We will conduct some tests on that matter in the following days
-
We have paired on the issue SQL automatic migration #600 and the associated PR should be ready to merge shortly
-
We have merged the following PRs:
-
STM Readme update #616: This makes the publication to crates.io ready. We are only missing the `API_TOKEN` in order to create the first publication
- Deprecate uncertified signer registration #617: The stable mode of registration of signers is now the Certified Pool Id mode. We will decommission the deprecated declarative mode in a couple of weeks (see issue Deactivate uncertified signer registration #621)
-
Update 'testing-preview' protocol parameters #618: The `testing-preview` environment now uses the full security parameters (which will be activated in `2` epochs)
-
Finally, we have paired on the Prepare CIP/CPS for Mithril piggybacked on Cardano network #586:
- We have reworked all the mini protocols to follow the formalism of the Shelley Networking Protocol
- We have identified a difficulty with the consensus that needs to be reached on the verification keys of the signers when we broadcast the signer registration. We will work on this subject with researchers in the next days to try to find a solution
- In the meantime, we will complete the writing of the first draft of the CIP tomorrow in a dedicated session
- We will also have to create a Mithril CIP in the near future as in CIP-0035. It will commit our team to be fully part of the CIP process
-
We have merged the PR Fix KES period verification #609 which narrows the range of KES Period verification when a signer registers. This closes the issue Signer registration fails with key certification mode #548. The next step is to deprecate the unverified signer registration as detailed in Deprecate Signer Declarative Pool Id mode #585. After a period of a few weeks, we will decommission this mode of registration
-
We have reviewed the PR Greg/600/database migration #611. There are still some modifications that need to be addressed such as using a separate version mechanism from the one used by the application itself in order to be compatible with the life cycle of the nodes versions. We will pair on this early next week
-
We have continued pairing on the issue Prepare CIP/CPS for Mithril piggybacked on Cardano network #586. We have completed a first version of the `Mithril Signer Registration Protocol` specification. We will continue with multiple pairing sessions in order to replicate this for the other mini protocols that need to be specified, as well as for the Motivation, Rationale and Path to Active sections
-
We have merged the following PRs:
- Deployment to crates.io #610:
  - We just need to update the final `API_TOKEN` in the GitHub secrets once we receive it
  - We will wait for a cleanup of the README file of the `mithril-stm` crate (aka `mithril-core`) before activating the publication to crates.io
  - When publication time has come, we will remove the `--dry-run` argument in the publish step of the Pre-release workflow
- Add Daedalus/Mithril benchmark video #614 that adds the YouTube video of the benchmark we have done on the `mainnet` with/without Mithril. It is accessible on the Bootstrap a Cardano Node guide of the documentation website
-
We have paired on the issue Add nodes/libraries versions matrix in releases #599 and we have merged the PR Produce versions table in Release description #612 that will add a version table in the release description automatically
-
Finally we have paired on the issue Prepare CIP/CPS for Mithril piggybacked on Cardano network #586:
- We have carefully reviewed the `Mithril Signer Protocol` part and have made some refinements on it
- ⚠️ We have identified a tricky issue regarding the signer registration for which we need to find a consensus among the nodes. In order to do so, we could probably use the slot leader to certify (with its VRF keys) the list of signers registered to Mithril for an epoch
- We have also scheduled a new session tomorrow dedicated to finalizing the specifications of this mini protocol
-
We have merged a quick fix on Store migration process does not accept a newer version #603 that was blocking the CI. It simply deactivates the panic that occurs when a version mismatch is detected. The real fix will come with the issue SQL automatic migration #600
-
We have also paired on the issue Activate deployment to crates.io #588 for which:
- We have pushed the PR Deployment to crates.io #610 that should be merged shortly
- We are waiting for the API token of the `crates.io` account of IOG that will be used to deploy. In the meantime, we have kept a `dry-run` version of the publication step in the Release workflow
-
Finally, we have paired on the issue Prepare CIP/CPS for Mithril piggybacked on Cardano network #586:
- We made a full review of the CIP
- We have agreed that a light summary of the protocol should be added at the beginning of the CIP
- We still have to properly design the bootstrap of the certificate chain for an aggregator in this decentralized context
- In order to complete the work during this iteration and to get a first clean version:
- We will all re-read the document prior to new pairing sessions
- We will schedule 3 other pairing sessions dedicated to that CIP in the following days
- We have discussed and paired on some bugs:
- Computation of Stake Distribution is computed twice during Signer registration #596: It has been fixed and merged
- CI does not trigger for PR from forks #597: This issue is a little bit trickier than what we expected as it also has security implications. We have created a dummy PR from a fork Remove 'clippy' file #605 and have made some experiments in order to prepare a plan for fixing the problem. We will continue to work on the problem in the following days
- Store migration process does not accept a newer version #603: This issue requires that we make some adjustments to the way we handle database upgrades. We will roll back to separate versions for the nodes and the database. We will concentrate our efforts on this issue as it is blocking the CI
-
We have sliced the tickets of this iteration
-
We have talked about the issue SQL automatic migration #600 for which we will need to embed the actual SQL upgrade files. For this we will probably make use of a macro such as `include_bytes`
-
We have discussed the next steps for the issue Simplify the Multi Signer in Aggregator #398 and some possible enhancements in the test setup functions from the crypto helper so that they can provide a simpler usage in the integration tests
-
We have also talked about the bug CI does not trigger for PR from forks #597 that might be trickier than what we expected. We will pair on it tomorrow in order to understand the best way to fix the problem.
-
A new bug has been created Store migration process does not accept a newer version #603 that should be fixed shortly
-
Finally, these PRs have been merged:
- Enhance Mithril networks infra #584, also the environments have been migrated to handle the associated breaking change in the terraform deployment
- Update dependencies #602
-
We have prepared the demo path of this iteration:
- Introduction
- Showcase of the Store Automatic Migration first milestone for Signer and Aggregator
- Showcase of the enhancements of the Explorer
- Showcase of live release of the `2246.1` distribution
- Conclusion
- Q&A
-
Showcase path of the `Live release of the 2246.1 distribution`:
# Demo: Release distribution `2246.1`
## Open pre-release page
google-chrome https://github.com/input-output-hk/mithril/releases/tag/2246.1-prerelease
## Switch to main branch
git switch main
git fetch
git pull --rebase
## Show tag on repository
git log --oneline
## Create final tag
git tag -s 2246.1 0bff212a767399b01aef152e27782a7e7ba934f2 -m "2246.1 release"
## Show tag on repository
git log --oneline
## Push the final tag
git push origin 2246.1
## Open Pre-release Workflow
google-chrome https://github.com/input-output-hk/mithril/actions/workflows/pre-release.yml
## Open release page
### Generate release notes
### Uncheck "Set as a pre-release"
### Check "Set as the latest release"
google-chrome https://github.com/input-output-hk/mithril/releases/tag/2246.1
## Open Release workflow
google-chrome https://github.com/input-output-hk/mithril/actions/workflows/release.yml
- We have also reviewed the issues of the current iteration and prepared work for the next iteration. We have created a new issue Create manually triggered workflow to test Client binaries of all platforms (Windows, macOS, Linux) against testing-preview network #601 that relates to the epic Implement Release process #500
-
We have created a few tickets, some of which are bugs:
-
We have created pre-releases for a new distribution `2246`:
- `2246.0-prerelease`: This was missing the update of the versions of the modified nodes
- `2246.1-prerelease`: This pre-release is under qualification and should be released tomorrow 💪
-
We have paired and merged the PR database migration framework #571 that implements database version update detection and that closes issue Implement stores migration process #562 🎉
-
Finally, we have continued our pairing effort on the elaboration of the CIP for piggybacking Mithril nodes on the Cardano node network layer
-
We have merged the PR add API version in HTTP headers #566 that closes the issue API version #565. The next step is to enforce the compatibility of the nodes and ask for an update when an incompatibility is detected
-
We have reviewed the final modifications of the PR database migration framework #571 that should be merged shortly. Once this is done, we will work on the automatic upgrade of the stores of the nodes.
-
We also have reviewed, requested some modifications and merged this PR More refined list of pre-reqs #591 coming from the community
-
Following many comments, and some confusion that we have noticed on the discord channel regarding the configuration of the nodes for the several environments, we have merged this PR Enhance Mithril Networks documentation #593 whose goal is to provide a clear section for the configuration in every guide that requires it. This section is now centralized to provide up to date information efficiently. Also, we have removed all the mentions of the now decommissioned previous infrastructure that used to be accessible on the `https://aggregator.api.mithril.network/aggregator` endpoint
-
Finally, we have merged the PR Upgrade to Cardano 1.35.4 #595 that uses the latest stable version of the cardano node as the previous `1.35.3` will no longer be working by November 16th
-
We have talked about issue Implement stores migration process #562 and reviewed the PR database migration framework #571. We have decided to align the version number of the database to the version number of the node. The auto upgrade mechanism will be:
- Check if the version of the node has changed (from previously recorded in the database state)
- If the version has changed, select the ordered list of upgrade files that need to be applied to the node
- For each of these files (associated to a version):
- Apply the upgrade file (first file)
- If upgrade went OK, check the upgraded database (second file)
- If upgrade is checked successfully, record the updated version to the database
- Once all the upgrades have been applied, record the current version of the application and the last updated date
- There are 2 special cases:
- Table creation, for which a first upgrade will be a `CREATE IF NOT EXISTS` query
- If the list of upgrades to apply includes a version lower than the currently recorded version, for which a panic and error message should happen
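The upgrade sequence above can be sketched in a few lines of Rust. This is a minimal in-memory illustration under stated assumptions, not the code of the PR database migration framework #571: the `Migration` and `Store` types, the integer versions and the function pointers standing in for the upgrade and check SQL files are all hypothetical, and the sketch returns an error where the notes call for a panic. Recording the last updated date is left out.

```rust
/// Hypothetical upgrade step: a target version plus two functions standing in
/// for the "upgrade" file and the "check" file described above.
struct Migration {
    version: u32,
    apply: fn(&mut Vec<String>) -> Result<(), String>,
    check: fn(&[String]) -> bool,
}

/// Hypothetical in-memory stand-in for the database and its recorded version.
struct Store {
    recorded_version: u32,
    tables: Vec<String>,
}

/// Applies the pending upgrades in version order, recording the version
/// after each upgrade that has been applied and checked successfully.
fn run_migrations(store: &mut Store, mut pending: Vec<Migration>) -> Result<(), String> {
    pending.sort_by_key(|m| m.version);

    // Special case: an upgrade older than the recorded version is refused
    // before anything is applied (the notes suggest a panic here).
    if let Some(stale) = pending.iter().find(|m| m.version < store.recorded_version) {
        return Err(format!(
            "upgrade {} is older than recorded version {}",
            stale.version, store.recorded_version
        ));
    }

    for migration in &pending {
        // First file: apply the upgrade.
        (migration.apply)(&mut store.tables)?;
        // Second file: check the upgraded state.
        if !(migration.check)(&store.tables) {
            return Err(format!("check failed after upgrade {}", migration.version));
        }
        // Only a checked upgrade moves the recorded version forward.
        store.recorded_version = migration.version;
    }
    Ok(())
}
```

Recording the version after every checked step means that a failure in the middle of the list leaves the store at the last version that was fully applied, so a later run can resume from there.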
-
We have also paired on the issue API version #565 for which we have added the `Mithril API Version` in the headers of the calls made to the Aggregator from the Signer/Client
-
Finally, we have continued pairing on the issue Prepare CIP/CPS for Mithril piggybacked on Cardano network #586. We will do another dedicated session with the whole team this week
-
We have reviewed the following PRs:
- Fix explorer state reload #592 has been merged and introduces new unit tests for the explorer
- Decommission legacy infra #578 has been reviewed and will be merged next week. At the same time the previous test infrastructure for Mithril will be destroyed
- database migration framework #571 has been reviewed and will be ready to merge next week
-
We have paired on the issue Simplify the Multi Signer in Aggregator #398 and almost finished the refactoring of the Certificate production out of the multi-signer. We will continue working on it in the following days
-
Finally, we have paired on the issue Prepare CIP/CPS for Mithril piggybacked on Cardano network #586 and continued drafting the CIP. A draft is available here. We have scheduled 2 dedicated sessions next week to continue our work.
-
We have reviewed, paired on a bug and merged the PR Enhance explorer aggregator selection #590 that closes issue Provide a 'copy' button for the aggregator URL in explorer #576 and also brings some enhancements to the Mithril Explorer, like the display of the epoch settings as well as the usage of redux storage that simplifies the code
-
We have also paired on the issue Simplify the Multi Signer in Aggregator #398 and worked on a first step: remove the responsibility of producing certificates from the multi-signer, published on the branch simplify_multi_signer
-
We have also been experimenting with:
- Running a test network on `mainnet` in order to evaluate the path to being live on the `mainnet`
- Producing multi-signatures with full security parameters (k=2422, m=20973, f=0.2) on a `devnet`
-
We have prepared the tickets for the current iteration:
-
We have reviewed the enhancements of the Mithril Explorer of issue Provide a 'copy' button for the aggregator URL in explorer #576. A nice feature to have is also to be able to open the explorer to a specific Aggregator. We need to investigate whether there are security concerns regarding this feature (or whether it is a problem to make the explorer available to potentially adversarial Aggregators)
-
We will resume our work on the issue Simplify the Multi Signer in Aggregator #398 tomorrow with a dedicated session
-
During the team session, we have also started working on the issue Prepare CIP/CPS for Mithril piggybacked on Cardano network #586. A first draft of the CIP is available on the wiki. We will keep iterating on it this week and next week as well.
-
We have reviewed the work in progress on the Mithril explorer for issue Provide a 'copy' button for the aggregator URL in explorer #576. The associated PR should be ready to merge shortly
-
We have identified some problems on the `testing-preview` and `pre-release-preview` networks that were not producing snapshots for epoch `10`. Apparently some problems may exist in the fast bootstrap genesis tools/process. We are investigating the problem. In the meantime we have:
- Reset the `testing-preview` network with fast bootstrap genesis: new certificates are produced and no epoch gap with protocol initializers/verification keys exists in the databases of the signer and aggregator nodes. We will see if the problem occurs again in the following epochs
- Performed a re-genesis of the `pre-release-preview` network (as fast genesis is not possible anymore once new signers have registered). New certificates should be produced in the next epoch
-
Following the release of Rust `1.65.0`, some `clippy` warnings occurred in the CI and were blocking the process. We have paired to apply a fix for these warnings in Update rust dependencies #583
-
The first distribution of Mithril has been released 2244.0 🚀 🎉 💪
-
We have paired and merged the PR Add Debian packaging to CI #579 producing the debian packages for the installation of the Linux binaries in the CI. We will adjust the documentation to make this installation the preferred installation type for Mithril nodes
-
We have worked on the demo path of this iteration:
- Introduction
- Showcase of the new release process and of the first Mithril distribution
- Presentation of single signature without Merkle path
- Conclusion
- Q&A
-
Showcase path of the `new release process and of the first Mithril distribution`:
# Demo: Bootstrap a Cardano node from a preprod Mithril snapshot with latest Client distribution
## Download binary
rm -f mithril-client
wget https://github.com/input-output-hk/mithril/releases/download/2244.0/mithril-client_0.1.0.12bb705_amd64.deb
sudo dpkg -x mithril-client_0.1.0.12bb705_amd64.deb .
sudo mv usr/bin/mithril-client ./mithril-client
## Test installation
./mithril-client
./mithril-client --version
## Get Latest Snapshot Digest
export NETWORK=preprod
export AGGREGATOR_ENDPOINT=https://aggregator.release-preprod.api.mithril.network/aggregator
export GENESIS_VERIFICATION_KEY=$(wget -q -O - https://raw.githubusercontent.com/input-output-hk/mithril/main/TEST_ONLY_genesis.vkey)
SNAPSHOT_DIGEST=$(curl -s $AGGREGATOR_ENDPOINT/snapshots | jq -r '.[0].digest')
echo $SNAPSHOT_DIGEST
## List Snapshots
./mithril-client list
## Show Latest Snapshot
./mithril-client show $SNAPSHOT_DIGEST
## Download Latest Snapshot
./mithril-client download $SNAPSHOT_DIGEST
## Restore Latest Snapshot
./mithril-client restore $SNAPSHOT_DIGEST
## Launch a Cardano Node
docker run -v $(pwd)/ipc:/ipc -v cardano-node-data:/data --mount type=bind,source="$(pwd)/data/preprod/$SNAPSHOT_DIGEST/db",target=/data/db/ -e NETWORK=preprod inputoutput/cardano-node:1.35.3-configs
## Query tip of the chain
watch -n 1 "sudo CARDANO_NODE_SOCKET_PATH=./ipc/node.socket ./cardano-cli query tip --cardano-mode --testnet-magic 1 | jq ."
-
We have paired on fixing the tests not working with the PR Single signature without merkle path #484. The PR is now merged, and the `release-preprod` environment has gone through a re-genesis accordingly (as the AVK format is no longer compatible) 🎉
-
We have merged the PR Activate new Mithril networks #577 that activates the new Mithril networks for each workflow of the new release process:
| Mithril Network | Workflow |
|---|---|
| `testing-preview` | CI |
| `pre-release-preview` | Pre-Release |
| `release-preprod` | Release |
- Tomorrow we will create the first distribution release of the repository. We have discussed this first release and the nice-to-have features to implement shortly in the distribution:
- Debian package
- GPG signature of the binaries
- Better handling of Docker artifacts re-tagging
- Manual testing of Client artifacts for macOS and Windows platforms
-
We have worked toward releasing the new `release-preprod` environment:
- ✔️ Deprecate current aggregator: it will not be updated anymore when some branches are merged on main
- ✔️ Use `release-preprod` as the new environment that is deployed when branches are merged on main (temporarily, until the new `preview` cardano testnet is re-spun)
- ❌ Merge breaking changes of mithril-core in the PR Single signature without merkle path #484. A blocking issue forced us to postpone the merge until a fix is implemented.
- ⌚ Fast re-genesis the aggregator of release-preprod (<30 min). Will be done after the #484 merge.
- ⌚ Communicate with SPOs on discord and dev blog about the new & deprecated environments. A blog post has been created and is under review in PR New environments documentation #575
-
We have paired on fixing the tests of the Aggregator of the PR Single signature without merkle path #484. We did not succeed, but we found out that there is probably an issue with the registration. We will keep on investigating this problem.
-
We have also discussed how we could test that the macOS and Windows Client builds are running correctly when connected to an Aggregator that runs on Linux. We think that a good option is to create manually triggered complementary pipelines. We will try to investigate this shortly.
-
Finally, we have reviewed the work in progress on the issue Implement stores migration process #562
-
We have reviewed and merged the following PRs:
-
We have discussed the issue Implement Release process #500:
- Some fixes/optimizations will be addressed shortly on the CI pipeline
- We will continue our efforts on deploying the new `release-preprod` environment that should be up and running by tomorrow 💪
-
We have discussed the issue Implement stores migration process #562:
- This issue is closely linked to Move stores to relational design with SQLite #476. We will start working on it once #562 is completed
- We have agreed that it would be easier to release a first version of the system that is already handling the migration steps described by sequential migration files
-
We have reviewed, paired and merged the PR Adapt ci workflow to Release Process #557 🎉:
- There is a small bug regarding the naming of artifacts
- We still need to have the CI append the commit sha in the versions of the cargo.toml files
- We need to find a way to reuse docker artifacts between pipelines
- We must make some tests on the macOS and Windows client binaries to make sure they are working properly
-
The following PRs have been also merged:
-
Finally, we have decided to spin-up the `release-preprod` environment at the EOW:
- After merge of the breaking change PR Single signature without merkle path #484 (or re-spin after this merge)
- Temporarily implement it in the CI pipeline (and then moving it to the Release pipeline when it is released)
- Communicate with the SPOs on discord and dev blog:
  - Explain that the current Aggregator running on `preview` is deprecated and will be decommissioned on November 1st
  - Explain that they need to move their Signer nodes to the `release-preprod` environment which will be the `stable` environment
  - Encourage them to also have a Signer node running on the `pre-release-preview` environment to keep participating in the testing effort
-
We have paired on fixing partially the issue Signer registration fails with key certification mode #548:
- We will merge shortly the PR Fix KES key update #569
- Once merged, we will make sure that the certification works as expected and that some Signers (that would have been recompiled) will show a verification on a KES Period strictly greater than `0`
- Then we will reduce the range of KES Period verification to `[current_period - start_period - 1, current_period - start_period + 1]`
- Later the `KES Agent` of the Cardano node will take care of signing the Operational Certificate with the correctly evolved KES Secret Key
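As an illustration of the narrowed range above, the sketch below computes the inclusive interval of KES periods to try during verification. It assumes unsigned integer periods and hypothetical function names; the actual verification of the signature against the Operational Certificate is not shown.

```rust
use std::ops::RangeInclusive;

/// Number of KES evolutions since the Operational Certificate was issued.
/// Returns `None` if the certificate starts after the current period.
fn expected_kes_period(current_period: u64, start_period: u64) -> Option<u64> {
    current_period.checked_sub(start_period)
}

/// Inclusive range `[expected - 1, expected + 1]` of KES periods to try,
/// with the lower bound saturating at zero.
fn kes_verification_range(
    current_period: u64,
    start_period: u64,
) -> Option<RangeInclusive<u64>> {
    let expected = expected_kes_period(current_period, start_period)?;
    Some(expected.saturating_sub(1)..=expected.saturating_add(1))
}
```

For example, with `current_period = 10` and `start_period = 3`, the range is `6..=8`. When the certificate starts in the current period, the lower bound saturates and the range is `0..=1`.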
-
We have discussed the Mithril network API version and we have stated that it should be the version of the OpenAPI specification. This will be the only version given by request/response headers of the Client/Signer/Aggregator nodes. We will continue by enforcing semver compatibility and returning an `HTTP 406` error, for example, in case of incompatible versions
-
Finally, we have continued working on the issue Implement Release process #500:
-
As the `preview` Cardano network will be re-spun next week (November 1st), we will:
- Add a deployment environment `release-preprod` temporarily on the `CI` workflow
- Communicate with the SPOs so that they run their test Signer nodes on this new environment
-
We have prepared the new tickets of the iteration:
-
The PR ADR of the release process #556 has been reviewed and merged. The ADR is available at https://mithril.network/doc/adr/3
-
We have paired on the sub issues of Implement Release process #500:
-
We have also paired on the issue API version #565 in order to add the communication protocol version to response headers of the Aggregator
-
We have fixed the issue with `cargo sort` that was crashing the CI with PR Cargo update sort and dependencies #558
-
We have also reviewed and merged the PR Add version information #553
-
The issue that we have with the Signer registration (as in issue Signer registration fails with key certification mode #548) seems to be related to the fact that KES Secret Keys evolve in memory. This explains why we can verify the signature only with a `0` value for the KES Period. In order to fix the problem some solutions exist:
- Compute the correct KES Period when doing the signature of the Mithril Verification Key (`current_period - start_period`, `current_period` given by the cardano cli and `start_period` given by the Operational Certificate). We will pair on this solution next week
- Update the cardano cli so that it computes the signature with the in memory KES Secret Key. We expect an estimate from the Cardano node team for this feature
-
We have worked on the demo path of this iteration:
- Introduction
- Presentation of the results of the SPO certification on the hosted Aggregator
- Presentation of the updated release process
- Showcase of the CI/CD Workflows: Testing -> Pre-Release -> Release
- Showcase of the bootstrapping of a deployment environment on preview
- Conclusion
- Q&A
-
Showcase path of the `bootstrapping of a deployment environment on preview`:
# Mithril Bootstrap Deployment Environment
# On preview network
---
# Setup demo
## Download source (if needed)
git clone https://github.com/input-output-hk/mithril.git
#or
git clone git@github.com:input-output-hk/mithril.git
---
# Demo: Bootstrap Deployment Environment
## Change directory
cd mithril/mithril-infra
## Setup environment variables
DEPLOY_ENVIRONMENT=demo-preview
API_DOMAIN=api.mithril.network
## Setup terraform variables
cat > env.$DEPLOY_ENVIRONMENT.tfvars << EOF
environment_prefix = "demo"
environment_suffix = ""
cardano_network = "preview"
google_project = "mithril-test-365514"
google_region = "europe-west1"
google_zone = "europe-west1-b"
google_machine_type = "e2-medium"
google_service_credentials_json = "../gcp-credentials.json"
google_application_credentials_json = ""
mithril_api_domain = "$API_DOMAIN"
mithril_image_id = "latest"
mithril_genesis_verification_key_url = "https://raw.githubusercontent.com/input-output-hk/mithril/main/TEST_ONLY_genesis.vkey"
mithril_genesis_secret_key = ""
mithril_signers = {
"1" = {
pool_id = "pool15qde6mnkc0jgycm69ua0grwxmmu0tke54h5uhml0j8ndw3kcu9x",
},
"2" = {
pool_id = "pool10g0tvpyc3phkym8r6hamdulyzd6shzjldpahyvdkljl7ur2adfe",
}
}
EOF
## Create & init terraform workspace
terraform workspace new $DEPLOY_ENVIRONMENT
terraform init
## Plan terraform deployment
terraform plan --var-file=env.$DEPLOY_ENVIRONMENT.tfvars
## Apply terraform deployment
terraform apply --var-file=env.$DEPLOY_ENVIRONMENT.tfvars
## Connect to VM and list docker containers
ssh curry@aggregator.demo-preview.$API_DOMAIN -- docker ps
ssh curry@aggregator.demo-preview.$API_DOMAIN -- tree /home/curry/data
## Query aggregator REST API
curl -sk https://aggregator.demo-preview.$API_DOMAIN/aggregator/epoch-settings | jq .
watch -n 1 "curl -sk https://aggregator.demo-preview.$API_DOMAIN/aggregator/epoch-settings | jq ."
## Destroy terraform deployment
terraform destroy --var-file=env.$DEPLOY_ENVIRONMENT.tfvars
ssh-keygen -f "/home/jp/.ssh/known_hosts" -R "aggregator.demo-preview.$API_DOMAIN"
rm -f env.$DEPLOY_ENVIRONMENT.tfvars
rm -rf .terraform
rm -rf terraform.tfstate.d
rm -f .terraform.lock.hcl
-
We have discussed how we could implement a differential download system for immutable files:
- It would allow downloading a specific range of immutable files
- Parallelization would be easy to implement for snapshot chunks download, verify and restore
- In this setup, we would only sign the penultimate immutable file instead of the whole immutable folder
- We would also need to add a snapshot retrieve route by immutable file number
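One way to picture the range-based download above is to split a requested range of immutable file numbers into chunks that could then be downloaded, verified and restored in parallel. The sketch below only computes the chunk boundaries; the function name, the tuple representation and the fixed chunk size are assumptions made for illustration, and nothing here reflects an existing Mithril API.

```rust
/// Splits the inclusive range `first..=last` of immutable file numbers into
/// consecutive inclusive chunks of at most `chunk_size` files each.
/// Returns an empty list for an empty range or a zero chunk size.
fn immutable_chunks(first: u64, last: u64, chunk_size: u64) -> Vec<(u64, u64)> {
    let mut chunks = Vec::new();
    if chunk_size == 0 || first > last {
        return chunks;
    }
    let mut start = first;
    loop {
        // The end of the chunk never goes past the last requested file.
        let end = start.saturating_add(chunk_size - 1).min(last);
        chunks.push((start, end));
        if end == last {
            break;
        }
        start = end + 1;
    }
    chunks
}
```

For instance, files `1` to `10` with a chunk size of `4` give the chunks `(1, 4)`, `(5, 8)` and `(9, 10)`, each of which could be handed to a separate worker.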
-
We have also paired on the issue Implement Release process #500:
- Conceptualizing and formalizing the case of `hotfix` for a release (added in the ADR of the release process #556)
- Implementing multi target compilation of the nodes: the Client binaries will be available for Linux, macOS and Windows and the Signer for Linux and macOS (in the PR Adapt ci workflow to Release Process #557)
- Stabilizing the deployment environments of issue Setup new hosted environments for testing-preview, pre-release-preview and release-preprod) with their terraform and GitHub environments #542
-
We have discussed and paired on the issue Implement Release process #500:
- Following our previous discussions, we have decided all the details regarding the handling of versions
- The decisions have been gathered in a new ADR, waiting for review in the PR ADR of the release process #556
- We have agreed on:
- Working on a distribution release that will package all the artifacts produced for the nodes/libraries
- Each node will have its own version
- A communication protocol version will be introduced to handle compatibility between nodes
- The CI will automatically append the hash of the commit for which the artifacts are being produced. This will allow a full artifacts promotion flow
- We will try to sign the releases with GPG e.g.
-
We have also talked about:
- Issue Setup new hosted environments for testing-preview, pre-release-preview and release-preprod) with their terraform and GitHub environments #542: Work is in progress, should be completed shortly with full `terraform` environments
- Issue Simplify the Multi Signer in Aggregator #398: We will resume work on this issue shortly as there are no breaking changes planned on the multi signer currently
-
We had talks about issue Get/Show current version on Mithril nodes cli / APIs #541. We agreed that:
- The Client/Signer nodes should expose the version they run in the headers when requesting the Aggregator
- The Aggregator node should expose the version it runs in a header when it is called
- We could implement a version check system that returns an error message stating an update is required if the Aggregator version is not compatible with the Client/Signer version
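The version check mentioned in the last point could look like the sketch below. It is only an illustration under stated assumptions: the compatibility rule (same major version, and same minor version while the major is `0`) is the usual Cargo-style reading of semver and is not confirmed by these notes, the `406` status is borrowed from the `HTTP 406` idea discussed elsewhere in this logbook, the `400` status for malformed versions is an arbitrary choice, and all names are hypothetical.

```rust
/// Minimal `major.minor.patch` version (pre-release and build suffixes are
/// not handled in this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
}

fn parse_version(text: &str) -> Option<Version> {
    let mut parts = text.trim().split('.');
    let major = parts.next()?.parse::<u64>().ok()?;
    let minor = parts.next()?.parse::<u64>().ok()?;
    let patch = parts.next()?.parse::<u64>().ok()?;
    if parts.next().is_some() {
        return None;
    }
    Some(Version { major, minor, patch })
}

/// Assumed rule: same major version; while the major is 0, same minor too.
fn is_compatible(aggregator: Version, caller: Version) -> bool {
    aggregator.major == caller.major
        && (aggregator.major > 0 || aggregator.minor == caller.minor)
}

/// Returns `Ok(())` when the caller may proceed, or an HTTP-like status code
/// with a message asking for an update otherwise.
fn check_api_version(aggregator_version: &str, caller_version: &str) -> Result<(), (u16, String)> {
    let (aggregator, caller) = match (
        parse_version(aggregator_version),
        parse_version(caller_version),
    ) {
        (Some(a), Some(c)) => (a, c),
        _ => return Err((400, "malformed version".to_string())),
    };
    if is_compatible(aggregator, caller) {
        Ok(())
    } else {
        Err((
            406,
            format!(
                "update required: version {} is not compatible with aggregator version {}",
                caller_version.trim(),
                aggregator_version.trim()
            ),
        ))
    }
}
```

With these rules, a caller on `0.1.3` is accepted by an Aggregator on `0.1.0`, while an Aggregator on `0.2.0` answers with the `406` status and the update message.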
-
We also discussed the issue Implement Release process #500:
- Adapt CI workflows to work with the new release process #543: In progress, has been tested in a temporary repository and new workflows will be added soon
- A first task of extracting the documentation generation in a separate workflow is in progress
- We think that we may need to handle the documentation a bit differently than the rest of the process:
- We need to produce new dev blog posts without releasing a new version
- We could use the versioning feature of `docusaurus` and publish pre-release/release versions on the same website
- This would require some manual operations on the developers' end
- There are advantages and drawbacks to this approach. We'll keep on improving the design of this part of the process
-
We have investigated the case of an SPO which was unable to get the `Verified Signer` badge. It appears that his pool Id was spoofed by one of the Signer nodes running on the GCP platform. We have fixed and merged the PR Fix mithril infra configuration #554
-
We have talked more about the release process during the team session:
- Regarding the management of the versions, we have worked on several ideas:
  - We could have a hybrid version where the major+minor would be handled by the `cargo.toml` version and the patch would be handled remotely in a "directory" of all versions
  - We could also handle the full version in a remote directory
  - We could package the version in an external file dedicated to the versioning and that would be embedded in the GitHub package
  - Another idea that looks simpler is to add a patch identifier that reflects the commit id like in `-{COMMIT_SHA1}` for example
- Regarding the documentation website:
  - There will be only one version of the website that is deployed when a merge occurs on the main branch
  - The website will support 2 versions: `current` and `next`
  - We will create a commit post release that will update the `current` and `next` versions and also the versions in the `cargo.toml` file(s)
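The simpler idea of a commit-based identifier (`-{COMMIT_SHA1}`) can be sketched as follows. The function name and the choice of a 7-character abbreviated hash are assumptions; the notes do not say how the identifier would actually be produced by the CI.

```rust
/// Appends an abbreviated commit hash to a version, following the
/// `-{COMMIT_SHA1}` form. Returns `None` if the hash is shorter than
/// 7 characters or is not hexadecimal.
fn version_with_commit(cargo_version: &str, commit_sha: &str) -> Option<String> {
    let sha = commit_sha.trim();
    if sha.len() < 7 || !sha.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    // Slicing by bytes is safe here: every character is an ASCII hex digit.
    Some(format!(
        "{}-{}",
        cargo_version.trim(),
        sha[..7].to_ascii_lowercase()
    ))
}
```

Calling it with version `0.1.0` and commit `0bff212a767399b01aef152e27782a7e7ba934f2` yields `0.1.0-0bff212`. Note that in strict semver a `-` suffix marks a pre-release, whereas build metadata is introduced by `+`, so the separator is a design choice worth settling.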
We had an introductory call today with Alex and the Mithril team. After some presentations, we went through the current state of Mithril and the short term roadmap, emphasizing our current target is to address the specific need of fast bootstrap of a full node.
Alex asked some questions about the roadmap:
- What do we think of distributing data using "alternative" to HTTP?
We think this is a good idea, we made room for it in the snapshot's schema, and we did not tackle it for want of time and because it seems something that can be contributed later
- What's the plan for deploying to mainnet and how much stake do you need?
We don't know exactly yet, one idea would be to grade the signatures according to the amount of stake while we ramp up. Besides, signers are known so that's also a possible source of trust
- How about speeding up client's state reconstruction process through some form of indexing (eg. think SPV + Bloom filter)?
That's something we explored briefly in the initial prototype phase. We want to make mithril "extensible" in the sense that SPOs could sign various artifacts besides the node's db, which could make this feature possible
- What about the use case of a node/wallet catching up on a few months of activity?
Right now we "naively" sign and store full snapshots but obviously we want to chunk those for download and snapshot signing performance reasons
We agreed on these follow-up items:
- Answer any question Alex has on the dedicated discord channel (#moria of course)
- Alex is most welcomed to attend the bi-weekly demo/Q&A session.
If need be we are comfortable with the idea of "Mithril Office Hours" on a weekly basis should the community feels a need for it
- @Reza will be main contact point with the team when it comes to discussing features and roadmap
- Following the release of the experimental certified signers mode, we can now see some green badges next to the verified `PoolIds` in the certificates of the Explorer 🎉
- We have discussed the issue Get/Show current version on Mithril nodes cli / APIs #541:
  - The node will display its version when launched
  - We will add a `version` command on the CLIs that will output the running version
  - We will add headers with versions in the Signer and Client requests as well as in the Aggregator responses
- We have talked and paired on the issue Adapt CI workflows to work with the new release process #543. We have made a PoC of the pre-release pipeline in order to test that:
  - We catch the correct triggers ✔️
  - We can retrieve artifacts produced in a previous/different workflow run ✔️
  - We can produce GitHub releases from the workflow runs ✔️
- During our discussions, we have talked about how to handle:
  - Adding new information that is part of the signed message (as Signers will probably not all upgrade at the same time). In that case, will it be possible to produce signatures under these conditions?
  - A solution could be to type certificates depending on what information is signed and to chain only the ones that embed the next stake distribution
  - We will probably have the same issue when we upgrade to protocol versions that are not backward compatible
- We had talks and paired on the issue Implement Release process #500:
  - The process of artifacts promotion required some clarifications:
    - Each commit triggers a first `CI` workflow that builds artifacts and deploys to the `testing` environment
    - Each git tag triggers a second `Pre-Release` workflow that promotes artifacts to the `pre-release` environment and also creates the associated GitHub release (same name, in `pre-release` status)
    - When the release candidate is validated, the `pre release` status is removed from the GitHub release and that promotes artifacts to the `release` environment
  - We have tried to define a process regarding when/how to update the versions of the crates:
    - One version for the workspace and one different version for `mithril-core`
    - Just after releasing a version `0.1.2`:
      - We commit a new `0.1.3-dev` version until we are happy with a release candidate (tagged `0.1.3-rcX`)
      - When we are ready to release a candidate, we update the version to `0.1.3` and we tag it as `0.1.3` (and re-test it)
      - We then release version `0.1.3` and we start this process all over again
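
The version cycle above (`0.1.2` -> `0.1.3-dev` -> `0.1.3-rcX` -> `0.1.3`) can be written down as a few string helpers. This is only a sketch: all function names are hypothetical and it assumes plain `major.minor.patch` versions.

```rust
// Hypothetical helpers for the version cycle described above.

/// Parses a plain `major.minor.patch` version.
fn parse_version(version: &str) -> Option<(u64, u64, u64)> {
    let mut parts = version.split('.');
    let major: u64 = parts.next()?.parse().ok()?;
    let minor: u64 = parts.next()?.parse().ok()?;
    let patch: u64 = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None;
    }
    Some((major, minor, patch))
}

/// Version committed right after a release, e.g. `0.1.2` -> `0.1.3-dev`.
fn next_dev_version(released: &str) -> Option<String> {
    let (major, minor, patch) = parse_version(released)?;
    Some(format!("{}.{}.{}-dev", major, minor, patch.checked_add(1)?))
}

/// Tag of a release candidate, e.g. (`0.1.3-dev`, 1) -> `0.1.3-rc1`.
fn release_candidate_tag(dev_version: &str, candidate: u32) -> Option<String> {
    let base = dev_version.strip_suffix("-dev")?;
    parse_version(base)?;
    Some(format!("{}-rc{}", base, candidate))
}

/// Final version once a candidate is validated, e.g. `0.1.3-dev` -> `0.1.3`.
fn release_version(dev_version: &str) -> Option<String> {
    let base = dev_version.strip_suffix("-dev")?;
    parse_version(base)?;
    Some(base.to_string())
}
```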
- Following the activation of the experimental Signer certified registration, SPOs have reported troubles with their nodes:
  - Issue Unhelpful Log message #546: The error messages provided were not helpful to the users. We paired on improving them by giving detailed feedback on the bad request status code from the Aggregator
  - Issue Signer registration fails with key certification mode #548: A Signer trying to register by using the certification mode fails because the `KES Signature` can't be verified. This is still under investigation as the underlying cryptography is complex. In the meantime, we have paired and merged a temporary fix that tries all the possible `KES Period` values: the Signers are now registered. We expect them to be able to sign the snapshots in `2` epochs (a rebuild of their nodes is however mandatory)
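
The idea of the temporary fix can be sketched as a simple brute force search. This is not the merged code: the `verify` closure stands in for the real KES signature check, and the candidate range `0..max_kes_evolutions` is an assumption.

```rust
/// Returns the first KES period accepted by `verify`, trying every value in
/// `0..max_kes_evolutions` (assumed candidate range).
/// `verify` is a placeholder for the real KES signature verification.
fn find_valid_kes_period<F>(max_kes_evolutions: u32, verify: F) -> Option<u32>
where
    F: Fn(u32) -> bool,
{
    (0..max_kes_evolutions).find(|&period| verify(period))
}
```

The cost is up to `max_kes_evolutions` verifications per registration, which is why this can only be a stopgap.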
- We have noticed some warning messages in the CI jobs and have created the issue Update workflow github actions #550. A first PR will be merged shortly to update a first part of the GitHub actions. We will keep an eye on the other actions, to be updated as soon as updates are released
- We have also merged the PR remove SQL migration tool #540 which decommissions the data store migration tools of the Aggregator and the Signer
- The PR New STM registration procedure #433 has finally been merged 🚀. A data store upgrade has been produced as well as some explanation about the process in a dev blog post. We are monitoring the GCP network and expect feedback from the community soon
- We have discussed and sliced the first tasks to be done on the new release process as in issue Implement Release process #500:
- We also had talks about:
  - The optimizations of the Docker images in issue Optimize Docker CI images #318:
    - We have recreated an ad hoc builder for the `devnet` images (and thus got rid of the legacy `libssl1.1` dependency)
    - We have aligned all the source images from `ubuntu:latest` to `ubuntu:22.04`
  - Serialization/Deserialization of the keys in the `entities` models:
    - We will try to implement automatic serialization/deserialization either with `serde` annotations or with a custom behavior
    - We may have to take care of log verbosity and implementation of custom display traits
- Key certification
We need to compute the KES range -> need to pass the KES period
- compute range for KES period from genesis parameters
- Pb: we don't know what's really useful on mainnet registration process requires each signer to know the key of every other signer
- write one or more CIPs for "Mithril Decentralisation"?
- Mithril networking CIP
- key registration process
- What about multi-pool runners? -> no need to take care of
- Signer deployment -> which deployment model? Mithril Deployment Model CIP? RFC?
- TODO:
  - draft something for each "CIP" -> 2 pager (A3) respecting somewhat the structure of a CIP
  - check with CIP process "guardians" whether or not they would fit -> Michael, Matthias
    a. if OK -> write the full thing
    b. if NOK -> turn into a GH discussion -> invite people from Community + IOG to react/comment/propose
  - Use A3 format?
- Edinburgh:
  - JP -> presentation 15'
  - Iñigo -> support + Q/A
  - Arnaud -> test demo w/ daedalus
  - Deadline EOW Slide deck for the talk
  - Reza + Arnaud -> presentation 1-slide for CH keynote + Slide deck
- We had discussions about the `rename` attribute in `serde` annotations. It was used on almost all of the entities' fields even though they had the same name as in the JSON version. The redundant annotations have been removed in the PR Enhance serde annotations #538 that has been merged
- We have reviewed some build time analyses that have been produced in order to understand the bottleneck of the build step of the CI. We didn't find any interesting information and we think that it may be due to some cache loading issue within the CI. We will continue our investigations
- We have reviewed the modifications done to the PR New STM registration procedure #433. We are pairing on final modifications before we can merge it tomorrow
- Also, the PR use command and parameters for the client #536 has been reviewed and merged
- We have reviewed the last version of the issue Fix CLI args precedence in Client/Signer/Aggregator #511. Everything is done now, except the `digest` argument that could probably be passed without being named (as has been the case until now). This will avoid breaking the implementations of current users of the client. The PR should be merged early next week
- Many comments have been received on the PR New STM registration procedure #433. They are currently being addressed and the PR should be merged early next week
- Here is the schema presented yesterday during the demo that illustrates the certification process of the Mithril verification keys:
- We have discussed the peer review that we made yesterday on the PR New STM registration procedure #433. All the points noted during the session have been fixed. The documentation part will be added today, stating that this feature is experimental. We will prepare a Dev Blog post in a separate review that will explain the next steps:
  - Testing of the new feature with volunteer SPOs for a transient period that allows both `Certified` and `Non Certified` SPOs for smooth transitioning
  - Improvement of the design so that it fits well with the SPO Cardano nodes architecture (Core/Relay/Firewall/Keys security)
  - Progressive deprecation of the `Non Certified` mode
- We have talked about the issue Fix CLI args precedence in Client/Signer/Aggregator #511 that will be merged very shortly once some issues with the test lab are fixed
- We have also worked on the demo path of this iteration:
  - Introduction
  - Showcase of the `Mithril Keys Certification` on the `devnet`
  - Presentation of the `Evolution of arguments handling in the CLI` of the nodes
  - Conclusion
  - Q&A
- Showcase path of the `Mithril Keys Certification`:
# Mithril Keys Certification
# On devnet, with evolving non-deterministic verification keys, with evolving stake distribution, with full Certificate chain, with keys certification
---
# Setup demo
## Download source (if needed)
git clone https://github.com/input-output-hk/mithril.git
#or
git clone git@github.com:input-output-hk/mithril.git
## Checkout correct branch
cd mithril/
git switch mock_certification
cd mithril-client && make build && cp mithril-client ../../ && cd ..
cd ..
---
# Demo: Run devnet
## Start explorer
cd mithril/mithril-explorer
make dev &
cd ../..
google-chrome http://localhost:3000/explorer
## Change directory
cd mithril/mithril-test-lab/mithril-devnet
## Query Cardano
watch -n 1 NODES=cardano ./devnet-query.sh
## Logs Mithril
watch -n 1 NODES=mithril LINES=100 ./devnet-log.sh
## Start devnet with 5 pools
./devnet-stop.sh && NUM_POOL_NODES=5 DELEGATE_PERIOD=100 EPOCH_LENGTH=60 ./devnet-run.sh
---
# Demo: Restore a snapshot from devnet
## Prepare vars
NETWORK=devnet
AGGREGATOR_ENDPOINT=http://localhost:8080/aggregator
GENESIS_VERIFICATION_KEY=5b33322c3235332c3138362c3230312c3137372c31312c3131372c3133352c3138372c3136372c3138312c3138382c32322c35392c3230362c3130352c3233312c3135302c3231352c33302c37382c3231322c37362c31362c3235322c3138302c37322c3133342c3133372c3234372c3136312c36385d
LATEST_DIGEST=$(curl -s ${AGGREGATOR_ENDPOINT}/snapshots | jq -r '.[0].digest')
echo $LATEST_DIGEST
## List snapshots
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT GENESIS_VERIFICATION_KEY=$GENESIS_VERIFICATION_KEY ./mithril-client list
## Show snapshot details
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT GENESIS_VERIFICATION_KEY=$GENESIS_VERIFICATION_KEY ./mithril-client show $LATEST_DIGEST
## Download snapshot
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT GENESIS_VERIFICATION_KEY=$GENESIS_VERIFICATION_KEY ./mithril-client download $LATEST_DIGEST
## Restore snapshot
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT GENESIS_VERIFICATION_KEY=$GENESIS_VERIFICATION_KEY ./mithril-client restore $LATEST_DIGEST
- We have also talked about:
  - The possibility to use the Mithril Signer as a process that would not be running as daemon. It could be launched by a `cron` or the Cardano node itself at regular intervals
  - With that perspective, we could piggyback on the Cardano node which would be used to broadcast (Tx/Rx) the messages and store them in a bus. The Mithril Signer would use this bus whenever it is launched
- During the demo some interesting points were addressed:
  - The `Mithril Relay` design seems to be preferred by the SPOs as it would provide more security (the `Cardano Relay` is very likely to be subject to attack attempts)
  - We need to understand the impact of using `Operational Certificates` for `Multiple Pools` and see if this is a concern (as each server would have its own `Operational Certificate`)
- We have paired on resolving the issue that we discovered regarding `KES Period` usage in Implement Certification of the Mithril Verification Keys in Signer/Aggregator #455:
  - Simple (but not complete) solution implemented: store the `KES Period` along with the `SignerWithStake` in the aggregator store and send it back to the Signers for them to make a valid key registration (even if the `KES Period` has expired when used)
  - Simple solution (next steps): enforce the range of `KES Periods` valid for an `Epoch`, which should be easily computable given the current `Slot` and the genesis parameters `slotsPerKESPeriod` and `maxKESEvolutions` (it could be added to the Aggregator `Beacon` in the pending certificate or computed from the `Epoch` number directly on the Signer node)
  - More difficult solution: build the `KeyReg` at the same time on the Signers and Aggregator and store the `CloseReg` (and use it on the Signer when the time has come). This would require a broadcast/gossip mechanism between the nodes
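
To make the "next steps" option more concrete, here is a minimal sketch of the arithmetic involved. It is an illustration under stated assumptions, not our implementation: the function names are hypothetical, the epoch computation assumes a constant epoch length since slot 0 (which does not hold on a network that went through the Byron era), and the validity rule assumes an operational certificate is usable for the maximum number of KES evolutions counted from its start period.

```rust
use std::ops::RangeInclusive;

/// KES period containing `slot` (integer division by the slots per KES period).
/// Returns `None` when `slots_per_kes_period` is zero.
fn kes_period_of_slot(slot: u64, slots_per_kes_period: u64) -> Option<u64> {
    slot.checked_div(slots_per_kes_period)
}

/// KES periods spanned by an epoch, ASSUMING a constant epoch length since slot 0.
fn kes_periods_of_epoch(
    epoch: u64,
    epoch_length: u64,
    slots_per_kes_period: u64,
) -> Option<RangeInclusive<u64>> {
    if epoch_length == 0 {
        return None;
    }
    let first_slot = epoch.checked_mul(epoch_length)?;
    let last_slot = first_slot.checked_add(epoch_length - 1)?;
    let first = kes_period_of_slot(first_slot, slots_per_kes_period)?;
    let last = kes_period_of_slot(last_slot, slots_per_kes_period)?;
    Some(first..=last)
}

/// ASSUMED rule: a certificate issued at `opcert_start_period` is usable for
/// `max_kes_evolutions` periods, starting with that one.
fn is_kes_period_valid(kes_period: u64, opcert_start_period: u64, max_kes_evolutions: u64) -> bool {
    kes_period >= opcert_start_period && kes_period - opcert_start_period < max_kes_evolutions
}
```

With example values of `432000` slots per epoch and `129600` slots per KES period, epoch `10` covers slots `4320000` to `4751999`, that is KES periods `33` to `36`.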
- We also had some discussions on the design of the Signer (with Key Certification) given the topology of the Cardano nodes run by the SPOs:
  - We could use the `Core` node to process the signature w/ KES secret key given a message through the Cardano CLI
  - Or maybe use the `Relay` node to act as a proxy to make this operation
  - Some other discussions will take place to find the best architecture in the next weeks
- We have made a thorough peer review (with the whole team) of the PR New STM registration procedure #433 that should be merged before the end of the week 💪
- We discussed the security that should be applied to `Mithril Secret Keys` versus the `Cardano Secret Keys`:
  - The best option is to delete the keys as soon as the associated `Certificate` is produced
  - We must keep in mind that in case of an epoch gap in the `Certificate Chain`, we may need the keys for `1` more epoch
  - The storage of the keys is also a concern (maybe they should be on the Core node)
- Presentation of the team
- Discovery of the GitHub repository (Project, Wiki, ...)
- Discovery of the documentation website and of the Mithril Explorer
- Q&A session
- Plan next sessions in the following days/weeks
- We have discovered a problem with the `cargo2junit` plugin in the CI: the `test-mithril-core` job was not able to produce the test results file and failed. It did not make the CI fail completely, which was odd behavior (green in the "Actions" tab and red in the "Pull Requests" tab). We have identified that it was apparently due to a stale cache version on the CI. A fix has been merged with the PR Enhance Mithril networks documentation #534 that will avoid failure when that situation occurs
- We have also talked about the progress of the issues:
  - Implement Certification of the Mithril Verification Keys in Signer/Aggregator #455:
    - Final review will be done this week and the work can be showcased during the iteration demo
    - Many discussions have taken place in the discord channel regarding the security of the certification. They have been resumed in a discussion as well: How should we link the Mithril identity with Cardano identity #508
  - Fix CLI args precedence in Client/Signer/Aggregator #511: the work merged on the Aggregator can be showcased during the iteration demo and the adaptation of the Client is in progress
- We have reviewed the issue Fix CLI args precedence in Client/Signer/Aggregator #511:
  - The adaptation also needs to be done on the Client (so that the use of the `Genesis Verification Key` is mandatory only on the `restore` command)
  - The PR make parameters precedence on signer #529 has been merged
  - The PR Fix crash on startup GCP Aggregator #535 has been merged to fix the GCP infrastructure
- We have paired on implementing a `Verified Signer` on the Mithril Explorer for the Signers that have registered their SPO with the certification process as in Implement Certification of the Mithril Verification Keys in Signer/Aggregator #455. Also the `devnet` Docker images were not working since the merge of the PR fix SQLite deadlocks #521
- We have talked about the process of Mithril Keys Certification which was challenged on the Discord channel:
  - The `Operational Certificate` does not need to be available on the Cardano chain (which means that any pool that has not produced blocks yet can register on a Mithril network)
  - The validation mechanism works this way:
    - The `Mithril Signer Verification Key` is signed by the `KES Secret Key` of the SPO
    - This signature is verified with the `KES Verification Key` stored in the `Operational Certificate`
    - The `Operational Certificate` is signed by the `Cold Secret Key`
    - This signature is verified with the `Cold Verification Key` stored in the `Operational Certificate`
    - The `PoolId` is computed as the hash of the `Cold Verification Key` stored in the `Operational Certificate`
    - This ensures that only the holder of the SPO `Cold Secret Key` is able to register its `PoolId` and `Mithril Signer Verification Key` on a Mithril network
  - We will open a GitHub discussion regarding this subject and we will also create clear documentation for this feature
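
The order of the checks in the validation mechanism can be sketched as below. Everything here is a TOY stand-in and an assumption made for illustration: keys are plain numbers used both as secret and verification key, a "signature" is a keyed hash from the standard library, and the certificate is reduced to the fields named above (the real one carries more). Only the sequence of verifications reflects the description.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// TOY stand-ins, NOT real cryptography.
type Key = u64;
type Signature = u64;

fn toy_hash(parts: &[u64]) -> u64 {
    let mut hasher = DefaultHasher::new();
    parts.hash(&mut hasher);
    hasher.finish()
}

fn toy_sign(secret_key: Key, message: u64) -> Signature {
    toy_hash(&[secret_key, message])
}

fn toy_verify(verification_key: Key, message: u64, signature: Signature) -> bool {
    toy_sign(verification_key, message) == signature
}

/// Simplified, hypothetical view of an operational certificate.
struct OperationalCertificate {
    kes_verification_key: Key,
    cold_verification_key: Key,
    /// Signature of the KES verification key by the cold secret key.
    cold_signature: Signature,
}

#[derive(Debug, PartialEq)]
enum RegistrationError {
    InvalidOperationalCertificate,
    InvalidKesSignature,
}

/// Returns the pool id of the registering SPO when both signatures check out.
fn verify_registration(
    opcert: &OperationalCertificate,
    mithril_verification_key: u64,
    kes_signature: Signature,
) -> Result<u64, RegistrationError> {
    // The operational certificate must be signed by the cold key it embeds.
    if !toy_verify(opcert.cold_verification_key, opcert.kes_verification_key, opcert.cold_signature) {
        return Err(RegistrationError::InvalidOperationalCertificate);
    }
    // The Mithril verification key must be signed by the KES key of the certificate.
    if !toy_verify(opcert.kes_verification_key, mithril_verification_key, kes_signature) {
        return Err(RegistrationError::InvalidKesSignature);
    }
    // The pool id is derived from the cold verification key.
    Ok(toy_hash(&[opcert.cold_verification_key]))
}
```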
- Following our work from last week, we have continued working on the setup of the new `Release Process`, as in issue Implement Release process #500:
  ⚠️ The reset of the `preview` and `preprod` networks that will occur in the near future will require a new `Genesis Certificate` for the current `testing` environment
  - We agreed that the SPO that we host on the `testing` and `pre-release` environments will be in a `naive` setup at first (only one `Core` Cardano node, a `Relay` Cardano node will be added later). We won't apply heavy security requirements on the keys (cold/air-gap) and we will keep things simple and maintainable with automation
  - Once the artifacts of a commit are deployed on the `testing-preview` and/or `pre-release-preview` environments, we will launch automated `Smoke Tests` (to be defined) that will validate the conformity of the development (by testing the available routes and their responses, and that snapshots/certificates are produced after a deployment)
  - A `pre-release` deployment will be tested for 24-48 hours, depending on whether it is a minor or patch update, before being qualified as releasable
  - Some selected SPOs will be running some Signer nodes on the `pre-release-testing` environment and will provide some feedback before release
  - In case of a critical bug fix, the qualification phase will be drastically shortened and the main indicator that will be used will be `MTTR` (Mean Time To Repair)
  - We still have to find solutions on how to manage the release window length vs the merge locks that it could create
  - We will have to refine our vision of how to manage failing deployments with dedicated process/checklists
  - We will try to release a new version every 2 weeks, even if it only embeds crate updates and small fixes
  - We have decided to implement a lightweight `Monitoring`/`Alerting`/`Status Page` solution: `uptime robot`, which will help us closely monitor failing deployments and provide status feedback
- We have reviewed the PRs on the current issues:
  - Fix CLI args precedence in Client/Signer/Aggregator #511: We have paired on fixing the test lab that was not working properly and we have also made some optimizations concerning the default configurations handling. The PR is now completed and will be merged very shortly
  - Implement Certification of the Mithril Verification Keys in Signer/Aggregator #455: the unit tests have been adapted so that the new key certification is properly tested along with the legacy declarative version. Some optimizations are in progress and a full pair review of the PR New STM registration procedure #433 will be done early next week (as well as documentation updates)
- We have reviewed the work in progress on the issues:
  - Fix CLI args precedence in Client/Signer/Aggregator #511: some refactoring of the commands and their arguments handling is in progress and should be completed shortly. This will fix a problem of having to use unnecessary arguments for some sub commands.
  - Implement Certification of the Mithril Verification Keys in Signer/Aggregator #455: the adaptation of the test setups (module `test_setup` in `mithril-common`) is under development. In particular, it requires being able to generate on the fly `Operational Certificates`, `KES Key Pairs` and `Cold Key Pairs`
- We have talked about the issue Fix CLI args precedence in Client/Signer/Aggregator #511:
  - The problem is linked to the default value of the arguments passed by `clap` that is always used (even though an overriding value has been passed by an environment var or via a configuration file)
  - Some of the arguments used to set up the nodes thus work only if we use the `clap` arguments, which is not very convenient/coherent (as the vast majority of the others are set with environment vars)
  - The best solution is to not use default values for configuration that can be overridden (all except `run_mode` and `verbosity_level`)
  - We will formalize this rule in a dedicated `ADR`
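
The rule can be illustrated with optional values merged layer by layer. This is a sketch and not the code of the nodes: the struct, its field names and the precedence order (command line, then environment, then configuration file) are assumptions. The key point is that a parameter that was not passed stays `None` instead of carrying a default that would mask the other sources.

```rust
/// Hypothetical partial configuration: every overridable parameter is optional.
#[derive(Debug, Default, Clone, PartialEq)]
struct PartialConfig {
    aggregator_endpoint: Option<String>,
    db_directory: Option<String>,
}

impl PartialConfig {
    /// Values set in `self` win; the gaps are filled from `fallback`.
    fn or_fallback(self, fallback: PartialConfig) -> PartialConfig {
        PartialConfig {
            aggregator_endpoint: self.aggregator_endpoint.or(fallback.aggregator_endpoint),
            db_directory: self.db_directory.or(fallback.db_directory),
        }
    }
}

/// Assumed precedence: command line > environment > configuration file.
fn resolve(cli: PartialConfig, env: PartialConfig, file: PartialConfig) -> PartialConfig {
    cli.or_fallback(env).or_fallback(file)
}
```

If the command line layer carried a default `Some(...)` for `aggregator_endpoint`, the value coming from the environment would never be seen, which is the behavior described in the issue.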
- We have also made a deep review of the PR New STM registration procedure #433 that is linked to issue Implement Certification of the Mithril Verification Keys in Signer/Aggregator #455:
  - The development of the first phase is close to ready and we hope to merge it soon
  - It will not include breaking changes as the Signer and Aggregator will be able to work in hybrid modes:
    - Declarative mode with a non certified `PoolId` (as already running)
    - Certified mode with a certified `PoolId` (activated only when a Signer is associated to an `Operational Certificate` and a `KES Secret Key`)
  - A second phase will involve the development of a dedicated `Mithril Certifier` that will help handle a `KES Secret Key` that will not be stored on the same Cardano node (`Core`) as the `Mithril Signer`, which will be running on top of the `Relay` Cardano node
- We had discussions about the issue Move stores to relational design with SQLite #476 for which we will probably proceed in multiple phases (Signer + Aggregator):
  - Use a relational data model that will be used to implement the current `Store traits`
  - Implement a data model upgrade a la `sqitch`
  - Refactor (if needed) the several `Store traits` used to access this data
  - Create ways of aggregating the relational data (with new routes to access them). We will need to dedicate a session for this
- We talked about the next steps following the setup of our first SPO on `preview`:
  - Automate with scripts to deploy easily with `terraform` in the different environments
  - Handle pool metadata hosting on the documentation website
  - Implement the Core/Relay nodes topology
  - Work on automating the rotation of the keys
  - We will dedicate a session to these next steps
- We also discussed the progress of the issue Implement Certification of the Mithril Verification Keys in Signer/Aggregator #455 which is close to ready:
  - Test adaptation to do (vs hybrid mode of the Signer/Aggregator certification for a smooth transition with SPOs)
  - Updating documentation to reflect the changes
  - Write a blog post to explain the Certification activation road map (with `Mithril Certifier` to come)
- We have paired on and merged the issue migrate snapshot store to SQLite #518. The last store has been successfully migrated on the GCP hosted Aggregator 🎉 However, we faced some difficulties with dynamic libraries (`libssl`) that were different between the compiled binary and the running OS, which made the `migrator` binary crash. We will have to take care of these details when ramping up the new release process (issue Implement Release process #500)
- We have discussed the progress of the issue Implement Certification of the Mithril Verification Keys in Signer/Aggregator #455
- We have also paired on setting up an SPO node from scratch on the `preview` network:
  - We have followed this guide
  - After a few attempts, we have been able to activate a pool on the `preview` network that is pool15qde6mnkc0jgycm69ua0grwxmmu0tke54h5uhml0j8ndw3kcu9x 💪
  - We will keep on working on streamlining the setup of SPOs for our future `testing` and `pre-release` environments, as well as the management tasks of an SPO to use them in the long run
- We have paired on the issue Fix database dead locks in Aggregator #517. The solution that we have implemented is the following:
  - Add a minimum version of `SQLite`: `3.35+` so that we can use `DELETE...RETURNING` statements that avoid explicit use of transactions
  - Update the CI so that it embeds this minimum version of `SQLite`
  - Add a retry mechanism to fetching data (simple but efficient with fixed sleep duration and max retry limit)
  - We have merged the PR fix SQLite deadlocks #521
  - We will keep watching if the database locks keep occurring on GCP and on the CI
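
A retry with a fixed sleep duration and a maximum number of retries can be as small as the sketch below. It is a generic illustration, not the code merged in the PR: the function name and signature are assumptions, and the operation is any closure returning a `Result`.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Runs `operation` until it succeeds or until `max_retries` retries have
/// failed, sleeping a fixed `delay` between attempts.
/// The last error is returned when all attempts fail.
fn retry_with_fixed_delay<T, E, F>(max_retries: u32, delay: Duration, mut operation: F) -> Result<T, E>
where
    F: FnMut() -> Result<T, E>,
{
    let mut attempt = 0;
    loop {
        match operation() {
            Ok(value) => return Ok(value),
            // No retry left: give the error back to the caller.
            Err(error) if attempt >= max_retries => return Err(error),
            Err(_) => {
                attempt += 1;
                sleep(delay);
            }
        }
    }
}
```

With `max_retries = 2` the operation is called at most 3 times (the first attempt plus 2 retries).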
- We had talks about evolution of the stores that will be required by the issue Implement Certification of the Mithril Verification Keys in Signer/Aggregator #455:
  - We will probably prepare a manual update script (as only the Aggregator is concerned with this upgrade)
  - We definitely need to work with a relational data model soon to handle smoothly this type of upgrade (that could also occur on the Signer)
  - This will be addressed in issue Move stores to relational design with SQLite #476
- We have also prepared a demo path for the first demo with the members of the Mithril Pioneer Program:
  - Introduction
  - Showcase of the `Genesis Certificate` on the `devnet`
  - Presentation of the milestone of `10` SPOs signing on our `preview` network
  - Presentation of the `Dev Blog`
  - Showcase of the SQLite migration
  - Presentation of the `Store Retention` feature
  - Presentation of the upcoming `Release Process`
  - Conclusion
  - Q&A
- Here is the showcase path for the `Genesis Certificate` on the `devnet`:
# Mithril Genesis Certificate
# On devnet, with evolving non-deterministic verification keys, with evolving stake distribution, with full Certificate chain, without keys certification
# Resources
## Github
google-chrome https://github.com/input-output-hk/mithril
## Website
google-chrome https://mithril.network/doc
## Explorer
google-chrome https://mithril.network/explorer/
---
# Setup demo
## Download source (if needed)
git clone https://github.com/input-output-hk/mithril.git
#or
git clone git@github.com:input-output-hk/mithril.git
## Checkout correct commit
cd mithril/
git checkout b7069fd6281f21052f90b80d149f743471c63bbe
cd mithril-client && make build && cp mithril-client ../../ && cd ..
cd ..
---
# Demo: Run devnet
## Start explorer
cd mithril/mithril-explorer
make dev &
cd ../..
google-chrome http://localhost:3000/explorer
## Change directory
cd mithril/mithril-test-lab/mithril-devnet
## Query Cardano
watch -n 1 NODES=cardano ./devnet-query.sh
## Logs Mithril
watch -n 1 NODES=mithril LINES=100 ./devnet-log.sh
## Start devnet
./devnet-stop.sh && DELEGATE_PERIOD=100 EPOCH_LENGTH=60 ./devnet-run.sh
---
# Demo: Restore a snapshot from devnet
## Prepare vars
NETWORK=devnet
AGGREGATOR_ENDPOINT=http://localhost:8080/aggregator
GENESIS_VERIFICATION_KEY=5b33322c3235332c3138362c3230312c3137372c31312c3131372c3133352c3138372c3136372c3138312c3138382c32322c35392c3230362c3130352c3233312c3135302c3231352c33302c37382c3231322c37362c31362c3235322c3138302c37322c3133342c3133372c3234372c3136312c36385d
LATEST_DIGEST=$(curl -s ${AGGREGATOR_ENDPOINT}/snapshots | jq -r '.[0].digest')
echo $LATEST_DIGEST
## List snapshots
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT GENESIS_VERIFICATION_KEY=$GENESIS_VERIFICATION_KEY ./mithril-client list
## Show snapshot details
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT GENESIS_VERIFICATION_KEY=$GENESIS_VERIFICATION_KEY ./mithril-client show $LATEST_DIGEST
## Download snapshot
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT GENESIS_VERIFICATION_KEY=$GENESIS_VERIFICATION_KEY ./mithril-client download $LATEST_DIGEST
## Restore snapshot
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT GENESIS_VERIFICATION_KEY=$GENESIS_VERIFICATION_KEY ./mithril-client restore $LATEST_DIGEST
- We have paired on the issue Fix database dead locks in Aggregator #517. After investigation, it appears that although we have implemented the store adapters behind `RwLock`, in some situations a database lock is possible:
  - If a transaction is opened by an adapter, the whole database is locked. Thus an attempt to make a query will result in an `Error 5: database is locked`, until the transaction is committed or rolled back
  - A first issue is that when an error occurred during the transaction, it was never closed and resulted in a permanent lock of the database (until the service was restarted)
  - We are working on some improvements that will make the system more resilient and efficient (although it requires some modifications on the CI to make sure the version of `sqlite` is at least `3.35`)
  - We will continue working on this issue tomorrow as it also creates some flakiness in the CI test lab runs
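
One common way to make sure a transaction never stays open after an error is a guard that rolls back when it is dropped without a commit. The sketch below shows the pattern with a mock connection that only tracks its state. All the types are hypothetical and this is not necessarily the fix we will retain.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a database connection: it only records whether a
/// transaction is open and how many rollbacks happened.
struct Connection {
    transaction_open: Cell<bool>,
    rollbacks: Cell<u32>,
}

impl Connection {
    fn new() -> Self {
        Connection { transaction_open: Cell::new(false), rollbacks: Cell::new(0) }
    }
}

/// Guard that rolls the transaction back when dropped without a commit.
struct Transaction<'a> {
    connection: &'a Connection,
    committed: bool,
}

impl<'a> Transaction<'a> {
    fn begin(connection: &'a Connection) -> Self {
        connection.transaction_open.set(true);
        Transaction { connection, committed: false }
    }

    fn commit(mut self) {
        self.committed = true;
        self.connection.transaction_open.set(false);
    }
}

impl Drop for Transaction<'_> {
    fn drop(&mut self) {
        if !self.committed {
            // Rollback: the transaction is closed even on an early return.
            self.connection.rollbacks.set(self.connection.rollbacks.get() + 1);
            self.connection.transaction_open.set(false);
        }
    }
}

/// Example write path: an early return on error still closes the transaction.
fn save_record(connection: &Connection, fail: bool) -> Result<(), String> {
    let transaction = Transaction::begin(connection);
    if fail {
        return Err("write failed".to_string());
    }
    transaction.commit();
    Ok(())
}
```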
- We had discussions about:
  - Signer Registration (see discussion How should we link the Mithril identity with Cardano identity #508): it can be trusted because of the Genesis Certificate, so there is no specific problem with it
  - Stake Distribution (new discussion to be set up to share this information with the community): understanding the portion of the stake that is required to be secure, and how to possibly ramp up Mithril on the `mainnet` in multiple phases (with the involvement of IOG stake at first until we reach the required portion of all Cardano stake)
  - Batch verification of the Certificates multi-signatures which would be provided by a batch verification function in the core library. This would involve a slightly different way of validating the Certificate Chain to take advantage of this feature
-
We had reviewed and merged the issue Add auto pruning in stores#504 🎉:
-
We have discovered a bug that is responsible of deadlocks on the Aggregator database and created an issue Fix database dead locks in Aggregator#517
-
We have created some issues with features that we need to implement or low priority bugs we need to fix:
-
We have planned the topics that will be showcased during the demo of the iteration:
- SQLite migration
- Genesis Certificate on the Certificate Chain
- New Dev Blog section the documentation website
- 10 signing SPOs on the
preview
network milestone - Release process under construction
-
We have talked about the issue Add auto pruning in stores#504 that is almost ready and should be merged shortly
-
We have reviewed and discussed in depth the issue Implement Release process#500:
- Issue is completed, need to flesh it out in the form of a document?
- Deployment of hosting environments is dependent on some work about deploying custom SPOs
- Have a single version for all crates?
- How to handle version for artifacts that do not change but get promoted?
- References:
- Information about the build number needs to be added (
version = sha1 + build number
)
-
We have decided to dedicate a future session to setting up an SPO pool as explained in this guide to better understand the way SPOs work
-
We have also discussed the issue Implement Certification of the Mithril Verification Keys in Signer/Aggregator #455:
- We will create a poll inside "Discussions" tab in order to get a better understanding of how SPOs host their
preview
and preprod
pools vs mainnet
(Core + Relay / with firewall rules
, Core + Relay / No firewall rules
, Core only
) - A first possible design to properly handle the certification is a
Proxy
version:
- A preferred design (that should be better suited to the SPOs) is an
Async Validator
version:- Signer creates key material to sign when crossing epoch threshold (the protocol initializer with its associated verification key)
- Validator calls signer when "ready" (on cron, or manually) and asks for key material to sign
- Validator uses hot KES keys to sign the key material and sends it to the signer
- Signer can then start registration process once it has signed material
- In the Mithril Explorer we will display the security level (or probability of an adversarial party to create a fake certificate) on each snapshot (and provide the formula used to compute it when hovering the protocol parameters displayed)
-
We had talks about the issue Add auto pruning in stores#504:
- It appears that there was a bug in the
MemoryAdapter
where the get_last_n_records
function retrieved the last n records sorted by date of update instead of date of creation. This bug was fixed. - However, there was a bug in the implementation and in its test. We have discussed how we could create some trait-related tests that could help us spot such a problem easily (and also help verify that a new implementation of the traits is "correct")
- We have also talked about how to handle the configuration of the retention length on the stores: if none is specified (as is currently the case), full retention is applied; if a retention length is specified, this length is used to prune the stores
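To make the two points above concrete (ordering by date of creation and optional retention length), here is a minimal sketch of an in-memory store. The names `MemoryStore`, `store_record`, `retention_limit` and `len` are hypothetical and only illustrate the behaviour discussed; this is not the actual `MemoryAdapter` code.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory store that remembers the creation order of its keys.
pub struct MemoryStore<V> {
    /// Keys in order of creation (oldest first).
    creation_order: Vec<String>,
    records: HashMap<String, V>,
    /// `None` means full retention; `Some(n)` keeps only the `n` newest records.
    retention_limit: Option<usize>,
}

impl<V: Clone> MemoryStore<V> {
    pub fn new(retention_limit: Option<usize>) -> Self {
        Self {
            creation_order: Vec::new(),
            records: HashMap::new(),
            retention_limit,
        }
    }

    /// Insert or update a record, then prune if a retention limit is set.
    pub fn store_record(&mut self, key: &str, value: V) {
        // Updating an existing key must not change its creation rank.
        if self.records.insert(key.to_string(), value).is_none() {
            self.creation_order.push(key.to_string());
        }
        self.prune();
    }

    /// Return the `n` most recently created records, newest first.
    pub fn get_last_n_records(&self, n: usize) -> Vec<(String, V)> {
        self.creation_order
            .iter()
            .rev()
            .take(n)
            .map(|key| (key.clone(), self.records[key].clone()))
            .collect()
    }

    /// Number of records currently kept in the store.
    pub fn len(&self) -> usize {
        self.records.len()
    }

    /// Drop the oldest records until the retention limit (if any) is respected.
    fn prune(&mut self) {
        if let Some(limit) = self.retention_limit {
            while self.creation_order.len() > limit {
                let oldest = self.creation_order.remove(0);
                self.records.remove(&oldest);
            }
        }
    }
}
```

A trait-level test of the kind mentioned above could store a few records, update the oldest one, and check that `get_last_n_records` still returns them by creation order.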
-
We had some discussions about the discussion Use CIP-22 as a way to identify SPOs when registering keys #507:
- The idea behind it is the same as the one under implementation in the PR New STM registration procedure #433:
- Asking the owner of the pool to sign a message with its secret key in order to prove it owns this secret key
- In
CIP-22
:- The message signed has no meaning and is randomly generated by the verifier of the ownership
- The secret key used is the
VRF Secret Key
which is a hot non rotated key (but for which there is no Rust library available for signing/verifying)
- In our proposal:
- The message signed is the actual
Mithril Signer Verification Key
valid for 1
epoch - The secret key used is the
KES Secret Key
which is a hot rotated key (for which a Rust library is available, done by IOG at https://github.com/input-output-hk/kes)
- The architecture of a Cardano SPO on the
mainnet
implies that:- A
Core Server
hosts a Core
(or Block Producing
) Cardano node, which is a Full
node that has access to SPO hot secret keys and is isolated from the rest of the world (except that it is allowed to communicate with one or multiple associated Relay
nodes) - A
Relay Server
hosts a Relay
Cardano node, which is a Full
node which is accessible from other external Cardano node peers, but does not have access to the SPO secret keys
- A naive setup for running a
Certified Mithril Signers
(devnet
or preview
) requires that the Mithril Signer node has access to:- An Aggregator that is external to the SPO infrastructure via a REST API (to send individual signatures)
- The database of a local Cardano
Full
node via file system (to compute snapshot digests and stake distribution) - The SPO hot secret keys (and operational certificates) via file system (to compute the signature that certifies the SPO is genuine)
- A more elaborate setup (
preprod
or mainnet
) would probably require that we split the Mithril Signer
in 2 parts:- A first part running on the
Core
server, only responsible for signing the Mithril Signer Verification Keys
(when requested by the other part) - A second part running on the
Relay
server and responsible for the rest of the Mithril protocol (registering with Aggregator, sending individual signatures, ...)
- Here is a sketch of the naive setup:
- And a sketch of the real setup:
-
As expected,
2
epochs after applying the fix on the Stake Distribution computation of issue Stake distribution discrepancy
#497, the Signers have been able to reliably produce individual signatures that are successfully registered on the Aggregator 💪 -
We have followed up on the merge of the issue
Deploy SQLite store adapter
#475. We have made some fixes on the migrators. We have helped the SPOs who had a hard time migrating some of their stores and everything looks good now 🎉 -
We have talked about a nice to have feature of pruning automatically the stores of the Signer/Aggregator nodes. This will be implemented shortly in this issue
Add auto pruning in stores
#504 -
Also we have paired on the issue
Implement Certification of the Mithril Verification Keys in Signer/Aggregator
#455. We are working on a plan to smoothly deploy the feature to the SPOs before activating it on the Aggregator, so that a transition window will be opened for SPOs to deploy the change on their Signer nodes. We will keep on pairing on this complex topic during this iteration
-
Following the merge of the issue
Stake distribution discrepancy
#497, the stake stores on GCP (Aggregator and Signers) are OK. We are keeping an eye on the list of signers in the Certificates from epoch 37
that should embed new Signers, and on the error rate of the individual signatures registration, which should drop -
We have paired and merged the issue
Deploy SQLite store adapter
#475 that activates the new SQLite
data store:- The Aggregator and the Signer nodes running on GCP have been successfully migrated to use the new store adapter
- We encountered a few difficulties when migrating the Aggregator stores. It appears that being able to qualify the migration on a testing environment would have been very helpful
- We are expecting the SPOs to migrate their stores (as explained in this dev blog post)
-
We have continued working on the
Release Process
setup:- A dedicated issue has been created
Implement Release process
#500 and some tasks have been added to it - Here is the updated definition of the process:
- We will use a common version (
semver
) for all the crates of the repository and for the GitHub release - All the nodes should be able to display the current version they are running
- In case of a version mismatch, the Aggregator should return an error so that the Signer/Client nodes are updated regularly
- We will work with GitHub environments to support deployments of versions on multiple environments
- A new version
0.1.2
will have the following life cycle:- A commit
abc123
merged on main
branch is deployed on testing
environment named testing-preview
- A commit
def456
tagged with 0.1.2-prerelease1
is deployed on preprod
environment named pre-release-preview
- A GitHub release
0.1.2
is created and linked with the 0.1.2-rc1
tag and marked as pre-release
- A tag
0.1.2-prerelease1
is qualified and selected for release or rejected (and replaced by a 0.1.2-prerelease2
tag if necessary on a ghj789
) - If the tag
0.1.2-prerelease1
is selected, a new tag is created and named 0.1.2
on the same commit def456
- The GitHub release is linked to the
0.1.2
tag and marked as release
- The commit
def456
with tag 0.1.2
is deployed to the prod
environment named release-preprod
- Some questions remain:
- When to update
cargo.toml
crates version vs creation of the draft release on GitHub? - How to handle
merge lock
during qualification of a release candidate (with only main
branch) (Use of feature flag?) - How to handle
Protocol Versions
smoothly (backward compatibility of messages w/ Avro
or equivalent solution?) - How to simplify the update process for the SPOs (with debian package for example)?
- How to handle real SPOs on the
testing-preview
and pre-release-preprod
environments (vs key rotations, secret keys management, ...)?
- The deployment schema is now:
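The tag life cycle described above (a `-prerelease<n>` candidate that is either promoted to the final tag or replaced by the next candidate) can be sketched as follows. This is only an illustration under the assumption that tags follow the `X.Y.Z` and `X.Y.Z-prerelease<n>` shapes used in the example; the `ReleaseTag` type and its methods are hypothetical and not part of the repository.

```rust
use std::fmt;

/// Hypothetical release tag: `0.1.2-prerelease1` or `0.1.2`.
#[derive(Debug, Clone, PartialEq)]
pub struct ReleaseTag {
    pub major: u32,
    pub minor: u32,
    pub patch: u32,
    /// `Some(n)` for a `-prerelease<n>` tag, `None` for a final release tag.
    pub prerelease: Option<u32>,
}

impl ReleaseTag {
    /// Parse a tag, returning `None` when it does not follow the assumed format.
    pub fn parse(tag: &str) -> Option<Self> {
        let (version, prerelease) = match tag.split_once('-') {
            Some((version, suffix)) => {
                let number = suffix.strip_prefix("prerelease")?.parse::<u32>().ok()?;
                (version, Some(number))
            }
            None => (tag, None),
        };
        let mut parts = version.split('.').map(|part| part.parse::<u32>().ok());
        let parsed = Self {
            major: parts.next()??,
            minor: parts.next()??,
            patch: parts.next()??,
            prerelease,
        };
        // Reject extra components such as `0.1.2.3`.
        if parts.next().is_some() {
            return None;
        }
        Some(parsed)
    }

    /// Qualified candidate: same version, without the pre-release suffix.
    pub fn promote(&self) -> Self {
        Self { prerelease: None, ..self.clone() }
    }

    /// Rejected candidate: same version, next pre-release number.
    pub fn next_candidate(&self) -> Option<Self> {
        self.prerelease
            .map(|n| Self { prerelease: Some(n + 1), ..self.clone() })
    }
}

impl fmt::Display for ReleaseTag {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}.{}.{}", self.major, self.minor, self.patch)?;
        match self.prerelease {
            Some(n) => write!(f, "-prerelease{}", n),
            None => Ok(()),
        }
    }
}
```

With this sketch, parsing `0.1.2-prerelease1` and calling `promote()` gives the tag `0.1.2`, while `next_candidate()` gives `0.1.2-prerelease2`.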
-
We have reviewed and merged the issue
Stake distribution discrepancy
#497:- The Stake Distribution should get back to normal
2
epochs after rebuilding the Signer - We will keep monitoring the GCP hosted Aggregator to check that the deployment goes well and does not prevent the Snapshot production.
- The SPOs should rebuild their Signer node (as explained in this dev blog post)
-
We have paired on the issue
Deploy SQLite store adapter
#475 and finalized the steps to follow in order to smoothly migrate the Signer/Aggregator node stores. The PR Use Sqlite datastore in Aggregator & Signer
#477 should be merged tomorrow -
We have also worked on defining the
Release Process
for the Mithril Network:- We will use a common version (
semver
) for all the crates of the repository and for the GitHub release - All the nodes should be able to display the current version they are running
- In case of a version mismatch, the Aggregator should return an error so that the Signer/Client nodes are updated regularly
- We will work with GitHub environments to support deployments of versions on multiple environments
- A new version
0.1.2
will have the following life cycle:- A commit
abc123
merged on main
branch is deployed on testing
environment named testing-preview
- A commit
def456
tagged with 0.1.2-prerelease1
is deployed on preprod
environment named pre-release-preview
- A GitHub release
0.1.2
is created and linked with the 0.1.2-rc1
tag and marked as pre-release
- A tag
0.1.2-prerelease1
is qualified and selected for release or rejected (and replaced by a 0.1.2-prerelease2
tag if necessary on a ghj789
) - If the tag
0.1.2-prerelease1
is selected, a new tag is created and named 0.1.2
on the same commit def456
- The GitHub release is linked to the
0.1.2
tag and marked as release
- The commit
def456
with tag 0.1.2
is deployed to the prod
environment named release-preprod
- Diagram of the release process is below:
-
We have talked about the nearly ready to merge issue
Deploy SQLite store adapter
#475:- How long do we keep the migration binaries available before decommissioning them? (From
2
to4
weeks) - How to communicate with the SPOs about that breaking change and provide them with simple yet efficient documentation (This will be implemented inside a dedicated dev blog post)
-
We have reviewed and merged the
Record 'contributing' Signers only in Certificate
#495 -
We had discussions about the issue
Stake distribution discrepancy
#497 that makes the Stake Distribution
computation non-deterministic and is the source of A provided signature is invalid
error messages when a Signer submits individual signatures. In order to swiftly fix the problem, we have defined a plan:-
Solution 1: Add a feature that makes the
Stake Store
always retrieve the same Stake Values
until a better solution is found (worst case scenario; this will not be necessary, as we moved to Solution 2 directly) -
Solution 2: Compute the
Stake Distribution
differently by gathering the Stakes from the previous epoch pool by pool (best solution for testnet
; solution that is under development in the PR Fix Stake Distribution retrieval
#499) -
Solution 3: Modify the
cardano-cli
so that it computes the stake distribution at the previous epoch (better solution for the long term and mainnet
; we will explore it in the future) -
Solution 4: Package a custom-developed CLI in Haskell that will query the ledger state and retrieve the
Stake Distribution
of the correct epoch (good solution, but drawback is that we need to package/deliver several binaries at once)
-
Other solutions have been debated such as calling Haskell functions from Rust or using a third party chain indexer
-
We have postponed the talks about the release process and we will resume them tomorrow during a dedicated session.
-
We have agreed that a relevant test case of Daedalus/Mithril would be to bootstrap a
mainnet
archive with/without Mithril snapshot. This will require that we run a mainnet
"test" environment. This will be part of our release/environments concerns/discussions -
Also, as we have been using the Cardano infrastructure (node/cli) quite a lot during our developments, we will organize a retrospective to give some feedback about it
-
The
Genesis Certificate
deployment worked as expected and new Snapshots are now available on the Mithril Explorer 🎉 -
We have reviewed and paired on the PR
Use Sqlite datastore in Aggregator & Signer
#477 of the issue Deploy SQLite store adapter
#475 with a main focus on the migration tool that is being built in order to migrate existing JSON
stores to SQLite
. We are at the stage of making the tool as easy to use as possible for the SPOs that will use it. Also we will create a How to migrate stores
guide and a post on the dev blog that explains why and how to use this tool. We should be able to merge next week -
We have also reviewed, paired and merged many fix and improvement PRs:
-
The PR
Implement Real Genesis Certificate
#438 has been merged and deployed successfully on the GCP Aggregator. However, we had a hard time running the genesis bootstrap
command. A fix is available in this PR Update Genesis GCP infra
#487. The first Genesis Certificate
has been generated and saved successfully at epoch 29
of the preview
network and we should see new Certificates
produced as soon as the transition to epoch 30
has taken effect 🎉 -
We have also worked on the preparation of the migration from
JSON
to SQLite
stores (which must take place on the Aggregator as well as the Signers), and have identified a few options:- Add a specific command line in Aggregator/Signer to handle the migration
- Handle the migration with dedicated scripts, which would be cumbersome and does not look like the best option
- Add a new binary build in the cargo projects of Aggregator/Signer (that looks like the best option to take advantage of the CI and drop the code within a short time frame after release)
-
Once we have migrated our stores to
SQLite
, we will move on to the relational implementation of the stores. We will have to work on an upgrade mechanism that will automatically upgrade the database schema when required -
We had discussions about the
Signers
displayed in the Certificates
of the Explorer:- We could display the stakes as
ADA
value or as %age
of total stakes enrolled in the Mithril network - We could also display which Signers have their individual signatures included in the certificate
-
We have added a new
Dev Blog
on the documentation website. This will help handle communications with the SPOs regarding breaking changes, deprecated features, new versions release, ... -
We need to work on the release process in order to manage correctly the evolution of the network with SPO users. We have talked about the options and questions we have, and will address them in a dedicated session:
- Rhythm of releases
- Versioning of the crates vs the GitHub tags
- Validation of the release candidates
- Trunk based or Gitflow?
- Packaging of the releases
- Automatic updates?
-
We have talked about the possible implementations of the optimization described in issue
Extend API to accept signature generation without Merkle path
#161 and in PR Switch blst
#159 -
We also talked about the way we could create more compact certificates by avoiding duplication of the common parts of the Merkle paths stored
-
We have discussed the way we could provide a
Security Level
of the Chain on the Mithril Explorer, which relates to issue Include probability of success for different parameters
#48. Researchers will provide a formula based on k
, m
, and phi_f
protocol parameters that can be used to compute a probability that an adversarial party produces a valid multi-signature -
We discussed the evolution of the protocol parameters, and Researchers will come back with a proposed set of parameters that fits the number of Signers involved in the network
-
Finally, we talked about the RFP regarding the understanding of the impact of the percentage of the stakes involved in the network vs the security level, as it appears that the paper assumption of 100% stakes involved is not realistic. Also some very different scenarios can occur when we think about only a share of the stakes involved: if 10% of stakes are involved in Mithril network and 10% of the stakes of the Cardano network are considered adversarial, do we consider that 100% (all the adversaries of Cardano) or 10% (the share of adversaries of Cardano) of the Mithril stakes are adversarial?
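The two readings of the question above can be written down as simple arithmetic. The sketch below is only bookkeeping for the example figures, not a security analysis; the function name `adversarial_share_bounds` is hypothetical, and both inputs are assumed to be fractions of the total Cardano stake in the range (0, 1].

```rust
/// Returns `(proportional, worst_case)`: the share of the stake enrolled in
/// Mithril that is adversarial under the two readings discussed above.
///
/// - `participating`: fraction of all Cardano stake enrolled in Mithril.
/// - `adversarial`: fraction of all Cardano stake considered adversarial.
pub fn adversarial_share_bounds(participating: f64, adversarial: f64) -> (f64, f64) {
    // Reading 1: adversaries enroll in the same proportion as everybody else.
    let proportional = adversarial;
    // Reading 2: every adversary enrolls, so their whole stake counts against
    // the enrolled stake (capped at 100% of it).
    let worst_case = (adversarial / participating).min(1.0);
    (proportional, worst_case)
}
```

With 10% of the stake enrolled and 10% of the Cardano stake adversarial, the sketch returns 10% for the proportional reading and 100% for the worst case, which are the two figures quoted above.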
-
We have been reviewing and finalizing the PR
Implement Real Genesis Certificate
#438. It is ready and will be merged tomorrow. Here are the operational implications:- Reset the
Certificate Chain
of the GCP hosted Aggregator - Bootstrap the
Genesis Certificate
on the GCP Aggregator - Requires that the SPOs recompile their Signer node (to handle faster registration), but previous version is compatible and will continue working
-
Regarding the flakiness of the CI:
- We attribute it to the way the
Stake Distribution
is computed by the cardano-cli
- The expected error rate on the CI is
~4%
. If this rate gets too high, we will have to deactivate the stake delegation feature of the test lab until we find a better solution
-
We have also worked on the migration of the stores of the Aggregator/Signer to
SQLite
as in Deploy SQLite store adapter
#475. We still have a few issues to fix and we will also work on an automatic upgrade mechanism (especially on the Signer side) before merging
-
We have merged the issue
Deploy mithril demo infra on 'preview' network
#457 (as well as the PR Update Blake dependency
#474). The Aggregator hosted on GCP is now running on the preview
network and producing snapshots 💪 -
We have debriefed about the previous session and the Certification of the Mithril Signer Verification Keys and we all agreed on the next steps discussed previously
-
We have spent some time digging into the Haskell code that computes the stake distribution and we have found out that the
cardano-cli
provides the full precision on the stake distribution when the --out-file
option is activated. An issue has been created to adapt the current implementation of the Chain Observer
and take advantage of this option: Enhance Stake Distribution retrieval
#480 -
⚠️ We have also tried to understand the source of flakiness on the CI and we have noticed that the computation of the stake distribution may be responsible:- We have noticed that even though we plugged all the Mithril nodes of the test lab on the same Cardano node of the
devnet
, the nodes retrieved different stake distributions during the same epoch - We have conducted another experiment with stake delegation and we have clearly found that we could actually have different results during the same epoch
- This is a problem as we are expecting:
- The Stake Distribution to be computed for the previous epoch (and not the current epoch)
- The Stake Distribution to be deterministically computed on all the nodes
- We will probably have to work on different implementations of the
Chain Observer
:- Propose an evolution of the
cardano-cli
that allows targeting a specific epoch when computing the Stake Distribution - Investigate other technologies that allow observing the evolution of the chain
-
We have talked about the incoming PRs that include breaking changes:
-
Move GCP Aggregator to 'preview' network
#470 -
Update Blake dependency
#474 (https://github.com/input-output-hk/mithril/pull/474) -
Use Sqlite datastore in Aggregator & Signer
#477 -
Implement Real Genesis Certificate
#438 - We will, at least, merge #470 and #474 at the same time: (Scheduled for Next Monday)
- Requires that the SPOs recompile their Signer node and update the configuration (
NETWORK=preview
and NETWORK_MAGIC=2
) - Involves a full reset of the Aggregator on GCP, and a manual intervention to produce new certificates
- If possible, we will also merge #477:
- Requires that the SPOs recompile their Signer node
- Involves a full reset of the Aggregator on GCP, and a manual intervention to produce new certificates
- When ready, we merge #438:
- Transparent for SPOs
- Requires a reset of the
Snapshots
and Certificate Chain
(which will be bootstrapped with a Genesis Certificate
) on the Aggregator
-
-
We have paired on the last bug that creates flakiness in the CI in the
Bootstrap Certificate Chain w/ Genesis Certificate
#364. It appears that a discrepancy occurs from time to time (~5%) in the computation of the Next Aggregate Verification Key
between the Signers and the Aggregator. We are still investigating the issue and we should fix it shortly -
We have also paired on the issue
Implement Certification of the Mithril Verification Keys in Signer/Aggregator
#455 in order to elaborate the best way to implement this feature. We have agreed on:- Implementing this feature in
mithril-common
in order to keep mithril-core
chain agnostic - In order to guarantee that no Mithril node can interact with the core library without being authenticated (now and in the future):
- The
mithril-core
library should be directly imported only by the mithril-common
crate (we should probably enforce this rule in the CI) - A Cardano specific
ProtocolKeyRegistration
will be implemented as a wrapper around the mithril_core::KeyReg
and added as a sub module of crypto_helper
module - A Cardano specific
ProtocolInitializer
will be implemented as a wrapper around the mithril_core::StmInitializer
and added as a sub module of crypto_helper
module - We will extend the
entities::Signer
type so that it includes the Cardano specific material required for Signer certification (Operational Certificate
of the SPO, Signer Verification Key Signature
signed by the KES Secret Key
of the SPO). This will allow the Signer Verification Key Certifier
to certify that the Signer node is the genuine holder of a poolId
on the Cardano network and of a Mithril Signer Verification Key
- Another required piece of information is the
KES Period
that can be retrieved from the cardano-cli
and that will be retrieved through the current Chain Observer
(using the field qKesCurrentKesPeriod
of the command cardano-cli query kes-period-info
) - We will add a new type dedicated to serialize/deserialize Cardano crypto material (that will also handle the
cborHex
conversion). This type will be able to parse a crypto file generated by the Cardano cli and convert it to bytes
, and to export a json
format with keys encoded in cborHex
. This type will be also used for the Genesis Certificate Verification Key
.
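As an illustration of the hex part of the `cborHex` conversion mentioned above, here is a minimal sketch using only the standard library. The helper names `hex_to_bytes` and `bytes_to_hex` are hypothetical; decoding the CBOR envelope carried by those bytes and reading the surrounding JSON file are deliberately left out.

```rust
/// Decode a hex string (as found in a `cborHex` field) into raw bytes.
/// The bytes are still CBOR encoded; this sketch does not unwrap them.
pub fn hex_to_bytes(hex: &str) -> Result<Vec<u8>, String> {
    if hex.len() % 2 != 0 {
        return Err("hex string must have an even length".to_string());
    }
    (0..hex.len())
        .step_by(2)
        .map(|offset| {
            hex.get(offset..offset + 2)
                // Only accept plain hex digits (rejects signs such as "+f").
                .filter(|pair| pair.bytes().all(|b| b.is_ascii_hexdigit()))
                .and_then(|pair| u8::from_str_radix(pair, 16).ok())
                .ok_or_else(|| format!("invalid hex at offset {}", offset))
        })
        .collect()
}

/// Encode raw bytes as a lowercase hex string.
pub fn bytes_to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|byte| format!("{:02x}", byte)).collect()
}
```

A round trip such as `bytes_to_hex(&hex_to_bytes(input)?)` gives back the lowercase form of `input`, which is a convenient property to check in a unit test of the real type.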
-
We had discussions about the flakiness of the CI that we are trying to fix in the
Bootstrap Certificate Chain w/ Genesis Certificate
#364. We have paired and prepared some fixes in the Implement Real Genesis Certificate
#438. Also a fix on the mithril-core
has been merged in order to Avoid panics in 'StmInitializer'
#472 -
We also had some talks about the migration of the Aggregator hosted on GCP to the
preview
network:- At first, we will decommission the
testnet
snapshotting - Then, it will be replaced by the
preview
network (target ETA is EOW) - In a second time, we will work on supporting multiple networks
- At first, we will decommission the
-
In order to work efficiently with SPOs, we will need to work with regular releases:
- We intend to create new releases every 1 to 2 weeks
- We will name our deployment environments the same way as the Cardano networks (
devnet
,preview
,preprod
,mainnet
) - When a commit is pushed on a working branch, the
devnet
is launched in the run-test-lab
job of the CI - When a commit is merged on the
main
branch, a terraform deployment will be triggered on the preview
from the CI - When a
tag
is created (maybe following a specific format), a terraform deployment will be triggered on the preprod
from the CI - The Signer, Client and Aggregator nodes will be released synchronously with the same tag version
- We will probably implement a feature where if a Signer or a Client requests the Aggregator with a different version, a
400
bad request will be returned
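The version check mentioned in the last point could look like the sketch below. It assumes that the node sends its version along with each request and that a missing version is treated like a mismatch; both choices, as well as the `VersionCheck` type and the `check_node_version` function, are hypothetical.

```rust
/// Hypothetical result of the version check performed by the Aggregator.
#[derive(Debug, PartialEq)]
pub enum VersionCheck {
    /// Versions match: the request can be processed.
    Accepted,
    /// Versions differ (or none was sent): answer with an HTTP error status.
    Rejected { status: u16, reason: String },
}

/// Compare the version announced by a Signer or Client with the Aggregator's.
pub fn check_node_version(aggregator_version: &str, node_version: Option<&str>) -> VersionCheck {
    match node_version {
        Some(version) if version == aggregator_version => VersionCheck::Accepted,
        Some(version) => VersionCheck::Rejected {
            status: 400,
            reason: format!("expected version {}, got {}", aggregator_version, version),
        },
        // Assumption: a request without any version is rejected as well.
        None => VersionCheck::Rejected {
            status: 400,
            reason: "missing version".to_string(),
        },
    }
}
```

An exact string comparison is the strictest possible rule; a real implementation might prefer to compare only some components of the version.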
-
We also had discussions about the issue
Simplify the Multi Signer in Aggregator
#398 and we have tried to elaborate a road map to implement it:- The strategy is to make the multi signer pure and let the state machine handle the state
- We will define a clear interface for interacting with the state
- Later, we will also try to enhance the state machine of the Aggregator, then of the Signer
- We will use an event driven state machine that gets updated given a list of
(State, Event) -> ApplyTransition -> NewState
by popping queued events. We still need to find a way to handle the synchronous responses of the HTTP server routes
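A minimal sketch of such an event driven state machine is given below, with a pure transition function and a loop that pops queued events. The states, events and transitions are purely illustrative assumptions, not the actual Aggregator design.

```rust
use std::collections::VecDeque;

/// Illustrative states (assumed for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub enum State {
    Idle,
    Ready,
    Signing,
}

/// Illustrative events (assumed for this sketch).
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    EpochChanged,
    NewBeacon,
    CertificateCreated,
}

/// Pure transition function: `(State, Event) -> NewState`.
/// Pairs that are not listed leave the state unchanged.
pub fn apply_transition(state: &State, event: &Event) -> State {
    match (state, event) {
        (State::Idle, Event::EpochChanged) => State::Ready,
        (State::Ready, Event::NewBeacon) => State::Signing,
        (State::Signing, Event::CertificateCreated) => State::Idle,
        (State::Signing, Event::EpochChanged) => State::Idle,
        (current, _) => current.clone(),
    }
}

/// Pop queued events one by one and fold them into the final state.
pub fn drain_events(initial: State, queue: &mut VecDeque<Event>) -> State {
    let mut state = initial;
    while let Some(event) = queue.pop_front() {
        state = apply_transition(&state, &event);
    }
    state
}
```

Keeping `apply_transition` free of side effects makes each transition testable on its own; how the synchronous HTTP responses fit into the queue is left open here, as in the discussion above.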
-
We have reviewed the new issues that have been created:
-
permission denied issue in dev-net
#459: we are having a hard time reproducing the issue. Therefore, we have asked the user to provide more details about their setup. However, we have merged a PR that could fix the permission issue: Fix attempt 'Permission Denied' in devnet
#467. We are waiting for feedback from the user to see if this patch fixes the problem -
Provide machine-readable output for mithril-client
#464: We will start working on it shortly
-
-
We have received and reviewed a first PR from the community
DATA_STORE_DIRECTORY
#465 that adds a missing configuration update on the Signer setup for an SPO -
We also had discussions about the PR in progress:
-
Greg/444/sql store
#460 has been merged as a first milestone of the PoC we are conducting on switching the stores to SQLite
💪. We will work on the enhancement of the iterator management (and avoid loading the full store in memory) and also on moving the actual stores in the Aggregator and Signer nodes in the near future -
Implement Real Genesis Certificate
#438: we need to fix the panic that occurs sometimes on the Signers and we should be able to merge the PR then. Once the PR is merged, we will be able to bootstrap a brand new preprod
GCP Aggregator as in issue Deploy mithril demo infra on 'preprod' network
#457
-
-
We have paired on the issue
Bootstrap Certificate Chain w/ Genesis Certificate
#364. All the features have been implemented in the PR Implement Real Genesis Certificate
#438. However, we have some flakiness issues that we need to fix prior to merging (that must have been in the previous code and that create some panics
in the Signer) -
We have reviewed and discussed the PoC for implementing a
SQLite
store adapter. A first version is close to being ready with an iterator that loads all the records in memory. Once this version is stabilized, we will work on optimizing the iterator
-
We have paired on the issue
Bootstrap Certificate Chain w/ Genesis Certificate
#364. We are close to being ready to merge the PR Implement Real Genesis Certificate
#438 -
We also had discussions about:
-
We have sliced and created the tickets for the new iteration
-
We have cleaned up the stale branches of the repository
-
We have merged the PR
Flaky tests
#374 🥳 We now use blst
as the crypto backend (with portable
feature activated in the CI). We have also reset the stores of the GCP Aggregator (as the previous keys were not compatible with blst
) -
As we will start working on the Mithril Keys Certification we had some discussions about this feature (and about
cbor
encodings for the keys) -
Also, we have paired on the PR
Implement Real Genesis Certificate
#438, that we will merge shortly
-
We have open sourced the repository!!! 🎉
-
We have reviewed the final version of the PR
Flaky tests
#374 and we have paired on optimizing the portable
feature implementation -
We also had discussions about the difficulty we face when trying to implement the
SQLite
store adapter. We will try a different approach by working with the underlying crate used by the crate we are trying to implement -
We have prepared a path for the demo with the goal of
Open Sourcing
the GitHub repository 🥇:- Making the GitHub repository public live 🚀
- Showcasing the final version of the documentation website (that we have already made public)
- Showcasing the restoration of a
testnet
Cardano Node
from a Mithril Snapshot
hosted on GCP (and also showcasing the Mithril Explorer
)
# Mithril End to End
# On devnet, with evolving non-deterministic verification keys, with evolving stake distribution, with real Certificate chain (without genesis)
# Resources
## Github
google-chrome https://github.com/input-output-hk/mithril
## Website
google-chrome https://mithril.network/doc
## Explorer
google-chrome https://mithril.network/explorer/
---
# Setup demo
## Download source (if needed)
git clone https://github.com/input-output-hk/mithril.git
#or
git clone git@github.com:input-output-hk/mithril.git
## Checkout correct commit
cd mithril/
git checkout 2c286878d070b842cd40f63ae580456cc50c00f7
cd mithril-client && make build && cp mithril-client ../../ && cd ..
cd ..
---
# Demo: Restore a snapshot from testnet
## Prepare vars
NETWORK=testnet
AGGREGATOR_ENDPOINT=https://aggregator.api.mithril.network/aggregator
LATEST_DIGEST=$(curl -s ${AGGREGATOR_ENDPOINT}/snapshots | jq -r '.[0].digest') && echo $LATEST_DIGEST
## List snapshots
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client list
## Show snapshot details
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client show $LATEST_DIGEST
## Download snapshot
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client download $LATEST_DIGEST
## Restore snapshot
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client restore $LATEST_DIGEST
-
We have reviewed the PR
Flaky tests
#374 that corrects the CI flakiness of mithril-core
🥳 . There is still a question regarding the implementation of the portable
feature of blst
that we need to investigate as we are using the artifacts built by the CI to create Docker images (and in the future released binaries). Also when merging this PR we will have to reset/recreate the stores on the GCP Aggregator (as the keys currently generated with zcash
are not compatible with the blst
keys). We should merge at the end of the iteration. After some discussions, we have decided to use a feature portable
in the mithril-core
library and not to re-expose mithril-core
from mithril-common
. This feature will be used in the CI (tests and artifacts released) at first. We still need to understand what is different between portable and not portable blst
(apparently related to IAS extensions that may cause the SIGILL
) and also we will work on adapting the CI and artifacts (Docker, executable) production with the idea that we must test the artifacts that we release. -
We have reviewed the latest commits of the PR Implement Real Genesis Certificate #438. We will continue to work on it and expect to merge it shortly
-
Also, we have paired on the use SQL store #444
-
We have reviewed and merged the Repository is missing a CONTRIBUTING document #446. We also had discussions about the final steps before open sourcing, in particular setting up branch protection rules before merging a PR
-
We have paired on:
-
We have activated the Require approvals feature on the repository before merging new PRs (this will be needed when open sourcing the repository)
-
We have paired on numerous bug fixes and enhancements related to the flakiness of the CI:
-
We have reviewed and merged:
- The PR Aggregator check existing certificate #435, which closes the issue Aggregator is stuck in "Signing" state when epoch changes #431 🥳
- The PR Move Certificate Verifier to Common #436. It prepares the work to be done in the issue Bootstrap Certificate Chain w/ Genesis Certificate #364, for which we have been talking about the steps that need to be completed
- The PR add code doc & factor service initialization #440, which relates to the issue Prepare open-sourcing of repository #92
-
We had discussions about the need to handle data structure updates and to have debug tools. A way to work on these two issues is to use SQLite and implement a store adapter on top of it. We will run a small PoC on this implementation
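The store adapter idea above could be prototyped along these lines. This is a minimal Python sketch using the standard `sqlite3` module, not the project's (Rust) implementation; the `SQLiteStoreAdapter` class, its method names and the single-table layout are assumptions made for illustration only.

```python
import json
import sqlite3
from typing import Any, List, Optional


class SQLiteStoreAdapter:
    """Hypothetical key/value store adapter backed by one SQLite table."""

    def __init__(self, path: str = ":memory:", table: str = "store") -> None:
        # The table name is interpolated in SQL, so only accept plain identifiers.
        if not table.isidentifier():
            raise ValueError("table name must be a plain identifier")
        self._table = table
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} "
            "(key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )

    def store_record(self, key: str, record: Any) -> None:
        # Records are serialized to JSON text so they stay human readable.
        with self._conn:
            self._conn.execute(
                f"INSERT OR REPLACE INTO {self._table} (key, value) VALUES (?, ?)",
                (key, json.dumps(record)),
            )

    def get_record(self, key: str) -> Optional[Any]:
        row = self._conn.execute(
            f"SELECT value FROM {self._table} WHERE key = ?", (key,)
        ).fetchone()
        return None if row is None else json.loads(row[0])

    def keys(self) -> List[str]:
        rows = self._conn.execute(
            f"SELECT key FROM {self._table} ORDER BY key"
        ).fetchall()
        return [row[0] for row in rows]
```

Because the values are stored as JSON text, the same database file could also be inspected with the `sqlite3` command line shell, which is the kind of debug tooling discussed above.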
-
We have merged the Add signer integration test #430 🥳
-
We have also reviewed the first PR of the issue Aggregator is stuck in "Signing" state when epoch changes #431 that will be merged shortly. We will pair on the second part of the issue, which requires some modifications of the Snapshots store
-
We also had discussions about the Mithril Keys Certification:
- We have reviewed the PR New STM registration procedure #433
- We still need to find out how to retrieve all the information needed (KES Key period with Cardano Cli and Cold Verification Key maybe from the Core Cardano Node)
- We were wondering if the KES Keys are renewed by overwriting the files. If this is the case, it means that we would need to reconfigure the Signer node after renewal of the keys
- The Signer does 2 new things during key registration:
  - Sign the Mithril Verification Key with the KES Secret Key to produce a KES Signature
  - Send the Operational Certificate, the Cold Verification Key, the KES Period and the KES Signature to the Aggregator during the registration process
- The Aggregator will verify the authenticity of the Pool Id and the associated Mithril Verification Key during the registration of the Signer. It will allow the Aggregator to match the Pool Id with the Stake Share retrieved from the Cardano Node. We still need to check if the Operational Certificate, the Cold Verification Key, the KES Period and the KES Signature need to be stored on the Aggregator
- For now, the Core library will keep computing the Merkle trees the same way and use only the Stakes from the registered Signers (and not from the whole Cardano Network)
- Before we merge this PR, we will need to have a running SPO node on GCP (that needs to be configured) so that we don't miss epochs in the Certificate Chain
-
We also had talks about the Genesis Keys:
- We will probably store the Genesis Keys with the same codec as the other keys used in Mithril (by using serde (de)serialization and base64 encoding) in the first place
- However, the Genesis Keys used by the Cardano Node seem to be using a cbor format. We will try to handle this encoding instead
- Another question that was raised: where can we find the mainnet Genesis Verification Key?
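To make the "serialization then base64" codec mentioned above concrete, here is a small Python sketch. It is an illustration only: JSON stands in for whatever format serde produces in the Rust code, and the `encode_key` / `decode_key` helpers are hypothetical names, not part of Mithril.

```python
import base64
import json


def encode_key(key_bytes: bytes) -> str:
    """Serialize the raw key bytes (as a JSON list here) and base64 encode the result."""
    serialized = json.dumps(list(key_bytes)).encode("utf-8")
    return base64.b64encode(serialized).decode("ascii")


def decode_key(encoded: str) -> bytes:
    """Reverse of encode_key: base64 decode, then deserialize the byte list."""
    return bytes(json.loads(base64.b64decode(encoded)))
```

A cbor based codec, as apparently used by the Cardano Node keys, would replace only the serialization step and keep the same round trip shape.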
-
We have reviewed and will shortly merge the latest modifications of the issue Add signer integration test #430
-
We have paired on understanding and fixing a bug on the Aggregator, Aggregator is stuck in "Signing" state when epoch changes #431. Some PRs that fix the problem are in progress and will be merged shortly
-
Following the occurrence of this bug, we have thought that it would be a good idea to implement a Max Error feature for a runtime cycle: if the runtime is in error M times in a row for the same state, the Aggregator runtime would panic. This would also help us spot problems in state transitions early
-
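The Max Error idea described above can be sketched as a small counter that the runtime would consult on every cycle. This is a Python illustration under assumptions: the `ErrorCounter` and `MaxErrorReached` names are hypothetical, and raising an exception stands in for the panic of the Rust runtime.

```python
from typing import Optional


class MaxErrorReached(RuntimeError):
    """Raised when the same state has failed too many times in a row."""


class ErrorCounter:
    def __init__(self, max_errors: int) -> None:
        if max_errors < 1:
            raise ValueError("max_errors must be at least 1")
        self.max_errors = max_errors
        self._state: Optional[str] = None
        self._count = 0

    def record_success(self) -> None:
        # A successful cycle resets the streak.
        self._state = None
        self._count = 0

    def record_error(self, state: str) -> None:
        # Only consecutive errors in the same state are counted together.
        if state == self._state:
            self._count += 1
        else:
            self._state = state
            self._count = 1
        if self._count >= self.max_errors:
            raise MaxErrorReached(
                f"state {state!r} failed {self._count} times in a row"
            )
```

An error in a different state restarts the count, so only a runtime that is stuck in one state reaches the limit.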
We also had discussions about the Mithril Keys Certification:
- In order to verify the SPO that is running a Mithril Signer, we will sign the Mithril Verification Key with the Cardano Hot Secret Key aka KES.skey and we will verify it with the Cardano Hot Verification Key aka KES.vkey that is stored inside the Operational Certificate of the Cardano Node of the SPO
- Every 6 epochs, the KES Keys are rotated and a new Operational Certificate will be issued. This means that we need to retrieve the current Operational Certificate at each epoch (before the Signer registers its keys with the Aggregator)
- We will try to stay on the Cardano Relay Node and avoid, if possible, working with the Block Producing Node. It means that the PoolId, which is the hash of the Cold Verification Key, should be declared by the SPO (and we should also verify that it matches the one included in the Operational Certificate)
- The Mithril Verification Key Signature must be verified on the Signer at startup and also on the Aggregator during registration
- We will include the KES.skey signing of the Operational Certificate in the core library
- We will maybe use the Cardano Cli to verify the signatures as it will require less work at first. This code should be incorporated into the core library when we go to mainnet
- We also need to find a way to retrieve the Operational Certificate from the Cardano Cli
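The check that a declared PoolId matches the Cold Verification Key can be illustrated as follows. This Python sketch assumes the hash is Blake2b-224 (28 bytes) over the raw key bytes and that the PoolId is compared in hex form; the notes above only say "hash", so treat both as assumptions. The function names are hypothetical, and the KES signature verification itself is not covered here.

```python
import hashlib


def compute_pool_id(cold_verification_key: bytes) -> str:
    """Hex pool id derived from the cold verification key (assumed Blake2b-224)."""
    return hashlib.blake2b(cold_verification_key, digest_size=28).hexdigest()


def check_declared_pool_id(declared_pool_id: str, cold_verification_key: bytes) -> bool:
    """True when the pool id declared by the SPO matches the provided cold key."""
    return declared_pool_id.lower() == compute_pool_id(cold_verification_key)
```

With such a check, an SPO can declare its PoolId from the relay side while the Aggregator still rejects a declaration that does not correspond to the key found in the Operational Certificate.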
-
We have reviewed and merged the PR Certificate chain integration test for Aggregator #424. It should fix some bugs related to the issue Produce valid certificate chain for several epochs on Devnet #396
-
We have also reviewed and paired on the Greg/317/signer integration test #426. It should be merged shortly
-
We have also discussed the Certificate Chain:
- Epoch Gap: We will first work on handling the Epoch Gap by using the latest "certified" stake distribution to sign the current epoch, as defined in the previous Research/Engineering session. This will be done when the devnet is working smoothly. The mechanism needs to:
  - Detect a gap in the Certificate Chain in the Aggregator
  - Modify the Beacon of the Pending Certificate to use the previous Epoch in the Aggregator
  - Make the Signers use the Epoch from the Beacon of the Pending Certificate in order to select the Protocol Initializer and Stake Distribution to use to produce Single Signatures
- Multiple Protocol Parameters: the Aggregator can try multiple sets of parameters (with equivalent security level) on the gathered Single Signatures in order to produce the most efficient Multi Signature. It will try the harder to reach parameters first. The only constraints on the parameters are:
  - They must share the same phi_f parameter value that is used to create the Protocol Initializer
  - The Signers must use the worst case parameters (the ones with the highest number of lottery attempts m)
- Genesis Certificate: We will try to put in place a process in the testnet that is as close as possible to what we will deploy on the mainnet. The genesis mechanism would be as follows:
  - The Aggregator must wait until a Genesis Certificate is available before appending any Certificate to the chain
  - In the meantime, the Signers will be able to proceed to the key registration
  - At a manually selected epoch (preferably at the beginning of the epoch), the Genesis Certificate Bootstrap will happen
  - Once the Genesis Certificate is saved in the Aggregator store, it will be able to produce valid Certificates and to append them to the chain. This should start occurring at the next epoch.
  - The Genesis Certificate Bootstrap will be done as follows:
    - Export the payload/message to be signed in the Genesis Certificate from the Aggregator (via cli) and store a Proto Genesis Certificate (unsigned)
    - Use the Genesis Private Key to sign this message and create a Genesis Signature (cold process, done out of Mithril Network on the mainnet, can be done via Mithril cli on the testnet and devnet)
    - Import the Genesis Signature back in the Aggregator, update the Proto Genesis Certificate and convert it to a definitive Genesis Certificate (metadata will be updated and hash needs to be recomputed, done via cli)
- Mithril Keys Certification: This subject is still under definition, but some questions arose:
  - Do we need to run a Mithril Signer on the Block Producing Node just for this certification (the one that holds the cold keys required to sign and that is closed to the outside)? Or is this operation done by the Cardano Node itself?
  - The Mithril Signer will be running on the Relay Node, the one that is open to the outside world (and does not have access to the hot keys)
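The Multiple Protocol Parameters idea above (try several parameter sets in order and keep the first one that reaches its quorum) can be sketched as below. This is a simplified Python illustration, not the Mithril core logic: it assumes a won lottery index stays valid for any parameter set whose `m` is larger than that index, it expects the caller to order the candidates (hardest to reach first), and the `ProtocolParameters` / `select_parameters` names and all numeric values are made up.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence


@dataclass(frozen=True)
class ProtocolParameters:
    k: int        # quorum: number of distinct won lottery indices required
    m: int        # number of lottery attempts (valid indices are 0..m-1)
    phi_f: float  # must be identical across all candidate sets


def select_parameters(
    candidates: Sequence[ProtocolParameters],
    won_indices: Iterable[int],
) -> Optional[ProtocolParameters]:
    """Return the first candidate whose quorum is reached, or None."""
    if len({params.phi_f for params in candidates}) > 1:
        raise ValueError("all candidates must share the same phi_f")
    distinct = set(won_indices)
    for params in candidates:
        # Only indices below this candidate's m count towards its quorum.
        usable = sum(1 for index in distinct if 0 <= index < params.m)
        if usable >= params.k:
            return params
    return None
```

The Signers would draw lotteries with the largest `m` among the candidates, which is why only the Aggregator needs to know about the alternative sets.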
-
This was the first meeting with the Daedalus/Lace team. The goal was to understand each other's needs and to set up short term goals and a working environment
-
Daedalus end of life will happen soon and Lace will replace it (with an Open Source approach). Lace will also handle a light client wallet
-
We showcased the restoration of a Cardano Node on the testnet thanks to a Mithril Snapshot
-
Questions discussed:
- Is it possible to restore not the full immutable database, but instead work with the range of missing files? (Answer is yes, but not in the first version as the feature is not implemented yet)
- How secure are Mithril and the downloaded snapshot? (Answer is fully secured by design, by the percentage of SPOs participating, and by the selection of protocol parameters)
- Who pays for the bandwidth? (Answer is IOG for the Aggregator that it currently hosts, and each Aggregator provider when multiple are available. Also we have plans for using peer to peer networks for hosting the archives)
- What about the UTxO set? (Answer is not implemented yet, but will allow Mithril to handle light wallets)
- What about the new testnet? (Answer is we need to work on that issue, but the new testnet is not stable enough at the moment)
- Do the Mithril client binaries exist for Linux, macOS and Windows? (Answer is not yet but easy to do, will be part of the work)
- How to communicate with a Mithril Client? (Answer is stdout or text file in a first version, then IPC later. It will provide a percentage of completion and error/log messages. Will work the same whether running on Daedalus or Lace)
- How to integrate Mithril snapshot restoration in the wallet? (Answer is by being a part of the Cardano Launcher module of the wallet. Once the archive is extracted, Mithril is not used/needed anymore)
-
Next steps for the PoC:
- Set up another meeting to create technical tasks in Jira/Github Projects with engineers
- Create a dedicated private Slack channel with members from the 2 teams
-
We have reviewed the PRs about:
- Integration tests on the Signer (incoming)
- Add Store Protocol Parameters in Aggregator #385 that is ready to be merged
-
All our efforts have paid off and we now have the GCP Aggregator working smoothly, see issue Produce valid certificate chain for several epochs on Testnet #397. However, we will monitor it closely to be sure that there are no other snapshot producing blockers
-
Also we have noticed that the refresh rate of the runtime interval of the Mithril nodes (especially the Aggregator) seems to have a high impact on the flakiness of the CI/devnet. We are still actively investigating this issue Produce valid certificate chain for several epochs on Devnet #396, however the flakiness is now considerably mitigated
-
We have also prepared the demo path:
# Mithril Certificate Chain
# On devnet, with evolving non-deterministic verification keys, with evolving stake distribution
# Resources
## Github
google-chrome https://github.com/input-output-hk/mithril
## Architecture
google-chrome https://mithril.network/doc/mithril/mithril-network/architecture
## Certificate Chain
google-chrome https://mithril.network/doc/mithril/mithril-protocol/certificates
## Explorer
google-chrome https://mithril.network/showcase/
---
# Setup demo
## Download source (if needed)
git clone https://github.com/input-output-hk/mithril.git
#or
git clone git@github.com:input-output-hk/mithril.git
## Checkout correct commit
cd mithril/
git checkout 4325260ec657b4cde0d4be5c6ff2a23241f2d886
cd mithril-client && make build && cp mithril-client ../../ && cd ..
---
# Demo: Download & Restore Latest Snapshot All In One (~20 min)
NETWORK=testnet && AGGREGATOR_ENDPOINT=https://aggregator.api.mithril.network/aggregator && LATEST_DIGEST=$(curl -s ${AGGREGATOR_ENDPOINT}/snapshots | jq -r '.[0].digest') && echo $LATEST_DIGEST && NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client list -vvv && NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client show $LATEST_DIGEST -vvv && NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client download $LATEST_DIGEST -vvv && NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client restore $LATEST_DIGEST -vvv
NETWORK=testnet && AGGREGATOR_ENDPOINT=https://aggregator.api.mithril.network/aggregator && LATEST_DIGEST=$(curl -s ${AGGREGATOR_ENDPOINT}/snapshots | jq -r '.[0].digest') && echo $LATEST_DIGEST && NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client list && NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client show $LATEST_DIGEST && NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client download $LATEST_DIGEST && NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client restore $LATEST_DIGEST
---
# Demo: Launch a Mithril Network explorer
## Change directory
cd mithril-showcase
## Build website
make dev
## Open explorer
google-chrome http://localhost:3000/showcase
---
# Demo: Bootstrap and start a Mithril/Cardano devnet
## Change directory
cd mithril-test-lab/mithril-devnet
## Run devnet with 1 BFT and 2 SPO Cardano nodes
MITHRIL_IMAGE_ID=main-4325260 NUM_BFT_NODES=1 NUM_POOL_NODES=2 EPOCH_LENGTH=45 SLOT_LENGTH=1.0 DELEGATE_PERIOD=90 ./devnet-run.sh
## Watch devnet logs
watch -n 1 LINES=5 ./devnet-log.sh
## Watch devnet queries
watch -n 1 NODES=cardano ./devnet-query.sh
## Visualize devnet topology
./devnet-visualize.sh
## Stop devnet
./devnet-stop.sh
# Client
## Get Latest Snapshot Digest
NETWORK=devnet
AGGREGATOR_ENDPOINT=http://localhost:8080/aggregator
LATEST_DIGEST=$(curl -s ${AGGREGATOR_ENDPOINT}/snapshots | jq -r '.[0].digest')
echo $LATEST_DIGEST
## List Snapshots
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client list
## Show Latest Snapshot
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client show $LATEST_DIGEST
## Download Latest Snapshot (Optional)
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client download $LATEST_DIGEST
## Restore Latest Snapshot
NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client restore $LATEST_DIGEST
## All at once
NETWORK=devnet && AGGREGATOR_ENDPOINT=http://localhost:8080/aggregator && LATEST_DIGEST=$(curl -s ${AGGREGATOR_ENDPOINT}/snapshots | jq -r '.[0].digest') && echo $LATEST_DIGEST && NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client list && NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client show $LATEST_DIGEST && NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client download $LATEST_DIGEST && NETWORK=$NETWORK AGGREGATOR_ENDPOINT=$AGGREGATOR_ENDPOINT ./mithril-client restore $LATEST_DIGEST
-
We have reviewed and merged the issue Add state machine runtime Signer #317 🥳. It apparently solves the problem that prevented the creation of certificates because signer registration was not done properly at each epoch
-
We have also reviewed and merged the issue Add/Use Protocol Initializer Store in Signer #362. The non deterministic verification keys have been rolled back and a bug has been fixed in the Clerk computation. With invariant Stake Distribution, the network is able to generate a valid Certificate Chain 💪
-
We still have some flakiness occurring when the stake distribution changes and we are actively investigating it
-
This was the first official meeting to synchronize Research and Engineering teams. This meeting will take place every 2 weeks
-
We have mainly discussed how to handle an Epoch gap in the Certificate Chain (see Mithril Client fail to validate certificate chain if the previous certificate is more than one epoch older #377):
- Having no epoch gap in the Certificate Chain is mandatory to guarantee the security of the protocol and avoid "long range" attacks
- Re-genesis of the Certificate Chain is always possible and is the "nuclear" option used if nothing else works
- In case of multiple Aggregators, downloading a valid chain from another Aggregator is possible
- Also an Aggregator should be able to try different protocol parameters in order to produce the multi signature:
  - They would provide the same security level
  - But the first tried would produce lighter signatures (whereas the quorum would be harder to reach)
  - If a multi signature is produced, no other sets are tried
  - If not, a different set of parameters is tried
- If an Aggregator is not able to produce a valid certificate at epoch n, and is now at epoch n+1:
  - It should use the previously valid stake distribution (next AVK) in certificate at epoch n-1
  - Instead of the stake distribution at epoch n which is not validated
  - And produce a certificate for epoch n+1
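The last rule above (fall back to the most recent certified epoch when epoch n was missed) can be written down as a tiny selection function. This is a Python sketch of the rule as discussed, not the eventual implementation; the function names and the representation of the chain as a set of certified epoch numbers are assumptions.

```python
from typing import Iterable, Optional


def latest_certified_epoch(
    current_epoch: int, certified_epochs: Iterable[int]
) -> Optional[int]:
    """Most recent certified epoch strictly before the current one, if any."""
    earlier = [epoch for epoch in certified_epochs if epoch < current_epoch]
    return max(earlier) if earlier else None


def has_epoch_gap(current_epoch: int, certified_epochs: Iterable[int]) -> bool:
    """True when at least one epoch between the last certificate and now was missed."""
    latest = latest_certified_epoch(current_epoch, certified_epochs)
    return latest is not None and current_epoch - latest > 1
```

For example, with certificates for epochs 1 to 3 and a current epoch of 5, the function selects epoch 3, which matches the "use the certificate at epoch n-1 to certify epoch n+1" case with n = 4.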
-
We have paired on the Add state machine runtime Signer #317 and Add/Use Protocol Initializer Store in Signer #362 issues all day long. We hope to merge very shortly 💪
-
We have also had discussions on the Add Store Protocol Parameters in Aggregator #385: this implies that the Next Protocol Parameters are broadcast in the Pending Certificate of the Aggregator
-
We have reviewed and merged all the PRs that relate to issue Configure SSL certificate for Mithril Aggregator GCP #324. The showcase is now working correctly on the production documentation website and it will be activated in the navbar shortly 🥳
-
We have reviewed and paired on the issue Add state machine runtime Signer #317, which is a blocker for 3 other issues, so that we can complete it asap and not jeopardize the demo of the iteration. There is still much work to do and some questions are still open (in particular regarding the epoch that should be used: from the Cardano node or the Pending Certificate). This is our main focus for the following days
-
We have reviewed some work that was done yesterday on the Add state machine runtime Signer #317
-
We have also created new issues (with high priority) related to fixes/optimizations that need to be implemented to:
-
Following our conversations from the previous days, we created an issue Simplify the Multi Signer in Aggregator #398 that will conduct a study on what is the best strategy to enhance the Multi Signer
-
We had discussions about how we can handle missing certificates for some epochs in the Certificate Chain. The problem is tricky and could be solved by:
- Using a higher epoch offset and embedding in the signed message multiple Next AVKs. This could work, but would be cumbersome (as the Signers would have to wait more epochs before being able to sign)
- Using the Aggregator beacon to handle certificate creation for an epoch at a later epoch when the network is back up. This means that the Aggregator is in charge of broadcasting the epoch to be used by the Signers to individually sign. This solution is likely to be the simplest to deploy, but it might not cover all of the cases that would be responsible for an epoch drop in the chain (for example if the Signers were not able to gather the previous Stake Distribution on their end)
- In a multiple Aggregator network, if an Aggregator misses an epoch (due to networking or operations trouble), it should be able to recover the chain by retrieving it from any other up to date Aggregator
- A last option to cover such an epoch drop would be to re-genesis the chain (will always work, but hard to operate)
-
We have also talked about the Multi Signer of the Aggregator and the issue Reunite Beacon Store/Provider Aggregator #363. We have decided to replace the Beacon Store dependency with a Beacon that is fed by the runtime. Also, we have agreed that this module could be simplified and we will work on that step by step. Maybe we can split the module into sub modules, and we should wait for the Certificate Chain to be fully functional before making too impactful modifications. In the meantime, we agreed to pair whenever breaking modifications are applied
-
We have paired intensively on the issue Add state machine runtime Signer #317
-
A last point we have discussed is that we should define a dedicated type for handling serialized keys from the Core library
-
We have reviewed and merged a PR Improve aggregator dependencies management #382 regarding some optimization of the dependency management in the Aggregator
-
We have discussed the issue Add state machine runtime Signer #317 and we have stated that:
- We will use the Beacon Provider from the Aggregator in the Signer, which implies that the module will be moved to the mithril-common folder
- The Immutable Digester will be fed with a Beacon at which it will compute the digest
- The Signer will not rely any more on the Beacon retrieved from the Pending Certificate of the Aggregator
- We will also pair on this issue after these adjustments have been done
-
We have reviewed the PRs that have been done last week and took some time to talk about the epoch offset used to implement the Certificate Chain
-
We have discussed several topics:
- The flakiness of the CI that was partially fixed, but sometimes another error occurs which is apparently related to a gap in the certificate chain (one epoch is not signed). We will investigate that issue and also work on the possibility of verifying AVK signed certificates up to N previous epochs to avoid breaking the chain (currently N is 1, it could be a parameter of the Client). Also the code to verify a certificate could maybe be optimized for clarity (too many intricate match expressions)
- Implementing a Service Builder in the Aggregator to simplify usage of dependencies
- Removing the Beacon Store (see issue Reunite Beacon Store/Provider Aggregator #363) and using only the Beacon Provider instead. This also means that we need to create a store for the States of the state machine of the Aggregator. This will allow the Aggregator to restart gracefully (and not sign the same Immutable File Number multiple times)
- Improving the source of the Immutable File Number, which should be only the responsibility of the Chain Observer, and using this source to feed the Immutable Digester (which should only be responsible for the computation of the digest)
- Also, the computation of the digest takes too long. An optimization would be to cache the digest of each immutable file and compute the digest as the root of a Merkle tree for example. This would require computing almost only the hash of the latest Immutable File Number and would drastically reduce the time and CPU resources needed for computation
- We could simplify state stores parameters by using only one Store Directory and use it as a prefix for all the stores data path. This would greatly reduce the complexity of the setup of the nodes and would avoid impacting other resources each time a new store is added (GCP, test lab, ...)
- Also in order to simplify querying and debugging of the stores we could:
  - Implement a SQLite adapter
  - Provide specific tools for retrieving/gathering the data from the stores
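The digest optimization mentioned in the list above (cache the hash of each immutable file and derive the global digest as a Merkle root) can be sketched as follows. This is a Python illustration under assumptions: SHA-256 as the hash, duplication of the last node on odd levels, and a hypothetical `read_file` callback that returns the content of an immutable file by number. None of these choices come from the project.

```python
import hashlib
from typing import Callable, Dict, List


def merkle_root(leaves: List[bytes]) -> bytes:
    """Root of a binary Merkle tree over the given leaf hashes."""
    if not leaves:
        return hashlib.sha256(b"").digest()
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


class CachedDigester:
    """Computes a digest up to an immutable file number, hashing each file once."""

    def __init__(self, read_file: Callable[[int], bytes]) -> None:
        self._read_file = read_file
        self._cache: Dict[int, bytes] = {}
        self.files_hashed = 0  # counts the expensive file hash computations

    def digest(self, up_to_immutable_file_number: int) -> str:
        leaves = []
        for number in range(up_to_immutable_file_number + 1):
            if number not in self._cache:
                content = self._read_file(number)
                self._cache[number] = hashlib.sha256(content).digest()
                self.files_hashed += 1
            leaves.append(self._cache[number])
        return merkle_root(leaves).hex()
```

The cache is safe only because immutable files never change once written. The tree nodes are still recombined on each call, but that work is small compared to reading and hashing every file again.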
-
We also agreed that some efforts are still needed to stabilize the system so that
- Snapshots and certificates are produced consistently (there are many hiccups on GCP)
- The Signer seems to be mainly responsible for this and the ongoing refactoring and improvements in progress should allow it shortly
-
We have reviewed the latest developments for the issue Implement certificate chain Aggregator/Signer/Client #316. The PR has been merged 🥳
-
The PR Set indices to be represented as vectors instead of unique #351 has been merged and thus closes the issue Optimize single signature in Mithril Aggregator/Signer/Core #296 🎉
-
We have reviewed and talked about the issue Add integration tests in Mithril Aggregator #284 which should be ready to be merged shortly
-
We have also reviewed the developments in progress of the website Showcase section of issue Showcase snapshots/certificate pending on doc website #315. The first results look very good and we are keen on seeing it live on the website! As there is not always a Pending Certificate available, we were asking ourselves if maybe we could add a /beacon route on the Aggregator API that would display the current Beacon 🤔
-
We have reviewed the showcase interface in its first version Showcase snapshots/certificate pending on doc website #315. It is working and displays the first information retrieved from the Aggregator. Some more work needs to be done in order to complete the issue
-
We have reviewed and talked about the Implement certificate chain Aggregator/Signer/Client #316: there seems to be a problem with the stake distribution update that prevents the Aggregator from producing multi signatures. Some investigations are in progress. If the fix is not obvious, a feature flag will be activated to allow the merging of the PR
-
We have discussed and contributed to the issue Optimize single signature in Mithril Aggregator/Signer/Core #296, specifically about the deduplication of the won lottery indices. The PR should be merged shortly
-
We have reviewed and paired on the Add integration tests in Mithril Aggregator #284. The implementation of the Happy Path is still in progress, but it will be ready to merge shortly
-
We have reviewed the Implement certificate chain Aggregator/Signer/Client #316. Some enhancements will be done in the End to End Tests Runner and the PR should be merged shortly.
-
We have discussed the short term fix for the issue Signer can not sign after restart (UnregisteredVerificationKey) #361. We agreed to switch temporarily to a deterministic Verification Key generator. The fix has been merged and works as expected on GCP 🥳 The long term fix will be implemented in Add/Use Protocol Initializer Store in Signer #362
-
We also had discussions about the Showcase snapshots/certificate pending on doc website #315 issue and listed some nice to have features:
- Use it for the demo with the devnet on a local website
- Have a refresh every 30s on the first page
- Implement responsive design pages
-
The tickets of the current iteration have been sliced and created in the board
-
We have reviewed and paired on the issue Add integration tests in Mithril Aggregator #284. The AggregatorConfig struct was wrongly holding a reference to the DependencyManager, which was preventing us from using the full features of the DumbImmutableFileObserver (that will power the newly added tests).
-
We have also talked about the Showcase section of the documentation website and the type of information that would be displayed. A first version could showcase:
- The Pending Certificate if it exists, and the list of the latest Snapshots on a first page
- The Snapshots provide a link to the associated Certificate details on a new page
- The Certificate provides a link to the Previous Certificate in the chain if it exists
-
We have made a review of the PRs that have been merged during the previous iterations and of the technical debt that we have accumulated so far. We have decided to take some time to lower this debt during the current iteration
-
Here is a list of the issues that have been listed as such:
- Add and use a Verification Key Store in the Signer
- The previous issue should fix a bug that prevents the Signer from recognizing its Verification Key in the Signers list retrieved from the Pending Certificate (and triggers an UnregisteredVerificationKey error) after a restart (due to the randomness of the Verification Keys)
- Reunite the BeaconStore and the BeaconProvider in the Aggregator (we need to check if we want to remove completely the BeaconStore)
- The previous issue should fix a bug that makes the Aggregator create a new Pending Certificate for a Beacon that already has a Certificate
- A bug that makes the Aggregator disk saturate (because the temp snapshot archive file is not deleted after upload)
-
We have reviewed the PR Add certificate chain Aggregator/Signer/Client #355 in relation with Implement certificate chain Aggregator/Signer/Client #316 and discussed some small adjustments that will be done shortly
-
We have also reviewed and merged the Enhance documentation website #356 with:
- The enhanced Glossary section of the website
- The enhanced Mithril Certificate Chain in depth page
-
We have paired on the bug of the issue Fix test lab CI flakiness #352:
- A fix to the single signer of the Mithril Signer was applied (concerning the late instantiation of the protocol initializer)
- We fine tuned the runtime intervals of the Signer and Aggregator nodes (which were running with the same cadence, which was a source of flakiness)
- We made some tests with 2 signers and an epoch offset of -1 and the execution time of the test lab is still very good (~2m 30s)
- We will merge with 2 signers and an epoch offset of 0 at first (as there are still some unexplained delays in signer registration with a non 0 epoch offset)
- We have also identified an optimization when producing the CI run attempts artifacts (to separate them clearly). It will be included in this PR
-
We also discussed the ongoing issues:
-
We have paired on the issue Optimize single signature in Mithril Aggregator/Signer/Core #296, on the PR Set indices to be represented as vectors instead of unique #351 in order to find the best way to deduplicate indices of the single signatures before generating a multi signature. We will continue pairing on this tomorrow.
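One possible way to deduplicate the won lottery indices before aggregation is sketched below in Python. The policy used here (the first signer that presents an index keeps it) is an assumption chosen for simplicity; the notes above do not say which policy the PR settled on, and the `deduplicate_indices` name is hypothetical.

```python
from typing import Dict, List, Set, Tuple


def deduplicate_indices(
    single_signatures: List[Tuple[str, List[int]]],
) -> Dict[str, List[int]]:
    """Keep each won lottery index once across all single signatures.

    Input: (party_id, won_indices) pairs, in the order they were received.
    Output: party_id -> indices that this party contributes after deduplication.
    """
    seen: Set[int] = set()
    result: Dict[str, List[int]] = {}
    for party_id, indices in single_signatures:
        kept = []
        for index in indices:
            if index not in seen:
                seen.add(index)
                kept.append(index)
        if kept:
            # Parties left with no unique index do not contribute at all.
            result[party_id] = kept
    return result
```

The number of distinct indices left after this step is what would be compared against the quorum before producing a multi signature.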
-
We have talked about solving the flakiness of the test lab in the CI. The solution is under development and the new version of the end to end test runner along with the activation of the epoch offset should work. At the same time, the parameters of the
devnet
are fine tuned in order to keep the fast test execution time. A PRLessen test lab flakyness
#350 has been pushed and will be merged shortly -
The website documentation enhancements have been reviewed in the PR Enhance documentation website #349. It will be merged shortly and will deploy the following changes:
- Enhanced Getting Started pages
- Enhanced Developer Docs > Mithril Network pages
- Reorganized About Mithril section with clear Mithril Protocol and Mithril Network menus
-
We have reviewed the work in progress regarding the integration tests of the Aggregator runtime of this issue Add integration tests in Mithril Aggregator #284. We had discussions about the purpose of the tests and decided to use the runtime tests as unit tests and to work on a happy path scenario with the full node for the integration test.
-
We have reviewed the issue Cannot sync a cardano-node using latest snapshot on GCP #344. After investigation, it appears that the issue is linked to the 1.35.0 version of the Cardano node and is fixed in 1.35.1
-
We also had discussions about the use of nightly / pre-release / release tags (and packages & environments). We will start with the nightly one
-
Also, the CI is very flaky at this time (mainly because the test lab is failing due to using the same epoch for registration and signing). We have decided to activate an epoch offset of -1 and to work on fine tuning the devnet to accelerate the production of immutable files and epochs. This should fix the problem and should be available shortly.
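
To make the epoch offset concrete, here is a small Rust sketch of applying a signed offset to an epoch number. The constant name, the function name and the choice of `u64` for epochs are assumptions made for this example; they are not taken from the code base.

```rust
/// Hypothetical offset between the current epoch and the epoch whose
/// registrations are used for signing (the -1 mentioned above).
const SIGNING_EPOCH_OFFSET: i64 = -1;

/// Apply a signed offset to an epoch number.
/// Returns `None` when the result would be negative or would overflow,
/// for example when asking for the epoch before epoch 0.
fn offset_epoch(epoch: u64, offset: i64) -> Option<u64> {
    epoch.checked_add_signed(offset)
}
```

With an offset of -1, keys registered during epoch `N - 1` are the ones used to sign during epoch `N`, so registration and signing no longer compete within the same epoch.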
-
We have reviewed and closed the Enhance runtime state machine Aggregator #323 issue, which will prevent the Aggregator from updating the stake distribution too often
-
We have also merged some bug fixes and enhancements:
-
We have paired on the issue Optimize single signature in Mithril Aggregator/Signer/Core #296, which should be merged shortly
-
We have paired on getting the project one step further toward open sourcing:
- Creating a service account so that we are autonomous in managing the cloud operations (Aggregator hosting and Terraform on the CI)
- Activating the Discussions feature on the repository
- Finding how to correctly handle the latest tagging of Docker images (such as what has been done on hydra)
- Finding a way to add an automatically renewing SSL certificate to the Aggregator API (with Let's Encrypt)
- Reviewing the new documentation tutorial pages (that need a second pair of eyes and beta testers to verify that they are functional and easy to use)
-
We had discussions about:
-
Upgradable protocol parameters: the Aggregator will keep on broadcasting the Protocol Parameters used for the current epoch and they will be stored by the Signers (along with the Verification Keys for easy retrieval and usage)
-
Epoch offsetting strategy: the -1 and -2 offsets that are used to work with the Stake Distribution and the Verification Keys are well defined constants that will probably never change (as they provide sufficient security). It is therefore better to hard-code them as constants provided at compilation time for the Signer and the Aggregator than to have the Aggregator provide them at runtime
-
Certificate Chain Verification Requirements: the multi signatures embedded in the Certificates must be verifiable even though the cryptographic library has evolved along the way
- The message signed needs to be switched to a map format where we are free to add new entries without breaking the chain validation (today with only an immutable_digest entry, and later with others such as utxo_set for example)
- We could maintain a set of verifier functions in the core library for each earlier version (could be cumbersome)
- We could add a verifier function compiled in WASM that is stored in the certificate
- We could add a format migration feature to the certificate chain
- We could add milestone genesis certificates that would provide a complementary signature to certificates (produced with the genesis keys in the certificate) from time to time (e.g. every N epochs or as soon as a break in backward compatibility is introduced in the code)
- We could also implement such a mechanism automatically by using the Cardano chain (but that would involve posting a transaction on it)
-
Releases packaging: In order to facilitate the distribution of the nodes (particularly to the SPOs) and to have a broad adoption of the protocol, we will need to work on deploying packages for each release (.deb, .rpm, ...) with the CI
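
The map format for the signed message mentioned under the Certificate Chain Verification Requirements could look like the following Rust sketch. The `ProtocolMessage` name, its methods and the length-prefixed encoding are assumptions made for this example; only the immutable_digest and utxo_set entry names come from the discussion above.

```rust
use std::collections::BTreeMap;

/// Hypothetical signed message: a map from entry name to value.
/// A `BTreeMap` iterates in sorted key order, so the encoding below is
/// deterministic whatever the insertion order.
#[derive(Debug, Default, Clone, PartialEq)]
struct ProtocolMessage {
    parts: BTreeMap<String, String>,
}

impl ProtocolMessage {
    /// Add or replace an entry of the message.
    fn set_part(&mut self, key: &str, value: &str) {
        self.parts.insert(key.to_string(), value.to_string());
    }

    /// Bytes to sign: each key and value is prefixed with its length
    /// (8 bytes, big endian) so that concatenation stays unambiguous.
    fn bytes_to_sign(&self) -> Vec<u8> {
        let mut bytes = Vec::new();
        for (key, value) in &self.parts {
            bytes.extend_from_slice(&(key.len() as u64).to_be_bytes());
            bytes.extend_from_slice(key.as_bytes());
            bytes.extend_from_slice(&(value.len() as u64).to_be_bytes());
            bytes.extend_from_slice(value.as_bytes());
        }
        bytes
    }
}
```

In this sketch, a certificate that only carries an immutable_digest entry keeps the same bytes to sign after new entry names are introduced, since entries absent from the map contribute nothing to the encoding.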
-
We have reviewed and merged the following PRs:
-
We have paired on updating the state machine of the Aggregator runtime so that it computes the stake distribution only once for an epoch:
- We have also paired on creating the state machine of the Signer runtime:
- During this pairing session we had many discussions about:
- The usefulness of the Beacon used in the certificate pending
- The forthcoming work that will be done regarding the Certificate Chain implementation
- And some long term implications of multiple Aggregators running and what it means for how we compute the multi signatures
-
The tickets of the current iteration have been sliced and created in the board
-
We have reviewed and merged the PR Improve UI/UX documentation website #309. The vast majority of the UI/UX review comments have been taken into account. The website content is being written and this work will continue during the iteration
-
We had a session related to the Certificate Chain, whose goal was to:
- Specify which information to embed in the Genesis Certificate
- Specify which information to embed in the other certificates of the chain
- Define how to link the certificates to each other
- Define how to verify a certificate
- Some questions remain such as:
- Is the Mithril Epoch 0 an empty epoch (which means no other certificate than the Genesis one will be produced)?
- What is the exhaustive list of information that we need to embed in the Metadata(p,n) group? (Among Certificate Version, Protocol Parameters, Dates, Signers List which included their single signature in the multi signature)
- Here is a diagram that summarizes the structure of the chain: (see on miro)
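
To illustrate the linking and verification goals of this session, here is a minimal Rust sketch of a hash-linked chain walked back to its genesis certificate. Everything in it is an assumption made for this example: the `Certificate` fields, the function names, and the use of the standard library `DefaultHasher` as a stand-in for a cryptographic hash. The multi signature and genesis signature checks that a real verification requires are deliberately left out.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hypothetical certificate: only the fields needed to show the linking.
#[derive(Debug, Clone)]
struct Certificate {
    hash: String,
    previous_hash: Option<String>, // `None` for the genesis certificate
    payload: String,               // stand-in for the embedded information
}

/// Stand-in hash (NOT cryptographic): covers the link and the payload.
fn compute_hash(previous_hash: &Option<String>, payload: &str) -> String {
    let mut hasher = DefaultHasher::new();
    previous_hash.hash(&mut hasher);
    payload.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

impl Certificate {
    /// Build a certificate linked to `previous` (or a genesis one if `None`).
    fn new(previous: Option<&Certificate>, payload: &str) -> Self {
        let previous_hash = previous.map(|c| c.hash.clone());
        let hash = compute_hash(&previous_hash, payload);
        Certificate { hash, previous_hash, payload: payload.to_string() }
    }
}

/// Walk from `start_hash` back to the genesis certificate, checking that
/// every stored hash matches its content and that every link resolves.
fn verify_chain(store: &HashMap<String, Certificate>, start_hash: &str) -> bool {
    let mut current = match store.get(start_hash) {
        Some(certificate) => certificate,
        None => return false,
    };
    // A valid chain has at most `store.len()` certificates; the bound
    // guards against cycles.
    for _ in 0..store.len() {
        if compute_hash(&current.previous_hash, &current.payload) != current.hash {
            return false;
        }
        match &current.previous_hash {
            None => return true, // genesis reached
            Some(previous) => match store.get(previous) {
                Some(certificate) => current = certificate,
                None => return false,
            },
        }
    }
    false
}
```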
- We have paired and merged the last step of retrieving the real Stake Distribution from the Cardano node: Use SD from cardano-cli in Aggregator/Signer #314 🥳