Releases: digital-asset/canton
canton v2.9.6
Release of Canton 2.9.6
Canton 2.9.6 has been released on January 24, 2025. You can download the Daml Open Source edition from the Daml Connect Github Release Section. The Enterprise edition is available on Artifactory.
Please also consult the full documentation of this release.
Summary
This is a maintenance release that fixes one high and four medium severity issues. Please update during the next maintenance window.
What’s New
Memory check during node startup
A memory check has been introduced when starting the node. This check compares the memory allocated to the container
with the -Xmx JVM option.
The goal is to ensure that the container has sufficient memory to run the application.
To configure the memory check behavior, add one of the following to your configuration:
canton.parameters.startup-memory-check-config = warn // Default behavior: Logs a warning.
canton.parameters.startup-memory-check-config = crash // Terminates the node if the check fails.
canton.parameters.startup-memory-check-config = ignore // Skips the memory check entirely.
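For illustration only, the sketch below (not Canton's actual implementation) shows the kind of comparison such a startup check performs, assuming a cgroup v2 container environment: it reads the container memory limit and compares it with the JVM's effective -Xmx.
import java.nio.file.{Files, Paths}
import scala.util.Try

object MemoryCheckSketch {
  // cgroup v2 exposes the container memory limit here; the file contains "max" when unlimited,
  // in which case toLong fails and we treat the limit as unknown.
  private def containerMemoryLimitBytes: Option[Long] =
    Try(Files.readString(Paths.get("/sys/fs/cgroup/memory.max")).trim.toLong).toOption

  def main(args: Array[String]): Unit = {
    val maxHeapBytes = Runtime.getRuntime.maxMemory() // effective -Xmx of this JVM
    containerMemoryLimitBytes match {
      case Some(limit) if maxHeapBytes > limit =>
        println(s"WARN: -Xmx ($maxHeapBytes bytes) exceeds the container memory limit ($limit bytes)")
      case _ =>
        println("Memory check passed (or no container limit detected)")
    }
  }
}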
Minor Improvements
- Two new metrics have been added that count the number of created and archived contracts observed by a participant.
Contracts created as part of the standard Canton ping workflow are excluded from the tally.
participant_daml_parallel_indexer_creates
participant_daml_parallel_indexer_archivals
- A participant will now crash in exceptional cases during transaction validation instead of remaining in a failed state.
- Disabled the onboarding timeout for participants to support onboarding to domains with very large topology states without annoying warnings and timeouts.
- Removed warnings about failing periodic acknowledgements during initial domain onboarding of participants.
- Removed warnings about unhealthy sequencers during startup.
Bugfixes
(24-028, Medium): ACS export and party replication is broken after hard domain migration
Issue Description
The macros for the various steps for migrating a party look up domain parameters in the topology store, but don't filter
out irrelevant domains. This results in the macro throwing an error because it finds multiple domain parameters after a
hard domain migration, even though one of them comes from an inactive domain.
Affected Deployments
Participant
Affected Versions
2.8.0-2.8.11, 2.9.0-2.9.5
Impact
You cannot migrate a party to or from a participant that went through a hard domain migration.
Symptom
Calling repair.party_migration.step1_store_acs fails with the error "Found more than one (2) domain parameters set for
the given domain and time!".
Workaround
The workaround is not self-service: do not call the party migration macros, but instead manually replicate the steps that these macros perform.
Likeliness
The issue consistently occurs when calling the party migration macros after a hard domain migration.
Recommendation
Upgrade the involved participant nodes to the next patch release: 2.8.12 or 2.9.6.
(24-029, Medium): Domain topology manager gets stuck on too large batches
Issue Description
An off-by-one check fails in the topology dispatcher of the domain manager: batches are limited to N+1 elements instead of N, while the check expects at most N.
Affected Deployments
Domain and Domain topology manager nodes.
Affected Versions
All versions before 2.8, 2.8.0-2.8.11, 2.9.0-2.9.5
Impact
Topology transactions stop being propagated through the system.
Symptom
Participants cannot onboard to domains, parties do not appear on the domain, uploaded dars cannot be used.
Workaround
Restart domain topology manager.
Likeliness
Can happen under high topology management load which is rather unusual (adding thousands of parties at full speed).
Recommendation
Update during the next maintenance window.
(25-001, Medium): Newly onboarded participants may compute a wrong topology state during bootstrapping
Issue Description
When a participant is onboarded to a domain, the domain manager will send the topology state to the participant. The
topology state is split into batches of 100. If the state contains an add and a subsequent remove of a topology transaction,
and these two topology transactions are in the same batch (so less than 100 transactions apart), but the namespace certificate
or identifier delegation is in a previous batch, then the participant will miss the removal of the topology transaction.
In the common case the namespace delegation is immediately followed by a subsequent add, but the problematic ordering can still happen.
Affected Deployments
Participant
Affected Versions
All versions before 2.8, 2.8.0-2.8.11, 2.9.0-2.9.5
Impact
Depends on the type of topology transaction, but the result is a fork in the topology state, which in a rare but theoretically
possible case (observer nodes and participant using previously removed parties) might create a ledger fork, leading to
participants disconnecting from the domain.
Symptom
If the missed transaction was a mediator domain state, then the participant will fail to submit transactions whenever it
randomly selects the non-existent mediator.
Workaround
No workaround available. Manually repairing the topology state is likely possible, but not recommended.
Likeliness
Happens deterministically if the conditions are met, but the conditions are rare and require a specific sequence of
events with removal of topology state.
Recommendation
Upgrade before removing topology state (disabling parties, rolling keys) or onboarding a new participant to a domain
with a larger number of topology transactions that includes removals.
(25-002, Medium): Intermediate certificate renewal will delete topology state
Issue Description
A Canton node uses topology keys to sign topology transactions. The ultimate trust is tied to the root node key,
which by default is held by the node, but can be moved offline. In such a case, the node may use an intermediate
certificate to manage the topology state. In order to renew such intermediate certificates, the topology state needs
to be re-issued in 2.x, which can be done using the convenience function node.topology.all.renew(oldKey, newKey).
The convenience function contains an error: instead of renewing the topology state, it deletes topology transactions of the types party to participant, mediator domain state, and participant domain state (the ones that contain the replaceExisting flag).
Affected Deployments
Domain, Domain manager, Participant nodes.
Affected Versions
All versions before 2.8, 2.8.0-2.8.11, 2.9.0-2.9.5
Impact
Some of the topology state will be removed after running this operation.
Symptom
Parties, participants and mediators will be missing after running the operation.
Workaround
Manually re-add the missing parties, participants and mediators.
Likeliness
Deterministic if the convenience function is used.
Recommendation
Upgrade before renewing intermediate certificates.
(25-003, High): Identifier delegation cannot be renewed
Issue Description
A Canton node uses topology keys to sign topology transactions. The ultimate trust is tied to the root node key,
which by default is held by the node, but can be moved offline. In such a case, the node may use an intermediate
certificate to manage the topology state. If such an intermediate certificate is used to sign an identifier delegation
(used as an intermediate certificate for a specific uid), then the identifier delegation cannot be renewed,
as the renewal operation will remove the old and the new certificate from the in-memory state. Unfortunately,
after a restart, the certificate could be loaded again which can cause a ledger fork.
Affected Deployments
Domain, Domain manager, Participant nodes.
Affected Versions
All versions before 2.8, 2.8.0-2.8.11, 2.9.0-2.9.5
Impact
The topology state signed with a particular key authorized by an identifier delegation will be removed from the state,
and the key cannot be used to sign new transactions. After a restart of a node, the key would be loaded again, leading
to a possible ledger fork.
Symptom
Topology state missing after an intermediate certificate renewal, with a possible subsequent ledger fork after a restart.
Workaround
Theoretically issue a new identifier delegation for a new key and re-create the topology state. In practice, upgrade
all nodes before renewing intermediate certificates.
Likeliness
Deterministic if several intermediate certificates are used and one of them is rolled in the chain.
Recommendation
Update all nodes to a version with a fix before renewing intermediate certificates.
Compatibility
The following Canton protocol versions are supported:
Dependency | Version |
---|---|
Canton protocol versions | 5 |
Canton has been tested against the following versions of its dependencies:
Dependency | Version |
---|---|
Java Runtime | OpenJDK 64-Bit Server VM Zulu11.72+19-CA (build 11.0.23+9-LTS, mixed mode) |
Postgres | Recommended: PostgreSQL 12.22 (Debian 12.22-1.pgdg120+1) – Also tested: PostgreSQL 11.16 (Debian 11.16-1.pgdg90+1), PostgreSQL 13.18 (Debian 13.18-1.pgdg120+1), PostgreSQL 14.15 (Debian 14.15-1.pgdg120+1), PostgreSQL 15.10 (Debian 15.10-1.pgdg120+1) |
Oracle | 19.20.0 |
canton v2.8.12
Release of Canton 2.8.12
Canton 2.8.12 has been released on January 24, 2025. You can download the Daml Open Source edition from the Daml Connect Github Release Section. The Enterprise edition is available on Artifactory.
Please also consult the full documentation of this release.
Summary
This is a maintenance release that fixes one high and five medium severity issues. Please update during the next maintenance window.
What’s New
Memory check during node startup
A memory check has been introduced when starting the node. This check compares the memory allocated to the container
with the -Xmx JVM option.
The goal is to ensure that the container has sufficient memory to run the application.
To configure the memory check behavior, add one of the following to your configuration:
canton.parameters.startup-memory-check-config = warn // Default behavior: Logs a warning.
canton.parameters.startup-memory-check-config = crash // Terminates the node if the check fails.
canton.parameters.startup-memory-check-config = ignore // Skips the memory check entirely.
Minor Improvements
- Fixed one issue preventing a participant from connecting to an old domain even if they support a common protocol version.
- Fixed a minor issue where the validUntil time of topology transaction results was incorrectly set to validFrom on the console client side.
- Disabled the onboarding timeout for participants to support onboarding to domains with very large topology states without annoying warnings and timeouts.
- Removed warnings about failing periodic acknowledgements during initial domain onboarding of participants.
- Removed warnings about unhealthy sequencers during startup.
Bugfixes
(24-022, Medium): Participant replica does not clear package service cache
Issue Description
When a participant replica becomes active, it does not refresh the package dependency cache. If a vetting attempt is
made on the participant that fails because the package is not uploaded, the "missing package" response is cached.
If the package is then uploaded to another replica, and we switch to the original participant, this package service
cache will still record the package as nonexistent. When the package is used in a transaction, we will get a local
model conformance error as the transaction validator cannot find the package, whereas other parts of the participant
that don't use the package service can successfully locate it.
Affected Deployments
Participant
Affected Versions
2.8.0-2.8.11, 2.9.0-2.9.4
Impact
Replica crashes during transaction validation.
Symptom
Validating participant emits warning:
LOCAL_VERDICT_FAILED_MODEL_CONFORMANCE_CHECK(5,a2b60642): Rejected transaction due to a failed model conformance check: UnvettedPackages
And then emits an error:
An internal error has occurred.
java.lang.IllegalStateException: Mediator approved a request that we have locally rejected
Workaround
Restart recently active replica.
Likeliness
Likely to happen in any replicated participant setup with frequent vetting attempts and switches between active and
passive replicated participants between those vetting attempts.
Recommendation
Users are advised to upgrade to the next patch release during their maintenance window.
(24-028, Medium): ACS export and party replication is broken after hard domain migration
Issue Description
The macros for the various steps for migrating a party look up domain parameters in the topology store, but don't filter
out irrelevant domains. This results in the macro throwing an error because it finds multiple domain parameters after a
hard domain migration, even though one of them comes from an inactive domain.
Affected Deployments
Participant
Affected Versions
2.8.0-2.8.11, 2.9.0-2.9.5
Impact
You cannot migrate a party to or from a participant that went through a hard domain migration.
Symptom
Calling repair.party_migration.step1_store_acs fails with the error "Found more than one (2) domain parameters set for
the given domain and time!".
Workaround
The workaround is not self-service: do not call the party migration macros, but instead manually replicate the steps that these macros perform.
Likeliness
The issue consistently occurs when calling the party migration macros after a hard domain migration.
Recommendation
Upgrade the involved participant nodes to the next patch release: 2.8.12 or 2.9.6.
(24-029, Medium): Domain topology manager gets stuck on too large batches
Issue Description
An off-by-one check fails in the topology dispatcher of the domain manager: batches are limited to N+1 elements instead of N, while the check expects at most N.
Affected Deployments
Domain and Domain topology manager nodes.
Affected Versions
All versions before 2.8, 2.8.0-2.8.11, 2.9.0-2.9.5
Impact
Topology transactions stop being propagated through the system.
Symptom
Participants cannot onboard to domains, parties do not appear on the domain, uploaded dars cannot be used.
Workaround
Restart domain topology manager.
Likeliness
Can happen under high topology management load which is rather unusual (adding thousands of parties at full speed).
Recommendation
Update during the next maintenance window.
(25-001, Medium): Newly onboarded participants may compute a wrong topology state during bootstrapping
Issue Description
When a participant is onboarded to a domain, the domain manager will send the topology state to the participant. The
topology state is split into batches of 100. If the state contains an add and a subsequent remove of a topology transaction,
and these two topology transactions are in the same batch (so less than 100 transactions apart), but the namespace certificate
or identifier delegation is in a previous batch, then the participant will miss the removal of the topology transaction.
In the common case the namespace delegation is immediately followed by a subsequent add, but the problematic ordering can still happen.
Affected Deployments
Participant
Affected Versions
All versions before 2.8, 2.8.0-2.8.11, 2.9.0-2.9.5
Impact
Depends on the type of topology transaction, but the result is a fork in the topology state, which in a rare but theoretically
possible case (observer nodes and participant using previously removed parties) might create a ledger fork, leading to
participants disconnecting from the domain.
Symptom
If the missed transaction was a mediator domain state, then the participant will fail to submit transactions whenever it
randomly selects the non-existent mediator.
Workaround
No workaround available. Manually repairing the topology state is likely possible, but not recommended.
Likeliness
Happens deterministically if the conditions are met, but the conditions are rare and require a specific sequence of
events with removal of topology state.
Recommendation
Upgrade before removing topology state (disabling parties, rolling keys) or onboarding a new participant to a domain
with a larger number of topology transactions that includes removals.
(25-002, Medium): Intermediate certificate renewal will delete topology state
Issue Description
A Canton node uses topology keys to sign topology transactions. The ultimate trust is tied to the root node key,
which by default is held by the node, but can be moved offline. In such a case, the node may use an intermediate
certificate to manage the topology state. In order to renew such intermediate certificates, the topology state needs
to be re-issued in 2.x, which can be done using the convenience function node.topology.all.renew(oldKey, newKey).
The convenience function contains an error: instead of renewing the topology state, it deletes topology transactions of the types party to participant, mediator domain state, and participant domain state (the ones that contain the replaceExisting flag).
Affected Deployments
Domain, Domain manager, Participant nodes.
Affected Versions
All versions before 2.8, 2.8.0-2.8.11, 2.9.0-2.9.5
Impact
Some of the topology state will be removed after running this operation.
Symptom
Parties, participants and mediators will be missing after running the operation.
Workaround
Manually re-add the missing parties, participants and mediators.
Likeliness
Deterministic if the convenience function is used.
Recommendation
Upgrade before renewing intermediate certificates.
(25-003, High): Identifier delegation cannot be renewed
Issue Description
A Canton node uses topology keys to sign topology transactions. The ultimate trust is tied to the root node key,
which by default is held by the node, but can be moved offline. In such a case, the node may use an intermediate
certificate to manage the topology state. If such an intermediate certificate is used to sign an identifier delegation
(used as an intermediate certificate for a specific uid), then the identifier delegation cannot be renewed,
as the renewal operation will remove the old and the new certificate from the in-memory state. Unfortunately,
after a restart, the certificate could be loaded again which can cause a ledger fork.
Affected Deployments
Domain, Domain manager, Participant nodes.
Affected Versions
All versions before 2.8, 2.8.0-2.8.11, 2.9.0-2.9.5
Impact
The topology state signed with a particular key authorized by an identifier delegation will be removed from the state,
and the key cannot be used to sign new transactions. After a restart of a node, the key would be loaded again, leading
to a possible ledger fork.
Symptom
Topology state missing after an intermediate certificate renewal, with a possible subsequent ledger f...
canton v2.10.0-rc2
Release candidates such as 2.10.0-rc2 don't come with release notes
canton v2.10.0-rc1
Release candidates such as 2.10.0-rc1 don't come with release notes
canton v2.8.11
Release of Canton 2.8.11
Canton 2.8.11 has been released on November 26, 2024. You can download the Daml Open Source edition from the Daml Connect Github Release Section. The Enterprise edition is available on Artifactory.
Please also consult the full documentation of this release.
Summary
This is a maintenance release that provides performance improvements and fixes minor bugs.
What’s New
Minor Improvements
- Two new metrics have been added that count the number of created and archived contracts observed by a participant.
Contracts created as part of the standard Canton ping workflow are excluded from the tally.
participant_daml_parallel_indexer_creates
participant_daml_parallel_indexer_archivals
- Two more metrics have been added to the db storage metrics: exectime and load, to capture the execution time and load of the database storage pool.
- We added batch insertion to the single dimension event log to reduce the database load and improve performance.
- We reduced latency on the sequencer for processing and sequencing events from other nodes.
Node's Exit on Fatal Failures
Since v2.8.4, when a node encounters a fatal failure that Canton cannot yet handle gracefully, the node exits/stops the process and relies on an external process or service monitor to restart the node's process.
Now a node also exits on failed transition from a passive replica to an active replica, which may result in an invalid state of the node.
The crashing on fatal failures can be reverted by setting: canton.parameters.exit-on-fatal-failures = false
in the configuration.
Bugfixes
(24-027, Low): Bootstrap of the domain fails if the mediator or sequencer share the same key as the domain manager
Issue Description
Domain bootstrapping fails with a KeyAlreadyExists
error when the signing key is shared between the mediator/sequencer
and the domain manager.
Impact
You cannot bootstrap a domain when the signing key is shared between the domain manager and mediator or sequencer nodes.
Symptom
After calling bootstrap_domain
we get a KeyAlreadyExists
error.
Workaround
Use different signing keys for the mediator, sequencer and the domain manager.
Likeliness
This issue consistently occurs whenever we attempt to bootstrap a domain where the domain manager's signing key is shared with the mediator or the sequencer.
Recommendation
Upgrade to 2.8.11 when affected by this limitation.
(24-025, Low): Commands for single key rotation for sequencer and mediator node fail
Issue Description
The current commands for single key rotation with sequencer and mediator nodes (rotate_node_key
and rotate_kms_node_key
) fail because they do not have the necessary domain manager reference needed to find
the old key and export the new key.
Affected Deployments
Sequencer and mediator nodes
Affected Versions
All 2.3-2.7, 2.8.0-2.8.10, 2.9.0-2.9.4
Impact
Key rotation for individual keys with sequencer or mediator nodes cannot be performed using the provided commands.
Symptom
Current single key rotation for sequencer and mediator, with commands rotate_node_key and rotate_kms_node_key, fails with an IllegalStateException: key xyz does not exist.
Workaround
Use the domain manager to rotate a mediator or sequencer key, or use the rotate_node_keys
command
with a domain manager reference to rotate all keys.
Likeliness
This issue consistently occurs when trying to rotate keys individually with sequencer or mediator nodes in
a distributed environment.
Recommendation
Upgrade to 2.8.11 when affected, and run the rotate_node_key
and rotate_kms_node_key
commands with a reference to the
domain topology manager to successfully perform the rotation.
(24-021, Medium): Participant replica fails to become active
Issue Description
A participant replica fails to become active under certain database network conditions. The previously active replica fails to fully transition to passive because its database connection health checks are blocked, which in turn causes the other replica's transition to active to fail. Eventually the database health checks become unblocked and the first replica transitions to passive, but the other replica does not recover from its earlier failed transition to active, leaving both replicas passive.
Affected Deployments
Participant
Affected Versions
All 2.3-2.7
2.8.0-2.8.10
2.9.0-2.9.4
Impact
Both participant replicas remain passive and do not serve transactions.
Symptom
The transition to active failed on a participant due to maximum retries exhausted:
2024-09-02T07:08:56,178Z participant2 [c.d.c.r.DbStorageMulti:participant=participant1] [canton-env-ec-36] ERROR dd:[ ] c.d.c.r.DbStorageMulti:participant=participant1 tid:effa59a8f7ddec2e132079f2a4bd9885 - Failed to transition replica state
com.daml.timer.RetryStrategy$TooManyAttemptsException: Gave up trying after Some(3000) attempts and 300.701142545 seconds.
Workaround
Restart both replicas of the participant
Likeliness
Possible under specific database connection issues
Recommendation
Upgrade to the next patch release during regular maintenance window.
Compatibility
The following Canton protocol versions are supported:
Dependency | Version |
---|---|
Canton protocol versions | 3, 4, 5 |
Canton has been tested against the following versions of its dependencies:
Dependency | Version |
---|---|
Java Runtime | OpenJDK 64-Bit Server VM Zulu11.70+15-CA (build 11.0.22+7-LTS, mixed mode) |
Postgres | Recommended: PostgreSQL 12.22 (Debian 12.22-1.pgdg120+1) – Also tested: PostgreSQL 11.16 (Debian 11.16-1.pgdg90+1), PostgreSQL 13.18 (Debian 13.18-1.pgdg120+1), PostgreSQL 14.15 (Debian 14.15-1.pgdg120+1), PostgreSQL 15.10 (Debian 15.10-1.pgdg120+1) |
Oracle | 19.20.0 |
canton v2.9.5
Release of Canton 2.9.5
Canton 2.9.5 has been released on October 22, 2024. You can download the Daml Open Source edition from the Daml Connect Github Release Section. The Enterprise edition is available on Artifactory.
Please also consult the full documentation of this release.
Summary
This is a maintenance release of Canton that fixes bugs, including two critical bugs that can corrupt the state of a participant node when retroactive interfaces or migrated contracts from protocol version 3 are used.
Bugfixes
(24-020, Critical): Participant crashes due to retroactive interface validation
Description
The view reinterpretation of an exercise of a retroactive interface may fail because the engine does not explicitly request the interface package. This can lead to a ledger fork as participants come to different conclusions.
Affected Deployments
Participant
Affected Versions
2.5, 2.6, 2.7, 2.8.0-2.8.9, 2.9.0-2.9.4
Impact
A participant crashes during transaction validation when using retroactive interfaces.
Symptom
Validating participant emits warning:
LOCAL_VERDICT_FAILED_MODEL_CONFORMANCE_CHECK(5,571d2e8a): Rejected transaction due to a failed model conformance check: DAMLeError(
Preprocessing(
Lookup(
NotFound(
Package(
And then emits an error:
An internal error has occurred.
java.lang.IllegalStateException: Mediator approved a request that we have locally rejected
Workaround
None
Likeliness
Very likely for all multi participant setups that use retroactive interface instances.
Recommendation
Upgrade to 2.9.5
(24-024, Critical): Participant incorrectly handles unauthenticated contract IDs in PV5
Issue Description
Contracts created on participants running PV3 have an unauthenticated contract ID. When these participants are upgraded to PV5 without setting the allow-for-unauthenticated-contract-ids
flag to true, any submitted transaction that uses such unauthenticated contract IDs will produce warnings during validation, but also put the participants in an incorrect state. From then on, the participant will not output any ledger events any more and fail to reconnect to the domain.
Affected Deployments
Participant
Affected Versions
2.9.0-2.9.4
Impact
The participant is left in a failed state.
Symptom
Connecting to the domain fails with an internal error IllegalStateException: Cannot find event for sequenced in-flight submission.
The participant does not emit any ledger events any more.
Workaround
No workaround by clients possible. Support and engineering can try to fix the participants by modifying the participant's database tables.
Likeliness
Needs a submission request using a contract with unauthenticated contract ID. This can only happen for participants who have been migrated from using PV3 to PV5, and have not set the flag to allow unauthenticated contracts on all involved participants.
Recommendation
Upgrade during the next maintenance window to a version with the fix.
If an upgrade is not possible and old contracts from PV3 are used, enable the allow-for-unauthenticated-contract-ids
flag on all the participants.
(24-026, High): Hard Synchronization Domain Migration fails to check for in-flight transactions
Issue Description
Since 2.9.0, the Hard Synchronization Domain Migration command repair.migrate_domain
aborts when it detects in-flight submissions on the participant. However, it should also check for in-flight transactions.
Affected Deployments
Participant
Affected Versions
2.9.0-2.9.4
Impact
Performing a Hard Synchronization Domain Migration while there are still in-flight submissions and transactions may result in a ledger-fork.
Symptom
Ledger-fork after running the Hard Synchronization Domain Migration command repair.migrate_domain
that may result in ACS commitment mismatches.
Workaround
Follow the documented steps, in particular ensure that there is no activity on all participants before proceeding with a Hard Synchronization Domain Migration.
Likeliness
The bug only manifests when the operator skips the documented step of ensuring that there is no longer any activity on any participant before the Hard Synchronization Domain Migration, and there are still in-flight transactions when the migration executes.
Recommendation
Upgrade to 2.9.5 to properly safeguard against running the Hard Synchronization Domain Migration command repair.migrate_domain while there are still in-flight submissions or transactions.
(24-021, Medium): Participant replica fails to become active
Issue Description
A participant replica fails to become active under certain database network conditions. The previously active replica fails to fully transition to passive because its database connection health checks are blocked, which in turn causes the other replica's transition to active to fail. Eventually the database health checks become unblocked and the first replica transitions to passive, but the other replica does not recover from its earlier failed transition to active, leaving both replicas passive.
Affected Deployments
Participant
Affected Versions
All 2.3-2.7
2.8.0-2.8.10
2.9.0-2.9.4
Impact
Both participant replicas remain passive and do not serve transactions.
Symptom
The transition to active failed on a participant due to maximum retries exhausted:
2024-09-02T07:08:56,178Z participant2 [c.d.c.r.DbStorageMulti:participant=participant1] [canton-env-ec-36] ERROR dd:[ ] c.d.c.r.DbStorageMulti:participant=participant1 tid:effa59a8f7ddec2e132079f2a4bd9885 - Failed to transition replica state
com.daml.timer.RetryStrategy$TooManyAttemptsException: Gave up trying after Some(3000) attempts and 300.701142545 seconds.
Workaround
Restart both replicas of the participant
Likeliness
Possible under specific database connection issues
Recommendation
Upgrade to the next patch release during regular maintenance window.
(24-022, Medium): Participant replica does not clear package service cache
Issue Description
When a participant replica becomes active, it does not refresh its package service cache. If a vetting attempt is made on the participant that fails because the package is not uploaded, the "missing package" response is cached. If the package is then uploaded to another replica, and we switch to the original participant, this package service cache will still record the package as nonexistent. When the package is used in a transaction, we will get a local model conformance error as the transaction validator cannot find the package, whereas other parts of the participant that don't use the package service can successfully locate it.
Affected Deployments
Participant
Affected Versions
2.8.0-2.8.10, 2.9.0-2.9.4
Impact
Replica crashes during transaction validation.
Symptom
Validating participant emits warning:
LOCAL_VERDICT_FAILED_MODEL_CONFORMANCE_CHECK(5,a2b60642): Rejected transaction due to a failed model conformance check: UnvettedPackages
And then emits an error:
An internal error has occurred.
java.lang.IllegalStateException: Mediator approved a request that we have locally rejected
Workaround
Restart recently active replica
Likeliness
Likely to happen in any replicated participant setup with frequent vetting attempts and switches between active and passive replicated participants between those vetting attempts.
Recommendation
Users are advised to upgrade to the next patch release (2.9.5) during their maintenance window.
(24-023, Low): Participant fails to start if quickly acquiring and then losing DB connection during bootstrap
Issue Description
When a participant starts up and acquires the active lock, the participant replica initializes its storage and begins its bootstrap logic. If the replica loses the DB connection during the bootstrap logic, before it attempts to initialize its identity, bootstrapping is halted until its identity is initialized by another replica or the lock is re-acquired. When the lock is lost, the replica manager attempts to transition the participant to the passive state; this transition assumes the participant has been fully initialized, which in this case it has not, so the passive transition waits indefinitely.
Affected Deployments
Participant
Affected Versions
2.8.0-2.8.10, 2.9.0-2.9.4
Impact
Replica gets stuck transitioning to passive state during bootstrap.
Symptom
Participant keeps emitting info logs as follows indefinitely
Replica state update to Passive has not completed after
Workaround
Restart the node
Likeliness
Exceptional, requires acquiring then losing the DB connection with a precise timing during bootstrap of the node.
Recommendation
Users are advised to upgrade to the next patch release (2.9.5) during their maintenance window.
(24-025, Low): Commands for single key rotation for sequencer and mediator node fail
Description
The current commands for single key rotation with sequencer and mediator nodes (rotate_node_key
and rotate_kms_node_key
) fail because they do not have the necessary domain manager reference needed to find
the old key and export the new key.
Affected Deployments
Sequencer and mediator nodes
Affected Versions
All 2.3-2.7, 2.8.0-2.8.10, 2.9.0-2.9.4
Impact
Key rotation for individual keys with sequencer or mediator nodes c...
canton v2.8.10
Release of Canton 2.8.10
Canton 2.8.10 has been released on September 16, 2024. You can download the Daml Open Source edition from the Daml Connect Github Release Section. The Enterprise edition is available on Artifactory.
Please also consult the full documentation of this release.
Summary
This is a maintenance release that fixes a critical bug for retroactive interfaces.
Bugfixes
(24-020, Critical): Participant crashes due to retroactive interface validation
Description
The view reinterpretation of an exercise of a retroactive interface may fail because the engine does not explicitly request the interface package. This can lead to a ledger fork as participants come to different conclusions.
Affected Deployments
Participant
Affected Versions
2.5, 2.6, 2.7, 2.8.0-2.8.9
Impact
A participant crashes during transaction validation when using retroactive interfaces.
Symptom
Validating participant emits warning:
LOCAL_VERDICT_FAILED_MODEL_CONFORMANCE_CHECK(5,571d2e8a): Rejected transaction due to a failed model conformance check: DAMLeError(
Preprocessing(
Lookup(
NotFound(
Package(
And then emits an error:
An internal error has occurred.
java.lang.IllegalStateException: Mediator approved a request that we have locally rejected
Workaround
None
Likeliness
Very likely for all multi participant setups that use retroactive interface instances.
Recommendation
Upgrade to 2.8.10
Compatibility
The following Canton protocol versions are supported:
Dependency | Version |
---|---|
Canton protocol versions | 3, 4, 5 |
Canton has been tested against the following versions of its dependencies:
Dependency | Version |
---|---|
Java Runtime | OpenJDK 64-Bit Server VM Zulu11.70+15-CA (build 11.0.22+7-LTS, mixed mode) |
Postgres | Recommended: PostgreSQL 12.20 (Debian 12.20-1.pgdg120+1) – Also tested: PostgreSQL 11.16 (Debian 11.16-1.pgdg90+1), PostgreSQL 13.16 (Debian 13.16-1.pgdg120+1), PostgreSQL 14.13 (Debian 14.13-1.pgdg120+1), PostgreSQL 15.8 (Debian 15.8-1.pgdg120+1) |
Oracle | 19.20.0 |
canton v2.9.4
Release of Canton 2.9.4
Canton 2.9.4 has been released on August 23, 2024. You can download the Daml Open Source edition from the Daml Connect Github Release Section. The Enterprise edition is available on Artifactory.
Please also consult the full documentation of this release.
Summary
- Protocol version 6 has had its status changed from "Beta" to "Unstable" due to a number of rare, but grave bugs in the new beta smart contract upgrading feature
- Minor improvements around logging and DAR upload validation
What’s New
Protocol Version 6 Marked as Unstable
Background
In Daml 2.9 we released a smart contract upgrading feature in Beta. Underlying the feature are a new protocol version (6), and a new Daml-LF version (1.16) that were also released in Beta status.
Beta status is intended to designate features that do not yet have full backwards compatibility guarantees, or may still have some limitations, but are ready to be supported for select customers under an "initial availability" arrangement.
A number of rare, but grave bugs in the new beta smart contract upgrading feature have been discovered during internal testing and will require breaking changes at the protocol level to fix. As a consequence data continuity will be broken in the sense that smart contracts created on protocol version 6 in 2.9.1-2.9.4 will not be readable in future versions.
The 2.9 release as a whole is robust and functional. Only Beta features are affected.
Specific Changes
To prevent any accidental corruption of prod, or even pre-prod, systems, protocol version 6 has had its status changed from "Beta" to "Unstable" to clearly designate that it does not have appropriate guarantees.
Impact and Migration
Customers who are not using beta features or protocol version 6 can continue to use the 2.9 release. Customers using beta features are advised to move their testing of these features to the 2.10 release line.
To continue to use the beta features in 2.9.4 it will be necessary to enable support for unstable features.
See the user manual section on how to enable unsupported features to find out how this is done.
Minor Improvements
- Fixed one issue preventing a participant from connecting to an old domain even if they support a common protocol version.
- Startup errors due to TLS issues / misconfigurations are now correctly logged via the regular canton logging tooling instead of appearing only on stdout.
- Added extra validation to prevent malformed DARs from being uploaded
Compatibility
The following Canton protocol and Ethereum sequencer contract versions are supported:
Dependency | Version |
---|---|
Canton protocol versions | 5 |
Canton has been tested against the following versions of its dependencies:
Dependency | Version |
---|---|
Java Runtime | OpenJDK 64-Bit Server VM Zulu11.72+19-CA (build 11.0.23+9-LTS, mixed mode) |
Postgres | Recommended: PostgreSQL 12.20 (Debian 12.20-1.pgdg120+1) – Also tested: PostgreSQL 11.16 (Debian 11.16-1.pgdg90+1), PostgreSQL 13.16 (Debian 13.16-1.pgdg120+1), PostgreSQL 14.13 (Debian 14.13-1.pgdg120+1), PostgreSQL 15.8 (Debian 15.8-1.pgdg120+1) |
Oracle | 19.20.0 |
canton v2.9.3
Release of Canton 2.9.3
Canton 2.9.3 has been released on July 22, 2024. You can download the Daml Open Source edition from the Daml Connect Github Release Section. The Enterprise edition is available on Artifactory.
Please also consult the full documentation of this release.
Summary
This is a maintenance release of Canton that fixes one high risk bug, which can
crash a participant node due to out of memory, and two low risk bugs.
Bugfixes
(24-017, High): Participants crash with an OutOfMemoryError
Description
The TaskScheduler keeps a huge number of tasks in a queue. The queue has been newly introduced, so the memory consumption (heap) is much higher than in previous versions. The queue size is proportional to the number of requests processed during the decision time.
Affected Deployments
Participant
Impact
Memory consumption is much higher than in previous Canton versions.
Symptom
The participant crashes with an OutOfMemoryError.
Workaround
Test the participant under load, increase the heap size accordingly. If possible, decrease confirmation response timeout and mediator reaction timeout.
Likeliness
High likelihood under high load and with large confirmation response and mediator reaction timeouts.
Recommendation
Upgrade to 2.9.3.
(24-018, Low): Participants log "ERROR: The check for missing ticks has failed unexpectedly"
Description
The TaskScheduler monitoring crashes and logs an Error.
Affected Deployments
Participant
Impact
The monitoring of the task scheduler crashes.
Symptom
You see an error in the logs: ERROR: The check for missing ticks has failed unexpectedly.
Workaround
If you need the monitoring to trouble-shoot missing ticks, restart the participant to restart the monitoring.
Likeliness
This will eventually occur on every system.
Recommendation
Ignore the message until upgrading to 2.9.3.
(24-015, Low): Pointwise flat transaction Ledger API queries can unexpectedly return TRANSACTION_NOT_FOUND
Description
When a party submits a command that has no events for contracts whose stakeholders are amongst the submitters, the resulting transaction cannot be queried by pointwise flat transaction Ledger API queries. This impacts the GetTransactionById, GetTransactionByEventId and SubmitAndWaitForTransaction gRPC endpoints.
Affected Deployments
Participant
Impact
User might perceive that a command was not successful even if it was.
Symptom
TRANSACTION_NOT_FOUND
is returned on a query that is expected to succeed.
Workaround
Instead, query the transaction tree by transaction ID to get the transaction details.
Likeliness
Lower likelihood as commands usually have events whose contracts' stakeholders are amongst the submitting parties.
Recommendation
Users are advised to upgrade to the next patch release during their maintenance window.
Compatibility
The following Canton protocol and Ethereum sequencer contract versions are supported:
Dependency | Version |
---|---|
Canton protocol versions | 5, 6* |
Canton has been tested against the following versions of its dependencies:
Dependency | Version |
---|---|
Java Runtime | OpenJDK 64-Bit Server VM Zulu11.72+19-CA (build 11.0.23+9-LTS, mixed mode) |
Postgres | Recommended: PostgreSQL 12.19 (Debian 12.19-1.pgdg120+1) – Also tested: PostgreSQL 11.16 (Debian 11.16-1.pgdg90+1), PostgreSQL 13.15 (Debian 13.15-1.pgdg120+1), PostgreSQL 14.12 (Debian 14.12-1.pgdg120+1), PostgreSQL 15.7 (Debian 15.7-1.pgdg120+1) |
Oracle | 19.20.0 |
canton v2.9.1
Release of Canton 2.9.1
Canton 2.9.1 has been released on July 15, 2024. You can download the Daml Open Source edition from the Daml Connect Github Release Section. The Enterprise edition is available on Artifactory.
Please also consult the full documentation of this release.
Summary
We are excited to announce Canton 2.9.1, which offers additional features and
improvements:
- KMS drivers (Beta)
- support for smart contract upgrades (Beta)
- operational improvements around monitoring, liveness, and logging
See below for details.
What’s New
Breaking: Protocol version should be set explicitly
Until now, the domain configuration picked the latest protocol version by default.
Since the protocol version is an important parameter of the domain, having this value set behind
the scenes caused unwanted behavior.
You must now specify the protocol version for your domain:
myDomain {
init.domain-parameters.protocol-version = 5
}
For a domain manager:
domainManager {
init.domain-parameters.protocol-version = 5
}
You can read more about protocol version in the docs.
If you are unsure which protocol version to pick:
- Use the last one supported by your binary (see docs).
- Ensure all your environments use the same protocol version: you should not use one protocol version in
your test environment and another one in production.
Breaking: Protocol version 3 and 4 discontinuation
This Canton version requires at least protocol version 5.
If your domain is running protocol version 5, you can replace the binaries and apply the database migrations.
If you have a domain running protocol version 3 or 4, you first need to bootstrap a new domain running protocol version
at least 5 and then perform a hard domain migration.
Upgrading instructions can be found in the documentation:
upgrading
manual
KMS Drivers
The Canton protocol relies on a number of cryptographic operations such as
asymmetric encryption and digital signatures. To maximize the operational
security of a Canton node the corresponding private keys should not be stored or
processed in cleartext. A Key Management System (KMS) or Hardware Security
Module (HSM) allows us to perform such cryptographic operations where the
private key resides securely inside the KMS/HSM. All nodes in Canton can make
use of a KMS.
AWS KMS and Google Cloud KMS are supported as of Canton v2.7. To broaden the
support of other KMSs and HSMs, Canton v2.9 introduces a plugin approach, called
KMS Drivers, which allows the implementation of custom integrations. You can
find more information on how to develop a KMS driver in the KMS Driver Guide.
Smart Contract Updates
The feature allows Daml models (packages in DAR files) to be updated on Canton
transparently, provided some guidelines in making the changes are followed. For
example, you can fix an application bug by uploading the DAR of the fixed
package. This is a Beta feature that requires LF 1.16 & Canton Protocol version
6. Please refer to the Daml enterprise release
notes
for more information on this feature.
Mediator liveness health service and built-in watchdog
Previously, a mediator node that had irrecoverably lost its connection to a Canton domain would not exit and would continue to report SERVING on its liveness health endpoint.
This led to mediator nodes not being able to automatically recover from unexpected failures.
Now the sequencer connection status of a mediator node is connected to the liveness health endpoint, allowing for external monitoring and automated intervention (e.g., by setting up k8s liveness probes).
Additionally, for systems not using k8s, it is possible to enable a built-in node watchdog that monitors the liveness health endpoint and forcefully makes the node exit if it is no longer alive.
By default, the watchdog is disabled and can be enabled by setting the following configuration:
canton.mediators.<mediator_node>.parameters.watchdog = {
enabled = true
checkInterval = 15s // default value
killDelay = 30s // default value
}
Configuration parameters are:
- checkInterval: interval at which the watchdog checks the liveness health endpoint
- killDelay: delay after the watchdog has detected that the node is no longer alive before it forcefully exits the node
Paging in Party Management
Background
Being able to retrieve all parties known to a participant in a paging fashion has been a frequently requested feature. When the number of parties on a participant exceeds tens of thousands, trying to deliver them all in a single message presents many challenges: the corresponding db operation can take a long time, internal memory buffers within the participant or the client application can be exhausted, and the maximum size of the gRPC message can be exceeded. In extreme cases this could lead to an OOM crash.
Specific Changes
The ListKnownParties
method on the PartyManagementService
now takes two additional parameters. The new page_size
field determines the maximum number of results to be returned by the server. The new page_token
field on the other hand is a continuation token that signals to the server to fetch the next page containing the results. Each ListKnownPartiesResponse
response contains a page of parties and a next_page_token
field that can be used to populate the page_token
field for a subsequent request. When the last page is reached, the next_page_token
is empty. The parties on each page are sorted in ascending order according to their ids. The pages themselves are sorted as well.
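The recommended way to consume this API is to chain ListKnownParties calls by page token until the next_page_token comes back empty. The Scala sketch below illustrates that loop; listKnownPartiesPage and PartyPage are hypothetical stand-ins for whatever gRPC client call and response type your application uses, not part of the Ledger API bindings themselves.
// Hypothetical result type mirroring ListKnownPartiesResponse:
// a page of party ids plus the continuation token for the next page.
final case class PartyPage(parties: Seq[String], nextPageToken: String)

// Placeholder for the actual gRPC call to PartyManagementService.ListKnownParties
// with page_token and page_size set on the request.
def listKnownPartiesPage(pageToken: String, pageSize: Int): PartyPage = ???

// Collect all parties by following next_page_token until it is empty (the last page).
def listAllParties(pageSize: Int = 1000): Seq[String] = {
  val all = Seq.newBuilder[String]
  var token = "" // an empty token requests the first page
  var done = false
  while (!done) {
    val page = listKnownPartiesPage(token, pageSize)
    all ++= page.parties
    done = page.nextPageToken.isEmpty
    token = page.nextPageToken
  }
  all.result()
}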
The GetLedgerApiVersion method of the VersionService contains a new features.party_management field within the returned GetLedgerApiVersionResponse message. It describes the capabilities of party management through a sub-message called PartyManagementFeature. At the moment it contains just one field, max_parties_page_size, which specifies the maximum number of parties that will be sent per page by default.
Configuration
The default maximum size of the page returned by the participant in response to the ListKnownParties
call has been set to 10'000. It can be modified through the max-parties-page-size
entry
canton.participants.participant.ledger-api.party-management-service.max-parties-page-size=777
Impact and Migration
The change may have an impact on your workflow if your participant contains more than 10'000 parties and you rely on the results of ListKnownParties
containing all parties known to the participant. You will need to do one of two things:
- Change your workflow to utilize a series of ListKnownParties calls chained by page tokens instead of one. This is the recommended approach.
- Change your configuration to increase the maximum page size returned by the participant.
Node's Exit on Fatal Failures
When a node encounters a fatal failure that Canton cannot handle gracefully yet, the new default behavior is that the node will exit/stop the process and relies on an external process or service monitor to restart the node's process.
The following failures are considered fatal and now lead to an exit of the process:
- Unhandled exceptions when processing events from a domain, which previously lead to a disconnect from that domain.
- Failed transition from an active replica to a passive replica, which may result in an invalid state of the node.
- Failed transition from a passive replica to an active replica, which may result in an invalid state of the node.
The new behavior can be reverted by setting: canton.parameters.exit-on-fatal-failures = false
in the configuration.
Minor Improvements
Logging of Conflict Reason
When a command is rejected due to conflict (e.g. usage of an inactive contract),
every participant detecting the conflict will now log the resource causing the conflict at INFO level.
This change affects the following error codes:
LOCAL_VERDICT_LOCKED_CONTRACTS, LOCAL_VERDICT_LOCKED_KEYS, LOCAL_VERDICT_INACTIVE_CONTRACTS,
LOCAL_VERDICT_DUPLICATE_KEY, LOCAL_VERDICT_INCONSISTENT_KEY, LOCAL_VERDICT_CREATES_EXISTING_CONTRACTS
Repair service improvements
- Prevent concurrent execution of a domain (re-)connection while repair operations are in flight.
- Commands ignore_events and unignore_events are now also available on remote nodes.
Error code changes
- PACKAGE_NAMES_NOT_FOUND is introduced for reporting package-name identifiers that could not be found.
- When an access token expires and a Ledger API stream is terminated, an ABORTED(ACCESS_TOKEN_EXPIRED) error is returned.
- DAR_NOT_VALID_UPGRADE is introduced for reporting that the uploaded DAR is not upgrade-compatible with other existing DARs on the participant.
- KNOWN_DAR_VERSION is introduced for reporting that the uploaded DAR name and version is already known to the participant.
- NO_INTERNAL_PARTICIPANT_DATA_BEFORE is introduced and returned when participant.pruning.find_safe_offset is invoked with a timestamp before the earliest known internal participant data.
Remote Log Level Changes
The log levels and last errors can now be accessed remotely from the logging
command group on remote consoles.
New Block Sequencer Metrics
As an early access feature, the block sequencer now exposes various labeled ...