-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
build: backport changes to release/v1.30.0
branch
#12635
Conversation
* Investigate intermittent F3 itest failures on CI Repeat F3 itests on CI to investigate intermittent failures. * Fix participation lease removal for wrong network When manifest changes, depending on the timing it is possible for newly generated valid leases to get removed if the sign message loop attempts to sign messages that are as a result of progressing previous network. Here is an example scenario in a specific order that was causing itests to fail: * participants get a lease for network A up to instance 5 * network A progresses to instance 6 * manifest changes the network name to B * participants get a new lease for network B up to instance 5 * sign loop receives a message from network A, instance 6 * `getParticipantsByInstance` lazily removes leases since it only checks the instance. * the node ends up with no participants, and stuck. To fix this: 1) check if participants asked for are within the current network, and if not refuse to participate. 2) check network name, as well as instance, to lazily remove expired leases. * Add debug capability to F3 itests to print current progress To aid debugging failing tests add option to print progress of all nodes at every eventual assertion, disabled by default. * Shorten GPBFT settings for a more responsive timing Defaults are based on epoch of 30s and real RTT. Shorten Delta and rebroadcast times. * Remove F3 itest repetitions on CI now that saul goodman See proof of the pudding: * https://github.com/filecoin-project/lotus/actions/runs/11369403828/job/31626763159?pr=12597 * Update the changelog * Address review comments * Remove the sanity check that all nodes use the same initial manifest
* Implement F3 list participants API Implement Lotus API to list the current miner IDs that are participating. * Include lease information to list participants API Also return start and validity terms of the list participants API.
* Support Forrest parity testing for F3 participation APIs For parity testing, Forest nodes need to decode an F3 participation ticket that is issued by Lotus. Lotus so happens to use CBOR to encode tickets, which it then decodes to issue a lease. The lease issuer is of type `peer.ID` which is an alias of type `string` that may contain non UTF-8 characters. If this type is used directly in CBOR encoding then it gets encoded as string, which works fine in Golang but not Rust, which in turn results in decoding issues in Forest. To avoid this use `[]byte` as the type to encode the issuer public key. In Lotus, the value will be the binary marshalling of `peer.ID` that issued the ticket. While at it, also fix two issues that were brought up during discussion: * A miner must not be able to ask for a ticket to participate in zero instances. Validate and return error if instances is set to zero. * Use CBOR tuple encoding for a slightly better efficient wire encoding of ticket. `cborgen.Write*` APIs explicitly create a given file; hence the need to separate the file to which tuple kinds are generated since the existing encoding for types in `api/cbor_gen.go` use maps.
Accept any of "", "0", "false", "no" as negative value for `LOTUS_DISABLE_F3` in which case F3 should remain enabled.
Upgrade to `go-f3` `v0.4.0` removed the option to disable checkpointing by F3. As a result this build parameter no longer has any effect. Remove it. Relates to: #12547
Continuing with the function here will lead to a panic. This can happen, e.g., if the lotus node stops and/or returns an error while the participant is waiting for its lease to expire.
Can we also get this in when merged please? |
And this please |
We also need the fix for this in lotus |
Yeah, I will wait for those to land and adding those to this backport. |
Implement `lotus f3` CLI sub commands to: * Get a specific finality certificate, either latest or by instance ID. * List a range of finality certificates Part of #12607
Fix the issue by instantiating pointers to sentinel F3 error values and assert that errors indeed pass through via an integration test. Fixes #12630
…ng progress (#12640) Previously, the flow was: 1. Get ticket. 2. Participate. 3. Repeatedly poll progress. The new flow is: 1. Get ticket. 2. Repeatedly participate, using the returned lease as an indicator of progress. That way, if the lotus node reboots we'll eventually re-tell them about the lease. fixes filecoin-project/go-f3#719
This reverts commit c674554.
@masih all PRs with a
And I can´t see that I´m touching anything with regards to that here. Could you maybe help me out with that? |
@rjan90 I am not sure about the Try: |
Hmmmm.. I will have to do some fixing on my end it seems, |
Signed-off-by: Jakub Sztandera <oss@kubuxu.com>
From archioz: ``` > lotus f3 manifest Manifest: Protocol Version: 4 Paused: false Initial Instance: 0 Initial Power Table: bafy2bzaceab236vmmb3n4q4tkvua2n4dphcbzzxerxuey3mot4g3cov5j3r2c Bootstrap Epoch: 2081674 Network Name: calibrationnet Ignore EC Power: false Committee Lookback: 10 Catch Up Alignment: 15s ``` Signed-off-by: Jakub Sztandera <oss@kubuxu.com>
Signed-off-by: Jakub Sztandera <oss@kubuxu.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SGWM we just need #12650 for rc3
* feat(chain): use the index when determining if A is an ancestor of B That way we efficiently accept old and checkpoint finality certificates. * feat(chain): allow checkpoints beyond finality but prevent forks 1. Allow setting checkpoints more than 900 epochs ago as long as said checkpoint doesn't cause us to _revert_ more than 900 epochs. 2. Verify that we don't end up reverting more than 900 epochs when switching to a chain at or beyond the current head. 3. Optimize all of this such that, e.g., setting checkpoints in the distant past can use the chain index instead of having to walk back the chain manually. We may still need to walk up to 900 epochs to make sure we're not forking beyond finality, but that's it. * chore(chain): remove ChainStore.NearestCommonAncestor It's unused, inefficient, and a potential DoS vector if someone decides to use it. Signed-off-by: Jakub Sztandera <oss@kubuxu.com>
Signed-off-by: Jakub Sztandera <oss@kubuxu.com>
Related Issues
#12480
Proposed Changes
Backports PRs that has landed with a backport label on them to the
release/v1.30.0
: https://github.com/filecoin-project/lotus/pulls?q=is%3Apr+label%3Arelease%2Fbackport+is%3Aclosed, to the releases branch.PRs in question:
LOTUS_DISABLE_F3
env var gracefully #12620F3Consensus
build parameter #12621Checklist
Before you mark the PR ready for review, please make sure that: