This repository has been archived by the owner on Jan 11, 2024. It is now read-only.

Work around serialization error while fetching votes for a checkpoint #170

Merged (1 commit) on Apr 20, 2023


adlrocha
Contributor

Background

While testing a single IPC agent orchestrating several validators, I ended up in a state where my validators kept failing when trying to determine whether they had already voted for a specific checkpoint. In the current implementation, an error in the checkpointing process causes the future to return instead of moving on to the next iteration. Combined with this serialization error in the voting check, the result is that checkpointing gets stuck. This is the serialization error we get from Lotus:

2023-04-20T16:30:49.944+0200    WARN    rpc     go-jsonrpc@v0.2.3/handler.go:406        error in RPC call to 'Filecoin.IPCHasVotedTopDownCheckpoint': error checking if validator has voted top-down checkpoint:
    github.com/filecoin-project/lotus/node/impl/ipc.(*IPCAPI).IPCHasVotedTopDownCheckpoint
        /home/workspace/pl/lotus/node/impl/ipc/ipc.go:236
  - failed to get key bafy2bzacea2dlrhxmbp5thuszzeirndwhu6snfvppzcp632iyizt3rylsanok in node:
    github.com/filecoin-project/specs-actors/v7/actors/util/adt.(*Map).Get
        /home/adlrocha/go/pkg/mod/github.com/filecoin-project/specs-actors/v7@v7.0.1/actors/util/adt/map.go:98
  - expected byte array

Additionally, when submitting a checkpoint with several validators and restarting the process, we don't want to kill the future if, for instance, a validator has already voted and it is the next validator's turn to vote.

Implementation

This PR temporarily disables the voting check for validators until we fix the serialization problem in the actors. It also introduces a check that moves on to the next iteration, instead of killing the checkpoint process, when there is an error while submitting a checkpoint (for instance, because the validator had already voted or is out of funds).
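The change in control flow can be illustrated with a minimal, synchronous sketch. This is not the code from this PR: `SubmitError`, `submit_checkpoint`, `run_fail_fast`, and `run_resilient` are hypothetical names invented for illustration, and the real agent runs this logic inside an async future. The sketch only contrasts propagating the first error out of the loop with logging it and continuing to the next iteration.

```rust
// Hypothetical stand-ins for illustration only; none of these names
// are taken from the agent's codebase.
#[derive(Debug, Clone, PartialEq)]
enum SubmitError {
    AlreadyVoted,
    OutOfFunds,
}

/// Pretend submission: fails for any epoch listed in `failing`.
fn submit_checkpoint(epoch: u64, failing: &[(u64, SubmitError)]) -> Result<(), SubmitError> {
    match failing.iter().find(|(e, _)| *e == epoch) {
        Some((_, err)) => Err(err.clone()),
        None => Ok(()),
    }
}

/// Fail-fast loop: the first error is propagated with `?`, which ends
/// the whole loop, so later epochs are never attempted.
fn run_fail_fast(
    epochs: &[u64],
    failing: &[(u64, SubmitError)],
) -> Result<Vec<u64>, SubmitError> {
    let mut submitted = Vec::new();
    for &epoch in epochs {
        submit_checkpoint(epoch, failing)?;
        submitted.push(epoch);
    }
    Ok(submitted)
}

/// Resilient loop: an error is logged and the loop moves on to the
/// next iteration instead of returning.
fn run_resilient(epochs: &[u64], failing: &[(u64, SubmitError)]) -> Vec<u64> {
    let mut submitted = Vec::new();
    for &epoch in epochs {
        if let Err(err) = submit_checkpoint(epoch, failing) {
            eprintln!("checkpoint submission failed at epoch {epoch}: {err:?}; moving on");
            continue;
        }
        submitted.push(epoch);
    }
    submitted
}
```

With epochs `[10, 20, 30]` and a simulated `AlreadyVoted` failure at epoch 20, `run_fail_fast` stops at the error, while `run_resilient` still submits epochs 10 and 30.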
