Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

governance: replace the halt counter with a halt_bit #4373

Closed
Tracked by #4402
erwanor opened this issue May 10, 2024 · 2 comments
Closed
Tracked by #4402

governance: replace the halt counter with a halt_bit #4373

erwanor opened this issue May 10, 2024 · 2 comments
Assignees
Labels
A-governance Area: Governance _P-high High priority _P-V1 Priority: slated for V1 release
Milestone

Comments

@erwanor
Copy link
Member

erwanor commented May 10, 2024

This is a ticket to replace the current halt counter with a bit flag that lives in nonverifiable storage, set by signal_halt and unset by running a migration.

Currently, every call to signal_halt must be matched with a manual counter increment in penumbra_app::TOTAL_HALT_COUNTER. The purpose of this mechanism is preventing unwanted node restarts, and progress following a halt signal. Operationally, this turned out to not be a tenable setup and @hdevalence suggested that we replace the counter with a single bit flag, eliminating the failure mode of running a binary with a TOTAL_HALT_COUNTER well ahead of the internal signal halt counter.

To do this, we need to:

  1. Replace the halt_count state key with halt_bit with path: "governance/persistent_flags/halt_bit"
  2. Replace the signal_halt implementation to set the halt_bit on (in nonverifiable storage)
  3. Add a ready_to_start implementation to set the halt_bit off
  4. Make is_chain_halted return the halt_bit
  5. Make pd migrate check is_chain_halted and return early if it is, this should happen before any specific migration variant is called, later we can add an APP_VERSION check for each special case.
  6. Add a --force flag to pd migrate so that we can optionally ignore the halt bit flag

The method names are suggestions, but they should all be documented so that their effect on application behavior can be understood.

@erwanor erwanor added A-governance Area: Governance _P-V1 Priority: slated for V1 release _P-high High priority labels May 10, 2024
@erwanor erwanor added this to Penumbra May 10, 2024
@github-project-automation github-project-automation bot moved this to Backlog in Penumbra May 10, 2024
@github-actions github-actions bot added the needs-refinement unclear, incomplete, or stub issue that needs work label May 10, 2024
@hdevalence
Copy link
Member

This sounds like a great plan.

@aubrika aubrika added this to the Sprint 7 milestone May 14, 2024
@aubrika aubrika removed the needs-refinement unclear, incomplete, or stub issue that needs work label May 14, 2024
@aubrika aubrika moved this from Backlog to In progress in Penumbra May 20, 2024
conorsch pushed a commit that referenced this issue May 20, 2024
## Describe your changes

This PR implements #4373 

## Checklist before requesting a review

- [x] If this code contains consensus-breaking changes, I have added the
"consensus-breaking" label. Otherwise, I declare my belief that there
are not consensus-breaking changes, for the following reason:

> This is consensus breaking because it deprecates the application halt
counter stored within the verifiable chain state. However, it does not
require a state migration because the switch starts in the "off" (i.e.
`false`) position. As a result, everything else being equal, using a new
upgraded binary should be sufficient.

---------

Signed-off-by: Erwan Or <erwan.ounn.84@gmail.com>
@conorsch
Copy link
Contributor

Resolved via #4413.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-governance Area: Governance _P-high High priority _P-V1 Priority: slated for V1 release
Projects
Archived in project
Development

No branches or pull requests

4 participants