Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Soft opt out implementation may lead to incorrect jailing #1517

Closed
3 of 5 tasks
mpoke opened this issue Dec 15, 2023 · 4 comments · Fixed by #1549
Closed
3 of 5 tasks

Soft opt out implementation may lead to incorrect jailing #1517

mpoke opened this issue Dec 15, 2023 · 4 comments · Fixed by #1549
Assignees
Labels
type: bug Issues that need priority attention -- something isn't working

Comments

@mpoke
Copy link
Contributor

mpoke commented Dec 15, 2023

Summary of Bug

Due to the soft opt out feature, validators in the bottom SoftOptOutThreshold (default: 5%) can opt out of validating consumer chains without getting jailed. However, if one of these validators gets into the top e.g., 95%, then it get's jailed immediately. This is because in the downtime logic of the Slashing module on the consumer chains the missed blocks are counted even for validators in the bottom 5%. The soft opt-out logic just stops the consumer CCV module from sending SlashPackets for these validators (see here).

Thanks to @freak12techno for notifying us of this unexpected behavior.

Version

f1a6129

Steps to Reproduce

  • have a validator in the bottom 5% that doesn’t run a consumer node
  • after a while (more than the signing window on the consumer chain), delegate enough tokens to the validator to get it in the top 95%
  • the validator will get immediately jailed for downtime

For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
  • Is a spike necessary to map out how the issue should be approached?
@mpoke mpoke added the type: bug Issues that need priority attention -- something isn't working label Dec 15, 2023
@mpoke mpoke added this to Cosmos Hub Dec 15, 2023
@github-project-automation github-project-automation bot moved this to 🩹 F1: Triage in Cosmos Hub Dec 15, 2023
@mpoke mpoke moved this from 🩹 F1: Triage to 📥 F2: Todo in Cosmos Hub Dec 15, 2023
@mpoke
Copy link
Contributor Author

mpoke commented Dec 15, 2023

A solution is to have a flag for every CCV validator in the consumer module. This flag indicates whether a validator is part of the 95% or not. We can update these flags here when we update SmallestNonOptOutPower. When a validator becomes part of the 95%, then we set its signInfo.StartHeight to the current height using the SetValidatorSigningInfo method from the slashingKeeper. Clearly this is state breaking for the consumers.

@freak12techno
Copy link
Contributor

Is it possible to not update the signing blocks bit array if a validator is not a part of the set that is required to sign blocks on this consumer chain (as in, it's in the bottom 5%)?

Why am I asking: currently the behaviour is apparently the following: for bottom 5%, if a validator is not signing blocks, its missed_blocks_counter (as reported by binaryd q slashing signing-info <consensus pubkey>) is constantly increasing, and once it reaches the threshold, it's reset to 0, which makes uptime bots relying on signing-info results go crazy. Here's one example on Stride, this is the bot that's running on validators-status channel.

Before reaching threshold:
изображение
After reaching threshold:
изображение

(There may still a lot of caveats with the approach I suggest, I may not be aware of that, please clarify if there are some.)

@freak12techno
Copy link
Contributor

freak12techno commented Dec 15, 2023

For sure with RS introduction there's an issue that instead of binary state of validator's presence in active set (it's either active or not) there's a ternary state (not active; active but its voting power is above SmallestNonOptOutPower; active but its voting power is below SmallestNonOptOutPower) and I think it should be reflected somewhere in responses, while it isn't now.

@mpoke
Copy link
Contributor Author

mpoke commented Dec 18, 2023

@freak12techno Your suggested solution would require changes to the slashing module in the SDK. Since this is a very ICS specific thing, it may not be worth while to bringing into mainline SDK, which means that consumers will end up using forks of SDK, which is something we want to avoid as much as possible. That's the reason the soft opt out feature was implemented in the ICS consumer module. The downtime logic is unchanged, we only changed what happens once a validator is identified as down.

@mpoke mpoke assigned mpoke and insumity and unassigned mpoke Dec 18, 2023
@mpoke mpoke assigned mpoke and unassigned insumity Jan 3, 2024
@mpoke mpoke moved this from 📥 F2: Todo to 🏗 F3: InProgress in Cosmos Hub Jan 3, 2024
@mpoke mpoke moved this from 🏗 F3: InProgress to 👀 F3: InReview in Cosmos Hub Jan 5, 2024
@github-project-automation github-project-automation bot moved this from 👀 F3: InReview to 👍 F4: Assessment in Cosmos Hub Jan 9, 2024
@mpoke mpoke moved this from 👍 F4: Assessment to ✅ Done in Cosmos Hub Jan 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug Issues that need priority attention -- something isn't working
Projects
Status: ✅ Done
Development

Successfully merging a pull request may close this issue.

3 participants