
Investigate processes executed by leader #15944

Open
serathius opened this issue May 24, 2023 · 3 comments

@serathius
Member

What would you like to be added?

etcd has multiple processes that are executed only by the leader; however, neither etcd nor raft guarantees that at most one cluster member identifies itself as the leader at any given time.

Raft only guarantees that committed entries will not be lost, by ensuring that they are persisted on a quorum of members and by requiring a quorum to elect a leader. However, there are periods when two members can each consider themselves the leader, even though one of them no longer has quorum and cannot commit any entries. For example, when the leader is disconnected from the other members and they elect a new leader, there is a window during which both the old and the new leader are present.
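The window described above can be narrowed (though not eliminated) by fencing leader-only work with the Raft term it started under. This is a hypothetical minimal sketch, not etcd code; the node type and runAsLeader helper are invented for illustration:

```go
package main

import "fmt"

// node is a toy stand-in for a cluster member's local view.
type node struct {
	term     uint64 // latest Raft term this node has observed
	isLeader bool   // whether this node currently believes it is leader
}

// runAsLeader executes op only if the node still believes it is leader at
// the same term the operation started with. If a newer leader was elected,
// the observed term advances and the stale operation is refused. Note this
// only narrows the race window; the term can still advance right after the
// check.
func (n *node) runAsLeader(startTerm uint64, op func()) error {
	if !n.isLeader || n.term != startTerm {
		return fmt.Errorf("not leader at term %d (current term %d)", startTerm, n.term)
	}
	op()
	return nil
}

func main() {
	n := &node{term: 5, isLeader: true}
	// Operation began at term 5 and the node is still leader: allowed.
	fmt.Println(n.runAsLeader(5, func() { fmt.Println("applied") }))
	// A new leader was elected at term 6; the stale operation is refused.
	n.term, n.isLeader = 6, false
	fmt.Println(n.runAsLeader(5, func() { fmt.Println("applied") }))
}
```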

TODO:

  • Identify all the etcd processes that depend on being executed only by the leader (i.e. guarded by isLeader()). Example: cluster version picking.
  • Define the list of invariants those processes should fulfill. For example: idempotent, transactional, etc.
  • Validate those invariants for all of those processes.
  • Codify those invariants so that no future change can impact etcd's correctness.
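To make the "idempotent" invariant from the list above concrete, here is a hypothetical sketch (the versionStore type is invented, not etcd's implementation) of a leader-only task such as cluster version picking expressed as a compare-and-set, so that duplicate execution by an old and a new leader is harmless:

```go
package main

import (
	"fmt"
	"sync"
)

// versionStore is a toy store for a cluster-wide decision.
type versionStore struct {
	mu      sync.Mutex
	version string
}

// setIfUnset applies the cluster version only if it has not been decided
// yet. Running it twice (e.g. from a stale leader and the new leader) is
// safe: the second call observes the first write and becomes a no-op,
// which is the idempotency property the TODO list asks for.
func (s *versionStore) setIfUnset(v string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.version != "" {
		return false
	}
	s.version = v
	return true
}

func main() {
	s := &versionStore{}
	fmt.Println(s.setIfUnset("3.5.0")) // true: first leader decides
	fmt.Println(s.setIfUnset("3.4.0")) // false: stale leader's attempt is a no-op
	fmt.Println(s.version)             // 3.5.0
}
```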

Why is this needed?

To surface the incorrect pattern currently used in etcd and improve the codebase.

@mitake
Contributor

mitake commented May 30, 2023

The requests below can be issued by a stale leader and are related to this issue:

It’s not directly related to the stale leader issue, but Alarm might have similar behavior. In theory, a stale leader can have a long delay between checking space and issuing an alarm request in Put (or Txn, or LeaseGrant).

I think a simple solution for this issue might be to not use Raft messages of the MsgProp type for these requests. If an etcd server could issue a MsgApp message instead, the message would carry term information and could be rejected if the cluster has a newer leader.
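The point above is that a term-carrying message can be fenced where a term-less proposal cannot. A hypothetical sketch of that rejection rule (the message and follower types are invented for illustration; real Raft implementations apply this check inside message handling):

```go
package main

import "fmt"

// message is a toy stand-in for a term-carrying Raft message like MsgApp.
// A plain proposal (MsgProp) carries no term and is forwarded blindly.
type message struct {
	term uint64
	data string
}

// follower tracks the highest term it has seen.
type follower struct {
	currentTerm uint64
}

// handle rejects any message whose term is older than the follower's
// current term. This is exactly the Raft rule that fences out a stale
// leader: once a newer leader raises the term, old-term messages bounce.
func (f *follower) handle(m message) error {
	if m.term < f.currentTerm {
		return fmt.Errorf("rejected: message term %d < current term %d", m.term, f.currentTerm)
	}
	return nil
}

func main() {
	f := &follower{currentTerm: 7}
	fmt.Println(f.handle(message{term: 7, data: "append"})) // accepted
	fmt.Println(f.handle(message{term: 5, data: "append"})) // rejected: stale leader
}
```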

@stale

stale bot commented Sep 17, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 21 days if no further activity occurs. Thank you for your contributions.

@ahrtr
Member

ahrtr commented Oct 25, 2023

Another example: s.raftStatus() might return stale info if the current leader is network-partitioned.

if err := s.waitAppliedIndex(); err != nil {
	return err
}
rs := s.raftStatus()
// the leader's raftStatus.Progress is not nil
if rs.Progress == nil {
	return errors.ErrNotLeader
}
