Read latest configuration independently from main loop #379
Changes from 2 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -138,6 +138,11 @@ type Raft struct { | |
// the log/snapshot. | ||
configurations configurations | ||
|
||
+ // Holds a copy of the latest configuration which can be read | ||
+ // independently from main loop. | ||
+ latestConfiguration Configuration | ||
+ latestConfigurationLock sync.RWMutex | ||
I think this would probably work well for most use cases but there is a risk that very heavy reads of For example in Consul if there were some script calling I think in this case What do you think?
I agree that it would be better to use atomic and I will change it. |
||
|
||
// RPC chan comes from the transport layer | ||
rpcCh <-chan RPC | ||
|
||
|
@@ -603,18 +608,17 @@ func (r *Raft) restoreSnapshot() error { | |
r.setLastSnapshot(snapshot.Index, snapshot.Term) | ||
|
||
// Update the configuration | ||
+ var conf Configuration | ||
+ var index uint64 | ||
  if snapshot.Version > 0 { | ||
- r.configurations.committed = snapshot.Configuration | ||
- r.configurations.committedIndex = snapshot.ConfigurationIndex | ||
- r.configurations.latest = snapshot.Configuration | ||
- r.configurations.latestIndex = snapshot.ConfigurationIndex | ||
+ conf = snapshot.Configuration | ||
+ index = snapshot.ConfigurationIndex | ||
  } else { | ||
- configuration := decodePeers(snapshot.Peers, r.trans) | ||
- r.configurations.committed = configuration | ||
- r.configurations.committedIndex = snapshot.Index | ||
- r.configurations.latest = configuration | ||
- r.configurations.latestIndex = snapshot.Index | ||
+ conf = decodePeers(snapshot.Peers, r.trans) | ||
+ index = snapshot.Index | ||
  } | ||
+ r.setCommittedConfiguration(conf, index) | ||
+ r.setLatestConfiguration(conf, index) | ||
|
||
// Success! | ||
return nil | ||
|
@@ -746,19 +750,14 @@ func (r *Raft) VerifyLeader() Future { | |
} | ||
} | ||
|
||
- // GetConfiguration returns the latest configuration and its associated index | ||
- // currently in use. This may not yet be committed. This must not be called on | ||
- // the main thread (which can access the information directly). | ||
+ // GetConfiguration returns the latest configuration. This may not yet be | ||
+ // committed. The main loop can access this directly. | ||
  func (r *Raft) GetConfiguration() ConfigurationFuture { | ||
  configReq := &configurationsFuture{} | ||
  configReq.init() | ||
- select { | ||
- case <-r.shutdownCh: | ||
- configReq.respond(ErrRaftShutdown) | ||
- return configReq | ||
- case r.configurationsCh <- configReq: | ||
- return configReq | ||
- } | ||
+ configReq.configurations = configurations{latest: r.getLatestConfiguration()} | ||
+ configReq.respond(nil) | ||
+ return configReq | ||
Does this make
We are still using it in our tests and adding an alternative would be more work. I decided to leave it as is so that this PR doesn't get out of hand. But it is essentially dead code now.
In theory
🤔 hmm. Making part of the internal workings atomic and accessing it directly in tests could introduce other issues though. Atomics prevent data races but not logical ones. I think the usage here is fine, as we are limiting to only a single writer and wrapping concurrent readers so they don't have direct access, but making internal state atomic and then tweaking it from tests could get brittle fast if we ever update or make assumptions about whether then atomic value. Does leaving tests using a different code path to real callers cause issues? If not, I'm inclined to leave it too. Oh we also still use it in So I say it's OK to leave it for now as it is, as those call sites need the lower-level and more accurate answer; this just optimises for external callers.
👍 |
||
} | ||
|
||
// AddPeer (deprecated) is used to add a new peer into the cluster. This must be | ||
|
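The review thread on `latestConfigurationLock sync.RWMutex` settles on an atomic instead of a lock. A minimal sketch of what that could look like follows. It assumes simplified stand-in types (`Server`, `Configuration`, `node`) rather than the library's real ones, and the accessor bodies are hypothetical because the diff view above does not show them.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Server and Configuration are simplified, hypothetical stand-ins for the
// library's types; only their shape matters for this sketch.
type Server struct {
	ID      string
	Address string
}

type Configuration struct {
	Servers []Server
}

// Clone returns a deep copy so that readers never share a slice with the writer.
func (c Configuration) Clone() Configuration {
	out := Configuration{Servers: make([]Server, len(c.Servers))}
	copy(out.Servers, c.Servers)
	return out
}

// node is a hypothetical stand-in for the Raft struct, reduced to one field.
type node struct {
	latestConfiguration atomic.Value // always stores a Configuration
}

// setLatestConfiguration is meant to be called by a single writer (the main
// loop). It stores a private copy so later caller mutations are not visible.
func (n *node) setLatestConfiguration(c Configuration) {
	n.latestConfiguration.Store(c.Clone())
}

// getLatestConfiguration may be called from any goroutine without blocking
// the writer. It returns a copy so callers cannot mutate the stored value.
func (n *node) getLatestConfiguration() Configuration {
	v := n.latestConfiguration.Load()
	if v == nil {
		return Configuration{} // nothing stored yet
	}
	return v.(Configuration).Clone()
}

func main() {
	n := &node{}
	n.setLatestConfiguration(Configuration{
		Servers: []Server{{ID: "a", Address: "10.0.0.1:8300"}},
	})

	// Many concurrent readers, no lock contention with the writer.
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = n.getLatestConfiguration()
		}()
	}
	wg.Wait()

	fmt.Println(len(n.getLatestConfiguration().Servers)) // prints 1
}
```

As the reviewers note, this only removes the data race and the lock contention under heavy reads. It relies on there being a single writer, and it says nothing about whether the value read is logically up to date.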
Should we be returning the latest configuration or the latest committed configuration? I may well be misremembering, but I feel like we shouldn't be using/returning the config until it's committed, right?
nm. The GetConfiguration docs state that the returned configuration may not yet be committed, so this preserves that behaviour, although frankly it seems strange to me that Consul or an application would "see" a config that is not the one actually in use and may never get committed by a quorum 🤔.
I guess that is an edge case issue we can solve separately though; this preserves the same behaviour while fixing the performance issue.
I was wondering about the same thing, but wanted to preserve current behaviour.
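To make the latest-versus-committed distinction in this thread concrete, here is a small sketch. It reuses the four field names visible in the diff (`committed`, `committedIndex`, `latest`, `latestIndex`), but `appendConfig` and `commitUpTo` are hypothetical helpers and the model is deliberately simplified; it is not the library's actual bookkeeping.

```go
package main

import "fmt"

// Configuration is a simplified stand-in: just a list of server IDs.
type Configuration struct {
	Servers []string
}

// configurations mirrors the four fields visible in the diff.
type configurations struct {
	committed      Configuration
	committedIndex uint64
	latest         Configuration
	latestIndex    uint64
}

// appendConfig (hypothetical) records a configuration entry as soon as it is
// appended to the log, before any quorum has acknowledged it.
func (c *configurations) appendConfig(conf Configuration, index uint64) {
	c.latest = conf
	c.latestIndex = index
}

// commitUpTo (hypothetical) promotes the latest configuration to committed
// once the commit index has reached the entry that introduced it.
func (c *configurations) commitUpTo(commitIndex uint64) {
	if c.latestIndex <= commitIndex {
		c.committed = c.latest
		c.committedIndex = c.latestIndex
	}
}

func main() {
	var c configurations

	// Initial single-server configuration, appended and committed at index 1.
	c.appendConfig(Configuration{Servers: []string{"a"}}, 1)
	c.commitUpTo(1)

	// A membership change is appended at index 5 but not yet committed.
	c.appendConfig(Configuration{Servers: []string{"a", "b"}}, 5)
	fmt.Println(len(c.latest.Servers), len(c.committed.Servers)) // prints "2 1"

	// Once a quorum acknowledges index 5, the two views converge.
	c.commitUpTo(5)
	fmt.Println(len(c.latest.Servers), len(c.committed.Servers)) // prints "2 2"
}
```

Between the append and the commit, a caller reading the latest configuration sees server `b` even though that change could still be discarded, which is the edge case raised above.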