-
Notifications
You must be signed in to change notification settings - Fork 994
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tuning performance and prevent leak due to time.After #484
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this improvement!
Many of these calls to time.After
are safe in practice, because the only way for the function to exit without the timer firing is when raft shuts down. I think in general we expect the entire process to exit soon after raft is shutdown, so in most cases that leak is not a concern, and not a risk. But that assumption may not always be true, so it's probably good to fix all of them. I think there is also a good argument to be made for fixing these cases as a way of demonstrating good practices.
There are definitely a couple cases of time.After
that are concerning, specifically the ones snapshot.go
, replication.go
, and raft.go
.
My main concern with the changes in this PR is the new wrapper around time.Timer
. I think we can fix these calls without introducing the new type, which will be easier to maintain in the long term. The first comment suggests how that might be possible.
I'd also like to request that you split up all the cleanup changes into at least a separate commit, and possibly a separate PR. Those clean ups are appreciated, but it makes the core changes harder to review when they are mixed into the same commit.
api.go
Outdated
timer := newTimer(timeout) | ||
defer timer.Stop() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can accomplish the same thing without wrapping time.Timer
timer := newTimer(timeout) | |
defer timer.Stop() | |
var timer <-chan time.Time | |
if timeout > 0 { | |
t := time.Newtimer(timeout) | |
defer t.Stop() | |
timer = t.C | |
} |
This is a few extra lines, but may actually result in a smaller diff. I think in the long term not having a wrapper around time.Timer
will be easier to maintain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @dnephin Thank you for your careful review.
Using std timer in the first place is also my consider. However this would result in lot of code duplication. Without new wrapper type, we always have to care about:
if timeout > 0
and
timer = t.C // or something similar
My approach is that: using a wrapper but keep using of channel, by exposing t.C
. In the future, new logic code could reuse this pattern instead of implement its own timeout handling and instead of cloning the base logic.
How do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When reviewing this PR, I had to pause and locally test the behaviour of timerWrapper
to gain confidence that it would behave how I expect (and found a potential bug with (timerWrapper).Reset
).
From a maintainer's perspective, I think code duplication is an acceptable cost here. The wrapper certainly saves many lines of code but may increase developer time spent reading and understanding the wrapper type.
I'd like to request that we go with @dnephin's suggestion for using the std lib directly
inmem_transport.go
Outdated
err = fmt.Errorf("command timed out") | ||
} | ||
|
||
timer.Stop() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's currently safe to call Stop here, but in the future we could add other returns. It might be good to keep this as a defer
after creating the timer, both for consistency, and to prevent future regressions. Is there a reason that defer
won't work in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you.
Not using defer
here is just a trade-off between safety (future changes) and performance (tiny improvement). Obviously, not calling timer.Stop
would result in the same manner as leaking time.After
.
I would fix that as your suggestion. Thank you so much.
log_cache_test.go
Outdated
if e := c.GetLog(uint64(i), &out); e != nil { | ||
t.Fatalf("err: %v", e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason for this change? Generally err
is the correct variable name to use for errors. If it's because it's masking err
from the parent scope, I think that's not really a concern in this test.
net_transport_test.go
Outdated
atomic.AddInt32(&errCounter, 1) | ||
t.Errorf("err: %v", err) | ||
} else if !reflect.DeepEqual(resp, out) { // Verify the response | ||
atomic.AddInt32(&errCounter, 1) | ||
t.Errorf("command mismatch: %#v %#v", resp, out) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for cleaning up the calls to t.Fatal
in a goroutine!
While this is good cleanup, I think it would be better to do this cleanup in a separate PR, and at the very least in a separate commit. Keeping commits isolated to one change makes the git history much easier to read, and can make it easier to cherry-pick changes.
I'd also like to suggest a different approach to fixing this problem. Instead of an atomic counter, how about returning an error, and using https://pkg.go.dev/golang.org/x/sync/errgroup to run the goroutines. That is a pattern I've used before, and it works really well. That might look something like this:
appendFunc := func() error {
// return errors instead of t.Fatalf
}
g := new(errgroup.Group)
for i = 0; i < 5; i++ {
g.Go(appendFunc())
}
require.NoError(t, g.Wait())
This will remove the need for a sync.WaitGroup
as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's great idea to use ErrGroup
. I would make the fix in another PR as your suggestion.
util.go
Outdated
func randomTimeout(minVal time.Duration) timerWrapper { | ||
if minVal == 0 { | ||
return nil | ||
return newTimer(0) | ||
} | ||
extra := (time.Duration(rand.Int63()) % minVal) | ||
return time.After(minVal + extra) | ||
return newTimer(minVal + extra) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can make this function work without the timerWrapper
by returning a noopTimer
implementation instead of nil
. The noopTimer
can have a method Stop
that does nothing, and a field C
that is a nil channel. Then we can return the noopTimer
or a *time.Timer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love the idea of noopTimer
. I would fix that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about this more, I'm not sure how to return this from randomTimeout
. We could use an interface, but interfaces don't allow fields. I'm not sure if there is any way to make this work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible to use interface and follow SOLID principle here. But the trade-off might look like:
t.C -> t.C() // return channel through func
If it looks okay to you, then I think interface is a good choice rather than concrete type
@dnephin Follow up your comments, I have reverted changes and keep the PR small as it should be. I believe the main concern left is Thank you very much. |
inmem_transport.go
Outdated
@@ -287,6 +288,9 @@ func (i *inmemPipeline) decodeResponses() { | |||
case <-i.shutdownCh: | |||
return | |||
} | |||
|
|||
timer.Stop() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be deferred on L267 in case of an early return in the select
block above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kisunji if we put defer in L267, it will be defer in loop and absolutely bad practice. How do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, that's a good catch. We may have to call timer.Stop
in all of the branches that return
early then
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends on each logic. As far as I look at the source code. We just need to invoke stop at the end of select or just inside a branch.
util.go
Outdated
func (t *timerWrapper) Reset(d time.Duration) { | ||
if t.t != nil { | ||
t.t.Reset(d) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think Reset
will need to handle the case where t == nil
and call newTimer
t := newTimer(0)
t.Reset(1000) // no-op
t2 := time.Timer(0)
t2.Reset(1000) // resets timer to 1s
Currently, calling Reset
will not behave like (*time.Timer).Reset
in this case
api.go
Outdated
timer := newTimer(timeout) | ||
defer timer.Stop() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When reviewing this PR, I had to pause and locally test the behaviour of timerWrapper
to gain confidence that it would behave how I expect (and found a potential bug with (timerWrapper).Reset
).
From a maintainer's perspective, I think code duplication is an acceptable cost here. The wrapper certainly saves many lines of code but may increase developer time spent reading and understanding the wrapper type.
I'd like to request that we go with @dnephin's suggestion for using the std lib directly
@kisunji imho, code duplication cost might be a lot. We have to change not only one place in my PR but many (and in many files). I dont think the wrapper breaks any std lib way. And its quite small. Easy to read and understand, no third party lib. Dup code might introduce error prone or miss required specs. |
Hi @linxGnu, while this fix is valuable, more than one maintainer has expressed concerns around Internally we prefer that this PR uses stdlib directly to make the code obvious at a glance. |
if timeout > 0 { | ||
timeoutCh = time.After(timeout) | ||
timer = time.NewTimer(timeout) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inside a loop -> won't use defer
here
@@ -402,9 +401,9 @@ func (r *Raft) heartbeat(s *followerReplication, stopCh chan struct{}) { | |||
r.logger.Error("failed to heartbeat to", "peer", peer.Address, "error", err) | |||
r.observe(FailedHeartbeatObservation{PeerID: peer.ID, LastContact: s.LastContact()}) | |||
failures++ | |||
select { | |||
case <-time.After(backoff(failureWait, failures, maxFailureScale)): | |||
case <-stopCh: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a bug when received from stopCh
but does not return
case <-time.After(backoff(failureWait, s.failures, maxFailureScale)): | ||
case <-r.shutdownCh: | ||
if isTerminated(r.shutdownCh, backoff(failureWait, s.failures, maxFailureScale)) { | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a bug when received from shutdownCh
but does not return
Hey there, |
Hi 👋 Just cleaning up some stale issues and PRs. We want to thank you again @linxGnu for the time you spent on this PR and responding to feedback. At this point it's not very clear to us that these changes fix any real issues in this library - not ones that our usage has ever seen as a problem anyway. So on balance the cost and risk of checking, testing and merging the PR and releasing it into our products seems to outweigh any benefit we might get from it. It always feels bad saying no to a contribution but we'd rather not leave this open forever if we're not likely to merge it! Thanks again! |
First of all, thank you so much for incredible library.
This PR is an attempt to improve the performance and prevent high memory usage due to timer leak without calling
Stop
.Motivation
Many hot paths in Raft library introduce
time.After
usage. While getting out of scope (but not timeout), those timers are not stopped/recycled by GC yet (until event is fired and pushed to channel); thus pressure to runtime with unnecessary ops.Moreover,
time.After
actually allocates newtime.Timer
. There is no benefit of usingtime.After
overtime.Timer
.The patch
Objectives:
The patch introduces wrapper for timer which acts similar as
time.After
. Just a note that we need to stop it when getting out of scope (manually or usingdefer
)I would like to avoid
defer timer.Stop()
as much as possible. It's fine for some case with recent optimization introduced in go1.16, but not all.Benchmark