-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fn: Remove ctx from GoroutineManager constructor #9341
Conversation
Important Review skippedAuto reviews are limited to specific labels. 🏷️ Labels to auto review (1)
Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
cc @starius |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the change in API 🎉
IIUC this change would double the number of goroutines. I proposed one approach how to solve this.
fn/goroutine_manager.go
Outdated
// Calling wg.Add(1) and wg.Wait() when wg's counter is 0 is a race | ||
// condition, since it is not clear should Wait() block or not. This | ||
// condition, since it is not clear if should Wait() block or not. This |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s/if should Wait() block/if Wait() should block/
// Go call starts running here after acquiring the mutex, it | ||
// would see that the context has expired and return false | ||
// instead of calling wg.Add(1). | ||
g.wg.Wait() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This update changes the behavior of Stop(). Previously, if one goroutine called Stop() and another goroutine invoked Stop() concurrently, the second call would block, waiting for the first call to complete. Now, the second Stop() call returns immediately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is that a problem? is there a usecase for needing to support the blocking behaviour? afaiu, this will mostly be used within other sub-systems and Stop will be called by their Stop methods which typically will also only be called once
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the previous version Stop()
actually did StopAndWait()
. If we support multiple Stop() calls, I think they should behave the same way. If someone calls Stop() from multiple places, they are likely to expect that both calls block until all goroutines finish, not only the first call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just dont think more than 1 system will ever own the GoroutineManager right?
ie, what makes this Stop different from other Stop methods of other subsystems in LND which use a similar pattern to this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just dont think more than 1 system will ever own the GoroutineManager right?
I think, the intuition of Stop() method is that after Stop() call (successfully) returning, all the workers are down. If the second Stop() just returns immediately, this intuition is broken.
Can we panic or return error if a second Stop is detected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see there is a Done() method - we could always update this to return a dedicated shutdown
channel that is only closed after Wait() in Stop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bhandras - perhaps a 3rd opinion here just to break the tie would help? 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Option 3 sgtm! (Done with shutdown chan)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool - pushed a diff with option 3 included 🫡
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah - just realised we cannot do this since callers may wait on Done() inside a goroutine that was started within Go().
Undoing this change
0f1388a
to
8c19ec0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @starius - updated as per your suggestion.
I've also removed the ContextWithQuit commit as i've realised that I can just
adapt the existing fn.ContextGuard instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! 🎉
I tested it with #9140 - works great, after API changes applied.
I left few notes in the PR. It is not clear what do to if Stop is called multiple times.
for _, cancel := range g.cancelFns { | ||
cancel() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I propose to clear the map here, to free memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can do, although i dont think it is necessary since this is not a restartable system. This will only be called when the system is shutting down right? if we set the map to nil here, then you could argue that in every Stop method we have in LND, we should go and set each object to nil.
Interested to hear what a second reviewer thinks too. But yeah, can defs add a g.cancelFns = nil
line but cant do delete(g.cancelFns, id)
since deleting from a map while iterating over it is not safe i think
// Go call starts running here after acquiring the mutex, it | ||
// would see that the context has expired and return false | ||
// instead of calling wg.Add(1). | ||
g.wg.Wait() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the previous version Stop()
actually did StopAndWait()
. If we support multiple Stop() calls, I think they should behave the same way. If someone calls Stop() from multiple places, they are likely to expect that both calls block until all goroutines finish, not only the first call.
// task exiting. It attempts to catch a race condition between wg.Done() and | ||
// wg.Wait() calls. According to documentation of wg.Wait() this is acceptable, | ||
// therefore this test passes even with -race. | ||
func TestGoroutineManagerStopsStress(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should decide what to do with multiple Stop() calls. If we allow them, we should keep this test to make sure gm doesn't crash in this situation. If we don't allow them, we should panic or return error from Stop is called more than once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see the other comment re supporting multiple stop calls
8c19ec0
to
6825bde
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review @starius - addressed and replied! Will help to get a second opinion regarding your points for the Stop method
for _, cancel := range g.cancelFns { | ||
cancel() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can do, although i dont think it is necessary since this is not a restartable system. This will only be called when the system is shutting down right? if we set the map to nil here, then you could argue that in every Stop method we have in LND, we should go and set each object to nil.
Interested to hear what a second reviewer thinks too. But yeah, can defs add a g.cancelFns = nil
line but cant do delete(g.cancelFns, id)
since deleting from a map while iterating over it is not safe i think
// Go call starts running here after acquiring the mutex, it | ||
// would see that the context has expired and return false | ||
// instead of calling wg.Add(1). | ||
g.wg.Wait() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just dont think more than 1 system will ever own the GoroutineManager right?
ie, what makes this Stop different from other Stop methods of other subsystems in LND which use a similar pattern to this?
// task exiting. It attempts to catch a race condition between wg.Done() and | ||
// wg.Wait() calls. According to documentation of wg.Wait() this is acceptable, | ||
// therefore this test passes even with -race. | ||
func TestGoroutineManagerStopsStress(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see the other comment re supporting multiple stop calls
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🌮
6825bde
to
bb3442f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dankie @bhandras !
// Go call starts running here after acquiring the mutex, it | ||
// would see that the context has expired and return false | ||
// instead of calling wg.Add(1). | ||
g.wg.Wait() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but why will there ever be a second Stop call is what im getting at. Look at all the instances of ) Stop() {
in the codebase
// Go call starts running here after acquiring the mutex, it | ||
// would see that the context has expired and return false | ||
// instead of calling wg.Add(1). | ||
g.wg.Wait() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that if we really want to have a method that explicitly blocks and does so for multiple routines, then we should have an explicit Done() <-chan struct
method that does this
// Go call starts running here after acquiring the mutex, it | ||
// would see that the context has expired and return false | ||
// instead of calling wg.Add(1). | ||
g.wg.Wait() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i see there is a Done() method - we could always update this to return a dedicated shutdown
channel that is only closed after Wait() in Stop
// Go call starts running here after acquiring the mutex, it | ||
// would see that the context has expired and return false | ||
// instead of calling wg.Add(1). | ||
g.wg.Wait() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bhandras - perhaps a 3rd opinion here just to break the tie would help? 🙏
bb3442f
to
186db3e
Compare
186db3e
to
37bb082
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🍫
Pointing to this change from this PR. So I think we can merge here once the CI is green on that side |
Pushed a new tag |
Updates the GoroutineManager to avoid having its constructor take
a context since this is an anti-pattern that I think we should avoid. The GoroutineManagers will
generally be created within LND subserver constructors and these will not really ever take contexts
and we want to avoid passing a fresh context to these
this PR was opened in response to reviewing this PR