-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Serve] Run __del__
even if constructor is still in-progress
#45882
[Serve] Run __del__
even if constructor is still in-progress
#45882
Conversation
Signed-off-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
Signed-off-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
Signed-off-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some non-blocker nitpicks, but thanks for addressing the issue Shreyas!
Co-authored-by: Gene Der Su <gdsu@ucdavis.edu> Signed-off-by: shrekris-anyscale <92341594+shrekris-anyscale@users.noreply.github.com>
Co-authored-by: Gene Der Su <gdsu@ucdavis.edu> Signed-off-by: shrekris-anyscale <92341594+shrekris-anyscale@users.noreply.github.com>
…yscale/ray into shrekris/run_del_during_init
Signed-off-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
python/ray/serve/_private/replica.py
Outdated
|
||
await self._metrics_manager.shutdown() | ||
|
||
# We call the destructor last because the replica may not have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the ordering between this and metrics manager shutdown not important?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll restore the original ordering once I change the destructor to log and return. In any case though, the metrics manager pushes autoscaling info and multiplex info to the controller. I don't think the ordering between that and the destructor is very relevant.
python/ray/serve/_private/replica.py
Outdated
|
||
await self._metrics_manager.shutdown() | ||
|
||
# We call the destructor last because the replica may not have | ||
# initialized yet. The destructor may depend on instance variables | ||
# that haven't been set yet, so it may raise an error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, do we want to raise an error if the user's destructor raises an error? or just log and return? i don't think the exception needs to be propagated to the controller for any purpose
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should just log and return. I left a GitHub comment about that, but it didn't get published :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do currently pass the exception back to the controller (code).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's still useful to pass the exception back to the controller for logging purposes. If users aren't running an external logging system, then logging the exception replica-side means the log only exists while the worker node is still alive. If the node gets downscaled after the replica dies, the exception gets lost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I decided to just log and return. I don't like having an implicit requirement that the __del__
step be the last part of the replica shutdown.
@@ -1162,7 +1166,6 @@ async def call_destructor(self): | |||
Calling this multiple times has no effect; only the first call will actually | |||
call the destructor. | |||
""" | |||
self._raise_if_not_initialized("call_destructor") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The hasattr(self._callable, "__del__")
check below is now implicitly also handling the case where self._callable
is None
(because __new__
hasn't been run yet). At a minimum, leave a comment about this. IMO it'd be better to explicitly handle the None
case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed offline– I made the code skip the destructor if self._callable
is None
.
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com> Signed-off-by: shrekris-anyscale <92341594+shrekris-anyscale@users.noreply.github.com>
Signed-off-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
…yscale/ray into shrekris/run_del_during_init
Signed-off-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
Signed-off-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
Signed-off-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
Signed-off-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
Why are these changes needed?
This change makes replicas run the
__del__
method even if the constructor hasn't finished running yet.Related issue number
Closes #41606.
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.test_api.py
.