[Serve] Run __del__ even if constructor is still in-progress #45882

shrekris-anyscale
Contributor

Why are these changes needed?

This change makes replicas run the __del__ method even if the constructor hasn't finished running yet.
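For deployment authors, one consequence is that `__del__` may now run against an object whose `__init__` only partially completed. The following is a minimal sketch in plain Python (no Ray APIs are used; the class, method, and attribute names are hypothetical) of a destructor written defensively for that case:

```python
class PartiallyInitialized:
    """Hypothetical user class; stands in for a deployment's callable."""

    def __init__(self, fail_midway: bool = False):
        self.temp_files = ["a.tmp"]
        if fail_midway:
            # Simulates the constructor being interrupted before it finishes.
            raise RuntimeError("constructor interrupted")
        self.connection = "open"

    def cleanup(self) -> list:
        """Release whatever was set up; tolerate attributes that were never set."""
        released = []
        if getattr(self, "connection", None) is not None:
            self.connection = None
            released.append("connection")
        if getattr(self, "temp_files", None):
            self.temp_files = []
            released.append("temp_files")
        return released

    def __del__(self):
        # Safe even if __init__ never reached some of its assignments.
        self.cleanup()
```

Using `getattr` with a default means the destructor does not raise `AttributeError` when it runs before the constructor has set every instance variable.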

Related issue number

Closes #41606.

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
      • This change adds a unit test to test_api.py.

Signed-off-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
Contributor

@GeneDer GeneDer left a comment

Some non-blocker nitpicks, but thanks for addressing the issue Shreyas!

python/ray/serve/tests/test_api.py (3 review threads, outdated and resolved)
shrekris-anyscale and others added 4 commits June 11, 2024 15:39
Co-authored-by: Gene Der Su <gdsu@ucdavis.edu>
Signed-off-by: shrekris-anyscale <92341594+shrekris-anyscale@users.noreply.github.com>
Signed-off-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
shrekris-anyscale added the "go" label (add ONLY when ready to merge, run all tests) on Jun 17, 2024
python/ray/serve/_private/replica.py (review thread, outdated and resolved)

await self._metrics_manager.shutdown()

# We call the destructor last because the replica may not have
Contributor

Is the ordering between this and the metrics manager shutdown not important?

Contributor Author

I'll restore the original ordering once I change the destructor to log and return. In any case though, the metrics manager pushes autoscaling info and multiplex info to the controller. I don't think the ordering between that and the destructor is very relevant.


await self._metrics_manager.shutdown()

# We call the destructor last because the replica may not have
# initialized yet. The destructor may depend on instance variables
# that haven't been set yet, so it may raise an error.
Contributor

Also, do we want to raise an error if the user's destructor raises an error, or just log and return? I don't think the exception needs to be propagated to the controller for any purpose.

Contributor Author

@shrekris-anyscale shrekris-anyscale Jun 17, 2024

I think we should just log and return. I left a GitHub comment about that, but it didn't get published :/

Contributor Author

We do currently pass the exception back to the controller (code).

Contributor Author

I think it's still useful to pass the exception back to the controller for logging purposes. If users aren't running an external logging system, then logging the exception replica-side means the log only exists while the worker node is still alive. If the node gets downscaled after the replica dies, the exception gets lost.

Contributor Author

I decided to just log and return. I don't like having an implicit requirement that the __del__ step be the last part of the replica shutdown.
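The "log and return" behavior can be illustrated with a small standalone sketch. This is not the actual code in replica.py: `ReplicaSketch`, its `steps` list, and the `"metrics_shutdown"` marker are hypothetical stand-ins, and only the Python standard library is assumed.

```python
import asyncio
import inspect
import logging

logger = logging.getLogger("replica_sketch")


class ReplicaSketch:
    """Hypothetical stand-in for a replica wrapper; not Ray Serve's class."""

    def __init__(self, user_callable):
        self._callable = user_callable
        self._destructor_called = False
        self.steps = []  # records shutdown steps for illustration only

    async def call_destructor(self) -> None:
        # Only the first call actually invokes the user's destructor.
        if self._destructor_called:
            return
        self._destructor_called = True
        try:
            destructor = getattr(self._callable, "__del__", None)
            if destructor is not None:
                result = destructor()
                if inspect.isawaitable(result):
                    await result
        except Exception:
            # Log and return: the error is not propagated to the caller.
            logger.exception("User destructor raised; continuing shutdown.")
        self.steps.append("destructor")

    async def shutdown(self) -> None:
        await self.call_destructor()
        # Stand-in for the remaining shutdown work (e.g. metrics manager).
        self.steps.append("metrics_shutdown")
```

Because the exception is swallowed inside `call_destructor`, later shutdown steps run regardless of where the destructor call sits in the sequence.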

@@ -1162,7 +1166,6 @@ async def call_destructor(self):
Calling this multiple times has no effect; only the first call will actually
call the destructor.
"""
self._raise_if_not_initialized("call_destructor")
Contributor

The hasattr(self._callable, "__del__") check below is now implicitly also handling the case where self._callable is None (because __new__ hasn't been run yet). At a minimum, leave a comment about this. IMO it'd be better to explicitly handle the None case.

Contributor Author

We discussed offline. I made the code skip the destructor if self._callable is None.
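To make the distinction concrete: `hasattr(None, "__del__")` is `False` in Python, so a bare `hasattr` check does skip the destructor when no instance exists, but only implicitly. A sketch of the explicit form follows; the helper name `should_call_destructor` is hypothetical and does not come from the PR.

```python
from typing import Any, Optional


def should_call_destructor(user_callable: Optional[Any]) -> bool:
    """Decide whether a user-defined __del__ should be invoked."""
    if user_callable is None:
        # The user's object was never created, so there is nothing to destroy.
        return False
    # Plain objects do not define __del__; only call it when the user did.
    return hasattr(user_callable, "__del__")
```

Both forms return the same result for `None`; the explicit branch documents the intent instead of relying on `NoneType` lacking a `__del__` attribute.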

shrekris-anyscale and others added 5 commits June 17, 2024 14:04
Co-authored-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Signed-off-by: shrekris-anyscale <92341594+shrekris-anyscale@users.noreply.github.com>
Signed-off-by: Shreyas Krishnaswamy <shrekris@anyscale.com>
shrekris-anyscale merged commit b585ef0 into ray-project:master on Jun 18, 2024
6 checks passed
Labels
go (add ONLY when ready to merge, run all tests)
Development

Successfully merging this pull request may close these issues.

[Serve] __del__ method is not called when the replica dies during __init__
3 participants