Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle warnings in tests #751

Closed
wants to merge 36 commits into from
Closed

Conversation

blink1073
Copy link
Contributor

@blink1073 blink1073 commented Mar 6, 2022

Done so far:

  • Close fds in LocalProvisioner.wait to avoid unclosed fids
  • Destroy kernel client's context when calling stop_channels to avoid unclosed zmq context
  • Use separate future for shutdown to avoid conflict: kernel.shutdown_ready
  • Add guard around interrupting in KernelManager.shutdown
  • Clean up handling of event loops
  • Increase the number of available fds in tests
  • Miscellaneous teardown cleanups

So far the only warnings we can't easily work around are due to pytest-dev/pytest-asyncio#77. We'd have to rewrite all of the tests not to use TestCase.

@blink1073 blink1073 requested review from Zsailer and kevin-bates March 7, 2022 01:31
Copy link
Member

@kevin-bates kevin-bates left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for spending the necessary time on this @blink1073 - it's non-trivial and much appreciated! Just had a couple of comments.

Should we open an issue to move the remaining tests to pytest to rid ourselves of TestCase? Perhaps someone with some bandwidth could take that on as a good intro?

.github/workflows/main.yml Outdated Show resolved Hide resolved
Comment on lines 321 to 322
if self._created_context:
self.context.destroy()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should self._create_context be unset here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Comment on lines +494 to +495
if self.has_kernel:
await ensure_async(self.interrupt_kernel())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice - make this call to interrupt() conditional on start status. Good catch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏼

Comment on lines 20 to 32
if resource is not None:
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)

DEFAULT_SOFT = 4096
if hard >= DEFAULT_SOFT:
soft = DEFAULT_SOFT

old_soft, old_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
hard = old_hard
if old_soft < soft:
if hard < soft:
hard = soft
resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this a little difficult to understand. I think the double calls to getrlimit are part of the confusion. Perhaps a comment as to the goal would be helpful. It appears to ensure a minimal soft limit of DEFAULT_SOFT if the current hard limit is at least that much.

It seems like the hard < soft (L30) condition will never be true because old_soft < soft (L29) can only be true if soft was increased to DEFAULT_SOFT, which implies hard >= soft, which leads me to wonder some kind of update to hard is missing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleaned up

@blink1073
Copy link
Contributor Author

It looks like this breaks the server tests, so I'll have to do both PRs at once.

@blink1073
Copy link
Contributor Author

Okay, this part is done, stopping for today.

fut = Future()
except RuntimeError:
# No event loop running, use concurrent future
fut = CFuture()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is inside an async def, can this except branch ever occur? There should always be a running loop from inside a coroutine.

The previous strategy of setting _ready outside a coroutine might not have a running loop, but now that you've moved it into the coroutine (good, I think), there should always be a running loop.

Returning two types of Future depending on context would be tricky because they aren't interchangeable (CFuture is not awaitable without calling asyncio.wrap_future(cfuture)).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An important difference of this strategy, though: the attribute will not be set until an arbitrarily-later point in time due to asyncness. The previous strategy ensured ._ready was defined immediately before any waiting, whereas putting it in the coroutine means the attribute will not be defined immediately.

Two new things to consider (may well both be covered):

  1. previous waits for self._ready that may now be called when _ready is not yet
  2. can _ready be set and then re-set? If so, waits for _ready may get a 'stale' state before the new _ready future is attached. This can be avoided with a non-async delattr prior to the async bit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Originally _ready was only defined in the constructor, which is why we had the fallback option. It seems like these futures are becoming problematic in general. Maybe what we really want is a state and a signal when the state changes, like we have on the JavaScript side.

Copy link
Contributor Author

@blink1073 blink1073 Mar 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, if we want to define ready and shutdown_ready as "the most recent" starting and shutting down futures, we could also add an is_ready and is_shutdown_ready for when there are no pending of either. We would use this trick that tornado used to work around the new deprecation warnings:

loop = asyncio.get_event_loop_policy().get_event_loop()   # this always returns the same loop
future = asyncio.Future(loop=loop)

We could then make .ready and .shutdown ready futures only available when using the AsyncKernelManager.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha oh my, nevermind, I'm digesting https://bugs.python.org/issue39529

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think part of the problem is that jupyter_client is a library, not an application, in the same way that tornado is. It looks like we also can't rely on asyncio_run in the future, since it relies on get_event_loop and set_event_loop. I think here's a sketch of what we need to do:

  • The base classes will be asynchronous and call the the private async versions of functions directly as needed. We will only use futures and tasks from within async methods.
  • The synchronous classes will work by wrapping the asynchronous public methods using a decorator that runs a new private asyncio loop as return asyncio.new_event_loop().run_until_complete(async_method).
  • We remove the .ready future and add new methods called async def wait_for_pending_start() and async def wait_for_pending_shutdown() to handle pending kernels.

With this new structure, the synchronous managers could also support pending kernels, breaking it into two synchronous steps.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can handle the ready future in this PR and defer the other parts to a separate refactor PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, concurrent.futures are generally more compatible and flexible than asyncio futures - they are threadsafe, reusable across loops, etc. If you always use a cf.Future, you can await it in a coroutine with asyncio.wrap_future(cf_future). That may be the simplest solution here. It's the inconsistency that I saw as a problem (self._ready may be awaitable). Something like:

# earlier, _not_ in a coroutine
self._ready = cf.Future()

async def wait_for_ready(self):
    if self._ready is None:
        raise RuntimeError("haven't started yet!")
    # wrap cf.Future self._ready to make it awaitable
    await asyncio.wrap_future(self._ready)

ought to be more reliable.

To always use new event loops (or asyncio.run) for each sync method call, one important consideration is that all your methods completely resolve by the end of the coroutine (i.e. not hold references to the current loop for later calls, which means requiring pyzmq 22.2, among other things). A possibly simpler alternative is to maintain your own event loop reference. asyncio.get_event_loop() is deprecated because it did too many things, but that doesn't mean nobody should hold onto persistent loop references. It just means that they don't want to track a global not-running loop. If each instance had a self._loop = asyncio.new_event_loop() and always ran the sync wrappers in that loop (self._loop.run_until_complete(coro())), it should work fine. That avoids the multiple-loop problem for sync uses because it's still a single persistent loop. Each instance may happen to use a different loop, but that should not be a problem in the sync case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sweet, thanks for the insights Min. That sounds like a good approach.

Copy link
Contributor Author

@blink1073 blink1073 Mar 9, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More thoughts on ready future handling:

  • Remove the public ready attribute(s), but add public is_start_pending and is_shutdown_pending flags.
  • Internally, store current ready and pending flags for start and shutdown.
  • In start(), check for _start_pending and error if true. Set _start_pending and set a new _start_ready future. Clear _start_pending when ready.
  • In shutdown(), check for _shutdown_pending and bail if true. Set _shutdown_pending and set a new _shutdown_ready future. Clear _shutdown_pending when ready.
  • In wait_for_start_ready() and wait_for_shutdown_ready() store current ready, and wait for it. If new current ready is different from the stored one, recurse.

@blink1073
Copy link
Contributor Author

Rather than introduce more flags and more confusion, I propose we actually use a simple state machine as @echarles suggested.

Currently server adds "execution_state" to the manager object based on kernel status messages. I think eventually we should move that logic here, but it doesn't need to be in this PR.

The states used here could mirror the ones used in JupyterLab :

  • Unknown
  • Starting
  • Running (Maps to Idle and Busy execution states)
  • Terminating
  • Restarting
  • Dead (with optional traceback)

We are Unknown to start. State transitions are as follows:

  • Unknown -> Starting
  • Starting -> Running or Dead
  • Running -> Terminating, Restarting, or Dead
  • Terminating -> Dead
  • Restarting -> Running or Dead
  • Dead -> Starting

Then we have a state: Unicode() trait and a convenience wait_for_state() method that takes
an optional desired state or returns for any state change. It would also raise an exception if it
gets to Dead state and it wasn't the expected state, along with the traceback.

The state trait can also be observed directly.

We also have an exception: Unicode() trait that is set when an exception leads to a Dead status.

@echarles
Copy link
Member

Rather than introduce more flags and more confusion, I propose we actually use a simple state machine as @echarles suggested.

Maybe the first step would be to draw and share that state machine that would be highly beneficial to jupyter client, but also to e.g. the jupyterlab services package evolution.

Having a SVG format for that drawing would allow everyone to use their favorite editor (e.g. I use inkscape).

@davidbrochart
Copy link
Member

I have not looked at this PR in details, but it seems to me that supporting a sync API is becoming problematic. At the time we switched jupyter_client to async, I had the feeling that keeping a sync API was kind of mandatory, although it involved using nest-asyncio which is not without problems. But today, could jupyter_client be only async? That would simplify the library a lot.

@blink1073
Copy link
Contributor Author

Good call @echarles. I think we should adopt the transitions library. It is MIT licensed, already on conda-forge, and lets you produce diagrams from your state machine.

We could add a CI job that create the diagram and compares it to the one in the documentation, and fails and asks you to rebuild if it has changed.

@blink1073
Copy link
Contributor Author

blink1073 commented Mar 13, 2022

@davidbrochart, I think there are still valid use cases for having a synchronous manager. What we could say is that only the async class is meant to be overridable, and offer a function that wraps the async class to become synchronous. We would have a section like:

# These methods are meant to be overridden by subclasses
async def _start_kernel(...)

And the public functions would be async in the base class, but made synchronous in the sync class.
We would create our own sync manager(s) using this wrapper.
I think this would have to be an 8.0 release.

@echarles
Copy link
Member

echarles commented Mar 13, 2022

Good call @echarles. I think we should adopt the transitions library. It is MIT licensed, already on conda-forge, and lets you produce diagrams from your state machine.

Code-to-doc, that sounds exciting. Looks like transitions uses has not svg native support and uses graphviz to generate the diagram. As graphviz can generate SVG, that is all good.

We could add a CI job that create the diagram and compares it to the one in the documentation, and fails and asks you to rebuild if it has changed.

That sounds exiciting.

BTW I have been looking recently at xstate on the frontend area. Looks like state machine architecture is trendy atm.

@echarles
Copy link
Member

@davidbrochart, I think there are still valid use cases for having a synchronous manager

+1 Although async + sync is harder to maintain in jupyter-client, it benefits basic consumers of this library who don't have to worry about asynchronicity.

@minrk
Copy link
Member

minrk commented Mar 14, 2022

I think keeping sync is important.

It's also possible to avoid nest_asyncio if you run 'asyncio-for-sync' in a dedicated IO thread. call_soon_threadsafe and concurrent.futures.Future make this pretty doable, and in my experience simpler and more robust than using nest_asyncio. ipyparallel's sync client has worked that way for a long time.

@blink1073
Copy link
Contributor Author

I opened #755 to track the nest_asyncio problem.

@blink1073
Copy link
Contributor Author

Closing this one in favor of #760 and a follow up in #763

@blink1073 blink1073 closed this Mar 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants