Idea: service nurseries #1521

njsmith · 2020-05-14T08:40:03Z

Extracting this idea from #147, because that's a monster thread and this seems worth discussing separately.

Idea: a "service nursery" is one where the background tasks are "services" that don't do anything on their own, but provide some service to the main task. Therefore:

background tasks are automatically cancelled when the nursery block exits
before the nursery block exits, background tasks are automatically shielded from cancellation (and maybe KeyboardInterrupt?)
mayyybe there's some special handling of exception propagation? if in the future we switch to always wrapping nursery exceptions in a multi-error-equivalent (MultiError v2 #611), AND multi-errors remain awkward and folks want to avoid them, then we could potentially do that with a service nursery by deeming that background tasks should never fail, so if they do then you get a special BackgroundServiceFailed error or something, and that way the main task body exception doesn't need to be wrapped in a multi-error? Dunno, this seems dubious to me.

Examples that fit this pattern:

Trio's "system" nursery
The background nursery that trio-websocket creates for each connection, or any other protocol that requires some background task to drive the socket
The nursery fixture in pytest-trio
Independently suggested by @matham at https://gitter.im/python-trio/general?at=5ebceeaf2cf0da0ad90004d4

Straw man implementation: #147 (comment)

Questions:

Are those semantics right?
Should the special "service-ness" flag be attached to the nursery, or to individual tasks?
open_nursery(service=True) or open_service_nursery()?
control-C handling? (see also Rethink KeyboardInterrupt protection APIs #733)

The text was updated successfully, but these errors were encountered:

oremanj · 2020-05-15T07:30:09Z

This currently exists (without the "auto cancel when nested child is done" semantics) as tricycle.open_service_nursery() (implementation is effectively unchanged from the linked strawman one). I use it all the time. I'd be fine adding the "auto cancel when nested child is done" and "wrap background service exceptions so that service nurseries don't MultiError-ize and thus can be wrapped in abstractions without impacting exception handling" features; I think they naturally complement the typical use cases.

Shielding the "service" tasks from KI makes sense to me, and mirrors what we currently do for system tasks (which would become tasks in just another service nursery). It feels consistent to me with the fact that they're shielded from cancellation. It does mean that an infinite loop in a service task is harder to debug, but we can compensate for that with better deadlock debugging tools. And this is a mildly advanced-level feature anyway.

Should the special "service-ness" flag be attached to the nursery, or to individual tasks?

I hadn't even considered making it a property of individual tasks, but I like that a lot now that you mention it. Especially if we spell this as additional nursery methods start_service and start_service_soon. (I think passing service=True is kinda clunky by comparison.)

We would need to think about what happens when the nested child is done but some background non-service tasks aren't. I think the most consistent behavior would be to treat the nested child equal to a non-service task; only when the nested child and all non-service tasks are done do we unshield and cancel the service tasks. (Edge case: can one of the service tasks start a new non-service task while it's unwinding from that cancellation, or do we consider the nursery closed at that point? If the new non-service task then starts another service task, is it pre-cancelled?)

njsmith · 2020-05-17T19:55:31Z

Another example where a service nursery might make sense: run_process runs background tasks to pump input/output to the child process. When run_process is cancelled, it transmits that to the child process, and then waits for the child to exit. Ideally, this happens immediately, but if it doesn't, then it would make sense for the child process to keep having access to the stdin while it's cleaning up, and for run_process to capture any stdout/stderr that the child produces while exiting. But right now, the pump tasks get cancelled immediately.

njsmith · 2020-05-18T07:02:43Z

I think the most consistent behavior would be to treat the nested child equal to a non-service task; only when the nested child and all non-service tasks are done do we unshield and cancel the service tasks.

Yeah, if we make it a per-task thing then this seems like the only reasonable semantics.

(Edge case: can one of the service tasks start a new non-service task while it's unwinding from that cancellation, or do we consider the nursery closed at that point? If the new non-service task then starts another service task, is it pre-cancelled?)

I think once services are cancelled, we would consider the nursery closed to new tasks (regardless of whether they're service tasks or regular tasks). The idea of "services" is that they're critical for the non-service tasks and useless otherwise, so it doesn't make sense to add new non-service tasks after the services have started shutting down, or add new services after the non-service tasks have exited.

Here's a surprising API design question: if we do want to the service flag to be per-task, then what does that mean for the start_process(nursery, command, ...) API proposed over here?

The awkward thing is that because start_process hides the actual call to nursery.start, there's no easy way to customize it. So how would you start a service process? Would we need a service=True kwarg to start_process? (That would be especially weird if we decide we prefer nursery.start_service over nursery.start(..., service=True).)

Maybe we should reconsider making nursery.start(run_process, ...) work...

njsmith · 2020-05-18T08:05:04Z

Let's think a little about terminology.

"Service" is pretty good, but I do worry that it might be confusing, since a lot of Trio's users come to it thinking "OK, I want to implement a (HTTP, ...) service, how do I do that", and if they see "start_service" they may think "ah-ha, that's what I was looking for". We can address that with docs ("this is for tasks that provide a service to other tasks, if you you're writing a service, then you want the serve_* functions"), but it's worth taking a minute to think if there's some way we can make it clear without docs.

"Daemon" is the most-relevant jargon I can think of, but I don't like it. On the one hand, it's totally opaque unless you know about random Unix/threading arcana. And on the other hand, if you do know about that arcana, it's misleading, because daemons are not generally things that get protected from cancellation and then automatically cancelled.

Any other ideas worth considering? "helper", maybe?

smurfix · 2020-05-18T08:20:09Z

Support? "Substrate" or "Foundation" (because the rest of the code requires its tasks to be there) might work too if we're talking about the nursery itself, but if we assign the service-ness to arbitrary tasks that sounds strange.

Actually I'm of two minds about this. I kindof like having to be explicit about starting a service nursery because you can then reason about which subtasks actually see it, and thus are able to start their own additional service tasks in that nursery – which could then no longer depend on the original services to be running.

oremanj · 2020-05-18T10:47:23Z

I think once services are cancelled, we would consider the nursery closed to new tasks (regardless of whether they're service tasks or regular tasks).

Great, that sounds good to me and will reduce the number of cases we need to think about.

Maybe we should reconsider making nursery.start(run_process, ...) work...

Works for me! I think there's something appealing API-orthogonality-wise about letting "block waiting for something to start, then obtain a handle for interacting with it" always be spelled nursery.start(), or if not always, then at least "as often as we can reasonably manage it".

Let's think a little about terminology.

I still like "service" best, though I agree it's a little generic. "Helper" to me feels very generic, to the point that it doesn't communicate anything at all -- almost every task in a Trio program could be considered a helper of something.

"Scaffold" could work too, but I don't like the connotation of flimsiness.

I kindof like having to be explicit about starting a service nursery because you can then reason about which subtasks actually see it, and thus are able to start their own additional service tasks in that nursery – which could then no longer depend on the original services to be running.

I'm not totally sure I'm understanding this correctly, but I think the thing you're pointing at here is only a problem if the provider of the service shares its nursery object with the code that consumes the service. My intuition is that that's kind of an antipattern anyway: nurseries are cheap to create, so the user code can just as easily create their own, and lumping multiple abstraction layers together in the same nursery makes it hard to sort through the debris when an exception gets thrown. (This is the same reason I don't like the pattern of supporting both async with open_foo(...): and Foo(my_nursery, ...) for an object that needs to run background tasks.) The Trio programs I write tend to have one-ish layer of the task tree per layer of the protocol they're speaking, and I've found this to work well -- it gives each layer an obvious place to do cleanup and closing tasks after the layers inside it are done or unwound. With the use of service nurseries/tasks, you get the desirable property that the layers unwind in the correct order.

njsmith · 2020-05-27T09:18:34Z

Question: if you have multiple helper tasks in the same nursery, should they be cancelled concurrently, or sequentially?

async with trio.open_nursery() as nursery:
    await nursery.start_helper(helper1)
    await nursery.start_helper(helper2)  # helper2 depends on helper1

Of course you could also do

async with trio.open_nursery() as nursery1:
    await nursery1.start_helper(helper1)
    async with trio.open_nursery() as nursery2:
        await nursery2.start_helper(helper2)  # helper2 depends on helper1

...but that gets cumbersome quickly.

oremanj · 2020-05-27T09:59:56Z

Hmm. I was imagining "concurrently" but your point about system tasks in #1554 is well taken.

We could cancel in strict reverse order of spawning. That's kinda weird because the order in which the tasks started is potentially nondeterministic (if multiple start_service_soon calls were made in the same scheduling tick).
We could maintain an ordering that respects causality -- ie, a newly started service task is cancelled before whichever existing service tasks have actually taken a step already at the time when the new task is spawned. (This is easy enough to determine: task.coro.cr_frame.f_lasti >= 0.)
We could let the user specify a "priority" or "depth" or something for each service task, and cancel them in reverse order of that value.
We could say all service tasks in the same nursery are cancelled simultaneously, and if you want more control over the ordering then you need to use nested nurseries.

The downside of inferring too many cancellation ordering dependencies is that cancellations potentially take longer to complete and you have more opportunities to accidentally create deadlocks. Both of these are only relevant if you're using shielding of some sort (but that would include the "soft shields" discussed in #147). Also, it's hard to introduce additional parallelism in your cancellations if your service tasks aren't interdependent.

The downside of inferring too few cancellation ordering dependencies is that your program doesn't work, or works nondeterministically. But it's easy(ish) to enforce more ordering using nested nurseries.

I think "cancellation causality follows spawning causality" is likely to work best in practice, but it's hard to explain precisely. "Strict reverse order of spawning" is easy to explain, but hard to defeat when you want more parallelism. Manual priorities are kinda ugly, but all the service tasks in the same nursery should be cooperating with each other, so they're not inherently a non-starter. Other thoughts welcome!

oremanj · 2020-05-28T17:20:43Z

I just realized there is actually an escape hatch if the default is "cancel in reverse spawning order" and you really want multiple service tasks to be cancelled at the same time: start one service task that contains a nursery in which you run all of those as regular tasks. I don't think the difference will matter often in practice, so the fact that this is a little cumbersome seems fine to me.

One thought on service-ness of tasks vs the whole nursery: how does this relate to the eventual goal of not wrapping the nested child exception in an ExceptionGroup if all background tasks are service tasks? We could say "only use an ExceptionGroup if at least one non-service background task has ever been spawned", but then

async with trio.open_nursery():
    raise KeyError

won't produce an ExceptionGroup, which might be surprising. Or maybe it's fine? Another option would be to make ExceptionGroup-wrapping orthogonal to service-ness, with a flag on open_nursery().

smurfix · 2020-07-16T10:44:27Z

I wonder whether we shouldn't think larger than this.

A nontrivial "large" program consists of a bunch of building blocks which depend on each other. This relationship is not linear but a DAG; also, dependencies could come and go (one example is online reconfiguration, a message arrives that requires a database connection, the database connection wants to log errors to an error handler which may or may not exist already, the error handler needs to disconnect cleanly when it is no longer needed or else a loud alarm will be raised …).

My heap of ad-hoc solutions to the problem got out of hand, so I wrote a library that intends to encapsulate this.
Right now it's anyio-centric but Trio-izing the thing is simple enough.

https://github.com/M-o-a-T/asyncscope

njsmith · 2021-06-04T02:12:36Z

Speaking of naming, maybe "assistant" is a good word -- it emphasizes that they're secondary to the "main" tasks.

njsmith mentioned this issue May 18, 2020

Should we have a conventional way to "spawn a process into a nursery"? #1104

Closed

oremanj added the todo soon label May 19, 2020

njsmith mentioned this issue May 23, 2020

Raise deferred KeyboardInterrupt out of run() after cancelling all tasks #1537

Draft

njsmith mentioned this issue May 27, 2020

Maybe use a more efficient wakeup mechanism #1554

Open

oremanj mentioned this issue May 29, 2020

Add support for async generator finalization #1564

Merged

This was referenced Jun 7, 2020

Allow start_clock() to start the very first system task; make MockClock do so #1579

Closed

start() shouldn't support multiple tasks in the old_nursery #1599

Open

njsmith mentioned this issue Jun 15, 2020

Deprecate support for multiple tasks in the private old_nursery used by start() #1600

Draft

njsmith mentioned this issue May 7, 2021

nursery fixture is too error-prone python-trio/pytest-trio#110

Open

lordmauve mentioned this issue Sep 27, 2021

Allow creating tasks that automatically cancel when a nursery exits lordmauve/wasabi2d#70

Open

oremanj mentioned this issue Apr 21, 2022

Decide what to do about a single exception raised out of a nursery #2301

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Idea: service nurseries #1521

Idea: service nurseries #1521

njsmith commented May 14, 2020

oremanj commented May 15, 2020

njsmith commented May 17, 2020

njsmith commented May 18, 2020

njsmith commented May 18, 2020

smurfix commented May 18, 2020

oremanj commented May 18, 2020 •

edited

Loading

njsmith commented May 27, 2020

oremanj commented May 27, 2020

oremanj commented May 28, 2020

smurfix commented Jul 16, 2020

njsmith commented Jun 4, 2021

Idea: service nurseries #1521

Idea: service nurseries #1521

Comments

njsmith commented May 14, 2020

oremanj commented May 15, 2020

njsmith commented May 17, 2020

njsmith commented May 18, 2020

njsmith commented May 18, 2020

smurfix commented May 18, 2020

oremanj commented May 18, 2020 • edited Loading

njsmith commented May 27, 2020

oremanj commented May 27, 2020

oremanj commented May 28, 2020

smurfix commented Jul 16, 2020

njsmith commented Jun 4, 2021

oremanj commented May 18, 2020 •

edited

Loading