
Idea: service nurseries #1521

Open · njsmith opened this issue May 14, 2020 · 11 comments
@njsmith (Member) commented May 14, 2020

Extracting this idea from #147, because that's a monster thread and this seems worth discussing separately.

Idea: a "service nursery" is one where the background tasks are "services" that don't do anything on their own, but provide some service to the main task. Therefore:

  • background tasks are automatically cancelled when the nursery block exits
  • before the nursery block exits, background tasks are automatically shielded from cancellation (and maybe KeyboardInterrupt?)
  • mayyybe there's some special handling of exception propagation? if in the future we switch to always wrapping nursery exceptions in a multi-error-equivalent (MultiError v2 #611), AND multi-errors remain awkward and folks want to avoid them, then we could potentially do that with a service nursery by deeming that background tasks should never fail, so if they do then you get a special BackgroundServiceFailed error or something, and that way the main task body exception doesn't need to be wrapped in a multi-error? Dunno, this seems dubious to me.
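
For concreteness, here is a rough sketch of the first two bullets using only public Trio APIs; the name open_service_nursery_sketch and the start_service_soon callable are invented for illustration and are not a proposed API:

from contextlib import asynccontextmanager

import trio

@asynccontextmanager
async def open_service_nursery_sketch():
    async with trio.open_nursery() as nursery:
        service_scopes = []

        def start_service_soon(async_fn, *args):
            # Register the scope before the task runs, so it can still be
            # cancelled even if the nursery body exits before the task is
            # scheduled for the first time.
            scope = trio.CancelScope(shield=True)

            async def run_service():
                with scope:
                    await async_fn(*args)

            service_scopes.append(scope)
            nursery.start_soon(run_service)

        try:
            yield start_service_soon
        finally:
            # The main body is done (or raised): drop the shields and cancel
            # the background services; the enclosing nursery then waits for
            # them to finish unwinding.
            for scope in service_scopes:
                scope.shield = False
                scope.cancel()

Yielding a start function rather than the nursery itself keeps the sketch short; a real implementation would more likely wrap the nursery so existing start_soon call sites keep working.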

Examples that fit this pattern:

Straw man implementation: #147 (comment)

Questions:

  • Are those semantics right?
  • Should the special "service-ness" flag be attached to the nursery, or to individual tasks?
  • open_nursery(service=True) or open_service_nursery()?
  • control-C handling? (see also Rethink KeyboardInterrupt protection APIs #733)
@oremanj (Member) commented May 15, 2020

This currently exists (without the "auto cancel when nested child is done" semantics) as tricycle.open_service_nursery() (implementation is effectively unchanged from the linked strawman one). I use it all the time. I'd be fine adding the "auto cancel when nested child is done" and "wrap background service exceptions so that service nurseries don't MultiError-ize and thus can be wrapped in abstractions without impacting exception handling" features; I think they naturally complement the typical use cases.
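
A hedged usage sketch of that helper, assuming it behaves as a drop-in nursery replacement as described above; the task bodies are placeholder stand-ins:

import tricycle
import trio

async def pump_events():
    while True:              # stand-in for a real background service
        await trio.sleep(1)

async def main():
    async with tricycle.open_service_nursery() as nursery:
        nursery.start_soon(pump_events)  # shielded while the body below runs
        await trio.sleep(5)              # stand-in for the main work
    # On exit, pump_events is cancelled automatically.

trio.run(main)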

Shielding the "service" tasks from KI makes sense to me, and mirrors what we currently do for system tasks (which would become tasks in just another service nursery). It feels consistent to me with the fact that they're shielded from cancellation. It does mean that an infinite loop in a service task is harder to debug, but we can compensate for that with better deadlock debugging tools. And this is a mildly advanced-level feature anyway.

Should the special "service-ness" flag be attached to the nursery, or to individual tasks?

I hadn't even considered making it a property of individual tasks, but I like that a lot now that you mention it. Especially if we spell this as additional nursery methods start_service and start_service_soon. (I think passing service=True is kinda clunky by comparison.)

We would need to think about what happens when the nested child is done but some background non-service tasks aren't. I think the most consistent behavior would be to treat the nested child equal to a non-service task; only when the nested child and all non-service tasks are done do we unshield and cancel the service tasks. (Edge case: can one of the service tasks start a new non-service task while it's unwinding from that cancellation, or do we consider the nursery closed at that point? If the new non-service task then starts another service task, is it pre-cancelled?)

@njsmith (Member, Author) commented May 17, 2020

Another example where a service nursery might make sense: run_process runs background tasks to pump input/output to the child process. When run_process is cancelled, it transmits that to the child process, and then waits for the child to exit. Ideally, this happens immediately, but if it doesn't, then it would make sense for the child process to keep having access to its stdin while it's cleaning up, and for run_process to capture any stdout/stderr that the child produces while exiting. But right now, the pump tasks get cancelled immediately.
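
A hedged sketch of that shape, using trio.open_process plus the hypothetical open_service_nursery_sketch helper from the first comment; drain_stdout stands in for run_process's real pump logic:

import subprocess

import trio

async def drain_stdout(stream):
    # Keep capturing the child's output until it closes its end, even while
    # the parent is unwinding from a cancellation.
    while await stream.receive_some(4096):
        pass

async def run_and_capture(command):
    proc = await trio.open_process(command, stdout=subprocess.PIPE)
    async with open_service_nursery_sketch() as start_service_soon:
        start_service_soon(drain_stdout, proc.stdout)
        try:
            await proc.wait()
        except trio.Cancelled:
            # Pass the cancellation on to the child, then wait (shielded) for
            # it to exit; the pump is still shielded too, so it captures
            # whatever the child prints while shutting down.
            proc.terminate()
            with trio.CancelScope(shield=True):
                await proc.wait()
            raise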

@njsmith (Member, Author) commented May 18, 2020

I think the most consistent behavior would be to treat the nested child equal to a non-service task; only when the nested child and all non-service tasks are done do we unshield and cancel the service tasks.

Yeah, if we make it a per-task thing then this seems like the only reasonable semantics.

(Edge case: can one of the service tasks start a new non-service task while it's unwinding from that cancellation, or do we consider the nursery closed at that point? If the new non-service task then starts another service task, is it pre-cancelled?)

I think once services are cancelled, we would consider the nursery closed to new tasks (regardless of whether they're service tasks or regular tasks). The idea of "services" is that they're critical for the non-service tasks and useless otherwise, so it doesn't make sense to add new non-service tasks after the services have started shutting down, or add new services after the non-service tasks have exited.


Here's a surprising API design question: if we do want the service flag to be per-task, then what does that mean for the start_process(nursery, command, ...) API proposed over here?

The awkward thing is that because start_process hides the actual call to nursery.start, there's no easy way to customize it. So how would you start a service process? Would we need a service=True kwarg to start_process? (That would be especially weird if we decide we prefer nursery.start_service over nursery.start(..., service=True).)

Maybe we should reconsider making nursery.start(run_process, ...) work...
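
For illustration, the spelling being reconsidered might look like this in user code; whether run_process can actually be used as a nursery.start() target depends on the Trio version, so treat this as a sketch of the proposal rather than a settled API:

import trio

async def main():
    async with trio.open_nursery() as nursery:
        # Start the child in the background; nursery.start() hands back the
        # Process object once the child has been spawned.
        process = await nursery.start(trio.run_process, ["sleep", "2"])
        print("child pid:", process.pid)

trio.run(main)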

@njsmith (Member, Author) commented May 18, 2020

Let's think a little about terminology.

"Service" is pretty good, but I do worry that it might be confusing, since a lot of Trio's users come to it thinking "OK, I want to implement a (HTTP, ...) service, how do I do that", and if they see "start_service" they may think "ah-ha, that's what I was looking for". We can address that with docs ("this is for tasks that provide a service to other tasks, if you you're writing a service, then you want the serve_* functions"), but it's worth taking a minute to think if there's some way we can make it clear without docs.

"Daemon" is the most-relevant jargon I can think of, but I don't like it. On the one hand, it's totally opaque unless you know about random Unix/threading arcana. And on the other hand, if you do know about that arcana, it's misleading, because daemons are not generally things that get protected from cancellation and then automatically cancelled.

Any other ideas worth considering? "helper", maybe?

@smurfix (Contributor) commented May 18, 2020

Support? "Substrate" or "Foundation" (because the rest of the code requires its tasks to be there) might work too if we're talking about the nursery itself, but if we assign the service-ness to arbitrary tasks that sounds strange.

Actually I'm of two minds about this. I kind of like having to be explicit about starting a service nursery because you can then reason about which subtasks actually see it, and thus are able to start their own additional service tasks in that nursery – which could then no longer depend on the original services to be running.

@oremanj (Member) commented May 18, 2020

I think once services are cancelled, we would consider the nursery closed to new tasks (regardless of whether they're service tasks or regular tasks).

Great, that sounds good to me and will reduce the number of cases we need to think about.

Maybe we should reconsider making nursery.start(run_process, ...) work...

Works for me! I think there's something appealing API-orthogonality-wise about letting "block waiting for something to start, then obtain a handle for interacting with it" always be spelled nursery.start(), or if not always, then at least "as often as we can reasonably manage it".

Let's think a little about terminology.

I still like "service" best, though I agree it's a little generic. "Helper" to me feels very generic, to the point that it doesn't communicate anything at all -- almost every task in a Trio program could be considered a helper of something.

"Scaffold" could work too, but I don't like the connotation of flimsiness.

I kind of like having to be explicit about starting a service nursery because you can then reason about which subtasks actually see it, and thus are able to start their own additional service tasks in that nursery – which could then no longer depend on the original services to be running.

I'm not totally sure I'm understanding this correctly, but I think the thing you're pointing at here is only a problem if the provider of the service shares its nursery object with the code that consumes the service. My intuition is that that's kind of an antipattern anyway: nurseries are cheap to create, so the user code can just as easily create their own, and lumping multiple abstraction layers together in the same nursery makes it hard to sort through the debris when an exception gets thrown. (This is the same reason I don't like the pattern of supporting both async with open_foo(...): and Foo(my_nursery, ...) for an object that needs to run background tasks.) The Trio programs I write tend to have one-ish layer of the task tree per layer of the protocol they're speaking, and I've found this to work well -- it gives each layer an obvious place to do cleanup and closing tasks after the layers inside it are done or unwound. With the use of service nurseries/tasks, you get the desirable property that the layers unwind in the correct order.

@njsmith (Member, Author) commented May 27, 2020

Question: if you have multiple helper tasks in the same nursery, should they be cancelled concurrently, or sequentially?

async with trio.open_nursery() as nursery:
    await nursery.start_helper(helper1)
    await nursery.start_helper(helper2)  # helper2 depends on helper1

Of course you could also do

async with trio.open_nursery() as nursery1:
    await nursery1.start_helper(helper1)
    async with trio.open_nursery() as nursery2:
        await nursery2.start_helper(helper2)  # helper2 depends on helper1

...but that gets cumbersome quickly.

@oremanj (Member) commented May 27, 2020

Hmm. I was imagining "concurrently" but your point about system tasks in #1554 is well taken.

  • We could cancel in strict reverse order of spawning. That's kinda weird because the order in which the tasks started is potentially nondeterministic (if multiple start_service_soon calls were made in the same scheduling tick).
  • We could maintain an ordering that respects causality -- ie, a newly started service task is cancelled before whichever existing service tasks have actually taken a step already at the time when the new task is spawned. (This is easy enough to determine: task.coro.cr_frame.f_lasti >= 0.)
  • We could let the user specify a "priority" or "depth" or something for each service task, and cancel them in reverse order of that value.
  • We could say all service tasks in the same nursery are cancelled simultaneously, and if you want more control over the ordering then you need to use nested nurseries.

The downside of inferring too many cancellation ordering dependencies is that cancellations potentially take longer to complete and you have more opportunities to accidentally create deadlocks. Both of these are only relevant if you're using shielding of some sort (but that would include the "soft shields" discussed in #147). Also, it's hard to introduce additional parallelism in your cancellations if your service tasks aren't interdependent.

The downside of inferring too few cancellation ordering dependencies is that your program doesn't work, or works nondeterministically. But it's easy(ish) to enforce more ordering using nested nurseries.

I think "cancellation causality follows spawning causality" is likely to work best in practice, but it's hard to explain precisely. "Strict reverse order of spawning" is easy to explain, but hard to defeat when you want more parallelism. Manual priorities are kinda ugly, but all the service tasks in the same nursery should be cooperating with each other, so they're not inherently a non-starter. Other thoughts welcome!

@oremanj (Member) commented May 28, 2020

I just realized there is actually an escape hatch if the default is "cancel in reverse spawning order" and you really want multiple service tasks to be cancelled at the same time: start one service task that contains a nursery in which you run all of those as regular tasks. I don't think the difference will matter often in practice, so the fact that this is a little cumbersome seems fine to me.
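
That escape hatch, sketched with the hypothetical start_service_soon spelling from earlier in the thread (service_a and service_b are placeholders):

import trio

async def service_a():
    await trio.sleep_forever()

async def service_b():
    await trio.sleep_forever()

async def combined_services():
    # One *service* task whose children are ordinary tasks: they share a
    # single cancellation, so they go down together rather than in reverse
    # spawning order.
    async with trio.open_nursery() as inner:
        inner.start_soon(service_a)
        inner.start_soon(service_b)

# ...and then: start_service_soon(combined_services)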

One thought on service-ness of tasks vs the whole nursery: how does this relate to the eventual goal of not wrapping the nested child exception in an ExceptionGroup if all background tasks are service tasks? We could say "only use an ExceptionGroup if at least one non-service background task has ever been spawned", but then

async with trio.open_nursery():
    raise KeyError

won't produce an ExceptionGroup, which might be surprising. Or maybe it's fine? Another option would be to make ExceptionGroup-wrapping orthogonal to service-ness, with a flag on open_nursery().

@smurfix (Contributor) commented Jul 16, 2020

I wonder whether we shouldn't think larger than this.

A nontrivial "large" program consists of a bunch of building blocks which depend on each other. This relationship is not linear but a DAG; also, dependencies could come and go (one example is online reconfiguration: a message arrives that requires a database connection; the database connection wants to log errors to an error handler which may or may not exist already; the error handler needs to disconnect cleanly when it is no longer needed, or else a loud alarm will be raised …).

My heap of ad-hoc solutions to the problem got out of hand, so I wrote a library that intends to encapsulate this.
Right now it's anyio-centric but Trio-izing the thing is simple enough.

https://github.com/M-o-a-T/asyncscope

@njsmith (Member, Author) commented Jun 4, 2021

Speaking of naming, maybe "assistant" is a good word -- it emphasizes that they're secondary to the "main" tasks.
