Enhancement request: Task class #20

Closed
arthur-tacca opened this issue Aug 31, 2022 · 3 comments


arthur-tacca commented Aug 31, 2022

Update

I have now put together my idea of a task-like class into its own package: aioresult. It includes a ResultCapture class that runs a function and stores the result (like the Task class below) and also a Future class for when you want to manually set the value. It has functions for waiting (though normally just using a nursery would do); the wait_any() and wait_all() are slightly different from those in trio-util, but trio-util ones would also work if you passed ResultCapture.run() to them.

Original post

It would be great to have a Trio Task class, a bit like asyncio.Task or any number of other frameworks' task classes.

This could be a core part of Trio itself and I think I've seen it requested before, but I've realised it could actually be a fairly simple external class so could be ideal for trio-util.

The key idea is that, in this class, the task gets run in a user-supplied nursery. There are a couple of ways to do this, but I think the most natural is forcing the user to run the task separately from creating it, with an async run() method that actually runs the task and that you can pass straight to Nursery.start_soon().

Here's a simple example that shows how you'd use this hypothetical Task object to run a few coroutines in a nursery, pretty similarly to usual, but then inspect their results later:

tasks = {url: Task(my_async_fetch_fn, url) for url in url_list}
async with trio.open_nursery() as nursery:
    for t in tasks.values():
        nursery.start_soon(t.run)
# The nursery block only exits once every task has finished.
for url, t in tasks.items():
    print(f"At {url} got: {t.result}")

This would satisfy a few common requests made for Trio, most notably for an equivalent of asyncio's gather() function or for Nursery.start_soon() to give a way to get the return value of the async function. Actually, this is better than gather() because you're not forced to use a list, as you can see from the example above. It also nicely complements the wait_any() and wait_all() functions in trio-util (although it still would mainly be used with nurseries as in the above example).

Here's a really minimal implementation (it wouldn't handle enough cases for real use but does illustrate the idea):

class Task:
    def __init__(self, routine, *args):
        self.routine = routine
        self.args = args
        self.result = None

    async def run(self):
        # Await the stored routine and keep its return value for later inspection.
        self.result = await self.routine(*self.args)

Extra features

As I said, the above implementation is absolutely minimal. There are quite a few extra features I think could be useful:

  • Attributes exposed as properties: t.result should be a read-only property rather than just directly exposing an attribute (and so should anything else that's part of the public API, e.g. t.args).
  • Run twice check: There should be a check (assert?) that run() isn't called twice for a given instance.
  • Record whether completed: The task should note internally whether it has completed (using try/finally in run()).
    • Then there could be a t.is_completed property (or method?)
    • Accessing the t.result property should throw an exception if the task hasn't completed yet (TaskNotCompletedException?)
    • You could wait for the task to complete with await t.wait_complete(). This could just work off of a trio.Event internally.
    • One option is to combine the interfaces for getting the result and waiting for completion: result = await t.wait_completed(). This is most like the asyncio Future class. Then you'd probably still want a sync API t.result_nowait() or similar. But it seems cleaner to me to leave getting the result and waiting for completion separate, and leaving it up to the caller to compose them together if they wish (explicit is better than implicit).
  • Record finishing in exception: If the task finishes in exception, that should be recorded too.
    • If t.result is accessed after the task finishes with an exception, the obvious thing to do is raise an exception from the property access. But this should definitely be wrapped in an outer exception (the original exception was already raised somewhere, so it doesn't make sense to raise it again in its original form; also, if the task was cancelled, you don't want that trio.Cancelled raised somewhere else because it won't be in the correct nursery). Perhaps it could be called ExceptionWasRaisedByTaskException? Hmm, maybe not...
    • Perhaps there should be a t.is_completed_with_exception property? Or perhaps best just to make the API for that to be that you access t.result and catch the exception.
  • kwargs: You could allow passing **kwargs as well as just *args. This solves another common request for Nursery.start_soon(). It does restrict what you could do with the interface to Task in future though (then again, a static factory method could be added if needed to work around that).
  • Monitor whether started: Allow using the Nursery.start() protocol, perhaps as a subclass of Task. Then users can wait for the task to start using await t.wait_started() and access the start return value via t.start_result.

Here are a few more for completeness, even though personally I don't like them:

  • Individual cancellation: The task could have its own cancel scope and cancel() method (or just property access to cancel scope). I'm actually against this one though, as it adds overhead to every task and you could just cancel the nursery that you've put them in.
  • Start in constructor: A slight variation on the interface would be to pass a nursery to the constructor, which would then call nursery.start_soon(self.run) itself. That would save a bit of code in the caller but personally I like the separation of the two steps (again, explicit is better than implicit), and it makes it obvious that nursery.start_soon() is still the right way to start things.
  • Task runner: (Suggested in one of the issues below) A separate helper class that wraps a nursery, but its start_soon() returns a task (instead of None). But I think just targeting the Task class at regular nurseries is much more useful (technically the task runner class doesn't prevent that, but it would confuse things IMO).

Relevant past issues

In trio-util:

In trio:

In other libraries:

Elsewhere:

Author

arthur-tacca commented Sep 21, 2022

I've put together an implementation:

https://gist.github.com/arthur-tacca/32c9b5fa81294850cabc890f4a898a4e

I've renamed it ResultCapture based on feedback in Trio issues, which I think nicely stresses that it's about getting the coroutine result rather than complex machinery for interdependent tasks.

Is there any interest here? Would it be worth me putting together a pull request?

Edit: Now in its own library: https://github.com/arthur-tacca/aioresult

Contributor

belm0 commented Sep 24, 2022

Hi-- sorry, I had mistakenly dropped taking a look at this from my TODO's.

Our application has pushed Trio fairly hard for 3 years (now 100k lines of code), and I haven't come across this kind of case enough to encapsulate it.

I was focusing on your original use case a little:

The reason I wanted something like this in the first place was so I could wait for completion of a task being spawned in a different nursery (I don't even want the result!) – see this gitter thread.

task = handler_nursery.start_soon(myhandler)
await task

I think it could be covered by just a utility function and the task_status protocol:

async def done_wrapper(f, *args, task_status=trio.TASK_STATUS_IGNORED):
    event = trio.Event()
    task_status.started(event)
    try:
        await f(*args)
    finally:
        event.set()

done = await handler_nursery.start(done_wrapper, myhandler)
await done.wait()

@arthur-tacca
Author

Thanks for looking at this @belm0

I was focusing on your original use case a little:
...

You're totally right that in my original use case I don't need 90% of what I'm suggesting. A wrapper around the handler that sets an Event is all that's needed. As you can see from that gitter thread, currently I'm just doing that as part of the wider function that uses the handler, but it felt a bit messy to mix that up with its core logic. (In truth, there's so little code that doing any refactoring at all has debatable value.)

Thanks for your done_wrapper() idea, I like it a lot. It avoids dumping all my code into one function, while being laser focused on actually solving my problem, rather than coming up with some super general API. Using the task_status protocol is a really clever way of achieving it.

There's certainly some interest in a general task / result capture class (as all the links in my post show, and actually it came up again today on another Trio gitter thread). But it's clear you're not interested in it in your library, which is totally fair enough, especially since there's a lot of debate about what the design would be (also clear from that thread). The gist I posted earlier exists if anyone wants to use it, and I might try to publish my code as a standalone package on PyPI (if I magically find some free time). So I'll close this issue.
