
MultiError v2 #611

Closed
njsmith opened this issue Aug 18, 2018 · 38 comments
njsmith commented Aug 18, 2018

MultiError is the one part of trio's core design that I'm really not satisfied with. Trying to support multiple exceptions on top of the language's firm one-exception-at-a-time stance raises (heh) all kinds of issues. Some of them can probably be fixed by changing the design. But the hardest problem is that there are lots of third-party packages that want to do custom handling of exception tracebacks (e.g. ipython, pytest, raven/sentry). And right now we have to monkeypatch all of them to work with MultiError, with more or less pain and success.

Now @1st1 wants to add nurseries to asyncio, and as part of that he'll need to add something like MultiError to the stdlib. We talked a bunch about this at PyCon, and the plan is to figure out how to do this in a way that works for both asyncio and trio. Probably we'll start by splitting MultiError off into a standalone library, that trio and uvloop can both consume, and then add that library to the stdlib in 3.8 (and then the library will remain as a backport library for those using trio on 3.7 and earlier). This way asyncio can build on our experience, and trio can get out of the monkeypatching business (because if MultiErrors are in the stdlib, then it becomes ipython/pytest/sentry's job to figure out how to cope with them).

But before we can do that we need to fix the design, and do it in writing so we (and Yury) can all see that the new design is right :-).

Current design

[If this were a PEP I'd talk more about the basic assumptions underlying the design: multiple errors can happen concurrently, you need to preserve that fact, you need to be able to catch some-but-not-all of those errors, you need to make sure that you don't accidentally throw away any errors that you didn't explicitly try to catch, etc. But this is a working doc so I'm just going to dive into the details...]

Currently, trio thinks of MultiError objects as being ephemeral things. It tries as much as possible to simulate a system where multiple exceptions just happen to propagate next to each other. So it's important to keep track of the individual errors and their tracebacks, but the MultiError objects themselves are just a detail needed to accomplish this.

So, we only create MultiErrors when there are actually multiple errors – if a MultiError has only a single exception, we "collapse" it, so MultiError([single_exc]) is single_exc. The basic primitive for working with a MultiError is the filter function, which is really a kind of weird flat-map-ish kind of thing: it runs a function over each of the "real" exceptions inside a MultiError, and can replace or remove any of them. If this results in any MultiError object that has zero or one child, then filter collapses it. And the catch helper, MultiError.catch, is a thin wrapper for filter: it catches an exception, then runs a filter over it, and then reraises whatever is left (if anything).
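To make this concrete, here's a toy sketch of the current collapsing behavior. This is not trio's real implementation; the class internals and the standalone filter function are simplified stand-ins (real trio also does traceback patching, discussed below):

```python
# Simplified stand-ins for trio's MultiError and MultiError.filter,
# just to illustrate collapsing.

class MultiError(BaseException):
    def __new__(cls, exceptions):
        exceptions = list(exceptions)
        if len(exceptions) == 1:
            # "collapse": MultiError([single_exc]) is single_exc
            return exceptions[0]
        self = BaseException.__new__(cls)
        self.exceptions = exceptions
        return self

def filter(handler, exc):
    # Run handler over each leaf exception; the handler returns a
    # replacement exception, or None to drop that leaf entirely.
    if isinstance(exc, MultiError):
        kept = []
        for sub in exc.exceptions:
            result = filter(handler, sub)
            if result is not None:
                kept.append(result)
        if not kept:
            return None
        return MultiError(kept)  # collapses again if one child remains
    return handler(exc)

# Dropping the KeyErrors collapses the whole tree down to a bare ValueError:
tree = MultiError([KeyError(), MultiError([ValueError(), KeyError()])])
remainder = filter(lambda e: None if isinstance(e, KeyError) else e, tree)
```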

One more important detail: traceback handling. When you have a nested collection of MultiErrors, e.g. MultiError([RuntimeError(), MultiError([KeyError(), ValueError()])]), then the leaf exceptions' __traceback__ attr holds the traceback for the frames where they traveled independently before meeting up to become a MultiError, and then each MultiError object's __traceback__ attr holds the frames that that particular MultiError traversed. This is just how Python's __traceback__ handling works; there's no way to avoid it. But that's OK, it's actually quite convenient – when we display a traceback, we don't want to say "exception 1 went through frames A, B, C, D, and independently, exception 2 went through frames A', B, C, D" – it's more meaningful, and less cluttered, to say "exception 1 went through frame A, and exception 2 went through frame A', and then they met up and together they went through frames B, C, D". The way __traceback__ data ends up distributed over the MultiError structure makes this structure really easy to extract.

Proposal for new design

Never collapse MultiErrors. Example:

async def some_func():
    async with trio.open_nursery() as nursery:
        async with trio.open_nursery() as nursery:
            raise RuntimeError

If you do await some_func() then currently you get a RuntimeError; in this proposal, you'll instead get a MultiError([MultiError([RuntimeError()])]).

Get rid of filter, and replace it with a new primitive split. Given an exception and a predicate, split splits the exception into one half representing all the parts of the exception that match the predicate, and another half representing all the parts that don't match. Example:

match, rest = MultiError.split(
  # simple predicate: isinstance(exc, RuntimeError)
  RuntimeError,
  # The exception being split:
  MultiError([
    RuntimeError("a"),
    MultiError([
      RuntimeError("b"),
      ValueError(),
    ]),
  ])
)

# produces:

match = MultiError([RuntimeError("a"), MultiError([RuntimeError("b")])])
rest = MultiError([MultiError([ValueError()])])

The split operation always takes an exception type (or tuple of types) to match, just like an except clause. It should also take an optional arbitrary function predicate, like match=lambda exc: ....

If either match or rest is empty, it gets set to None. It's a classmethod rather than a regular method so that you can still use it in cases where you have an exception but don't know whether it's a MultiError or a regular exception, without having to check.
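A rough sketch of how split might be implemented under these rules, assuming a non-collapsing MultiError that keeps its children in a hypothetical .exceptions list. A real version would also clone the intermediate nodes so that their __traceback__/__context__ data ends up on both halves; that's omitted here:

```python
class MultiError(BaseException):
    def __init__(self, exceptions):
        self.exceptions = list(exceptions)

def split(exc_type, exc):
    """Return (match, rest); either side is None if empty. Never collapses."""
    if isinstance(exc, MultiError):
        matches, rests = [], []
        for sub in exc.exceptions:
            m, r = split(exc_type, sub)
            if m is not None:
                matches.append(m)
            if r is not None:
                rests.append(r)
        return (MultiError(matches) if matches else None,
                MultiError(rests) if rests else None)
    elif isinstance(exc, exc_type):
        return exc, None
    else:
        return None, exc

tree = MultiError([RuntimeError("a"),
                   MultiError([RuntimeError("b"), ValueError()])])
match, rest = split(RuntimeError, tree)
# match keeps the nesting: MultiError([RuntimeError("a"),
#                                      MultiError([RuntimeError("b")])])
# rest: MultiError([MultiError([ValueError()])])
```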

Catching MultiErrors is still done with a context manager, like with MultiError.catch(RuntimeError, handler). But now, catch takes a predicate + a handler (as opposed to filter, which combines these into one thing), uses the predicate to split any caught error, and then if there is a match it calls the handler exactly once, passing it the matched object.

Also, support async with MultiError.acatch(...) so you can write async handlers.

Limitations of the current design, and how this fixes them

Collapsing is not as helpful to users as you might think

A "nice" thing about collapsing out MultiErrors is that most of the time, when only one thing goes wrong, you get a nice regular exception and don't need to think about this MultiError stuff. I say "nice", but really this is... bad. When you write error handling code, you want to be prepared for everything that could happen, and this design makes it very easy to forget that MultiError is a possibility, and hard to figure out where MultiError handling is actually required. If the language made handling MultiErrors more natural/ergonomic, this might not be as big an issue, but that's just not how Python works. So Yury is strongly against the collapsing design, and he has a point.

Basically, seeing MultiError([RuntimeError()]) tells you "ah, this time it was a single exception, but it could have been multiple exceptions, so I'd better be prepared to handle that".

This also has the nice effect that it becomes much easier to teach people about MultiError, because it shows up front-and-center the first time you have an error inside a nursery.

One of my original motivations for collapsing was that trio.run has a hidden nursery (the "system nursery") that the main task runs inside, and if you do trio.run(main) and main raises RuntimeError, I wanted trio.run to raise RuntimeError as well, not MultiError([RuntimeError()]). But actually this is OK, because the way things have worked out, we never raise errors through the system nursery anyway: either we re-raise whatever main raised, or we raise TrioInternalError. So my concern was unfounded.

Collapsing makes traceback handling more complicated and fragile

Collapsing also makes the traceback handling code substantially more complicated. When filter simplifies a MultiError tree by removing intermediate nodes, it has to preserve the traceback data those nodes held, which it does by patching it into the remaining exceptions. (In our example above, if exception 2 gets caught, then we patch exception 1's __traceback__ so that it shows frames A, B, C, D after all.) This all works, but it makes the implementation much more complex. If we don't collapse, then we can throw away all the traceback patching code: the tracebacks can just continue to live on whichever object they started out on.

Collapsing also means that filter is a destructive operation: it has to mutate the underlying exception objects' __traceback__ attributes in place, so you can't like, speculatively run a filter and then change your mind and go back to using the original MultiError. That object still exists but after the filter operation it's now in an inconsistent state. Fine if you're careful, but it'd be nicer if users didn't have to be careful. If we don't collapse, then this isn't an issue: split doesn't have to mutate its input (and neither would filter, if we were still using filter).

Collapsing loses __context__ for intermediate MultiError nodes

Currently, Trio basically ignores the __context__ and __cause__ attributes on MultiError objects. They don't get assigned any semantics, they get freely discarded when collapsing, and they often end up containing garbage data. (In particular, if you catch a MultiError, handle part of it, and re-raise the remainder... the interpreter doesn't know that this is semantically a "re-raise", and insists on sticking the old MultiError object onto the new one's __context__. We have countermeasures, but it's all super annoying and messy.)

It turns out though that we do actually have a good use for __context__ on MultiErrors. It's super not obvious, but consider this scenario: you have two tasks, A and B, executing concurrently in the same nursery. They both crash. But! Task A's exception propagates faster, and reaches the nursery first. So the nursery sees this, and cancels task B. Meanwhile, task B has blocked somewhere – maybe it's trying to send a graceful shutdown message from a finally: block or something. The cancellation interrupts this, so now task B has a Cancelled exception propagating, and that exception's __context__ is set to the original exception in task B. Eventually, the Cancelled exception reaches the nursery, which catches it. What happens to task B's original exception?

Currently, in Trio, it gets discarded. But it turns out that this is not so great – in particular, it's very possible that task A and B were working together on something, task B hit an error, and then task A crashed because task B suddenly stopped talking to it. So here task B's exception is the actual root cause, and task A's exception is detritus. At least two people have hit this in Trio (see #416 and python-trio/pytest-trio#30).

In the new design, we should declare that a MultiError object's __context__ holds any exceptions that were preempted by the creation of that MultiError, i.e., by the nursery getting cancelled. We'd basically just look at the Cancelled objects, and move their __context__ attributes onto the MultiError that the nursery was raising. But this only works if we avoid collapsing.

It would be nice if tracebacks could show where exceptions jumped across task boundaries

This has been on our todo list forever. It'd be nice if we could like... annotate tracebacks somehow?

If we stopped collapsing MultiErrors, then there's a natural place to put this information: each MultiError corresponds to a jump across task boundaries, so we can put it in the exception string or something. (Oh yeah, maybe we should switch MultiErrors to having associated message strings? Currently they don't have that.)

Filtering is just an annoying abstraction to use

If I wanted to express exception catching using a weird flat-map-ish thing, I'd be writing haskell. In Python it's awkward and unidiomatic. But with filter, it's necessary, because you could have any number of exceptions you need to call it on.

With split, there's always exactly 2 outputs, so you can perform basic MultiError manipulations in straight-line code without callbacks.
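For example, a toy flat version of split (hypothetical; a plain list stands in for a real MultiError tree) shows the straight-line style:

```python
def split(exc_type, excs):
    # Partition a flat list of exceptions by type; None marks an empty side.
    match = [e for e in excs if isinstance(e, exc_type)]
    rest = [e for e in excs if not isinstance(e, exc_type)]
    return (match or None, rest or None)

errors = [KeyError(), ValueError(), KeyError()]
matched, rest = split(KeyError, errors)
# Handle what we matched, then deal with the remainder -- no callbacks:
handled = [type(e).__name__ for e in matched]
leftover = [] if rest is None else [type(e).__name__ for e in rest]
```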

Tracebacks make filtering even more annoying to use than it would otherwise be

When filter maps a function over a MultiError tree, the exceptions passed in are not really complete, standalone exceptions: they only have partial tracebacks attached. So you have to handle them carefully. You can't raise or catch them – if you did, the interpreter would start inserting new tracebacks and make a mess of things.

You might think it was natural to write a filter function using a generator, like how @contextmanager works:

def catch_and_log_handler():
    try:
        yield
    except Exception as exc:
        log.exception(exc)

def normalize_errors_handler():
    try:
        yield
    except OtherLibraryError as exc:
        raise MyLibraryError(...) from exc

But this can't work, because the tracebacks would get all corrupted. Instead, handlers take exceptions as arguments, and return either that exception object, or a new exception object (like MyLibraryError).

If a handler function does raise an exception (e.g., b/c of a typo), then there's no good way to cope with that. Currently Trio's MultiError code doesn't even try to handle this.

In the proposed design, all of these issues go away. The exceptions returned by split are always complete and self-contained. Probably for MultiError.catch we will still pass in the exception as an argument instead of using a generator and .throwing it – the only advantage of the .throw is that it lets you use an except block to say which exceptions you want to catch, and with the new MultiError.catch we've already done that before we call the handler. But we can totally allow raise as a way to replace the exception, or handle accidental exceptions. (See the code below for details.)

Async catching

Currently we don't have an async version of filter or catch (i.e., one where the user-specified handler can be async). Partly this is because when I was first implementing this I hit an interpreter bug that made it not work, but it's also because filter's implementation is extremely complicated and maintaining two copies makes it that much worse.

With the new design, there's no need for an async split, and I think the new catch logic makes supporting both sync and async easy (see below).

Details

# A classic "catch-all" handler, very common in servers
with MultiError.catch(Exception, logger.exception):
    ...

# is equivalent to:

try:
    ...
except BaseException as exc:
    caught, rest = MultiError.split(Exception, exc)
    if caught is None:
        raise
    try:
        logger.exception(caught)
    except BaseException as exc:
        # The way we set exc.__context__ here isn't quite right...
        # Ideally we should stash it in the interpreter's implicit exception
        # handling context state before calling the handler.
        if rest is None:
            try:
                raise
            finally:
                exc.__context__ = caught
        else:
            exc.__context__ = caught
            new_exc = MultiError([exc, rest])
            try:
                raise new_exc
            finally:
                new_exc.__context__ = None
                # IIRC this traceback fixup doesn't actually work b/c
                # of interpreter details, but maybe we can fix that in 3.8
                new_exc.__traceback__ = new_exc.__traceback__.tb_next
    else:
        if rest is not None:
            orig_context = rest.__context__
            try:
                raise rest
            finally:
                rest.__context__ = orig_context
                # See above re: traceback fixup
                rest.__traceback__ = rest.__traceback__.tb_next

Notes:

As noted in comments, __context__ and __traceback__ handling is super finicky and has subtle bugs. Interpreter help would be very... helpful.

Notice that all the logic around the logger.exception call is always synchronous and can be factored into a context manager, so we can do something like:

def catch(type, handler, match=None):
    with _catch_impl(type, match=match) as caught:
        handler(caught)

async def acatch(type, handler, match=None):
    with _catch_impl(type, match=match) as caught:
        await handler(caught)

Other notes

Subclassing

Do we want to support subclassing of MultiError, like class NurseryError(MultiError)? Should we encourage it?

If so, we need to think about handling subclasses when cloning MultiErrors in .split.

I think we should not support this though. Because, we don't have, and don't want, a way to distinguish between a MultiError([MultiError([...])]) and a MultiError([NurseryError([...])]) – we preserve the structure, it contains metadata, but still it's structural. split and catch still only allow you to address the leaf nodes. And that's important, because if we made it easy to match on structure, then people would do things like try to catch a MultiError([MultiError([RuntimeError()])]), when what they should be doing is trying to catch one-or-more-RuntimeErrors. The point of keeping the MultiErrors around instead of collapsing is to push you to handle this case, not continue to hard-code assumptions about there being only a single error.
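In other words, handling code should only ever walk the leaves, regardless of how the tree happens to be nested. A hypothetical helper (not part of the proposed API) makes the idea concrete:

```python
class MultiError(BaseException):
    def __init__(self, exceptions):
        self.exceptions = list(exceptions)

def iter_leaves(exc):
    # Yield the leaf exceptions, ignoring the structural MultiError nodes.
    if isinstance(exc, MultiError):
        for sub in exc.exceptions:
            yield from iter_leaves(sub)
    else:
        yield exc

# Both leaves are visited, no matter how deeply they're wrapped:
tree = MultiError([MultiError([RuntimeError("x")]), RuntimeError("y")])
leaves = [str(e) for e in iter_leaves(tree)]
```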

Naming

MultiError isn't bad, but might as well consider other options while we're redoing everything. AggregateError? CombinedError? NurseryError?

Relevant open issues

#408, #285, #204, #56, python-trio/pytest-trio#30


1st1 commented Aug 18, 2018

Thanks for writing this up! I'm indeed thinking about how we can implement MultiErrors in asyncio, and the more I think about this the more I'm leaning towards not adding a full-blown MultiError semi-standard API at all. What I think would be enough instead is to have a new standard attribute on Exception objects to represent a list of exceptions this exception instance wraps. Let's assume we name that new attribute __errors__. If Python error displayhook sees an exception with __errors__ it knows that it will have to render all of them. This is it. It's now up to libraries (Trio, pytest, asyncio, etc) how exactly they want to design their MultiErrors; it's up to tools (like Sentry) how exactly they want to render multi-error-like exception instances. I think this would be a relatively straightforward change for Python 3.8 that would be easy to push through python-ideas/python-dev. Pushing a full MultiError API, on the other hand, will be a long and excruciating experience.
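A sketch of what that attribute protocol might look like in practice (the __errors__ name and this toy renderer are illustrative, not a settled design):

```python
def render(exc, indent=0):
    # Recursively render an exception plus anything it wraps in __errors__.
    lines = ["  " * indent + f"{type(exc).__name__}: {exc}"]
    for sub in getattr(exc, "__errors__", ()):
        lines.extend(render(sub, indent + 1))
    return lines

# Any exception type can participate, just by growing the attribute:
err = RuntimeError("nursery failed")
err.__errors__ = [KeyError("k"), ValueError("v")]
output = render(err)
```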


njsmith commented Aug 19, 2018

@1st1 Hmm, yeah, our strategy there is an important question. I'm not sure I agree that pushing __errors__ through will be the easiest approach – that's changing one of the language's core protocols, and in a sort of vague way, while by contrast we can spin MultiError as adding a single concrete class that's needed by a concrete stdlib user. And even better if we can spin it as "here's an existing library, it already works, no need to bikeshed the details, we're just moving it into the stdlib so asyncio can use it". (There's plenty of precedent for this, e.g. ipaddress.)

Anyway, there'll be plenty of time later to decide on our python-dev strategy. For right now... you're going to need a "full-blown MultiError API" in any case, right, in some namespace or another? And IIRC you were hoping to prototype its usage in uvloop, right, so its initial implementation will be outside the stdlib? I think it'd be great to collaborate on this so it works for both asyncio and trio, and for right now it seems like the way to do that is to make a multierror library on pypi and use it for prototyping, even if that might not be the form we eventually use to present it to python-dev. What do you think?


njsmith commented Aug 19, 2018

(I love the new profile picture by the way, that's a great photo)


njsmith commented Aug 20, 2018

Played around with this a bit tonight and re:

It'd be nice if we could like... annotate tracebacks somehow?
If we stopped collapsing MultiErrors, then there's a natural place to put this information: each MultiError corresponds to a jump across task boundaries, so we can put it in the exception string or something. (Oh yeah, maybe we should switch MultiErrors to having associated message strings? Currently they don't have that.)

...it immediately became clear that what you actually want is not the ability to annotate a MultiError, but rather to annotate each of the exceptions inside the MultiError. Like, "this came from child task X", "this came from the main task".

njsmith added a commit to njsmith/trio that referenced this issue Sep 3, 2018
Old way:

- exceptions and regular returns from main were captured, then
  re-raised/returned from init, then captured again, then
  re-raised/returned from run()
- exceptions in system tasks were converted into
  TrioInternalErrors (one-at-a-time, using MultiError.filter()),
  except for a whitelist of types (Cancelled, KeyboardInterrupt,
  GeneratorExit, TrioInternalError), and then re-raised in the init
  task.
- exceptions in the run loop machinery were caught in run(), and
  converted into TrioInternalErrors (if they weren't already).

New way:

- exceptions and regular returns from main are captured, and then
  re-raised/returned from run() directly
- exceptions in system tasks are allowed to propagate naturally into
  the init task
- exceptions in the init task are re-raised out of the run loop
  machinery
- exceptions in the run loop machinery are caught in run(), and
  converted into TrioInternalErrors (if they aren't already).

This needs one new special case to detect when spawning the main task
itself errors, and treating that as a regular non-TrioInternalError,
but otherwise it simplifies things a lot. And, it removes 2
unnecessary traceback frames from every trio traceback.

Removing the special case handling for some exception types in system
tasks did break a few tests. It's not as bad as it seems though:

- Cancelled was allowed through so it could reach the system nursery's
  __aexit__; that still happens. But now if it's not caught there, it
  gets converted into TrioInternalError instead of being allowed to
  escape from trio.run().
- KeyboardInterrupt should never happen in system tasks anyway; not
  sure why we had a special case to allow this.
- GeneratorExit should never happen; if it does, it's probably because
  things blew up real good, and then the system task coroutine got
  GC'ed, and called coro.close(). In this case letting it escape is the
  right thing to do; coro.close() will catch it. In other cases,
  letting it escape and get converted into a TrioInternalError is
  fine.
- Letting TrioInternalError through works the same as before.

Also, if multiple system tasks crash, we now get a single
TrioInternalError with the original MultiError as a __cause__, rather
than a MultiError containing multiple TrioInternalErrors. This is
probably less confusing, and it's more compatible with the #611
approach to things.

njsmith commented Sep 15, 2018

Looking at @belm0's example code here, I realized that this proposal is also going to complicate code that "hides" a nursery inside a custom context manager: python-trio/trio-websocket#20 (comment)

This code looks innocuous:

async with open_websocket("https://...") as ws:
    raise ValueError

but if open_websocket has a nursery hidden inside it, then you'll get a MultiError([ValueError]) instead of a ValueError, and that's going to surprise people, because it's mostly an implementation detail that open_websocket creates a nursery.

We can't hide this entirely, because of #264 – if open_websocket has a nursery inside it, then you can't yield inside open_websocket. And to some extent, this is unavoidable, because it is an inescapable fact of life that if open_websocket has a background task, that task might crash, and then we need to cope with propagating that error out.

But, for the open_websocket case, there is another pattern possible: we could say that well, we've carefully designed our websocket library so that the background task should never crash – if it encounters an error, then it catches it and arranges for the error to be delivered the next time someone tries to send or receive. Of course, it's still possible for the background task to crash due to a bug in the websocket library... but that's a bug, so it can be handled a bit differently than a regular error. In this case, it might make sense to say:

  • An unhandled exception inside the body of the async with block propagates through without being wrapped in a MultiError
  • If there's an unhandled exception in a background task (which should never happen, and indicates a bug in our library), we signal that by converting it into a MyWebsocketLibraryInternalError (which might be-a or have-a MultiError)

In fact, this is exactly the pattern used by Trio's internal "system nursery" (with the main task playing the role of the async with block body, and TrioInternalError playing the part of MyWebsocketLibraryInternalError).

Maybe it should be something we support explicitly, e.g. with a custom nursery type that implements the above logic, or some sort of setting passed to open_nursery.
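A sketch of that pattern with stand-in names (ShieldedScope and MyWebsocketLibraryInternalError are hypothetical; a real version would be an async context manager wrapping a nursery):

```python
class MyWebsocketLibraryInternalError(Exception):
    pass

class ShieldedScope:
    # Body exceptions propagate unwrapped; background-task errors (which
    # "should never happen") are converted into a library-internal error.
    def __init__(self):
        self.background_errors = []

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if self.background_errors:
            raise MyWebsocketLibraryInternalError(self.background_errors) from exc
        return False  # let the body's exception propagate as-is

try:
    with ShieldedScope():
        raise ValueError("user error in the body")
except ValueError as e:
    caught = e
```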


njsmith commented Sep 30, 2018

ExceptionGroup([SystemExit()]) needs special handling in the interpreter (the same special handling that SystemExit gets, which is something like: don't print a traceback, and if the payload is an integer then use that as the process exit code, otherwise print it?).

This is a nice example too because it forces us to think about how we can access attributes of embedded exceptions.
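A sketch of the unwrapping logic a top-level runner might want here (toy ExceptionGroup stand-in; the real behavior would have to live in the interpreter's exit machinery):

```python
class ExceptionGroup(BaseException):
    def __init__(self, exceptions):
        self.exceptions = list(exceptions)

def exit_code_for(exc):
    # Mimic SystemExit's special-casing: an integer payload becomes the
    # process exit code; any other payload would be printed and exit with 1.
    if isinstance(exc, ExceptionGroup):
        subs = exc.exceptions
        if len(subs) == 1 and isinstance(subs[0], SystemExit):
            return exit_code_for(subs[0])
        return None  # mixed/multiple errors: no special handling applies
    if isinstance(exc, SystemExit):
        return exc.code if isinstance(exc.code, int) else 1
    return None

code = exit_code_for(ExceptionGroup([SystemExit(3)]))
```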

@thejohnfreeman

Subprocesses might be too big of a first bite for me, but I'd like to take a stab at this (much easier since it should be doable in pure Python). Here is my work in progress. I would like to add more tests of the contract before trying to integrate it into Trio. Do you have any ideas for Trio-agnostic tests? Any other comments? How best can I proceed to fix this issue?


njsmith commented Oct 8, 2018

@thejohnfreeman Sorry I didn't get back to you before! This is a high priority but I started writing up a bunch of notes from the Python core sprints a few weeks ago, stalled out, and then have been stuck on that..... let me finish those up real quick and push my totally incomplete prototype to https://github.com/python-trio/exceptiongroup , and then can compare notes and figure out how to move this forward :-)


njsmith commented Oct 8, 2018

Notes from the python core summit

@1st1 and I spent a bunch of time talking about this at the python core summit a few weeks ago; these are my notes from that.

The initial goal is to get enough consensus that we can start writing code and get some experience. So all of the conclusions below are tentative and subject to change, but it's looking like we have enough to get started: https://github.com/python-trio/exceptiongroup

Topics discussed

Should MultiError inherit from BaseException, or from Exception? (Or something else?)

Quick review: BaseException is the base class for all raise-able objects in Python. Exception is the base class for exceptions that should be caught by catch-all handlers, which is most but not quite all of them:

In [1]: BaseException.__subclasses__()                                          
Out[1]: [Exception, GeneratorExit, SystemExit, KeyboardInterrupt]

Also, trio.Cancelled is a BaseException, and asyncio is planning to make asyncio.CancelledError into a BaseException in the next release.

In Trio, MultiError inherits from BaseException. Yury is worried about this, because of backcompat: current asyncio code uses try: ... except Exception: ... as a generic catch-all handler. If asyncio.TaskGroup starts wrapping regular Exceptions into MultiError, and MultiError is a BaseException, then that means it will start "promoting" errors from regular Exceptions into BaseExceptions, and that breaks existing catch-all handlers. (E.g., aiohttp has 32 instances of except Exception – many of them don't have async code inside them, but they would all need auditing.) Trio doesn't really have this issue the same way, because there's less existing code and it's been using BaseException since the beginning.

However, if MultiError is an Exception, then it can only contain Exception subclasses, because otherwise a MultiError(KeyboardInterrupt) could get caught by an except Exception handler, which would be bad. So if you make MultiError into an exception, then you have to figure out what to do with KeyboardInterrupt and the other BaseException types.

At least with regards to catching exceptions, the general principle is: a MultiError object A can be an instance of exception type B (isinstance(A, B)) if and only if for every exception inside the MultiError, that exception is also an instance of B (all(isinstance(exc, B) for exc in A)). So for example, it would be OK for a MultiError([OSError, OSError]) to itself be an instance of OSError, and be caught by except OSError blocks. But MultiError([OSError, ValueError]) can't be an OSError or a ValueError, but only an Exception. And MultiError([KeyboardInterrupt, OSError]) can't be an OSError or even an Exception, but only a BaseException.
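Written as code, the rule is just an all() over the leaves (could_be_instance is a hypothetical name for the check, not a proposed API):

```python
class MultiError(BaseException):
    def __init__(self, exceptions):
        self.exceptions = list(exceptions)

def could_be_instance(exc, exc_type):
    # A MultiError "is" a B only if every leaf exception is a B.
    if isinstance(exc, MultiError):
        return all(could_be_instance(sub, exc_type) for sub in exc.exceptions)
    return isinstance(exc, exc_type)

homogeneous = MultiError([OSError(), OSError()])
mixed = MultiError([OSError(), ValueError()])
scary = MultiError([KeyboardInterrupt(), OSError()])
```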

In principle we could dynamically set each MultiError object's base classes when it's created, based on this rule. But this doesn't seem very useful in practice... a MultiError([OSError, OSError]) could be caught by except OSError:, but it wouldn't have an .errno attribute. Also, the whole argument for why we don't want to "collapse" single-entry MultiErrors (MultiError([OSError]) → bare OSError) is that we want people to write code to handle what could happen, rather than what did happen. Letting them use except OSError to catch a MultiError([OSError, OSError]) would violate this: because next time it might be a MultiError([OSError, ValueError]), and now their except block doesn't work.

So trying to get super clever here doesn't seem to be a good idea. If we do anything here, it seems like it should be a special case targeted specifically at those except Exception handlers. So then we would need a plan for how to handle BaseExceptions inside nurseries.

It seems like the two options are:

  • Make MultiError a BaseException, like Trio has done
  • Add a bunch of special-case code inside the nursery exception propagation logic to propagate BaseExceptions in a special way. E.g.: if there's a BaseException and a regular Exception, then the Exception gets discarded and the BaseException gets propagated, unwrapped. And if there are multiple BaseExceptions, then apply some hard-coded ad hoc rules, like KeyboardInterrupt beats GeneratorExit, which beats asyncio.CancelledError.

Yury is tentatively planning to take the second approach in asyncio. I don't want to do that in Trio; too much ad-hackery. So is there a way we can still share our toys?

Plan: make a BaseException version of MultiError that's built into Python. Then asyncio can do:

class TaskGroupError(MultiError, Exception):
    ...

to make a version of MultiError that's caught by except Exception and that can only contain Exception objects.

This may mean that trio-asyncio needs to do some extra translation of these errors when they pass from trio → asyncio, but since it already has to do that for cancellation exceptions then this shouldn't be a showstopper.

If we have multiple MultiError subclasses, there was some question of what kind of semantics to assign that. The concern is: any code for catching exceptions inside MultiError objects needs to know how to break a MultiError into its constituent pieces. Therefore, you really don't want to assign special meaning to the MultiError subclass itself – Trio's exception catching logic needs to be able to tear apart an asyncio MultiError without knowing what it is, and vice-versa. The consensus was that MultiError (and subclasses thereof) should be thought of as pure transparent containers – they carry information about how an exception or exceptions propagated, like the traceback; but, they shouldn't change how you interpret what happened, or catch/handle the exception. (This is also like tracebacks.)

In this context there was also some discussion of whether we wanted a generic protocol ("any exception object can contain sub-exceptions by defining an __exceptions__ attribute") versus a concrete type. But given these objects can't contain extra semantics, it doesn't make sense to have e.g. an OSError that also has sub-exceptions inside it. And it turns out that even though trio and asyncio might use different types, they can both be handled by a single concrete type + subclasses. So having a single concrete class seems like the way to go.

What goes into Python?

Yury thinks we might want to be really minimal with what actually goes into Python: like maybe just the exception type and the traceback printers, but leave split and catch out for now. (Also, he really really dislikes the idea of using a with statement to catch these things, and really really wants to find some alternative. But it's very difficult to come up with something better; we spent some time trying and didn't really succeed.) So for now I'm going to go ahead and put split and with catch(...) in our prototype library so we can at least try them, and we can adjust later if we need to. But in deference to Yury, I'll make them top-level functions, rather than classmethods :-).

Name

I've been saying MultiError all this time, but that's actually not what we're going to use.

Guido disliked the Multi part, because they can (in the new design) hold a single exception.

After scrutinizing all the BaseException stuff, I'm uncomfortable with Error, since the Python convention is that that's used specifically for Exception subclasses (and not even all of them, just the ones that indicate real errors – see StopIteration).

We did some brainstorming during lunch; I suggested ExceptionGroup; Guido said "oo, now you're talking". So I guess that's what we're going with for now :-).

@njsmith
Member Author

njsmith commented Oct 8, 2018

@thejohnfreeman OK, so I pushed my very-rough-draft code to the exceptiongroup repo so you can at least see it! I think we'll want to move over there and use that name. But, I don't know what code we'll want to use – I haven't really had a chance to look at yours yet. So I guess the next step is for someone to read through both sets of code and figure out which parts are most worth salvaging, and then moving forward in whatever direction makes sense? If you're still interested in working on this, then I guess that's what I'd suggest doing next? (I also want to work on it, but I'm juggling a lot of things, so if you want to take the lead on that it's fine with me :-).)

@njsmith
Member Author

njsmith commented Dec 11, 2018

As further evidence that we need to stop "collapsing" single-item MultiErrors, here's a real-world account of how this bit @belm0: #800

@belm0
Member

belm0 commented Dec 16, 2018

Cases of MultiError in my app tend to be Cancelled combined with something else, and so far I always want to defer to Cancelled. I'm using this pattern:

try:
    ...
except Foo:
    ...
except MultiError as e:
    raise MultiError.filter(lambda exc: None if isinstance(exc, Foo) else exc, e)

@oremanj
Member

oremanj commented Jan 10, 2019

I wonder if we can improve the ergonomics (i.e., put the exception handler below the code it handles exceptions in, like Python normally does) with something like:

try:
    # code
except:
    @ExceptionGroup.handle_each(FooException)
    def catch(exc):
        # log, raise another exception (which would chain), etc

Which would be implemented along the lines of

class ExceptionGroup:
    ...
    @staticmethod
    def handle_each(type, match=None):
        def decorate(catcher):
            with ExceptionGroup.catch(type, match=match):
                raise
        return decorate

@njsmith
Member Author

njsmith commented Jan 10, 2019

Or possibly even

try:
    ...
except:
    @ExceptionGroup.handle(FooException)
    def handler(exc):
        ...

I'm not sure how to extend this to chain multiple blocks together though. Perhaps:

try:
    ...
except:
    handler_chain = exceptiongroup.HandlerChain()
    @handler_chain(FooException)
    def handler(exc):
        ...
    @handler_chain(BarException)
    def handler(exc):
        ...
    handler_chain.run()

@oremanj
Member

oremanj commented Jan 10, 2019

We could let handle() take multiple types, and let users use whatever normal Python method of case-by-types they like:

try:
    ...
except:
    @ExceptionGroup.handle(FooException, BarException)
    def handler(exc):
        try:
            raise exc
        except FooException:
            ...
        except BarException:
            ...

I guess this adds a traceback frame that could be avoided with @handler_chain?

@njsmith
Member Author

njsmith commented Jan 10, 2019

@oremanj That doesn't work with the v2 design, because exc here will still contain an ExceptionGroup, just one that's been split to only contain exceptions of the specified type.
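The v2 "split" behavior being described can be illustrated with a toy model — the names here are stand-ins (the real API was still being designed), and this flat version ignores nested groups, which the real split would recurse into:

```python
# Toy model of v2 "split by type": the handler receives a *group* of
# matching exceptions, never an unwrapped leaf.
class ExceptionGroup(BaseException):
    def __init__(self, exceptions):
        self.exceptions = list(exceptions)
        super().__init__(self.exceptions)

def split(exc_type, group):
    """Partition into (matching, rest); either side may be None if empty."""
    match = [e for e in group.exceptions if isinstance(e, exc_type)]
    rest = [e for e in group.exceptions if not isinstance(e, exc_type)]
    return (ExceptionGroup(match) if match else None,
            ExceptionGroup(rest) if rest else None)

group = ExceptionGroup([ValueError("a"), ValueError("b"), KeyError("c")])
matched, rest = split(ValueError, group)
print(len(matched.exceptions), len(rest.exceptions))  # 2 1
```

So a handler registered for ValueError gets a group holding both ValueErrors — which is exactly why a handler that assumes it receives a single leaf exception won't work.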

@oremanj
Member

oremanj commented Jan 10, 2019

Oh! I had assumed that ExceptionGroup.catch() would invoke the handler once for each leaf exception that matches the filter. I would find the other behavior pretty surprising, and I don't see how I could use it easily in a situation where I cared about the attributes of the exceptions.

I suspect we might want interfaces for both "handle everything in the ExceptionGroup that's a subtype of X" (for logging errors) and "handle each atomic exception that's a subtype of X separately" (for more specific handling). Maybe there's something I'm overlooking though...

@Zac-HD
Member

Zac-HD commented Nov 18, 2019

Hi all - what's the current state of the ExceptionGroup effort?

Hypothesis is currently considering how to generalise minimal examples, and one of the major challenges is that our status quo for reporting multiple errors is not as helpful as we'd like it to be (for example, it doesn't support pdb).

So if there is or soon will be a widely shared way of doing this which we could support, that would probably improve the lives of our users as well as developers! I'd be happy to help out too, if there's some way to do so 😄

@mehaase

mehaase commented Feb 28, 2020

I thought I understood MultiError, but recently I was surprised by this mistake I made in a WebSocket server. Each connection has a heartbeat task (sends a WebSocket ping frame every n seconds and waits for a pong) and a request handler task. If the client side closes the connection, most of the time the heartbeat will be waiting in a await trio.sleep(n). In that scenario, the request handler raises ConnectionClosed and the heartbeat task raises Cancelled. My exception handler used a filter to remove Cancelled so that I end up with just a bare exception that I can catch:

try:
    with trio.MultiError.catch(remove_cancelled):
        await connection.run()
except ConnectionClosed:
    logger.info('Connection closed %s@%s', user, client_ip)
except Exception:
    logger.exception('Connection exception %s@%s', user,
        client_ip)

There are two errors here:

  1. MultiError inherits from BaseException, so my fallback handler won't catch it, and it crashes the server instead. This is just a stupid error on my part.
  2. The more interesting error is the assumption that a closed connection always produces a single ConnectionClosed. If a client disconnects while a ping frame is in flight, then both of the connection's tasks will raise ConnectionClosed, which means my filter still returns a MultiError. This rare race condition was occasionally crashing my server.

I ended up writing a new filter that removes cancellations and all but one ConnectionClosed, and then added MultiError to the last except. I don't know if Trio can do anything to make this easier to reason about. I recognize that part of the problem is the inherent complexity of writing concurrent code and my own lack of experience here. But I thought it was an interesting real-world scenario that might be relevant to the design of MultiError. The v2 design will definitely help because it won't collapse MultiError, but I don't know if there's an ergonomic way to express, "if all of the exceptions are Cancelled or ConnectionClosed then do this, else do that". This example deals with trio_websocket but in principle I think you'd have the same phenomenon when using bare Trio streams.

@belm0
Member

belm0 commented Feb 28, 2020

@mehaase I think what you want is a decorator like @defer_to_exception(ConnectionClosed), where a MultiError with any combination of ConnectionClosed and Cancelled will yield ConnectionClosed. It could be generalized to take a list of exceptions in priority order, where any combination of those exceptions and Cancelled will yield the highest priority exception.

It's along the lines of @defer_to_cancelled in trio-util, which does the opposite (defers low-priority exceptions to Cancelled).

https://trio-util.readthedocs.io/en/latest/#trio_util.defer_to_cancelled

In fact I think it could be generalized across the two cases by including Cancelled in the exception list: @defer_to_exception(ConnectionClosed, Cancelled, MyLowPriorityException), where the arg list is from higher to lower priority.

@belm0
Member

belm0 commented Mar 15, 2020

I tried to make an API and implementation for a general multi_error_defer_to() context manager mentioned previously, which accepts an ordered list of privileged exception types and coalesces a MultiError into the highest priority exception as long as every exception in the hierarchy is an instance of the listed exceptions. There were several subtle considerations:

  • should it be strict about not choosing an arbitrary exception object among several different candidates (per repr())? - Merging Cancelled() and Cancelled() is fine, but how about ValueError('foo') and ValueError('bar'), or exception objects of two derived types matching the base type in the privileged list? I wouldn't want the non-determinism, but perhaps offering it as an option would suit someone else. (Unfortunately it's a bit costly to implement the strictness check.)
  • should it ensure that MultiError is never raised? - I can imagine use cases where you want to ensure MultiError is not leaked from your API, so it seems like there should be an option. With propagate_multi_error=False, a RuntimeError is raised if the MultiError cannot be coalesced.
def multi_error_defer_to(*privileged_types: Type[BaseException],
                         propagate_multi_error=True,
                         strict=True):
    """
    Defer a trio.MultiError exception to a single, privileged exception

    In the scope of this context manager, a raised MultiError will be coalesced
    into a single exception with the highest privilege if the following
    criteria are met:
        1. every exception in the MultiError is an instance of one of the given
           privileged types
    additionally, by default with strict=True:
        2. there is a single candidate at the highest privilege after grouping
           the exceptions by repr().  For example, this test fails if both
           ValueError('foo') and ValueError('bar') are the most privileged.

    If the criteria are not met, by default the original MultiError is
    propagated.  Use propagate_multi_error=False to instead raise a
    RuntimeError in these cases.

    Synopsis:
        multi_error_defer_to(trio.Cancelled, MyException)
            MultiError([Cancelled(), MyException()]) -> Cancelled()
            MultiError([Cancelled(), MyException(), MultiError([Cancelled(), Cancelled()])]) -> Cancelled()
            MultiError([Cancelled(), MyException(), ValueError()]) -> *no change*
            MultiError([MyException('foo'), MyException('foo')]) -> MyException('foo')
            MultiError([MyException('foo'), MyException('bar')]) -> *no change*

        multi_error_defer_to(MyImportantException, trio.Cancelled, MyBaseException)
            # where issubclass(MyDerivedException, MyBaseException)
            #   and issubclass(MyImportantException, MyBaseException)
            MultiError([Cancelled(), MyDerivedException()]) -> Cancelled()
            MultiError([MyImportantException(), Cancelled()]) -> MyImportantException()
    """

@belm0
Member

belm0 commented Jun 29, 2020

@graingert
Member

ExceptionGroup([SystemExit()]) needs special handling in the interpreter

ExceptionGroup([KeyboardInterrupt()]) needs special handling also https://github.com/python/cpython/pull/21956/files#diff-75445bdc3b6b3dd20b005698fa165444R290

@graingert
Member

graingert commented Oct 21, 2020 via email

@arthur-tacca
Contributor

Hi, I have a suggestion to make here. Even though I'm the one making it, even I'm not really sure it's a good idea. But it doesn't seem to have been mentioned before so I wanted to at least bring it up.

I'm quite a new (/naive) user of both Trio and asyncio, and I was trying to get a classic catch-all exception handler to work: Exception is caught, logged, and swallowed, while BaseException is caught and logged but then re-raised. Of course MultiError (v1) plays havoc with that; I'd need to catch it specifically then recursively check interior exceptions to see whether I should re-raise it.

My suggestion is something a bit blunt but a lot more straightforward than runtime manipulation of base classes. It's to have two classes instead of one:

  • BaseMultiError inherits from BaseException and contains all the multi-error functionality.
  • MultiError now multiply-inherits from (Exception, BaseMultiError).

There could be a helper function to build a multi-error object from a list of child exceptions, and you'd get a BaseMultiError if any of them do not derive from Exception, and a MultiError otherwise. In my code I can just have except ... clauses for Exception and BaseException without any special handling for multi-errors.
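The suggestion above could be sketched like this — the names come from the comment, not from any real library, and the real design would need more than shown here:

```python
# Hypothetical sketch of the two-class idea: BaseMultiError carries the
# multi-error machinery, MultiError adds Exception-ness via multiple
# inheritance.
class BaseMultiError(BaseException):
    def __init__(self, exceptions):
        self.exceptions = list(exceptions)
        super().__init__(self.exceptions)

class MultiError(Exception, BaseMultiError):
    pass

def make_multi_error(exceptions):
    # Use the narrower class only when every child is an ordinary Exception.
    if all(isinstance(e, Exception) for e in exceptions):
        return MultiError(exceptions)
    return BaseMultiError(exceptions)

assert isinstance(make_multi_error([ValueError()]), Exception)
assert not isinstance(make_multi_error([KeyboardInterrupt()]), Exception)
```

An ordinary except Exception: then swallows a group of regular errors, while a group containing a KeyboardInterrupt sails past it, as the catch-all use case wants.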

The main weakness with my suggestion is that it's very focused on my one specific use case (albeit quite a common one). If you want to find some specific type of exception (other than Exception) then it doesn't help at all. It sounds like the type(E).__subclasscheck__() trick mentioned above would solve my use case, although I'm not quite sure what my code would need to look like (maybe I need to add a clause except MultiError[Exception]?), but unlike my idea would also handle other use cases.

@h-vetinari

For subscribers of this issue: there's now PEP654 for ExceptionGroup (which explicitly lists this issue as a reference), see also python-dev announcement.

Guess now would be a good time for feedback, if you have opinions™

@Zac-HD
Member

Zac-HD commented Apr 16, 2022

Played around with this a bit tonight and re:

It'd be nice if we could like... annotate tracebacks somehow?
If we stopped collapsing MultiErrors, then there's a natural place to put this information: each MultiError corresponds to a jump across task boundaries, so we can put it in the exception string or something. (Oh yeah, maybe we should switch MultiErrors to having associated message strings? Currently they don't have that.)

...it immediately became clear that what you actually want is not the ability to annotate a MultiError, but rather to annotate each of the exceptions inside the MultiError. Like, "this came from child task X", "this came from the main task".

With PEP-678, you can loop over the inner exceptions and add a note to each:

    exc = MultiError.filter(self._exc_filter, exc)
    for e in exc.exceptions:
        e.add_note(f"This came from {scope_task!r}")

(OK, in practice we'll also need the exceptiongroup backport for display for a while, and if not hasattr(e, "add_note"): <add __notes__ by hand>, but that's still not too bad)

@Zac-HD
Member

Zac-HD commented Nov 3, 2022

Closing this issue because Python 3.11 has been released with a built-in ExceptionGroup type, and Trio has already switched over (using the excellent backport on earlier versions).

Thanks to everyone involved in this remarkably long effort!

@Zac-HD Zac-HD closed this as completed Nov 3, 2022
@gvanrossum

What's the backport's equivalent to except*?

@graingert
Member

from exceptiongroup import ExceptionGroup, catch

def value_key_err_handler(excgroup: ExceptionGroup) -> None:
    for exc in excgroup.exceptions:
        print('Caught exception:', type(exc))

def runtime_err_handler(exc: ExceptionGroup) -> None:
    print('Caught runtime error')

with catch({
    (ValueError, KeyError): value_key_err_handler,
    RuntimeError: runtime_err_handler
}):
    ...

@arthur-tacca
Contributor

arthur-tacca commented Nov 10, 2022

@graingert The unfortunate tree-like design of ExceptionGroup.exceptions means that would not give you the exceptions you were probably hoping for. Instead you need to recurse into the child subtrees, starting by copying and pasting the helper function from the PEP:

def leaf_generator(exc, tbs=None):
    if tbs is None:
        tbs = []

    tbs.append(exc.__traceback__)
    if isinstance(exc, BaseExceptionGroup):
        for e in exc.exceptions:
            yield from leaf_generator(e, tbs)
    else:
        # exc is a leaf exception and its traceback
        # is the concatenation of the traceback
        # segments in tbs.

        # Note: the list returned (tbs) is reused in each iteration
        # through the generator. Make a copy if your use case holds
        # on to it beyond the current iteration or mutates its contents.

        yield exc, tbs
    tbs.pop()

def value_key_err_handler(excgroup: ExceptionGroup) -> None:
    for exc, _tbs in leaf_generator(excgroup):
        print('Caught exception:', type(exc))

You'd have the same problem with except*.

Edit: If you don't care about tracebacks then you could use this rather simpler leaf iterator (untested!):

def leaf_generator(exc: BaseExceptionGroup):
    for e in exc.exceptions:
        if isinstance(e, BaseExceptionGroup):
            yield from leaf_generator(e)
        else:
            yield e
