Fix long exception of asyncio.gather #2748

Merged
merged 4 commits into from
Feb 24, 2022

Contributor

@fyrestone fyrestone commented Feb 24, 2022

What do these changes do?

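An exception with an extremely long traceback is raised when both of the following conditions are met:

  • the tasks are awaited with asyncio.gather;
  • all the gathered coroutines / tasks fail with the same exception instance.

Every task that re-raises the shared instance adds its own frames to that instance's traceback, so the traceback grows with the number of tasks.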
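
The growth can be seen without asyncio at all. The following is a minimal sketch, not code from this PR: raise_it, tb_depth and depths_when_raised are hypothetical helpers, and a plain synchronous raise stands in for a task re-raising the exception stored on its future. It compares raising one shared instance repeatedly with raising a fresh copy each time.

```python
import copy
import traceback


def raise_it(exc):
    # Stand-in for a task re-raising the exception stored on its future.
    raise exc


def tb_depth(exc):
    # Number of frames currently recorded on the exception's traceback.
    return len(traceback.extract_tb(exc.__traceback__))


def depths_when_raised(make_exc, times=3):
    # Raise whatever make_exc() returns several times and record the
    # traceback depth seen by the handler after each raise.
    depths = []
    for _ in range(times):
        try:
            raise_it(make_exc())
        except Exception as caught:
            depths.append(tb_depth(caught))
    return depths


original = Exception("Got Exception!!!")

# Same instance every time: each raise adds frames on top of the old ones.
shared_depths = depths_when_raised(lambda: original)

# Fresh copy every time: copy.copy() rebuilds a plain Exception from its
# class and args, so each copy starts without a traceback.
copied_depths = depths_when_raised(lambda: copy.copy(original))

print("shared:", shared_depths)
print("copied:", copied_depths)
```

The shared instance reports a larger depth on every raise, while the copies all report the same depth. One caveat, stated as an assumption about user code rather than about Mars: copy.copy can fail for custom exception classes whose __init__ signature does not match their args, so a real implementation may need a fallback.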

Minimal reproduction code

import asyncio

futures = [asyncio.Future() for x in range(10)]


async def foo(i):
    await futures[i]


async def bar(i):
    await foo(i)


async def set_exception_loop():
    try:
        raise Exception("Got Exception!!!")
    except Exception as e:
        for fut in futures:
            # All the futures are sharing the same exception instance.
            fut.set_exception(e)


async def wait_in_loop():
    tasks = []
    for i in range(10):
        task = asyncio.create_task(bar(i))
        tasks.append(task)

    asyncio.create_task(set_exception_loop())
    await asyncio.gather(*tasks)


loop = asyncio.get_event_loop()
loop.run_until_complete(wait_in_loop())

Output with the long traceback

Traceback (most recent call last):
  File "/home/admin/mars/reproduce5.py", line 34, in <module>
    loop.run_until_complete(wait_in_loop())
  File "/home/admin/.pyenv/versions/3.8.12/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/home/admin/mars/reproduce5.py", line 30, in wait_in_loop
    await asyncio.gather(*tasks)
  File "/home/admin/mars/reproduce5.py", line 11, in bar
    await foo(i)
  File "/home/admin/mars/reproduce5.py", line 7, in foo
    await futures[i]
  File "/home/admin/mars/reproduce5.py", line 11, in bar
    await foo(i)
  File "/home/admin/mars/reproduce5.py", line 7, in foo
    await futures[i]
  File "/home/admin/mars/reproduce5.py", line 11, in bar
    await foo(i)
  File "/home/admin/mars/reproduce5.py", line 7, in foo
    await futures[i]
  File "/home/admin/mars/reproduce5.py", line 11, in bar
    await foo(i)
  File "/home/admin/mars/reproduce5.py", line 7, in foo
    await futures[i]
  File "/home/admin/mars/reproduce5.py", line 11, in bar
    await foo(i)
  File "/home/admin/mars/reproduce5.py", line 7, in foo
    await futures[i]
  File "/home/admin/mars/reproduce5.py", line 11, in bar
    await foo(i)
  File "/home/admin/mars/reproduce5.py", line 7, in foo
    await futures[i]
  File "/home/admin/mars/reproduce5.py", line 11, in bar
    await foo(i)
  File "/home/admin/mars/reproduce5.py", line 7, in foo
    await futures[i]
  File "/home/admin/mars/reproduce5.py", line 11, in bar
    await foo(i)
  File "/home/admin/mars/reproduce5.py", line 7, in foo
    await futures[i]
  File "/home/admin/mars/reproduce5.py", line 11, in bar
    await foo(i)
  File "/home/admin/mars/reproduce5.py", line 7, in foo
    await futures[i]
  File "/home/admin/mars/reproduce5.py", line 11, in bar
    await foo(i)
  File "/home/admin/mars/reproduce5.py", line 7, in foo
    await futures[i]
  File "/home/admin/mars/reproduce5.py", line 16, in set_exception_loop
    raise Exception("Got Exception!!!")
Exception: Got Exception!!!

Fixed code (give each gathered future its own copy of the exception)

import asyncio
import copy

futures = [asyncio.Future() for x in range(10)]


async def foo(i):
    await futures[i]


async def bar(i):
    await foo(i)


async def set_exception_loop():
    try:
        raise Exception("Got Exception!!!")
    except Exception as e:
        for fut in futures:
            # We give each future a different exception instance.
            fut.set_exception(copy.copy(e))


async def wait_in_loop():
    tasks = []
    for i in range(10):
        task = asyncio.create_task(bar(i))
        tasks.append(task)

    asyncio.create_task(set_exception_loop())
    await asyncio.gather(*tasks)


loop = asyncio.get_event_loop()
loop.run_until_complete(wait_in_loop())

The resulting traceback is short, as expected

Traceback (most recent call last):
  File "/home/admin/mars/reproduce4.py", line 35, in <module>
    loop.run_until_complete(wait_in_loop())
  File "/home/admin/.pyenv/versions/3.8.12/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/home/admin/mars/reproduce4.py", line 31, in wait_in_loop
    await asyncio.gather(*tasks)
  File "/home/admin/mars/reproduce4.py", line 12, in bar
    await foo(i)
  File "/home/admin/mars/reproduce4.py", line 8, in foo
    await futures[i]
Exception: Got Exception!!!
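
The same idea can be wrapped in a small helper. This is only a sketch under stated assumptions: set_exception_copies and demo are hypothetical names, not part of Mars or asyncio, and the example uses asyncio.run and loop.create_future instead of the module-level futures in the reproduction script above.

```python
import asyncio
import copy


def set_exception_copies(futures, exc):
    """Hypothetical helper: give every pending future its own shallow copy of exc."""
    for fut in futures:
        if not fut.done():
            fut.set_exception(copy.copy(exc))


async def demo():
    loop = asyncio.get_running_loop()
    futures = [loop.create_future() for _ in range(3)]
    set_exception_copies(futures, Exception("Got Exception!!!"))
    # exception() returns the stored error and marks it as retrieved.
    return [fut.exception() for fut in futures]


errors = asyncio.run(demo())
print(len({id(e) for e in errors}), "distinct exception instances")
```

Each future ends up holding a distinct instance with the same type and args, so frames added while one task re-raises its exception are not visible on the others.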

Related issue number

Fixes #2744

Check code requirements

  • tests added / passed (if needed)
  • Ensure all linting tests pass, see here for how to run them

@fyrestone fyrestone self-assigned this Feb 24, 2022
Member

@wjsi wjsi left a comment

LGTM

@qinxuye qinxuye added the mod: actor, to be backported, and type: bug labels Feb 24, 2022
@qinxuye qinxuye added this to the v0.9.0b2 milestone Feb 24, 2022
Collaborator

@qinxuye qinxuye left a comment

LGTM

@qinxuye qinxuye merged commit 9dbfe0c into mars-project:master Feb 24, 2022
wjsi pushed a commit to wjsi/mars that referenced this pull request Feb 25, 2022
@qinxuye qinxuye added the backported already label and removed the to be backported label Mar 1, 2022
qinxuye pushed a commit to hekaisheng/mars that referenced this pull request Mar 1, 2022
chaokunyang pushed a commit to chaokunyang/mars that referenced this pull request May 31, 2022
mars-project#2748)

Merge branch fix_long_exception of git@gitlab.alipay-inc.com:ray-project/mars.git into master
https://code.alipay.com/ray-project/mars/pull_requests/248

Signed-off-by: 不涸 <zhongchun.yzc@antgroup.com>


* Fix long exception of asyncio.gather (mars-project#2748)

* Fix cherry pick

* Fix CI
chaokunyang pushed a commit to chaokunyang/mars that referenced this pull request May 31, 2022
Merge branch cp_2748_2755_2801 of git@gitlab.alipay-inc.com:ray-project/mars.git into master
https://code.alipay.com/ray-project/mars/pull_requests/267

Signed-off-by: 不涸 <zhongchun.yzc@antgroup.com>


* Fix long exception of asyncio.gather (mars-project#2748)

* Fix profiling band_subtasks and most_calls are empty if the slow duration is large (mars-project#2755)

* [Ray] Default enable fault tolerance for ray (mars-project#2801)
Successfully merging this pull request may close these issues.

[BUG] mars exception stack exceed 20W+ lines
3 participants