Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zombie tasks after missing->released transition #5316

Merged
merged 2 commits into from
Oct 1, 2021

Conversation

fjetter
Copy link
Member

@fjetter fjetter commented Sep 13, 2021

This is a downstream issue introduced by #5046

After a task is transitioned to missing and is released afterwards, the transition missing->released is leaving a zombie task, i.e. an unreferenced task. There is no data connected and no execution or fetch scheduled afterwards so this is not a critical issue.

The fix is already functional but I'll keep the WIP until the primary PR is merged

@fjetter fjetter force-pushed the zombie_tasks_after_missing branch from 2a7c2c0 to 92ed7c7 Compare September 27, 2021 12:08
@fjetter fjetter requested a review from crusaderky September 27, 2021 12:08
@fjetter fjetter changed the title WIP Zombie tasks after missing Zombie tasks after missing->released transition Sep 27, 2021
Ensure that no fetch/flight tasks are left in the task dict of a
worker after everything was released
"""
a.total_in_connections = 0
Copy link
Collaborator

@crusaderky crusaderky Sep 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
a.total_in_connections = 0
# Cause f1 to remain in "fetch" state on b forever
a.total_in_connections = 0

f2 = c.submit(inc, f1, key="f2", workers=[b.address])

while f1.key not in b.tasks:
await asyncio.sleep(0)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
await asyncio.sleep(0)
await asyncio.sleep(0.01)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The zero sleep is actually deliberate to avoid any timing related problems. fetch->flight->memory can be very fast and I don't want to miss the transition. My sleeping 0 we're checking this upon every loop cycle. This is a bit excessive but guarantees us not to miss this

Copy link
Collaborator

@crusaderky crusaderky Sep 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't a.total_in_connections = 0 cause the state on b to be stuck in "fetch" forever?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the tasks will transition fo flight and trigger a get_data to worker A. Based on the total_in_connections it will then limit the number of concurrent outgoing connections (outgoing_current_count) and will reply with a {"status": "busy"} message.

@fjetter fjetter merged commit d8c7c6b into dask:main Oct 1, 2021
@jrbourbeau jrbourbeau mentioned this pull request Oct 7, 2021
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants