Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove server close background task grace period #6633

Merged
merged 8 commits into from
Jul 1, 2022

Conversation

graingert
Copy link
Member

Closes #6632

a draft CI run to see what happens

  • Tests added / passed
  • Passes pre-commit run --all-files

@github-actions
Copy link
Contributor

github-actions bot commented Jun 27, 2022

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

       15 files  ±    0         15 suites  ±0   6h 22m 32s ⏱️ - 3h 16m 31s
  2 908 tests +    1    2 822 ✔️  -     1    82 💤 ±  0  4 +2 
20 713 runs   - 812  19 791 ✔️  - 778  917 💤  - 35  5 +1 

For more details on these failures, see this check.

Results for commit 4a05368. ± Comparison against base commit 1bfd99c.

♻️ This comment has been updated with latest results.

async def background_stops():
await asyncio.gather(*_stops)

self._ongoing_background_tasks.call_soon(background_stops)
Copy link
Member

@hendrikmakait hendrikmakait Jun 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a comment/warning here to highlight that these are likely to get cancelled due to the lack of a grace period? Alternatively, we could add a small grace period back in that would allow fast tasks to finish but have less of a performance impact? For example, 100 ms should result in ~5 % performance degradation, 20 ms in 1 %.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a PendingDeprecationWarning to https://github.com/dask/distributed/pull/6633/files#r907345989

I have no idea who would implement this asynchronously. I think the deprecation warning there should be sufficient.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a regular DeprecationWarning would be better, otherwise you have to go via Pend -> Deprecate -> Remove

Comment on lines +524 to +523
if inspect.isawaitable(future):
_stops.add(future)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC you introduce this merely to ensure that there are no users outside that provide a listener with an async stop?
This smells like we should introduce a deprecation warning and get rid of this

Copy link
Member Author

@graingert graingert Jun 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, it's tricky to work out where to put the deprecation, as right at server stop is a bit late.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, it's tricky to work out where to put the deprecation, as right at server stop is a bit late.

Well, better than nothing. If anybody is using this, they'll see it in their CI then. We're also not in a rush to remove this so we can let the warning sit for a while

@fjetter
Copy link
Member

fjetter commented Jun 27, 2022

There appears to be a worker state related test failure with test_flight_to_executing_via_cancelled_resumed

@graingert graingert force-pushed the remove-server-close-grace-period branch from d93c7b2 to b9767ca Compare June 28, 2022 09:04
await s.remove_worker(w1.address, stimulus_id="stim-id")

await wait_for_state(f3.key, "resumed", w2)
await asyncio.gather(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

f"Expected all ongoing tasks to be cancelled and removed, found {self._ongoing_tasks}."
)
if err is not None:
raise err
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reasoning behind calling task.cancel() multiple times?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the current task is cancelled and any child task suppresses the cancelation then the child tasks will leak from the task group

@fjetter fjetter self-assigned this Jun 30, 2022
@fjetter fjetter marked this pull request as ready for review June 30, 2022 12:06
Copy link
Collaborator

@crusaderky crusaderky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides one test that leaves me skeptical, LGTM

Copy link
Member

@hendrikmakait hendrikmakait left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM apart from one list without a clear purpose.

distributed/tests/test_cancelled_state.py Outdated Show resolved Hide resolved
@fjetter fjetter force-pushed the remove-server-close-grace-period branch from cd8ac6e to 4a05368 Compare July 1, 2022 10:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

@gen_cluster has become 1s slower; CI takes 50% longer
4 participants