Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix distributed.worker.memory.target=False #5788

Merged
merged 1 commit into from
Feb 10, 2022

Conversation

crusaderky
Copy link
Collaborator

@crusaderky crusaderky commented Feb 10, 2022

Fix bug with config

distributed:
  worker:
    memory:
      target: false
      spill: 0.7

Where the Worker would ignore the target and use 0.7 instead.

Review a few tests around spilling.

xrefs

memory_limit=1200 / 0.6,
memory_pause_fraction=None,
memory_spill_fraction=None,
)
Copy link
Collaborator Author

@crusaderky crusaderky Feb 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would hit the spill threshold after 1200/0.6*0.7=1400 bytes worth of total process memory, and the pause threshold after 1200/0.6*0.8=1600 bytes. Unmanaged memory alone is always > 100 MB.

In other words, any futures at all managed to complete just thanks to a race condition with memory_monitor().
It was also a race condition to decide what would spill any of the keys, the zict LRU machinery used by target or the explicit evict() used by by spill.

Test reimplemented below.

@crusaderky crusaderky self-assigned this Feb 10, 2022
@crusaderky crusaderky marked this pull request as ready for review February 10, 2022 11:26
@crusaderky crusaderky changed the title Fix for distributed.worker.memory.target=False and spill=0.7 Fix for distributed.worker.memory.target=False Feb 10, 2022
@crusaderky crusaderky changed the title Fix for distributed.worker.memory.target=False Fix distributed.worker.memory.target=False Feb 10, 2022
@fjetter
Copy link
Member

fjetter commented Feb 10, 2022

I see the test test_gather_then_submit_after_failed_workers fail quite deterministically. It looks like another instance of #5751

@github-actions
Copy link
Contributor

Unit Test Results

       17 files  ±  0         17 suites  ±0   9h 7m 29s ⏱️ + 19m 40s
  2 595 tests ±  0    2 513 ✔️ ±  0       79 💤  -   1  3 +1 
21 828 runs   - 27  20 447 ✔️  - 86  1 374 💤 +56  7 +3 

For more details on these failures, see this check.

Results for commit b36fcf0. ± Comparison against base commit 8a5e014.

@fjetter fjetter merged commit 13fce19 into dask:main Feb 10, 2022
@crusaderky crusaderky mentioned this pull request Feb 10, 2022
@crusaderky crusaderky deleted the spill_no_target branch February 10, 2022 16:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants