Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor find_missing and refresh_who_has #6348

Merged
merged 1 commit into from
Jun 1, 2022

Conversation

crusaderky
Copy link
Collaborator

@crusaderky crusaderky commented May 16, 2022

Additional changes to #6342 are visible at crusaderky#3

Refactor Worker->Scheduler queries to fetch who_has, from RPC to bulk comms. This removes one of the two async commands in gather_dep (the other being the actual data request) and is propaedeutic to its refactoring.

Fix minor bug where:

  1. all workers that hold a replica of a key are busy
  2. gather_dep triggers a query to scheduler.who_has to get more workers
  3. the scheduler replies with new workers
  4. nothing happens until the next call to _ensure_communicating, which will happen at the latest 150ms later (when a worker gets out of busy state).

@crusaderky crusaderky force-pushed the WSMR/refresh_who_has branch from 8eb8eea to 56c796c Compare May 16, 2022 11:41
@crusaderky crusaderky self-assigned this May 16, 2022
@github-actions
Copy link
Contributor

github-actions bot commented May 16, 2022

Unit Test Results

       15 files  ±  0         15 suites  ±0   6h 27m 49s ⏱️ - 6m 5s
  2 826 tests +  7    2 745 ✔️ +  9    81 💤 +1  0  - 3 
20 945 runs  +49  20 002 ✔️ +44  943 💤 +8  0  - 3 

Results for commit 1bc0283. ± Comparison against base commit 5feb171.

♻️ This comment has been updated with latest results.

@crusaderky crusaderky marked this pull request as ready for review May 16, 2022 15:30
@crusaderky crusaderky force-pushed the WSMR/refresh_who_has branch from 56c796c to cbccc86 Compare May 16, 2022 19:19
crusaderky added a commit to crusaderky/distributed that referenced this pull request May 18, 2022
@crusaderky
Copy link
Collaborator Author

As the git tree has become messy, until #6342 is merged the diff of this PR is visible at crusaderky#3

crusaderky added a commit to crusaderky/distributed that referenced this pull request May 20, 2022
@crusaderky crusaderky mentioned this pull request May 20, 2022
@crusaderky crusaderky linked an issue May 20, 2022 that may be closed by this pull request
8 tasks
crusaderky added a commit to crusaderky/distributed that referenced this pull request May 21, 2022
crusaderky added a commit to crusaderky/distributed that referenced this pull request May 21, 2022
@crusaderky crusaderky force-pushed the WSMR/refresh_who_has branch 2 times, most recently from 45dc795 to 2f96f32 Compare May 25, 2022 10:44
@crusaderky
Copy link
Collaborator Author

@fjetter this has been updated to reflect the new update_who_has.
Diff here: crusaderky#3

@crusaderky
Copy link
Collaborator Author

Ironically, with this PR some changes that we just introduced in the update_who_has PR can now be reverted: df488f0

@crusaderky crusaderky force-pushed the WSMR/refresh_who_has branch from c0b2bd4 to 50f83d2 Compare June 1, 2022 11:56
Copy link
Member

@fjetter fjetter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 LGTM

Just for reference, there is a discussion going on that the entire find-missing mechanism can likely be removed entirely. I'm fine with this refactoring that moves us closer to a more robust architecture see #6445

Comment on lines +2861 to +2862
if not isinstance(stim, FindMissingEvent):
self.stimulus_log.append(stim.to_loggable(handled=time()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you're logging this because it is flooding the log otherwise?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. Plus, it's really not interesting to see in the log even when relevant. Note that the response from the scheduler is logged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Migrate ensure_communicating transitions to new WorkerState event mechanism
2 participants