
Where multiple workers run on the same host, the scheduler should consider actual free disk space before accepting a sector #11013

Open
michieljmitchell opened this issue Jun 28, 2023 · 2 comments
Labels
area/sealing kind/enhancement Kind: Enhancement need/team-input Hint: Needs Team Input

Comments

@michieljmitchell

Checklist

  • This is not brainstorming ideas. If you have an idea you'd like to discuss, please open a new discussion on the lotus forum and select the category as Ideas.
  • I have a specific, actionable, and well motivated feature request to propose.

Lotus component

  • lotus daemon - chain sync
  • lotus fvm/fevm - Lotus FVM and FEVM interactions
  • lotus miner/worker - sealing
  • lotus miner - proving(WindowPoSt/WinningPoSt)
  • lotus JSON-RPC API
  • lotus message management (mpool)
  • Other

What is the motivation behind this feature request? Is your feature request related to a problem? Please describe.

When one worker runs out of space, it has a domino effect on other workers, where the same will eventually happen. A massive queue then builds up, and fixing and resolving it can cause up to 3 days of downtime, depending on the number of workers allocated to the miner.

Describe the solution you'd like

The Lotus scheduler should be aware of the free/available space on workers before assigning sectors. This applies specifically to PC1, but the same check would be required for PC2 if it is fixed for PC1.

Boost will keep sending new sectors to PC1 workers whether or not they have enough available space for the sector.
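
To illustrate the kind of check being asked for, here is a rough, purely hypothetical sketch. None of these names (`workerDiskInfo`, `canAcceptSector`) exist in Lotus; they are assumptions for illustration only:

```go
// Hypothetical sketch only: the idea is that the scheduler compares a
// sector's projected scratch footprint against the free space a worker
// reports for its sealing paths, minus space already claimed by
// queued/running tasks.
package main

import "fmt"

type workerDiskInfo struct {
	FreeBytes     uint64 // free space reported by the worker's sealing paths
	ReservedBytes uint64 // space already claimed by queued/running sectors
}

// canAcceptSector reports whether a worker has enough headroom to take
// another PC1 sector with the given projected on-disk footprint.
func canAcceptSector(w workerDiskInfo, sectorScratch uint64) bool {
	if w.ReservedBytes >= w.FreeBytes {
		return false
	}
	return w.FreeBytes-w.ReservedBytes >= sectorScratch
}

func main() {
	w := workerDiskInfo{FreeBytes: 2 << 40, ReservedBytes: 1 << 40} // 2 TiB free, 1 TiB reserved
	fmt.Println(canAcceptSector(w, 500<<30))                        // true: ~1 TiB of headroom remains
}
```

In practice this would mean the scheduler rejects (or defers) a PC1 task for a worker whose remaining headroom is below the projected scratch for one more sector, instead of accepting it and letting the worker fill up.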

Describe alternatives you've considered

No response

Additional context

No response

@rjan90
Contributor

rjan90 commented Jul 6, 2023

Hey @michieljmitchell!

This is a very fair point and something we are aware of. We have been looking at potentially replacing the current scheduler with one where the workers are able to take tasks based on their knowledge of their own state, instead of the lotus-miner process trying to model it. Some of the base layer for this potential refactor will be built in #10991.

That said, I would suggest trying out the experiment-spread-tasks-qcount assigner in the meantime (link). That assigner takes into account task counts in the running/preparing/queued states, as well as counting running tasks on a per-task-type basis, which should help alleviate some of the overflow from Boost.
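
In case it helps, a minimal config sketch, assuming the Assigner option sits in the [Storage] section of the miner's config.toml as in recent Lotus releases (please verify the section name and accepted values against the comments in your own generated config):

```toml
# lotus-miner config.toml (sketch; verify against your Lotus version)
[Storage]
  # Default is "utilization". The experimental assigner named above also
  # weighs queued/preparing tasks and per-task-type counts when spreading work.
  Assigner = "experiment-spread-tasks-qcount"
```

A lotus-miner restart is needed for the change to take effect.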

@TippyFlitsUK TippyFlitsUK added kind/enhancement Kind: Enhancement area/sealing need/author-input Hint: Needs Author Input need/team-input Hint: Needs Team Input and removed kind/feature Kind: Feature need/triage need/author-input Hint: Needs Author Input labels Jul 20, 2023
@TippyFlitsUK
Contributor

Hey @michieljmitchell

Have you managed to try @rjan90's suggestion with the experiment-spread-tasks-qcount assigner?

How are you getting on?
