
Where multiple workers run on the same host, the scheduler should consider actual free disk space before accepting a sector #11013

Open
michieljmitchell opened this issue Jun 28, 2023 · 2 comments
Labels
area/sealing kind/enhancement Kind: Enhancement need/team-input Hint: Needs Team Input

Comments

@michieljmitchell

Checklist

  • This is not brainstorming ideas. If you have an idea you'd like to discuss, please open a new discussion on the lotus forum and select the category as Ideas.
  • I have a specific, actionable, and well motivated feature request to propose.

Lotus component

  • lotus daemon - chain sync
  • lotus fvm/fevm - Lotus FVM and FEVM interactions
  • lotus miner/worker - sealing
  • lotus miner - proving(WindowPoSt/WinningPoSt)
  • lotus JSON-RPC API
  • lotus message management (mpool)
  • Other

What is the motivation behind this feature request? Is your feature request related to a problem? Please describe.

When one worker runs out of space, it has a domino effect on other workers, where the same will eventually happen. A massive queue then builds up, and fixing and resolving it can cause up to 3 days of downtime, depending on the number of workers allocated to the miner.

Describe the solution you'd like

The Lotus scheduler should be aware of the free/available space on workers before assigning sectors. This applies specifically to PC1, but the same check would be required for PC2 if it is fixed for PC1.

Boost will keep sending new sectors to PC1 workers whether or not they have enough available space for the sector.
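
To illustrate the kind of check being asked for, here is a rough, purely hypothetical sketch. None of these names (`workerDiskInfo`, `canAcceptSector`) exist in Lotus; they are assumptions for illustration only:

```go
// Hypothetical sketch only: the idea is that the scheduler compares a
// sector's projected scratch footprint against the free space a worker
// reports for its sealing paths, minus space already claimed by
// queued/running tasks.
package main

import "fmt"

type workerDiskInfo struct {
	FreeBytes     uint64 // free space reported by the worker's sealing paths
	ReservedBytes uint64 // space already claimed by queued/running sectors
}

// canAcceptSector reports whether a worker has enough headroom to take
// another PC1 sector with the given projected on-disk footprint.
func canAcceptSector(w workerDiskInfo, sectorScratch uint64) bool {
	if w.ReservedBytes >= w.FreeBytes {
		return false
	}
	return w.FreeBytes-w.ReservedBytes >= sectorScratch
}

func main() {
	w := workerDiskInfo{FreeBytes: 2 << 40, ReservedBytes: 1 << 40} // 2 TiB free, 1 TiB reserved
	fmt.Println(canAcceptSector(w, 500<<30))                        // true: ~1 TiB of headroom remains
}
```

In practice this would mean the scheduler rejects (or defers) a PC1 task for a worker whose remaining headroom is below the projected scratch for one more sector, instead of accepting it and letting the worker fill up.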

Describe alternatives you've considered

No response

Additional context

No response

@rjan90
Contributor

rjan90 commented Jul 6, 2023

Hey @michieljmitchell!

This is a very fair point and something we are aware of. We have been looking at potentially replacing the current scheduler with one where the workers are able to take tasks based on their knowledge of their own state, instead of the lotus-miner process trying to model it. Some of the base layer for this potential refactor will be built in #10991.

That said, I would suggest trying out the experiment-spread-tasks-qcount assigner in the meantime (link). That assigner takes into account task counts in the running/preparing/queued states, as well as counting running tasks on a per-task-type basis, which should help alleviate some of the overflow from Boost.
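
In case it helps, a minimal config sketch, assuming the Assigner option sits in the [Storage] section of the miner's config.toml as in recent Lotus releases (please verify the section name and accepted values against the comments in your own generated config):

```toml
# lotus-miner config.toml (sketch; verify against your Lotus version)
[Storage]
  # Default is "utilization". The experimental assigner named above also
  # weighs queued/preparing tasks and per-task-type counts when spreading work.
  Assigner = "experiment-spread-tasks-qcount"
```

A lotus-miner restart is needed for the change to take effect.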

@TippyFlitsUK TippyFlitsUK added kind/enhancement Kind: Enhancement area/sealing need/author-input Hint: Needs Author Input need/team-input Hint: Needs Team Input and removed kind/feature Kind: Feature need/triage need/author-input Hint: Needs Author Input labels Jul 20, 2023
@TippyFlitsUK
Contributor

Hey @michieljmitchell

Have you managed to try @rjan90's suggestion with the experiment-spread-tasks-qcount assigner?

How are you getting on?
