-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: PoSt workers #7971
feat: PoSt workers #7971
Conversation
…/lotus into feat/decoupling-post-worker
Codecov Report
@@ Coverage Diff @@
## master #7971 +/- ##
==========================================
+ Coverage 40.23% 40.55% +0.31%
==========================================
Files 684 687 +3
Lines 74655 75230 +575
==========================================
+ Hits 30040 30509 +469
- Misses 39372 39420 +48
- Partials 5243 5301 +58
|
ae4f219
to
33dfcf2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some notes
7051e20
to
cca69a6
Compare
Just tested in lotus-pond, and it works really well:
|
@@ -161,25 +99,4 @@ func (m *Manager) CheckProvable(ctx context.Context, pp abi.RegisteredPoStProof, | |||
return bad, nil | |||
} | |||
|
|||
func addCachePathsForSectorSize(chk map[string]int64, cacheDir string, ssize abi.SectorSize) { | |||
switch ssize { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see these stat checks anywhere anymore and no behavior change within GenerateSingleVanillaProof
. Did they all become redundant with the check in GenerateSingleVanillaProof
at some point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, if we can read challanges, even random ones, that probably means that the sector is fine (when challenges are read, they are also validated, at least that's what I've been told)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will it seriously increase the time spent on check?
}, | ||
sealtasks.TTGenerateWindowPoSt: { | ||
abi.RegisteredSealProof_StackedDrg64GiBV1: Resources{ | ||
MaxMemory: 120 << 30, // TODO: Confirm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still need confirmation here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep
// list contains substitutes for skipped sectors - but we don't care about | ||
// those for the purpose of the proof, so for things to work, we need to | ||
// dedupe here. | ||
sectorInfo = dedupeSectorInfo(sectorInfo) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think anything like this currently exists. Is this new logic fixing a bug / bugs in current lotus?
sectorInfo = dedupeSectorInfo(sectorInfo) | ||
|
||
// The partitions number of this batch | ||
// ceil(sectorInfos / maxPartitionSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is surprising to me. If we had two partitions of size maxPartitionSize and half of each was skipped this code is telling me that we only need to take one partition proof rather than two partition proofs. There's no problem shifting sectors across partitions like this?
@magik6k since we're using a different pathway through ffi to do post in the case a worker is hooked up it seems like lotus-bench should adapt to be able to exercise the new path. I'm not sure whether the typical lotus-bench use case is 1) exercise the operator's specific setup or 2) exercise general purpose proving pathways on the miner. If its 1) we could hook up lotus-bench to the manager so it can talk to workers. If its 2) I think we should add a lotus-bench check that does the challenge -> vanilla proof -> snark ( -> merge multi partition?) This may be overkill because ffi is doing similar things inside for each. What do you think? |
@@ -31,15 +31,15 @@ import ( | |||
|
|||
const metaFile = "sectorstore.json" | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: add test of new method GenerateSingleVanillaProof
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The question about lotus bench remains and I didn't review the new worker itest code too carefully but I'm happy with this going in as is.
if storiface.WorkerID(curSes) != wid { | ||
if curSes != ClosedWorkerID { | ||
// worker restarted | ||
log.Warnw("worker session changed (worker restarted?)", "initial", wid, "current", curSes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
disable here? or is the delete on return enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that delete is enough
b6a3c7f
to
82343a0
Compare
On lotus-bench + setting good resource numbers - I think this can be done after this lands in master; I'd love to stop making this PR bigger, especially given that it seems to work pretty well already (opened #8373 with some in-progress work on top of this PR) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 on the last five commits
What would happen when miner process exit? |
@magik6k |
Related Issues
Based on #7600
Proposed Changes
PoSt workers:
Additional Info
This PR introduces a number of new features:
lotus-worker
- one lotus-worker instance can only be one of:Usage
lotus-worker run [--winningpost/--windowpost]
(use the same envvars like for any other worker)lotus-miner sealing workers
Workers: Seal(1) WdPoSt(0) WinPoSt(1)
Followups
Checklist
Before you mark the PR ready for review, please make sure that:
<PR type>: <area>: <change being made>
fix: mempool: Introduce a cache for valid signatures
PR type
: fix, feat, INTERFACE BREAKING CHANGE, CONSENSUS BREAKING, build, chore, ci, docs, misc,perf, refactor, revert, style, testarea
: api, chain, state, vm, data transfer, market, mempool, message, block production, multisig, networking, paychan, proving, sealing, wallet