Sector can not be removed #5081

Closed
qiusugang opened this issue Dec 1, 2020 · 7 comments

@qiusugang

To remove a sector, I first aborted the sector's running job and then ran:
lotus-miner sectors remove --really-do-it

It had no effect: shortly afterwards, the miner still assigned the sector to a worker as a new job.

My lotus-miner version is 1.2.1+git.3e143cac4.dirty

@jennijuju
Member

jennijuju commented Dec 1, 2020

What do you mean by aborting the sector's running job? Did you abort the task with sealing abort?

Please provide the output of sectors status --log for this sector.

@jennijuju jennijuju added support need/author-input Hint: Needs Author Input labels Dec 1, 2020
@qiusugang
Author

qiusugang commented Dec 2, 2020

This is the output of lotus-miner sectors status --log 46:
Event Log:
  0. 2020-11-30 07:13:57 +0800 CST: [event;sealing.SectorStartCC] {"User":{"ID":46,"SectorType":8,"Pieces":[{"Piece":{"Size":34359738368,"PieceCID":{"/":"baga6ea4seaqao7s73y24kcutaosvacpdjgfe5pw76ooefnyqw4ynr3d2y6x2mpq"}},"DealInfo":null}]}}
  1. 2020-11-30 07:13:58 +0800 CST: [event;sealing.SectorPacked] {"User":{"FillerPieces":null}}
  2. 2020-11-30 07:13:58 +0800 CST: [event;sealing.SectorTicket] {"User":{"TicketValue":"bc/UOK7fLGm+O6mWOJ6FEnCIRJGIbjX0/eo7g/kfOvQ=","TicketEpoch":278607}}
  3. 2020-11-30 12:18:45 +0800 CST: [event;sealing.SectorPreCommit1] {"User":{"PreCommit1Out":"eyJyZWdpc3RlcmVkX3Byb29mIjoiU3RhY2tlZERyZzMyR2lCVjFfMSIsImxhYmVscyI6eyJTdGFja2VkRHJnMzJHaUJWMSI6eyJsYWJlbHMiOlt7InBhdGgiOiIvaG9tZS9pcGZzLy5sb3R1c21pbmVyL2NhY2hlL3MtdDA4NTg5OS00NiIsImlkIjoibGF5ZXItMSIsInNpemUiOjEwNzM3NDE4MjQsInJvd3NfdG9fZGlzY2FyZCI6N30seyJwYXRoIjoiL2hvbWUvaXBmcy8ubG90dXNtaW5lci9jYWNoZS9zLXQwODU4OTktNDYiLCJpZCI6ImxheWVyLTIiLCJzaXplIjoxMDczNzQxODI0LCJyb3dzX3RvX2Rpc2NhcmQiOjd9LHsicGF0aCI6Ii9ob21lL2lwZnMvLmxvdHVzbWluZXIvY2FjaGUvcy10MDg1ODk5LTQ2IiwiaWQiOiJsYXllci0zIiwic2l6ZSI6MTA3Mzc0MTgyNCwicm93c190b19kaXNjYXJkIjo3fSx7InBhdGgiOiIvaG9tZS9pcGZzLy5sb3R1c21pbmVyL2NhY2hlL3MtdDA4NTg5OS00NiIsImlkIjoibGF5ZXItNCIsInNpemUiOjEwNzM3NDE4MjQsInJvd3NfdG9fZGlzY2FyZCI6N30seyJwYXRoIjoiL2hvbWUvaXBmcy8ubG90dXNtaW5lci9jYWNoZS9zLXQwODU4OTktNDYiLCJpZCI6ImxheWVyLTUiLCJzaXplIjoxMDczNzQxODI0LCJyb3dzX3RvX2Rpc2NhcmQiOjd9LHsicGF0aCI6Ii9ob21lL2lwZnMvLmxvdHVzbWluZXIvY2FjaGUvcy10MDg1ODk5LTQ2IiwiaWQiOiJsYXllci02Iiwic2l6ZSI6MTA3Mzc0MTgyNCwicm93c190b19kaXNjYXJkIjo3fSx7InBhdGgiOiIvaG9tZS9pcGZzLy5sb3R1c21pbmVyL2NhY2hlL3MtdDA4NTg5OS00NiIsImlkIjoibGF5ZXItNyIsInNpemUiOjEwNzM3NDE4MjQsInJvd3NfdG9fZGlzY2FyZCI6N30seyJwYXRoIjoiL2hvbWUvaXBmcy8ubG90dXNtaW5lci9jYWNoZS9zLXQwODU4OTktNDYiLCJpZCI6ImxheWVyLTgiLCJzaXplIjoxMDczNzQxODI0LCJyb3dzX3RvX2Rpc2NhcmQiOjd9LHsicGF0aCI6Ii9ob21lL2lwZnMvLmxvdHVzbWluZXIvY2FjaGUvcy10MDg1ODk5LTQ2IiwiaWQiOiJsYXllci05Iiwic2l6ZSI6MTA3Mzc0MTgyNCwicm93c190b19kaXNjYXJkIjo3fSx7InBhdGgiOiIvaG9tZS9pcGZzLy5sb3R1c21pbmVyL2NhY2hlL3MtdDA4NTg5OS00NiIsImlkIjoibGF5ZXItMTAiLCJzaXplIjoxMDczNzQxODI0LCJyb3dzX3RvX2Rpc2NhcmQiOjd9LHsicGF0aCI6Ii9ob21lL2lwZnMvLmxvdHVzbWluZXIvY2FjaGUvcy10MDg1ODk5LTQ2IiwiaWQiOiJsYXllci0xMSIsInNpemUiOjEwNzM3NDE4MjQsInJvd3NfdG9fZGlzY2FyZCI6N31dLCJfaCI6bnVsbH19LCJjb25maWciOnsicGF0aCI6Ii9ob21lL2lwZnMvLmxvdHVzbWluZXIvY2FjaGUvcy10MDg1ODk5LTQ2IiwiaWQiOiJ0cmVlLWQiLCJzaXplIjoyMTQ3NDgzNjQ3LCJyb3dzX3RvX2Rpc2NhcmQiOjd9LCJjb21tX2QiOls3LDEyNiw5NSwyMjIsNTMsMTk3LDEwLDE0NywzLDE2NSw4MCw5LDIyN
yw3MywxMzgsNzgsMTkwLDIyMywyNDMsMTU2LDY2LDE4MywxNiwxODMsNDgsMjE2LDIzNiwxMjIsMTk5LDE3NSwxNjYsNjJdfQ=="}}
  4. 2020-11-30 16:28:16 +0800 CST: [event;sealing.SectorPreCommit2] {"User":{"Sealed":{"/":"bagboea4b5abcaslphn7dpth5lahmlrraohz7bdudewbvlpotrcry7wcjvsxwpqrz"},"Unsealed":{"/":"baga6ea4seaqao7s73y24kcutaosvacpdjgfe5pw76ooefnyqw4ynr3d2y6x2mpq"}}}
  5. 2020-11-30 16:28:16 +0800 CST: [event;sealing.SectorPreCommitted] {"User":{"Message":{"/":"bafy2bzacednf4hnkovhndreugk5mkqivg3gv2vj53peyny7ah5zsadj723ufk"},"PreCommitDeposit":"110154877658149174","PreCommitInfo":{"SealProof":8,"SectorNumber":46,"SealedCID":{"/":"bagboea4b5abcaslphn7dpth5lahmlrraohz7bdudewbvlpotrcry7wcjvsxwpqrz"},"SealRandEpoch":278607,"DealIDs":[],"Expiration":1833650,"ReplaceCapacity":false,"ReplaceSectorDeadline":0,"ReplaceSectorPartition":0,"ReplaceSectorNumber":0}}}
  6. 2020-11-30 16:31:30 +0800 CST: [event;sealing.SectorPreCommitLanded] {"User":{"TipSet":"AXGg5AIgM5wBx19tcXpKYr1OBggmI4LtoOZdoAaWdMfnHKp7shg="}}
  7. 2020-11-30 17:46:30 +0800 CST: [event;sealing.SectorSeedReady] {"User":{"SeedValue":"YDcIqTDOSlk0ynWPcW38Db41G0OrMQyhhA9sSG5MN40=","SeedEpoch":280767}}
  8. 2020-12-01 09:19:32 +0800 CST: [event;sealing.SectorComputeProofFailed] {"User":{}}
    computing seal proof failed(2): storage call error 0: task aborted
  9. 2020-12-01 09:20:32 +0800 CST: [event;sealing.SectorRetryComputeProof] {"User":{}}
  10. 2020-12-01 20:02:35 +0800 CST: [event;sealing.SectorRestart] {"User":{}}
  11. 2020-12-01 21:15:12 +0800 CST: [event;sealing.SectorRestart] {"User":{}}
  12. 2020-12-01 21:27:15 +0800 CST: [event;sealing.SectorRestart] {"User":{}}
  13. 2020-12-01 21:44:03 +0800 CST: [event;sealing.SectorComputeProofFailed] {"User":{}}
    computing seal proof failed(2): storage call error 0: task aborted
  14. 2020-12-01 21:45:03 +0800 CST: [event;sealing.SectorRetryComputeProof] {"User":{}}
  15. 2020-12-01 21:49:32 +0800 CST: [event;sealing.SectorComputeProofFailed] {"User":{}}
    computing seal proof failed(2): storage call error 0: task aborted
  16. 2020-12-01 21:50:32 +0800 CST: [event;sealing.SectorSealPreCommit1Failed] {"User":{}}
    consecutive compute fails
  17. 2020-12-01 21:51:32 +0800 CST: [event;sealing.SectorRetrySealPreCommit1] {"User":{}}
  18. 2020-12-01 23:18:18 +0800 CST: [event;sealing.SectorSealPreCommit1Failed] {"User":{}}
    seal pre commit(1) failed: storage call error 0: task aborted
  19. 2020-12-01 23:19:18 +0800 CST: [event;sealing.SectorRetrySealPreCommit1] {"User":{}}
  20. 2020-12-01 23:23:05 +0800 CST: [event;sealing.SectorSealPreCommit1Failed] {"User":{}}
    seal pre commit(1) failed: storage call error 0: task aborted
  21. 2020-12-01 23:24:05 +0800 CST: [event;sealing.SectorRetrySealPreCommit1] {"User":{}}

@shaodan
Contributor

shaodan commented Dec 2, 2020

@jennijuju @magik6k I found three situations in which a sector cannot be removed:

  1. The sector is in the scheduler queue, even though all of its data has already been cleaned up.
[root@miner-2 ~]# miner storage find 130

[root@miner-2 ~]# miner sealing sched-diag | grep -B 4 -A 3 130
      {
        "Priority": 0,
        "Sector": {
          "Miner": 68528,
          "Number": 130
        },
        "TaskType": "seal/v0/precommit/2"
      },

[root@miner-2 ~]# miner sectors status --log 130
SectorID:       130
Status:         PreCommit2
CIDcommD:       <nil>
CIDcommR:       <nil>
Ticket:         960497b5e845e15ba7abb80288e3f3bb2e54ee9c17fa819a426b5278091aaf68
TicketH:        222865
Seed:
SeedH:          0
Precommit:      <nil>
Commit:         <nil>
Proof:
Deals:          [0]
Retries:        0
--------
Event Log:
0.      2020-11-10 08:29:21 +0000 UTC:  [event;sealing.SectorStartCC]   {"User":{"ID":130,"SectorType":3,"Pieces":[{"Piece":{"Size":34359738368,"PieceCID":{"/":"baga6ea4seaqao7s73y24kcutaosvacpdjgfe5pw76ooefnyqw4ynr3d2y6x2mpq"}},"DealInfo":null}]}}
1.      2020-11-10 08:29:21 +0000 UTC:  [event;sealing.SectorPacked]    {"User":{"FillerPieces":null}}
2.      2020-11-10 14:43:02 +0000 UTC:  [event;sealing.SectorRestart]   {"User":{}}
3.      2020-11-11 10:16:56 +0000 UTC:  [event;sealing.SectorPreCommit1]        {"User":{"PreCommit1Out":"eyJyZWdpc3RlcmVkX3Byb29mIjoiU3RhY2tlZERyZzMyR2lCVjEiLCJsYWJlbHMiOnsiU3RhY2tlZERyZzMyR2lCVjEiOnsibGFiZWxzIjpbeyJwYXRoIjoiL3N0b3JhZ2UvbG90dXN3b3JrL2xvdHVzd29ya2VyL2NhY2hlL3MtdDA2ODUyOC0xMzAiLCJpZCI6ImxheWVyLTEiLCJzaXplIjoxMDczNzQxODI0LCJyb3dzX3RvX2Rpc2NhcmQiOjd9LHsicGF0aCI6Ii9zdG9yYWdlL2xvdHVzd29yay9sb3R1c3dvcmtlci9jYWNoZS9zLXQwNjg1MjgtMTMwIiwiaWQiOiJsYXllci0yIiwic2l6ZSI6MTA3Mzc0MTgyNCwicm93c190b19kaXNjYXJkIjo3fSx7InBhdGgiOiIvc3RvcmFnZS9sb3R1c3dvcmsvbG90dXN3b3JrZXIvY2FjaGUvcy10MDY4NTI4LTEzMCIsImlkIjoibGF5ZXItMyIsInNpemUiOjEwNzM3NDE4MjQsInJvd3NfdG9fZGlzY2FyZCI6N30seyJwYXRoIjoiL3N0b3JhZ2UvbG90dXN3b3JrL2xvdHVzd29ya2VyL2NhY2hlL3MtdDA2ODUyOC0xMzAiLCJpZCI6ImxheWVyLTQiLCJzaXplIjoxMDczNzQxODI0LCJyb3dzX3RvX2Rpc2NhcmQiOjd9LHsicGF0aCI6Ii9zdG9yYWdlL2xvdHVzd29yay9sb3R1c3dvcmtlci9jYWNoZS9zLXQwNjg1MjgtMTMwIiwiaWQiOiJsYXllci01Iiwic2l6ZSI6MTA3Mzc0MTgyNCwicm93c190b19kaXNjYXJkIjo3fSx7InBhdGgiOiIvc3RvcmFnZS9sb3R1c3dvcmsvbG90dXN3b3JrZXIvY2FjaGUvcy10MDY4NTI4LTEzMCIsImlkIjoibGF5ZXItNiIsInNpemUiOjEwNzM3NDE4MjQsInJvd3NfdG9fZGlzY2FyZCI6N30seyJwYXRoIjoiL3N0b3JhZ2UvbG90dXN3b3JrL2xvdHVzd29ya2VyL2NhY2hlL3MtdDA2ODUyOC0xMzAiLCJpZCI6ImxheWVyLTciLCJzaXplIjoxMDczNzQxODI0LCJyb3dzX3RvX2Rpc2NhcmQiOjd9LHsicGF0aCI6Ii9zdG9yYWdlL2xvdHVzd29yay9sb3R1c3dvcmtlci9jYWNoZS9zLXQwNjg1MjgtMTMwIiwiaWQiOiJsYXllci04Iiwic2l6ZSI6MTA3Mzc0MTgyNCwicm93c190b19kaXNjYXJkIjo3fSx7InBhdGgiOiIvc3RvcmFnZS9sb3R1c3dvcmsvbG90dXN3b3JrZXIvY2FjaGUvcy10MDY4NTI4LTEzMCIsImlkIjoibGF5ZXItOSIsInNpemUiOjEwNzM3NDE4MjQsInJvd3NfdG9fZGlzY2FyZCI6N30seyJwYXRoIjoiL3N0b3JhZ2UvbG90dXN3b3JrL2xvdHVzd29ya2VyL2NhY2hlL3MtdDA2ODUyOC0xMzAiLCJpZCI6ImxheWVyLTEwIiwic2l6ZSI6MTA3Mzc0MTgyNCwicm93c190b19kaXNjYXJkIjo3fSx7InBhdGgiOiIvc3RvcmFnZS9sb3R1c3dvcmsvbG90dXN3b3JrZXIvY2FjaGUvcy10MDY4NTI4LTEzMCIsImlkIjoibGF5ZXItMTEiLCJzaXplIjoxMDczNzQxODI0LCJyb3dzX3RvX2Rpc2NhcmQiOjd9XSwiX2giOm51bGx9fSwiY29uZmlnIjp7InBhdGgiOiIvc3RvcmFnZS9sb3R1c3dvcmsvbG90dXN3b3JrZXIvY2FjaGUvcy10MD
Y4NTI4LTEzMCIsImlkIjoidHJlZS1kIiwic2l6ZSI6MjE0NzQ4MzY0Nywicm93c190b19kaXNjYXJkIjo3fSwiY29tbV9kIjpbNywxMjYsOTUsMjIyLDUzLDE5NywxMCwxNDcsMywxNjUsODAsOSwyMjcsNzMsMTM4LDc4LDE5MCwyMjMsMjQzLDE1Niw2NiwxODMsMTYsMTgzLDQ4LDIxNiwyMzYsMTIyLDE5OSwxNzUsMTY2LDYyXX0=","TicketValue":"lgSXtehF4Vunq7gCiOPzuy5U7pwX+oGaQmtSeAkar2g=","TicketEpoch":222865}}
4.      2020-11-12 14:54:06 +0000 UTC:  [event;sealing.SectorRestart]   {"User":{}}
5.      2020-11-14 01:34:08 +0000 UTC:  [event;sealing.SectorRestart]   {"User":{}}
6.      2020-11-15 00:58:10 +0000 UTC:  [event;sealing.SectorRestart]   {"User":{}}
7.      2020-11-17 16:49:29 +0000 UTC:  [event;sealing.SectorRestart]   {"User":{}}
8.      2020-11-21 06:25:27 +0000 UTC:  [event;sealing.SectorRestart]   {"User":{}}
9.      2020-11-21 08:05:39 +0000 UTC:  [event;sealing.SectorRestart]   {"User":{}}
10.     2020-11-21 13:33:43 +0000 UTC:  [event;sealing.SectorRestart]   {"User":{}}
11.     2020-11-22 05:57:49 +0000 UTC:  [event;sealing.SectorRestart]   {"User":{}}
12.     2020-11-24 06:56:14 +0000 UTC:  [event;sealing.SectorRestart]   {"User":{}}
13.     2020-11-25 15:20:14 +0000 UTC:  [event;sealing.SectorRestart]   {"User":{}}
14.     2020-11-28 06:19:37 +0000 UTC:  [event;sealing.SectorRestart]   {"User":{}}
15.     2020-11-29 04:10:23 +0000 UTC:  [event;sealing.SectorRestart]   {"User":{}}
16.     2020-11-30 03:00:45 +0000 UTC:  [event;sealing.SectorRestart]   {"User":{}}
  2. The sector is running. It is not removed right after running lotus-miner sectors remove --really-do-it <ID>. However, if we then abort that job, the sector is removed immediately (@qiusugang, try removing before aborting).

  3. The sector is not running but is already assigned to a worker. It cannot be removed either. You need to wait until the task has started and then follow case 2.
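
For the first situation, checking the scheduler queue for a given sector can be automated instead of grepping. The following is only a minimal sketch in Go. It assumes the queued requests are available as a JSON array of objects with the fields visible in the sched-diag fragment above (Priority, Sector.Miner, Sector.Number, TaskType). The full shape of the sched-diag output is not shown in this thread, so the array input and the helper name queuedTasks are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SchedRequest mirrors only the fields visible in the sched-diag fragment
// quoted in this thread; any other fields are ignored.
type SchedRequest struct {
	Priority int
	Sector   struct {
		Miner  uint64
		Number uint64
	}
	TaskType string
}

// queuedTasks returns the task types queued for the given sector number.
// Assumption: raw is a JSON array of request objects (hypothetical shape).
func queuedTasks(raw []byte, sector uint64) ([]string, error) {
	var reqs []SchedRequest
	if err := json.Unmarshal(raw, &reqs); err != nil {
		return nil, err
	}
	var tasks []string
	for _, r := range reqs {
		if r.Sector.Number == sector {
			tasks = append(tasks, r.TaskType)
		}
	}
	return tasks, nil
}

func main() {
	// Sample input built from the fragment shown for sector 130.
	raw := []byte(`[{"Priority":0,"Sector":{"Miner":68528,"Number":130},"TaskType":"seal/v0/precommit/2"}]`)
	tasks, err := queuedTasks(raw, 130)
	if err != nil {
		panic(err)
	}
	fmt.Println(tasks)
}
```

An empty result would suggest the sector is not waiting in the queue, so one of the other two situations may apply.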

@qiusugang
Author

@shaodan Thanks. For this sector I really did abort its job first and then ran lotus-miner sectors remove --really-do-it, but it restarted again.
Today I removed the sector successfully by following the sequence you described.

@shaodan
Contributor

shaodan commented Dec 2, 2020

@qiusugang Great

@jennijuju @magik6k Based on this evidence, my guess is that the FSM cannot handle the remove event when the sector's state is not one of the xxxFailed states. I also found thousands of goroutines blocked at the FSM select here:

github.com/filecoin-project/go-statemachine.(*StateMachine).run(0xc00185ff20)
        /go/pkg/mod/github.com/filecoin-project/go-statemachine@v0.0.0-20200925024713-05bd7c71fbfe/machine.go:53 +0x14f
created by github.com/filecoin-project/go-statemachine.(*StateGroup).loadOrCreate
        /go/pkg/mod/github.com/filecoin-project/go-statemachine@v0.0.0-20200925024713-05bd7c71fbfe/group.go:133 +0x3bc
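
The guess above can be illustrated with a toy model. This is not the go-statemachine code. It is a sketch under the assumption that events sent to a sector's state machine are queued while a long-running stage is in flight and are applied only once that stage returns. The type Machine, its methods, and the "remove" event name are all hypothetical.

```go
package main

import "fmt"

// Event is a toy stand-in for a sealing FSM event (illustrative only).
type Event string

// Machine is a toy single-sector state machine. It only models the assumed
// behaviour that events queue up while a stage handler is busy.
type Machine struct {
	State   string
	busy    bool    // true while a long-running stage (e.g. a sealing task) runs
	pending []Event // events received but not yet applied
}

// StartStage marks a long-running stage as in flight.
func (m *Machine) StartStage(state string) {
	m.State = state
	m.busy = true
}

// Send queues an event and applies it right away only if no stage is running.
func (m *Machine) Send(e Event) {
	m.pending = append(m.pending, e)
	m.drain()
}

// StageDone models the stage returning, for example because the job was aborted.
func (m *Machine) StageDone() {
	m.busy = false
	m.drain()
}

// drain applies queued events in order unless a stage is still running.
func (m *Machine) drain() {
	if m.busy {
		return
	}
	for _, e := range m.pending {
		if e == "remove" {
			m.State = "Removed"
		}
	}
	m.pending = nil
}

func main() {
	m := &Machine{}
	m.StartStage("PreCommit2")
	m.Send("remove")
	fmt.Println("after remove:", m.State) // prints "after remove: PreCommit2"
	m.StageDone()
	fmt.Println("after abort:", m.State) // prints "after abort: Removed"
}
```

In this model, sending remove first and aborting the job afterwards removes the sector as soon as the stage ends, which matches the sequence reported to work in this thread. Whether the real FSM behaves this way is not confirmed here.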

@github-actions

Oops, seems like we needed more information for this issue, please comment with more details or this issue will be closed in 24 hours.

@github-actions

This issue was closed because it is missing author input.

@TippyFlitsUK TippyFlitsUK removed the need/author-input Hint: Needs Author Input label Mar 30, 2022