Rebuild miner database(metadata) after a corruption #4075

Closed
jennijuju opened this issue Sep 28, 2020 · 11 comments

Labels
area/proving Area: Proving area/ux Area: UX effort/days Effort: Multiple Days


@jennijuju
Member

jennijuju commented Sep 28, 2020

Related #3840

Another miner reported that, after using the LevelDB repair tool to recover from the error below, he got `opening datastore /staging: file does not exist for table 18`.

ERROR: starting node: could not build arguments for function "github.com/filecoin-project/lotus/node/modules/lp2p".StartListening.func1 (/home/mcloud/ceph/current/lotus-0.8.0/node/modules/lp2p/addrs.go:98): failed to build host.Host: could not build arguments for function "reflect".makeFuncStub (/usr/local/go/src/reflect/asm_amd64.s:12): failed to build lp2p.BaseIpfsRouting: could not build arguments for function "reflect".makeFuncStub (/usr/local/go/src/reflect/asm_amd64.s:12): failed to build dtypes.MetadataDS: received non-nil error from function "reflect".makeFuncStub (/usr/local/go/src/reflect/asm_amd64.s:12): opening datastore /metadata: leveldb/journal: block/chunk corrupted: missing chunk part (0 bytes) [file=15008041.log]

In this case, we think he may be able to re-init a new miner with the actor ID and owner key and prove the existing sectors. However, this miner won't be able to add new sectors or get new deals, because the indices cannot catch up.

Ideally, we should provide a way to rebuild the database, with something like `lotus-miner recover`.

@jennijuju jennijuju added the area/ux Area: UX label Sep 28, 2020
@jennijuju
Member Author

jennijuju commented Sep 30, 2020

since 0.7.0: #3837 (comment)
related #4045
discussion: #3840

@jennijuju
Member Author

slack convo dump 1

ERROR: starting node: could not build arguments for function "github.com/filecoin-project/lotus/node/modules/lp2p".StartListening.func1 (/home/mcloud/ceph/current/lotus-0.8.0/node/modules/lp2p/addrs.go:98): failed to build host.Host: could not build arguments for function "reflect".makeFuncStub (/usr/local/go/src/reflect/asm_amd64.s:12): failed to build lp2p.BaseIpfsRouting: could not build arguments for function "reflect".makeFuncStub (/usr/local/go/src/reflect/asm_amd64.s:12): failed to build dtypes.MetadataDS: received non-nil error from function "reflect".makeFuncStub (/usr/local/go/src/reflect/asm_amd64.s:12): opening datastore /metadata: leveldb/journal: block/chunk corrupted: missing chunk part (0 bytes) [file=15008041.log]

Used the LevelDB tool to repair; however,

opening datastore /staging: file does not exist for table 18

and after syncing up with Riba, we think:

"Unfortunately, once your miner's metadata DB is corrupted, you can re-init a new miner using the actor ID and owner key. You will be able to prove the existing sectors you had, but you will not be able to add new ones."

@s0nik42

s0nik42 commented Sep 30, 2020

@jennijuju Sorry, I don't understand. Does this mean I have to start a new miner?

@s0nik42

s0nik42 commented Sep 30, 2020

my db is corrupted

@s0nik42

s0nik42 commented Oct 1, 2020

Daemon: 0.8.0+git.2c1d96bc.dirty+api0.16.0
Local: lotus version 0.8.0+git.2c1d96bc.dirty

Here is the log:
ERROR: starting node: could not build arguments for function “github.com/filecoin-project/lotus/node/modules/lp2p”.StartListening.func1 (/home/s0nik42/lotus/node/modules/lp2p/addrs.go:98): failed to build host.Host: could not build arguments for function “reflect”.makeFuncStub (/usr/local/go/src/reflect/asm_amd64.s:12): failed to build lp2p.BaseIpfsRouting: could not build arguments for function “reflect”.makeFuncStub (/usr/local/go/src/reflect/asm_amd64.s:12): failed to build dtypes.MetadataDS: received non-nil error from function “reflect”.makeFuncStub (/usr/local/go/src/reflect/asm_amd64.s:12): opening datastore /metadata: leveldb/journal: block/chunk corrupted: chunk length overflows block (366 bytes) [file=000253.log]

The only way to restart the miner was to remove "000253.log", as proposed by @Meatball.
I cannot confirm the full impact yet.
1/ Ongoing sectors are finalizing properly.
2/ PoST failed for one partition. I needed to confirm whether it was related; apparently not, as it succeeded the day after.
3/ I hadn't pledged yet, as I'm running out of space ==> pledging is OK, and sector sealing too.
4/ But I lost hundreds of deals, and the sectors associated with them were bouncing in the sealing process. The only solution was to remove them.

@jennijuju
Member Author

jennijuju commented Oct 2, 2020

#4133 may cover part of this; @vyzo said he can take this later.

@jsign
Contributor

jsign commented Oct 15, 2020

We ran into a similar situation with a fatal error:
opening datastore /metadata: leveldb/journal: block/chunk corrupted: chunk length overflows block (3119 bytes) [file=000283.log]
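As a small illustration, the errors quoted in this thread all end with the datastore name, a reason, and a `[file=...]` tag naming the damaged journal. The sketch below pulls those three pieces out of a log line. The regular expression is an assumption derived only from the messages pasted in this issue, and `parse_corruption_error` is a hypothetical helper, not part of Lotus or LevelDB.

```python
import re

# Assumed shape, taken from the error lines quoted in this thread:
# "opening datastore /metadata: leveldb/journal: block/chunk corrupted: <reason> [file=<name>]"
CORRUPTION_RE = re.compile(
    r"opening datastore (?P<datastore>/\S+): leveldb/journal: "
    r"block/chunk corrupted: (?P<reason>.+?) \[file=(?P<file>[^\]]+)\]"
)


def parse_corruption_error(message):
    """Return a dict with datastore, reason and file, or None if no match."""
    match = CORRUPTION_RE.search(message)
    if match is None:
        return None
    return match.groupdict()


if __name__ == "__main__":
    line = (
        "opening datastore /metadata: leveldb/journal: block/chunk corrupted: "
        "chunk length overflows block (3119 bytes) [file=000283.log]"
    )
    print(parse_corruption_error(line))
```

Other failure messages, such as the `/staging: file does not exist for table 18` one above, do not follow this shape, and the helper returns `None` for them.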

@jsign
Contributor

jsign commented Oct 16, 2020

Obviously, you can't delete a file and call it a day, but as an exercise, I deleted the corrupted file and the node started working again. Some hours later, though, another file got corrupted and the node became unusable again.

Just leaving this comment in case anyone else is thinking that deleting database files is a good idea. It might temporarily bring your Lotus node back up, but just use that time to do whatever you need before nuking it and starting from zero.

@ribeirojose

I got myself a corrupted database because a 256GB RAM instance ran OOM earlier this morning.

I was eventually able to recover using the suggested tweak (renaming 000283.log to something else), although, as @jsign mentioned, this is likely to be only a duct-tape fix.
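For anyone trying the same stopgap, here is a minimal sketch of the rename tweak that copies the whole datastore directory first, so the original bytes stay available. It assumes the node and miner are stopped, and that the datastore is an ordinary directory of LevelDB files. `quarantine_journal`, the `.bak` directory name, and the `.corrupt` suffix are all hypothetical names chosen for illustration, not Lotus tooling.

```python
import shutil
from pathlib import Path


def quarantine_journal(datastore_dir, journal_name, suffix=".corrupt"):
    """Copy the datastore directory, then rename one journal file aside.

    Only meant to be run while nothing has the datastore open.
    Returns (backup_dir, renamed_path).
    """
    datastore = Path(datastore_dir)
    journal = datastore / journal_name
    if not journal.is_file():
        raise FileNotFoundError(journal)

    # Full copy first, so the rename can be undone later.
    backup_dir = datastore.with_name(datastore.name + ".bak")
    if backup_dir.exists():
        raise FileExistsError(backup_dir)
    shutil.copytree(datastore, backup_dir)

    # Rename instead of deleting: the damaged file is kept for inspection.
    renamed = journal.with_name(journal.name + suffix)
    journal.rename(renamed)
    return backup_dir, renamed
```

As noted above, this only hides the damaged journal from LevelDB. Whatever writes were in that file are lost, so treat the result as a temporary state.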

Lotus seems to eat all RAM and swap after we push a good number of deals out there, even when I'm no longer pushing anything.

@jennijuju jennijuju changed the title from "[Feature Request] Be able to rebuild miner database(metadata) after a corruption" to "Rebuild miner database(metadata) after a corruption" Jan 19, 2021
@jennijuju jennijuju added the area/proving Area: Proving label Feb 24, 2021
@jennijuju jennijuju added the effort/days Effort: Multiple Days label Feb 24, 2021
@arajasek
Contributor

arajasek commented Mar 1, 2021

It's complicated to do this from on-chain info. The simplest thing to do is probably:

  • keep and refresh backups (should be fairly small)
  • perhaps also maintain a log, so that the latest state can be computed from a backup
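The two bullets above can be sketched as a toy key-value store that keeps a snapshot backup plus an append-only change log, and recovers by loading the snapshot and replaying the log. This is only an illustration of the idea under simple assumptions (JSON files, a single writer, no fsync). `BackedUpStore`, the file names, and the example keys are hypothetical and do not reflect how Lotus stores miner metadata.

```python
import json
from pathlib import Path


class BackedUpStore:
    """Toy key-value store with a snapshot backup and an append-only log."""

    def __init__(self, directory):
        self.dir = Path(directory)
        self.dir.mkdir(parents=True, exist_ok=True)
        self.snapshot_path = self.dir / "snapshot.json"
        self.log_path = self.dir / "changes.log"
        self.state = {}

    def put(self, key, value):
        # Record the change in the log first, one JSON record per line.
        with self.log_path.open("a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
        self.state[key] = value

    def refresh_backup(self):
        # Write the snapshot to a temp file, then swap it into place.
        tmp = self.snapshot_path.with_suffix(".tmp")
        tmp.write_text(json.dumps(self.state), encoding="utf-8")
        tmp.replace(self.snapshot_path)
        # The snapshot now covers every logged change, so start a fresh log.
        self.log_path.write_text("", encoding="utf-8")

    @classmethod
    def recover(cls, directory):
        """Rebuild the latest state from the last snapshot plus the log."""
        store = cls(directory)
        if store.snapshot_path.exists():
            store.state = json.loads(
                store.snapshot_path.read_text(encoding="utf-8")
            )
        if store.log_path.exists():
            for line in store.log_path.read_text(encoding="utf-8").splitlines():
                if line.strip():
                    record = json.loads(line)
                    store.state[record["key"]] = record["value"]
        return store
```

Replaying a put twice gives the same result, so an interruption between writing the snapshot and clearing the log does not change the recovered state. A real implementation would also need durable writes and a way to detect a truncated last log line.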

@arajasek
Contributor

Closed by #5755
