Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core/rawdb: fix cornercase shutdown behaviour in freezer #26485

Merged
merged 8 commits into from
Jan 16, 2023

Conversation

holiman
Copy link
Contributor

@holiman holiman commented Jan 12, 2023

This PR does a few things. First of all, it makes the printout more detailed, e.g.

WARN [01-11|10:56:59.140] Truncating dangling indexes              database=/media/seassd/ancient              table=bodies indexed=596.10MiB stored=596.05MiB

by showing the exact number instead of approximate MB.

Secondly, it adds a testcase showing a cornercase that can occur during shutdown, if freezer is Close()d, and only afterwards the chain freezer calls Sync. Currently, this would lead to a exit with CRIT.

The chain_freezer does writes here: https://github.com/ethereum/go-ethereum/blob/master/core/rawdb/chain_freezer.go#L162

followed by Sync here: https://github.com/ethereum/go-ethereum/blob/master/core/rawdb/chain_freezer.go#L170

Between these operation, the underlying f may be closed by another routine.

I saw also that the chain freezer does exactly this 'dangerous' sort of shutdown sequence: first it shuts down the underlying database, and only after does it signal to the active goroutine to exit. This PR fixes this. BUT: I suspect that the same sequence may happen regardless, depending on the shutdown sequence. I haven't investigated that in full.

However, this PR adds a failing test. I am not sure what the best fix is. Should we remove the test? Should we expect Sync after Close not to yield an error?

select {
case <-f.quit:
default:
close(f.quit)
}
f.wg.Wait()
return err
return f.Freezer.Close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

@@ -869,7 +869,9 @@ func (t *freezerTable) advanceHead() error {
func (t *freezerTable) Sync() error {
t.lock.Lock()
defer t.lock.Unlock()

if t.index == nil || t.head == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also check if t.meta is nil.

Copy link
Member

@rjl493456442 rjl493456442 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the fix is correct, we should include it although not sure if it's root cause

Comment on lines 454 to 456
if err := f.Sync(); err != nil {
t.Fatal(err)
}
Copy link
Contributor Author

@holiman holiman Jan 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I'll just change this, and expect an error, and that the error (or error-string) is "closed", then?

log.Info("Failed to retrieve ancient root", "err", err)
return err
}
ancient := stack.ResolveAncient("chaindata", ctx.String(utils.AncientFlag.Name))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason for this change? Directly resolve ancient datadir without involving the chain DB?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, for the investigation here: #26483 (comment) , I needed to do a geth db inspect, but I didn't actually have a leveldb -- I only had a couple of index files.

This whole utils.MakeChainDatabase assumes things to be pretty well ordered, but all we use it for eventually is to help resolve the ancientdir, via db.AncientDir.

So this change has the same effect as the original code, but is more robust in case the data is not fully consistent / present.

@holiman holiman merged commit 0b53b29 into ethereum:master Jan 16, 2023
shekhirin pushed a commit to shekhirin/go-ethereum that referenced this pull request Jun 6, 2023
)

This PR does a few things. 
It fixes a shutdown-order flaw in the chainfreezer. Previously, the chain-freezer would shutdown the freezer backend first, and then signal for the loop to exit. This can lead to a scenario where the freezer tries to fsync closed files, which is an error-conditon that could lead to exit via log.Crit. 

It also makes the printout more detailed when truncating 'dangling' items, by showing the exact number instead of approximate MB.

This PR also adds calls to fsync files before closing them, and also makes the `db inspect` command slightly more robust.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants