Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

channeldb: specify freelist bbolt options by default #3278

Merged
merged 2 commits into from
Jul 30, 2019

Conversation

Crypt-iQ
Copy link
Collaborator

@Crypt-iQ Crypt-iQ commented Jul 8, 2019

Fixes #3241.

These options could really be specified anywhere that bbolt is opened, so if that's of interest I can also do that. But the only other large database I had on my machine was wallet.db which is the btcwallet repo. Perhaps watchtowers might use a lot of space, or there could be some design parameters that change in the future where the other databases start using more space (macaroons.db etc). Should I add these options to all the tests so they reflect the new behavior?

Updated bbolt dependency to v1.3.3 per etcd-io/bbolt#153
The only downside is that if the db txn has to be rolled back in case of failure, the freelist must be built again temporarily which can be heavy with big databases. I think this is an acceptable tradeoff since this shouldn't be happening too often.

Copy link
Contributor

@cfromknecht cfromknecht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

. But the only other large database I had on my machine was wallet.db which is the btcwallet repo. Perhaps watchtowers might use a lot of space, or there could be some design parameters that change in the future where the other databases start using more space (macaroons.db etc).

Indeed btcwallet and towers would also be good candidates for these settings! For towers we can do this on the client and tower side dbs.

Updated bbolt dependency to v1.3.3 per

Perfect, yeah i hit this panic in testing with the current version

@@ -10,7 +10,7 @@ require (
github.com/btcsuite/btcutil v0.0.0-20190425235716-9e5f4b9a998d
github.com/btcsuite/btcwallet v0.0.0-20190628225330-4a9774585e57
github.com/btcsuite/fastsha256 v0.0.0-20160815193821-637e65642941
github.com/coreos/bbolt v1.3.2
github.com/coreos/bbolt v1.3.3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small nit: swap the order of the commits so bbolt is updated first

@wpaulino wpaulino added database Related to the database/storage of LND optimization P2 should be fixed if one has time v0.7.1 labels Jul 8, 2019
@wpaulino wpaulino added this to the 0.7.1 milestone Jul 8, 2019
This commit updates the bbolt dependency to v1.3.3 so that we
can use the NoFreelistSync option without triggering a panic
in case a rollback occurs.
This commit specifies two bbolt options when opening the underlying
channel and watchtower databases so that there is reduced heap
pressure in case the bbolt database has a lot of free pages in the
B+ tree.
@Crypt-iQ
Copy link
Collaborator Author

Crypt-iQ commented Jul 8, 2019

@cfromknecht Addressed comment and added the options to watchtower client + server. Can make a PR for btcwallet next. For testing, I'm wondering if the options should also be specified in unit/itests? If so, in order to reduce code duplication, I suppose I could add the options to channeldb/options.go similar to the way the cache sizes are setup?

Copy link
Contributor

@wpaulino wpaulino left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For testing, I'm wondering if the options should also be specified in unit/itests? If so, in order to reduce code duplication, I suppose I could add the options to channeldb/options.go similar to the way the cache sizes are setup?

I'd say yes in order to exercise what would be used in production. Adding the options wouldn't remove the duplication in btcwallet, but seems fine to do for the other databases within lnd.

@cfromknecht
Copy link
Contributor

@Crypt-iQ since the options are set inside the Open method, wouldn't these already be used in testing as well? Which cases do you think are still missing?

@Crypt-iQ
Copy link
Collaborator Author

Crypt-iQ commented Jul 9, 2019

@Crypt-iQ since the options are set inside the Open method, wouldn't these already be used in testing as well? Which cases do you think are still missing?

Hey you're right, I meant the other databases (sphinxreplay, macaroons) and some of the temporary test databases. I didn't look at the results of my search very closely, mb

@cfromknecht
Copy link
Contributor

Hey you're right, I meant the other databases (sphinxreplay, macaroons) and some of the temporary test databases. I didn't look at the results of my search very closely, mb

okay cool then i think we're good. sphinxreplay and macaroon dbs are less of a priority as they're much smaller, but not opposed to adding them as well

@Crypt-iQ
Copy link
Collaborator Author

Crypt-iQ commented Jul 9, 2019

Hey you're right, I meant the other databases (sphinxreplay, macaroons) and some of the temporary test databases. I didn't look at the results of my search very closely, mb

okay cool then i think we're good. sphinxreplay and macaroon dbs are less of a priority as they're much smaller, but not opposed to adding them as well

The freelist on my 8MB sphinxreplay db has 5 free pages on it... so I don't think I'll include it here. And my macaroons.db is tiny.

Copy link
Contributor

@cfromknecht cfromknecht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀will start running this version to ensure there aren't any other regressions

@Roasbeef Roasbeef modified the milestones: 0.7.1, 0.8 Jul 9, 2019
@Roasbeef Roasbeef removed the v0.7.1 label Jul 9, 2019
@cfromknecht
Copy link
Contributor

@Crypt-iQ decided we should push this off to 0.8, since it's a pretty big change for a point release and want to leave ample time for testing

@Crypt-iQ
Copy link
Collaborator Author

@cfromknecht no problem, I'm looking into the InitialMmapSize option as well, so might be good to hold off until I can determine if it's worth adding

@cfromknecht
Copy link
Contributor

@Crypt-iQ sounds good, yeah we haven't done much exploration wrt bbolt options, would be interested to hear what you find!

@Crypt-iQ
Copy link
Collaborator Author

Crypt-iQ commented Jul 16, 2019

Looked at InitialMmapSize and some of the related bbolt code. Bbolt uses mmap to map the db file and sometimes it needs to be remapped during write tx's via munmap and mmap. In these cases, the write transaction that is referencing the old mmap needs to copy these references to the heap. So IMO it seems like we should just set a high InitialMmapSize and never have to remap. Certain write transactions could cause high memory allocations on the heap depending on how many nodes they access during the txn.

It is also the case that long-running read transactions can block certain writes from happening. The read txns hold a lock on the mmap and writes that need to remap also need the lock, thus they are blocked. See: boltdb/bolt#472

The creator of boltdb suggested an mmap size of 1TB (boltdb/bolt#489 (comment)) but I think we could make do with much less given the size of the channel graph (my channel.db is only ~2G). It is virtual memory after all so we could set the InitialMmapSize to 50GB or so. I will test it out and report back.

Edit: Just remembered 32-bit systems exist and can't have memory-mapped files > 2G. Could just disable for 32-bit systems? On another note, I see no performance difference with a mmap size of 1TB and my regular mmap size of ~500MB.

Copy link
Contributor

@cfromknecht cfromknecht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🌋

We should get this in early on in the road to 0.8 so that we can get adequate testing in master. Been running on my node for the past week or so without issue.

@Crypt-iQ
Copy link
Collaborator Author

Crypt-iQ commented Jul 30, 2019

I'm kind of indifferent on the InitialMmapSize parameter. I tested and couldn't, in any reasonable (non-test) circumstance, produce any huge memory allocations when resizing the mmap. It is also not resized very often so it's not likely to occur that long-running reads (which there aren't that many of) block special write txn's.

Just a note. Looking at the boltdb code and issues, it seems that the mmap size cannot be greater than 2GB for 32-bit machines which makes sense. If channel.db files or other boltdb-backed db files start regularly growing larger than this, support for those machines may need to be dropped.

@Roasbeef
Copy link
Member

I'm kind of indifferent on the InitialMmapSize parameter.

I don't think we need to set this value in this PR, and can defer tuning it to a later PR.

But the only other large database I had on my machine was wallet.db which is the btcwallet repo.

Once this goes in, we can follow up with the change on the btcwallet side as well.

Copy link
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🐝

@Roasbeef Roasbeef merged commit 69c9e2b into lightningnetwork:master Jul 30, 2019
@Crypt-iQ Crypt-iQ deleted the bbolt_options_0707 branch July 31, 2019 01:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
database Related to the database/storage of LND optimization P2 should be fixed if one has time
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Specify options to better handle boltdb fragmentation
4 participants