Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ethdb/pebble: add database backend using pebble #24615

Closed
wants to merge 4 commits into from

Conversation

jwasinger
Copy link
Contributor

Uses Pebble patched to expose an API for getting the amount of heap memory allocated through CGo: https://github.com/jwasinger/pebble/tree/mem-stats .

Modifies the system/memory/used and system/memory/held gauges to include Pebble's allocation through CGo.

@jwasinger
Copy link
Contributor Author

Charts from a snap sync (LevelDB is orange, Pebble is green):

image

image

image

image

Pebble disk usage:

311.7G	/datadir/geth/geth/chaindata/ancient
503.6G	/datadir/geth/geth/chaindata
230.3M	/datadir/geth/geth/ethash
670.1M	/datadir/geth/geth/triecache
4.4M	/datadir/geth/geth/nodes
504.5G	/datadir/geth/geth
4.0K	/datadir/geth/keystore
504.5G	/datadir/geth

Leveldb disk usage:

693.2M	/datadir/geth/geth/triecache
230.3M	/datadir/geth/geth/ethash
311.7G	/datadir/geth/geth/chaindata/ancient
559.7G	/datadir/geth/geth/chaindata
3.6M	/datadir/geth/geth/nodes
560.6G	/datadir/geth/geth
4.0K	/datadir/geth/keystore
560.6G	/datadir/geth

@jwasinger jwasinger marked this pull request as ready for review May 6, 2022 03:18
@jwasinger jwasinger force-pushed the exp-pebble branch 2 times, most recently from 8df05d4 to f9d8001 Compare May 11, 2022 06:52
@jwasinger jwasinger changed the title ethdb: add pebble for experiment (updated) ethdb: add pebble May 11, 2022
Comment on lines 118 to 120
if matches, err := filepath.Glob(file + "/OPTIONS*"); len(matches) > 0 || err != nil {
if err != nil {
panic(err) // only possible if the pattern is malformed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pebble fingerprinting should be in pebble/pebble.go

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the db detection code from both leveldb.go/pebble.go and moved it into core/raw/database.go.

@jwasinger jwasinger force-pushed the exp-pebble branch 2 times, most recently from 2393fd3 to ef8979b Compare June 16, 2022 08:39
@fjl fjl changed the title ethdb: add pebble ethdb/pebble: add database backend using pebble Jul 5, 2022
@fjl
Copy link
Contributor

fjl commented Jul 5, 2022

Update: still waiting for upstream on cockroachdb/pebble#1628

@fjl fjl removed the status:triage label Jul 5, 2022
@jwasinger
Copy link
Contributor Author

Actually they have responded cockroachdb/pebble#1628 (review) .

@holiman
Copy link
Contributor

holiman commented Aug 16, 2022

@jwasinger would you mind rebasing this?

@jwasinger
Copy link
Contributor Author

@holiman done.

core/rawdb/database.go Outdated Show resolved Hide resolved
core/rawdb/database.go Outdated Show resolved Hide resolved
ethdb/pebble/pebble.go Outdated Show resolved Hide resolved
ethdb/pebble/pebble.go Outdated Show resolved Hide resolved
// Per-level options. Options for at least one level must be specified. The
// options for the last level are used for all subsequent levels.
Levels: []pebble.LevelOptions{
{TargetFileSize: 2 * 1024 * 1024, FilterPolicy: bloom.FilterPolicy(10)},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we every try with any larger files in pebble?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't. Maybe @rjl493456442 did at some point?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I did. As far as I remember it's the same as with LevelDB, compaction makes it go boom.

@holiman
Copy link
Contributor

holiman commented Aug 29, 2022

Testing this on two bootnodes which have a very hard time syncing

ansible-playbook playbook.yaml -t geth -l bootnode-azure-westus-001,bootnode-azure-koreasouth-001  -e "geth_image=holiman/geth-experimental:latest" -e "geth_datadir_wipe=partial" -e '{"geth_args_custom":["--backingdb=pebble"]}'

and for comparison:

ansible-playbook playbook.yaml -t geth -l bootnode-azure-brazilsouth-001,bootnode-azure-australiaeast-001  -e "geth_image=holiman/geth-experimental:latest" -e "geth_datadir_wipe=partial"

@holiman
Copy link
Contributor

holiman commented Aug 29, 2022

Metrics that do not appear to work:
geth.eth/db/chaindata/disk/read.meter
geth.eth/db/chaindata/compact/writedelay/counter.meter

Edi - here's why the disk read meter is always zerot:

		if db.diskReadMeter != nil {
			db.diskReadMeter.Mark(0) // pebble doesn't track non-compaction reads
		}

Is there any point in even having the metric around? Though I suppose we can keep it for a while, to stay in par with leveldb.

@holiman
Copy link
Contributor

holiman commented Aug 29, 2022

It's still very early, but it looks like on our 'weak' azure nodes, pebble is performing a lot better, since it's not being killed-by-compaction:

Screenshot 2022-08-29 at 16-07-01 Dual Geth - Grafana

@holiman
Copy link
Contributor

holiman commented Aug 29, 2022

The pebble azure nodes are on ~26%

bootnode-azure-westus-001 geth INFO [08-29|16:20:37.679] State sync in progress synced=26.53% state=51.08GiB accounts=49,463,623@11.06GiB slots=193,320,612@38.88GiB codes=206,295@1.14GiB eta=8h46m27.524s
bootnode-azure-koreasouth-001 geth INFO [08-29|16:20:38.621] State sync in progress synced=26.15% state=50.79GiB accounts=48,789,260@10.90GiB slots=192,700,384@38.77GiB codes=203,451@1.12GiB eta=8h56m8.768s

The non-pebble are on ~15%

bootnode-azure-brazilsouth-001 geth INFO [08-29|16:20:37.959] State sync in progress synced=13.69% state=27.49GiB accounts=26,174,418@5.71GiB slots=104,749,928@21.16GiB codes=117,622@632.17MiB eta=19h31m38.490s
bootnode-azure-australiaeast-001 geth INFO [08-29|16:20:46.154] State sync in progress synced=16.99% state=32.69GiB accounts=32,593,521@7.08GiB slots=123,280,732@24.85GiB codes=141,575@772.87MiB eta=15h11m3.934s

@holiman
Copy link
Contributor

holiman commented Aug 30, 2022

Pebble-nodes finished the first phase a couple of hours earlier

Aug 30 03:30:56 bootnode-azure-koreasouth-001 geth INFO [08-30|01:30:55.936] State sync in progress synced=100.00% state=197.38GiB accounts=186,306,958@41.69GiB slots=760,255,029@151.77GiB codes=651,153@3.92GiB eta=-2m14.173s
Aug 30 04:48:43 bootnode-azure-westus-001 geth INFO [08-30|02:48:43.208] State sync in progress synced=100.00% state=197.33GiB accounts=186,328,374@41.69GiB slots=760,016,492@151.72GiB codes=651,467@3.92GiB eta=-2m31.628s

Leveldb-nodes:

Aug 30 06:31:50 bootnode-azure-australiaeast-001 geth INFO [08-30|04:31:50.585] State sync in progress synced=100.00% state=197.47GiB accounts=186,182,442@41.70GiB slots=760,605,354@151.84GiB codes=652,393@3.94GiB eta=-2m43.671s
Aug 30 08:38:23 bootnode-azure-brazilsouth-001 geth INFO [08-30|06:38:23.436] State sync in progress synced=100.00% state=197.51GiB accounts=186,552,432@41.70GiB slots=760,745,777@151.87GiB codes=652,051@3.94GiB eta=-1m27.429s

@holiman
Copy link
Contributor

holiman commented Aug 30, 2022

RIght, there's this too:

C:\Users\appveyor\go\pkg\mod\github.com\jwasinger\pebble@v0.0.0-20220816202818-d02a56d8d2b9\internal\batchskl\skl.go:310:18: maxNodesSize (untyped int constant 4294967295) overflows int
C:\Users\appveyor\go\pkg\mod\github.com\jwasinger\pebble@v0.0.0-20220816202818-d02a56d8d2b9\internal\batchskl\skl.go:320:16: cannot use maxNodesSize (untyped int constant 4294967295) as int value in assignment (overflows)

Which afaict would be fixed by cockroachdb/pebble#1619. It has been open since april.

@SLoeuillet
Copy link

SLoeuillet commented Sep 16, 2022

Found a little issue :
Got cache configured to 256K (--cache 262144)
./ethdb/pebble/pebble.go:161: MemTableSize: cache * 1024 * 1024 / 4,

=>

geth[2101456]: Fatal: Failed to register the Ethereum service: MemTableSize (21 G) must be < 4.0 G

So, MemTableSize should be capped to 4GB max

65536 =>
Sep 16 13:51:30 geth01-ethereum-mainnet-eu geth[2105780]: Fatal: Failed to register the Ethereum service: MemTableSize (8.0 G) must be < 4.0 G

32768 =>
Sep 16 13:52:20 geth01-ethereum-mainnet-eu geth[2106436]: Fatal: Failed to register the Ethereum service: MemTableSize (4.0 G) must be < 4.0 G

from https://github.com/cockroachdb/pebble/blob/master/options.go
MemTableSize is an int

@SLoeuillet
Copy link

SLoeuillet commented Sep 19, 2022

On an archive node, it doesn't seem to make sync faster, but at least, the ugly long compaction times (with some going up to 11 days full compacting in a row) are gone

2 archive nodes with standard geth with stair-case effect :
image

One of those 2 nodes using the pebble branch :

image

A mix of those 2 with both axes visible :
image

On 2022-06-29, started 2 archive nodes with geth 1.10.1x then 1.10.2x then 1.11.0
On 2022-07-13, stopped one of those 2, replaced it with 1.11.0, ex_pebble branch after wiping storage out.

@holiman
Copy link
Contributor

holiman commented Sep 19, 2022

@SLoeuillet thanks for the feedback and charts! Unfortunately, the Y-axis got a bit cropped out, so I couldn't really figure out how the two charts compared.
Would love to see some more charts after a few more days of progress!

@holiman
Copy link
Contributor

holiman commented Sep 19, 2022

Ah, the max memtable size is not so much becaues the field is an int (int64), but
https://github.com/cockroachdb/pebble/blob/master/open.go#L38:

	// The max memtable size is limited by the uint32 offsets stored in
	// internal/arenaskl.node, DeferredBatchOp, and flushableBatchEntry.
	maxMemTableSize = 4 << 30 // 4 GB

jwasinger and others added 3 commits September 19, 2022 10:14
Co-authored-by: Gary Rong <garyrong0905@gmail.com>

foo

update
node: fix ddir lookup mistake

accounts/abi/bind: fix go.mod replacement for generated binding

deps: update pebble + with fix 32-bit build
@SLoeuillet
Copy link

With a great pleasure, I can announce that my archive node running leveldb standard storage, did just finish to sync to HEAD
2022-06-09 => 2022-09-21

Pebble 1.11.0 based one, started sync on 2022-09-16, currently at 7925855 blocks.

@MariusVanDerWijden
Copy link
Member

I ran a successful snap sync with it again. Took a pretty long time on a very underprovisioned node (5.8GB usable RAM) but it finished after ~70 hours

@holiman
Copy link
Contributor

holiman commented Jan 17, 2023

Triage discussion: I'll take this PR and try to separate the 64-bit and 32-bit, and make it so that we avoid pebble when building 32-bit.

@holiman
Copy link
Contributor

holiman commented Jan 20, 2023

Closing in favour of #26517

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants