
GC doesn't work? (not cleaning up SST files properly) #1228

Closed
adwinsky opened this issue Feb 25, 2020 · 25 comments
Labels
area/gc Garbage Collection issues. kind/bug Something is broken. priority/P1 Serious issue that requires eventual attention (can wait a bit) status/accepted We accept to investigate or work on it.

Comments

@adwinsky

What version of Go are you using (go version)?

$ go version
1.13.8

What version of Badger are you using?

v1.6.0

opts := badger.DefaultOptions(fmt.Sprintf("%s/%s", dir, name))
opts.SyncWrites = false
opts.ValueLogLoadingMode = options.FileIO

Does this issue reproduce with the latest master?

With the latest master, GC becomes much slower.

What are the hardware specifications of the machine (RAM, OS, Disk)?

2TB NVME drive, 128 GB RAM

What did you do?

I have a Kafka topic with 12 partitions, and for every partition I create a separate database. Each database grows quite quickly (about 12×30 GB per hour) and the TTL for most of the events is 1h, so the size should stay at a constant level. For every partition I create a separate transaction and process the read and write operations sequentially; there is no concurrency. When the transaction gets too big I commit it, and in a separate goroutine I start RunValueLogGC(0.5). Most GC runs end with ErrNoRewrite. I even tried repeating RunValueLogGC until I got 5 errors in a row, but I was still running out of disk space quite quickly. My current fix is to patch the Badger GC so that it runs on every fid before the head. This works fine, but eventually becomes slow once there are too many log files.
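A minimal sketch of such a GC retry loop (assuming badger v1.x, with the DB path and the helper name purely illustrative; the real service also handles shutdown and metrics):

package main

import (
	"log"

	badger "github.com/dgraph-io/badger"
)

// runValueGC keeps calling RunValueLogGC(0.5) until it gets ErrNoRewrite
// five times in a row, mirroring the retry strategy described above.
func runValueGC(db *badger.DB) {
	failures := 0
	for failures < 5 {
		switch err := db.RunValueLogGC(0.5); err {
		case nil:
			failures = 0 // a vlog file was rewritten; keep going
		case badger.ErrNoRewrite:
			failures++ // nothing worth rewriting this round
		default:
			log.Printf("value log GC stopped: %v", err)
			return
		}
	}
}

func main() {
	db, err := badger.Open(badger.DefaultOptions("/tmp/example-db"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	runValueGC(db)
}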

What did you expect to see?

The size of each of the twelve databases I created should stay at a constant level, under 20 GB.

What did you see instead?

After running it for a day, if I look at one of the twelve databases, I see 210 sst files and 68 vlog files, and the db size is 84 GB (and these numbers keep growing).

If I run badger histogram it shows me these stats:

Histogram of key sizes (in bytes)
Total count: 4499955
Min value: 13
Max value: 108
Mean: 22.92
Range Count
[ 8, 16) 2
[ 16, 32) 4499939
[ 64, 128) 14

Histogram of value sizes (in bytes)
Total count: 4499955
Min value: 82
Max value: 3603
Mean: 2428.16
Range Count
[ 64, 128) 1
[ 256, 512) 19301
[ 512, 1024) 459
[ 1024, 2048) 569
[ 2048, 4096) 4479625

2428 × 4479625 ≈ 10 GB, so the live values account for only about 10 GB of those 84 GB.

@adwinsky
Author

adwinsky commented Feb 25, 2020

I should also add that I was hitting the same issue before I patched the GC.

@jarifibrahim
Contributor

Hey @adwinsky, the GC definitely works :) The GC doesn't instantly clean up space (which we should definitely improve), but the disk space should eventually decrease. Here's a related issue which shows that GC works: #1006

I've replied to your original discuss post https://discuss.dgraph.io/t/how-to-clean-up-values-for-old-versions-of-items/6020/4

@jarifibrahim jarifibrahim added the kind/question Something requiring a response label Feb 26, 2020
@adwinsky
Author

adwinsky commented Feb 26, 2020

Hello @jarifibrahim, I opened a second ticket because I am pretty sure I am seeing a bug here.

I made a backup of the database for which I added the histogram in my previous post. In the meantime most of the entries have expired. I changed the GC call to RunValueLogGC(0.001) and started a script running GC in a loop (without any modifications to the badger code). Initially it cleaned up some segments (as some of the records had expired since the last run), but after 30 minutes it only kept returning:

Value log GC attempt didn't result in any cleanup

My database size is 48 GB, with 34 vlog files and 213 sst files, and this is the histogram output:

Histogram of key sizes (in bytes)
Total count: 16811
Min value: 20
Max value: 26
Mean: 22.92
Range Count
[ 16, 32) 16811

Histogram of value sizes (in bytes)
Total count: 16811
Min value: 2093
Max value: 3301
Mean: 2706.77
Range Count
[ 2048, 4096) 16811

Badger needs 50 GB to store 16k records? Clearly there is a problem with GC :(

@adwinsky
Author

adwinsky commented Mar 3, 2020

I figured out that the version I am using (1.6.0) does not include the #1006 fix you linked to. I cherry-picked that, and now the vlog files seem to be cleaned up correctly. I still see an issue with the sst files, though. In my database I have:

15 vlog files, 124 sst files

Histogram shows me following stats:

[Summary]
Level 0 size: 0 B
Level 1 size: 17 MB
Level 2 size: 0 B
Level 3 size: 7.7 GB
Total index size: 7.7 GB
Value log size: 15 GB

Histogram of key sizes (in bytes)
Total count: 5872848
Min value: 13
Max value: 26
Mean: 22.91
Range Count
[ 8, 16) 1
[ 16, 32) 5872847

So the size of the index is 7.7 GB, while it holds only 5872847 × 22.91 bytes of key data, which is about 130 MB.

Is there any other commit I need to pick up to my 1.6.0 version in order to fix that? I tried using the most recent version but it was way slower for me.

@jarifibrahim
Contributor

Is there any other commit I need to pick up to my 1.6.0 version in order to fix that? I tried using the most recent version but it was way slower for me.

What was slow? The GC? It would be very useful if you can file another issue so that I can look into this slowness issue.

So the size of the index is 7.7 GB while it has only 5872847*22.91 bytes of data, which is about 130 MB.

The total index size is the total amount of data stored in the SSTs. You have 7.7 GB of data in the SSTs, which includes:

  1. Internal keys (move keys, if GC was running)
  2. Older versions of keys (badger stores a version along with each key, and these older versions are eventually removed by compaction)

The histogram shows only valid keys. At any given time, badger would have some valid keys, some internal keys, and some invalid keys.
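One way to see those extra entries (which the histogram hides) is to iterate with AllVersions enabled. A rough sketch of a fragment, assuming the badger package is imported and db is an already-opened *badger.DB:

// countAllVersions counts every entry the LSM still holds, including older
// versions of keys, for comparison with the histogram's count of valid keys.
// Internal !badger! keys may still be filtered out by the iterator.
func countAllVersions(db *badger.DB) (int, error) {
	total := 0
	err := db.View(func(txn *badger.Txn) error {
		opts := badger.DefaultIteratorOptions
		opts.AllVersions = true     // also yield older versions of each key
		opts.PrefetchValues = false // keys only; much cheaper
		it := txn.NewIterator(opts)
		defer it.Close()
		for it.Rewind(); it.Valid(); it.Next() {
			total++
		}
		return nil
	})
	return total, err
}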

@adwinsky
Author

adwinsky commented Mar 4, 2020

My application reads events from Kafka streams. If an event is new, it writes it into the Badger DB with a 1h TTL; if the event already exists, it bumps its TTL to 24h. Only about 1% of events have a duplicate, so 99% of the events in the DB expire after an hour.
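Roughly, the write path looks like this (a sketch only, assuming the v1.6 Entry API (NewEntry/WithTTL/SetEntry), the badger and time packages imported, and an open transaction txn; the helper name is illustrative):

// upsertEvent writes a new event with a 1h TTL, or bumps an existing
// event's TTL to 24h, matching the flow described above.
func upsertEvent(txn *badger.Txn, key, value []byte) error {
	_, err := txn.Get(key)
	switch err {
	case badger.ErrKeyNotFound:
		// First time this event is seen: keep it for an hour.
		return txn.SetEntry(badger.NewEntry(key, value).WithTTL(time.Hour))
	case nil:
		// Duplicate: extend its lifetime to 24 hours.
		return txn.SetEntry(badger.NewEntry(key, value).WithTTL(24 * time.Hour))
	default:
		return err
	}
}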

What was slow? The GC? It would be very useful if you can file another issue so that I can look into this slowness issue.

Both. GC is way slower: with 1.6.0 it takes about 4s to run GC over segments that don't need rewriting (returning ErrNoRewrite), while with 2.0.1 it sometimes takes a minute to return the same error.

In 1.6.0 my service has great performance and is able to keep up with the stream; in 2.0.1 my service is lagging.

I haven't looked into it much yet, but I can collect some pprof output and create a ticket for that.

The total index size is the total amount of data stored in the SSTs. You have 7.7 GB of data in the SST which includes

In my app SST compaction seems to be very inefficient. To demonstrate it, I plotted some data on a chart:

[Screenshot: chart of the database size over time, taken 2020-03-04 at 10:32]

I started the app and kept it running until 16:10. Then I stopped the service and ran a simple script that opens the database and runs RunValueLogGC and Flatten in a loop until there is nothing left that can be compacted (a sketch of such a loop follows the histogram below). You can see that after stopping the app the database reached a constant size within an hour, but the SST files were never compacted. The histogram shows this:

[Summary]
Level 0 size: 0 B
Level 1 size: 3.7 MB
Level 2 size: 0 B
Level 3 size: 674 MB
Total index size: 678 MB
Value log size: 6.4 GB

Histogram of key sizes (in bytes)
Total count: 2337252
Min value: 13
Max value: 26
Mean: 22.89
Range Count
[ 8, 16) 1
[ 16, 32) 2337251

Histogram of value sizes (in bytes)
Total count: 2337252
Min value: 80
Max value: 3263
Mean: 2438.90
Range Count
[ 64, 128) 1
[ 512, 1024) 1
[ 1024, 2048) 113
[ 2048, 4096) 2337137
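The compact-and-GC loop mentioned above is essentially the following sketch (a fragment, assuming the badger package is imported and db is an already-opened *badger.DB; the worker count passed to Flatten is arbitrary):

// compactAndGC flattens the LSM tree and then runs value log GC with a very
// low discard ratio until there is nothing left that GC can rewrite.
func compactAndGC(db *badger.DB) error {
	// Flatten compacts all the tables down into a single level.
	if err := db.Flatten(8); err != nil { // 8 concurrent compaction workers
		return err
	}
	for {
		err := db.RunValueLogGC(0.001)
		if err == badger.ErrNoRewrite {
			return nil // nothing left for GC to reclaim
		}
		if err != nil {
			return err
		}
	}
}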

@adwinsky
Author

#1257 fixed my performance problems and I was able to switch to Badger v2. Unfortunately, the problem with growing SST files also occurs in this version. I am creating events with a short TTL; after the TTL has passed, the number of vlog files stays constant but the SST files keep growing linearly.

@adwinsky
Author

I created a small script that reproduces the problem:

https://gist.github.com/adwinsky/058a9972db502899b9fa097d76d04c70

After running it for 35 mins, it shows me these stats:

Adams-MBP:sandbox3 adwinsky$ go run main.go 2> /tmp/errs  | grep ==
==== sst files: 0 vlog files: 1, 250.288µs
==== sst files: 8 vlog files: 5, 5m0.000977494s
==== sst files: 12 vlog files: 3, 10m0.003889869s
==== sst files: 20 vlog files: 3, 15m0.00808114s
==== sst files: 24 vlog files: 3, 20m0.010168021s
==== sst files: 28 vlog files: 3, 25m0.011417726s
==== sst files: 35 vlog files: 6, 30m0.017283408s
==== sst files: 39 vlog files: 2, 35m0.020704423s

@adwinsky adwinsky changed the title GC doesn't work? GC doesn't work? (not cleaning up SST files properly) Mar 16, 2020
@adwinsky
Author

@jarifibrahim I narrowed down the issue and I hotfixed it for myself.

The problem is that during compaction none of the records are marked as 'to skip'.

I am using NumVersionsToKeep = 1, and because of that, in this line:

numVersions == s.kv.opt.NumVersionsToKeep

lastValidVersion is always true. So instead of skipping the record here:

https://github.com/dgraph-io/badger/blob/master/levels.go#L601

It is expected to be skipped here:

https://github.com/dgraph-io/badger/blob/master/levels.go#L550

But this will never happen, since I have only one version of each item, so y.SameKey(it.Key(), skipKey) is always false.

Unfortunately I couldn't figure out how to fix this properly. I have already invested a lot of time into this; maybe I will be able to look into it again later. But this issue is quite important, as it affects the performance of GC: the more SST files are created, the more time is spent in GC.

@jarifibrahim jarifibrahim added area/gc Garbage Collection issues. kind/bug Something is broken. priority/P1 Serious issue that requires eventual attention (can wait a bit) status/accepted We accept to investigate or work on it. and removed kind/question Something requiring a response labels Mar 18, 2020
@jarifibrahim
Contributor

@adwinsky Thank you so much for debugging the issue and nailing down the bug. I will try to find some time to look at it.

This definitely looks like something that should be fixed. The SST size should drop after compaction.

@sana-jawad
Contributor

@adwinsky can you please share your fix?

I am running into a similar issue and am contemplating the right fix. In my scenario we don't use TTL. We have a space limit, and whenever the db gets bigger than it, we collect all the keys that need deletion in a batch using
keyToDel := it.Item().KeyCopy(nil)

and then, in a separate transaction, call delete on all these items:
err := txn.Delete(key)

And I see the same issue you have pointed out: the SST table size doesn't drop and the vlog file count continues to rise. Since the size condition is never satisfied, the deletion keeps running. Our deletion logic runs every 30 minutes.
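For context, the deletion pass is roughly the following sketch (a fragment, assuming the badger package is imported, db is an already-opened *badger.DB, the WriteBatch API is available, and shouldDelete stands in for whatever selection logic applies):

// deleteSelectedKeys collects the keys to drop in a read-only view, then
// deletes them through a write batch, mirroring the two-step flow above.
func deleteSelectedKeys(db *badger.DB, shouldDelete func(item *badger.Item) bool) error {
	var toDelete [][]byte
	err := db.View(func(txn *badger.Txn) error {
		opts := badger.DefaultIteratorOptions
		opts.PrefetchValues = false // we only need keys
		it := txn.NewIterator(opts)
		defer it.Close()
		for it.Rewind(); it.Valid(); it.Next() {
			if shouldDelete(it.Item()) {
				toDelete = append(toDelete, it.Item().KeyCopy(nil))
			}
		}
		return nil
	})
	if err != nil {
		return err
	}

	wb := db.NewWriteBatch() // splits the deletes into transactions internally
	for _, key := range toDelete {
		if err := wb.Delete(key); err != nil {
			wb.Cancel()
			return err
		}
	}
	return wb.Flush()
}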

@ou05020
Contributor

ou05020 commented Mar 20, 2020

Should lastValidVersion be set to true if the value has been deleted or expired?

After updating the code to set lastValidVersion only if the value is not deleted, the number of SST files did not grow during testing.

The test performed the following steps:

  1. Add 250K entries with 8 byte keys and 1K values.
  2. Delete previous 250K entries.
  3. Flatten
  4. ValueLog GC w/0.001 Discard Ratio
  5. Scan all entries
  6. Repeat 30 times

Here is the change made to https://github.com/dgraph-io/badger/blob/master/levels.go#L577-L584:

deleted := isDeletedOrExpired(vs.Meta, vs.ExpiresAt)

lastValidVersion := false
if !deleted {
	// Keep the current version and discard all the next versions if:
	// - the `discardEarlierVersions` bit is set, OR
	// - we've already processed `NumVersionsToKeep` number of versions
	//   (including the current item being processed)
	lastValidVersion = vs.Meta&bitDiscardEarlierVersions > 0 ||
		numVersions == s.kv.opt.NumVersionsToKeep
}
if deleted || lastValidVersion {

@jarifibrahim
Contributor

jarifibrahim commented Mar 30, 2020

Edit - Do not set the number of versions to keep to zero. If you do so, only the deleted keys will be removed and badger will retain all the other keys. This means badger would store unnecessary data. There were multiple bugs that have now been fixed. See #1228 (comment)

Hey @adwinsky, the keys are not being removed because you've set the number of versions to keep to 1. If you set it to zero, the keys will be dropped instantly. I changed your script to set the number of versions to keep to 0 (https://gist.github.com/jarifibrahim/076afb3a9c054f8b72e093a65cc14828#file-main-go) and here's what I see:

==== sst files: 4 vlog files: 6, 259.849µs
==== sst files: 4 vlog files: 6, 1m19.617627623s
...
==== sst files: 4 vlog files: 6, 14m13.249432996s
==== sst files: 4 vlog files: 6, 15m18.989894738s
==== sst files: 4 vlog files: 6, 16m30.082259208s
...
==== sst files: 4 vlog files: 6, 55m2.905999518s
==== sst files: 4 vlog files: 6, 57m21.64958008s
==== sst files: 4 vlog files: 6, 58m33.623188144s
...
==== sst files: 4 vlog files: 6, 1h13m4.00361087s
==== sst files: 4 vlog files: 6, 1h14m12.102449542s
==== sst files: 4 vlog files: 6, 1h15m18.292868386s

@jarifibrahim
Contributor

Should lastValidVersion be set to true if the value has been deleted or expired?

@ou05020 For expired keys we can drop them, but for deleted keys it might be possible that there's an older version in the lower levels, and dropping the delete marker would mean your lookups will see the older version stored in the tree.

For instance, let's say you inserted foo, which is stored as foo1 on level 2. Now you delete foo, so a delete marker is stored as foo2 on level 0. If we drop foo2 while compacting it to level 1, we'll lose the delete marker, and your LSM tree would still have foo1. This means a deleted key would show up again when you do a lookup.
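In user-visible terms, the delete marker is what keeps the following check passing; if compaction dropped it while an older version still existed in a lower level, the final Get would start returning the old value again. A sketch of a fragment, assuming the badger and fmt packages are imported and db is an already-opened *badger.DB:

// deleteThenCheck writes a key, deletes it, and verifies that lookups keep
// returning ErrKeyNotFound. The tombstone written by Delete is what hides
// any older version of the key that may still live in a lower LSM level.
func deleteThenCheck(db *badger.DB) error {
	key := []byte("foo")
	if err := db.Update(func(txn *badger.Txn) error {
		return txn.Set(key, []byte("v1"))
	}); err != nil {
		return err
	}
	if err := db.Update(func(txn *badger.Txn) error {
		return txn.Delete(key) // writes a tombstone for "foo"
	}); err != nil {
		return err
	}
	return db.View(func(txn *badger.Txn) error {
		if _, err := txn.Get(key); err != badger.ErrKeyNotFound {
			return fmt.Errorf("expected ErrKeyNotFound after delete, got %v", err)
		}
		return nil // the tombstone correctly hides the older version
	})
}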

@jarifibrahim
Contributor

jarifibrahim commented Mar 30, 2020

Edit - Do not set the number of versions to keep to zero. If you do so, only the deleted keys will be removed and badger will retain all the other keys. This means badger would store unnecessary data. There were multiple bugs that have now been fixed. See #1228 (comment)

@sana-jawad, the default value of the number of versions to keep is one, so badger will keep one version of each key. If you set the number of versions to keep to zero, badger will drop the deleted key when compaction runs.

@adwinsky
Author

If you set the number of versions to keep to zero, badger will drop the deleted key when compaction runs.

Thank you very much @jarifibrahim for your reply. It seems that NumVersionsToKeep=0 does work, but only if it's set from the beginning. If I run a program with NumVersionsToKeep=1, keep it running for a while until some duplicates have been created, and then restart the program with NumVersionsToKeep=0, it never removes the old SST files.

But I must say this behaviour seems counter-intuitive to me. NumVersionsToKeep=1 is the default value, and once an entry has expired it will never be accessed again. For users who use TTL, produce a lot of events, and expect the database to stay at a constant size, this will be problematic. Maybe it is worth considering changing the default to 0?

@jarifibrahim
Contributor

jarifibrahim commented Mar 31, 2020

Edit - Do not set the number of versions to keep to zero. If you do so, only the deleted keys will be removed and badger will retain all the other keys. This means badger would store unnecessary data. There were multiple bugs that have now been fixed. See #1228 (comment)

If I run a program with NumVersionsToKeep=1 keep it for a while until I have some duplicates created, and then restart the program with NumVersionsToKeep=0, it will never remove old SST files.

@adwinsky No, the keys will be dropped when compaction runs. It might take some time for a specific range to be compacted, but eventually all the duplicate keys will be dropped.

For users who use TTL, produce a lot of events and expect to have constant size of the database it will be problematic. Maybe it is worth considering to change the default to 0?

Yes, I also feel the default should be set to 0 but there has to be a reason why we have it at 1. I will try to find out why it is at 1 and then post an update here.

sana-jawad added a commit to salesforce/sloop that referenced this issue Apr 24, 2020
…o plays a big role in memory consumption. (#114)

Upgraded Badger version.
Added flattening at start up time.
Fixed the event count spreading issue which resulted in uneven data distribution across partitions.
Moved to drop prefix as it yields better space claim.
Added feature flag for switching to delete prefix. Also changed the numberfversions to 0 so that delete prefix would reclaim space. dgraph-io/badger#1228
Fixed the issue of unclaimed !badger!move prefixes which are never cleaned up. Details: dgraph-io/badger#1288
Added support in debugging pages to see internal keys.
@luca-moser

@jarifibrahim can you point us to the correct setting for letting BadgerDB clean up deleted keys correctly? My understanding is that #1300 was closed because setting NumVersionsToKeep to zero actually results in an infinite number of versions being kept around. So what does one have to set in order to let BadgerDB clean up deleted keys?

@jarifibrahim
Contributor

Hey @luca-moser, we had a bunch of bugs that were preventing SSTs/vlogs from being cleaned up. We've merged all the fixes and the disk-space usage should be much better.
The following are the bugs found and fixed by the community. My sincere thanks to all of them:

  1. Move keys were not being removed from the system
  2. Bug in compaction due to which deleted/expired keys were not being removed - fixed via fd89894. Thanks to @damz and @ou05020

@adwinsky I ran your script on master and here's what I see

==== sst files: 2 vlog files: 2, 5m0.000397361s
==== sst files: 3 vlog files: 2, 10m0.000730725s
==== sst files: 4 vlog files: 2, 15m0.001013494s
==== sst files: 2 vlog files: 2, 20m0.00130372s
==== sst files: 4 vlog files: 2, 25m0.001578543s
==== sst files: 5 vlog files: 1, 30m0.00201562s
==== sst files: 2 vlog files: 1, 35m0.002256777s
==== sst files: 4 vlog files: 2, 40m0.002542917s
==== sst files: 1 vlog files: 2, 45m0.0029382s
==== sst files: 2 vlog files: 2, 50m0.003103286s

It looks like SST cleanup and GC are working correctly. Please do try out the master version of Badger and let me know how it goes.

@jarifibrahim
Contributor

This issue has been fixed and I'm going to close it. Please feel free to re-open.

@crphang

crphang commented Jun 27, 2020

@jarifibrahim, really great work on the fixes. Are there any plans for another release soon? I help maintain TalariaDB and have checked that the latest master resolves many of the issues our users have been facing with value log GC. Great work on that 🎉.

@jarifibrahim
Contributor

@crphang Actually, the community fixed it. The main PRs that fixed the GC issues were from community users 🎉

@crphang

crphang commented Jun 27, 2020

@jarifibrahim, sorry to ask again 😅. Are there plans to create a new release with the fixes? I think it'll help a lot for other Badger users 😄

@jarifibrahim
Contributor

The release is blocked by #1350. I haven't been able to reproduce the failure easily so far.

@jarifibrahim
Contributor

@crphang FYI, we found three more crashes on master and the release is blocked by them:
#1387
#1388
#1389

The cause of #1350 has been identified and it will be fixed once dgraph-io/ristretto#168 is merged.
