GC doesn't work? (not cleaning up SST files properly) #1228
I should also add that before I patched GC I was suffering the same issue.
Hey @adwinsky, the GC definitely works :) The GC doesn't instantly clean up space (which we should definitely improve) but the disk space should eventually decrease. Here's a related issue which shows that GC works #1006 I've replied to your original discuss post https://discuss.dgraph.io/t/how-to-clean-up-values-for-old-versions-of-items/6020/4
Hello @jarifibrahim, I opened a second ticket because I am pretty sure I see a bug here. I made a backup of the database for which I added the histogram in my previous post. In the meantime most of the entries have expired. I changed the GC call to RunValueLogGC(0.001) and started a script running GC in a loop (without any other modifications of the badger code). Initially it cleaned up some segments (as some of the records had expired since the last run), but then it spent 30 minutes returning: Value log GC attempt didn't result in any cleanup. My database size is 48 GB, it has 34 vlog files and 213 sst files, and this is the result of the histogram: Histogram of key sizes (in bytes), Histogram of value sizes (in bytes). Badger needs 50 GB to store 16k records? Clearly there is a problem with GC :(
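For reference, the GC loop described above can be as small as the sketch below. This is a minimal sketch, not the actual script: the 0.001 discard ratio comes from this comment, the DB path is a placeholder, and the import path assumes Badger v1.x.

```go
package main

import (
	"log"

	badger "github.com/dgraph-io/badger"
)

func main() {
	// Open the existing database; the path is a placeholder.
	db, err := badger.Open(badger.DefaultOptions("/path/to/db"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Keep running value log GC with a very low discard ratio until Badger
	// reports that no vlog file could be rewritten.
	for {
		switch err := db.RunValueLogGC(0.001); err {
		case nil:
			continue // a vlog file was rewritten, try again
		case badger.ErrNoRewrite:
			log.Println("Value log GC attempt didn't result in any cleanup")
			return
		default:
			log.Fatal(err)
		}
	}
}
```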
I figured out that the version I am using (1.6.0) does not include the #1006 fix you linked to. I cherry-picked that and now it seems vlog files are cleaned up correctly. I see an issue with sst files, though. In my database I have 15 vlog files and 124 sst files. The histogram shows the following stats: [Summary] Histogram of key sizes (in bytes). So the size of the index is 7.7 GB while it has only 5872847*22.91 bytes of data, which is about 130 MB. Is there any other commit I need to cherry-pick into my 1.6.0 version in order to fix that? I tried using the most recent version but it was way slower for me.
What was slow? The GC? It would be very useful if you could file another issue so that I can look into this slowness.
The total index size is the total amount of data stored in the SSTs. You have 7.7 GB of data in the SST which includes
My application reads events from Kafka streams. If an event is new, it writes it into Badger DB with a TTL of 1h; if the event already exists, it bumps its TTL to 24h. Only about 1% of events have a duplicate, so 99% of the events in the DB will expire after an hour.
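A minimal sketch of that write path, assuming the Entry builder API from Badger v2 (the function name and key/value arguments are illustrative, not the reporter's code):

```go
package events

import (
	"time"

	badger "github.com/dgraph-io/badger/v2"
)

// upsertEvent sketches the write path described above: a new event is stored
// with a 1h TTL, a duplicate gets its TTL bumped to 24h.
func upsertEvent(db *badger.DB, key, val []byte) error {
	return db.Update(func(txn *badger.Txn) error {
		_, err := txn.Get(key)
		switch {
		case err == badger.ErrKeyNotFound:
			// First occurrence: short TTL, so ~99% of entries expire after an hour.
			return txn.SetEntry(badger.NewEntry(key, val).WithTTL(time.Hour))
		case err != nil:
			return err
		default:
			// Duplicate: rewrite the entry with a longer TTL.
			return txn.SetEntry(badger.NewEntry(key, val).WithTTL(24 * time.Hour))
		}
	})
}
```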
Both. GC is way slower: with 1.6.0 it takes about 4s to run GC over the segments that don't need compaction (returning ErrNoRewrite), while with 2.0.1 it sometimes takes even 1m to return the same error. With 1.6.0 my service has great performance and is able to keep up with the stream; with 2.0.1 my service is lagging. I haven't looked much into that, but I can collect some pprof output and create a ticket for it.
In my app SST compaction seems to be very inefficient. To demonstrate it, I plotted some data on a chart: I started the app and kept it running till 16:10. Then I stopped the service and ran a simple script that opens the database and runs RunValueLogGC and Flatten in a loop, until there is nothing left that can be compacted. You can see that about an hour after stopping the app the database reached a constant size, but the SST files were never compacted. The histogram shows this: [Summary] Histogram of key sizes (in bytes), Histogram of value sizes (in bytes)
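The cleanup script itself isn't included in the comment; a hedged sketch of what such a script might look like (the DB path is a placeholder and the number of Flatten workers is an arbitrary choice):

```go
package main

import (
	"log"

	badger "github.com/dgraph-io/badger/v2"
)

// Offline cleanup in the spirit of the script described above: open the DB,
// then alternate Flatten and value log GC until GC reports nothing to rewrite.
func main() {
	db, err := badger.Open(badger.DefaultOptions("/path/to/db")) // placeholder path
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	for {
		// Push tables down the LSM tree using up to 2 compaction workers.
		if err := db.Flatten(2); err != nil {
			log.Fatal(err)
		}
		// Rewrite value log files; stop once nothing more can be reclaimed.
		if err := db.RunValueLogGC(0.5); err == badger.ErrNoRewrite {
			log.Println("nothing left to clean up")
			return
		} else if err != nil {
			log.Fatal(err)
		}
	}
}
```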
#1257 fixed my performance problems and I was able to switch to badger v2. Unfortunately the problem with increasing SST files also happens in this version. I am creating events with a short TTL. After that TTL has passed, the number of VLOG files stays constant but the SST files keep growing linearly.
I created a small script that reproduces the problem: https://gist.github.com/adwinsky/058a9972db502899b9fa097d76d04c70 After running it for 35 mins, it shows me these stats:
Adams-MBP:sandbox3 adwinsky$ go run main.go 2> /tmp/errs | grep ==
==== sst files: 0 vlog files: 1, 250.288µs
==== sst files: 8 vlog files: 5, 5m0.000977494s
==== sst files: 12 vlog files: 3, 10m0.003889869s
==== sst files: 20 vlog files: 3, 15m0.00808114s
==== sst files: 24 vlog files: 3, 20m0.010168021s
==== sst files: 28 vlog files: 3, 25m0.011417726s
==== sst files: 35 vlog files: 6, 30m0.017283408s
==== sst files: 39 vlog files: 2, 35m0.020704423s
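The gist itself isn't reproduced here, but the periodic counters in the output above can be produced with something like this (a guess at the mechanism, not a copy of the gist):

```go
package repro

import (
	"fmt"
	"path/filepath"
	"time"
)

// logFileCounts counts Badger's *.sst and *.vlog files in the DB directory
// and prints them together with the elapsed time since start.
func logFileCounts(dir string, start time.Time) {
	ssts, _ := filepath.Glob(filepath.Join(dir, "*.sst"))
	vlogs, _ := filepath.Glob(filepath.Join(dir, "*.vlog"))
	fmt.Printf("==== sst files: %d vlog files: %d, %s\n",
		len(ssts), len(vlogs), time.Since(start))
}
```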
@jarifibrahim I narrowed down the issue and hotfixed it for myself. The problem is that during compaction none of the records are marked as 'to skip'. I am using NumVersionsToKeep = 1 and because of that, in this line: Line 582 in 91c31eb
lastValidVersion is always true. So instead of skipping the record here: https://github.com/dgraph-io/badger/blob/master/levels.go#L601 it is expected to be skipped here: https://github.com/dgraph-io/badger/blob/master/levels.go#L550 But this will never happen, since I have only 1 version of each item. Unfortunately I couldn't figure out how to properly fix this; I have already invested a lot of time into it, so maybe I will be able to look into it later. But this issue is quite important as it affects the performance of GC: the more SST files are created, the more time is spent in GC.
@adwinsky Thank you so much for debugging the issue and nailing down the bug. I will try to find some time to look at it. This definitely looks like something that should be fixed. The SST size should drop after compaction.
@adwinsky can you please share your fix? I am running into a similar issue and contemplating the right fix. In my scenario we don't use TTL. We have a space-limitation condition, and whenever the db gets bigger than the limit we collect all the keys that need deletion in a batch and then, in a separate transaction, call delete on all these items. And I see the same issue that you have pointed out: I don't see the SST table size dropping, and the VLog file count continues to rise. As the size condition is still not met, the deletion continues. Our deletion logic runs every 30 mins.
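For illustration, a sketch of that batch-deletion pattern, assuming the Badger v2 API (the function and its signature are made up for the example):

```go
package cleanup

import (
	badger "github.com/dgraph-io/badger/v2"
)

// deleteKeys deletes a pre-collected list of keys, starting a fresh
// transaction whenever the current one exceeds Badger's size limit
// (ErrTxnTooBig).
func deleteKeys(db *badger.DB, keys [][]byte) error {
	txn := db.NewTransaction(true)
	defer func() { txn.Discard() }() // discarding after Commit is a no-op

	for _, k := range keys {
		err := txn.Delete(k)
		if err == badger.ErrTxnTooBig {
			// Commit what we have and retry the delete in a new transaction.
			if err := txn.Commit(); err != nil {
				return err
			}
			txn = db.NewTransaction(true)
			err = txn.Delete(k)
		}
		if err != nil {
			return err
		}
	}
	return txn.Commit()
}
```

Badger also ships a WriteBatch API (db.NewWriteBatch()) that handles the transaction-splitting automatically, which may be a simpler fit for bulk deletes.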
Should
After updating the code to only set
The test performed the following steps:
Here is the change made to https://github.com/dgraph-io/badger/blob/master/levels.go#L577-L584:
Edit - Do not set
Hey @adwinsky, the keys are not being removed because you've set the number of versions to keep to 1.
@ou05020 For expired keys, we can drop them, but for deleted keys it might be possible that there's an older version in the lower levels, and then dropping the delete key would mean your lookups will see the older version stored in the tree. For instance, let's say you inserted
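The example is truncated above, but the scenario being described can be sketched as follows (hypothetical key and values, not the exact example from the comment):

```go
package example

import (
	"errors"

	badger "github.com/dgraph-io/badger/v2"
)

// tombstoneExample illustrates why a delete tombstone must survive compaction
// for as long as an older version of the key may still exist in a lower
// level: otherwise a lookup would "resurrect" the old value.
func tombstoneExample(db *badger.DB) error {
	key := []byte("k")

	// Version 1: insert a value. Over time this version can be compacted
	// down to a lower level of the LSM tree.
	if err := db.Update(func(txn *badger.Txn) error {
		return txn.Set(key, []byte("v1"))
	}); err != nil {
		return err
	}

	// Version 2: delete the key, which writes a tombstone at an upper level.
	if err := db.Update(func(txn *badger.Txn) error {
		return txn.Delete(key)
	}); err != nil {
		return err
	}

	// Correct behaviour: the key is gone. If compaction dropped the tombstone
	// while v1 still lived in a lower level, this Get would see v1 again.
	return db.View(func(txn *badger.Txn) error {
		if _, err := txn.Get(key); err == badger.ErrKeyNotFound {
			return nil // expected
		} else if err != nil {
			return err
		}
		return errors.New("unexpected: an older version was resurrected")
	})
}
```

Expired (TTL) keys don't have this problem, because an expired entry is invalid no matter which version a lookup finds.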
Edit - Do not set
@sana-jawad, the default value of number of versions to keep is 1.
Thank you very much @jarifibrahim for your reply. It seems that NumVersionsToKeep=0 does indeed work, but only if it's set from the beginning. If I run a program with NumVersionsToKeep=1, keep it running for a while until some duplicates have been created, and then restart the program with NumVersionsToKeep=0, it will never remove the old SST files. But I must say, this behaviour seems counter-intuitive to me. NumVersionsToKeep=1 is the default value, and if an entry has expired there won't be any access to it anymore. For users who use TTL, produce a lot of events and expect the database to stay at a constant size, this will be problematic. Maybe it is worth considering changing the default to 0?
Edit - Do not set
@adwinsky No. The keys will be dropped when compaction runs. It might take some time for some specific range to be compacted but eventually, all the duplicate keys will be dropped.
Yes, I also feel the default should be set to 0.
…o plays a big role in memory consumption. (#114) Upgraded Badger version. Added flattening at start-up time. Fixed the event count spreading issue which resulted in uneven data distribution across partitions. Moved to drop prefix as it yields better space reclaim. Added a feature flag for switching to delete prefix. Also changed the number of versions to keep to 0 so that delete prefix would reclaim space. dgraph-io/badger#1228 Fixed the issue of unclaimed !badger!move prefixes which are never cleaned up. Details: dgraph-io/badger#1288 Added support in debugging pages to see internal keys.
@jarifibrahim can you point us to the correct setting for letting BadgerDB clean deleted keys correctly? My understanding is that #1300 was closed because setting |
Hey @luca-moser, we had a bunch of bugs that were preventing SSTs/vlogs from being cleaned up. We've merged all the fixes and the disk-space usage should be much better now.
@adwinsky I ran your script on master and here's what I see
It looks like SST cleanup and GC are working correctly. Please do try out the master version of Badger and let me know how it goes.
This issue has been fixed and I'm going to close it. Please feel free to re-open it if the problem persists.
@jarifibrahim really great work on the fixes, are there any plans for another release soon? I am helping maintain TalariaDB and have verified that the latest master resolves many of the issues our users have been facing with GC of the value log. Great work on that 🎉.
@crphang Actually, the community fixed it. The main PRs that fixed the GC issues were from community users 🎉 |
@jarifibrahim, sorry to ask again 😅. Are there plans to create a new release with the fixes? I think it'll help a lot for other Badger users 😄 |
The release is blocked by #1350. I haven't been able to reproduce the failure easily so far.
What version of Go are you using (go version)?
What version of Badger are you using?
v1.6.0
opts := badger.DefaultOptions(fmt.Sprintf(dir + "/" + name))
opts.SyncWrites = false
opts.ValueLogLoadingMode = options.FileIO
Does this issue reproduce with the latest master?
With the latest master GC becomes much slower
What are the hardware specifications of the machine (RAM, OS, Disk)?
2TB NVME drive, 128 GB RAM
What did you do?
I have a Kafka topic with 12 partitions. For every partition I create a database. Each database grows quite quickly (about 12*30GB per hour) and the TTL for most of the events is 1h, so the size should stay at a constant level. For every partition I create a separate transaction and process read and write operations sequentially; there is no concurrency. When the transaction is getting too big I commit it, and in a separate goroutine I start RunValueLogGC(0.5). Most of the GC runs end up with ErrNoRewrite. I even tried to repeat RunValueLogGC until I got 5 errors in a row, but I was still running out of disk space quite quickly. My current fix is to patch the Badger GC to make it run on every fid that is before the head. This works fine, but it eventually becomes slow when I have too many log files.
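A rough sketch of that per-partition loop follows; the Event type, the channel, and the function name are placeholders, and the Badger v2 import path is an assumption:

```go
package ingest

import (
	"time"

	badger "github.com/dgraph-io/badger/v2"
)

// Event stands in for whatever is read from a Kafka partition.
type Event struct {
	Key, Value []byte
}

// processPartition writes events with a TTL in one long-running transaction,
// commits it when it gets too big, and kicks off value log GC in a separate
// goroutine, as described above.
func processPartition(db *badger.DB, events <-chan Event) error {
	txn := db.NewTransaction(true)
	defer func() { txn.Discard() }()

	for ev := range events {
		entry := badger.NewEntry(ev.Key, ev.Value).WithTTL(time.Hour)
		err := txn.SetEntry(entry)
		if err == badger.ErrTxnTooBig {
			// The transaction got too big: commit it, run GC in the
			// background, and retry the write in a fresh transaction.
			if err := txn.Commit(); err != nil {
				return err
			}
			go func() {
				_ = db.RunValueLogGC(0.5) // often returns ErrNoRewrite
			}()
			txn = db.NewTransaction(true)
			err = txn.SetEntry(entry)
		}
		if err != nil {
			return err
		}
	}
	return txn.Commit()
}
```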
What did you expect to see?
The size of each of the twelve databases I created should stay at a constant level and be less than 20 GB.
What did you see instead?
After running it for a day, if I look at one of the twelve databases, I see 210 sst files and 68 vlog files, and the db size is 84 GB (and these numbers keep growing).
If I run badger histogram it shows me these stats:
Histogram of key sizes (in bytes)
Total count: 4499955
Min value: 13
Max value: 108
Mean: 22.92
Range Count
[ 8, 16) 2
[ 16, 32) 4499939
[ 64, 128) 14
Histogram of value sizes (in bytes)
Total count: 4499955
Min value: 82
Max value: 3603
Mean: 2428.16
Range Count
[ 64, 128) 1
[ 256, 512) 19301
[ 512, 1024) 459
[ 1024, 2048) 569
[ 2048, 4096) 4479625
2428 * 4479625 ≈ 10 GB