Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Writing stops after hitting ValueLogFileSize #293

Closed
JeanMertz opened this issue Oct 28, 2017 · 7 comments
Closed

Writing stops after hitting ValueLogFileSize #293

JeanMertz opened this issue Oct 28, 2017 · 7 comments
Assignees
Labels
kind/question Something requiring a response

Comments

@JeanMertz
Copy link

JeanMertz commented Oct 28, 2017

I'm storing about ~100 million records in Badger. I can see the initial log file increasing in size, and using debug output, I can see all the key/value pairs being written. But then once the max size for the first log file hits, the loop blocks, and nothing is written anymore.

I'm still debugging, so I hope to find the solution, but still wanted to report it anyway.

Version

5240a8f

Data directory

ls -la /var/folders/h6/wgjjkq6n1b363v1tkggbg0540000gn/T/tmp.6Oq86sL1/
total 3218360
-rw-r--r--    1 jean  staff  1073741947 Oct 28 15:28 000000.vlog
-rw-r--r--    1 jean  staff    69905136 Oct 28 15:28 000025.sst
-rw-r--r--    1 jean  staff    69905131 Oct 28 15:28 000026.sst
-rw-r--r--    1 jean  staff    69905161 Oct 28 15:28 000028.sst
-rw-r--r--    1 jean  staff    69905128 Oct 28 15:28 000029.sst
-rw-r--r--    1 jean  staff     3235971 Oct 28 15:28 000031.sst
-rw-r--r--    1 jean  staff    69905175 Oct 28 15:28 000032.sst
-rw-r--r--    1 jean  staff    69905168 Oct 28 15:28 000033.sst
-rw-r--r--    1 jean  staff    68058152 Oct 28 15:28 000034.sst
-rw-r--r--    1 jean  staff    69195086 Oct 28 15:28 000035.sst
-rw-r--r--    1 jean  staff           5 Oct 28 15:24 LOCK
-rw-r--r--    1 jean  staff         572 Oct 28 15:28 MANIFEST

Options

{
  "Dir": "/var/folders/h6/wgjjkq6n1b363v1tkggbg0540000gn/T/tmp.J7spOWWV",
  "DoNotCompact": false,
  "LevelOneSize": 268435456,
  "LevelSizeMultiplier": 10,
  "MaxLevels": 7,
  "MaxTableSize": 67108864,
  "NumCompactors": 3,
  "NumLevelZeroTables": 5,
  "NumLevelZeroTablesStall": 10,
  "NumMemtables": 5,
  "SyncWrites": true,
  "TableLoadingMode": 1,
  "ValueDir": "/var/folders/h6/wgjjkq6n1b363v1tkggbg0540000gn/T/tmp.J7spOWWV",
  "ValueLogFileSize": 1073741824,
  "ValueThreshold": 20
}
@JeanMertz
Copy link
Author

I tried running against the latest version at 2dc0fdc, but am seeing the same results.

I also set ValueLogFileSize to 2 << 30, but this caused the y.ErrEOF error after a brief time:

End of mapped region
  .../vendor/github.com/dgraph-io/badger/y.init
  .../vendor/github.com/dgraph-io/badger/y/y.go:32
  .../vendor/github.com/dgraph-io/badger/table.init
  .../vendor/github.com/dgraph-io/badger.init

Setting it to 2 << 29 worked, but this is unexpected, as 2 << 30 is considered to be the maximum allowed size:

badger/db.go

Lines 211 to 213 in 5240a8f

if !(opt.ValueLogFileSize <= 2<<30 && opt.ValueLogFileSize >= 1<<20) {
return nil, ErrValueLogSize
}

Using 2 << 29 also made the script work as expected, with the following stored data:

ls -lah /var/folders/h6/wgjjkq6n1b363v1tkggbg0540000gn/T/tmp.ARlnU1Uc/
total 4825384
-rw-r--r--    1 jean  staff   512M Oct 28 16:16 000000.vlog
-rw-r--r--    1 jean  staff   512M Oct 28 16:17 000001.vlog
-rw-r--r--    1 jean  staff   486M Oct 28 16:18 000002.vlog
-rw-r--r--    1 jean  staff    67M Oct 28 16:17 000032.sst
-rw-r--r--    1 jean  staff    67M Oct 28 16:17 000033.sst
-rw-r--r--    1 jean  staff    65M Oct 28 16:17 000035.sst
-rw-r--r--    1 jean  staff    67M Oct 28 16:18 000041.sst
-rw-r--r--    1 jean  staff    67M Oct 28 16:18 000042.sst
-rw-r--r--    1 jean  staff    67M Oct 28 16:18 000045.sst
-rw-r--r--    1 jean  staff    44M Oct 28 16:18 000046.sst
-rw-r--r--    1 jean  staff    67M Oct 28 16:18 000047.sst
-rw-r--r--    1 jean  staff    67M Oct 28 16:18 000048.sst
-rw-r--r--    1 jean  staff    65M Oct 28 16:18 000049.sst
-rw-r--r--    1 jean  staff    65M Oct 28 16:18 000050.sst
-rw-r--r--    1 jean  staff    46M Oct 28 16:18 000051.sst
-rw-r--r--    1 jean  staff    41M Oct 28 16:18 000052.sst
-rw-r--r--    1 jean  staff    45M Oct 28 16:18 000053.sst
-rw-r--r--    1 jean  staff   9.2M Oct 28 16:18 000054.sst
-rw-r--r--    1 jean  staff   842B Oct 28 16:18 MANIFEST

I'm guessing this only works until the data exceeds the new max file limit, so it's not really fixed, but at least I can work with this right now.

@JeanMertz
Copy link
Author

I'm not 100% sure yet, but preliminary results show that switching to manually managing the transactions instead of using the View and Update functions, has solved this issue for me.

I don't have any real metrics to report other than seeing that I no longer have the issue after rewriting some parts of the logic (and doing away with the above functions). If I find anything else, I'll let you know.

@JeanMertz
Copy link
Author

^ Part of this rewrite was also to split up the Get from the Set, which was inside a single Update transaction before, whereas now I first use the read iterator to get all the values I need, and then perform a single set operation when required. I just wanted to put this here, in case this might actually be the cause.

@manishrjain
Copy link
Contributor

NewIterator returns a new iterator. Depending upon the options, either only keys, or both key-value pairs would be fetched. The keys are returned in lexicographically sorted order. Using prefetch is highly recommended if you're doing a long running iteration. Avoid long running iterations in update transactions.

Can you paste your code here? Are you using prefetch in your iteration? Without it, the locks over the value log would not be released until txn ends. For a long-running iteration, which is also simultaneously updating, these read locks would block any value log update; which is why your updates are stopping once the value log size is reached (unable to create a new value log file).

@JeanMertz
Copy link
Author

Actually, in my original post I wasn't using any of the iterators, I was doing my own for loop, and made sure the transactions where small enough to not hit the "transaction too big" error.

Still though, I did start a lot of goroutines to handle concurrent updates, so perhaps I was hitting a limit there.

But, the fact that it stopt exactly at the 1073741947 mark (i.e. 1 << 30), and no new files where created after hitting this limit, makes me think it was probably related to that.

Either way, I'll see if I can get some code from the setup I had before, to share with you.

@manishrjain
Copy link
Contributor

Okay. If your code is working around this limitation, great. Otherwise, we could add an API to explicitly release the item after a Get -- which would allow the lock to be released and new value logs to be created.

@manishrjain manishrjain self-assigned this Oct 30, 2017
@manishrjain manishrjain added the kind/question Something requiring a response label Oct 30, 2017
@deepakjois
Copy link
Contributor

Not sure if anything more needs to be done here. If so, pls re-open.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/question Something requiring a response
Development

No branches or pull requests

3 participants