Commit

Merge pull request #1 from dgraph-io/master
Pull from origin master
ehsannm authored Sep 18, 2019
2 parents fbcd608 + db73862 commit 99b2416
Showing 28 changed files with 875 additions and 164 deletions.
65 changes: 49 additions & 16 deletions README.md
@@ -11,7 +11,7 @@ key-value stores like [RocksDB](https://github.com/facebook/rocksdb).
Badger is stable and is being used to serve data sets worth hundreds of
terabytes. Badger supports concurrent ACID transactions with serializable
snapshot isolation (SSI) guarantees. A Jepsen-style bank test runs nightly for
8h, with `--race` flag and ensures maintainance of transactional guarantees.
8h, with `--race` flag and ensures the maintenance of transactional guarantees.
Badger has also been tested to work with filesystem level anomalies, to ensure
persistence and consistency.

@@ -158,7 +158,7 @@ of your application, you have the option to retry the operation if you receive
this error.

An `ErrTxnTooBig` will be reported in case the number of pending writes/deletes in
the transaction exceed a certain limit. In that case, it is best to commit the
the transaction exceeds a certain limit. In that case, it is best to commit the
transaction and start a new transaction immediately. Here is an example (we are
not checking for errors in some places for simplicity):
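
A minimal sketch of this pattern, assuming the usual `badger` import and an already-open `db *badger.DB` (the helper name `setAll` and the keys are illustrative):

```go
// setAll writes many keys, committing and starting a fresh transaction
// whenever Badger reports that the current one has grown too big.
func setAll(db *badger.DB, keys [][]byte, val []byte) error {
  txn := db.NewTransaction(true) // true == read-write transaction
  for _, k := range keys {
    if err := txn.Set(k, val); err == badger.ErrTxnTooBig {
      // Commit what has been queued so far, then retry this key in a new txn.
      if err := txn.Commit(); err != nil {
        return err
      }
      txn = db.NewTransaction(true)
      if err := txn.Set(k, val); err != nil {
        txn.Discard()
        return err
      }
    } else if err != nil {
      txn.Discard()
      return err
    }
  }
  return txn.Commit()
}
```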

@@ -301,7 +301,7 @@ is thread-safe and can be used concurrently via various goroutines.
Badger would lease a range of integers to hand out from memory, with the
bandwidth provided to `DB.GetSequence`. The frequency at which disk writes are
done is determined by this lease bandwidth and the frequency of `Next`
invocations. Setting a bandwith too low would do more disk writes, setting it
invocations. Setting a bandwidth too low would do more disk writes, setting it
too high would result in wasted integers if Badger is closed or crashes.
To avoid wasted integers, call `Release` before closing Badger.

@@ -450,7 +450,7 @@ forward or backward through the keys one at a time.

By default, Badger prefetches the values of the next 100 items. You can adjust
that with the `IteratorOptions.PrefetchSize` field. However, setting it to
a value higher than GOMAXPROCS (which we recommend to be 128 or higher)
a value higher than `GOMAXPROCS` (which we recommend to be 128 or higher)
shouldn’t give any additional benefits. You can also turn off the fetching of
values altogether. See section below on key-only iteration.
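
As a sketch, tuning the prefetch size (or turning value prefetch off entirely for key-only scans) looks roughly like this:

```go
err := db.View(func(txn *badger.Txn) error {
  opts := badger.DefaultIteratorOptions
  opts.PrefetchSize = 10 // fetch values for 10 items at a time
  // opts.PrefetchValues = false // uncomment for key-only iteration
  it := txn.NewIterator(opts)
  defer it.Close()
  for it.Rewind(); it.Valid(); it.Next() {
    item := it.Item()
    err := item.Value(func(v []byte) error {
      fmt.Printf("key=%s, value=%s\n", item.Key(), v)
      return nil
    })
    if err != nil {
      return err
    }
  }
  return nil
})
```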

@@ -763,7 +763,7 @@ Below is a list of known projects that use Badger:
If you are using Badger in a project, please send a pull request to add it to the list.

## Frequently Asked Questions
- **My writes are getting stuck. Why?**
### My writes are getting stuck. Why?

**Update: With the new `Value(func(v []byte))` API, this deadlock can no longer
happen.**
@@ -788,7 +788,7 @@ There are multiple workarounds during iteration:
iteration. This might be useful if you just want to delete a lot of keys.
1. Do the writes in a separate transaction after the reads.

- **My writes are really slow. Why?**
### My writes are really slow. Why?

Are you creating a new transaction for every single key update, and waiting for
it to `Commit` fully before creating a new one? This will lead to very low
@@ -813,25 +813,25 @@ handle(wb.Flush()) // Wait for all txns to finish.
Note that `WriteBatch` API does not allow any reads. For read-modify-write
workloads, you should be using the `Transaction` API.
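
For reference, a sketch of the `WriteBatch` write path; `handle` stands in for your error handling, and note that some Badger versions take an extra meta byte argument in `WriteBatch.Set`:

```go
wb := db.NewWriteBatch()
defer wb.Cancel()

for i := 0; i < 100000; i++ {
  k := []byte(fmt.Sprintf("key-%06d", i))
  handle(wb.Set(k, []byte("value"))) // queued internally; flushed in batched txns
}
handle(wb.Flush()) // Wait for all txns to finish.
```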

- **I don't see any disk write. Why?**
### I don't see any disk writes. Why?

If you're using Badger with `SyncWrites=false`, then your writes might not be written to the value log
and won't get synced to disk immediately. Writes to the LSM tree are done in memory first, before they
get compacted to disk. The compaction would only happen once `MaxTableSize` has been reached. So, if
you're doing a few writes and then checking, you might not see anything on disk. Once you `Close`
the database, you'll see these writes on disk.
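
For reference, a sketch of toggling this option, assuming the builder-style options API (on versions where `badger.Options` is a plain struct, set the field directly):

```go
opts := badger.DefaultOptions("/tmp/badger").WithSyncWrites(false)
db, err := badger.Open(opts)
if err != nil {
  return err
}
defer db.Close() // pending memtable writes become visible on disk after Close
```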

- **Reverse iteration doesn't give me the right results.**
### Reverse iteration doesn't give me the right results.

Just like forward iteration goes to the first key which is equal to or greater than the SEEK key, reverse iteration goes to the first key which is equal to or less than the SEEK key. Therefore, the SEEK key would not be part of the results. You can typically add a `0xff` byte as a suffix to the SEEK key to include it in the results. See the following issues: [#436](https://github.com/dgraph-io/badger/issues/436) and [#347](https://github.com/dgraph-io/badger/issues/347).
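
A sketch of that workaround, seeking in reverse so that the key `myKey` itself is included:

```go
err := db.View(func(txn *badger.Txn) error {
  opts := badger.DefaultIteratorOptions
  opts.Reverse = true
  it := txn.NewIterator(opts)
  defer it.Close()

  seek := append([]byte("myKey"), 0xff) // suffix so "myKey" itself shows up
  for it.Seek(seek); it.Valid(); it.Next() {
    fmt.Printf("key=%s\n", it.Item().Key())
  }
  return nil
})
```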

- **Which instances should I use for Badger?**
### Which instances should I use for Badger?

We recommend using instances which provide local SSD storage, without any limit
on the maximum IOPS. In AWS, these are storage optimized instances like i3. They
provide local SSDs which clock 100K IOPS over 4KB blocks easily.

- **I'm getting a closed channel error. Why?**
### I'm getting a closed channel error. Why?

@@ -840,17 +840,50 @@
```
panic: close of closed channel
panic: send on closed channel
```

If you're seeing panics like the above, it is because you're operating on a closed DB. This can happen if you call `Close()` before sending a write, or if you call `Close()` multiple times. You should ensure that you only call `Close()` once, and that all your read/write operations finish before closing.
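
One way to guarantee a single `Close()` call is to wrap it in `sync.Once`; the wrapper below is illustrative, not part of the Badger API:

```go
import "sync"

var closeOnce sync.Once

// closeDB is safe to call from multiple goroutines; only the first call
// actually closes the DB, later calls simply return nil.
func closeDB(db *badger.DB) error {
  var err error
  closeOnce.Do(func() {
    err = db.Close()
  })
  return err
}
```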

- **Are there any Go specific settings that I should use?**
### Are there any Go specific settings that I should use?

We *highly* recommend setting a high number for GOMAXPROCS, which allows Go to
We *highly* recommend setting a high number for `GOMAXPROCS`, which allows Go to
observe the full IOPS throughput provided by modern SSDs. In Dgraph, we have set
it to 128. For more details, [see this
thread](https://groups.google.com/d/topic/golang-nuts/jPb_h3TvlKE/discussion).
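
This can be done via the environment (`GOMAXPROCS=128 ./your-app`) or in code at process startup, for example:

```go
import "runtime"

func init() {
  // Let the Go scheduler issue enough concurrent I/O to saturate the SSD.
  runtime.GOMAXPROCS(128)
}
```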

- **Are there any linux specific settings that I should use?**

We recommend setting max file descriptors to a high number depending upon the expected size of you data.

### Are there any Linux specific settings that I should use?

We recommend setting `max file descriptors` to a high number depending upon the expected size of
your data. On Linux and Mac, you can check the file descriptor limit with `ulimit -n -H` for the
hard limit and `ulimit -n -S` for the soft limit. A soft limit of `65535` is a good lower bound.
You can adjust the limit as needed.

### I see "manifest has unsupported version: X (we support Y)" error.

This error means you have a badger directory which was created by an older version of badger and
you're trying to open it with a newer version of badger. The underlying data format can change
across badger versions and users will have to migrate their data directory.
Badger data can be migrated from version X of badger to version Y of badger by following the steps
listed below.
Assume you were on badger v1.5.5 and you wish to migrate to v2.0.0.
1. Install badger version v1.5.5.
- `cd $GOPATH/src/github.com/dgraph-io/badger`
- `git checkout v1.5.5`
- `cd badger && go install`

This should install the old badger binary in your $GOBIN.
2. Create Backup
- `badger backup --dir path/to/badger/directory -f badger.backup`
3. Install badger version v2.0.0.
- `cd $GOPATH/src/github.com/dgraph-io/badger`
- `git checkout v2.0.0`
- `cd badger && go install`

This should install the new badger binary in your $GOBIN.
4. Restore from the backup.
- `badger restore --dir path/to/new/badger/directory -f badger.backup`

This will create a new directory at `path/to/new/badger/directory` and add the badger data in the
newer format to it.

NOTE - The above steps shouldn't cause any data loss but please ensure the new data is valid before
deleting the old badger directory.

## Contact
- Please use [discuss.dgraph.io](https://discuss.dgraph.io) for questions, feature requests and discussions.
- Please use [Github issue tracker](https://github.com/dgraph-io/badger/issues) for filing bugs or feature requests.
2 changes: 1 addition & 1 deletion appveyor.yml
@@ -11,7 +11,7 @@ clone_folder: c:\gopath\src\github.com\dgraph-io\badger

# Environment variables
environment:
GOVERSION: 1.8.3
GOVERSION: 1.12
GOPATH: c:\gopath
GO111MODULE: on

3 changes: 1 addition & 2 deletions backup_test.go
@@ -76,8 +76,7 @@ func TestBackupRestore1(t *testing.T) {
defer os.RemoveAll(dir)
bak, err := ioutil.TempFile(dir, "badgerbak")
require.NoError(t, err)
ts, err := db.Backup(bak, 0)
t.Logf("New ts: %d\n", ts)
_, err = db.Backup(bak, 0)
require.NoError(t, err)
require.NoError(t, bak.Close())
require.NoError(t, db.Close())
49 changes: 31 additions & 18 deletions db.go
@@ -21,7 +21,6 @@ import (
"context"
"encoding/binary"
"expvar"
"io"
"math"
"os"
"path/filepath"
@@ -200,6 +199,8 @@ func Open(opt Options) (db *DB, err error) {
return nil, errors.Errorf("Valuethreshold greater than max batch size of %d. Either "+
"reduce opt.ValueThreshold or increase opt.MaxTableSize.", opt.maxBatchSize)
}
// Compact L0 on close if either it is set or if KeepL0InMemory is set.
opt.CompactL0OnClose = opt.CompactL0OnClose || opt.KeepL0InMemory

if opt.ReadOnly {
// Can't truncate if the DB is read only.
@@ -270,13 +271,18 @@ func Open(opt Options) (db *DB, err error) {
}
}()

elog := y.NoEventLog
if opt.EventLogging {
elog = trace.NewEventLog("Badger", "DB")
}

db = &DB{
imm: make([]*skl.Skiplist, 0, opt.NumMemtables),
flushChan: make(chan flushTask, opt.NumMemtables),
writeCh: make(chan *request, kvWriteChCapacity),
opt: opt,
manifest: manifestFile,
elog: trace.NewEventLog("Badger", "DB"),
elog: elog,
dirLockGuard: dirLockGuard,
valueDirGuard: valueDirLockGuard,
orc: newOracle(opt),
@@ -846,8 +852,8 @@ func arenaSize(opt Options) int64 {
return opt.MaxTableSize + opt.maxBatchSize + opt.maxBatchCount*int64(skl.MaxNodeSize)
}

// WriteLevel0Table flushes memtable.
func writeLevel0Table(ft flushTask, f io.Writer, bopts table.Options) error {
// buildL0Table builds a new table from the memtable.
func buildL0Table(ft flushTask, bopts table.Options) []byte {
iter := ft.mt.NewIterator()
defer iter.Close()
b := table.NewTableBuilder(bopts)
@@ -858,8 +864,7 @@ func writeLevel0Table(ft flushTask, f io.Writer, bopts table.Options) error {
}
b.Add(iter.Key(), iter.Value())
}
_, err := f.Write(b.Finish())
return err
return b.Finish()
}

type flushTask struct {
@@ -886,28 +891,36 @@ func (db *DB) handleFlushTask(ft flushTask) error {
headTs := y.KeyWithTs(head, db.orc.nextTs())
ft.mt.Put(headTs, y.ValueStruct{Value: val})

bopts := table.Options{
BlockSize: db.opt.BlockSize,
BloomFalsePositive: db.opt.BloomFalsePositive,
}
tableData := buildL0Table(ft, bopts)

fileID := db.lc.reserveFileID()
if db.opt.KeepL0InMemory {
tbl, err := table.OpenInMemoryTable(tableData, fileID)
if err != nil {
return errors.Wrapf(err, "failed to open table in memory")
}
return db.lc.addLevel0Table(tbl)
}

fd, err := y.CreateSyncedFile(table.NewFilename(fileID, db.opt.Dir), true)
if err != nil {
return y.Wrap(err)
}

// Don't block just to sync the directory entry.
dirSyncCh := make(chan error)
dirSyncCh := make(chan error, 1)
go func() { dirSyncCh <- syncDir(db.opt.Dir) }()

bopts := table.Options{
BlockSize: db.opt.BlockSize,
BloomFalsePositive: db.opt.BloomFalsePositive,
}
err = writeLevel0Table(ft, fd, bopts)
dirSyncErr := <-dirSyncCh

if err != nil {
if _, err = fd.Write(tableData); err != nil {
db.elog.Errorf("ERROR while writing to level 0: %v", err)
return err
}
if dirSyncErr != nil {

if dirSyncErr := <-dirSyncCh; dirSyncErr != nil {
// Do dir sync as best effort. No need to return due to an error there.
db.elog.Errorf("ERROR while syncing level directory: %v", dirSyncErr)
}
@@ -922,7 +935,7 @@ func (db *DB) handleFlushTask(ft flushTask) error {
return err
}
// We own a ref on tbl.
err = db.lc.addLevel0Table(tbl) // This will incrRef (if we don't error, sure)
err = db.lc.addLevel0Table(tbl) // This will incrRef
_ = tbl.DecrRef() // Releases our ref.
return err
}
@@ -1030,7 +1043,7 @@ func (db *DB) updateSize(lc *y.Closer) {
// RunValueLogGC triggers a value log garbage collection.
//
// It picks value log files to perform GC based on statistics that are collected
// duing compactions. If no such statistics are available, then log files are
// during compactions. If no such statistics are available, then log files are
// picked in random order. The process stops as soon as the first log file is
// encountered which does not result in garbage collection.
//
