Commit

Merge pull request #1 from dgraph-io/master
Pull from origin master
ehsannm authored Sep 18, 2019
2 parents fbcd608 + db73862 commit 99b2416
Showing 28 changed files with 875 additions and 164 deletions.
65 changes: 49 additions & 16 deletions README.md
@@ -11,7 +11,7 @@ key-value stores like [RocksDB](https://github.com/facebook/rocksdb).
Badger is stable and is being used to serve data sets worth hundreds of
terabytes. Badger supports concurrent ACID transactions with serializable
snapshot isolation (SSI) guarantees. A Jepsen-style bank test runs nightly for
8h, with `--race` flag and ensures maintainance of transactional guarantees.
8h, with `--race` flag and ensures the maintenance of transactional guarantees.
Badger has also been tested to work with filesystem level anomalies, to ensure
persistence and consistency.

@@ -158,7 +158,7 @@ of your application, you have the option to retry the operation if you receive
this error.

An `ErrTxnTooBig` will be reported in case the number of pending writes/deletes in
the transaction exceed a certain limit. In that case, it is best to commit the
the transaction exceeds a certain limit. In that case, it is best to commit the
transaction and start a new transaction immediately. Here is an example (we are
not checking for errors in some places for simplicity):
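
A minimal sketch of this pattern, assuming the usual `badger` import and an already-open `db *badger.DB` (the helper name `setAll` and the keys are illustrative):

```go
// setAll writes many keys, committing and starting a fresh transaction
// whenever Badger reports that the current one has grown too big.
func setAll(db *badger.DB, keys [][]byte, val []byte) error {
  txn := db.NewTransaction(true) // true == read-write transaction
  for _, k := range keys {
    if err := txn.Set(k, val); err == badger.ErrTxnTooBig {
      // Commit what has been queued so far, then retry this key in a new txn.
      if err := txn.Commit(); err != nil {
        return err
      }
      txn = db.NewTransaction(true)
      if err := txn.Set(k, val); err != nil {
        txn.Discard()
        return err
      }
    } else if err != nil {
      txn.Discard()
      return err
    }
  }
  return txn.Commit()
}
```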

@@ -301,7 +301,7 @@ is thread-safe and can be used concurrently via various goroutines.
Badger would lease a range of integers to hand out from memory, with the
bandwidth provided to `DB.GetSequence`. The frequency at which disk writes are
done is determined by this lease bandwidth and the frequency of `Next`
invocations. Setting a bandwith too low would do more disk writes, setting it
invocations. Setting a bandwidth too low would do more disk writes, setting it
too high would result in wasted integers if Badger is closed or crashes.
To avoid wasted integers, call `Release` before closing Badger.

@@ -450,7 +450,7 @@ forward or backward through the keys one at a time.

By default, Badger prefetches the values of the next 100 items. You can adjust
that with the `IteratorOptions.PrefetchSize` field. However, setting it to
a value higher than GOMAXPROCS (which we recommend to be 128 or higher)
a value higher than `GOMAXPROCS` (which we recommend to be 128 or higher)
shouldn’t give any additional benefits. You can also turn off the fetching of
values altogether. See section below on key-only iteration.
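
As a sketch, tuning the prefetch size (or turning value prefetch off entirely for key-only scans) looks roughly like this:

```go
err := db.View(func(txn *badger.Txn) error {
  opts := badger.DefaultIteratorOptions
  opts.PrefetchSize = 10 // fetch values for 10 items at a time
  // opts.PrefetchValues = false // uncomment for key-only iteration
  it := txn.NewIterator(opts)
  defer it.Close()
  for it.Rewind(); it.Valid(); it.Next() {
    item := it.Item()
    err := item.Value(func(v []byte) error {
      fmt.Printf("key=%s, value=%s\n", item.Key(), v)
      return nil
    })
    if err != nil {
      return err
    }
  }
  return nil
})
```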

@@ -763,7 +763,7 @@ Below is a list of known projects that use Badger:
If you are using Badger in a project, please send a pull request to add it to the list.

## Frequently Asked Questions
- **My writes are getting stuck. Why?**
### My writes are getting stuck. Why?

**Update: With the new `Value(func(v []byte))` API, this deadlock can no longer
happen.**
@@ -788,7 +788,7 @@ There are multiple workarounds during iteration:
iteration. This might be useful if you just want to delete a lot of keys.
1. Do the writes in a separate transaction after the reads.

- **My writes are really slow. Why?**
### My writes are really slow. Why?

Are you creating a new transaction for every single key update, and waiting for
it to `Commit` fully before creating a new one? This will lead to very low
@@ -813,25 +813,25 @@ handle(wb.Flush()) // Wait for all txns to finish.
Note that `WriteBatch` API does not allow any reads. For read-modify-write
workloads, you should be using the `Transaction` API.
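
For reference, a sketch of the `WriteBatch` write path; `handle` stands in for your error handling, and note that some Badger versions take an extra meta byte argument in `WriteBatch.Set`:

```go
wb := db.NewWriteBatch()
defer wb.Cancel()

for i := 0; i < 100000; i++ {
  k := []byte(fmt.Sprintf("key-%06d", i))
  handle(wb.Set(k, []byte("value"))) // queued internally; flushed in batched txns
}
handle(wb.Flush()) // Wait for all txns to finish.
```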

- **I don't see any disk write. Why?**
### I don't see any disk writes. Why?

If you're using Badger with `SyncWrites=false`, then your writes might not be written to the value log
and won't get synced to disk immediately. Writes to the LSM tree are done in memory first, before they
get compacted to disk. The compaction would only happen once `MaxTableSize` has been reached. So, if
you're doing a few writes and then checking, you might not see anything on disk. Once you `Close`
the database, you'll see these writes on disk.
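
For reference, a sketch of toggling this option, assuming the builder-style options API (on versions where `badger.Options` is a plain struct, set the field directly):

```go
opts := badger.DefaultOptions("/tmp/badger").WithSyncWrites(false)
db, err := badger.Open(opts)
if err != nil {
  return err
}
defer db.Close() // pending memtable writes become visible on disk after Close
```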

- **Reverse iteration doesn't give me the right results.**
### Reverse iteration doesn't give me the right results.

Just like forward iteration goes to the first key which is equal to or greater than the SEEK key, reverse iteration goes to the first key which is equal to or less than the SEEK key. Therefore, the SEEK key would not be part of the results. You can typically add a `0xff` byte as a suffix to the SEEK key to include it in the results. See the following issues: [#436](https://github.com/dgraph-io/badger/issues/436) and [#347](https://github.com/dgraph-io/badger/issues/347).
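
A sketch of that workaround, seeking in reverse so that the key `myKey` itself is included:

```go
err := db.View(func(txn *badger.Txn) error {
  opts := badger.DefaultIteratorOptions
  opts.Reverse = true
  it := txn.NewIterator(opts)
  defer it.Close()

  seek := append([]byte("myKey"), 0xff) // suffix so "myKey" itself shows up
  for it.Seek(seek); it.Valid(); it.Next() {
    fmt.Printf("key=%s\n", it.Item().Key())
  }
  return nil
})
```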

- **Which instances should I use for Badger?**
### Which instances should I use for Badger?

We recommend using instances which provide local SSD storage, without any limit
on the maximum IOPS. In AWS, these are storage optimized instances like i3. They
provide local SSDs which clock 100K IOPS over 4KB blocks easily.

- **I'm getting a closed channel error. Why?**
### I'm getting a closed channel error. Why?

@@ -840,17 +840,50 @@
```
panic: close of closed channel
panic: send on closed channel
```

If you're seeing panics like the above, it is because you're operating on a closed DB. This can happen if you call `Close()` before sending a write, or if you call `Close()` multiple times. You should ensure that you only call `Close()` once, and that all your read/write operations finish before closing.
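
One way to guarantee a single `Close()` call is to wrap it in `sync.Once`; the wrapper below is illustrative, not part of the Badger API:

```go
import "sync"

var closeOnce sync.Once

// closeDB is safe to call from multiple goroutines; only the first call
// actually closes the DB, later calls simply return nil.
func closeDB(db *badger.DB) error {
  var err error
  closeOnce.Do(func() {
    err = db.Close()
  })
  return err
}
```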

- **Are there any Go specific settings that I should use?**
### Are there any Go specific settings that I should use?

We *highly* recommend setting a high number for GOMAXPROCS, which allows Go to
We *highly* recommend setting a high number for `GOMAXPROCS`, which allows Go to
observe the full IOPS throughput provided by modern SSDs. In Dgraph, we have set
it to 128. For more details, [see this
thread](https://groups.google.com/d/topic/golang-nuts/jPb_h3TvlKE/discussion).
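
This can be done via the environment (`GOMAXPROCS=128 ./your-app`) or in code at process startup, for example:

```go
import "runtime"

func init() {
  // Let the Go scheduler issue enough concurrent I/O to saturate the SSD.
  runtime.GOMAXPROCS(128)
}
```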

- **Are there any linux specific settings that I should use?**

We recommend setting max file descriptors to a high number depending upon the expected size of you data.

### Are there any Linux specific settings that I should use?

We recommend setting `max file descriptors` to a high number depending upon the expected size of
your data. On Linux and Mac, you can check the file descriptor limit with `ulimit -n -H` for the
hard limit and `ulimit -n -S` for the soft limit. A soft limit of `65535` is a good lower bound.
You can adjust the limit as needed.

### I see "manifest has unsupported version: X (we support Y)" error.

This error means you have a badger directory which was created by an older version of badger and
you're trying to open it with a newer version of badger. The underlying data format can change
across badger versions and users will have to migrate their data directory.
Badger data can be migrated from version X of badger to version Y of badger by following the steps
listed below.
Assume you were on badger v1.5.5 and you wish to migrate to v2.0.0.
1. Install badger version v1.5.5.
- `cd $GOPATH/src/github.com/dgraph-io/badger`
- `git checkout v1.5.5`
- `cd badger && go install`

This should install the old badger binary in your $GOBIN.
2. Create Backup
- `badger backup --dir path/to/badger/directory -f badger.backup`
3. Install badger version v2.0.0.
- `cd $GOPATH/src/github.com/dgraph-io/badger`
- `git checkout v2.0.0`
- `cd badger && go install`

This should install the new badger binary in your $GOBIN.
4. Restore from the backup.
- `badger restore --dir path/to/new/badger/directory -f badger.backup`

This will create a new directory at `path/to/new/badger/directory` and add the badger data in the
newer format to it.

NOTE - The above steps shouldn't cause any data loss but please ensure the new data is valid before
deleting the old badger directory.

## Contact
- Please use [discuss.dgraph.io](https://discuss.dgraph.io) for questions, feature requests and discussions.
- Please use [Github issue tracker](https://github.com/dgraph-io/badger/issues) for filing bugs or feature requests.
2 changes: 1 addition & 1 deletion appveyor.yml
@@ -11,7 +11,7 @@ clone_folder: c:\gopath\src\github.com\dgraph-io\badger

# Environment variables
environment:
GOVERSION: 1.8.3
GOVERSION: 1.12
GOPATH: c:\gopath
GO111MODULE: on

3 changes: 1 addition & 2 deletions backup_test.go
@@ -76,8 +76,7 @@ func TestBackupRestore1(t *testing.T) {
defer os.RemoveAll(dir)
bak, err := ioutil.TempFile(dir, "badgerbak")
require.NoError(t, err)
ts, err := db.Backup(bak, 0)
t.Logf("New ts: %d\n", ts)
_, err = db.Backup(bak, 0)
require.NoError(t, err)
require.NoError(t, bak.Close())
require.NoError(t, db.Close())
49 changes: 31 additions & 18 deletions db.go
@@ -21,7 +21,6 @@ import (
"context"
"encoding/binary"
"expvar"
"io"
"math"
"os"
"path/filepath"
@@ -200,6 +199,8 @@ func Open(opt Options) (db *DB, err error) {
return nil, errors.Errorf("Valuethreshold greater than max batch size of %d. Either "+
"reduce opt.ValueThreshold or increase opt.MaxTableSize.", opt.maxBatchSize)
}
// Compact L0 on close if either it is set or if KeepL0InMemory is set.
opt.CompactL0OnClose = opt.CompactL0OnClose || opt.KeepL0InMemory

if opt.ReadOnly {
// Can't truncate if the DB is read only.
@@ -270,13 +271,18 @@ func Open(opt Options) (db *DB, err error) {
}
}()

elog := y.NoEventLog
if opt.EventLogging {
elog = trace.NewEventLog("Badger", "DB")
}

db = &DB{
imm: make([]*skl.Skiplist, 0, opt.NumMemtables),
flushChan: make(chan flushTask, opt.NumMemtables),
writeCh: make(chan *request, kvWriteChCapacity),
opt: opt,
manifest: manifestFile,
elog: trace.NewEventLog("Badger", "DB"),
elog: elog,
dirLockGuard: dirLockGuard,
valueDirGuard: valueDirLockGuard,
orc: newOracle(opt),
@@ -846,8 +852,8 @@ func arenaSize(opt Options) int64 {
return opt.MaxTableSize + opt.maxBatchSize + opt.maxBatchCount*int64(skl.MaxNodeSize)
}

// WriteLevel0Table flushes memtable.
func writeLevel0Table(ft flushTask, f io.Writer, bopts table.Options) error {
// buildL0Table builds a new table from the memtable.
func buildL0Table(ft flushTask, bopts table.Options) []byte {
iter := ft.mt.NewIterator()
defer iter.Close()
b := table.NewTableBuilder(bopts)
@@ -858,8 +864,7 @@ func writeLevel0Table(ft flushTask, f io.Writer, bopts table.Options) error {
}
b.Add(iter.Key(), iter.Value())
}
_, err := f.Write(b.Finish())
return err
return b.Finish()
}

type flushTask struct {
@@ -886,28 +891,36 @@ func (db *DB) handleFlushTask(ft flushTask) error {
headTs := y.KeyWithTs(head, db.orc.nextTs())
ft.mt.Put(headTs, y.ValueStruct{Value: val})

bopts := table.Options{
BlockSize: db.opt.BlockSize,
BloomFalsePositive: db.opt.BloomFalsePositive,
}
tableData := buildL0Table(ft, bopts)

fileID := db.lc.reserveFileID()
if db.opt.KeepL0InMemory {
tbl, err := table.OpenInMemoryTable(tableData, fileID)
if err != nil {
return errors.Wrapf(err, "failed to open table in memory")
}
return db.lc.addLevel0Table(tbl)
}

fd, err := y.CreateSyncedFile(table.NewFilename(fileID, db.opt.Dir), true)
if err != nil {
return y.Wrap(err)
}

// Don't block just to sync the directory entry.
dirSyncCh := make(chan error)
dirSyncCh := make(chan error, 1)
go func() { dirSyncCh <- syncDir(db.opt.Dir) }()

bopts := table.Options{
BlockSize: db.opt.BlockSize,
BloomFalsePositive: db.opt.BloomFalsePositive,
}
err = writeLevel0Table(ft, fd, bopts)
dirSyncErr := <-dirSyncCh

if err != nil {
if _, err = fd.Write(tableData); err != nil {
db.elog.Errorf("ERROR while writing to level 0: %v", err)
return err
}
if dirSyncErr != nil {

if dirSyncErr := <-dirSyncCh; dirSyncErr != nil {
// Do dir sync as best effort. No need to return due to an error there.
db.elog.Errorf("ERROR while syncing level directory: %v", dirSyncErr)
}
@@ -922,7 +935,7 @@ func (db *DB) handleFlushTask(ft flushTask) error {
return err
}
// We own a ref on tbl.
err = db.lc.addLevel0Table(tbl) // This will incrRef (if we don't error, sure)
err = db.lc.addLevel0Table(tbl) // This will incrRef
_ = tbl.DecrRef() // Releases our ref.
return err
}
@@ -1030,7 +1043,7 @@ func (db *DB) updateSize(lc *y.Closer) {
// RunValueLogGC triggers a value log garbage collection.
//
// It picks value log files to perform GC based on statistics that are collected
// duing compactions. If no such statistics are available, then log files are
// during compactions. If no such statistics are available, then log files are
// picked in random order. The process stops as soon as the first log file is
// encountered which does not result in garbage collection.
//
