This repository has been archived by the owner on Mar 9, 2019. It is now read-only.

Speedup open huge file #410

Closed
30 changes: 20 additions & 10 deletions db.go
@@ -115,10 +115,11 @@ type DB struct {
batchMu sync.Mutex
batch *batch

rwlock sync.Mutex // Allows only one writer at a time.
metalock sync.Mutex // Protects meta page access.
mmaplock sync.RWMutex // Protects mmap access during remapping.
statlock sync.RWMutex // Protects stats access.
rwlock sync.Mutex // Allows only one writer at a time.
metalock sync.Mutex // Protects meta page access.
mmaplock sync.RWMutex // Protects mmap access during remapping.
statlock sync.RWMutex // Protects stats access.
freelistonce sync.Once // Protects reading freelist from file.

ops struct {
writeAt func(b []byte, off int64) (n int, err error)
@@ -232,10 +233,6 @@ func Open(path string, mode os.FileMode, options *Options) (*DB, error) {
return nil, err
}

// Read in the freelist.
db.freelist = newFreelist()
db.freelist.read(db.page(db.meta().freelist))

// Mark the database as opened and return.
return db, nil
}
@@ -439,6 +436,15 @@ func (db *DB) close() error {
return nil
}

func (db *DB) ensureFreelist() *freelist {
Contributor:

Can you explain more how this lazy load helps?

It appears that you're trading

  • an eager read (that will happen eventually anyway?)
  • a regular pointer load
  • straightforward code

for

  • a lazy read
  • an atomic pointer load
  • less obvious code

I'd like to see more detail about the benefits to understand why this is worth doing.

Author:

If you open the file only for reading, you never need to read the freelist at all.

Contributor:

The first commit in this series handles that already, no?

Author:

Sometimes you want a single code path for both read-only and read-write opens, simply because you don't know in advance whether you will write or not.

Anyway, I was never confident that the last commit would be accepted.

db.freelistonce.Do(func() {
fl := newFreelist()
fl.read(db.page(db.meta().freelist))
db.freelist = fl
})
return db.freelist
}
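The pattern above is standard lazy initialization with sync.Once: the freelist page is parsed the first time a writer (or a consistency check) actually needs it, so a purely read-only open never pays that cost, and concurrent callers trigger the read at most once. A minimal, self-contained sketch of the idea, with hypothetical names (not code from this PR):

```go
package main

import (
	"fmt"
	"sync"
)

// lazyDB stands in for bolt's DB; loading the freelist is deferred until needed.
type lazyDB struct {
	freelistOnce sync.Once
	freelist     []uint64
}

// readFreelistPage is a stand-in for parsing the freelist page from the mmap.
func readFreelistPage() []uint64 {
	fmt.Println("freelist read (happens at most once)")
	return []uint64{3, 7, 9}
}

// ensureFreelist loads the freelist on first use only.
func (db *lazyDB) ensureFreelist() []uint64 {
	db.freelistOnce.Do(func() {
		db.freelist = readFreelistPage()
	})
	return db.freelist
}

func main() {
	db := &lazyDB{}
	// Read-only users that never call ensureFreelist never trigger the read.
	// A writer does, and repeated calls reuse the already-loaded list.
	_ = db.ensureFreelist()
	_ = db.ensureFreelist() // no second read
}
```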

// Begin starts a new transaction.
// Multiple read-only transactions can be used concurrently but only one
// write transaction can be used at a time. Starting multiple write transactions
@@ -514,11 +520,11 @@ func (db *DB) beginRWTx() (*Tx, error) {
// Once we have the writer lock then we can lock the meta pages so that
// we can set up the transaction.
db.metalock.Lock()
defer db.metalock.Unlock()

// Exit if the database is not open yet.
if !db.opened {
db.rwlock.Unlock()
db.metalock.Unlock()
return nil, ErrDatabaseNotOpen
}

@@ -534,10 +540,14 @@ func (db *DB) beginRWTx() (*Tx, error) {
minid = t.meta.txid
}
}

// Release metalock here because ensureFreelist can take a long time.
db.metalock.Unlock()

db.ensureFreelist()
if minid > 0 {
db.freelist.release(minid - 1)
}

return t, nil
}
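The reordering above follows a common locking pattern: hold the coarse writer lock for the whole transaction, but drop the short-lived metalock before the potentially slow freelist load, so other users of the meta pages are not blocked behind it. A rough sketch of the shape of this pattern, with hypothetical names (not the actual bolt code):

```go
package main

import (
	"sync"
	"time"
)

type store struct {
	rwlock   sync.Mutex // one writer at a time; held for the whole write tx
	metalock sync.Mutex // protects cheap metadata access only
	meta     int
}

// slowPrepare stands in for ensureFreelist: potentially long, so it must not
// run while metalock is held.
func (s *store) slowPrepare() {
	time.Sleep(10 * time.Millisecond)
}

func (s *store) beginWrite() int {
	s.rwlock.Lock() // released by the caller when the write finishes

	s.metalock.Lock()
	snapshot := s.meta // only the cheap work happens under metalock
	s.metalock.Unlock()

	s.slowPrepare() // slow work runs after metalock is released

	return snapshot
}

func main() {
	s := &store{meta: 42}
	_ = s.beginWrite()
	s.rwlock.Unlock()
}
```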

31 changes: 17 additions & 14 deletions freelist.go
@@ -9,16 +9,16 @@ import (
// freelist represents a list of all pages that are available for allocation.
// It also tracks pages that have been freed but are still in use by open transactions.
type freelist struct {
ids []pgid // all free and available free page ids.
pending map[txid][]pgid // mapping of soon-to-be free page ids by tx.
cache map[pgid]bool // fast lookup of all free and pending page ids.
ids []pgid // all free and available free page ids.
pending map[txid][]pgid // mapping of soon-to-be free page ids by tx.
cache map[pgid]struct{} // fast lookup of all free and pending page ids.
Contributor:

A benchmark or some other data demonstrating the memory savings would be useful. I'm not a priori convinced that the memory savings here are noticeable in real-world use.

Comment:

Performance is one thing, and idiomatic code is another: a map[key]struct{} is just the idiomatic way to store a set in Go. Go doesn't have many idioms; they shouldn't require extra justification to be accepted, as long as the code passes the tests, of course.

}

// newFreelist returns an empty, initialized freelist.
func newFreelist() *freelist {
return &freelist{
pending: make(map[txid][]pgid),
cache: make(map[pgid]bool),
cache: make(map[pgid]struct{}),
}
}
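As context for the map[key]struct{} discussion above: struct{} occupies zero bytes, so a map[pgid]struct{} stores only keys, while map[pgid]bool stores an extra byte per entry; membership tests use the comma-ok form. A small standalone sketch of the idiom (illustrative only, hypothetical names):

```go
package main

import "fmt"

func main() {
	// A set of page ids, using the zero-size empty struct as the value type.
	free := make(map[uint64]struct{})

	// Add members.
	free[3] = struct{}{}
	free[7] = struct{}{}

	// Membership test: the comma-ok form, as in the diff above.
	if _, ok := free[3]; ok {
		fmt.Println("page 3 is free")
	}
	if _, ok := free[5]; !ok {
		fmt.Println("page 5 is not free")
	}
}
```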

@@ -113,13 +113,13 @@ func (f *freelist) free(txid txid, p *page) {
var ids = f.pending[txid]
for id := p.id; id <= p.id+pgid(p.overflow); id++ {
// Verify that page is not already free.
if f.cache[id] {
if _, ok := f.cache[id]; ok {
panic(fmt.Sprintf("page %d already freed", id))
}

// Add to the freelist and cache.
ids = append(ids, id)
f.cache[id] = true
f.cache[id] = struct{}{}
}
f.pending[txid] = ids
}
@@ -152,7 +152,8 @@ func (f *freelist) rollback(txid txid) {

// freed returns whether a given page is in the free list.
func (f *freelist) freed(pgid pgid) bool {
return f.cache[pgid]
_, ok := f.cache[pgid]
return ok
}

// read initializes the freelist from a freelist page.
@@ -174,7 +175,9 @@ func (f *freelist) read(p *page) {
copy(f.ids, ids)

// Make sure they're sorted.
sort.Sort(pgids(f.ids))
if !pgids(f.ids).isSorted() {
sort.Sort(pgids(f.ids))
}

Comment:

I think you want to break here; otherwise you'll sort and then keep testing all the following indices.

Author:

You're absolutely right

}

// Rebuild the page cache.
@@ -212,18 +215,18 @@ func (f *freelist) reload(p *page) {
f.read(p)

// Build a cache of only pending pages.
pcache := make(map[pgid]bool)
pcache := make(map[pgid]struct{})
for _, pendingIDs := range f.pending {
for _, pendingID := range pendingIDs {
pcache[pendingID] = true
pcache[pendingID] = struct{}{}
}
}

// Check each page in the freelist and build a new available freelist
// with any pages not in the pending lists.
var a []pgid
for _, id := range f.ids {
if !pcache[id] {
if _, ok := pcache[id]; !ok {
a = append(a, id)
}
}
@@ -236,13 +239,13 @@

// reindex rebuilds the free cache based on available and pending free lists.
func (f *freelist) reindex() {
f.cache = make(map[pgid]bool, len(f.ids))
f.cache = make(map[pgid]struct{}, len(f.ids))
for _, id := range f.ids {
f.cache[id] = true
f.cache[id] = struct{}{}
}
for _, pendingIDs := range f.pending {
for _, pendingID := range pendingIDs {
f.cache[pendingID] = true
f.cache[pendingID] = struct{}{}
}
}
}
8 changes: 8 additions & 0 deletions page.go
@@ -139,6 +139,14 @@ type pgids []pgid
func (s pgids) Len() int { return len(s) }
func (s pgids) Swap(i, j int) { s[i], s[j] = s[j], s[i] }
func (s pgids) Less(i, j int) bool { return s[i] < s[j] }
func (s pgids) isSorted() bool {
Contributor:

Package sort already has an IsSorted function. No need to add this.

Author:

sort.IsSorted uses interfaces, so it is slower.

Contributor:

How much slower? Slow enough that it is worth adding more code to work around it? Numbers would help a lot here.

Author:

If you want numbers, you can measure them yourself. If you find I was wrong, I will apologise.

Comment:

Usually the onus is on you to prove that your performance claims are justified. Without evidence that your optimizations were guided by a scientific method, it's to be expected that others will ask you to justify your claims.

Author:

@aybabtme, as I've said twice further down the page, I don't care whether this pull request is merged or not.
And I'm not going to prove obvious things. If it isn't obvious to someone that a function call through an interface is slower, they should study a bit more instead of asking meaningless questions.

Comment:

@funny-falcon thanks for your feedback; I hadn't yet read the other comments. I now see that this is an old PR, sorry for the confusion. Maybe we can disagree on the burden of proof another time. Have a good day.

for i := len(s)-1; i > 0; i-- {
if s[i] < s[i-1] {
return false
}
}
return true
}
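As for the numbers asked for above: the comparison between sort.IsSorted (which dispatches Len and Less through the sort.Interface) and the direct isSorted loop could be measured with a benchmark along these lines, placed in a _test.go file. This is only a sketch; no results are claimed here, and the gap will depend on slice size and Go version:

```go
package main

import (
	"sort"
	"testing"
)

type pgid uint64
type pgids []pgid

func (s pgids) Len() int           { return len(s) }
func (s pgids) Swap(i, j int)      { s[i], s[j] = s[j], s[i] }
func (s pgids) Less(i, j int) bool { return s[i] < s[j] }

// isSorted is the direct-loop version from the diff above.
func (s pgids) isSorted() bool {
	for i := len(s) - 1; i > 0; i-- {
		if s[i] < s[i-1] {
			return false
		}
	}
	return true
}

// makeIDs builds an already-sorted slice, the common case when reading a freelist.
func makeIDs(n int) pgids {
	ids := make(pgids, n)
	for i := range ids {
		ids[i] = pgid(i)
	}
	return ids
}

func BenchmarkIsSortedDirect(b *testing.B) {
	ids := makeIDs(1 << 20)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		_ = ids.isSorted()
	}
}

func BenchmarkIsSortedInterface(b *testing.B) {
	ids := makeIDs(1 << 20)
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		_ = sort.IsSorted(ids)
	}
}
```

Run with `go test -bench IsSorted` to compare the two on the same input.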

// merge returns the sorted union of a and b.
func (a pgids) merge(b pgids) pgids {
2 changes: 1 addition & 1 deletion tx.go
@@ -381,7 +381,7 @@ func (tx *Tx) Check() <-chan error {
func (tx *Tx) check(ch chan error) {
// Check if any pages are double freed.
freed := make(map[pgid]bool)
for _, id := range tx.db.freelist.all() {
for _, id := range tx.db.ensureFreelist().all() {
if freed[id] {
ch <- fmt.Errorf("page %d: already freed", id)
}