Speedup open huge file #410

funny-falcon · 2015-08-16T05:58:05Z

use a real set for freelist.cache
map[pgid]struct{} is a real set structure, because struct{} doesn't consume memory
sort freelist.ids only if they aren't already sorted
do not read the freelist if the database is opened in read-only mode

aybabtme · 2015-08-16T08:37:49Z

freelist.go

+	for i, v := range f.ids[:len(f.ids)-1] {
+		if f.ids[i+1] < v {
+			sort.Sort(pgids(f.ids))
+		}


I think you want to break here, else you'll sort, then keep testing all following indices.

You're absolutely right
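The fix discussed above can be sketched as follows. This is a minimal illustration, not the code that was pushed: `pgid` is assumed to be a `uint64`, and `sortIfNeeded` is a hypothetical helper name. Returning right after the sort is what stops the scan; starting the loop at index 1 also avoids slicing with `len(ids)-1`, which would panic on an empty slice.

```go
package main

import (
	"fmt"
	"sort"
)

// pgid is assumed to be a uint64 here so the sketch is self-contained.
type pgid uint64

type pgids []pgid

func (s pgids) Len() int           { return len(s) }
func (s pgids) Swap(i, j int)      { s[i], s[j] = s[j], s[i] }
func (s pgids) Less(i, j int) bool { return s[i] < s[j] }

// sortIfNeeded (hypothetical helper) sorts ids only when it finds an
// out-of-order pair, and reports whether a sort was performed.
func sortIfNeeded(ids []pgid) bool {
	for i := 1; i < len(ids); i++ {
		if ids[i] < ids[i-1] {
			sort.Sort(pgids(ids))
			return true // stop scanning: the slice is sorted now
		}
	}
	return false
}

func main() {
	a := []pgid{1, 2, 3, 5, 8}
	b := []pgid{3, 1, 2}
	fmt.Println(sortIfNeeded(a), a)
	fmt.Println(sortIfNeeded(b), b)
}
```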

funny-falcon · 2015-08-17T16:21:19Z

fixed error

benbjohnson · 2015-08-18T20:35:27Z

freelist.go

@@ -6,19 +6,21 @@ import (
 	"unsafe"
 )

+type ctrue struct{}


This seems unnecessary. Can you simply use map[pgid]struct{} instead?

ok, will do
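For readers unfamiliar with the suggestion, a set built directly on `map[pgid]struct{}` might look like the sketch below. The helper names `buildCache` and `isFree` are hypothetical and `pgid` is assumed to be a `uint64`; neither helper is taken from the patch.

```go
package main

import "fmt"

// pgid is assumed to be a uint64 for this sketch.
type pgid uint64

// buildCache (hypothetical) returns a set containing every id in ids.
func buildCache(ids []pgid) map[pgid]struct{} {
	cache := make(map[pgid]struct{}, len(ids))
	for _, id := range ids {
		cache[id] = struct{}{} // the value carries no data; only the key matters
	}
	return cache
}

// isFree (hypothetical) reports whether id is a member of the set.
func isFree(cache map[pgid]struct{}, id pgid) bool {
	_, ok := cache[id]
	return ok
}

func main() {
	cache := buildCache([]pgid{3, 5, 8})
	fmt.Println(isFree(cache, 5), isFree(cache, 6))
}
```

No named type such as `ctrue` is needed, because `struct{}{}` can be written inline wherever a value is stored.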

benbjohnson · 2015-08-18T20:54:52Z

Besides the ctrue type, it lgtm.

funny-falcon · 2015-08-19T11:11:56Z

I've added commit: reading freelist on write transaction.
If you feel it is ok, then I can replace "do not read freelist if database opened readonly" with "lazily open freelist on write transaction".

funny-falcon · 2015-08-19T20:38:57Z

The last commit looks complex :( I will not be surprised if you reject it.

rgeronimi · 2016-01-23T13:49:45Z

+1
This, along with the no-timeout patch #494, makes cooperative sharing of read access and read-write access to the db between concurrent processes much easier and more performant. This is ideal for situations where there is low concurrency between processes but each process needs very high-intensity access to the db data when it finally obtains the lock (as opposed to a client-server db architecture, which is tuned for high-concurrency, low-intensity access to the db data).

josharian · 2016-12-20T22:10:52Z

page.go

@@ -133,6 +133,17 @@ type pgids []pgid
 func (s pgids) Len() int           { return len(s) }
 func (s pgids) Swap(i, j int)      { s[i], s[j] = s[j], s[i] }
 func (s pgids) Less(i, j int) bool { return s[i] < s[j] }
+func (s pgids) isSorted() bool {


Package sort already has an IsSorted function. No need to add this.

sort.IsSorted uses interfaces, so it is slower.

How much slower? Slower enough that it is worth adding more code to work around it? Numbers would help a lot here.

If you want numbers, you may measure. If you find I was wrong, I will apologize.

Usually the onus is on you to prove that your performance claims are justified. Without having provided proofs that your optimizations were guided by a scientific method, it's to be expected that others will ask you to justify your claims.

@aybabtme, as I've said twice further down the page, I don't care whether this pull request is merged or not.
And I'm not going to prove obvious things. If it is not obvious to someone that a function call through an interface is slower, that man should study a bit more instead of asking meaningless questions.

@funny-falcon thanks for your feedback, I hadn't yet read the other comments. I now see that this is an old PR, sorry for the confusion. We can disagree on the burden of proof another time maybe. Have a good day.
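No numbers were posted in this thread. One way the two checks could be compared outside `go test` is sketched below, assuming `pgid` is a `uint64` and using a hypothetical `sortedIDs` helper to generate input. It only shows how to take the measurement; results will depend on the machine and Go version.

```go
package main

import (
	"fmt"
	"sort"
	"testing"
)

// pgid is assumed to be a uint64 for this sketch.
type pgid uint64

type pgids []pgid

func (s pgids) Len() int           { return len(s) }
func (s pgids) Swap(i, j int)      { s[i], s[j] = s[j], s[i] }
func (s pgids) Less(i, j int) bool { return s[i] < s[j] }

// isSorted checks order with direct slice access, without going
// through the sort.Interface methods.
func (s pgids) isSorted() bool {
	for i := 1; i < len(s); i++ {
		if s[i] < s[i-1] {
			return false
		}
	}
	return true
}

// sortedIDs (hypothetical) returns n ascending page ids.
func sortedIDs(n int) pgids {
	ids := make(pgids, n)
	for i := range ids {
		ids[i] = pgid(i)
	}
	return ids
}

// sink keeps the compiler from discarding the benchmarked calls.
var sink bool

func main() {
	testing.Init() // needed when calling testing.Benchmark outside "go test"
	ids := sortedIDs(1 << 20)

	viaInterface := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = sort.IsSorted(ids)
		}
	})
	direct := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			sink = ids.isSorted()
		}
	})

	fmt.Println("sort.IsSorted: ", viaInterface)
	fmt.Println("pgids.isSorted:", direct)
}
```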

josharian · 2016-12-20T22:12:22Z

freelist.go

@@ -171,7 +172,9 @@ func (f *freelist) read(p *page) {
 	copy(f.ids, ids)

 	// Make sure they're sorted.
-	sort.Sort(pgids(f.ids))
+	if !pgids(f.ids).isSorted() {


Have you benchmarked to confirm that it is faster on sorted data to first ask whether it is sorted? It is not obvious to me. Many sort algorithms run very quickly on already-sorted data, and when f.ids is large, touching all the data twice may be more expensive.

We are talking not about "many sort algorithms", but about sort.Sort.

Yes. How does sort.Sort behave on already sorted data? Do you have benchmarks to help understand how much of a win this is?

Why don't you benchmark instead of asking?
Yes, I did benchmark.

Since you bothered to benchmark, why didn't you include it in the commit? And it's not a question of how much faster it is in isolation, it's a question of how much faster it is in the context of things that matter in boltdb.

Anyway, I'll stop reviewing now.

Look, I was investigating Bolt more than a year ago. I needed it to open many mostly-read-only files fast. That is why I concentrated on open. All patches were benchmarked, but I don't have the numbers now. Nor do I use Bolt now, because my director didn't approve my suggestion to use it.
So now I don't care whether this pull request is accepted or not.
If you benchmark it and find that it helps you, then it will be up to you to push for merging this pull request.
Though, if you find some error here, I will fix it.
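The question of how `sort.Sort` behaves on already-sorted input can also be approached without timing, by counting comparisons. The sketch below is an assumption-laden illustration: `countingIDs` is a hypothetical wrapper, `pgid` is assumed to be a `uint64`, and the count for `sort.Sort` depends on the Go version's sort implementation, so no specific figure is claimed here.

```go
package main

import (
	"fmt"
	"sort"
)

// pgid is assumed to be a uint64 for this sketch.
type pgid uint64

// countingIDs (hypothetical) wraps a slice and counts calls to Less.
type countingIDs struct {
	ids   []pgid
	calls int
}

func (c *countingIDs) Len() int      { return len(c.ids) }
func (c *countingIDs) Swap(i, j int) { c.ids[i], c.ids[j] = c.ids[j], c.ids[i] }
func (c *countingIDs) Less(i, j int) bool {
	c.calls++
	return c.ids[i] < c.ids[j]
}

// comparisonsToSort returns how many comparisons sort.Sort makes on ids.
func comparisonsToSort(ids []pgid) int {
	c := &countingIDs{ids: ids}
	sort.Sort(c)
	return c.calls
}

// comparisonsToCheck returns how many comparisons sort.IsSorted makes on ids.
func comparisonsToCheck(ids []pgid) int {
	c := &countingIDs{ids: ids}
	sort.IsSorted(c)
	return c.calls
}

// ascending (hypothetical) returns n ascending page ids.
func ascending(n int) []pgid {
	ids := make([]pgid, n)
	for i := range ids {
		ids[i] = pgid(i)
	}
	return ids
}

func main() {
	const n = 1 << 16
	fmt.Println("check only:", comparisonsToCheck(ascending(n)))
	fmt.Println("sort.Sort: ", comparisonsToSort(ascending(n)))
}
```

Comparison counts ignore cache effects and call overhead, so they complement a timing benchmark rather than replace it.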

josharian · 2016-12-20T22:14:43Z

freelist.go

-	cache   map[pgid]bool   // fast lookup of all free and pending page ids.
+	ids     []pgid            // all free and available free page ids.
+	pending map[txid][]pgid   // mapping of soon-to-be free page ids by tx.
+	cache   map[pgid]struct{} // fast lookup of all free and pending page ids.


A benchmark or some other data demonstrating the memory savings would be useful. I'm not a priori convinced that the memory savings here are noticeable in real-world use.

Performance is one thing, and idiomatic code is another:
a map[key]struct{} is just the idiomatic way to store a set. Golang doesn't have many idioms, so they should not need extra challenges to be accepted, as long as the code passes the tests, of course.
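The per-value size difference behind the memory argument can be checked directly. This sketch only reports the size of a single value as seen by `unsafe.Sizeof`; it does not measure the total footprint of a map, which is what a benchmark on real data would have to show. The helper names are hypothetical.

```go
package main

import (
	"fmt"
	"unsafe"
)

// emptyStructSize (hypothetical) returns the size in bytes of a struct{} value.
func emptyStructSize() uintptr { return unsafe.Sizeof(struct{}{}) }

// boolSize (hypothetical) returns the size in bytes of a bool value.
func boolSize() uintptr { return unsafe.Sizeof(false) }

func main() {
	fmt.Println("struct{}:", emptyStructSize(), "bytes")
	fmt.Println("bool:    ", boolSize(), "bytes")
}
```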

josharian · 2016-12-20T22:18:47Z

db.go

@@ -397,6 +393,19 @@ func (db *DB) close() error {
 	return nil
 }

+func (db *DB) ensureFreelist() *freelist {


Can you explain more how this lazy load helps?

It appears that you're trading

- an eager read (that will happen eventually anyway?)
- a regular pointer load
- straightforward code

for

- a lazy read
- an atomic pointer load
- less obvious code

I'd like to see more detail about the benefits to understand why this is worth doing.

If you open a file only for reading, you never need to read the freelist at all.

The first commit in this series handles that already, no?

Sometimes you want a single code path for both read-only and read-write opens, simply because you don't know in advance whether you will write or not.

Anyway, I was never confident that the last commit would be accepted.
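The trade-off under discussion (a lazy read guarded by an atomic pointer load) can be illustrated with the sketch below. It is not the code from the patch: the `DB` and `freelist` types are stripped-down stand-ins, `readFreelist` and the `loads` counter are hypothetical, and `atomic.Pointer` assumes Go 1.19 or later, which postdates this PR.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// freelist is a stand-in for the real structure.
type freelist struct{ ids []uint64 }

// DB is a stand-in holding only what the lazy load needs.
type DB struct {
	mu       sync.Mutex
	freelist atomic.Pointer[freelist] // nil until the first write transaction
	loads    int                      // hypothetical counter, for demonstration only
}

// ensureFreelist returns the freelist, reading it on first use.
func (db *DB) ensureFreelist() *freelist {
	// Fast path: a single atomic load once the freelist exists.
	if f := db.freelist.Load(); f != nil {
		return f
	}
	// Slow path: take the lock and re-check so only one caller reads it.
	db.mu.Lock()
	defer db.mu.Unlock()
	if f := db.freelist.Load(); f != nil {
		return f
	}
	f := db.readFreelist()
	db.freelist.Store(f)
	return f
}

// readFreelist (hypothetical) stands in for reading the freelist page from disk.
func (db *DB) readFreelist() *freelist {
	db.loads++
	return &freelist{ids: []uint64{3, 5, 8}}
}

func main() {
	db := &DB{}
	fmt.Println("loads before any write:", db.loads)
	db.ensureFreelist()
	db.ensureFreelist()
	fmt.Println("loads after two writes:", db.loads)
}
```

A database that is opened and only ever read never reaches `readFreelist`, which is the benefit claimed for callers that do not know in advance whether they will write.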

josharian · 2016-12-21T06:11:30Z

To be clear, I'm reviewing this (and the other perf-related PRs) because I am not getting the performance I want out of boltdb, and I'm checking to see whether someone else has already fixed my problems. So I want boltdb to be faster...

DavidVorick · 2016-12-21T06:21:16Z

@funny-falcon Any sort of speedup PR should always have reproducible benchmarks demonstrating the advantages of the new code.

funny-falcon · 2016-12-21T06:31:37Z

@DavidVorick, I don't care whether this pull request is accepted or not.
If you want numbers, measure them yourself.
Though, if you find an error, I will fix it.

sizeof struct{} == 0

Hint: they are always sorted.

funny-falcon · 2016-12-21T16:37:15Z

Nevertheless, I've rebased the patch and simplified a couple of details.

aybabtme reviewed Aug 16, 2015

funny-falcon force-pushed the speedup_open_huge_file branch 2 times, most recently from 6436adc to 278ef9a Compare August 17, 2015 16:18

funny-falcon force-pushed the speedup_open_huge_file branch from 278ef9a to 2a05b06 Compare August 17, 2015 16:21

benbjohnson reviewed Aug 18, 2015

funny-falcon force-pushed the speedup_open_huge_file branch from 2a05b06 to 87b6edd Compare August 19, 2015 09:29

funny-falcon force-pushed the speedup_open_huge_file branch 5 times, most recently from 03fb180 to 5f5b4b1 Compare August 19, 2015 20:37

funny-falcon force-pushed the speedup_open_huge_file branch from 5f5b4b1 to ccd0542 Compare August 20, 2015 03:08

josharian reviewed Dec 20, 2016

funny-falcon added 3 commits December 21, 2016 18:29

do not read freelist if database opened readonly

82d5990

use map to struct{} cause it uses less memory

afa9310

sizeof struct{} == 0

sort pgids on file open only if they aren't sorted

7656291

Hint: they are always sorted.

funny-falcon force-pushed the speedup_open_huge_file branch from ccd0542 to e40ad77 Compare December 21, 2016 15:56

lazily read freelist on write transaction and check

ce500b5

funny-falcon force-pushed the speedup_open_huge_file branch from e40ad77 to ce500b5 Compare December 21, 2016 16:36

heyitsanthony mentioned this pull request Aug 11, 2017

do not read freelist if database opened readonly etcd-io/bbolt#19

Merged

benbjohnson closed this Apr 27, 2018
