-
Notifications
You must be signed in to change notification settings - Fork 110
[WIP] rushed - shed extension #1006
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I'll still have to revisit DB.listen()
. In textbook examples I did not see WaitGroup
used this way... so I want to spend some more time on it.
) | ||
|
||
// batch wraps leveldb batch extending it with a waitgroup and a done channel | ||
type Batch struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't expose newBatch()
. Do we need to expose Batch?
(or vice-versa, if we expose Batch
maybe we should expose newBatch
, too)
// DB extends shed DB with batch execution, garbage collection and iteration support with subscriptions | ||
type DB struct { | ||
*shed.DB // underlying shed.DB | ||
update func(*Batch, Mode, *shed.IndexItem) error // mode-dependent update method |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on update
/access
I would be happy to see more comments :) At this point, as just getting familiar with the code, it's totally not clear what their purpose is
} | ||
|
||
// New constructs a new DB | ||
func New(sdb *shed.DB, update func(*Batch, Mode, *shed.IndexItem) error, access func(Mode, *shed.IndexItem) error) *DB { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe I would extract update and access to a private Type => it would keep the method signature clearer and their definition is already duplicated (see DB struct)
} | ||
// call the update with the access mode | ||
err := db.update(b, mode, item) | ||
if err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I the only one who favors the "simple statement" format if
?
I think it's clearer that the variable only needed for a limited scope and does not pollute the scope.
(For me it's also easier to understand as I immediately know the scope I have to consider is just a few lines.)
=> if err:= foo(); err != nil {
<3
swarm/shed/rushed/db.go
Outdated
b := newBatch() // current batch | ||
var done chan struct{} // | ||
wasdone := make(chan struct{}) | ||
close(wasdone) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: I did not get it why the hack we close this channel immediately. 1st though was we missed a defer
.
} | ||
|
||
func benchmarkPut(b *testing.B, n int, in time.Duration, db putter) { | ||
for i := 0; i < b.N; i++ { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- before this for loop I would generate the random chunks => so we don't benchmark the generate function, too
- actually I would try to start only X number of Goroutines, where X should correlate to the number of CPUs (as we cannot run more in parallel anyway)
- each Goroutine would get it's own slice and would try to iterate that as fast with put as possible
swarm/shed/rushed/db_test.go
Outdated
} | ||
} | ||
|
||
func benchmarkPut(b *testing.B, n int, in time.Duration, db putter) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in
parameter not used
swarm/shed/rushed/db_test.go
Outdated
return t.index.Put(*(newItemFromChunk(ch))) | ||
} | ||
func BenchmarkPut(b *testing.B) { | ||
n := 128 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe: n => numberOfChunks ?
It's not clear from here why 128 or why n. Those two together makes it hard to resolve the meaning.
swarm/shed/rushed/db_test.go
Outdated
n := 128 | ||
for j := 0; j < 5; j++ { | ||
n *= 2 | ||
in := time.Nanosecond |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of in
?
}, | ||
EncodeValue: func(fields shed.IndexItem) (value []byte, err error) { | ||
b := make([]byte, 16) | ||
binary.BigEndian.PutUint64(b[:8], uint64(fields.StoreTimestamp)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why we don't just implement the BinaryMarshaler
interface?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't use Go interfaces enough. One of the key features of Go are interfaces and they make the code really flexible.
That goes as far as I think EncodeKey and DecodeKey should be binary marshal and unmarshal on an IndexKey and same goes for IndexValue. The IndexIteam could be a composition of the key and value types.
I believe that could result in a more idiomatic Go code and could make the code look immediately familiar to any Gopher.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is pore related how shed
is designed, and I would like to see your thoughts on #993. There is a go example test where the usage is demonstrated.
The decision not to use interfaces for encode/decode functions is to avoid performance penalties when using them by introducing abstractions. Interfaces are a bit costly and allocate to heap. Encode/decode functions will be called all the time in swarm, as the storage is lower layer. This is the reasoning why IndexItem is implemented in most simple value semantics, instead interfaces. I tried to document this in comments in the PR #993, and would like to hear other opinions. If needed, I can do another demonstration, maybe in more detail then I did last week where I talked about this decisions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@janos
The code this complex I have to sit down and read, that's just how my brain works. So I'll go trough shed, too. Thanks for the demonstration offer though. :)
If you have some good articles in mind, our your own measurements about these performance benefits I would be happy to see/read. If not, I'll google. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The most simple demonstration how interfaces are complied, escaped or inlined is here golang/go#20116 (comment). But it just boils down to how interfaces in implemented in go and how compiler decides what to leave to runtime.
If you are interested in implementing encode/decode functions with interfaces, I would be glad to see benchmarks and how the example will look like with them as we would need additional types in example/usage instead anonymous functions. Performance was the main question after the first demonstration and for the right reason, as this part of the code is the hot path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe we can just leave the performance considerations aside as they may be neglectable compared to database operations. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this will be a good start for me: golang/go#20116 :)
GitHub just did not show your comments without reload.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I guess as to compiler improves some of these penalties will disappear:
golang/go#19361
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or maybe we can just leave the performance considerations aside as they may be neglectable compared to database operations. :)
I guess that's also possible. Perf: never guess. :)
What I was thinking about: Are we optimizing prematurely?
Would it be better 1st write the code with interfaces then optimize?
Considerations:
- Maybe the code ends up more ideal in look and we have to tune only a few places.
- What about optimizing for change? (XP core value) Change is also a cost and imminent.
- If we first go with interfaces, then what would be the blast radius (cost) of replacing them? Given that we try to expose/export only as much as required.
I would vote for interfaces, but I have limited experience here. So I have to dig deeper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, this is the discussion that I was looking for. I would be glad to hear other opinions too. If we are up to use interfaces, then struct fields also can help in simplifying usage by applying declarative approach instead of the current functional one.
obsolete as unnecessary based on #1014 (comment) |
experimental PR - untested - to be discussed
adds to the shed
adds localstore package with tentative db store using rushed