Skip to content
This repository has been archived by the owner on Aug 2, 2021. It is now read-only.

[WIP] rushed - shed extension #1006

Closed
wants to merge 3 commits into from
Closed

[WIP] rushed - shed extension #1006

wants to merge 3 commits into from

Conversation

zelig
Copy link
Member

@zelig zelig commented Nov 20, 2018

experimental PR - untested - to be discussed

adds to the shed

  • multi-mode accessors (with update and access manipulating alternative fields/indexes)
  • continuous optimised batch writes (as is, adds zero performance gain)
  • continuous iterator support and subscriptions (untested)

adds localstore package with tentative db store using rushed

@zelig zelig added this to the 2018 Christmas Edition milestone Nov 20, 2018
@zelig zelig self-assigned this Nov 20, 2018
@zelig zelig requested review from janos and frncmx November 20, 2018 09:11
Copy link
Contributor

@frncmx frncmx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I'll still have to revisit DB.listen(). In textbook examples I did not see WaitGroup used this way... so I want to spend some more time on it.

)

// batch wraps leveldb batch extending it with a waitgroup and a done channel
type Batch struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't expose newBatch(). Do we need to expose Batch?
(or vice-versa, if we expose Batch maybe we should expose newBatch, too)

// DB extends shed DB with batch execution, garbage collection and iteration support with subscriptions
type DB struct {
*shed.DB // underlying shed.DB
update func(*Batch, Mode, *shed.IndexItem) error // mode-dependent update method
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on update/access I would be happy to see more comments :) At this point, as just getting familiar with the code, it's totally not clear what their purpose is

}

// New constructs a new DB
func New(sdb *shed.DB, update func(*Batch, Mode, *shed.IndexItem) error, access func(Mode, *shed.IndexItem) error) *DB {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe I would extract update and access to a private Type => it would keep the method signature clearer and their definition is already duplicated (see DB struct)

}
// call the update with the access mode
err := db.update(b, mode, item)
if err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I the only one who favors the "simple statement" format if?
I think it's clearer that the variable only needed for a limited scope and does not pollute the scope.
(For me it's also easier to understand as I immediately know the scope I have to consider is just a few lines.)
=> if err:= foo(); err != nil { <3

b := newBatch() // current batch
var done chan struct{} //
wasdone := make(chan struct{})
close(wasdone)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: I did not get it why the hack we close this channel immediately. 1st though was we missed a defer.

}

func benchmarkPut(b *testing.B, n int, in time.Duration, db putter) {
for i := 0; i < b.N; i++ {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • before this for loop I would generate the random chunks => so we don't benchmark the generate function, too
  • actually I would try to start only X number of Goroutines, where X should correlate to the number of CPUs (as we cannot run more in parallel anyway)
  • each Goroutine would get it's own slice and would try to iterate that as fast with put as possible

}
}

func benchmarkPut(b *testing.B, n int, in time.Duration, db putter) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in parameter not used

return t.index.Put(*(newItemFromChunk(ch)))
}
func BenchmarkPut(b *testing.B) {
n := 128
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe: n => numberOfChunks ?
It's not clear from here why 128 or why n. Those two together makes it hard to resolve the meaning.

n := 128
for j := 0; j < 5; j++ {
n *= 2
in := time.Nanosecond
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of in?

},
EncodeValue: func(fields shed.IndexItem) (value []byte, err error) {
b := make([]byte, 16)
binary.BigEndian.PutUint64(b[:8], uint64(fields.StoreTimestamp))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why we don't just implement the BinaryMarshaler interface?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't use Go interfaces enough. One of the key features of Go are interfaces and they make the code really flexible.

That goes as far as I think EncodeKey and DecodeKey should be binary marshal and unmarshal on an IndexKey and same goes for IndexValue. The IndexIteam could be a composition of the key and value types.

I believe that could result in a more idiomatic Go code and could make the code look immediately familiar to any Gopher.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is pore related how shed is designed, and I would like to see your thoughts on #993. There is a go example test where the usage is demonstrated.

The decision not to use interfaces for encode/decode functions is to avoid performance penalties when using them by introducing abstractions. Interfaces are a bit costly and allocate to heap. Encode/decode functions will be called all the time in swarm, as the storage is lower layer. This is the reasoning why IndexItem is implemented in most simple value semantics, instead interfaces. I tried to document this in comments in the PR #993, and would like to hear other opinions. If needed, I can do another demonstration, maybe in more detail then I did last week where I talked about this decisions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@janos
The code this complex I have to sit down and read, that's just how my brain works. So I'll go trough shed, too. Thanks for the demonstration offer though. :)

If you have some good articles in mind, our your own measurements about these performance benefits I would be happy to see/read. If not, I'll google. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The most simple demonstration how interfaces are complied, escaped or inlined is here golang/go#20116 (comment). But it just boils down to how interfaces in implemented in go and how compiler decides what to leave to runtime.

If you are interested in implementing encode/decode functions with interfaces, I would be glad to see benchmarks and how the example will look like with them as we would need additional types in example/usage instead anonymous functions. Performance was the main question after the first demonstration and for the right reason, as this part of the code is the hot path.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe we can just leave the performance considerations aside as they may be neglectable compared to database operations. :)

Copy link
Contributor

@frncmx frncmx Nov 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this will be a good start for me: golang/go#20116 :)
GitHub just did not show your comments without reload.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I guess as to compiler improves some of these penalties will disappear:
golang/go#19361

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe we can just leave the performance considerations aside as they may be neglectable compared to database operations. :)

I guess that's also possible. Perf: never guess. :)

What I was thinking about: Are we optimizing prematurely?

Would it be better 1st write the code with interfaces then optimize?
Considerations:

  • Maybe the code ends up more ideal in look and we have to tune only a few places.
  • What about optimizing for change? (XP core value) Change is also a cost and imminent.
  • If we first go with interfaces, then what would be the blast radius (cost) of replacing them? Given that we try to expose/export only as much as required.

I would vote for interfaces, but I have limited experience here. So I have to dig deeper.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, this is the discussion that I was looking for. I would be glad to hear other opinions too. If we are up to use interfaces, then struct fields also can help in simplifying usage by applying declarative approach instead of the current functional one.

swarm/shed/rushed/subscribe.go Show resolved Hide resolved
@zelig
Copy link
Member Author

zelig commented Nov 26, 2018

obsolete as unnecessary based on #1014 (comment)

@zelig zelig closed this Nov 26, 2018
@zelig zelig mentioned this pull request Nov 29, 2018
26 tasks
@frncmx frncmx deleted the rushed branch December 5, 2018 12:17
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants