storage: more. #279

Merged: 14 commits merged into master from more-storage-implementations-and-benchmarks on Nov 11, 2021

Conversation

warpfork (Collaborator)

This PR contains a lot.

  • many more storage implementations.
    • some of them being additional adapters to older APIs, increasing our coverage of what can be bridged easily;
    • one of them, a new filesystem-backed storage implementation...
      • ... which is gunning for exact on-disk compatibility with the layout used by the older go-ds-flatfs, while beating the pants off it on benchmarks (in part thanks to, yes, the new interfaces themselves, I reckon).
  • even more docs, now focusing more and more on what you should use as an end user (and how to consider migrating).
    • more of this to come (and not just for storage APIs).
  • benchmark functions, good to apply to any storage interface implementation (a rough sketch of their shape follows this list).
  • actual benchmark suites, fully wired up to a selection of implementations, runnable -- exercising both old and new storage implementations in comparable ways.
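
For a sense of the shape of those benchmark helpers, here's a minimal sketch. It assumes a storage interface shaped roughly like the new APIs (plain string keys, plain []byte values); the `Storage` interface, the `scale` parameter, and the key generation below are simplifications for illustration, not the exact code in this PR.

```go
package storagebench

import (
	"context"
	"fmt"
	"testing"
)

// Storage is an illustrative stand-in, shaped roughly like the new
// storage APIs: plain string keys, plain []byte values.
type Storage interface {
	Put(ctx context.Context, key string, content []byte) error
	Get(ctx context.Context, key string) ([]byte, error)
}

// BenchPutGet is the kind of helper meant here: a benchmark body that can
// be pointed at any implementation. It writes and reads back keys drawn
// from a keyspace of `scale` distinct entries, so that directory fan-out
// (sharding) effects show up in the numbers.
func BenchPutGet(b *testing.B, store Storage, scale int) {
	ctx := context.Background()
	value := []byte("benchmark payload")
	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		key := fmt.Sprintf("key-%d", i%scale)
		if err := store.Put(ctx, key, value); err != nil {
			b.Fatal(err)
		}
		if _, err := store.Get(ctx, key); err != nil {
			b.Fatal(err)
		}
	}
}
```

Wiring a concrete implementation into a benchmark table is then one line per store and scale, e.g. `func BenchmarkMemstore_65536(b *testing.B) { BenchPutGet(b, newMemstore(), 65536) }` (with `newMemstore` standing in for whatever constructor the implementation provides).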

At the time of first push of this PR, not all of this is ready for review, but if you want to give it an early look, feel free.

There's still plenty of mileage to go on the benchmarks. They're primitive. However, they do produce interesting results already; and just having them is a major step in a useful direction. I might be adding more benchmarks to this PR before landing it. Or kicking that off to subsequent PRs. Not sure what's going to be the most productive way to roll forward yet.

My next step with this will be using these interfaces in a downstream project to make sure they cover all the bases, and make it possible for projects currently stuck using older generations of IPLD APIs to move entirely forward to go-ipld-prime with no additional kerfuffle beyond these new adapters introduced here. (I have a particular project in mind which needed to use both blockstore and blockservice, and I'd like to see that these adapters are enough to both let that project continue to use those, and use, say, selectors (which are only possible with ipld-prime) at the same time, with minimal setup.) If we can do that, it'll be pretty exciting.

warpfork force-pushed the more-storage-implementations-and-benchmarks branch from c8f9b0f to 04a511b on October 25, 2021 at 19:25
Review comments on storage/bsadapter/README.md (outdated, resolved) and storage/bsrvadapter/README.md (resolved).
rvagg (Member) commented Oct 26, 2021

> which is gunning for exact on-disk compatibility with the layout used by the older go-ds-flatfs

this concerns me a little, because you're going to have people who want to use existing ds-flatfs stores with this, and vice versa -- so compatibility should probably involve testing the edges of this, importing tests from over there, or doing some funky cross-impl testing

… benchmark helper functions; and actual benchmark tables, runnable, which compare several storage implementations.
warpfork force-pushed the more-storage-implementations-and-benchmarks branch from 04a511b to 67c65ae on October 30, 2021 at 07:26
Mostly deredundantizing, but made a few improvements as well.
…default configuration of go-ipfs.

Include links to where the relevant ipfs code can be found.
(It's not all in very obvious places, and not all in the go-ipfs repo.)

Extracted some of the b32 encoder to a package function...
but, although this does avoid some repeated initialization,
it was all CPU stuff.  Benchmarks indicate that it made no real
difference.  So it's mostly a stylistic change.
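
For a concrete picture of what that commit describes, here's a minimal sketch of the pattern, assuming nothing about the package's actual names or encoding parameters:

```go
package example

import "encoding/base32"

// Constructed once at package init instead of being rebuilt on every call.
// Since building an Encoding is cheap, pure-CPU work, hoisting it is mostly
// a stylistic win -- which matches what the benchmarks showed.
var keyEncoding = base32.StdEncoding.WithPadding(base32.NoPadding)

// encodeKey turns a raw binary key into a filesystem-safe filename component.
func encodeKey(raw string) string {
	return keyEncoding.EncodeToString([]byte(raw))
}
```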
warpfork (Collaborator, Author) commented Nov 8, 2021

Holistically agree that some cross-impl testing for ds-flatfs would be nice to have, but I might tag that as... "nice to have", for now, and punt it to a future PR. It's at least very, very close. If there's a detail to tune, it's in the escaping func or in the sharding func, both of which are user-definable callbacks -- and I don't mind changing the defaults either -- so we're not at risk of making any permanent mistakes by merging the current diff.

(Also, optimistically: I don't expect the very small amount of relevant code will be subject to much churn over time, so testing it and eyeballing the alignment once and then enshrining it with unit tests might be good enough.)

(Also also: while I want to advertise the ability to match ds-flatfs's strategy, I don't necessarily think "same as ds-flatfs's defaults" is the correct default sharding for us to keep here. I should save further comments on that until we publish more definitive benchmarks. But some preliminary data indicates there may be considerable performance improvements available at as few as 2^20 entries by using a minutely different sharding func.)
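
To make the "user-definable callbacks" point concrete, here's a rough sketch of what a sharding callback looks like, using flatfs's "next-to-last/2" strategy as the worked example. The type and function names here are illustrative; they're not necessarily the exact signatures used in this PR.

```go
package example

// ShardFunc maps an already-encoded (filesystem-safe) key to the
// subdirectory it should be stored under. Swapping this function is the
// knob being discussed: it changes the on-disk fan-out, not the data.
type ShardFunc func(encodedKey string) string

// NextToLast2 mirrors the layout flatfs calls "next-to-last/2": the shard
// directory is the two characters immediately preceding the last character
// of the encoded key, so "ABCDEF" shards into directory "DE".
// (Real flatfs also handles very short keys specially; glossed over here.)
func NextToLast2(encodedKey string) string {
	if len(encodedKey) < 3 {
		return encodedKey
	}
	return encodedKey[len(encodedKey)-3 : len(encodedKey)-1]
}
```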

One level with fmt.Sprintf is enough for what we need,
and is also easier to follow.
... which is actually important, not just stylistic: this also makes
the benchmarks much more correct, because we were previously
doing the cleanup work on the clock!  Now the cleanup is done by
the test system, off the clock.

This fixes results to actually be, um, correct.  *facepalm*.
The storage APIs now require this and caution about potential resource leakage otherwise.
warpfork (Collaborator, Author)

performance notes

Some observations about performance, since we do now have several storage implementations, one new one (!), and benchmarks across them.

N.b.: Performance wasn't my initial main goal in doing any of this work, but now that we're here, and I happen to have measured it a bit, I might as well post what information I've got.

(tl;dr: things definitely don't get worse if you use the new, simplified code paths; and they probably get noticeably better. Hooray!)

data

Without further ado, data:

memstore/scale=4096                                   10000      360.6 ns/op     259 B/op      3 allocs/op
memstore/scale=65536                                  10000      167.5 ns/op      24 B/op      3 allocs/op
memstore/scale=1048576                                10000      224.0 ns/op      24 B/op      3 allocs/op
dsadapter-flatfs-base32-defaultshard/scale=4096       10000    37564 ns/op      1602 B/op     37 allocs/op
dsadapter-flatfs-base32-defaultshard/scale=65536      10000    40059 ns/op      1601 B/op     37 allocs/op
dsadapter-flatfs-base32-defaultshard/scale=1048576    10000    39541 ns/op      1600 B/op     37 allocs/op
fsstore-base32-defaultshard/scale=4096                10000    27055 ns/op       752 B/op     22 allocs/op
fsstore-base32-defaultshard/scale=65536               10000    32405 ns/op       752 B/op     22 allocs/op
fsstore-base32-defaultshard/scale=1048576             10000    31921 ns/op       752 B/op     22 allocs/op

Note that if a system has a configurable "sync" mode (e.g. flatfs does), it's off. (Yes, I'm aware that that makes things slower. This benchmark is everyone running at their fastest.)

some interpretation

The wall clock time is significantly improved in the newly introduced fsstore package (which fits the new interfaces): the new time is between about 72% and 81% of the old times.

at scale=4096:    27055/37564 = .72023
at scale=65536:   32405/40059 = .80893
at scale=1048576: 31921/39541 = .80729

Though I haven't run the benchmarks at still higher scale yet, I expect this trend to continue.

The wall clock time improvements come from two sources:

  1. The system performs significantly fewer syscalls. This is due mostly to acting first and then checking errors (rather than checking preconditions, then trying the op, then checking errors anyway); see the sketch below.
  2. The system has considerably fewer memory alloc events, due mostly to the design of the new storage APIs being friendly to allocation avoidance. (Also, from ditching some of go-datastore.Key's behavior which had performance baggage.)

(The former is just from making a fresh implementation. The latter is a result that's entangled with and dependent on the new APIs.)
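
As a hedged illustration of the first point (this is not the fsstore code itself; names and error handling are simplified): the "check preconditions first" style pays for a stat or two on every operation, while the "act first, interpret errors after" style does only the open/write/close it actually needs in the common case.

```go
package example

import (
	"errors"
	"io/fs"
	"os"
	"path/filepath"
)

// putCheckFirst is the precondition-heavy style: stat the directory,
// stat the target, and only then do the write -- several syscalls of
// checking before any useful work happens.
func putCheckFirst(path string, data []byte) error {
	dir := filepath.Dir(path)
	if _, err := os.Stat(dir); err != nil { // syscall: stat parent
		if err := os.MkdirAll(dir, 0o755); err != nil {
			return err
		}
	}
	if _, err := os.Stat(path); err == nil { // syscall: stat target
		return nil // already present
	}
	return os.WriteFile(path, data, 0o644) // the actual work, at last
}

// putActFirst attempts the write immediately and interprets the error
// afterwards; in the common case (parent exists, key absent) it performs
// only the open/write/close it genuinely needs.
func putActFirst(path string, data []byte) error {
	f, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0o644)
	if errors.Is(err, fs.ErrExist) {
		return nil // already present: treat as success
	}
	if errors.Is(err, fs.ErrNotExist) {
		// Parent directory missing: create it, then retry once.
		if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
			return err
		}
		f, err = os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_EXCL, 0o644)
	}
	if err != nil {
		return err
	}
	defer f.Close()
	_, err = f.Write(data)
	return err
}
```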

I haven't made any attempt to measure which one dominates under which scenarios.
Presumably the syscalls are the most significant, but both are sources of wins.

The reduction in memory alloc events is a nice bonus, even if it doesn't dominate end-to-end wallclock times,
because that's still that much less GC pressure being produced, which leaves CPU time for... anything else more useful.
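
And a hedged illustration of the second point (again, not the actual go-datastore or fsstore code, just the general shape): when every operation routes its key through a wrapper type that normalizes on construction, you pay at least one allocation per call, while an API that accepts a plain string can pass it straight through to, say, a map lookup with no allocations at all.

```go
package example

import "strings"

// legacyKey mimics the general shape of a key wrapper type that normalizes
// its input on construction -- which allocates, per call, even on a pure
// read path.
type legacyKey struct{ s string }

func newLegacyKey(raw string) legacyKey {
	// Normalization like this (ensure leading slash, fold case) builds a
	// fresh string on most inputs.
	return legacyKey{s: "/" + strings.ToUpper(strings.TrimPrefix(raw, "/"))}
}

// hasViaWrapper pays the wrapper-construction allocation on every lookup.
func hasViaWrapper(index map[string]struct{}, raw string) bool {
	k := newLegacyKey(raw)
	_, ok := index[k.s]
	return ok
}

// hasDirect is the shape the new APIs permit: the caller's string is used
// as-is, so the lookup itself can be allocation-free.
func hasDirect(index map[string]struct{}, key string) bool {
	_, ok := index[key]
	return ok
}
```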

environment

A few remarks about my system: it's a laptop toodling along at about 3GHz (I doubt this matters much),
the drive I ran these quick benchmarks on is an NVMe drive (so, okay, that's... rather fast and probably not what a bulk storage host is gonna be using),
and the filesystem the work was done on is ext4, on Linux kernel 5.11.0.

Some of these performance numbers will surely vary (probably significantly) on other hardware, and may vary with other filesystems.

Nonetheless, I'm expecting the general trend of performance improvements to be pretty portable
(decreasing the sheer syscall count, and thus the number of times we do a context switch to kernel space and back again, may vary in magnitude of impact in context... but is generally a noticeable win under any circumstances).

sharding variations

I also did some quick tests with even more sharding variations:

... but I'll bury them in a fold. They didn't seem particularly informative; I think they'd have to be tried at rather greater scale before becoming interesting.

(P.S. Outliers are also visible here. b.N=10k is clearly not enough.)

more data, with sharding variations
memstore/scale=4096                                   10000      360.6 ns/op     259 B/op      3 allocs/op
memstore/scale=65536                                  10000      167.5 ns/op      24 B/op      3 allocs/op
memstore/scale=1048576                                10000      224.0 ns/op      24 B/op      3 allocs/op
dsadapter-flatfs-base32-defaultshard/scale=4096       10000    37564 ns/op      1602 B/op     37 allocs/op
dsadapter-flatfs-base32-defaultshard/scale=65536      10000    40059 ns/op      1601 B/op     37 allocs/op
dsadapter-flatfs-base32-defaultshard/scale=1048576    10000    39541 ns/op      1600 B/op     37 allocs/op
dsadapter-flatfs-base32-shardntl3/scale=4096          10000    37183 ns/op      1610 B/op     37 allocs/op
dsadapter-flatfs-base32-shardntl3/scale=65536         10000    39545 ns/op      1609 B/op     37 allocs/op
dsadapter-flatfs-base32-shardntl3/scale=1048576       10000    52832 ns/op      1608 B/op     37 allocs/op
fsstore-base32-defaultshard/scale=4096                10000    27055 ns/op       752 B/op     22 allocs/op
fsstore-base32-defaultshard/scale=65536               10000    32405 ns/op       752 B/op     22 allocs/op
fsstore-base32-defaultshard/scale=1048576             10000    31921 ns/op       752 B/op     22 allocs/op
fsstore-base32-shardr13/scale=4096                    10000    25409 ns/op       774 B/op     22 allocs/op
fsstore-base32-shardr13/scale=65536                   10000    26496 ns/op       752 B/op     22 allocs/op
fsstore-base32-shardr13/scale=1048576                 10000    32547 ns/op       752 B/op     22 allocs/op
fsstore-base32-shardr122/scale=4096                   10000    26377 ns/op       790 B/op     22 allocs/op
fsstore-base32-shardr122/scale=65536                  10000    26882 ns/op       790 B/op     22 allocs/op
fsstore-base32-shardr122/scale=1048576                10000    30016 ns/op       752 B/op     22 allocs/op
fsstore-base32-shardr133/scale=4096                   10000    51031 ns/op       806 B/op     22 allocs/op
fsstore-base32-shardr133/scale=65536                  10000    26299 ns/op       806 B/op     22 allocs/op
fsstore-base32-shardr133/scale=1048576                10000    30659 ns/op       806 B/op     22 allocs/op

future work

In the benchmarking itself: Plenty of future work. These benchmarks should be run at much higher scale and with many more variations.

Here are some things I haven't tested yet and am not commenting on, but would love to see tested:

  • simply larger b.N, or collections from multiple runs, to iron out noise
  • larger datasets in entry count
  • datasets with larger and varied value sizes
  • more randomized keys, and performance under various collision levels
  • different kinds of hardware

These will probably end up "PRs welcome" territory. If someone wants to take it on, please, be welcome. (I might circle back to it, too -- but for today, and for the purposes of merging what we've got here: I'm happy with "not worse", and we do appear to be beating that bar.)

In the performance of the code examined here: more future work is probably also possible. Likely even of the "low-hanging fruit" kind! (After getting these "already better" results, I've done zero further chasing. That means we're still looking at the results of my very first pass on some of this stuff, so it's very likely further improvements are possible.)

warpfork (Collaborator, Author)

I'm going to motion to merge this shortly -- it seems to have survived its burn-in period so far. I'd like to include it in the next release we tag, which I believe will be shortly (it's due time for a lot of reasons; I'm going to do the changelog backfill tonight and I think it will already be quite sizable).

(This is not the end! We may still also continue refining it, even after including it in a tagged release. Remember, we're pulling this off with no breaking changes so far. Existing code is mostly written to use linking.Storage*Opener function types, and those still work too -- so just because a tag goes out with these APIs won't mean tons of code is forced to switch to them immediately, and that in turn means these APIs won't be instantly frozen by tons of code immediately depending on them, either.)

…ng parameters that are in fact "default" when comparing these.
…an io.ReadCloser instead of just io.Reader.

This seems more likely to encourage good code that doesn't leak
resources than just admonishing people via a comment to check.

On the downside, sometimes this requires implementations to wrap stuff
in no-op closers.  The trade is probably worth it.

The code in LinkSystem which uses these things still doesn't change.
It's still using the `linking.Block*Opener` functions, which still
return just io.Reader.  And we're considerably less free to change
those, because there are already quite a few implementations of those
types in the wild, and we don't want to break them.
(I think our policy is roughly "support the last two significant go
versions", and by that, we might technically be allowed to use this,
but, eh; there's minimal cost to being conservative today.)
warpfork (Collaborator, Author)

Merging without rebase -- some of these commit hashes are already referenced by other repos, so it's preferable to preserve them.

warpfork merged commit b161ca7 into master on Nov 11, 2021
warpfork deleted the more-storage-implementations-and-benchmarks branch on November 11, 2021 at 19:44
aschmahmann mentioned this pull request on Dec 1, 2021