feat(experiment): add a parallel HAMT traversal function #103
// parallelShardWalk walks the HAMT concurrently processing callbacks upon encountering leaf nodes
func parallelShardWalk(ctx context.Context, root *Node, processShardValues func(k string, val *cbg.Deferred) error) error {
	const concurrency = 16 // TODO: should be an option, also this number was basically made up with a bit of empirical testing/usage
It should be easy enough to make this an arg to ForEachParallel, yeah? Doing so would match the behavior of ParallelDiff.
We'd make use of this logic in lily when loading things such as actor states (as you are already doing in filexp), miner sectors, miner precommits, and any other information stored in HAMTs that we need to load all at once. Also interested in having similar logic implemented for go-amt-ipld, which could be used for loading miner sectors from their respective partitions.
seems fine to me, if it gets cleaned up and some basic tests added. https://github.com/ipfs/go-unixfs/blob/323bb63cafa93c5bdd10170f35b96327760d7c1a/hamt/hamt.go#L478 is kind of like https://github.com/ipfs/go-merkledag/blob/5c067b1958df79db6a28b57d48f16f94de1f7fd8/merkledag.go#L455 in terms of interface, so maybe there are some commonalities to be extracted from there if we're going for consistency
Great! Any thoughts on what to do about a multiple block requesting interface here? My main concern is that it seems wasteful/excessive to do one goroutine (or channel) per block, so I'd like to allow some sort of grouping. It'll probably have to be a bit custom to deal with the IpldStore semantics expected here, but overall the 3 mechanisms I've seen so far are:
If there are going to be custom IpldStore semantics, should I just add an interface declaration here for now, or upstream it to go-ipld-cbor and depend on that? If we're ok with this breaking a bit/being less stable than the rest of the package, I can just choose one to experiment with, define the interface here, and we can make breaking changes as needed. WDYT?
Yep, the go-merkledag code was copy-paste-modified into go-unixfs and from go-unixfs to here. The go-unixfs case is a little closer because it also has cached deserialized data, but yeah, same idea.
This is an implementation of a parallel traversal of the HAMT. It's mostly copied from https://github.com/ipfs/go-unixfs/blob/323bb63cafa93c5bdd10170f35b96327760d7c1a/hamt/hamt.go#L402.
There are a few pieces of this that are obviously wrong or non-optimal (e.g. the GetMany interface, lack of options for concurrency, not caching pointers, etc.). However, this seems like a good starting point for a discussion on whether this PR is even wanted/acceptable for this repo.
While I don't expect lotus will make use of this function because blockchains tend to avoid parallel things, it could be useful in processing. I made some use of it recently and it was a big performance help.
cc @rvagg @frrist @ribasushi