refactor(gw): faster dir listing - fetch child sizes in parallel #8888

Conversation

schomatis (Contributor)

Broken and untested. Just to gather early feedback.

@lidel This is a variant of your optimization: it fetches child metadata in parallel to avoid the sequential stall. I still need to work on it some more (document, test, encapsulate), but I wanted to check whether (a) this is a feature you're interested in and (b) this is an additional level of complexity you're comfortable with in this part of the code.
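
For illustration only, here is a generic sketch of that idea in Go (this is not the code in this PR; getSize and the entries slice are assumptions for the example): the sequential per-entry loop becomes a set of goroutines, so one slow child no longer stalls the others.

package example

import (
	"context"
	"sync"
)

// fetchSizes is a generic "fetch child metadata in parallel" sketch, not
// the PR's implementation. A real version would bound concurrency and
// apply per-entry timeouts.
func fetchSizes(ctx context.Context, entries []string,
	getSize func(context.Context, string) (int64, error)) map[string]int64 {

	var (
		mu    sync.Mutex
		wg    sync.WaitGroup
		sizes = make(map[string]int64, len(entries))
	)
	for _, name := range entries {
		name := name // capture loop variable for the goroutine
		wg.Add(1)
		go func() {
			defer wg.Done()
			if s, err := getSize(ctx, name); err == nil {
				mu.Lock()
				sizes[name] = s
				mu.Unlock()
			} // entries that error are simply left without a size
		}()
	}
	wg.Wait()
	return sizes
}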

@schomatis schomatis requested a review from lidel April 15, 2022 00:13
@schomatis schomatis self-assigned this Apr 15, 2022
@lidel lidel (Member) left a comment

Sounds like a good idea, but let's wait for #8853 (review) to land first (and decide on default threshold there).

To avoid having too much on our plate, I'm marking this for go-ipfs 0.14.

Comment on lines +194 to +196
// FIXME: Check above. The UnixFS files we're iterating
// (because we use the UnixFS API) should always support
// this.
Member

IIRC we need to keep this check because we've seen dag-pb directories that link to dag-cbor or dag-json CIDv1 children.
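
A minimal illustration of why such a check is needed (this is not the gateway code; unixFSCandidates and the links slice are made up for the example): a dag-pb directory can link to children whose codec is not dag-pb or raw, so UnixFS-specific handling has to be guarded rather than assumed.

package example

import (
	"github.com/ipfs/go-cid"
	ipld "github.com/ipfs/go-ipld-format"
)

// unixFSCandidates keeps only links that can plausibly point at UnixFS
// nodes; dag-cbor or dag-json children (CIDv1) are filtered out instead
// of being assumed to support UnixFS metadata.
func unixFSCandidates(links []*ipld.Link) []*ipld.Link {
	var out []*ipld.Link
	for _, lnk := range links {
		switch lnk.Cid.Prefix().Codec {
		case cid.DagProtobuf, cid.Raw:
			out = append(out, lnk)
		default:
			// e.g. dag-cbor, dag-json: skip UnixFS-specific handling
		}
	}
	return out
}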

@lidel lidel changed the title from "feat(gw): fetch metadata in parallel up to FastDirIndexThreshold" to "refactor(gw): faster dir listing - fetch child sizes in parallel" on Apr 15, 2022
@schomatis (Contributor, Author)

@lidel Your call, but note that this PR wasn't just about another performance optimization but about addressing the core of the issue (at least as I understand it). We probably didn't groom Steven's original issue description as much as we should have, and the principal objective is still a bit unclear to me; my takeaway was @alanshaw's concern about the GW timing out:

[users] are getting confused when trying to view [big directories just uploaded] on the gateway - it times out and they assume their upload did not complete successfully.

The core of the timeout is fetching directory entry metadata sequentially and without a per-entry timeout, which is still what we do in #8853 (there the threshold only skips the fetching entirely when there are too many entries). Note that this timeout will still happen even with a few entries if just one of them is missing:

mkdir big-dir
touch big-dir/file{1..5}
echo "different file" > big-dir/file1
BIG_DIR=$(ipfs add big-dir -r -Q)
echo "http://localhost:8080/ipfs/$BIG_DIR"
# http://localhost:8080/ipfs/QmYf9xX1TrQA3vxoQRkYWQT9ugWsdg7fihMg1w7ZLg2UiN

ipfs ls QmYf9xX1TrQA3vxoQRkYWQT9ugWsdg7fihMg1w7ZLg2UiN
# QmRij737cvMJxHMPM6foyPYfVrcWnZsiVF6yvDZknNiG1j 15 file1       <<<< Remove this entry
# QmbFMke1KXqnYyBBWxB74N4c5SBnJMVAiMNRcGu6x1AwQH 0  file2
# QmbFMke1KXqnYyBBWxB74N4c5SBnJMVAiMNRcGu6x1AwQH 0  file3
# QmbFMke1KXqnYyBBWxB74N4c5SBnJMVAiMNRcGu6x1AwQH 0  file4
# QmbFMke1KXqnYyBBWxB74N4c5SBnJMVAiMNRcGu6x1AwQH 0  file5

# Remove `file1`: QmRij737cvMJxHMPM6foyPYfVrcWnZsiVF6yvDZknNiG1j
rm ~/.ipfs/blocks/PX/CIQDEOWO6TJANTWQUC2DBXZB7K4XA5CPPYMIOGAN2HH3MGWIM3FUPXQ.data
ipfs ls QmRij737cvMJxHMPM6foyPYfVrcWnZsiVF6yvDZknNiG1j # not found

Browsing http://localhost:8080/ipfs/QmYf9xX1TrQA3vxoQRkYWQT9ugWsdg7fihMg1w7ZLg2UiN, even with only a handful of entries, will still time out on the first one that is not available.

The objective of this PR (not fully expanded on before, sorry) is to attempt to fetch metadata for any FastDirIndexThreshold entries in the directory, even if some of them are slow or missing, without tying ourselves to only the first FastDirIndexThreshold entries in the list.
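
Roughly, the intended behaviour looks like the sketch below (assumed names throughout, not the PR's code): each entry gets its own timeout, results are collected as they arrive, and we stop once the threshold number of sizes have resolved, so a single missing block cannot stall the whole listing.

package example

import (
	"context"
	"time"
)

// bestEffortSizes is a rough sketch of the intended behaviour, not the
// PR's code. Every entry is fetched concurrently with its own timeout and
// we return as soon as `threshold` (e.g. FastDirIndexThreshold) sizes have
// resolved, so one missing block cannot stall the whole directory listing.
func bestEffortSizes(ctx context.Context, names []string, threshold int,
	getSize func(context.Context, string) (int64, error)) map[string]int64 {

	const perEntryTimeout = 5 * time.Second // assumed per-entry budget

	type result struct {
		name string
		size int64
		err  error
	}
	// Buffered so late goroutines never block once we stop reading.
	results := make(chan result, len(names))

	for _, name := range names {
		name := name
		go func() {
			cctx, cancel := context.WithTimeout(ctx, perEntryTimeout)
			defer cancel()
			s, err := getSize(cctx, name)
			results <- result{name, s, err}
		}()
	}

	sizes := make(map[string]int64)
	for range names {
		r := <-results
		if r.err != nil {
			continue // slow or missing entry: skip instead of stalling
		}
		sizes[r.name] = r.size
		if len(sizes) >= threshold {
			break // enough entries resolved for the index page
		}
	}
	return sizes
}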

lidel (Member) commented Apr 19, 2022

Quick triage notes:

Two separate problems

In my mind the core issue reported was the unmovable reality of big directories requiring tens of thousands of child blocks to be fetched: even when all blocks are reachable, and even if we did the parallel thing from this PR, a big enough dir would take ages and could hit a timeout on a caching/reverse proxy or on the client. That is being addressed in #8853 by skipping the child fetch entirely for directories above some threshold.

This PR is about an additional improvement that can land separately:

This PR: parallel child size fetch

The objective of this PR (not fully expanded before, sorry) is to attempt to fetch any FastDirIndexThreshold entries metadata in the directory even if there are some slow/missing ones

@schomatis is the idea to do the listing in a "best-effort" fashion: fetch things in parallel and display "?" next to child nodes that errored or timed out? Or just to parallelize the slow fetches? If it is the latter, perhaps we could simplify this code with simpler batching using GetMany from DAGService?
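
For reference, a hedged sketch of what the GetMany-based batching could look like (dagService and childCIDs are assumed to come from the surrounding gateway code; GetMany streams nodes back as they arrive, in no particular order):

package example

import (
	"context"

	"github.com/ipfs/go-cid"
	ipld "github.com/ipfs/go-ipld-format"
)

// childSizes batches the child fetch through DAGService.GetMany instead of
// issuing one Get per entry. Results arrive as blocks become available, so
// already-cached children are not held up behind slow ones. Best effort:
// children that fail to fetch are skipped.
func childSizes(ctx context.Context, dagService ipld.DAGService, childCIDs []cid.Cid) map[cid.Cid]uint64 {
	sizes := make(map[cid.Cid]uint64, len(childCIDs))
	for opt := range dagService.GetMany(ctx, childCIDs) {
		if opt.Err != nil {
			continue
		}
		if s, err := opt.Node.Size(); err == nil {
			sizes[opt.Node.Cid()] = s
		}
	}
	return sizes
}

If a "?" placeholder next to failed children were wanted, the error branch would record the CID instead of skipping it.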

@schomatis (Contributor, Author)

unmovable reality of big directories fetching tens of thousands of child blocks: even when all blocks are reachable,

If this is the original issue then this PR would add more complexity than value here. Closing then.

@schomatis schomatis closed this Apr 19, 2022
@schomatis schomatis deleted the schomatis/feat/gw/parallel-metadata-fetch branch April 19, 2022 14:37