refactor(gw): faster dir listing - fetch child sizes in parallel #8888

Conversation

schomatis (Contributor)

Broken and untested. Just to gather early feedback.

@lidel This is a variant of your optimization: it fetches child metadata in parallel to avoid the sequential stall. I still need to work on it some more (document, test, encapsulate), but I wanted to check whether (a) this is a feature you're interested in and (b) this is an additional level of complexity you're comfortable with in this part of the code.
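
For illustration only, here is a generic sketch of that idea in Go (this is not the code in this PR; getSize and the entries slice are assumptions for the example): the sequential per-entry loop becomes a set of goroutines, so one slow child no longer stalls the others.

package example

import (
	"context"
	"sync"
)

// fetchSizes is a generic "fetch child metadata in parallel" sketch, not
// the PR's implementation. A real version would bound concurrency and
// apply per-entry timeouts.
func fetchSizes(ctx context.Context, entries []string,
	getSize func(context.Context, string) (int64, error)) map[string]int64 {

	var (
		mu    sync.Mutex
		wg    sync.WaitGroup
		sizes = make(map[string]int64, len(entries))
	)
	for _, name := range entries {
		name := name // capture loop variable for the goroutine
		wg.Add(1)
		go func() {
			defer wg.Done()
			if s, err := getSize(ctx, name); err == nil {
				mu.Lock()
				sizes[name] = s
				mu.Unlock()
			} // entries that error are simply left without a size
		}()
	}
	wg.Wait()
	return sizes
}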

@schomatis schomatis requested a review from lidel April 15, 2022 00:13
@schomatis schomatis self-assigned this Apr 15, 2022
@lidel lidel (Member) left a comment

Sounds like a good idea, but let's wait for #8853 (review) to land first (and decide on default threshold there).

To avoid having too much on our plate, I'm marking this for go-ipfs 0.14.

Comment on lines +194 to +196
// FIXME: Check above. The UnixFS files we're iterating
// (because we use the UnixFS API) should always support
// this.
Member

IIRC we need to keep this check because we've seen dag-pb directories that link to dag-cbor or dag-json CIDv1 children.
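
A minimal illustration of why such a check is needed (this is not the gateway code; unixFSCandidates and the links slice are made up for the example): a dag-pb directory can link to children whose codec is not dag-pb or raw, so UnixFS-specific handling has to be guarded rather than assumed.

package example

import (
	"github.com/ipfs/go-cid"
	ipld "github.com/ipfs/go-ipld-format"
)

// unixFSCandidates keeps only links that can plausibly point at UnixFS
// nodes; dag-cbor or dag-json children (CIDv1) are filtered out instead
// of being assumed to support UnixFS metadata.
func unixFSCandidates(links []*ipld.Link) []*ipld.Link {
	var out []*ipld.Link
	for _, lnk := range links {
		switch lnk.Cid.Prefix().Codec {
		case cid.DagProtobuf, cid.Raw:
			out = append(out, lnk)
		default:
			// e.g. dag-cbor, dag-json: skip UnixFS-specific handling
		}
	}
	return out
}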

@lidel lidel changed the title from "feat(gw): fetch metadata in parallel up to FastDirIndexThreshold" to "refactor(gw): faster dir listing - fetch child sizes in parallel" on Apr 15, 2022
@schomatis (Contributor, Author)

@lidel Your call, but note that this PR wasn't just about another performance optimization but about addressing the core of the issue (at least as I understand it). We probably didn't groom Steven's original issue description as much as we should have, and the principal objective is still a bit unclear to me; my takeaway was @alanshaw's concern about the GW timing out:

[users] are getting confused when trying to view [big directories just uploaded] on the gateway - it times out and they assume their upload did not complete successfully.

The core of the timeout is fetching directory entry metadata sequentially and without a per-entry timeout, which is still what we do in #8853 (there the threshold only skips the fetching entirely when there are too many entries). Note that this timeout will still happen even with a few entries if just one of them is missing:

mkdir big-dir
touch big-dir/file{1..5}
echo "different file" > big-dir/file1
BIG_DIR=$(ipfs add big-dir -r -Q)
echo "http://localhost:8080/ipfs/$BIG_DIR"
# http://localhost:8080/ipfs/QmYf9xX1TrQA3vxoQRkYWQT9ugWsdg7fihMg1w7ZLg2UiN

ipfs ls QmYf9xX1TrQA3vxoQRkYWQT9ugWsdg7fihMg1w7ZLg2UiN
# QmRij737cvMJxHMPM6foyPYfVrcWnZsiVF6yvDZknNiG1j 15 file1       <<<< Remove this entry
# QmbFMke1KXqnYyBBWxB74N4c5SBnJMVAiMNRcGu6x1AwQH 0  file2
# QmbFMke1KXqnYyBBWxB74N4c5SBnJMVAiMNRcGu6x1AwQH 0  file3
# QmbFMke1KXqnYyBBWxB74N4c5SBnJMVAiMNRcGu6x1AwQH 0  file4
# QmbFMke1KXqnYyBBWxB74N4c5SBnJMVAiMNRcGu6x1AwQH 0  file5

# Remove `file1`: QmRij737cvMJxHMPM6foyPYfVrcWnZsiVF6yvDZknNiG1j
rm ~/.ipfs/blocks/PX/CIQDEOWO6TJANTWQUC2DBXZB7K4XA5CPPYMIOGAN2HH3MGWIM3FUPXQ.data
ipfs ls QmRij737cvMJxHMPM6foyPYfVrcWnZsiVF6yvDZknNiG1j # not found

Browsing http://localhost:8080/ipfs/QmYf9xX1TrQA3vxoQRkYWQT9ugWsdg7fihMg1w7ZLg2UiN, even with only a handful of entries, will still time out on the first one that is not available.

The objective of this PR (not fully expanded on before, sorry) is to attempt to fetch metadata for any FastDirIndexThreshold entries in the directory, even if some of them are slow or missing, without tying ourselves to only the first FastDirIndexThreshold entries in the list.
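
Roughly, the intended behaviour looks like the sketch below (assumed names throughout, not the PR's code): each entry gets its own timeout, results are collected as they arrive, and we stop once the threshold number of sizes have resolved, so a single missing block cannot stall the whole listing.

package example

import (
	"context"
	"time"
)

// bestEffortSizes is a rough sketch of the intended behaviour, not the
// PR's code. Every entry is fetched concurrently with its own timeout and
// we return as soon as `threshold` (e.g. FastDirIndexThreshold) sizes have
// resolved, so one missing block cannot stall the whole directory listing.
func bestEffortSizes(ctx context.Context, names []string, threshold int,
	getSize func(context.Context, string) (int64, error)) map[string]int64 {

	const perEntryTimeout = 5 * time.Second // assumed per-entry budget

	type result struct {
		name string
		size int64
		err  error
	}
	// Buffered so late goroutines never block once we stop reading.
	results := make(chan result, len(names))

	for _, name := range names {
		name := name
		go func() {
			cctx, cancel := context.WithTimeout(ctx, perEntryTimeout)
			defer cancel()
			s, err := getSize(cctx, name)
			results <- result{name, s, err}
		}()
	}

	sizes := make(map[string]int64)
	for range names {
		r := <-results
		if r.err != nil {
			continue // slow or missing entry: skip instead of stalling
		}
		sizes[r.name] = r.size
		if len(sizes) >= threshold {
			break // enough entries resolved for the index page
		}
	}
	return sizes
}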

lidel (Member) commented Apr 19, 2022

Quick triage notes:

Two separate problems

In my mind the core issue reported was the unmovable reality of big directories requiring tens of thousands of child blocks to be fetched: even when all blocks are reachable, and even if we did the parallel thing from this PR, a big enough dir would take ages and could hit a timeout on a caching/reverse proxy or on the client. That is being addressed in #8853 by skipping the child fetch entirely for directories above some threshold.

This PR is about an additional improvement that can land separately:

This PR: parallel child size fetch

The objective of this PR (not fully expanded before, sorry) is to attempt to fetch any FastDirIndexThreshold entries metadata in the directory even if there are some slow/missing ones

@schomatis is the idea to do the listing in a "best-effort" fashion: fetch things in parallel and display "?" next to child nodes that errored or timed out? Or just to parallelize the slow fetches? If it is the latter, perhaps we could simplify this code with simpler batching using GetMany from DAGService?
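
For reference, a hedged sketch of what the GetMany-based batching could look like (dagService and childCIDs are assumed to come from the surrounding gateway code; GetMany streams nodes back as they arrive, in no particular order):

package example

import (
	"context"

	"github.com/ipfs/go-cid"
	ipld "github.com/ipfs/go-ipld-format"
)

// childSizes batches the child fetch through DAGService.GetMany instead of
// issuing one Get per entry. Results arrive as blocks become available, so
// already-cached children are not held up behind slow ones. Best effort:
// children that fail to fetch are skipped.
func childSizes(ctx context.Context, dagService ipld.DAGService, childCIDs []cid.Cid) map[cid.Cid]uint64 {
	sizes := make(map[cid.Cid]uint64, len(childCIDs))
	for opt := range dagService.GetMany(ctx, childCIDs) {
		if opt.Err != nil {
			continue
		}
		if s, err := opt.Node.Size(); err == nil {
			sizes[opt.Node.Cid()] = s
		}
	}
	return sizes
}

If a "?" placeholder next to failed children were wanted, the error branch would record the CID instead of skipping it.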

@schomatis (Contributor, Author)

unmovable reality of big directories fetching tens of thousands of child blocks: even when all blocks are reachable,

If this is the original issue then this PR would add more complexity than value here. Closing then.

@schomatis schomatis closed this Apr 19, 2022
@schomatis schomatis deleted the schomatis/feat/gw/parallel-metadata-fetch branch April 19, 2022 14:37