-
Notifications
You must be signed in to change notification settings - Fork 20
Conversation
// TODO: This only deals with repeated pathing | ||
// Do we want to deal parameters as well since they might mean walking a lot of data twice? Alternatively, this could wait until we can rework the GraphGateway logic to not rely on the BlocksGateway traversal. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling out that this will not handle the case where the data is fully local, the optimization only checks path locality in order to avoid doing the IPLD traversal twice.
We could certainly do it twice though in order to save bandwidth costs. This likely depends on how much data we think should go into the block cache and what's likely to be there.
Certainly we should consider doing this once we're further down the pipeline of changes (i.e. proper incremental verification), but could do it now too if that's of interest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aschmahmann probably too late for me to reason about traversal, do you have an example how "fully local" case would that work for loading sub-article from /ipfs/bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze/wiki
?
(fysa I've increased block cache to 16384, and we have still plenty of room – #47 (comment), so if we can save bandwidth by growing this cache, i think its worth exploring)
I think we need to pass params
to fetcher.Fetch
when fetching remainingPathToGet
.
Reason 1: without ?depth=
we will be harcore over-fetching the entire wikipedia
2023-04-05T01:13:01.065+0200 DEBUG caboose caboose@v0.0.0-20230331212405-5b42e9465c12/fetcher.go:75 doing fetch {"from": "64.44.166.5", "of": "/ipfs/bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze/wiki", "mime": "application/vnd.ipld.car", "requestId": "efb63fd6-12cd-4417-99bb-fd9106d0aa96"}
Reason 2: difficult to debug, as we are missing info if the failed request was ?format=car
or raw
(we can fix this by adding mime
to error logs in caboose, but i think if we have params, we should also have explicit format
one):
2023-04-05T01:13:03.040+0200 DEBUG caboose caboose@v0.0.0-20230331212405-5b42e9465c12/fetcher.go:122 fetch result {"from": "64.44.166.5", "of": "/ipfs/bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze/wiki", "status": 200, "size": 203846, "ttfb": 0, "duration": 1.975232799, "attempt": 0, "error": "context canceled"}
2023-04-05T01:13:03.040+0200 DEBUG caboose caboose@v0.0.0-20230331212405-5b42e9465c12/pool.go:505 fetch attempt failed {"from": "64.44.166.5", "attempt": 0, "of": "/ipfs/bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze/wiki", "error": "context canceled"}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah good catch that was a refactor bug., I meant it when I said I hadn't really tested yet 😅. It's a bug that the fetcher isn't using params. This comment was about whether the tryLoad
should also do params. Right now it doesn't, but it's pretty doable to add it back in as well if we need it. WDYT?
The former is a pretty trivial fix, the later is a little more time. Not sure how much I'll be able to do before IPFS Thing, but will update if I can.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct me if we aim at different things here, but in my mind the main value of tryLoad
is to avoid fetching parents over and over again, and move root CID at which we start Fetch to the right of the path.
With that framing, we don't need to pass the same params as the final Fetch
, we could always use bytes=0:0
or depth=1
here (no matter what actual params would be used for nial Fetch), no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tryLoad
is trying to reduce the amount of data we request that we already have. It's most notable/obvious for paths, but is true for things like when an entire file is stored in memory already.
Doing parents only (i.e. depth=0) happens to be easier than also dealing with the terminal element/params (e.g. byte range, depth=1, depth=all). Seems like we should deal with the others too though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the most important optimization is to skip re-fetching parents we already have (biggest pain for websites and ranges over videos).
Could we land that first, and deal with the terminal element later, or is that the same amount of work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope parents is easier (it's already done). Will push the fixes shortly for the bugs you've found
if !errors.Is(err, format.ErrNotFound{}) { | ||
graphLog.Error(err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
opening http://en.wikipedia-on-ipfs.org.ipns.localhost:8081/wiki/Belize/ produces:
backend/graph lib/graph_gateway.go:343 adl requested but not supported by link system: "unixfs"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh ... lol thanks. This is a mixing of all the tooling and coming out badly. The UnixFSPathSelector inserts requests for ADLs, but the fetcher thing uses a default reifier. Should just drop the UnixFSPathSelector and build one largely copied from boxo/path/resolver
so we're only sticking to one frankenstein'd version at a time. We could also switch to using the named ADLs, but doesn't seem worth it given this is all too hacked together at the moment anyway.
f4f7736
to
a4adacb
Compare
a4adacb
to
9edb28f
Compare
dropped for similar work in boxo
Closing as this is no longer needed now that we have backpressure #160 and no block storage. |
Resolves most of part 3 in #62.
Note: I haven't had time to exercise this yet so might have issues