
CAR based Gateway implementation #62

Closed
11 of 13 tasks
aschmahmann opened this issue Mar 21, 2023 · 6 comments
aschmahmann commented Mar 21, 2023

Done Criteria

There is an implementation of gateway.IPFSBackend that can leverage retrievals of CAR files containing the relevant data.

It should implement the proposed version of the API here, which shouldn't change significantly before the above PR lands.

Implementation stages

Why Important

Implementation Phases

  • (1) Fetch CAR into per-request memory blockstore and serve the response (a rough sketch follows this list)
  • (2) Fetch CAR into shared memory blockstore and serve response along with a blockservice that does block requests for missing data
  • (3) Start doing the walk locally and then if a path segment is incomplete send a request for a CAR/blocks and upon every received block try to continue using the blockservice
  • (4) Start doing the walk locally and keep a list of "plausible" blocks, if after issuing a request we get a non-plausible block then report them and attempt to recover by redoing the last segment
  • (5) Don't redo the last segment fully if it's part of a UnixFS file and we can do range requests
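
For illustration, here is a minimal sketch of phase (1), assuming an upstream trustless gateway that answers `?format=car` requests. The function names, URL shape, and use of go-car/v2's BlockReader are assumptions for the sketch, not the actual bifrost-gateway code from #61:

```go
// A minimal sketch of phase (1): pull the whole CAR for one inbound gateway
// request into a per-request, in-memory block map and serve the response
// from it. Illustrative only; the real work is happening in #61.
package carfetch

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net/http"

	"github.com/ipfs/go-cid"
	carv2 "github.com/ipld/go-car/v2"
)

// fetchCARForRequest downloads the CAR for one inbound gateway request and
// keeps every block in a map keyed by CID for the lifetime of that request.
func fetchCARForRequest(ctx context.Context, upstream, contentPath string) (map[cid.Cid][]byte, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, upstream+contentPath+"?format=car", nil)
	if err != nil {
		return nil, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("upstream returned %s for %s", resp.Status, contentPath)
	}

	// Decode the CAR stream block by block. In phase (2) these blocks would
	// instead go into a shared blockstore, backed by a blockservice that can
	// fetch individual blocks the CAR did not contain.
	br, err := carv2.NewBlockReader(resp.Body)
	if err != nil {
		return nil, err
	}
	out := make(map[cid.Cid][]byte)
	for {
		blk, err := br.Next()
		if errors.Is(err, io.EOF) {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		out[blk.Cid()] = blk.RawData()
	}
}
```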

Details and Dependencies

ECD: 2023-03-27

Blockers for mirroring traffic for Rhea

ECD: 2023-03-29

  • Resolve memory issues
  • Add more metrics tracking to the new implementation

The work is happening in #61. See there for more details.

Blockers for production traffic for Rhea

ECD: TBD (date for a date/plan: 2023-03-30)

We need sufficient testing of the bifrost-gateway code given we aren't able to run Kubo's battery of sharness tests against it (per #58).

Options being considered:

  • Enough testing in CI (ci: add gateway-conformance test suite to the CI #66) that we can be reasonably confident in the new implementation
    • Note: we may want to be cautious in some of our implementation work here to increase the chance that Kubo's sharness tests will catch errors while the conformance tests improve (i.e. use something like the current strategy, with the same BlocksGateway implementation Kubo uses, but with DAG prefetching of blocks happening underneath)
  • This can happen alongside some confidence building by comparing production ipfs.io/dweb.link traffic status codes and response sizes to Rhea ones.

Completion tasks to mark this done-done-done

  • Turning an inbound gateway.IPFSBackend request into a CAR request (should be relatively straightforward)
  • Doing incremental verification of the responses
  • Handle what happens if the CAR response sends back bad data (e.g. for Caboose report the problem upstream)
  • Handle what happens if the CAR response dies in the middle (i.e. resumption or restarting of download)
  • Handle OOM/out-of-disk-space errors
    • Because CAR responses do not contain duplicate blocks, but a block may be reused during a graph traversal, either the entire graph needs to be buffered/stored before blocks are thrown away, or it needs to be possible to re-issue block requests for data we recently received but may have discarded (a rough sketch follows this list)
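
As a rough, hypothetical sketch of the duplicate-block concern in the last item (not the actual bifrost-gateway code): recently received blocks can be kept in a small bounded cache that the traversal consults before re-issuing a block request. Choosing the bound is exactly the OOM/out-of-disk-space trade-off noted above.

```go
// Hypothetical sketch: a bounded cache of recently received blocks so a
// traversal can re-read a block that the CAR stream only delivered once.
// Evicted blocks would have to be re-requested from upstream.
package blockcache

import (
	"sync"

	"github.com/ipfs/go-cid"
)

type recentBlocks struct {
	mu     sync.Mutex
	blocks map[cid.Cid][]byte
	order  []cid.Cid // FIFO eviction once max is exceeded
	max    int
}

func newRecentBlocks(max int) *recentBlocks {
	return &recentBlocks{blocks: make(map[cid.Cid][]byte), max: max}
}

// Put remembers a block; once the cache is full, the oldest block is dropped.
func (r *recentBlocks) Put(c cid.Cid, data []byte) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if _, ok := r.blocks[c]; ok {
		return
	}
	r.blocks[c] = data
	r.order = append(r.order, c)
	if len(r.order) > r.max {
		oldest := r.order[0]
		r.order = r.order[1:]
		delete(r.blocks, oldest)
	}
}

// Get returns a previously seen block, if it has not been evicted yet.
func (r *recentBlocks) Get(c cid.Cid) ([]byte, bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	data, ok := r.blocks[c]
	return data, ok
}
```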

Additional Notes

There already is an implementation of gateway.IPFSBackend that uses the existing tooling for block-based storage/retrieval here (and related to #57).

Some details related to Caboose:

  • Since Caboose is in charge of selecting which Saturn peers to ask for which content, there may be some affinity information (perhaps just what already exists) that it wants in order to optimize which nodes it sends requests to (e.g. for a given CAR request that fulfills an IPFS HTTP Gateway request, understanding whether it wants to split the load, send it all to a specific L1, send it to a set of L1s, etc.). A hypothetical sketch of this follows the list.
  • IIUC the current plan is to send all data for a given high-level IPFS HTTP Gateway request to a single L1, which shouldn't be too bad. Note: it may not be exactly one IPFS HTTP Gateway request -> one CAR file request due to various optimizations; however, the total number of requests should certainly go down dramatically.
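
To illustrate the affinity idea from the first bullet (purely hypothetical, not Caboose's actual API): every CAR/block sub-request issued to satisfy one inbound gateway request could carry the same affinity key derived from the content root, so the fetcher can pin them all to the same L1.

```go
// Hypothetical helper, not Caboose's real API: derive an affinity key from
// the requested content path so all sub-requests for the same inbound
// gateway request can be routed to the same Saturn L1.
package affinity

import "strings"

// affinityKey maps e.g. "/ipfs/<root-cid>/sub/index.html" to "/ipfs/<root-cid>".
func affinityKey(contentPath string) string {
	parts := strings.SplitN(strings.TrimPrefix(contentPath, "/"), "/", 3)
	if len(parts) >= 2 {
		return "/" + parts[0] + "/" + parts[1]
	}
	return contentPath
}
```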

If we need to make some compromises in the implementation here in order to start collecting some data, that's doable, but if so they should be explicitly called out and issues filed. Additionally, it should remain possible to use a blocks-gateway implementation here via config.

cc @Jorropo @aarshkshah1992

lidel commented Mar 21, 2023

Flagging filecoin-saturn/L1-node#289 so we don't make the same mistake here: block requests should not be converted to CAR requests.

aschmahmann commented Mar 27, 2023

Update: Work is taking place in #61

Closing this issue out requires dealing with the high level tasks indicated above. However, the blockers for testing this for usage in Rhea are:

Blockers for mirroring traffic:

ECD 2023-03-29

Blockers for production traffic:

Date for date/plan: 2023-03-30

  • Enough testing in CI (ci: add gateway-conformance test suite to the CI #66) that we can be reasonably confident in the new implementation
    • Note: we may want to be cautious in some of our implementation work here to increase the chance that Kubo's sharness tests will catch errors while the conformance tests improve (i.e. use something like the current strategy, with the same BlocksGateway implementation Kubo uses, but with DAG prefetching of blocks happening underneath)
  • This can happen alongside some confidence building by comparing production ipfs.io/dweb.link traffic status codes and response sizes to Rhea ones.

BigLep commented Mar 27, 2023

Thanks @aschmahmann. I inlined this information into the issue description.

lidel commented Apr 3, 2023

@aschmahmann fysa I've moved the description of Implementation Phases from #61 to this meta-issue and marked the first two as done.

Mind clarifying which phase covers CAR-based resumes (?format=car&depth=1) instead of block-by-block (or add one?)
Why? It will have a very positive impact on website loads because website assets share a common parent: we will be able to avoid over-fetching, AND if we use CAR instead of block-by-block, we'll avoid round-trips at the same time.

Example:

/ipfs/cid/sub/index.html
/ipfs/cid/sub/assets/a.jpg
/ipfs/cid/sub/assets/b.css
/ipfs/cid/sub/assets/c.js

When we load index.html, we learn about the contents of /sub and learn what the cid-of-assets is.
When we load a.jpg, we also retrieve the blocks for the parent dir, and can enumerate it to learn the CIDs of the other files in the assets dir.

Ideally, opening /ipfs/cid/sub/ would fetch the parents only once, and only fetch the specific sub-graphs.
I imagine it would translate to the requests below:

/ipfs/cid/sub/index.html?format=car&depth=1 # learn cid-of-assets

/ipfs/cid-of-assets/a.jpg?format=car&depth=1 # learn cids in /assets
/ipfs/cid-of-b?format=car&depth=1 #direct fetch of a file
/ipfs/cid-of-c?format=car&depth=1 #direct fetch of a file

Lmk how feasible this is, and whether we should add it as (3.5) or something else?

aschmahmann (Contributor Author) replied:

Mind clarifying which phase covers CAR-based resumes (?format=car&depth=1) instead of block-by-block (or add one?)

I should actually reword it (I'll change it above) but this is phase 3.

(3) Start doing the walk locally and then if a path segment is incomplete send a request for blocks and upon every received block try to continue

"send a request for blocks" should be "send a request for a CAR/blocks" (i.e. it's the same ask for a CAR, if it fails just use blocks) as above.

Note: In the case listed above you're likely actually asking for the directory and then implicitly getting index.html and it might look like this:

/ipfs/cid/sub?format=car&depth=0 (or bytes=0:0) # learn cid-of-sub and if it's a directory or file
/ipfs/cid-of-sub/index.html?format=car&depth=1 # get-the-index.html

/ipfs/cid-of-sub/assets/a.jpg?format=car&depth=1 # learn cid of assets (this might or might not already be known based on if sub is a sharded directory)
/ipfs/cid-of-b?format=car&depth=1 #direct fetch of a file
/ipfs/cid-of-c?format=car&depth=1 #direct fetch of a file

Note: the latter two might also be /ipfs/cid-of-sub/assets/(b|c).jpg?format=car&depth=1, since it's a race with the client depending on when bifrost-gateway has received the blocks from the first request and whether they're still in cache, given all three of those assets might be requested by the browser simultaneously. bifrost-gateway could notice that all the requests are coming from the same user for the same path and slow some of the requests down a little to save on wasted bandwidth, but that's something we can evaluate later.

aschmahmann (Contributor Author) commented:

This is closed by #160. bifrost-gateway has largely handled the concerns in "Completion tasks to mark this done-done-done".

However, we now use backpressured processing and incremental verification of CAR responses rather than buffering all the data in memory or in an on-disk cache with block-request fallbacks.
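
In that spirit, here is a minimal sketch of incrementally verifying a streamed CAR response. The shape is assumed rather than taken from #160, and go-car/v2's BlockReader may already perform this check depending on its options; it is shown explicitly for clarity.

```go
// Sketch: consume a CAR stream block by block, verifying each block's data
// against the CID the CAR claims for it before handing it to the consumer.
// Because we only read from r as fast as onBlock accepts blocks, a slow
// consumer naturally applies backpressure to the upstream HTTP response.
package carverify

import (
	"errors"
	"fmt"
	"io"

	"github.com/ipfs/go-cid"
	carv2 "github.com/ipld/go-car/v2"
)

func verifyCARStream(r io.Reader, onBlock func(c cid.Cid, data []byte) error) error {
	br, err := carv2.NewBlockReader(r)
	if err != nil {
		return err
	}
	for {
		blk, err := br.Next()
		if errors.Is(err, io.EOF) {
			return nil // clean end of stream
		}
		if err != nil {
			return err // truncated or malformed CAR: caller can retry/resume
		}
		// Recompute the multihash from the raw bytes and compare it to the
		// claimed CID; a mismatch means the upstream sent bad data.
		claimed := blk.Cid()
		got, err := claimed.Prefix().Sum(blk.RawData())
		if err != nil {
			return err
		}
		if !got.Equals(claimed) {
			return fmt.Errorf("block %s failed hash verification", claimed)
		}
		if err := onBlock(claimed, blk.RawData()); err != nil {
			return err
		}
	}
}
```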
