This repository has been archived by the owner on Apr 18, 2024. It is now read-only.

Proposal for a 'per-request prefetch' #62

Closed
wants to merge 1 commit into from

Conversation

willscott
Contributor

This allows a sub-blockstore interface to be used at the request level, specifying an initial expectation of the path -> blocks needed.

I suspect that this can initially be the inbound path to the gateway, with '?depth=1' added in the unixfs handler and '?depth=all' for the tar/car handlers.

That will trigger loading of the requested CAR into a local cache in front of the blockstore, and blocks reading of the stream from the backing request so that it stays reasonably in line with reads.
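
To make the flow concrete, here is a hypothetical sketch of how a gateway handler could consume such a per-request session (Stoker and NewSession come from this diff; the handler wiring and the minimal interfaces below are assumptions, not part of the change):

```go
package main

import (
	"context"
	"net/http"

	blocks "github.com/ipfs/go-block-format"
	"github.com/ipfs/go-cid"
)

// Minimal stand-ins for illustration only; in the PR, NewSession returns a
// full ipfsblockstore.Blockstore backed by a prefetched CAR plus a cache.
type sessionBlockstore interface {
	Get(ctx context.Context, c cid.Cid) (blocks.Block, error)
}

type sessionFactory interface {
	NewSession(ctx context.Context, path string) sessionBlockstore
}

// handleUnixFS shows the intended flow: one session per request, created from
// the inbound content path, so the backend can be asked for the whole
// "path + ?depth=1" CAR up front instead of block-by-block.
func handleUnixFS(stoker sessionFactory, root cid.Cid) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		bs := stoker.NewSession(r.Context(), r.URL.Path)

		// Reads that the prefetch predicted are served from the session
		// cache; anything outside it falls back to the normal block path.
		blk, err := bs.Get(r.Context(), root)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		_, _ = w.Write(blk.RawData())
	}
}
```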

@willscott willscott requested a review from lidel March 6, 2023 11:37
@willscott willscott changed the base branch from main to nonblock March 6, 2023 11:37
Base automatically changed from nonblock to main March 6, 2023 15:33

n := 0
for {
ss.writeAheadSynchronizer.ReqAdd()
Contributor

IIUC, this is to limit reading of incoming blocks from the CAR stream to keep memory usage in check, right?
Given that the cache size is bounded, I don't think we need this to limit how many blocks we put in the cache.

Contributor

Basically, what would happen if we didn't use a writeAheadSynchronizer here?
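
For context, a write-ahead synchronizer of this kind is essentially a bounded counter between the CAR-reading goroutine and the request's reads; without it, the producer would pull the whole upstream CAR into memory as fast as it arrives. A minimal sketch of that idea (an assumption about the intent, not the actual implementation in this PR):

```go
package main

import "sync"

// writeAhead is a hypothetical, simplified stand-in for the PR's
// writeAheadSynchronizer: ReqAdd blocks the CAR-reading goroutine once it is
// `limit` blocks ahead of the request, and Dec is called when a prefetched
// block is consumed, letting the producer continue.
type writeAhead struct {
	mu      sync.Mutex
	cond    *sync.Cond
	current int
	limit   int
}

func newWriteAhead(limit int) *writeAhead {
	w := &writeAhead{limit: limit}
	w.cond = sync.NewCond(&w.mu)
	return w
}

// ReqAdd is called by the producer before caching another block.
func (w *writeAhead) ReqAdd() {
	w.mu.Lock()
	defer w.mu.Unlock()
	for w.current >= w.limit {
		w.cond.Wait() // too far ahead: wait for the reader to catch up
	}
	w.current++
}

// Dec is called by the reader when it consumes a prefetched block.
func (w *writeAhead) Dec() {
	w.mu.Lock()
	defer w.mu.Unlock()
	if w.current > 0 {
		w.current--
	}
	w.cond.Signal()
}
```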

if err != nil {
return nil, err
}
ss.cache.Add(it, blk)
Contributor

Why do we NOT need to increment the count on the writeAheadSynchronizer here?


const SessionCacheSize = 1024

func (s *Stoker) NewSession(ctx context.Context, path string) ipfsblockstore.Blockstore {
Contributor

Will bifrost directly use the Stoker blockstore instead of the Caboose blockstore by giving it a path here?

Contributor

@aarshkshah1992 aarshkshah1992 Mar 14, 2023

@lidel Is it easy for Bifrost to pass the path here that it gets from its clients? Will the "affinity" key be blank in this case? Because from the code, I can see that we only pass the path to Lassie when the affinity key is empty.

Contributor

When does passing the path to Lassie make more sense than passing the "affinity key" to Lassie?

Contributor

Also @lidel -> Would it make sense to share the block cache across requests given IPFS GW access patterns?

Contributor

@lidel lidel Mar 15, 2023

Will bifrost directly use the Stoker blockstore instead of the Caboose blockstore by giving it a path here?

My understanding is we won't be able to use anything other than the blockstore interface until we finish the go-libipfs/gateway refactor.

When done, bifrost-gateway may use the Get* functions from the IPFSBackend interface for talking with caboose instead of getBlock for some types of requests.
This is how Caboose will get higher-resolution requests for a specific depth or byte ranges.

Is it easy for Bifrost to pass the path here that it gets from its clients?

Caboose already reads and uses the requested Content Path as the Affinity key here.

This is the parent, normalized content path that the block request comes from (so you get uniform behavior across path, subdomain, and dnslink gateway requests based on Host HTTP header too).

The problem is, you don't know how much you need to prefetch to satisfy the client's request.
I guess the best you can do, to avoid overfetching the whole of Wikipedia when /ipfs/bafybeiaysi4s6lnjev27ln5icwm6tueaw2vdykrtjkwiphwekaywqhcjze/ is requested, is to do all CAR requests with ?depth=1, and effectively fetch level-by-level instead of block-by-block.
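
A hypothetical sketch of what such a level-by-level request could look like (the ?depth=1 parameter is the one proposed in this PR's description; the exact URL shape, including format=car, is an assumption):

```go
package main

import "net/url"

// carURLForLevel builds a CAR request for a single level of the DAG under
// contentPath. Both query parameters below are assumptions about how the
// backend request would be phrased.
func carURLForLevel(backend, contentPath string) string {
	u := url.URL{
		Path: contentPath,
		RawQuery: url.Values{
			"format": {"car"},
			"depth":  {"1"},
		}.Encode(),
	}
	return backend + u.String()
}
```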

When does passing the path to Lassie make more sense than passing the "affinity key" to Lassie?

Probably when you know the context (full file OR only dir enumeration vs ALWAYS full dag).

If you don't know the context, sending the path to lassie will force a preload of the full dag (too much) or only the root level (too little if the request was for a file), and will most likely impact earnings of L1s.

Would it make sense to share the block cache across requests given IPFS GW access patterns?

We already have a global block cache in bifrost-gateway – the 2Q cache is based on frequency+recency to match gateway usage patterns, and has a hit ratio of 40%-50%.

If you add any additional cache, add a metric which will let us see if it brings any value (prior art in blockstore_cache.go).
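
In the spirit of that suggestion, a hedged sketch of a hit/miss metric pair for an extra session cache (the metric names are made up; blockstore_cache.go in bifrost-gateway is the prior art referenced above):

```go
package main

import "github.com/prometheus/client_golang/prometheus"

// Hypothetical counters for the per-request session cache, so its hit ratio
// can be compared against the existing global block cache.
var (
	sessionCacheHits = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "stoker_session_cache_hits_total",
		Help: "Blocks served from the per-request prefetch cache.",
	})
	sessionCacheMisses = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "stoker_session_cache_misses_total",
		Help: "Blocks that fell back to the backing blockstore.",
	})
)

func init() {
	prometheus.MustRegister(sessionCacheHits, sessionCacheMisses)
}
```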

// from pre-fetching. that said, this will more likely read out the
// full pre-fetch stream, which is still a reasonable choice.
ss.writeAheadSynchronizer.Dec()

Contributor

What does calling Dec here signify? We're not really removing anything from the cache here. Is it like a "yeah, your prefetching is useful to me, keep doing what you are doing"? Are we optimizing for memory usage here?

// decrement 1 from the queue, potentially unblocking writers
func (w *writeAheadSynchronizer) Dec() {
curr := w.current.Add(-1)
if curr < 0 {
Contributor

This can go below zero because the same cached block can get read multiple times by Bifrost, right? Do we want to decrement the count in this case where the same cached block gets read multiple times?


type stokerSession struct {
c *caboose.Caboose
cache *lru.ARCCache
Contributor

nit: avoid using ARC, it comes with uncertainty related to a patent held by IBM and someone will ask for its removal: ipfs/kubo#6590



@lidel
Contributor

lidel commented Apr 5, 2023

I believe we can close this since we now have GRAPH_BACKEND from ipfs-inactive/bifrost-gateway#61.
Follow-up work is tracked in ipfs-inactive/bifrost-gateway#62 and ipfs-inactive/bifrost-gateway#47.

@lidel lidel closed this Apr 5, 2023