Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: handle retrieval queries for unindexed identity payload CIDs #747

Merged
merged 5 commits into from
Sep 13, 2022

Conversation

rvagg
Copy link
Member

@rvagg rvagg commented Aug 24, 2022

There are valid cases where a CAR may have an identity CID as its root that is
not represented as a 'block' within the CAR body and therefore isn't indexed
by the dagstore. In this case, we inspect the identity CID content and treat
the query as a query for the intersection of all of the links within the block.

Ref: filecoin-project/boost#715

@rvagg
Copy link
Member Author

rvagg commented Aug 24, 2022

TestHandleQueryStream/When_payload_CID_not_found failing:

        	            	 (retrievalmarket.QueryResponse) {
        	            	- Status: (retrievalmarket.QueryResponseStatus) 1,
        	            	+ Status: (retrievalmarket.QueryResponseStatus) 2,
        	            	  PieceCIDFound: (retrievalmarket.QueryItemStatus) 1,
        	            	@@ -15,3 +15,3 @@
        	            	  MaxPaymentIntervalIncrease: (uint64) 0,
        	            	- Message: (string) (len=69) "piece info for cid not found (deal has not been added to a piece yet)",
        	            	+ Message: (string) (len=121) "failed to fetch piece to retrieve from: getting pieces for cid QmfM8NQMc9bGY37wa9FwrVJtw5YT66ut7jjLFVha6aeFST: %!w(<nil>)",
        	            	  UnsealPrice: (big.Int) {

I'll need to restore that if we proceed with this.

@rvagg rvagg force-pushed the rvagg/unindexed-identity-cid-retrieval branch from 6154030 to ee63003 Compare August 31, 2022 07:36
@rvagg
Copy link
Member Author

rvagg commented Aug 31, 2022

New commits added to make this work properly, reasonably and efficiently:

  • I had to refactor out two calls to dagStore.GetPiecesContainingBlock and related logic to check pieces. Currently there's one call for the initial find-piece logic, then there's a second one to collect the list of deals for the potential pieces so we can invoke the dynamic-ask logic for those deals. I want to wrap the identity CID check in front of both of those calls. So now, instead of hiding all of pieces-with-block logic inside of the two utility functions, we now just do a single call, get the list of pieces, narrow it down to one and use that for the answer, but also use the full list of pieces for the dynamic-ask.
  • Limits imposed on identity CIDs to reduce the potential for abuse: number of bytes and number of links we're willing to investigate. Having an easy way to make an SP do a lot of work by looking up pieces containing CIDs is not really ideal. The limits I've added are a bit arbitrary but I'm trying to make this API useful and not fully limit it to the ~126 byte case that's commonly encountered for UnixFS.
  • Tests for most of the new things.

Some trade-offs that I judged worth it:

  • By reducing the number of calls to dagStore.GetPiecesContainingBlock, we're increasing the number of calls to pieceStore.GetPieceInfo in some cases (there's one test that had to change where you can see this).
  • API weirdness with the use of getAllPieceInfoForPayload - it can return an error and results. This is to match the current implementation where it's allowed to error when inspecting one or more of the calls to dagStore.GetPiecesContainingBlock as long as it gets at least successful call. Now that's in its own function, we have a piecesErr that we hold on and use it when we run out of options. Hopefully inline docs make this not-too-terrible.

Aside: Why, if we have selected a piece to make a deal on, does the dynamic-ask operate over all deals that contain the pieces that have the payload? Why aren't we just doing dynamic-ask for the one piece?

Copy link
Collaborator

@hannahhoward hannahhoward left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given it's our main target use case, we should add a test for a unixfs/dag-PB identity CD.

retrievalmarket/impl/provider.go Show resolved Hide resolved
retrievalmarket/impl/provider_test.go Outdated Show resolved Hide resolved
@hannahhoward
Copy link
Collaborator

alternativelty, run the integration test change in Boost with this PR to verify.

@rvagg rvagg force-pushed the rvagg/unindexed-identity-cid-retrieval branch from 73dba21 to f1cbd37 Compare September 1, 2022 06:36
rvagg added a commit to filecoin-project/boost that referenced this pull request Sep 1, 2022
rvagg added a commit to filecoin-project/boost that referenced this pull request Sep 1, 2022
@rvagg
Copy link
Member Author

rvagg commented Sep 1, 2022

Updated to:

  • Handle recursive identity CIDs, but still account properly for the maximum number of links when they bubble up to the top-level one, so even if you do the recursive thing you still don't get to go beyond the byte limit or number of links limit
  • Tested recursive identity CIDs
  • Introduced dag-pb into the tests - there's now a combination of dag-pb and dag-cbor being used for those various retrieval handling cases and the identity CID maker utility is moved up into shared_testutil package in its own file.

Also verified this works with filecoin-project/boost#715 and it does, so 🥳

@rvagg
Copy link
Member Author

rvagg commented Sep 1, 2022

btw this branch is based off v1.23.2 which is the tag boost is currently using; there's breaking libp2p dep changes between there and master to be able to use anything later and have it play ball with lotus

rvagg added a commit to filecoin-project/boost that referenced this pull request Sep 1, 2022
@rvagg rvagg force-pushed the rvagg/unindexed-identity-cid-retrieval branch 2 times, most recently from 1e9837d to f8755fa Compare September 2, 2022 00:11
There are valid cases where a CAR may have an identity CID as its root that is
not represented as a 'block' within the CAR body and therefore isn't indexed
by the dagstore. In this case, we inspect the identity CID content and treat
the query as a query for the intersection of all of the links within the block.

Ref: filecoin-project/boost#715
1. to support identity PayloadCID without having to duplicate decode & lookup
   logic
2. because it's not cheap, especially for identity PayloadCIDs with lots of
   links

The tradeoff is that in some cases we end up calling the PieceStore more than
we otherwise would.
* Byte limit (2048)
* Link limit (32)
@rvagg rvagg force-pushed the rvagg/unindexed-identity-cid-retrieval branch from f8755fa to e37d6f4 Compare September 2, 2022 11:24
@rvagg
Copy link
Member Author

rvagg commented Sep 2, 2022

rebased this to master since lotus has caught up already, we just need to get boost done too

jacobheun pushed a commit to filecoin-project/boost that referenced this pull request Jan 26, 2023
* test(itests): demonstrate failure to retrieve

* fix: cleanup test & handle identity roots

* chore: update go-fil-markets with identity CID fix

Ref: filecoin-project/go-fil-markets#747

* doc: make it clear InlineBuilder isn't recommended
i.e. don't copypasta this for your filecoin deal prep

Co-authored-by: Rod Vagg <rod@vagg.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants