Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: ipld: restore e2e online deal and retrievals w/ identity CIDs #9259

Closed
wants to merge 3 commits into from
Closed

fix: ipld: restore e2e online deal and retrievals w/ identity CIDs #9259

wants to merge 3 commits into from

Conversation

rvagg
Copy link
Member

@rvagg rvagg commented Sep 2, 2022

Related Issues

Depends on:

Related:

Proposed Changes

This used to work prior to the CARv2 blockstore interface was introduced to back
Graphsync retrievals and client retrieval CAR export. When that was done, the
StoreIdentityCIDs option was omitted in some key places, and the CARv2
blockstore interface always returned true from Has() for identity CIDs which
prevents Graphsync from storing them.

Since launch, car.SelectiveCar has been the canonical way to produce CARs for
online deals to calculate CommP. This is still being done on the client side
in Lotus for online deals, but it's no longer done on the server side, and
neither is it done for retrieval export on the client side.

Additional Info

As per discussion in ipld/go-car#332, we do have the
option of changing online deals to not include identity CIDs, therefore
ditching use of car.SelectiveCar (replacing with a CARv2 alternative); but that
would be a breaking change—like the blockstore changes were a breaking change,
but that was accidental. There's also other places where car.SelectiveCar
is simulated, e.g. node/impl/client/client.go#outputCAR.

In addition, I think we should probably be documenting how identity CIDs are handled throughout Lotus - for the online deal flow as well as for retrievals (filecoin-project/go-fil-markets#747 is another example of problems with identity CIDs that is only indirectly related to this PR).

Checklist

Before you mark the PR ready for review, please make sure that:

  • All commits have a clear commit message.
  • The PR title is in the form of of <PR type>: <area>: <change being made>
  • This PR has tests for new functionality or change in behaviour
  • If new user-facing features are introduced, clear usage guidelines and / or documentation updates should be included in https://lotus.filecoin.io or Discussion Tutorials.
  • CI is green

@rvagg rvagg requested a review from a team as a code owner September 2, 2022 11:46
@rvagg
Copy link
Member Author

rvagg commented Sep 2, 2022

Sorry, I forgot to mention the second commit in here is related but not directly—undoing the identity CID option for unixfs creation on lotus client import since it's one of the major sources of our problems—i.e. we wouldn't have to deal with any of this if there weren't identity CIDs being generated and stored in Filecoin.

go.mod Outdated Show resolved Hide resolved
Comment on lines -37 to +35
b := cidutil.InlineBuilder{
Builder: prefix,
Limit: 126,
}
return b, nil
return prefix, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we should include this change in this PR; Yes, there were bugs in ID cid handling, but that doesn't negate the fact that it's an useful feature.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm proposing that we don't do this in Lotus and don't really encourage it for Filecoin, especially for the online deal flow. For two reasons:

  1. In Filecoin, there's a relationship between a block and the piece it's stored in. This is complicated for identity CIDs and is a problem when we have retrieval modes where you're able to ask for an arbitrary (not necessarily deal root) "payload CID" and have to figure out what piece it's in so in order to make decisions about availability and price. Illustrated by the difficulty in dealing with identity CIDs as roots in feat: handle retrieval queries for unindexed identity payload CIDs go-fil-markets#747 (this is not really a "fix" for that problem, just a patch to make it workable, it'd not be a problem if we didn't have identity CIDs that can fit CIDs) but the same applies with non-root identity CIDs unless we make sure to store them as blocks and index them as blocks, thereby reducing their utility and increasing the complexity on all of the layers we manage. This gets even more complicated when we index the content in a piece and send that to an indexer. Should the indexer ignore them? Does the CID:Piece relationship matter at that level?
  2. The break-even cost of storing inline CIDs is pretty low with the way we're having to store them. For an online deal, we end up at only 32-bytes break-even and that's without counting the extra CARv2 index that's generated (I think the break-even for SPs with the CARv2 index is just ~20-bytes, far from the 126 max default here).

Screenshot 2022-09-05 at 8 54 58 pm

(I wrote about this in more detail over here)

So, maybe we just avoid it entirely? At least in lotus—we can't stop them entirely but we can set best practice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Yeah avoiding those is probably reasonable.

I wonder if we could allow them, but only for leaf nodes, where we wouldn't need to worry about traversal edge-cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, but in the current form CidBuilder only gets the bytes so it'd need to decode them to look for links to determine that: https://github.com/ipfs/go-cidutil/blob/ac549eaa2ca06b13a1e5b37ee2696cd72b6481a7/inline.go#L21-L26, that's probably not very efficient.

Alternatively we could introduce an optional LeafCidBuilder into DagBuilderHelper and special-case use it when it's set for leaves: https://github.com/ipfs/go-unixfs/blob/2c23c3ea6fae3ef1b487cfc0c606a4ffc7893676/importer/helpers/dagbuilder.go#L154

Used to work prior to the CARv2 blockstore interface was introduced to back
Graphsync retrievals and client retrieval CAR export. When that was done, the
StoreIdentityCIDs option was omitted in some key places, and the CARv2
blockstore interface always returned true from Has() for identity CIDs which
prevents Graphsync from storing them.

Since launch, car.SelectiveCar has been the canonical way to produce CARs for
online deals to calculate CommP. This is still being done on the client side
in Lotus for online deals, but it's no longer done on the server side, and
neither is it done for retrieval export on the client side.

Ref: filecoin-project/go-fil-markets#749
Ref: ipld/go-car#332
This leads to all sorts of problems with the full e2e flow, so let's just not
do it and not encourage it
@rvagg rvagg requested a review from magik6k September 20, 2023 02:37
@rvagg
Copy link
Member Author

rvagg commented Sep 20, 2023

@magik6k @jennijuju I know we're dealing with some deprecated components here, but boost is still relying on some parts of lotus this touches, particularly the itest kit, and I'm running into this problem over there! Which is what reminded me I still had this open.

This is updated now to latest master and should be safe to merge. The change set is much more minimal now that master has evolved past some of the original dependencies I had to update for this. It would be great to have this merged so we can finally close this PR.

@rvagg rvagg closed this Jun 5, 2024
@rvagg rvagg deleted the rvagg/identity-cid-handling branch June 5, 2024 03:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants