add --cid-version=1 --raw-leaves=false
isn't hash coherent with add --cid-version=0
#8974
Personally, that's a WONTFIX (except maybe some docs), having to use |
@Jorropo this isn't a bug: the version of a CID is part of the payload. A block with links in CIDv0 has different byte content (and thus a different CID) than a block with the very same links in CIDv1. |
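To illustrate the point about the CID version being part of the payload, here is a minimal Python sketch. It is not real dag-pb encoding: `toy_parent` is a hypothetical stand-in that just concatenates the binary CIDs of the children, and the leaf contents are made up. The only facts it relies on are that a binary CIDv0 is a bare sha2-256 multihash, while a binary CIDv1 prepends a version byte and a codec byte to that same multihash.

```python
import hashlib

SHA2_256 = 0x12  # multicodec code for sha2-256
DAG_PB = 0x70    # multicodec code for dag-pb


def multihash(block):
    """sha2-256 multihash: <hash code><digest length><digest>."""
    digest = hashlib.sha256(block).digest()
    return bytes([SHA2_256, len(digest)]) + digest


def cid_v0_bytes(block):
    """Binary CIDv0 is just the bare sha2-256 multihash (34 bytes)."""
    return multihash(block)


def cid_v1_bytes(block, codec=DAG_PB):
    """Binary CIDv1 is <version=1><codec><multihash> (36 bytes here)."""
    return bytes([0x01, codec]) + multihash(block)


def toy_parent(links):
    """Hypothetical parent block: the children's binary CIDs, concatenated.
    Real dag-pb wraps each link in protobuf framing; this is a simplification."""
    return b"".join(links)


# Made-up leaf contents standing in for the chunks of a file.
leaves = [b"chunk-one", b"chunk-two"]

parent_v0 = toy_parent([cid_v0_bytes(leaf) for leaf in leaves])
parent_v1 = toy_parent([cid_v1_bytes(leaf) for leaf in leaves])

root_v0 = hashlib.sha256(parent_v0).hexdigest()
root_v1 = hashlib.sha256(parent_v1).hexdigest()

print(len(parent_v0), len(parent_v1))  # parent bytes differ in length
print(root_v0 == root_v1)              # so the root digests differ too
```

Each leaf hashes to the same multihash either way; it is the parent's bytes that change, and with them the root hash.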
@ribasushi I know, that's why the:
It's just that go-ipfs attempts to have some hash coherency with future versions. (Adding the same data multiple times with the same options in future versions should generally give you the same CID back.) |
You are probably thinking of |
I'm essentially proposing the same thing but doing it explicitly with |
Always happy to get more docs but is this coherency expected by the user somewhere? (And what exactly is the meaning of it?) It seems reliant on very specific and internal knowledge on how we construct these (raw leaves, chunker, etc.). |
I have talked with a few people that assume it.
Adding the same file multiple times yields the same CID (or at least a compatible one, so really it's only the same multihash).
Yeah |
Pushing some more on your definition to try to better understand it and see how we can fit it in the docs:
You're changing the arguments; does a user who changes these still expect the same CID?
Then what does compatible mean? Same multihash? Is the user aware of it, and if so, do they expect it to stay unchanged when they change the CID version or any other argument? If anything, this seems more of an issue with the UnixFS layer and with explaining to users that their file/directory/etc. representation in IPFS is opaque and that they shouldn't rely on the CID/multihash. We attempt to change this as little as possible, but there are (AFAIK) no guarantees. For example, we now automatically change the directory implementation (from basic to HAMT) based (roughly) on its number of entries, once it reaches a (variably adjusted) threshold. There is no single "file" entity in IPFS as a continuous string of bits as (say) on the user's hard drive. I'd push for more documentation in that direction. |
(Changing to a docs issue unless there is in fact a guarantee of CID/multihash stability in UnixFS.) |
IMO there is in no way whatsoever a guarantee that From If that wording is insufficient to get the point across we can modify that as well.
Correct, we don't like changing behavior because no matter what you tell people they'll start relying on things anyway unless it evolves too quickly for people to incorrectly assume it'll never change. This is part of why the defaults for When the change to CIDv1 by default happens there will likely need to be a set of comms reminding people that defaults can change and how to generate CIDv0s if they really need to. @Jorropo unless the idea is you want to add more text to |
I agree with all of this. I think the issue is getting sidetracked. The issue here is that it is surprising that the CID version changes the hash. It seems like a non-trivial number of people (me included, before I implemented some raw CID code) think that the CID version is just an encoding trick: that the binary data wouldn't change, just how it gets rendered as text.
I want to add a
similar thing to |
I think the implicit assumption that is causing some confusion here is that we're talking about a single block (a string of bits) of data. In general, changing the CID version doesn't change the hash of the data (because we keep the same hashing algorithm, at least in v1). But a file is (usually) not a single block of data but a DAG of them, and a DAG is not just the concatenation of the chunks of the original file but something more opaque that includes other metadata to structure it. (That metadata is in fact leaked in the CID when the user might not actually be interested in it.) The bold text is not obvious to a new user (I had the same confusion for a long time) and some more docs could be added on that. |
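The single-block case can be sketched in Python as follows. This is an illustration under assumptions, not how go-ipfs does it: the block content is made up, `base58btc` is a small hand-rolled helper, and the CIDv1 is assumed to use the dag-pb codec with the base32 multibase prefix `b`. It shows that for one block the two CID strings look unrelated yet wrap the same multihash.

```python
import base64
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58btc(data):
    """Minimal base58btc encoder (hand-rolled helper for this sketch)."""
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    # Each leading zero byte is represented by a leading '1'.
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


def multihash(block):
    """sha2-256 multihash: 0x12 (sha2-256), 0x20 (32-byte length), digest."""
    return bytes([0x12, 0x20]) + hashlib.sha256(block).digest()


def cid_v0_text(block):
    """CIDv0 text form: base58btc of the bare multihash."""
    return base58btc(multihash(block))


def cid_v1_text(block, codec=0x70):
    """CIDv1 text form: 'b' + lowercase unpadded base32 of
    <version=1><codec><multihash>. Codec 0x70 is dag-pb."""
    raw = bytes([0x01, codec]) + multihash(block)
    return "b" + base64.b32encode(raw).decode("ascii").lower().rstrip("=")


block = b"a single block of bytes"  # made-up block content

v0 = cid_v0_text(block)
v1 = cid_v1_text(block)

print(v0)  # starts with "Qm"
print(v1)  # starts with "bafybei"
```

For a single block the version change really is only a re-encoding of the same multihash. Once blocks link to each other, the chosen CID version ends up inside the parent's bytes, which is where the root hash diverges.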
I have had discussions since then. Ideally we want to sunset CIDv0; this will never fully happen, but we are still going to try. We want to default to encoding CIDv1 inside the binary blocks even when a CIDv0 could be used, because supporting both makes it annoying to write the various implementations and leads to weird edge cases in some of them. |
Checklist
Installation method
built from source
Version
Config
N/A
Description
When using --cid-version=1, ipfs will use CIDv1 CIDs in the root blocks of the file, creating different hashes if the file gets chunked into multiple leaves.
Reproduction