Thanks for taking the time to add a lot more info to this doc! I’ve left a bunch of feedback inline here.
Also, could you please rewrite this commit to include a license and signoff line as described in our contributor guide?
content/guides/concepts/cid.md
Outdated
@@ -5,18 +5,39 @@ menu:
parent: concepts
---
A *content identifier* is a value that addresses a single piece of content in IPFS. It is mainly a cryptographic hash of the content, but is encoded as a [multihash](https://github.com/multiformats/multihash) and [multicodec](https://github.com/multiformats/multicodec). (Note: older CIDs have a different design — see [version 0](#version-0) below.) | |||
## Summary |
Would you please remove this heading? See the comment on your pinning PR for explanation: #105 (comment)
content/guides/concepts/cid.md
Outdated
<!-- TODO: explain more of the details of how CID v1 is composed here. -->
A *content identifier*, or CID, is a label used for addressing content in IPFS. CID's are used as a standard way of pointing to pieces of information. CID's identify specific pieces of content stored in IPFS. |
Hmmm, I feel kind of like each of these sentences is just telling me the same thing again. This would be fine if it was just the first sentence and you dropped the rest. (Side note: “label” is a great term here! Wish I’d thought of that 😄)
CID's
Please don’t use an apostrophe here, it’s not technically correct grammar.
Maybe this is more redundant than necessary, but the point is to ensure understanding comes across. I'll take another stab.
content/guides/concepts/cid.md
Outdated
You can read up on the details in the [CID spec](https://github.com/ipld/cid). You might also want to check out the [CID inspector](http://cid-utils.ipfs.team/#zb2rhiVd5G2DSpnbYtty8NhYHeDvNkPxjSqA7YbDPuhdihj9L) for an interactive breakdown of CIDs.
CID's are based on that content's cryptographic hash - a different piece of content will have a different hash and will produce different CID's. |
Can you make “cryptographic hash” here a link to the concept doc on hashes?
I think it also might be helpful to explain a little more about why this matters, e.g. “Because a CID identifies content by what it is rather than by where it is stored, it gives us a way to retrieve the same content from many different peers on the network, rather than just one place — without CIDs, IPFS wouldn’t work at all.”
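A small illustration of the point above, as a sketch only: it uses a plain SHA-256 digest as a stand-in for the hash inside a CID (a real CID wraps the digest with extra identifiers). The helper name `content_digest` is hypothetical.

```python
import hashlib


def content_digest(content: bytes) -> str:
    """Return the hex SHA-256 digest of some content (a stand-in for a CID's hash)."""
    return hashlib.sha256(content).hexdigest()


original = content_digest(b"Hello")
same_bytes_elsewhere = content_digest(b"Hello")  # e.g. a copy held by another peer
edited = content_digest(b"Hello!")

# Identical content always yields the identical digest, wherever it is stored.
print(original == same_bytes_elsewhere)
# Any change to the content yields a different digest.
print(original == edited)
```

Because the digest depends only on the bytes, any peer holding those bytes can answer a request for them.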
content/guides/concepts/cid.md
Outdated
## Version 1
## Format of CID's
CID's can take a few different forms, each easy for humans and/or software to decode. Any specific CID can be transformed to other equivalent CID representations (for example, using different base, CID version or codec). |
Please don’t say “easy.” People in this community have come from a variety of backgrounds and expertise, and what’s easy for some is not so for others. When you use this word, you risk discouraging someone who’s perfectly smart and capable, but had a hard time trying to do this task because they’d never done it before.
Any specific CID can be transformed to other equivalent CID representations
I don’t think we should say this. While you can up-convert a v0 CID to a v1 CID, there’s no explicit guarantee going forward that something similar will necessarily be possible if we ever invent a v2. Also, if you consider the hash as part of the CID, you cannot transform that (e.g. you couldn’t transform a SHA2-based CID to a SHA3-based CID if all you have is the CID).
content/guides/concepts/cid.md
Outdated
CID's can take a few different forms, each easy for humans and/or software to decode. Any specific CID can be transformed to other equivalent CID representations (for example, using different base, CID version or codec).
CID v1 and later are comprised of some leading identifiers making it easy to identify which representation is used, along with the content-hash itself. In v1 and later, these include a multibase identifier, [multicodec](https://github.com/multiformats/multicodec) identifier, and CID version-identifier: |
We should not say “and later” (there is no v2 and we have no idea what it might look like if and when it’s invented).
This paragraph and most of the stuff down to the next heading are really particular to the v1 format and don’t apply to v0 at all. I think you should probably just move it all under the “version 1” heading. I’m not sure if we actually need a “format” section alongside a “versions” section, since each version is actually a different format.
I think the only thing we really need to say at this point (before the version sections) is that there are two versions, and IPFS is slowly migrating to v1 by default. (Maybe we could add which commands do v0 by default and which do v1 by default).
According to the multiformat strategy, v2 (if/when) will at least have a compatible leading identifier to indicate that it is v2. But okay, I'll hold back on that aspect. Thanks.
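To make the v1 layout discussed in this thread concrete, here is a minimal Python sketch that splits a base32 CIDv1 string into its leading identifiers and digest. It assumes the multibase prefix `b` (lowercase base32, no padding) and that each field is an unsigned varint; `read_varint` and `parse_cidv1_base32` are hypothetical helper names, not part of any IPFS library.

```python
import base64


def read_varint(data: bytes, pos: int) -> tuple:
    """Read an unsigned varint starting at pos; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return value, pos
        shift += 7


def parse_cidv1_base32(cid: str) -> dict:
    """Split a base32 CIDv1 string into version, codec, hash code, and digest."""
    if not cid.startswith("b"):
        raise ValueError("this sketch only handles the 'b' (base32) multibase prefix")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # restore the padding that CIDs omit
    raw = base64.b32decode(body)

    version, pos = read_varint(raw, 0)        # CID version identifier
    codec, pos = read_varint(raw, pos)        # multicodec identifier
    hash_code, pos = read_varint(raw, pos)    # multihash: hash function code
    digest_len, pos = read_varint(raw, pos)   # multihash: digest length in bytes
    digest = raw[pos:pos + digest_len]
    if len(digest) != digest_len:
        raise ValueError("digest is shorter than its declared length")
    return {
        "version": version,
        "codec": codec,
        "hash_code": hash_code,
        "digest": digest,
    }


# The base32 example CID quoted later in this review.
parts = parse_cidv1_base32("bafkreidgubc3iuqqfrm5qqhmbf6vtwkgpyj2h42pmskokop72mwbxm27da")
print(parts["version"], hex(parts["codec"]), hex(parts["hash_code"]), len(parts["digest"]))
```

None of this applies to v0, which is one more reason to keep the description under the "version 1" heading.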
content/guides/concepts/cid.md
Outdated
These leading identifiers provide support for different formats to be used in future versions. Older CIDs have a different design that omits these identifiers — see [version 0](#version-0) below. | ||
Using the first few bytes of the CID, the CID can easily be interpreted; the content can be fetched from IPFS, then decoded with the correct codec. For more details, check out the [CID specification](https://github.com/ipld/cid). It includes a [decoding algorithm](https://github.com/ipld/cid/blob/ef1b2002394b15b1e6c26c30545fd485f2c4c138/README.md#decoding-algorithm) and links to existing software implementations for decoding CID's. |
Same note as above about saying things are “easy” here.
content/guides/concepts/cid.md
Outdated
When IPFS was first designed, we used base 58-encoded multihashes as the content identifiers. (This is simpler, but much less flexible than newer CIDs.) It is still used by default when adding files and blocks to IPFS, so you should generally try to support them.
When IPFS was first designed, we specified the consistent use of base 58-encoded multihashes as the content identifiers. While this is simpler, it is also much less flexible than newer CIDs. CIDv0 is still used by default when adding files and blocks to IPFS, so you should generally try to support them.
I think this wording (“we specified the consistent use of”) is harder to read (that’s important since there are lots of non-native English speakers in the community — and even more in the community we hope to grow) and doesn’t add any clarity.
ok, reverting that bit.
content/guides/concepts/cid.md
Outdated
The CID specification includes a [decoding algorithm](https://github.com/ipld/cid/blob/ef1b2002394b15b1e6c26c30545fd485f2c4c138/README.md#decoding-algorithm) you can use to distinguish CID v0 from newer versions.
There is a [decoding algorithm](https://github.com/ipld/cid/blob/ef1b2002394b15b1e6c26c30545fd485f2c4c138/README.md#decoding-algorithm) that shows how you can distinguish CID v0 from newer versions. |
Please stick to active voice wherever possible. Avoid things like “there is” and “it is suggested,” etc.
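For readers of this thread, a rough sketch of the string check at the start of that decoding algorithm, as I understand it: a string CID that is 46 characters long and starts with `Qm` is treated as v0 (a bare base58btc multihash); anything else goes through multibase decoding. The function name `looks_like_cid_v0` is hypothetical, and the check only inspects the string's shape; it does not validate the base58 body.

```python
def looks_like_cid_v0(cid: str) -> bool:
    """Return True when a string has the shape of a CIDv0.

    CIDv0 is a base58btc-encoded SHA-256 multihash with no multibase prefix,
    so it is always 46 characters long and begins with "Qm".
    """
    return len(cid) == 46 and cid.startswith("Qm")


def describe(cid: str) -> str:
    """Label a CID string as v0 or as something that needs multibase decoding."""
    if looks_like_cid_v0(cid):
        return "v0: decode the whole string as base58btc"
    return "not v0: read the multibase prefix, then the version field"


# The base32 example CID from this review carries a multibase prefix ("b"),
# so it is not v0.
print(describe("bafkreidgubc3iuqqfrm5qqhmbf6vtwkgpyj2h42pmskokop72mwbxm27da"))
```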
License: MIT Signed-off-by: Randall Harmon <rjharmon0316@gmail.com>
OK, I took another stab and I think I resolved all the concerns you expressed.
content/guides/concepts/cid.md
Outdated
<!-- TODO: explain more of the details of how CID v1 is composed here. -->
CIDs are based on the content's [cryptographic hash](concepts/hashes). As a result, any difference in content will produce a different CID. Any IPFS node having the content will be able to match the hash and be able to retrieve the original content. |
Fact-check: will that node be able to match the hash if it has the content?
Mostly. The caveat here is that the node has to have the content stored with the same hash algorithm (NOT the same CID — work is currently happening to make sure different base-encodings of the same hash will point to the same content on disk in ipfs/kubo#5231 — and the codec part of a CID is not involved in lookup, it’s only a hint for parsing the content once it’s found and received).
For example, say you have the content `"Hello"`. If you hash it with SHA-256, you can make differently encoded CIDs of that:
- Base 58 (default): `zb2rhdYtXM8X3Jfsm6VrmXnmcSqtfgHZbhYRJ32ENkmARL78K`
- Base 32: `bafkreidgubc3iuqqfrm5qqhmbf6vtwkgpyj2h42pmskokop72mwbxm27da`

In current versions of IPFS, they will be treated as totally separate content. In the next version (see issue linked above), they’ll match the same content. However, if you used a different hash algorithm:
- SHA-256 Base 58: `zb2rhdYtXM8X3Jfsm6VrmXnmcSqtfgHZbhYRJ32ENkmARL78K` (same as above)
- SHA-512 Base 58: `zB7NCgi6ywUVNus2k55hDjeyHPmJrtY3eMB8S9T2Unjyp9Lc3H9yvTq6b4nXieFCcg4awXzKRrSo5GULNR1TMtSbNhGKp`

Those will not match the same content, because they use a different hashing algorithm.
Maybe instead of the last sentence, say something like:
Each IPFS node keeps a list of hashes for all the content it stores; when it receives a request for a CID, it extracts the hash from the CID and checks it against the list, then returns the associated content if it’s found.
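The lookup described above can be sketched as follows. This is a toy model under stated assumptions, not how any IPFS implementation stores blocks: `HashIndexedStore`, `multihash_of`, `encode_cid`, and the base58 helpers are hypothetical names, the store hashes only with SHA-256, and only the `z` (base58btc) and `b` (base32) multibase prefixes are handled. It models the intended behavior, where the base encoding and codec are ignored during lookup.

```python
import base64
import hashlib

B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data: bytes) -> str:
    n = int.from_bytes(data, "big")
    out = ""
    while n:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    zeros = len(data) - len(data.lstrip(b"\x00"))  # leading zero bytes become "1"
    return "1" * zeros + out


def b58decode(text: str) -> bytes:
    n = 0
    for ch in text:
        n = n * 58 + B58_ALPHABET.index(ch)
    zeros = len(text) - len(text.lstrip("1"))
    return b"\x00" * zeros + n.to_bytes((n.bit_length() + 7) // 8, "big")


def encode_cid(binary: bytes, prefix: str) -> str:
    """Render binary CID bytes as a multibase string ('z' or 'b' only)."""
    if prefix == "z":
        return "z" + b58encode(binary)
    if prefix == "b":
        return "b" + base64.b32encode(binary).decode("ascii").lower().rstrip("=")
    raise ValueError("unsupported multibase prefix in this sketch")


def read_varint(data: bytes, pos: int) -> tuple:
    value, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return value, pos
        shift += 7


def multihash_of(cid: str) -> bytes:
    """Strip the multibase prefix, version, and codec; return the multihash bytes."""
    prefix, body = cid[0], cid[1:]
    if prefix == "z":
        raw = b58decode(body)
    elif prefix == "b":
        padded = body.upper() + "=" * (-len(body) % 8)
        raw = base64.b32decode(padded)
    else:
        raise ValueError("unsupported multibase prefix in this sketch")
    _, pos = read_varint(raw, 0)    # CID version (not used for lookup)
    _, pos = read_varint(raw, pos)  # codec (only a parsing hint, not used for lookup)
    return raw[pos:]


class HashIndexedStore:
    """Toy block store keyed by multihash rather than by full CID."""

    def __init__(self):
        self._blocks = {}

    def put(self, content: bytes) -> bytes:
        # 0x12 is the multihash code for SHA-256; 32 is the digest length.
        multihash = bytes([0x12, 32]) + hashlib.sha256(content).digest()
        self._blocks[multihash] = content
        return multihash

    def get(self, cid: str):
        return self._blocks.get(multihash_of(cid))


store = HashIndexedStore()
binary_cid = bytes([0x01, 0x55]) + store.put(b"Hello\n")  # v1, raw codec
# Two different strings, one multihash, one stored block.
print(store.get(encode_cid(binary_cid, "z")) == store.get(encode_cid(binary_cid, "b")))
```

A CID built from a SHA-512 digest of the same bytes would miss in this store, which matches the caveat about hash algorithms.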
This looks pretty good to me! I left some feedback on your question — let me know if you plan to make changes or if I should merge as-is.
The behavioral details you mention do make sense, though it could be argued that they can muddy up the key concepts we're trying to clarify. I'll push a patch that I think drives people to a right understanding that can be stable even as some of these technical particulars may evolve. Please let me know what you think, thanks.
I think that sounds good. Do you feel like those two points are important enough to be called out more clearly with a bulleted list?
License: MIT Signed-off-by: Randall Harmon <rjharmon0316@gmail.com>
ok, I adjusted it as you suggested
Oh no! I was not trying to make a suggestion. I was actually asking a question.
:) I almost bulleted them the first time 'round.
Oh! All good then 👍
This includes a variety of small tweaks to spacing, typographical symbols, links, and minor changes to tighten up the language just a bit. Fixes ipfs-inactive#95. License: MIT Signed-off-by: Rob Brackett <rob@robbrackett.com>
Thanks so much for this (and for keeping with it after my absurdly slow review), @rjharmon!