-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provide Way of Representing CidV0 objects in an alternative base #34
Comments
I've been thinking about this too, for the case where we want to represent a CIDv0 in base32 in a URL. The less obvious thing here is that we probably need to be able to recognize such CIDs and convert them back to CIDv0, so that DHT and Bitswap return the expected results. Do you agree this is a concern? @whyrusleeping was skeptical when I first brought it up a while ago. It could be done with a multicodec header that says "what's in here is CIDv0". |
For those cases, we upgrade the CID to CIDv1 and base encoded it with base32. |
Yes, and I think that should just be the byte '0x00' which is the variable-length encoding of the number 0. Hence the equivalent of setting the version number to 0.
The ideas is we want to be able to retrieve a Cidv0 object using base32. Upgrading the CID will not work in this case. |
The rules for displaying CidV0 in an alternative base can be as follows:
This should be fairly easy to implement, I can create a p.r. later today if it will make my idea clearer. |
@diasdavid what Kevin and I are thinking about is the next step after receiving a gateway request with a CIDv0-in-CIDv1: content routing and bitswap, where we need the original CIDv0 because that's what provider records are published for, and what's used as a key in the blockstore of the provider nodes. It's kind of a matter of backward compatibility: there will always be nodes who continue to create CIDv0 stuff. |
@kevina I'd go a step further and start using the CIDv1 formatted CIDv0s by default. I believe that will actually be less work and, given that we're planning on switching to base32 by default anyways, shouldn't cause any additional breakage. Does that make sense or am I missing something? |
@Stebalien What exactly will be a CIDv1 formatted CIDv0? The base used to encode a CID is for the most part an interface issue. The base is is not stored in the key of objects stored as part of a link or in the blockstore. The Cid version and hash used is, on the other hand, part of the key. You can't retrieve an Cidv0 object if it is reformatted to be Cidv1 and sometimes you may still want to store a Cidv0 key in another object. |
The |
@Stebalien sorry I misinterpreted your reply (as I am prone to do The idea behind not returning |
I kind of agree... I'd just like to phase out non-multibase strings as much as possible as soon as possible. But, I guess, if we move to base32 by default we'll be doing that anyways. |
It turns out I made this more complicated then necessary. The CidV0 bit is NOT needed. Just prefixing a CidV0 string with a multibase prefix will already work as expected. For example The only problem is this little bit of code (which gave me the impression that a CidV0 are incompatible with multibase) func (c *Cid) StringOfBase(base mbase.Encoding) (string, error) {
switch c.version {
case 0:
if base != mbase.Base58BTC {
return "", ErrInvalidEncoding
}
return c.hash.B58String(), nil
case 1:
return mbase.Encode(base, c.bytesV1())
default:
panic("not possible to reach this point")
}
} which should likely work on CidV0 string and return a string with the multibase prefix if the code is not 'z'. Thus I propose we fix this function and call it a day. 😄 |
Here's an update to the spec to take this into account: multiformats/cid#14 |
There are two issues here. One is:
a) This was accepted as kind of fine since that is also a valid concern for when someone picks different hashes or bases for their content. Essentially, the content should be identified by it's multihash and not by a key that includes type and multibase. If the hash matches, it is the same content. If an app wants to interpret content "asdldasjdkla" as multicodec A and another app as multicodec B, that's fine. This way we ensure deduplication b) This is what I think will be the be crux. From memory, the Lisbon Treaty (the 2~3 hack nights that me, @jbenet and @whyrusleeping spent figuring all of the details out), the actual solution for this problem was not to use CIDs as Keys in the Repo, instead, keep using the multihashes as it was before. This would solve this issue from the bottom because independently of reading CIDv0 or CIDv1, the repo would always pick up the block from the multihash and that would be the same for both. Right now, go-ipfs uses base32.encode(rawCID) to get the keys for storing blocks in the repo. Eventually, js-ipfs followed so that both repos are compatible. However, I can't recall if there was a valid reason for this change. Please help me refresh my and everyone's memory if you know where that was discussed and approved. Using
|
@diasdavid Thanks for this. Was this tidbit of information documented somewhere? Based on what you are saying all network requests for blocks should be made using the multihash only and not the full CID. I agree that we should be storing blocks using just the multihash. The change to base32 was made before the CID proposal to fix ipfs/kubo#2601, some additional discussion was done here ipfs/go-datastore#39, and the change was made in ipfs/kubo#2903. It should be rather simple to store just the multihash component in the datastore. The biggest pain will be the repo migration. |
Agreed and I believe it was, at least 2 times. Post Lisbon Treaty and somewhere in a go-ipfs PR. It should have been, however, done through the IPFS Specs Repo.
The CID can go on the wire (because of in the future, with graphswap that will be useful), however, the assertions should be done in the hash level. If the Repo was checking by Hash, that will be granted by default. |
@diasdavid you replied before I edited my sentence 😄 |
This was also cleared up but after Lisbon (I think this point might not have been considered , without this it would be impossible to enumerate blocks given only repo for example as in case of pinset loss, it has saved us once already). This also means that the data isn't self descriptive in storage in blockstore. It might be possible to store it aside (in other part of the datastore) but it would require us to reevaluate the leveldb (it might just not scale to number of entries we keep). EDIT: I had this comment sitting in comment windows for ~1h |
@diasdavid I fully agree with all your points. Nevertheless, I still think it will be useful to be able to represent CidV0 in an alternative base. For example if we switch to Base32 users might still have a need to store a CidV0 string as part of a link in a dag. Also converting a CidV0 link in a dag to a Cidv1 for display in same cases can cause problems. If we don't allow multibase prefixed for CidV0 then it will be base32 for everything but the case where CidV0 is still required. That seams odd to me. |
Could I have a link to the "Post Lisbon Treaty"? |
I want too |
I would like to renew this discussion. I don't think that supporting CIDv0 in multibase is wise, it will linger forever everywhere. We should flesh out clear upgrade path that will lead to us not using CIDv0 internally anymore. Quick example of possible upgrade path:
This way we can get CIDv0 mostly out of our codebases (apart from UI layer) and upgrade to new protocol. If we don't do this soon enough we will be stuck with it forever and it decreases performance, increases complexity and reduces our velocity (complexity is bane of moving protocols forward). We need a clear way to move forward with CIDv1 and how finally embrace it fully. |
I do not agree with this and don't see the argument as "it will linger forever then everywhere" very convincing. |
We do want to move away from supporting cidv0. Adding new features to it seems unwise if thats really our goal. I'm not really sure what a valid usecase for this is, as @diasdavid said earlier, we should just upgrade them to cidv1 and provide an internal conversion layer in cases where it might be needed. |
@Kubuxu ideally, the datastore wouldn't care about the CID, just the multihash. |
I will repeat by earlier comment #34 (comment) so its not lost in the noise:
|
Also, as a side note, we can not fully convert CidV0 Dag objects to CidV1 without changing the hash of the object as the links inside will also need to be converted to CidV1. If we do this it will no longer be possible to look up old content. If we don't do this we will need to support CidV0 forever, at least to some capacity. |
I do not see why this will be necessary. Pinned hashes will still be stored by the full CID. The GC does not care about the structure of anything not pinned. |
@kevina you're right. |
In addition to all the points I said above I would also like to add that not allowing CidV0 to be represented in Base32 breaks abstraction and creates this weird special case. As I see it multibase is something that is independent of CID. That is it can be used to encode any sort of (short) binary string. If we first convert to binary and then decode the CID then any sort of base will be permissible (as the multibase code has no idea it is decoding a CID, and the CID decoder just sees the binary string), however if we do it all at once then the CID decoder will reject it. I fully agree with should work to migrate away from CIDv0, however for this reason and reasons in previous comments I think not allowing CIDv0 to be encoded using multibase creates unnecessary complications. In all fairness I did think of one valid use case for not allowing multibase prefixed for CIDv0 that is to distinguish raw multihashes from multihashes that are also CIDv0. That is we switch to storing raw multihashes in the datastore we will need to list them to the user. We could in this case list them with a multibase encoding and define any multihash with a multibase prefix as a raw multihash and not a CIDv0. However, as I see it this is a fragile distinction and it still doesn't solve the problem on how to handle working with existing CIDv0 stored as link in objects. |
I'm not sure, but maybe this is something that we can work through in the upcoming developers meeting. See ipfs-inactive/developer-meetings#16. |
Closing:
To follow migration to CIDv1 (default base32) see: https://github.com/ipfs/ipfs/issues/337 |
It may be useful to represent a CidV0 object in an alternative base. For example if we need to include it in as the domain part of a URL.
I propose we provide allow a non non-standard encoding of:
<multibase prefix><version><multihash>
where version is always 0.
Related to ipfs/kubo#4143.
Note: I made this more complicated than it needs be, see this comment: #34 (comment)
The text was updated successfully, but these errors were encountered: