Skip to content
This repository has been archived by the owner on Jun 20, 2023. It is now read-only.

Make poseidon hashes good hashes #19

Merged
merged 2 commits into from
Feb 22, 2023
Merged

Make poseidon hashes good hashes #19

merged 2 commits into from
Feb 22, 2023

Conversation

hsanjuan
Copy link
Contributor

This hash function is heavily used by Filecoin, making Kubo unable to operate with Filecoin data.

This hash function is heavily used by Filecoin, making Kubo unable to operate
with Filecoin data.
@welcome
Copy link

welcome bot commented Feb 22, 2023

Thank you for submitting this PR!
A maintainer will be here shortly to review it.
We are super grateful, but we are also overloaded! Help us by making sure that:

  • The context for this PR is clear, with relevant discussion, decisions
    and stakeholders linked/mentioned.

  • Your contribution itself is clear (code comments, self-review for the
    rest) and in its best form. Follow the code contribution
    guidelines

    if they apply.

Getting other community members to do a review would be great help too on complex PRs (you can ask in the chats/forums). If you are unsure about something, just leave us a comment.
Next steps:

  • A maintainer will triage and assign priority to this PR, commenting on
    any missing things and potentially assigning a reviewer for high
    priority items.

  • The PR gets reviews, discussed and approvals as needed.

  • The PR is merged by maintainers when it has been approved and comments addressed.

We currently aim to provide initial feedback/triaging within two business days. Please keep an eye on any labelling actions, as these will indicate priorities and status of your contribution.
We are very grateful for your contribution!

@hsanjuan hsanjuan self-assigned this Feb 22, 2023
@hsanjuan hsanjuan requested a review from Kubuxu February 22, 2023 22:23
(additional filecoin hashes)
@hsanjuan hsanjuan merged commit 062f619 into master Feb 22, 2023
@hsanjuan hsanjuan deleted the add-poseidon branch February 22, 2023 23:29
@Jorropo
Copy link
Contributor

Jorropo commented Feb 23, 2023

@hsanjuan have you tried this in Kubo ? I don't see a registered hash in https://github.com/multiformats/go-multihash I think that is just gonna error anyway.

@Jorropo Jorropo mentioned this pull request Feb 23, 2023
@ribasushi
Copy link

@hsanjuan note that POSEIDON_BLS12_381_A1_FC1 is not a verifiable "hash". You need extra parameters that are not contained in the CID to validate that some "blob" corresponds to that "hash". I am not sure it is valid to add it to "good" hashes due to this.

cc @porcuquine @vmx

@hsanjuan
Copy link
Contributor Author

@hsanjuan have you tried this in Kubo ? I don't see a registered hash in https://github.com/multiformats/go-multihash I think that is just gonna error anyway.

I have tried it. It must be registered because this imports go-multihash and works?

@vmx
Copy link
Member

vmx commented Feb 24, 2023

You need extra parameters that are not contained in the CID to validate that some "blob" corresponds to that "hash".

Yes, though I think those parameters are well defined (somewhere :).

@hsanjuan
Copy link
Contributor Author

@hsanjuan note that POSEIDON_BLS12_381_A1_FC1 is not a verifiable "hash". You need extra parameters that are not contained in the CID to validate that some "blob" corresponds to that "hash". I am not sure it is valid to add it to "good" hashes due to this.

cc @porcuquine @vmx

To clarify, things break as soon as one of these CIDs is "seen". This is before the potential step where a block referenced by it is verified (I don't know what error it will throw in that case since these blocks are not retrievable). In general my feeling is that ipfs may not know how to hash a block referenced like this (??), but that doesn't mean the safeguards against "insecure" hashes should kick in. I'm sure this can error at a different place just fine.

@ribasushi
Copy link

To clarify, things break as soon as one of these CIDs is "seen".

But this is terrible... If a block content addressed with sha256 links to an md5 hash and a murmru "hash", there is nothing wrong with that block itself.

@Jorropo @aschmahmann did we graft the verifcid integration incorrectly somewhere...?

@aschmahmann
Copy link

@Jorropo @aschmahmann did we graft the verifcid integration incorrectly somewhere...?

IIUC everything is as it's been in this place for as long as verifcid has existed https://github.com/ipfs/go-blockservice/blob/986faee89be3cef0e742cabda49bdc1d95050d6f/blockservice.go#L232.

To clarify, things break as soon as one of these CIDs is "seen".

@hsanjuan are you sure that's right? Pretty sure this error isn't happening on parse but on before a fetch where you try to pin (i.e. fetch) an entire DAG which includes some blocks go-verifcid doesn't like. Note that even if go-verifcid was fine did kubo would be unable to fetch the over Bitswap due to a failure in https://github.com/ipfs/go-libipfs/blob/6c0f842feed4c8a0ef24d38c94b6e28e622f25d7/bitswap/message/message.go#L222 because there is no hasher registered for go-multihash to use (prefix.Sum calls multihash.Sum which brings us to https://github.com/multiformats/go-multihash/blob/38400376eaefaa6216d075b466037e75437c2838/core/registry.go#L97

It seems like what would happen here if you allowed go-verifcid to allow these hashes you can't understand anyway is that your pin request would just hang until you cancelled it while in the meanwhile the blockservice (and then Bitswap) keep wasting your resources endlessly looking for a block you would never be able to use even if you could find it.

AFAICT there isn't anyone who currently benefits from this change and so reverting it would probably be the correct move. If someone actually runs into a problem here that this PR fixes we can always reapply it. In the meanwhile it's safer and more user friendly to error early than to just hang forever looking for a block you can't possibly get or verify. Is this about right @hsanjuan?

@aschmahmann
Copy link

aschmahmann commented Feb 24, 2023

I'm sure this can error at a different place just fine.

Agreed that it could, for example you could replace most of the calls to go-verifcid.ValidateCIid with something that looks approximately like go.verifcid.ValidateCid(cid) && multihashRegistry.HasHasher(cid.thehashfunction) but is this worth doing at the moment in comparison to say making go-verifcid's collection of good/bad hashes configurable so applications can opt-in/out of what they support (e.g. if someone else doesn't want SHA-1 hashes those should be allowed to error).

Maybe it's more reasonable to have a new error type here alongside "potentially insecure hash function" and complaints about hash size being too big or small for "not a registered hash function". Note: a hash being too big isn't an "insecure one" either, just one that's not larger than reasonable and nobody has yet had a use for.

This would mean go-verifcid is coupled to some go-multihash internals (i.e. global registry) which is unfortunate, although go-verifcid already is a singleton and has a dependency on go-multihash (although one that could be replaced currently with https://github.com/multiformats/go-multicodec).

@hsanjuan do you have a preference for which approach you'd like to take here?

@Kubuxu
Copy link
Member

Kubuxu commented Feb 24, 2023

Under the go-verifcid rules as they were set when go-verifcid was established, these hashes are "good". The primary goal of go-verifcid was to disallow hashes which do not provide second-preimage resistance.
Filecoin hashes, and especially tree structures they use, while having their quirks, provide resistance to that attack.

Answering Riba's comment directly:

note that POSEIDON_BLS12_381_A1_FC1 is not a verifiable "hash". You need extra parameters that are not contained in the CID to validate that some "blob" corresponds to that "hash".

You can verify that some "blob" corresponds to the hash but the padding has to be already applied to the blob.
What is more challenging is recursively exploring the root hash of the tree as it is unknown where the tree structure ends and where the leaves begin.


On the other hand, the purpose of go-verifcid can change so I will leave it to you all as the evolution of Kubo's components is outside of my scope.

@hsanjuan
Copy link
Contributor Author

Note that even if go-verifcid was fine did kubo would be unable to fetch the over Bitswap due to a failure in https://github.com/ipfs/go-libipfs/blob/6c0f842feed4c8a0ef24d38c94b6e28e622f25d7/bitswap/message/message.go#L222 because there is no hasher registered for go-multihash to use

This seems like an issue worth fixing too. Is it only POSEIDON_ that needs special params for hashing? If so, there should be nothing preventing me from adding and retrieving blocks using the other two hashing functions? In general this library seems like the wrong place to make the decision even in the case of Poseidon...

I would expect a pin request to hang when it tries to fetch a block that is not retrievable. That is normal ipfs behavior. What is unexpected is ipfs to error telling me that a hash is insecure without it being insecure, and without actually asking for it to the blockstore or the network.

The "I don't know how to hash" is a better error, but I would expect that ipfs knows how to hash using functions that are actively used in Filecoin land.

Overall, since in practice no one is putting data with such hashes on IPFS yet, it does not matter today. We can leave things as they are. Is there an scenario, however, where someone might put such data on IPFS or we may want it to be retrievable over IPFS? Will FVM have any impact on that?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants