Skip to content
This repository has been archived by the owner on Feb 15, 2023. It is now read-only.

Added padding to Poseidon duplex #26

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

rozbb
Copy link

@rozbb rozbb commented Aug 24, 2021

Description

The Poseidon duplex construction in this crate was trivially insecure. I wrote a regression test, test_collision, that demonstrates two absorb inputs which produce the same squeeze output. The problem was that there was no padding applied to any absorbed messages.

As mentioned in the duplex paper a "sponge-compliant" padding scheme (Definition 1) is crucial to the security claims of the sponge and duplex constructions. So I went ahead and adapted the multi-rate padding function (Section 7) to a finite field setting. The only caveat is that this doesn't work when F = Z/2Z. I assume that's not an issue though. If it is, I can make another version that ends up being even less efficient, requiring all Poseidon rates to be at least 3 in order to function.

Two things that still need to be done (and see Zcash's impl for a reference):

  1. Don't chunk the contents of absorb and feed it all in. As the paper specifies,absorb's input len should be limited to the maximum that can fit in a rate-sized block, after padding. The current auto-chunking trick makes it so that absorb(a || b) and absorb(a), absorb(b) yield the same state. This isn't good. Instead, the duplex needs to limit absorbs and force the user to figure out message packet system that disambiguates the things that they care about. STROBE disambiguates by prepending every new operation with a special message.

  2. Don't allow squeeze to output more than rate bytes. This is also specified by the paper. I'm actually less confident about this suggestion because it seems fine to simply permute() when bytes run out. This is because, by the padding principle, the input will never collide with an actual message. Concretely, this mechanism still satisfies the injectivity claim proven by Lemma 4 in the paper. Also, if this functionality is removed, I'm not certain how one would get more bytes out of a stream.

Finally, I commented out test_poseidon_sponge_consistency because I couldn't figure out how to re-derive the new expected values for the output. That's an easy fix if you know which traits give you a base-10 representation of your field elements (which I do not).


Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

  • Targeted PR against correct branch (master)
  • Linked to Github issue with discussion and accepted design OR have an explanation in the PR that describes this work.
  • Wrote unit tests
  • Updated relevant documentation in the code
  • Added a relevant changelog entry to the Pending section in CHANGELOG.md
  • Re-reviewed Files changed in the Github PR explorer

@weikengchen
Copy link
Member

weikengchen commented Aug 24, 2021

We will need to examine the changes in this PR further and likely consider adding a Poseidon duplex wrapper that is less "raw" than what arkworks-rs currently has.

I agree that:

  • The original sponge is too raw, and if used inappropriately, it is not a secure duplex sponge. E.g., as you mention, H(a) = H(a || 0) and absorb(a || b) = absorb(a), absorb(b). These are not random.
  • With careful wrapping (mostly padding some length-related information), the properties above can be removed, and it can be used more securely.

The issue is this:

  • The new sponge does add a lot of overhead.
  • For a few upstream applications currently using Poseidon, such as Merkle tree and Marlin AHP, the original sponge works fine (since we provide fixed-length messages). But I agree this is risky as well. I am thinking about how large the performance impact it would cause, but should be manageable.

The second point you mentioned, regarding the limitation of the output, may be not necessary. Based on the paper, it seems that it limits the amount of bytes (in our case, field elements) that can be outputted "each round". But the sponge implementation in arkworks-rs permutes the state before outputting more bytes (so, the sponge enters the next round), so it should be fine---but padding is necessary to show that this round is different from other rounds, in that it has zero input.

@rozbb
Copy link
Author

rozbb commented Aug 25, 2021

These are fair points. Some thoughts:

  1. It's entirely unclear to me how to make any iteration of this hard to misuse for an end user. Forcing small message sizes is one thing, but this risks the user breaking up larger messages into smaller ones without adding any metadata (this is a collision). Or you could not pad anything until the last input block, in which case you get the same problem but at least you save the overhead you're talking about.
  2. One immediate thing that could be done re padding is to have a separate padding function that the user defines. This is what the zcash impl does. In the constant-length case, the padding function would be a no-op (I think? check me on that)

@rozbb
Copy link
Author

rozbb commented Aug 25, 2021

Also, st4d pointed me to a specification draft for this construction: C2SP/C2SP#3

@hdevalence
Copy link
Contributor

We will need to examine the changes in this PR further and likely consider adding a Poseidon duplex wrapper that is less "raw" than what arkworks-rs currently has.

I would actually suggest that the problem with the current API is that it is both "too raw" and "not raw enough". On the one hand, it's not yet safe for general use, but on the other hand, it doesn't expose the underlying permutation in a way that would allow implementing different constructions (fixed-length hashing, duplex variants, etc) on top.

I think a path forward for this would be to split the current code into two levels, as suggested here: arkworks-rs/crypto-primitives#93

Then, the crate can provide a general-purpose, safe(ish) duplex API like this one, and allow users who want to optimize performance for fixed messages to use the underlying permutation directly.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants