Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement two-stage pour proofs. #43

Closed
nathan-at-least opened this issue Jan 6, 2015 · 18 comments
Closed

implement two-stage pour proofs. #43

nathan-at-least opened this issue Jan 6, 2015 · 18 comments
Labels
A-circuit Area: zk-SNARK circuits A-crypto Area: Cryptography C-research Category: Engineering notes in support of design choices I-performance Problems and improvements with respect to performance libzcash not in 1.0

Comments

@nathan-at-least
Copy link
Contributor

Issue by nathan-at-least
Tuesday Dec 02, 2014 at 18:17 GMT
Originally opened as Electric-Coin-Company/libzerocash#6


The prove time is large enough that users may perceive this as a drawback compared to competing cryptocurrencies. It may be possible to split the proof into two stages, where the early stage can be computed immediately upon receipt before the subsequent send details are known.

If this is possible and the early stage is a large enough fraction of the total proof time, then it would be a major usability improvement to compute the first stage immediately upon receipt. This early stage would overlap with block confirmation time, which users already understand and expect. Additionally, this would mean the delay to spend would be perceptually close to other bitcoin-likes.

@nathan-at-least nathan-at-least added enhancement C-research Category: Engineering notes in support of design choices A-crypto Area: Cryptography I-performance Problems and improvements with respect to performance labels Jan 6, 2015
@nathan-at-least
Copy link
Contributor Author

Comment by daira
Wednesday Dec 03, 2014 at 01:24 GMT


The problem with computing the proof on receipt of coins is that at that point you only know the current root of the Merkle tree over coin commitments. (The proof that the input coins are in this tree takes up the overwhelming majority of the ZK circuit size and therefore the proving time.) To maximise the anonymity set of coins that the transaction could have come from, you want to use the most recent Merkle tree root available at the time of the spend. So splitting the proof is going to leak information about when you received the input coins.

@nathan-at-least
Copy link
Contributor Author

Comment by daira
Wednesday Dec 03, 2014 at 01:38 GMT


BTW I totally believe that we could choose a conservatively secure hash function for the Merkle tree over coin commitments (that's a mouthful so let's start calling it the cm-tree) that is at least 10 times faster in QAP circuits than SHA-256 compression (#3). If we did that then the POUR proof time would be no greater than the time for a credit card transaction (reducing the confirmation latency is a separate problem), so we wouldn't need to split the proof.

@nathan-at-least
Copy link
Contributor Author

Comment by imichaelmiers
Wednesday Dec 03, 2014 at 01:46 GMT


One option we’ve considered at times is letting you do the snark proof with a AND r=R_1 OR r=R_2 …. tacked on. I.e. you prove that the root you’re proving membership in is one in a list of the last x roots. The practical question is how many gates this costs and therefore how long a list we allow . The past 24 hours( so 144), the past week?

Ian
On Dec 2, 2014, at 8:24 PM, Daira Hopwood notifications@github.com wrote:

The problem with computing the proof on receipt of coins is that at that point you only know the current root of the Merkle tree over coin commitments. (The proof that the input coins are in this tree takes up the overwhelming majority of the ZK circuit size and therefore the proving time.) To maximise the anonymity set of coins that the transaction could have come from, you want to use the most recent Merkle tree root available at the time of the spend. So splitting the proof is going to leak information about when you received the input coins.


Reply to this email directly or view it on GitHub.

@nathan-at-least
Copy link
Contributor Author

Comment by tromer
Wednesday Dec 03, 2014 at 03:17 GMT


How does the "AND r=R_1 OR r=R_2 …." extension help? You'd still need to provide the list of recent roots as an input to the prover, and at that time you might as well provide just the latest root.

@nathan-at-least
Copy link
Contributor Author

Comment by amiller
Wednesday Dec 03, 2014 at 03:19 GMT


How about revealing in plaintext which cm-tree you are proving against,
and accepting the proof as long as it refers to a cm-tree within the last
24 hours?

On Tue, Dec 2, 2014 at 10:17 PM, Eran Tromer notifications@github.com
wrote:

How does the "AND r=R_1 OR r=R_2 …." extension help? You'd still need to
provide the list of recent roots as an input to the prover, and at that
time you might as well provide just the latest root.


Reply to this email directly or view it on GitHub
Electric-Coin-Company/libzerocash#6 (comment)
.

@nathan-at-least
Copy link
Contributor Author

Comment by imichaelmiers
Wednesday Dec 03, 2014 at 03:26 GMT


I think it doesn’t and i conflated two things. 1) was something we discussed a while ago about not needing to get updates on the tree. This allows you to just store an old root. but you still need the new roots and have to compute the proof at that point. This isn’t useful really.

There’s then a different idea, that you compute most of the proof at a given point and then compute the last part specifying which root your using at the end. Alternately, you have one proof for root x, call it \pi. and then a proof \pi’ that there exists a proof \pi for some root x where x is in a list. You only ever publish the second proof.

That strikes me as far more complicated .

Ian
On Dec 2, 2014, at 10:17 PM, Eran Tromer notifications@github.com wrote:

How does the "AND r=R_1 OR r=R_2 …." extension help? You'd still need to provide the list of recent roots as an input to the prover, and at that time you might as well provide just the latest root.


Reply to this email directly or view it on GitHub.

@nathan-at-least
Copy link
Contributor Author

Comment by tromer
Wednesday Dec 03, 2014 at 03:35 GMT


Andrew: as Daira wrote above, that leaks information about when you received the input coins.

Ian's second idea shoild work, but requires bootstrapping (running the 2nd SNARK on a statement that includes verifier of the 1st SNARK), so a lot more code, a slower curve, and probably a larger statement so there's no win overall. Not advisable.

A variant on that is to just provide two proofs. What would work is a variant on Ian's 2nd idea above: in the first stage, do the initial proof with respect to a commitment, call it cr, to the then-current root. Then, in the second stage, create a 2nd SNARK proof to the statement "I know an opening of cr which is among the N recent roots". The latter can be much faster than the original SNARK, so latency's much better. But you need to send both proofs, so the transaction grows by ~288 bytes.

Either way, have we solved the problem of having to commit to the parameters of the Pour (e.g., amount) already at the first stage?

@nathan-at-least
Copy link
Contributor Author

Comment by amiller
Wednesday Dec 03, 2014 at 03:43 GMT


I don't think this point has been mentioned yet:
Another reason to allow some slack in which cm-tree you refer to in
your proof is that if a new block arrives in between when you start
generating the proof and before your transaction propagates, then your
transaction will be "stale" and need to be (partially) regenerated. This
would a) add extra work b) delay getting your transaction included.

On Tue, Dec 2, 2014 at 10:35 PM, Eran Tromer notifications@github.com
wrote:

Andrew: as Daira wrote above, that leaks information about when you
received the input coins.

Ian's second idea shoild work, but requires bootstrapping (running the 2nd
SNARK on a statement that includes verifier of the 1st SNARK), so a lot
more code, a slower curve, and probably a larger statement so there's no
win overall. Not advisable.

A variant on that is to just provide two proofs. What would work is a
variant on Ian's 2nd idea above: in the first stage, do the initial proof
with respect to a commitment, call it cr, to the then-current root.
Then, in the second stage, create a 2nd SNARK proof to the statement "I
know an opening of cr which is among the N recent roots". The latter
can be much faster than the original SNARK, so latency's much better. But
you need to send both proofs, so the transaction grows by ~288 bytes.

Either way, have we solved the problem of having to commit to the
parameters of the Pour (e.g., amount) already at the first stage?


Reply to this email directly or view it on GitHub
Electric-Coin-Company/libzerocash#6 (comment)
.

@nathan-at-least
Copy link
Contributor Author

Comment by imichaelmiers
Wednesday Dec 03, 2014 at 04:41 GMT


that we currently allow in the software. The tx points to a block which contains the tree toot used for the proof.
so you can use any root, provided it’s in the block chain. (each block embeds the current root as of that block)
Ian
On Dec 2, 2014, at 10:43 PM, Andrew Miller notifications@github.com wrote:

I don't think this point has been mentioned yet:
Another reason to allow some slack in which cm-tree you refer to in
your proof is that if a new block arrives in between when you start
generating the proof and before your transaction propagates, then your
transaction will be "stale" and need to be (partially) regenerated. This
would a) add extra work b) delay getting your transaction included.

On Tue, Dec 2, 2014 at 10:35 PM, Eran Tromer notifications@github.com
wrote:

Andrew: as Daira wrote above, that leaks information about when you
received the input coins.

Ian's second idea shoild work, but requires bootstrapping (running the 2nd
SNARK on a statement that includes verifier of the 1st SNARK), so a lot
more code, a slower curve, and probably a larger statement so there's no
win overall. Not advisable.

A variant on that is to just provide two proofs. What would work is a
variant on Ian's 2nd idea above: in the first stage, do the initial proof
with respect to a commitment, call it cr, to the then-current root.
Then, in the second stage, create a 2nd SNARK proof to the statement "I
know an opening of cr which is among the N recent roots". The latter
can be much faster than the original SNARK, so latency's much better. But
you need to send both proofs, so the transaction grows by ~288 bytes.

Either way, have we solved the problem of having to commit to the
parameters of the Pour (e.g., amount) already at the first stage?


Reply to this email directly or view it on GitHub
Electric-Coin-Company/libzerocash#6 (comment)
.


Reply to this email directly or view it on GitHub.

@nathan-at-least
Copy link
Contributor Author

Comment by daira
Thursday Dec 04, 2014 at 14:13 GMT


@imichaelmiers Part of your previous comment seems to have been lost, can you edit it to restore that part?

@nathan-at-least nathan-at-least changed the title Investigate two-stage pour proofs. implement two-stage pour proofs. Aug 24, 2015
@daira daira added the A-circuit Area: zk-SNARK circuits label Mar 1, 2016
@daira
Copy link
Contributor

daira commented Mar 15, 2017

There's a way to do this in conjunction with #2171:

Separate the note commitment Merkle tree into a "large" tree and a "small" tree. The small tree is used cyclically; when each half of it has been used, that branch is transplanted to the large tree (this would happen moderately often, say every few hours). So it is always the case that a given commitment can be found in either the small tree or the large tree. All nodes keep track of the current large and small trees. We aim to make it indistinguishable which tree a given JoinSplit is using. Note that the large tree root only changes when a branch is transplanted.

Also split the Merkle tree proof into stages as per #2171, such that the path height proven by one of the stages is the height of the small tree dsmall, and the other stage(s) make up the remaining height of the large tree, i.e. they prove a path of height dlarge - dsmall.

Each proof is tied to a "proof-linking commitment". Some commitments in the large tree will be old enough that the leafmost subtree of height dlarge - dsmall containing them is stable, i.e. its internal root will not be affected by further additions to the large tree. In that case, the proof for that subtree can be cached indefinitely (together with the opening of its proof-linking commitment). Note that these cached proofs are not dependent on the current overall root of the large tree.

In other cases, a commitment will be new enough that it is contained in the small tree. In that case it is sufficient to use a proof that it is in the small tree combined with dummy proof(s) (which can be used, but only once, for any commitment), for the other stage(s).

The input stage of the JoinSplit (per #2171) controls which other proofs are needed. To link itself to a particular proof, it proves knowledge of the opening of the corresponding proof-linking commitment. So, to perform a JoinSplit, we always need the input stage and the small-tree stage (these are dependent on the current small-tree root which changes frequently), but we can often use a cached or dummy proof for the other stage(s).

@daira
Copy link
Contributor

daira commented Mar 15, 2017

Suppose for example that the small tree has height 15, and the large tree has height 29. Every 214 commitments, we transplant that many commitments –half of the small tree– into the next subtree of the large tree, and update the large tree's root.

One Merkle tree proof stage either covers the leafmost 14 layers of a path in the large tree, or is a dummy. Since leafmost subtrees of size 214 in the large tree have a stable root, these proofs are also stable, and can be cached indefinitely with the witness for each commitment in the large tree.

The other Merkle tree proof stage covers either the rootmost 15 layers of a path in the large tree, or the whole path in the small tree. It is given both tree roots as public input, but the choice of which tree to use is a private input. In the case where it is proving a path in the small tree, it is paired with a dummy proof for the other stage.

Since each shielded transaction uses two commitments as a lower bound, and taking the current rate of shielded transactions according to explorer.zcha.in, it would take about 10 days between each update of the large tree. If clients use the next-to-last large tree root in their anchors, then this anchor will always be stable across reorgs (even reorgs that happen just after a large tree update), and they are still guaranteed that a given commitment can either be referenced by that anchor, or is still in the small tree. (This is the motivation for only transplanting half of the small tree at once.) They also have plenty of time to build up a buffer of dummy proofs with the right large-tree anchor, since it is known in advance of being needed.

If the note commitment usage rate were 1000 times higher, the large tree anchor would still only change every ~14 minutes, which is a little short but still workable. (There is no correctness problem no matter how high the commitment usage rate; it's just that dummy proofs are not valid for very long in that case.)

@daira
Copy link
Contributor

daira commented Mar 16, 2017

Credit to Sander and Ta–Shma 1999 for the inspiration to use multiple roots (section 3.2 of that paper, although the details are somewhat different). In particular, when they say "The user proves in a zero knowledge way that he knows a hash chain to one of the roots in the set.", that's basically the same as what we're doing here, although the use of cached proofs of partial Merkle paths is new.

@daira
Copy link
Contributor

daira commented Mar 21, 2017

I wrote:

The input stage of the JoinSplit (per #2171) controls which other proofs are needed.

It's actually better for the input stage to only control which note commitment is selected (i.e. to make a commitment to cmold). This is sufficient for security but may allow the input stage to be precomputed in some cases.

Correction: it also needs to make a commitment to the intermediate Merkle tree hash between the two Merkle stage proofs.

@daira
Copy link
Contributor

daira commented Mar 22, 2017

If we keep the input and output stages together, the number of hashes needed for that is 15. We also need 16 hashes for each of two Merkle tree stages, giving 46, an overhead of 5 hashes over #647 alone. For this (and 592 bytes overhead in JoinSplit description size for the two extra proofs), we get stage circuits about 16/42 = 38% the size of a #647 circuit, drastically cutting down on the memory usage. The two Merkle tree stages can share the same circuit, so the total proving key size is reduced by about 26% relative to #647. Also we can precompute and cache one, and possibly two, of the three stage proofs for each JoinSplit.

@arielgabizon
Copy link
Contributor

arielgabizon commented Mar 26, 2017

@daira : you wrote

We aim to make it indistinguishable which tree a given JoinSplit is using.

Is there a method to do this such that we still always gain something? It seems in the case the transaction is in the large tree we do not.
Can you perhaps summarize what we gain and what we pay in return using this method? (I'm guessing we need more than one SNARK proof?)

@daira
Copy link
Contributor

daira commented Mar 26, 2017

@ariel: We do gain in that case, because we cache the proof for the leafmost half of the large Merkle tree.

What we pay:

  • There are three zk-SNARK proofs instead of one. So we pay 592 bytes per JoinSplit on proof size (with the current encoding and proof system), plus 96 bytes for the extra proof commitments and 32 bytes for the extra Merkle tree root. The JoinSplit description size for single pour input, value commitment scheme #647 is 1786 bytes, so this is a size overhead of (592 + 96 + 32) / 1786 = 40.3% for a JoinSplit (less proportionally for a transaction).
  • We pay a small overhead (5 SHA256Compress hashes) in the total circuit size –and therefore proving effort– per JoinSplit.
  • We pay in engineering complexity of linking the proofs, caching them, maintaining two commitment trees, and security analysis.

What we gain:

  • The two Merkle tree stages are the same, so the proving key size is reduced by about 26% relative to single pour input, value commitment scheme #647 (assuming it is proportional to the total size of unique proof circuits).
  • The size of the proving key per proof stage, which currently accounts for most of the proving-time memory usage, is reduced by about 62% relative to single pour input, value commitment scheme #647. So even without streaming, the memory usage for each proof is reduced accordingly. Note that streaming would introduce its own overheads and complexity. (More capable machines with a lot of parallelism and memory could still do the proofs in parallel if that helps.)
  • About a third of the JoinSplit proof effort can always be precomputed. Once a commitment enters the large tree, then a proof can be cached with its witness, and then never has to be updated. If a commitment is used while it is still in the small tree, then it is possible to use a precomputed dummy proof, from a pool of such proofs that do not depend on the anchor or the commitment.
  • A further third of the JoinSplit proof effort may be easier to precompute, because it does not depend on an anchor that changes every block. This depends on whether the parameters of the JoinSplit are known in advance.
  • Both Merkle tree proof stages can be outsourced without giving away any private key information.

The discussion above assumes that the input and output sections are not further split. (Splitting them is not compatible with the current version of the shielded multisig proposal in #782.)

@daira
Copy link
Contributor

daira commented Sep 1, 2018

Since memory usage of proving is no longer a serious issue after Sapling, the complication of this approach seems unnecessary.

@daira daira closed this as completed Sep 1, 2018
buck54321 pushed a commit to buck54321/zcash that referenced this issue Sep 13, 2023
Co-authored-by: cryptobubbles <cryptobubbles@protonmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-circuit Area: zk-SNARK circuits A-crypto Area: Cryptography C-research Category: Engineering notes in support of design choices I-performance Problems and improvements with respect to performance libzcash not in 1.0
Projects
None yet
Development

No branches or pull requests

3 participants