Proper ZK treatment in plonky2 #1625

Closed
wants to merge 23 commits into from

Nashtare (Collaborator)

@LindaGuiga I'm opening a draft PR so that I can comment on the code.

plonky2/src/batch_fri/recursive_verifier.rs (outdated)
plonky2/src/batch_fri/verifier.rs (outdated)
plonky2/src/fri/mod.rs (outdated)
plonky2/src/fri/oracle.rs (outdated)
plonky2/src/fri/recursive_verifier.rs (outdated)
plonky2/src/plonk/proof.rs (outdated)
plonky2/src/plonk/proof.rs (outdated)
plonky2/src/plonk/proof.rs (outdated)
plonky2/src/plonk/prover.rs (outdated)
plonky2/src/util/serialization/mod.rs (outdated)
LindaGuiga marked this pull request as ready for review September 17, 2024 11:21
LindaGuiga requested a review from muursh as a code owner September 17, 2024 11:21
LindaGuiga (Contributor) commented Sep 17, 2024

This PR aims to address #1625, based on this note: https://eprint.iacr.org/2024/1037.pdf.

  • For the batch FRI polynomial, we take a random polynomial with twice the degree of the subgroup, so that we can add a FRI step with arity 2 instead of computing 2 different FRI proofs (for the lower half and higher half of the polynomial, as mentioned in the note).
  • For the quotient polynomial chunks, we reduce the degree n of each chunk to n - h, so that we can add to them random polynomials with degree n.
  • For the third point in #1625 (Proper ZK treatment in plonky2), the current implementation for randomizing the wire polynomials seems to follow the guidelines in the paper. Currently, the degree h is computed as h_1 = D + num_fri_openings for the wire polynomials and h_2 = 2 * D + num_fri_openings for the permutation polynomial Z, where D is the extension degree. The difference between the two values is because the wire polynomials are only opened at zeta, while the Z polynomial is also opened at g*zeta. h_1 is added to all wire polynomials, while h_2 is only added to the routed wires. This is in accordance with the prescription h >= 2 * (D * n_DEEP + n_FRI) (Eq. 13) in the paper, for the case where the quotient chunks are computed the canonical way and randomized. (Note that the factor 2 in Eq. 13 comes from the evaluation at zeta and g*zeta, but we only evaluate the wire polynomials at zeta, as explained above.)
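A minimal sketch of the two blinding degrees described in the last bullet. The function names `h_wires` and `h_routed_wires` are hypothetical (not plonky2's API), and the parameter values used to check them are illustrative only.

```rust
/// Blinding degree for the wire polynomials, which are only opened at zeta:
/// h_1 = D + num_fri_openings. (Hypothetical helper, not plonky2's API.)
fn h_wires(ext_degree: usize, num_fri_openings: usize) -> usize {
    ext_degree + num_fri_openings
}

/// Blinding degree tied to the permutation polynomial Z, which is also opened
/// at g*zeta: h_2 = 2 * D + num_fri_openings. Per the comment, it is added
/// only to the routed wires. (Hypothetical helper, not plonky2's API.)
fn h_routed_wires(ext_degree: usize, num_fri_openings: usize) -> usize {
    2 * ext_degree + num_fri_openings
}
```

The two values differ by exactly D, which accounts for the one extra extension-field opening at g*zeta.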

Nashtare added this to the System strengthening milestone Sep 17, 2024
Nashtare added the soundness (Soundness related changes) label Sep 25, 2024
4l0n50 (Contributor) left a comment

Just did a first pass and mostly pointed out nits.

plonky2/src/fri/mod.rs (outdated)
plonky2/src/batch_fri/oracle.rs (outdated)
plonky2/src/batch_fri/oracle.rs (outdated)
plonky2/src/batch_fri/oracle.rs
plonky2/src/batch_fri/oracle.rs (outdated)
plonky2/src/batch_fri/verifier.rs (outdated)
plonky2/src/batch_fri/verifier.rs (outdated)
plonky2/src/batch_fri/verifier.rs (outdated)
plonky2/src/fri/oracle.rs (outdated)
plonky2/src/plonk/prover.rs (outdated)
4l0n50 (Contributor) commented Sep 27, 2024

> For the batch FRI polynomial, we take a random polynomial with twice the degree of the subgroup, so that we can add a FRI step with arity 2 instead of computing 2 different FRI proofs (for the lower half and higher half of the polynomial, as mentioned in the note).

I guess this is what the note meant (Protocol 2, isn't it?): why produce two proofs when you can batch them and compute a single one? And batching them amounts to doing FRI on the large polynomial.
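To make the batching remark concrete, here is a toy sketch over the Goldilocks prime: a polynomial f of degree < 2n is split as f = f_lo + X^n * f_hi, and a verifier challenge beta combines the halves into a single polynomial of degree < n. The helper names are hypothetical. Standard arity-2 FRI folding splits f into even and odd coefficients rather than lower and upper halves, but in both cases one random combination replaces two separate low-degree proofs.

```rust
/// Goldilocks prime p = 2^64 - 2^32 + 1.
const P: u64 = 0xFFFF_FFFF_0000_0001;

fn add(a: u64, b: u64) -> u64 {
    ((a as u128 + b as u128) % P as u128) as u64
}

fn mul(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

/// Horner evaluation of a polynomial given by its coefficients (lowest degree first).
fn eval(coeffs: &[u64], x: u64) -> u64 {
    coeffs.iter().rev().fold(0, |acc, &c| add(mul(acc, x), c))
}

/// Split f (exactly 2n coefficients) into f = f_lo + X^n * f_hi and batch the
/// halves with a challenge beta: g = f_lo + beta * f_hi, of degree < n.
fn batch_halves(f: &[u64], n: usize, beta: u64) -> Vec<u64> {
    assert_eq!(f.len(), 2 * n);
    let (lo, hi) = f.split_at(n);
    lo.iter().zip(hi).map(|(&l, &h)| add(l, mul(beta, h))).collect()
}
```

Proving that g has degree < n for a random beta gives, with high probability, the same guarantee as proving both halves separately.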

LindaGuiga (Contributor)

After an initial review by @ulrich-haboeck, it turned out that the random polynomial R does not actually need a higher degree than the batch FRI polynomial. Indeed, the randomization of the wires already uses some space in the witness domain, which means that h is already "included" in the batch FRI polynomial.

Moreover, he also mentioned that num_blinding_gates (and similarly computed_h) can be updated to include only num_fri_queries instead of num_fri_openings, thanks to the new batch FRI polynomial randomization.

I therefore updated the implementation to include both changes.

ulrich-haboeck (Collaborator) left a comment

Here's my feedback:

  • We would improve proof sizes by gathering all round-3 polynomials, i.e. the components of the quotient polynomial and the zk masking polynomial for batch-FRI, under a common Merkle root.
  • I was not able to find out whether the lookup argument polynomials from the second round (the "pole sums" over the table and witness area) are randomized. If not, we need to randomize them by extending the randomization of the "regular" polynomials to these auxiliary ones as well.
  • Due to the treatment of the permutation argument, the current implementation is only statistically zero-knowledge:
    For the prover to be able to craft a valid proof, the round-1 verifier challenges (beta, gamma) must not produce a zero in any of the partial products. Hence, with each valid proof the verifier learns a small piece of information about the witnesses, namely that all linear factors of the (virtual) permutation argument polynomial
Sigma(X,Y) = \prod_{i,x} (X - x - w_i(x) * Y),

where the product ranges over all wired columns of the chip, are non-zero at (beta, gamma). (Funnily, this is also the case for Plonk's randomization; see footnote 10 on zero-knowledge in the Plonk paper.)
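The leak described above can be phrased as a simple check. This sketch (hypothetical helpers, toy arithmetic over the Goldilocks prime) evaluates Sigma(beta, gamma) from a list of wired cells (x, w_i(x)), where x stands for the cell's position label, and reports whether some linear factor vanishes. Since a valid proof cannot be produced in that case, every accepted proof reveals that no factor vanished.

```rust
/// Goldilocks prime p = 2^64 - 2^32 + 1.
const P: u64 = 0xFFFF_FFFF_0000_0001;

fn sub(a: u64, b: u64) -> u64 {
    ((a as u128 + P as u128 - b as u128) % P as u128) as u64
}

fn mul(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

/// Evaluate Sigma(beta, gamma) = product over cells of (beta - x - w * gamma).
/// Each cell is a pair (x, w) of a position label and its witness value.
fn sigma_eval(cells: &[(u64, u64)], beta: u64, gamma: u64) -> u64 {
    cells
        .iter()
        .fold(1, |acc, &(x, w)| mul(acc, sub(sub(beta, x), mul(w, gamma))))
}

/// True if (beta, gamma) is a zero of at least one linear factor.
fn hits_zero(cells: &[(u64, u64)], beta: u64, gamma: u64) -> bool {
    cells
        .iter()
        .any(|&(x, w)| sub(sub(beta, x), mul(w, gamma)) == 0)
}
```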

In my opinion, perfect zero-knowledge would be a nice feature. But that would come at a certain extra cost:

  1. The technique from the mir blog would need to be replaced by a strategy that works for every (beta, gamma).
    A naive approach would double the number of auxiliary columns, by proving
\prod_{i,x} (beta - x - w_i(x) * gamma),

and

\prod_{i,x} (beta - sigma_{i}(x) - w_i(x) * gamma)

via separate running products, with their start values enforced to be 1 and their end values enforced to be equal.
Randomization of the partial products can be done by regular noop gates plus a selector for the permutation argument (excluding the zk area on the chip), or by multiples of the domain vanishing polynomial. The latter randomization allows keeping the same number of columns per "partial lookup", if one implements a "greedy" evaluation logic for their constraints.

  2. Although not strictly needed, it would be good practice to do proper error handling for the case when the lookup random challenge (alpha, ChallengeA) hits a zero of the (virtual) table polynomial
t(X,Y) = \prod_i (X - t_{i,0} - t_{i,1} * Y),

where the product ranges over all table entries t_i = (t_{i,0}, t_{i,1}) of the functional relation to be looked up.
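Both items can be sketched in a few lines (toy arithmetic over the Goldilocks prime; all names and the cell layout are hypothetical). The first function computes the two running products of the naive approach without any division, so it is well defined for every (beta, gamma). The second evaluates the table polynomial at the lookup challenge and returns an error, instead of a silent zero, when a factor vanishes.

```rust
const P: u64 = 0xFFFF_FFFF_0000_0001; // Goldilocks prime

fn sub(a: u64, b: u64) -> u64 {
    ((a as u128 + P as u128 - b as u128) % P as u128) as u64
}

fn mul(a: u64, b: u64) -> u64 {
    ((a as u128 * b as u128) % P as u128) as u64
}

/// Naive perfect-ZK variant: both running products start at 1 and use no
/// division. `cells` holds (x, sigma_i(x), w_i(x)) triples; the argument
/// demands that the two end values are equal.
fn running_products(cells: &[(u64, u64, u64)], beta: u64, gamma: u64) -> (u64, u64) {
    let mut id_prod = 1;
    let mut sigma_prod = 1;
    for &(x, s, w) in cells {
        id_prod = mul(id_prod, sub(sub(beta, x), mul(w, gamma)));
        sigma_prod = mul(sigma_prod, sub(sub(beta, s), mul(w, gamma)));
    }
    (id_prod, sigma_prod)
}

#[derive(Debug, PartialEq)]
enum LookupError {
    /// The challenge is a zero of the factor belonging to table entry `entry`.
    ChallengeHitsTableZero { entry: usize },
}

/// Evaluate t(alpha, challenge_a) = prod_i (alpha - t_{i,0} - t_{i,1} * challenge_a).
fn table_poly_eval(table: &[(u64, u64)], alpha: u64, challenge_a: u64) -> Result<u64, LookupError> {
    let mut acc = 1;
    for (i, &(t0, t1)) in table.iter().enumerate() {
        let factor = sub(sub(alpha, t0), mul(t1, challenge_a));
        if factor == 0 {
            return Err(LookupError::ChallengeHitsTableZero { entry: i });
        }
        acc = mul(acc, factor);
    }
    Ok(acc)
}
```

When a challenge hits a zero in the first function, a valid witness makes both products zero at once, so the equality check still passes.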

Sorry for the bad formatting - markdown is a pain.

plonky2/src/plonk/circuit_builder.rs (outdated)
plonky2/src/plonk/prover.rs (outdated)
plonky2/src/plonk/prover.rs
plonky2/src/plonk/prover.rs (outdated)
plonky2/src/plonk/prover.rs (outdated)
Comment on lines 223 to 224
alpha.shift_poly(&mut final_poly);
final_poly += quotient;
Collaborator

Same remark as for the batch FRI oracle: shouldn't we shift quotient?

plonky2/src/fri/prover.rs (outdated)
plonky2/src/fri/validate_shape.rs (outdated)
plonky2/src/fri/verifier.rs (outdated)
    .map(|p| (p.oracle_index == PlonkOracle::R.index) as usize)
    .sum();
let last_poly = polynomials.len() - nb_r_polys * (idx == 0) as usize;
let evals = polynomials[..last_poly]
    .iter()
    .map(|p| {
        let poly_blinding = instance.oracles[p.oracle_index].blinding;
Collaborator

Actually, for the line below: what is the purpose of the && here?

Contributor

I am unsure (I did not change this code), but I assume this is to allow polynomials that we do not necessarily want to blind?
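A self-contained sketch of what the excerpt appears to do, using hypothetical stand-in types (these are not plonky2's definitions). It assumes that the R polynomials are listed last, so slicing off `nb_r_polys` entries at `idx == 0` removes them. It also assumes that the `&&` on the line not shown combines a global hiding flag with the per-oracle `blinding` flag, which matches the answer above: an oracle can opt out of blinding even when hiding is enabled.

```rust
/// Hypothetical stand-in for an oracle description in a FRI instance.
struct Oracle {
    blinding: bool,
}

/// Hypothetical stand-in for a reference to one committed polynomial.
struct PolyRef {
    oracle_index: usize,
}

/// Drop the R (masking) polynomials from the first batch (`idx == 0`),
/// assuming they are listed last, mirroring the excerpt's `last_poly`.
fn polys_to_reduce(polynomials: &[PolyRef], r_index: usize, idx: usize) -> &[PolyRef] {
    let nb_r_polys = polynomials
        .iter()
        .filter(|p| p.oracle_index == r_index)
        .count();
    let last_poly = polynomials.len() - nb_r_polys * (idx == 0) as usize;
    &polynomials[..last_poly]
}

/// An opening is treated as salted only if hiding is enabled globally
/// AND the polynomial's oracle is marked as blinded.
fn is_salted(hiding: bool, oracles: &[Oracle], p: &PolyRef) -> bool {
    hiding && oracles[p.oracle_index].blinding
}
```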

ulrich-haboeck (Collaborator)

Giving perfect zero-knowledge another thought, the following weakened constraints on a running product should suffice. For simplicity, I explain it for a single-column Plonk with a single witness column w(X) and a single precomputed polynomial sigma(X) for the permutation of the witness domain H. (A generalization to a meaningful chip is straightforward.)
The usual constraints for the running product Z(X) are Z(1) = 1 and

Z(g*x) * (beta - x - w(x)*gamma) = Z(x) * (beta - sigma(x) - w(x)*gamma)

for every x in H. (Again, these constraints are in general not satisfiable at the wrap-around point if one of the linear factors beta - x - w(x)*gamma is zero.) To allow the prover to succeed even when one of these factors is zero, we weaken the domain of the above constraint to H \ {g^{-1}}, i.e. the witness domain except the wrap-around point g^{-1}, and instead demand that the last value of the running product is either zero or one. Since

last_val * (beta - x - w(x)*gamma) = Z(x) * (beta - sigma(x) - w(x)*gamma)

with x = g^{-1}, we can enforce this by demanding that

Z(x) * (beta - sigma(x) - w(x)*gamma)  
    * [ Z(g*x) * (beta - x - w(x)*gamma) - Z(x) * (beta - sigma(x) - w(x)*gamma) ] = 0

at x = g^{-1}, resulting in a degree 5 constraint, including selector.
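A toy check of the weakened boundary constraint over F_97 (chosen only so the numbers stay small; plonky2 itself works over Goldilocks). H = {1, 22, 96, 75} is the subgroup generated by g = 22, so g^{-1} = 75. The permutation swaps 1 <-> 96 and 22 <-> 75, and the witness is constant on each swapped pair. The running product is built only on H \ {g^{-1}}, and the constraint at x = g^{-1} vanishes because the last value of the product is 1. All names are hypothetical.

```rust
/// Toy prime field F_97 (illustration only).
const P: u64 = 97;

fn sub(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn mul(a: u64, b: u64) -> u64 { (a * b) % P }

/// Modular inverse via Fermat's little theorem; `a` must be non-zero.
fn inv(a: u64) -> u64 {
    let (mut base, mut e, mut acc) = (a, P - 2, 1);
    while e > 0 {
        if e & 1 == 1 {
            acc = mul(acc, base);
        }
        base = mul(base, base);
        e >>= 1;
    }
    acc
}

/// Linear factor beta - v - w * gamma.
fn lin(beta: u64, gamma: u64, v: u64, w: u64) -> u64 {
    sub(sub(beta, v), mul(w, gamma))
}

/// Running product on H = [1, g, ..., g^{n-1}]: Z(1) = 1 and
/// Z(g x) = Z(x) * (beta - sigma(x) - w(x) gamma) / (beta - x - w(x) gamma),
/// applied only for x in H \ {g^{-1}} (the last element is skipped).
fn running_product(h: &[u64], sigma: &[u64], w: &[u64], beta: u64, gamma: u64) -> Vec<u64> {
    let mut z = vec![1];
    for i in 0..h.len() - 1 {
        let num = lin(beta, gamma, sigma[i], w[i]);
        let den = lin(beta, gamma, h[i], w[i]);
        assert_ne!(den, 0, "toy example assumes non-vanishing denominators");
        z.push(mul(mul(z[i], num), inv(den)));
    }
    z
}

/// Weakened constraint at x = g^{-1} (last index), where Z(g x) = Z(1):
/// Z(x) num(x) * [Z(g x) den(x) - Z(x) num(x)]. If den(x) != 0, this is zero
/// exactly when the product's last value is 0 or 1.
fn boundary_constraint(h: &[u64], sigma: &[u64], w: &[u64], z: &[u64], beta: u64, gamma: u64) -> u64 {
    let last = h.len() - 1;
    let num = lin(beta, gamma, sigma[last], w[last]);
    let den = lin(beta, gamma, h[last], w[last]);
    let z_num = mul(z[last], num);
    mul(z_num, sub(mul(z[0], den), z_num))
}
```

Changing a single witness value so that it no longer matches its copy makes the last value of the product differ from 0 and 1, and the constraint no longer vanishes.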

ulrich-haboeck (Collaborator)

Also, we can take the same random (beta, gamma) for all three arguments in round 2: the permutation argument, the lookup argument, and the proof of the table authenticator t(beta, gamma).

ulrich-haboeck (Collaborator) commented Oct 16, 2024

Actually, there is a gap in the above constraints which still prevents the prover from succeeding in certain cases. Notably, this gap also occurs in the Halo2 book (thanks to @Al-Kindi-0 for the reference, and also for proposing the countermeasure):
Again in the single-column case, assume that one of the linear terms is zero, and that this term occurs first (as x runs through 1, g, g^2, ...) on the Z(gx) side of the transition constraint, i.e.

Z(gx) * (alpha - x - beta*w(x)) - Z(x) * (alpha - sigma(x) - beta*w(x)) = 0,   where (alpha - x - beta*w(x)) = 0.

While this leaves Z(gx) undetermined, it forces Z(x) to be zero at that x (unless sigma(x) = x), and consequently Z must be zero at all points before it, down to x = 1, conflicting with the required constraint Z(1) = 1. To patch this, we additionally use the linear term (alpha - x - beta*w(x)) to mute the constraint,

 (alpha - x - beta*w(x)) 
    * [Z(gx) * (alpha - x - beta*w(x)) -  Z(x) * (alpha - sigma(x) - beta*w(x))] = 0,

for all x in H \ {g^{-1}}, resulting in a degree 4 constraint (including selector).
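A minimal illustration of why muting helps (toy field F_97, hypothetical names). At a point where den = alpha - x - beta*w(x) vanishes but num = alpha - sigma(x) - beta*w(x) does not, the original transition constraint can only hold if Z(x) = 0, whereas the muted version vanishes for any values of Z(x) and Z(gx). Where den is non-zero, the muted constraint vanishes exactly when the original one does, since a field has no zero divisors.

```rust
/// Toy prime field F_97 (illustration only).
const P: u64 = 97;

fn sub(a: u64, b: u64) -> u64 { (a + P - b) % P }
fn mul(a: u64, b: u64) -> u64 { (a * b) % P }

/// Original transition: Z(gx) * den - Z(x) * num, with
/// den = alpha - x - beta*w(x) and num = alpha - sigma(x) - beta*w(x).
fn transition(z_x: u64, z_gx: u64, den: u64, num: u64) -> u64 {
    sub(mul(z_gx, den), mul(z_x, num))
}

/// Muted transition: den * [Z(gx) * den - Z(x) * num].
fn muted_transition(z_x: u64, z_gx: u64, den: u64, num: u64) -> u64 {
    mul(den, transition(z_x, z_gx, den, num))
}
```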

Let me point out that the solution described here is tailored to AIRs and can hopefully be optimized further.
In the case of Plonk, where we can randomize "on chip" using noop gates, one can additionally reduce the degree of the end-value constraint by enforcing that Z(x), for x at the boundary of the zk area, carries the final value of the product.

ulrich-haboeck (Collaborator)

Corrected a mistaken comment on the common Merkle root for each round. Round 2 is fine as implemented (gathering the permutation and lookup argument polynomials), but Round 3 gives us the opportunity to put the masking polynomial R(X) and the quotient components under the same Merkle root. Not sure if this is implemented that way.

LindaGuiga (Contributor)

> I was not able to find out whether the lookup argument polynomials from the second round (for the "pole sums" over table and witness area) are randomized. If not, we need to do this by expanding the randomization of "regular" polynomial (and likewise) to auxiliary ones.

I'm sorry, I'm not sure I understand what you mean here. What are the "regular" and "auxiliary" polynomials here? Do you mean we should randomize the SLDC polynomials in some way?

ulrich-haboeck (Collaborator)

Exactly. One could probably do a more fine-grained analysis, similar to the permutation argument polys, arguing statistical zero-knowledge, but I need to think about it.

ulrich-haboeck (Collaborator)

After another round of contemplation, I see the following problem with the current statistically zero-knowledge approach: we take several base field samples instead of a single one from the extension field. While this is a good approach for amplifying soundness, it is actually bad for statistical zero-knowledge. The verifier learns that not just for one, but for several (X,Y)-samples, all of the linear terms (X - x - w_i(x)*Y) are non-zero. Formally, this is reflected in an increasing statistical distance to the uniform distribution (over the space of all possible transcripts), blowing up from the simple fraction of bad points

<= total_num_witness_elements / |F|

to its n-fold (for n samples), which is too large for small fields such as Goldilocks in any case.
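To put numbers on this, a back-of-the-envelope sketch of the union bound in log2 form. The witness size 2^20 is made up; |F| is taken as 2^64 for Goldilocks (p = 2^64 - 2^32 + 1) and as 2^128 for its quadratic extension, both rounded. The function name is hypothetical.

```rust
/// log2 of the bound n * N / |F| on the statistical distance, where N is the
/// total number of witness elements and n the number of independent samples.
fn log2_distance_bound(log2_num_witness_elements: f64, log2_field_size: f64, num_samples: u32) -> f64 {
    (num_samples as f64).log2() + log2_num_witness_elements - log2_field_size
}
```

Doubling the number of base-field samples doubles the bound, while drawing a single sample from the quadratic extension instead shrinks it by roughly a factor of 2^64.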

That being said, I see only two options to remedy this issue.
Either we

  • drop the several-base-field-samples approach, or we
  • implement perfect zero-knowledge.

The latter is actually quite costly for the Plonk permutation argument (a side effect that I missed in my elaboration above for the single-column setting), practically doubling the number of second-round polynomials (compared to non-zk). Besides, in the world of hash-based proofs, perfect zk of the IOP is downgraded to statistical zk anyway.
For this reason, I would personally opt for the first option, but it is not up to me to decide @dlubarov @Nashtare @LindaGuiga @Al-Kindi-0.

Al-Kindi-0

> After another round of contemplation, I see the following problem with the current statistically zero-knowledge approach: We take several base field samples, instead of a single one from the extension field. While this is a good approach for amplifying soundness, it actually is bad for statistical zero-knowledge. The verifier learns that not just for one, but for several (X,Y)-samples all of the linear terms (X - x - w_i(x)*Y ) are non zero. Formally, this is reflected in an increasing statistical distance to uniform distribution (over the space of all possible transcripts), blowing up from the simple fraction of bad points
>
> <= total_num_witness_elements / |F|
>
> to its n-fold (for n samples). In any case a distance that is too large for small fields such as Goldilocks.

Fully agree, the case of challenges from the base field is worse from the statistical distance point of view. This gets worse the larger the witness size gets.

> That being said, I see only two options to remedy this issue: Either, we
>
> * drop the several-base-field-samples approach, or we
>
> * implement perfect zero-knowledge.
>
> The latter is actually quite costly for the Plonk permutation argument (a side effect that I missed in my above elaboration in the single-column setting), practically doubling the number of 2nd-round polynomials (in comparison to non-zk). Besides, in the world of hash-based proofs, perfect zk of the IOP is downgraded to statistical zk anyways. For this reason I personally would opt for first option, but it is not me to decide @dlubarov @Nashtare @LindaGuiga @Al-Kindi-0.

Just to clarify, the doubling of the number of polynomials in the second round is due to the increase in the degree, right?

My proposal is to go with statistical zero-knowledge but give an explicit bound on the statistical distance.

ulrich-haboeck (Collaborator)

@Al-Kindi-0 exactly, the doubling of second round polys is due to the increased degree of the constraints.

Nashtare removed the request for review from muursh November 6, 2024 15:23
Nashtare closed this Nov 26, 2024
Labels
soundness Soundness related changes
Projects
Status: Done

5 participants