
Add LPN and GGM related #54

Merged (14 commits) on Oct 13, 2023

Conversation

@xiangxiecrypto (Collaborator) commented Aug 31, 2023

Please do not merge.

This PR mainly adds two functionalities.

  1. Computation of the LPN (Learning Parity with Noise) function, optimized with multiple threads.
  2. GGM reconstruction.

@xiangxiecrypto (Collaborator, Author) commented Sep 1, 2023

I tried different ways to optimize the LPN computation with multiple threads.

The problem: given two F_{2^128} vectors x, e and a fixed sparse binary matrix A, compute y = Ax + e. Each row of A has at most D non-zero elements.

Since the dimensions of A are quite large, A is determined by a seed `seed` and is generated on the fly.
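In symbols (just restating the problem above):

$$
y = A\,x + e, \qquad A \in \{0,1\}^{n \times k},\quad x \in \mathbb{F}_{2^{128}}^{k},\quad e,\, y \in \mathbb{F}_{2^{128}}^{n},
$$

with at most D non-zero entries in each row of A.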

Optimizations:

  1. Single-batch computation: compute a batch of rows in one operation.
    a. batch_size = 4.
    b. batch_size = 8.

  2. Multiple threads: use Rayon.
    a. use par_chunks_exact_mut to split the rows of A into chunks of length 4 or 8 (depending on the batch size chosen in 1).
    b. use custom threads and split the rows of A into #rows/#threads chunks.

My experiments show that 1.a (or 1.b) combined with 2.a outperforms the other approaches.
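
A minimal sketch of what approach 1.a + 2.a looks like (not the PR's actual code: `Block` is stood in by `u128`, and `compute_batch` is a hypothetical closure that fills one batch of output rows):

```rust
use rayon::prelude::*;

const BATCH_SIZE: usize = 4;

/// Computes `y = Ax + e` in parallel, assuming `y` already contains `e` and
/// `compute_batch` XORs in the row contributions for one batch of rows,
/// starting at the given row offset.
fn lpn_compute_parallel<F>(y: &mut [u128], x: &[u128], compute_batch: F)
where
    F: Fn(&mut [u128], &[u128], usize) + Sync,
{
    y.par_chunks_exact_mut(BATCH_SIZE)
        .enumerate()
        .for_each(|(i, rows)| {
            // Each 4-row batch touches a disjoint slice of `y`, so Rayon can
            // schedule the batches on its worker threads independently.
            compute_batch(rows, x, i * BATCH_SIZE);
        });
    // Rows left over when y.len() % BATCH_SIZE != 0 would be handled separately.
}
```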

@sinui0 (Collaborator) commented Sep 8, 2023

Is this about ready for review?

@sinui0 self-requested a review on September 8, 2023, 02:56
@xiangxiecrypto (Collaborator, Author) commented Sep 8, 2023

> Is this about ready for review?

Not yet. I am optimizing the GGM part and will probably finish in the next few days.

@xiangxiecrypto (Collaborator, Author) commented Sep 9, 2023

Generating (and reconstructing) the GGM tree is highly parallel within each level. I tried to optimize this part in two ways.

  1. Use multiple threads to compute each level. This turns out to be a bad idea: the cost of thread switching is too high.
  2. Focus on the last level only, since computing it accounts for about 50% of the overall cost (it is a binary tree). However, even with multiple threads on the last level, the improvement is tiny (and sometimes performance is worse). The likely reason is that the underlying tkprp is already very fast; it is optimized and parallelized by the compiler.

So I will keep the original implementation and add comments on the public functions.
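
For intuition on the 50% figure: in a full binary tree of depth $d$, level $h$ has $2^h$ nodes, so the last level accounts for

$$
\frac{2^{d}}{\sum_{h=1}^{d} 2^{h}} \;=\; \frac{2^{d}}{2^{d+1} - 2} \;\approx\; \frac{1}{2}
$$

of the node computations.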

@xiangxiecrypto changed the title from "[WIP] Add LPN and GGM related" to "Add LPN and GGM related" on Sep 9, 2023
@xiangxiecrypto (Collaborator, Author) commented:

It is ready for review. :)

@sinui0 (Collaborator) left a review:

Looks good, thank you!

I mostly provided feedback on formatting conventions and idiomatic Rust. Looks good otherwise.

You may want to rebase on #59, see the relevant comment below.

Comment on lines 14 to 21:

```rust
// The seed to generate the random sparse matrix A.
seed: Block,

// The length of the secret, i.e., x.
k: u32,

// A mask to optimize reduction operation.
mask: u32,
```

Collaborator:

We can use doc comments for these and they will be shown with cargo doc --document-private-items

Suggested change:

```diff
-// The seed to generate the random sparse matrix A.
-seed: Block,
-// The length of the secret, i.e., x.
-k: u32,
-// A mask to optimize reduction operation.
-mask: u32,
+/// The seed to generate the random sparse matrix A.
+seed: Block,
+/// The length of the secret, i.e., x.
+k: u32,
+/// A mask to optimize reduction operation.
+mask: u32,
```

Comment on lines 6 to 12:

```rust
/// A struct related to LPN.
/// The `seed` defines a sparse binary matrix `A` with at most `D` non-zero values in each row.
/// `A` - is a binary matrix with `k` columns and `n` rows. The concrete number of `n` is determined by the input length. `A` will be generated on-the-fly.
/// `x` - is a `F_{2^128}` vector with length `k`.
/// `e` - is a `F_{2^128}` vector with length `n`.
/// Given a vector `x` and `e`, compute `y = Ax + e`.
/// Note that in the standard LPN problem, `x` is a binary vector, `e` is a sparse binary vector. The way we difined here is a more generic way in term of computing `y`.
```

Collaborator:

Rustdocs need an extra newline.

Suggested change:

```diff
-/// A struct related to LPN.
-/// The `seed` defines a sparse binary matrix `A` with at most `D` non-zero values in each row.
-/// `A` - is a binary matrix with `k` columns and `n` rows. The concrete number of `n` is determined by the input length. `A` will be generated on-the-fly.
-/// `x` - is a `F_{2^128}` vector with length `k`.
-/// `e` - is a `F_{2^128}` vector with length `n`.
-/// Given a vector `x` and `e`, compute `y = Ax + e`.
-/// Note that in the standard LPN problem, `x` is a binary vector, `e` is a sparse binary vector. The way we difined here is a more generic way in term of computing `y`.
+/// A struct related to LPN.
+///
+/// The `seed` defines a sparse binary matrix `A` with at most `D` non-zero values in each row.
+///
+/// Given a vector `x` and `e`, compute `y = Ax + e`, where:
+///
+/// * `A` - is a binary matrix with `k` columns and `n` rows. The concrete number of `n` is determined by the input length. `A` will be generated on-the-fly.
+/// * `x` - is a `F_{2^128}` vector with length `k`.
+/// * `e` - is a `F_{2^128}` vector with length `n`.
+///
+/// Note that in the standard LPN problem, `x` is a binary vector, `e` is a sparse binary vector. The way we defined here is a more generic way in term of computing `y`.
```


```rust
use crate::{prp::Prp, Block};
use rayon::prelude::*;
/// A struct related to LPN.
```

Collaborator:

Is there a better description of this? An "LPN encoder" perhaps?

```rust
}

impl<const D: usize> Lpn<D> {
/// New an LPN instance
```

Collaborator:

Suggested change:

```diff
-/// New an LPN instance
+/// Creates a new LPN instance.
+///
+/// # Arguments
+///
+/// * `seed` - The seed to generate the random sparse matrix A.
+/// * `k` - The length of the secret, i.e., `|x|`.
```

mpz-core/src/lpn.rs (resolved comment)
```rust
assert!(k1.len() == self.depth - 1);
assert!(tree.len() == 1 << (self.depth));
assert!(k0.len() == self.depth);
assert!(k1.len() == self.depth);
let mut buf = vec![Block::ZERO; 8];
```

Collaborator:

I think we can avoid this heap allocation

Suggested change:

```diff
-let mut buf = vec![Block::ZERO; 8];
+let mut buf = [Block::ZERO; 8];
```

Comment on lines 21 to 26:

```rust
/// Input : `seed` - a seed.
/// Output: `tree` - a GGM (binary tree) `tree`, with size `2^{depth}`.
/// Output: `k0` - XORs of all the left-node values in each level, with size `depth`.
/// Output: `k1`- XORs of all the right-node values in each level, with size `depth`.
/// This implementation is adapted from EMP Toolkit.
pub fn gen(&self, seed: Block, tree: &mut [Block], k0: &mut [Block], k1: &mut [Block]) {
```

Collaborator:

Suggested change:

```diff
-/// Input : `seed` - a seed.
-/// Output: `tree` - a GGM (binary tree) `tree`, with size `2^{depth}`.
-/// Output: `k0` - XORs of all the left-node values in each level, with size `depth`.
-/// Output: `k1`- XORs of all the right-node values in each level, with size `depth`.
-/// This implementation is adapted from EMP Toolkit.
-pub fn gen(&self, seed: Block, tree: &mut [Block], k0: &mut [Block], k1: &mut [Block]) {
+/// Generate a GGM tree in-place.
+///
+/// # Arguments
+///
+/// * `seed` - a seed.
+/// * `tree` - the destination to write the GGM (binary tree) `tree`, with size `2^{depth}`.
+/// * `k0` - XORs of all the left-node values in each level, with size `depth`.
+/// * `k1`- XORs of all the right-node values in each level, with size `depth`.
+pub fn gen(&self, seed: Block, tree: &mut [Block], k0: &mut [Block], k1: &mut [Block]) {
+    // This implementation is adapted from EMP Toolkit.
```

Comment on lines 60 to 63:

```rust
/// Reconstruct the GGM tree except the value in a given position.
/// Input : `k` - a slice of blocks with length `depth`, the values of k are chosen via OT from k0 and k1. For the i-th value, if alpha[i] == 1, k[i] = k1[i]; else k[i] = k0[i].
/// Input : `alpha` - a slice of bits with length `depth`.
/// Output : `tree` - the ggm tree, except `tree[pos] == Block::ZERO`. The bit decomposition of `pos` is the complement of `alpha`. I.e., `pos[i] = 1 xor alpha[i]`.
```

Collaborator:

Suggested change:

```diff
-/// Reconstruct the GGM tree except the value in a given position.
-/// Input : `k` - a slice of blocks with length `depth`, the values of k are chosen via OT from k0 and k1. For the i-th value, if alpha[i] == 1, k[i] = k1[i]; else k[i] = k0[i].
-/// Input : `alpha` - a slice of bits with length `depth`.
-/// Output : `tree` - the ggm tree, except `tree[pos] == Block::ZERO`. The bit decomposition of `pos` is the complement of `alpha`. I.e., `pos[i] = 1 xor alpha[i]`.
+/// Reconstruct the GGM tree except the value in a given position.
+///
+/// This reconstructs the GGM tree entirely except `tree[pos] == Block::ZERO`. The bit decomposition of `pos` is the complement of `alpha`. i.e., `pos[i] = 1 xor alpha[i]`.
+///
+/// # Arguments
+///
+/// * `k` - a slice of blocks with length `depth`, the values of k are chosen via OT from k0 and k1. For the i-th value, if alpha[i] == 1, k[i] = k1[i]; else k[i] = k0[i].
+/// * `alpha` - a slice of bits with length `depth`.
+/// * `tree` - the destination to write the GGM tree.
```

```rust
return;
}

let mut buf = vec![Block::ZERO; 8];
```

Collaborator:

Suggested change:

```diff
-let mut buf = vec![Block::ZERO; 8];
+let mut buf = [Block::ZERO; 8];
```

```rust
pub struct Prp(AesEncryptor);

impl Prp {
/// New an instance of Prp.
```

Collaborator:

Suggested change:

```diff
-/// New an instance of Prp.
+/// Creates a new Prp instance.
```

@themighty1 self-requested a review on September 12, 2023, 08:11
```rust
///
/// `e` - is a `F_{2^128}` vector with length `n`.
///
/// Note that in the standard LPN problem, `x` is a binary vector, `e` is a sparse binary vector. The way we difined here is a more generic way in term of computing `y`.
```

Collaborator:

@xiangxiecrypto , would it be possible to add some links to literature which justifies mixing a binary matrix with non-binary vectors? I could not find it with a simple web search.

Collaborator Author:

It is a common step in PCG-style COT. You can find the intuition in the Ferret paper (page 11) and the references therein.
Note that we can embed a bit into a Block to support a binary matrix with binary vectors.

> @xiangxiecrypto , would it be possible to add some links to literature which justifies mixing a binary matrix with non-binary vectors? I could not find it with a simple web search.

Collaborator:

@xiangxiecrypto , thanks, I was worried because I read in https://eprint.iacr.org/2017/617:

> How to sample matrices with large dual distance? We suggest to sample a d-sparse matrix M ∈ F^{m×k} in two steps. First, choose the locations of the non-zero entries of the matrix (e.g., by selecting a random set of d entries per row), and then fill them with random field elements.

Since it says "random field elements", I was assuming that we can't just use "1" as your code does, but must use an element from F_2^128.

Maybe this does not apply to our case, idk. If it's not too laborious to explain, could you shed some light on why using "1" instead of a random field element is ok?

Collaborator Author:

@xiangxiecrypto , thanks, I was worried because I read in https://eprint.iacr.org/2017/617 """ How to sample matrices with large dual distance? We suggest to sample a d-sparse matrix M ∈ Fm×k in two steps. First, choose the locations of the non-zero entries of the matrix (e.g., by selecting a random set of d entries per row), and then fill them with random field elements. """ since it says "random field elements" I was assuming that we can't just use "1" as your code does but we must use an element from F_2^128.

Maybe this does not apply to our case, idk. If it's not too laborious to explain, could you shed some light why using "1" instead of a random field element is ok?

The security here relies on LPN over F_2, therefore you only need to choose bits.
The reason the function also applies to F_2^128 is that the protocol requires these operations (not related to security assumption). See page 11 in the Ferret paper.
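
Spelled out (a restatement of the computation under discussion, not text from the thread): since the entries of `A` are bits, each output entry is just an XOR of a few entries of `x`:

$$
y_i \;=\; e_i \oplus \bigoplus_{j \,\in\, S_i} x_j, \qquad S_i = \{\, j : A_{i,j} = 1 \,\},\quad |S_i| \le D,
$$

so `A` can stay binary while `x` and `e` are vectors over F_{2^128}.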

```rust
use rayon::prelude::*;
/// An LPN encoder.
///
/// The `seed` defines a sparse binary matrix `A` with at most `D` non-zero values in each row.
```

Collaborator:

@xiangxiecrypto , is at most correct here? Shouldn't it be exactly D non-zero values?

Collaborator Author:

It is "at most". Recall that the process is to choose d positions at random in a row and fill those d positions with random elements of F_2; not all d positions need to end up as 1, so a row can have fewer than d non-zero entries.

```rust
/// Panics if `x.len() !=k` or `y.len() != n`.
pub fn compute(&self, y: &mut [Block], x: &[Block]) {
assert_eq!(x.len() as u32, self.k);
assert!(x.len() >= D);
```

@themighty1 (Collaborator) commented on Sep 27, 2023:

@xiangxiecrypto , shouldn't we have a specific algorithm here which bounds the minimal number of rows/columns depending on D?

As mentioned in https://eprint.iacr.org/2017/617: "Then, for 80-bit security and d = 10 it turns out that we will need approximately k = 182 columns and k^2 rows, while for 100-bit security we need k = 240."

Collaborator Author:

The security also depends on k and the Hamming weight of the error vector. I will specify a new struct to contain these parameters (together with others) when I implement the COT protocol. We can simply focus on D = 10 here.
The struct here is only for computation purposes; maybe I should change the name to LpnEncoder.

@themighty1 (Collaborator) left a review:

gw, thank you.
I have not fully reviewed reconstruct_layer yet; I will do so some time next week.

```rust
/// # Panics
///
/// Panics if `x.len() !=k` or `y.len() != n`.
pub fn compute(&self, y: &mut [Block], x: &[Block]) {
```

Collaborator:

What are your thoughts about having one common method for computing rows regardless of row count? The code would look like this:

```rust
/// The size of one batch of matrix row computation.
///
/// Experiments show that 4 gives the best performance with rayon.
const BATCH_SIZE: usize = 4;

    pub fn compute(&self, y: &mut [Block], x: &[Block]) {
        assert_eq!(x.len() as u32, self.k);
        assert!(x.len() >= D);
        let prp = Prp::new(self.seed);

        // how many rows will be processed in batches
        let rows_in_batches = y.len() - (y.len() % BATCH_SIZE);

        cfg_if::cfg_if! {
            if #[cfg(feature = "rayon")]{
                let mut iter = y.par_chunks_exact_mut(BATCH_SIZE);
            }else{
                let mut iter = y.par_chunks_exact_mut(BATCH_SIZE);
            }
        }

        let remaining_rows = iter.take_remainder();

        iter.enumerate().for_each(|(i, y)| {
            self.compute_rows(y, x, i * BATCH_SIZE, &prp);
        });

        // process remaining rows, if any
        self.compute_rows(remaining_rows, x, rows_in_batches, &prp);
    }

    /// Computes as many rows as there are elements in `y`, placing the result in `y`.
    #[inline]
    fn compute_rows(&self, y: &mut [Block], x: &[Block], pos: usize, prp: &Prp) {
        // How many `Blocks` needed to derive random u32-sized indices for one row
        let block_cnt = (D + 4 - 1) / 4;

        let mut indices = (0..y.len())
            .flat_map(|offset| {
                (0..block_cnt).map(move |i| {
                    Block::from(bytemuck::cast::<_, [u8; 16]>([
                        (pos + offset) as u64,
                        i as u64,
                    ]))
                })
            })
            .collect::<Vec<Block>>();

        // derive pseudo-random u32-sized indices
        prp.permute_block_inplace(&mut indices);
        let indices = bytemuck::cast_slice_mut::<_, u32>(&mut indices);

        for (row, indices) in y.iter_mut().zip(indices.chunks_exact_mut(block_cnt * 4)) {
            // We only need D indices for one row, ignore any extra ones
            for ind in indices.iter_mut().take(D) {
                // reduce the index to be in the [0, k) range
                *ind &= self.mask;
                *ind = if *ind >= self.k { *ind - self.k } else { *ind };

                *row ^= x[*ind as usize];
            }
        }
    }
```

Collaborator Author:

good point, let me check

Collaborator Author:

It seems that the test failed.

@themighty1 (Collaborator) commented on Oct 6, 2023:

Oh, I see.
Looking at this again: earlier, compute_four_rows_indep made exactly 10 PRG calls for 4 rows, i.e. 2.5 calls/row, whereas this new code makes 3 calls/row. That is 20% overhead, which is too much.

It makes sense to continue with the compute_four_rows_indep approach.
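
For reference, the arithmetic behind those counts (assuming D = 10 and four u32 indices per 128-bit PRP output, as in the snippet above):

$$
\text{4-row batch: } \left\lceil \frac{4 \cdot 10}{4} \right\rceil = 10 \text{ calls} \;\Rightarrow\; 2.5 \text{ calls/row}, \qquad \text{per-row: } \left\lceil \frac{10}{4} \right\rceil = 3 \text{ calls/row},
$$

and $3 / 2.5 = 1.2$, i.e. 20% more PRP calls.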

```rust
/// # Arguments
///
/// * `seed` - a seed.
/// * `tree` - the destination of write the GGM (binary tree) `tree`, with size `2^{depth}`.
```

Collaborator:

Suggested change:

```diff
-/// * `tree` - the destination of write the GGM (binary tree) `tree`, with size `2^{depth}`.
+/// * `tree` - the destination to write the GGM (binary tree) `tree`, with size `2^{depth}`.
```

```diff
@@ -37,7 +41,7 @@ impl GgmTree {
 k1[1] = buf[1] ^ buf[3];
 tree[0..4].copy_from_slice(&buf[0..4]);

-for h in 2..self.depth - 1 {
+for h in 2..self.depth {
 k0[h] = Block::ZERO;
 k1[h] = Block::ZERO;
 let sz = 1 << h;
```

Collaborator:

Suggested change:

```diff
-let sz = 1 << h;
+// How many nodes there are in this layer
+let sz = 1 << h;
```

```rust
k: Block,
tree: &mut [Block],
) {
let sz = 1 << depth;
```

Collaborator:

Suggested change:

```diff
-let sz = 1 << depth;
+// How many nodes there are in this layer
+let sz = 1 << depth;
```

```rust
/// * `alpha` - a slice of bits with length `depth`.
/// * `tree` - the destination to write the GGM tree.
pub fn reconstruct(&self, tree: &mut [Block], k: &[Block], alpha: &[bool]) {
let mut pos = 0;
```

Collaborator:

Do we want some asserts on tree.len(), k.len(), and alpha.len(), similar to those in gen() above?

Collaborator Author:

Yes, we need those asserts.
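
A minimal sketch of what those asserts could look like at the top of `reconstruct`, mirroring the checks quoted from `gen` above (the field/argument names are taken from the snippets in this thread, not from the final commit):

```rust
pub fn reconstruct(&self, tree: &mut [Block], k: &[Block], alpha: &[bool]) {
    // The output buffer holds 2^{depth} blocks, as in `gen`.
    assert_eq!(tree.len(), 1 << self.depth);
    // One OT-chosen block and one choice bit per level.
    assert_eq!(k.len(), self.depth);
    assert_eq!(alpha.len(), self.depth);

    let mut pos = 0;
    // ...
}
```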

@sinui0 (Collaborator) left a review:

Almost there!

mpz-core/src/lpn.rs (resolved comment)
mpz-core/src/lpn.rs (outdated, resolved comment)
Comment on lines 103 to 119:

```rust
/// Encrypt block slice.
pub fn encrypt_block_slice(&self, blks: &mut [Block]) {
    let len = blks.len();
    let mut buf = [Block::ZERO; AesEncryptor::AES_BLOCK_COUNT];
    for i in 0..len / AesEncryptor::AES_BLOCK_COUNT {
        buf.copy_from_slice(
            &blks[i * AesEncryptor::AES_BLOCK_COUNT..(i + 1) * AesEncryptor::AES_BLOCK_COUNT],
        );
        blks[i * AesEncryptor::AES_BLOCK_COUNT..(i + 1) * AesEncryptor::AES_BLOCK_COUNT]
            .copy_from_slice(&self.encrypt_many_blocks(buf));
    }

    let remain = len % AesEncryptor::AES_BLOCK_COUNT;
    for block in blks[len - remain..].iter_mut() {
        *block = self.encrypt_block(*block);
    }
}
```

Collaborator:

@xiangxiecrypto you should be able to rebase on dev and use the existing impl for this now

@xiangxiecrypto (Collaborator, Author) commented:

The Miri CI tests fail on Clmul; would you please help fix it? @sinui0 @themighty1

@sinui0 (Collaborator) commented Oct 7, 2023

> The Miri CI tests fail on Clmul; would you please help fix it? @sinui0 @themighty1

#74 should be merged soon

@sinui0 (Collaborator) commented Oct 9, 2023

You can rebase now, CI should be fixed

@sinui0 (Collaborator) left a review:

🚀

@sinui0 merged commit 4736ae0 into privacy-scaling-explorations:dev on Oct 13, 2023. 3 checks passed.