
Feat/stream contrib #8

Closed · wants to merge 4 commits

Conversation

porcuquine

No description provided.

@porcuquine force-pushed the feat/stream-contrib branch from d4c5c83 to b9b1214 on June 24, 2020 03:38

@DrPeterVanNostrand left a comment


Thank you! Everything looks good to me logic-wise. I made a few comments regarding implementation details.

@@ -799,23 +799,18 @@ impl MPCParameters {
for (base, projective) in bases.iter_mut().zip(projective.iter_mut()) {
*projective = wnaf.base(base.into_projective(), 1).scalar(coeff);
}


Do you know if this sped things up? I believe the reason we wait for the multiplications to finish before converting from projective to affine is that we don't want point conversion stealing CPU time from the multiplications.

porcuquine (Author)


I may have observed a slight improvement, or it might have been noise. It didn't get worse.

I would be surprised if the other way were better. The first part assigns the work to specific threads, one per available CPU, so they should all be running in parallel. Then the work is chunked back up into (presumably) the same number of threads and continued. It seems to me that this incurs at least:

  • the penalty of waiting for all threads to complete before beginning any normalization/conversion;
  • the rayon overhead;
  • possibly moving data between CPUs, if work isn't reassigned to the same CPU it started on (despite the same chunking).

As far as 'stealing CPU' goes, the normalization and conversion are single-threaded, so they shouldn't be stealing CPU from other threads — just using the already-assigned idle CPU.

Doing all the work for each chunk on the originally allocated thread at least avoids all the potential costs above. As an aside, if we weren't using the raw format to speed up I/O, we could probably improve performance by moving the affine<->uncompressed conversions into this parallel section. And we could likely get even more speedup than raw by just reading/writing the projective points and skipping the conversions (at the cost of larger small-params files). The latter might be worth exploring for future ceremonies.
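
For concreteness, here is a minimal sketch of the fused per-chunk pattern described above, written against the groupy/paired-style API this crate uses (Wnaf, batch_normalization, into_affine); the function name and chunking are illustrative, not the exact code in this PR:

use ff::PrimeField;
use groupy::{CurveAffine, CurveProjective, Wnaf};
use paired::bls12_381::{Fr, G1Affine, G1};
use rayon::prelude::*;

fn batch_exp_fused(bases: &mut [G1Affine], coeff: Fr) {
    let coeff = coeff.into_repr();
    let chunk_size = (bases.len() / rayon::current_num_threads()).max(1);

    bases.par_chunks_mut(chunk_size).for_each(|chunk| {
        let mut wnaf = Wnaf::new();
        // Multiply every base in this chunk by the coefficient.
        let mut projective: Vec<G1> = chunk
            .iter()
            .map(|base| wnaf.base(base.into_projective(), 1).scalar(coeff))
            .collect();
        // One batched inversion normalizes the whole chunk at once...
        G1::batch_normalization(&mut projective);
        // ...and the conversion back to affine runs on the same thread,
        // which would otherwise sit idle until every thread finished.
        for (affine, p) in chunk.iter_mut().zip(projective.iter()) {
            *affine = p.into_affine();
        }
    });
}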


let h_len = reader.read_u32::<BigEndian>()? as usize;
let h_offset = g1_size + g2_size + 4;
reader.seek(SeekFrom::Current((h_len * g1_size) as i64))?;


Seeking in buffered I/O always drops the buffer, so you would be doing 1MB reads (or whatever your buffer size is) per 4-byte h_len/l_len read. I believe the method you're looking for is something along the lines of BufReader::seek_relative(), but that's on nightly.

I used buffered reads for the vectors larger than 1MB here and non-buffered reads everywhere else, which is why read_small_params_from_large_file() takes a path argument rather than a Read type.
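
For reference, a minimal sketch of that path-based approach: seek the bare File first, where there is no buffer to invalidate, and only then wrap it in a BufReader for the large sequential read. The function name, buffer size, and offset handling are illustrative:

use std::fs::File;
use std::io::{self, BufReader, Read, Seek, SeekFrom};

fn skip_then_buffered_read(path: &str, offset: u64, len: usize) -> io::Result<Vec<u8>> {
    let mut file = File::open(path)?;
    // Seeking on the bare File is cheap; there is no buffer yet to invalidate.
    file.seek(SeekFrom::Start(offset))?;
    // Attach the buffer only for the large sequential read that follows.
    let mut reader = BufReader::with_capacity(1 << 20, file);
    let mut buf = vec![0u8; len];
    reader.read_exact(&mut buf)?;
    Ok(buf)
}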

porcuquine (Author)


Makes sense. I don't think the absolute cost is high, so I'll leave it as-is for now, given this code is tested. We can refactor to use paths and stash h_offset in the streamer later.

porcuquine (Author)


I went ahead and implemented this in #10, which will replace this PR.

@@ -149,7 +394,12 @@ impl MPCSmall {
}
}

fn keypair<R: Rng>(rng: &mut R, prev: &MPCSmall) -> (PublicKey, PrivateKey) {
fn keypair<R: Rng>(


I guess just make these changes to keypair() in lib.rs and make it pub(crate). Then we can delete this function.

}

// Create a new `Streamer` from large params file, with reader in position to begin reading the h vector.
pub fn new_from_large_file(mut reader: R) -> io::Result<Streamer<R>> {

@DrPeterVanNostrand · Jun 24, 2020


(repeat comment) Passing in filenames instead of Reads prevents unnecessary buffered I/O seeking. Passing in the path as an argument allows you to force the use of buffered I/O AND to get efficient seeks.

}
chunks_read += this_chunk_size;

info!("phase2::MPCParameters::contribute() batch_exp of h");


We rely heavily on log files during the phase2 ceremony. While-loop logging for 17 hours (much less once your deserialization PR is merged) isn't amenable to our current ceremony workflow. I'd rather this log be commented out. Improving the logging around all of the new changes is something that I will be cleaning up later anyway.

porcuquine (Author)


Thanks, this is one of the things I wanted your input on. The log was useful in development, but I can remove it now.

porcuquine (Author)


Fixed.

}
chunks_read += this_chunk_size;

info!("phase2::MPCParameters::contribute() batch_exp of l");

@DrPeterVanNostrand · Jun 24, 2020


(repeat comment) The while-loop logging will have a negative effect on how we run the ceremony. For now, the logging at the start of the loop is sufficient.
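
A tiny sketch of the agreed-on shape, with hypothetical loop scaffolding, just to show the log placement: one info! line before the chunked loop rather than one per iteration:

use log::info;

// Hypothetical scaffolding; only the log placement is the point.
fn contribute_l(total_chunks: usize) {
    info!(
        "phase2::MPCParameters::contribute() batch_exp of l ({} chunks)",
        total_chunks
    );
    for _chunk in 0..total_chunks {
        // ...batch_exp this chunk, with no per-iteration logging...
    }
}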

@@ -219,23 +469,19 @@ fn batch_exp(bases: &mut [G1Affine], coeff: Fr) {
for (base, products) in bases.iter_mut().zip(products.iter_mut()) {
*products = wnaf.base(base.into_projective(), 1).scalar(coeff);
}
// Normalize the projective products.


(repeated comment) I believe the intention of this code is to isolate tasks based upon their CPU requirement (i.e. use all cores for scalar multiplication, then do normalization on all cores, then do the conversion). I think that we should maintain the separation between w-NAF and projective->affine conversion.

Calling .into_affine() after you've normalized the projective points is essentially a free operation; you are just moving the x and y coordinates into a new structure, I believe. So maybe those two operations could be combined into a single rayon operation.
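
A minimal sketch of that suggestion, assuming the same groupy/paired-style API as above: the w-NAF multiplications stay in their own phase, and normalization plus the nearly free projective-to-affine conversion run together as a single rayon pass. The helper name and chunking are illustrative:

use groupy::CurveProjective;
use paired::bls12_381::{G1Affine, G1};
use rayon::prelude::*;

fn normalize_and_convert(bases: &mut [G1Affine], products: &mut [G1]) {
    let chunk = (products.len() / rayon::current_num_threads()).max(1);
    bases
        .par_chunks_mut(chunk)
        .zip(products.par_chunks_mut(chunk))
        .for_each(|(affines, projectives)| {
            // One batched inversion normalizes the whole chunk...
            G1::batch_normalization(projectives);
            // ...then the essentially free move into affine coordinates.
            for (a, p) in affines.iter_mut().zip(projectives.iter()) {
                *a = p.into_affine();
            }
        });
}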

@DrPeterVanNostrand

Could you please rebase your commits prior to or during merging?

@porcuquine (Author)

I am closing this in favor of #10.

@porcuquine closed this on Jun 26, 2020