
Feat/stream contrib #8

Closed · wants to merge 4 commits

Conversation

porcuquine

No description provided.

@porcuquine force-pushed the feat/stream-contrib branch from d4c5c83 to b9b1214 on June 24, 2020 03:38

@DrPeterVanNostrand left a comment


Thank you! Everything looks good to me logic-wise. I made a few comments regarding implementation details.

@@ -799,23 +799,18 @@ impl MPCParameters {
for (base, projective) in bases.iter_mut().zip(projective.iter_mut()) {
*projective = wnaf.base(base.into_projective(), 1).scalar(coeff);
}


Do you know if this sped things up? I believe the reason we wait for the multiplications to finish before converting from projective to affine is that we don't want point conversion stealing CPU time from the multiplications.

porcuquine (Author)


I may have observed a slight improvement, or it might have been noise. It didn't get worse.

I would be surprised if the other way were better. The first part assigns the work to specific threads, one per available CPU, so they should all be running in parallel. Then the work is chunked back up into (presumably) the same number of threads and continued. It seems to me that this incurs at least:

  • the penalty of waiting for all threads to complete before beginning any normalization/conversion;
  • the rayon overhead;
  • possibly moving data between CPUs, if work isn't reassigned to the same CPU it started on (despite the same chunking).

As far as 'stealing CPU' goes, the normalization and conversion are single-threaded, so they shouldn't be stealing CPU from other threads — just using the already-assigned idle CPU.

Doing all the work for each chunk on the originally allocated thread at least avoids all the potential costs above. As an aside, if we weren't using the raw format to speed up I/O, we could probably improve performance by moving the affine<->uncompressed conversions into this parallel section. And we could likely get even more speedup than raw by just reading/writing the projective points and skipping the conversions (at the cost of larger small-params files). The latter might be worth exploring for future ceremonies.
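
For concreteness, here is a minimal sketch of the fused per-chunk pattern described above, written against the groupy/paired-style API this crate uses (Wnaf, batch_normalization, into_affine); the function name and chunking are illustrative, not the exact code in this PR:

use ff::PrimeField;
use groupy::{CurveAffine, CurveProjective, Wnaf};
use paired::bls12_381::{Fr, G1Affine, G1};
use rayon::prelude::*;

fn batch_exp_fused(bases: &mut [G1Affine], coeff: Fr) {
    let coeff = coeff.into_repr();
    let chunk_size = (bases.len() / rayon::current_num_threads()).max(1);

    bases.par_chunks_mut(chunk_size).for_each(|chunk| {
        let mut wnaf = Wnaf::new();
        // Multiply every base in this chunk by the coefficient.
        let mut projective: Vec<G1> = chunk
            .iter()
            .map(|base| wnaf.base(base.into_projective(), 1).scalar(coeff))
            .collect();
        // One batched inversion normalizes the whole chunk at once...
        G1::batch_normalization(&mut projective);
        // ...and the conversion back to affine runs on the same thread,
        // which would otherwise sit idle until every thread finished.
        for (affine, p) in chunk.iter_mut().zip(projective.iter()) {
            *affine = p.into_affine();
        }
    });
}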


let h_len = reader.read_u32::<BigEndian>()? as usize;
let h_offset = g1_size + g2_size + 4;
reader.seek(SeekFrom::Current((h_len * g1_size) as i64))?;


Seeking in buffered I/O always drops the buffer, so you would be doing 1MB reads (or whatever your buffer size is) per 4-byte h_len/l_len read. I believe the method you're looking for is something along the lines of BufReader::seek_relative(), but that's on nightly.

I used buffered reads for the vectors larger than 1MB here and non-buffered reads everywhere else, which is why read_small_params_from_large_file() takes a path argument rather than a Read type.
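
For reference, a minimal sketch of that path-based approach: seek the bare File first, where there is no buffer to invalidate, and only then wrap it in a BufReader for the large sequential read. The function name, buffer size, and offset handling are illustrative:

use std::fs::File;
use std::io::{self, BufReader, Read, Seek, SeekFrom};

fn skip_then_buffered_read(path: &str, offset: u64, len: usize) -> io::Result<Vec<u8>> {
    let mut file = File::open(path)?;
    // Seeking on the bare File is cheap; there is no buffer yet to invalidate.
    file.seek(SeekFrom::Start(offset))?;
    // Attach the buffer only for the large sequential read that follows.
    let mut reader = BufReader::with_capacity(1 << 20, file);
    let mut buf = vec![0u8; len];
    reader.read_exact(&mut buf)?;
    Ok(buf)
}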

porcuquine (Author)


Makes sense. I don't think the absolute cost is high, so I'll leave it as-is for now, given this code is tested. We can refactor to use paths and stash h_offset in the streamer later.

porcuquine (Author)


I went ahead and implemented this in #10, which will replace this PR.

@@ -149,7 +394,12 @@ impl MPCSmall {
}
}

fn keypair<R: Rng>(rng: &mut R, prev: &MPCSmall) -> (PublicKey, PrivateKey) {
fn keypair<R: Rng>(


I guess just make these changes to keypair() in lib.rs and make it pub(crate). Then we can delete this function.

}

// Create a new `Streamer` from large params file, with reader in position to begin reading the h vector.
pub fn new_from_large_file(mut reader: R) -> io::Result<Streamer<R>> {

@DrPeterVanNostrand · Jun 24, 2020


(repeat comment) Passing in filenames instead of Reads prevents unnecessary buffered I/O seeking. Passing in the path as an argument allows you to force the use of buffered I/O AND to get efficient seeks.

}
chunks_read += this_chunk_size;

info!("phase2::MPCParameters::contribute() batch_exp of h");


We rely heavily on log files during the phase2 ceremony. While-loop logging for 17 hours (much less once your deserialization PR is merged) isn't amenable to our current ceremony workflow. I'd rather this log be commented out. Improving the logging around all of the new changes is something that I will be cleaning up later anyway.

porcuquine (Author)


Thanks, this is one of the things I wanted your input on. The log was useful in development, but I can remove it now.

porcuquine (Author)


Fixed.

}
chunks_read += this_chunk_size;

info!("phase2::MPCParameters::contribute() batch_exp of l");

@DrPeterVanNostrand · Jun 24, 2020


(repeat comment) The while-loop logging will have a negative effect on how we run the ceremony. For now, the logging at the start of the loop is sufficient.
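
A tiny sketch of the agreed-on shape, with hypothetical loop scaffolding, just to show the log placement: one info! line before the chunked loop rather than one per iteration:

use log::info;

// Hypothetical scaffolding; only the log placement is the point.
fn contribute_l(total_chunks: usize) {
    info!(
        "phase2::MPCParameters::contribute() batch_exp of l ({} chunks)",
        total_chunks
    );
    for _chunk in 0..total_chunks {
        // ...batch_exp this chunk, with no per-iteration logging...
    }
}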

@@ -219,23 +469,19 @@ fn batch_exp(bases: &mut [G1Affine], coeff: Fr) {
for (base, products) in bases.iter_mut().zip(products.iter_mut()) {
*products = wnaf.base(base.into_projective(), 1).scalar(coeff);
}
// Normalize the projective products.


(repeated comment) I believe the intention of this code is to isolate tasks based upon their CPU requirement (i.e. use all cores for scalar multiplication, then do normalization on all cores, then do the conversion). I think that we should maintain the separation between w-NAF and projective->affine conversion.

Calling .into_affine() after you've normalized the projective points is essentially a free operation; you are just moving the x and y coordinates into a new structure, I believe. So maybe those two operations could be combined into a single rayon operation.
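
A minimal sketch of that suggestion, assuming the same groupy/paired-style API as above: the w-NAF multiplications stay in their own phase, and normalization plus the nearly free projective-to-affine conversion run together as a single rayon pass. The helper name and chunking are illustrative:

use groupy::CurveProjective;
use paired::bls12_381::{G1Affine, G1};
use rayon::prelude::*;

fn normalize_and_convert(bases: &mut [G1Affine], products: &mut [G1]) {
    let chunk = (products.len() / rayon::current_num_threads()).max(1);
    bases
        .par_chunks_mut(chunk)
        .zip(products.par_chunks_mut(chunk))
        .for_each(|(affines, projectives)| {
            // One batched inversion normalizes the whole chunk...
            G1::batch_normalization(projectives);
            // ...then the essentially free move into affine coordinates.
            for (a, p) in affines.iter_mut().zip(projectives.iter()) {
                *a = p.into_affine();
            }
        });
}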

@DrPeterVanNostrand

Could you please rebase your commits prior to or during merging?

@porcuquine (Author)

I am closing this in favor of #10.

@porcuquine closed this on Jun 26, 2020