Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: allow tree_r_last to be built on gpu #1138

Merged
merged 7 commits into from
Jun 5, 2020
Merged

feat: allow tree_r_last to be built on gpu #1138

merged 7 commits into from
Jun 5, 2020

Conversation

cryptonemo
Copy link
Collaborator

feat: attempt to improve gpu tree_c layer retrieval

@cryptonemo
Copy link
Collaborator Author

Work in progress. Still uses CPU compat mode for tree builders. Todo: verify poseidon standard/strengthened on tests. Param generation bump required?

@porcuquine
Copy link
Collaborator

Yes, this will require bumping the parameter version.

@cryptonemo
Copy link
Collaborator Author

@porcuquine Can you confirm if poseidon constraints (tests) should have to be updated with this change (updated neptune, etc)?

@porcuquine
Copy link
Collaborator

Yes, and they should go down.

@cryptonemo cryptonemo force-pushed the gpu-tree-r branch 6 times, most recently from aa19d48 to 758e7d2 Compare June 3, 2020 18:19
@cryptonemo
Copy link
Collaborator Author

Note: This comes out of draft status when neptune v1 is released.

@cryptonemo cryptonemo force-pushed the gpu-tree-r branch 2 times, most recently from 986ca9f to 5900d31 Compare June 4, 2020 12:31
@cryptonemo cryptonemo marked this pull request as ready for review June 4, 2020 14:05
let mut layer_data = Vec::with_capacity(layers);
for _ in 0..layers {
layer_data.push(Vec::new());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be written as

let mut layer_data = vec![vec![]; layers];

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the inner Vec should probably be allocated with with_capacity if you can determine their expected length

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inner vec is entirely replaced

layer_data[k - 1] = fr_elements;
}
});
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be

for layer_elements in layer_data.iter_mut() {
    // ... 
    layer_elements.extend(elements.into_iter().map(Into::into));
}

which should avoid some allocations

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take a look at this, thanks

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see what you're saying now (along with the above comment). Thanks!

)?;
for fr in fr_elements {
buf.extend(fr_into_bytes(&fr));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let buf: Vec<u8> = fr_elements.iter().flat_map(fr_into_bytes).collect();

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I want to avoid this. This looks eerily like the construction that was originally in place that bottlenecked hard.

dignifiedquire
dignifiedquire previously approved these changes Jun 4, 2020
Copy link
Contributor

@dignifiedquire dignifiedquire left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some small code improvements, but overall looks good to me


let flat_tree_data: Vec<_> = tree_data
for fr in fr_elements {
buf.extend(fr_into_bytes(&fr));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will still allocate a buffer into which to write the conversion. I think you should add a new version of this function which writes directly into buf.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this instance (of tree data), it's taking about a half second on a 32GiB run. The base data flattening is now taking about 2-3 seconds.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The buf being extended has been pre-allocated above that. I'm not sure if you're saying to avoid that allocation, or the one that extend would do had it not been pre-allocated.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was talking about the allocation inside fr_into_bytes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Got it, thanks. Given the overall speed improvements, we can wait on that a bit since this is no longer bottlenecking.

feat: attempt to improve gpu tree_c layer retrieval
fix: bump parameter version
fix: properly delete tree c in the split config case
fix: update tests to match new (lower) constraints
feat: update debug logging
@cryptonemo cryptonemo merged commit 573d24f into master Jun 5, 2020
@cryptonemo cryptonemo deleted the gpu-tree-r branch June 8, 2020 14:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cryptocomputelab CryptoComputeLab work
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants