Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stas/best-practice-ntt #517

Merged
merged 6 commits into from
May 16, 2024
Merged

Stas/best-practice-ntt #517

merged 6 commits into from
May 16, 2024

Conversation

svpolonsky
Copy link
Contributor

Describe the changes

Icicle examples: Concurrent Data Transfer and NTT Computation

This PR introduces a Best Practice series of examples in c++. Specifically, the example shows how to concurrently transfer data to/from device and execute NTT

Linked Issues

Resolves #

@svpolonsky svpolonsky requested a review from yuvalingo May 15, 2024 20:13
Copy link
Contributor

@ChickenLover ChickenLover left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very good example @svpolonsky

examples/c++/best-practice-ntt/CMakeLists.txt Outdated Show resolved Hide resolved
mkdir -p build/icicle

# Configure and build Icicle
cmake -S ../../../icicle/ -B build/icicle -DCMAKE_BUILD_TYPE=Release -DCURVE=bn254 -DG2=OFF
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can also add -MSM=OFF to exclude MSM compilation, which boost the build time significantly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

examples/c++/best-practice-ntt/example.cu Show resolved Hide resolved
cudaMemcpyAsync(&h_out[vec_transfer][i*block_size], &d_vec[vec_transfer][i*block_size], sizeof(E)*block_size, cudaMemcpyDeviceToHost, stream_d2h);
}
if (i>0) {
cudaMemcpyAsync(&d_vec[vec_transfer][(i-1)*block_size], &h_inp[vec_transfer][(i-1)*block_size], sizeof(E)*block_size, cudaMemcpyHostToDevice, stream_h2d);
Copy link
Collaborator

@yshekel yshekel May 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(1) how do you guarantee that copy from d_vec ->h_out is fully executed before h_inp->d_vec when they are both async and on different streams? Seems like a race.
(2) I think it would be better to compare it to a naive implementation both in terms of performance and correctness.
(3) since it's an example, I think additional comments you be helpful

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding (1): I sync the streams on line 118, 119

cudaStreamSynchronize(stream_d2h); 
cudaStreamSynchronize(stream_h2d);

Regarding (2): I referenced existing NTT example as baseline in section "Running the example":

To compare with ICICLE baseline (i.e. non-concurrent ) NTT, you can run [this example](../ntt/README.md)

Regarding (3): I added comments to explain why we won't have races

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sync is blocking until the all tasks in the queue are completed but it doesn't prevent races between streams

Copy link

@yuvalingo yuvalingo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@LeonHibnik LeonHibnik merged commit 02059fc into main May 16, 2024
26 checks passed
@LeonHibnik LeonHibnik deleted the stas/best-practice-NTT branch May 16, 2024 20:51
yshekel pushed a commit that referenced this pull request May 19, 2024
## Describe the changes

Icicle examples:  Concurrent Data Transfer and NTT Computation

This PR introduces a Best Practice series of examples in c++.
Specifically, the example shows how to concurrently transfer data
to/from device and execute NTT

## Linked Issues

Resolves #
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants