rcb: prerequisites for GPU acceleration #168

Merged: 1 commit, Jun 23, 2022

@hhirtz (Member) commented Jun 13, 2022

This PR changes the "array of structs" data structure in RCB to a "struct of arrays" (one array for the weights, one for the part IDs, and one for each dimension) to allow efficient data transfers to the GPU and to enable SIMD processing.
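As a rough illustration of the layout change, the sketch below converts an array of structs into a struct of arrays. The field names, the field types, and the `ItemColumns` type are assumptions made for this example; they are not taken from the PR's code.

```rust
/// Hypothetical "array of structs" element: one point with its weight and
/// part ID. Field names and types are assumptions for illustration only.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Item<const D: usize> {
    pub point: [f64; D],
    pub weight: f64,
    pub part: usize,
}

/// Hypothetical "struct of arrays" counterpart: one contiguous array per
/// field, and one array per dimension for the coordinates.
#[derive(Debug)]
pub struct ItemColumns<const D: usize> {
    pub coords: [Vec<f64>; D],
    pub weights: Vec<f64>,
    pub parts: Vec<usize>,
}

impl<const D: usize> ItemColumns<D> {
    /// Unpacks a slice of `Item<D>` into separate columns.
    pub fn from_aos(items: &[Item<D>]) -> Self {
        let n = items.len();
        let mut coords: [Vec<f64>; D] = std::array::from_fn(|_| Vec::with_capacity(n));
        let mut weights = Vec::with_capacity(n);
        let mut parts = Vec::with_capacity(n);
        for item in items {
            // Scatter each coordinate into the column of its dimension.
            for (axis, column) in coords.iter_mut().enumerate() {
                column.push(item.point[axis]);
            }
            weights.push(item.weight);
            parts.push(item.part);
        }
        Self { coords, weights, parts }
    }

    /// Number of items stored in the columns.
    pub fn len(&self) -> usize {
        self.weights.len()
    }
}
```

With this layout, each column is a plain contiguous buffer, which is the property that makes bulk copies to a device and vectorized loops over a single coordinate straightforward.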

TODO

  • update commented-out unit tests

@hhirtz hhirtz force-pushed the rcb-gpu-preparations branch 2 times, most recently from 35e7b0c to ef3363a Compare June 20, 2022 07:34
@hhirtz (Member, Author) commented Jun 20, 2022

2D triangular mesh, 20 million triangles

RCB, 1 iteration (2 parts), sequential times:

  • master: 1.46 s
  • this pr (gpuprep): 1.28 s

RCB, 12 iterations (4096 parts), sequential times:

  • master: 8.24 s
  • this pr (gpuprep): 6.23 s

[Figure: strong scaling plot, "base strong"]

@codecov (bot) commented Jun 20, 2022

Codecov Report

Merging #168 (e07e2f9) into master (00148a8) will increase coverage by 1.03%.
The diff coverage is 92.57%.

❗ Current head e07e2f9 differs from pull request most recent head 69418ee. Consider uploading reports for the commit 69418ee to get more accurate results

@@            Coverage Diff             @@
##           master     #168      +/-   ##
==========================================
+ Coverage   50.00%   51.03%   +1.03%     
==========================================
  Files          42       41       -1     
  Lines        7110     7093      -17     
==========================================
+ Hits         3555     3620      +65     
+ Misses       3555     3473      -82     

Impacted Files                           Coverage Δ
ffi/src/data.rs                          0.00% <0.00%> (ø)
src/algorithms/recursive_bisection.rs    91.48% <93.96%> (-1.01%) ⬇️
tools/src/bin/mesh-dup.rs
src/algorithms/vn/first.rs               72.90% <0.00%> (+1.19%) ⬆️
tools/mesh-io/src/medit/mod.rs           8.12% <0.00%> (+2.24%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@hhirtz (Member, Author) commented Jun 23, 2022

Thanks to #170 and #175, weak scaling measurements can be performed.

3D triangular mesh, from 2.18 million triangles for 1 thread up to 559 million triangles for 256 threads.

RCB, 1 iteration (2 parts), sequential times:

  • master: 171 ms
  • this pr (gpuprep): 96.8 ms

RCB, 12 iterations (4096 parts), sequential times:

  • master: 4.46 s
  • this pr (gpuprep): 393 ms

[Figure: weak scaling plot, "base weak"]

Speedup = N * walltime(1 thread) / walltime(N threads)
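The speedup definition above can be written as a small helper. This is only a sketch: the function names are made up for this example, and it assumes walltimes are given in seconds as `f64`.

```rust
/// Weak-scaling speedup as defined above:
/// N * walltime(1 thread) / walltime(N threads).
/// The problem size is assumed to grow proportionally with `threads`.
pub fn weak_scaling_speedup(threads: u32, walltime_1: f64, walltime_n: f64) -> f64 {
    f64::from(threads) * walltime_1 / walltime_n
}

/// Parallel efficiency derived from the same quantities: speedup / N.
/// A value of 1.0 means the walltime stayed constant as the problem grew.
pub fn weak_scaling_efficiency(walltime_1: f64, walltime_n: f64) -> f64 {
    walltime_1 / walltime_n
}
```

For instance, with made-up timings (not measurements from this PR), a run on 256 threads that takes twice as long as the single-thread run on the base mesh gives a speedup of 128 and an efficiency of 0.5.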

@hhirtz hhirtz marked this pull request as ready for review June 23, 2022 12:38
Unpack the array of Item<D> into:

- an array for the weights,
- an array for the part references,
- D arrays, one for each coordinate

Also, change the point type from f64 to f32.

All this is to allow efficient data reads and copies, so that SIMD and
GPU acceleration can be used.  This change alone improves performance,
possibly because less data needs to be read.
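A back-of-the-envelope sketch of the "less data" argument follows. It assumes a hypothetical `ItemF64` layout (f64 coordinates, an f64 weight, and a 64-bit part ID stored together); the actual layout of `Item<D>` before this commit may differ, so the numbers are illustrative only.

```rust
use std::mem::size_of;

/// Hypothetical array-of-structs element with f64 coordinates.
/// The fields and their types are assumptions for this estimate.
#[repr(C)]
pub struct ItemF64<const D: usize> {
    pub point: [f64; D],
    pub weight: f64,
    pub part: u64,
}

/// Bytes streamed through the cache when scanning one coordinate of `n`
/// items stored as an array of structs: the coordinate is interleaved with
/// the other fields, so whole structs are loaded.
pub fn aos_scan_bytes<const D: usize>(n: usize) -> usize {
    n * size_of::<ItemF64<D>>()
}

/// Bytes streamed when scanning a single f32 coordinate array of `n` items.
pub fn soa_scan_bytes(n: usize) -> usize {
    n * size_of::<f32>()
}
```

Under these assumptions, a 2D scan over one coordinate touches 32 bytes per item in the interleaved layout and 4 bytes per item in the split f32 layout.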

This commit also replaces the use of select_nth_unstable with a variant
of the stdlib "partition" algorithm:

<https://github.com/rust-lang/rust/blob/1d6010816c37186e2bee316709f0c0197c427513/library/core/src/slice/sort.rs#L544>

(basically the same thing as select_nth_unstable, but the pivot is given
as input so there's less computation to be done)
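To make the idea concrete, here is a minimal sketch of partitioning around a caller-supplied pivot while keeping a second array in step. It uses a simple two-pointer scheme rather than the block-based partition from the linked stdlib code, and the function name, the signature, and the `f64` weight type are assumptions for this example.

```rust
/// Reorders `keys` so that every element strictly less than `pivot` comes
/// first, applying the same swaps to `weights`. Returns the number of
/// elements that are less than `pivot`.
///
/// Simplified two-pointer partition, not the stdlib's block partition.
pub fn partition_by_pivot(keys: &mut [f32], weights: &mut [f64], pivot: f32) -> usize {
    assert_eq!(keys.len(), weights.len());
    let mut left = 0;
    let mut right = keys.len();
    loop {
        // Skip elements on the left that are already on the correct side.
        while left < right && keys[left] < pivot {
            left += 1;
        }
        // Skip elements on the right that are already on the correct side.
        while left < right && !(keys[right - 1] < pivot) {
            right -= 1;
        }
        if left >= right {
            return left;
        }
        // keys[left] >= pivot and keys[right - 1] < pivot: swap the pair in
        // every parallel array so the columns stay consistent.
        keys.swap(left, right - 1);
        weights.swap(left, right - 1);
        left += 1;
        right -= 1;
    }
}
```

In a struct-of-arrays layout, the part IDs and the remaining coordinate arrays would need the same swaps; they are left out here to keep the sketch short.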
@hhirtz hhirtz merged commit 1dbf986 into master Jun 23, 2022
@hhirtz hhirtz deleted the rcb-gpu-preparations branch June 23, 2022 12:58