[STF] reduce access mode #2830
base: main
Conversation
/ok to test
@@ -423,7 +452,7 @@ public:
      Fun&& f = mv(::std::get<2>(*p));
      const sub_shape_t& shape = ::std::get<3>(*p);

-     auto explode_coords = [&](size_t i, deps_t... data) {
+     auto explode_coords = [&](size_t i, typename deps_ops_t::first_type... data) {
Need a comment to say what that type is (we can't use an alias because it's a pack, not a tuple)
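For readers following the diff above, here is a hedged sketch of why the parameter pack cannot simply be hidden behind an alias (the `dep_op` helper and its member names are illustrative, not the PR's actual types):

```cpp
// Illustrative only: deps_ops_t... is modeled as a pack of pair-like types,
// each carrying the data instance type (first_type) and its reduction
// operator type (second_type).
template <typename First, typename Second>
struct dep_op
{
  using first_type  = First;
  using second_type = Second;
};

// The expansion "typename deps_ops_t::first_type..." has to be spelled out at
// every point of use: an alias can name a single type (or a tuple of types),
// but not an expanded pack, hence the request for an explanatory comment.
template <typename... deps_ops_t>
void explode_coords_example(size_t i, typename deps_ops_t::first_type... data);
```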
…n access mode, and start to implement all the mechanisms for reductions in parallel_for
… to cuda::std::tuple
public:
  // no-op operator
  template <typename T>
  static __host__ __device__ void apply_op(T&, const T&)
we should not need that
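For context, a hedged sketch of what a non-trivial operator with this `apply_op` interface might look like (the class name and semantics are assumptions based on the snippet above, not the PR's actual operator classes):

```cpp
// Hypothetical sum operator following the apply_op signature shown above.
class sum_op
{
public:
  template <typename T>
  static __host__ __device__ void apply_op(T& accumulator, const T& partial)
  {
    accumulator += partial; // fold a partial result into the accumulator
  }
};
```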
// arguments, or an owning local variable for reduction variables.
// extern __shared__ redux_buffer_tup_wrapper<tuple_args, tuple_ops> per_block_redux_buffer[];
extern __shared__ char dyn_buffer[];
auto* per_block_redux_buffer = (redux_buffer_tup_wrapper<tuple_args, tuple_ops>*) ((void*) dyn_buffer);
this weirdness is due to the fact that extern __shared__ declarations won't work with a different type for the same symbol: every template instantiation would redeclare that one symbol, so we declare a char buffer and cast it to the type we need
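The pattern being discussed is a well-known CUDA workaround. A minimal standalone sketch (the kernel name and reduction logic are illustrative; a power-of-two block size is assumed):

```cuda
// Declaring "extern __shared__ T buf[]" with a different T in each template
// instantiation maps every declaration onto the same symbol, which the CUDA
// toolchain rejects; declaring one char array and casting sidesteps that.
template <typename T>
__global__ void block_reduce(const T* in, T* out, size_t n)
{
  extern __shared__ char dyn_buffer[]; // one symbol, one type, all instantiations
  T* partials = reinterpret_cast<T*>(dyn_buffer);

  size_t tid    = threadIdx.x;
  size_t gid    = blockIdx.x * blockDim.x + threadIdx.x;
  partials[tid] = (gid < n) ? in[gid] : T{};
  __syncthreads();

  // Tree reduction within the block (blockDim.x assumed to be a power of two)
  for (size_t s = blockDim.x / 2; s > 0; s /= 2)
  {
    if (tid < s)
    {
      partials[tid] += partials[tid + s];
    }
    __syncthreads();
  }

  if (tid == 0)
  {
    out[blockIdx.x] = partials[0];
  }
}

// Launch example: block_reduce<float><<<1, 256, 256 * sizeof(float)>>>(in, out, n);
```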
// Write the block's result to the output array
if (tid == 0)
{
  tuple_set_op<tuple_ops>(redux_buffer[blockIdx.x], per_block_redux_buffer[0].get());
specialize if only one block...
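A hedged sketch of what that specialization could look like, reusing the names from the snippet above (`final_result` is a hypothetical destination, not from the PR):

```cpp
// When the grid has a single block there are no other partial results to
// combine, so thread 0 can commit the reduction directly instead of staging
// it in redux_buffer for a second pass.
if (tid == 0)
{
  if (gridDim.x == 1)
  {
    // Hypothetical direct write of the final value (final_result is assumed)
    tuple_set_op<tuple_ops>(final_result, per_block_redux_buffer[0].get());
  }
  else
  {
    // Stage this block's partial result for the follow-up combine step
    tuple_set_op<tuple_ops>(redux_buffer[blockIdx.x], per_block_redux_buffer[0].get());
  }
}
```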
…ting value, or initialize a new one
🟨 CI finished in 32m 02s: Pass: 88%/54 | Total: 10h 40m | Avg: 11m 51s | Max: 16m 04s | Hits: 90%/123
Modifications in project?

Project | Modified?
---|---
CCCL Infrastructure |
libcu++ |
CUB |
Thrust |
CUDA Experimental | +/-
python |
CCCL C Parallel Library |
Catch2Helper |

Modifications in project or dependencies?

Project | Modified?
---|---
CCCL Infrastructure |
libcu++ |
CUB |
Thrust |
CUDA Experimental | +/-
python |
CCCL C Parallel Library |
Catch2Helper |
🏃 Runner counts (total jobs: 54)

# | Runner
---|---
43 | linux-amd64-cpu16
5 | linux-amd64-gpu-v100-latest-1
4 | linux-arm64-cpu16
2 | windows-amd64-cpu16
Description
closes
This PR introduces a reduction access mode to make it much easier to write parallel_for kernels that also perform reductions into a logical data.
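To illustrate the intent, here is a hedged usage sketch; the reducer name (`reducer::sum`), `scalar_view`, and the exact signatures are assumptions for illustration, not necessarily the API this PR lands:

```cpp
#include <cuda/experimental/stf.cuh>

using namespace cuda::experimental::stf;

int main()
{
  context ctx;
  const size_t N = 1024;

  auto lX   = ctx.logical_data(shape_of<slice<double>>(N));
  auto lsum = ctx.logical_data(shape_of<scalar_view<double>>());

  ctx.parallel_for(lX.shape(), lX.write())->*[] __device__(size_t i, auto x) {
    x(i) = 1.0; // fill the input
  };

  // Hypothetical reduce() access mode: each thread contributes to the
  // reduction variable and the runtime combines the partial results.
  ctx.parallel_for(lX.shape(), lX.read(), lsum.reduce(reducer::sum<double>{}))
      ->*[] __device__(size_t i, auto x, double& sum) {
    sum += x(i);
  };

  ctx.finalize();
}
```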
Checklist