WIP: Add PARTED kernels #382
base: develop
Commits on Jan 19, 2024
This does the same thing as TRIAD but breaks the work into multiple for loops, one per partition of the data, instead of a single for loop over all of the data.
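A minimal host-side sketch of the idea, assuming the STREAM-style TRIAD body `a[i] = b[i] + alpha * c[i]` and a hypothetical `parts` array of partition boundaries (partition `p` covers the half-open range `[parts[p], parts[p+1])`). The function names are illustrative, not the suite's actual kernels.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Baseline TRIAD: a single for loop over all of the data.
void triad(std::vector<double>& a, const std::vector<double>& b,
           const std::vector<double>& c, double alpha)
{
  for (std::size_t i = 0; i < a.size(); ++i) {
    a[i] = b[i] + alpha * c[i];
  }
}

// TRIAD_PARTED-style variant (illustrative): the same body, but one for loop
// per partition. parts[p] .. parts[p + 1] is the half-open range of partition p.
void triad_parted(std::vector<double>& a, const std::vector<double>& b,
                  const std::vector<double>& c, double alpha,
                  const std::vector<std::size_t>& parts)
{
  // Boundaries must start at 0 and end at the array length.
  assert(!parts.empty() && parts.front() == 0 && parts.back() == a.size());
  for (std::size_t p = 0; p + 1 < parts.size(); ++p) {
    for (std::size_t i = parts[p]; i < parts[p + 1]; ++i) {
      a[i] = b[i] + alpha * c[i];
    }
  }
}
```

Because the partitions tile the whole range, the result matches a single TRIAD loop; only the loop structure, and therefore the per-loop overhead, changes.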
Commit 027d6f0
Commit 51057e3
Use direct dispatch in RAJA TRIAD_PARTED_FUSED
Leave the other dispatch options in as comments.
Commit 08e5c9f
This makes the size of each partition a multiple of the size of the previous partition.
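One way to generate such sizes is a geometric sequence, sketched below with a hypothetical `geometric_partition_sizes(n, num_parts, factor)` helper; the actual size selection in the suite may differ. Integer division leaves a remainder that this sketch adds to the last partition, so only the earlier partitions are guaranteed to follow the multiple-of-previous rule.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: split n elements into num_parts partitions with
// size[k] = base * factor^k. Leftover elements from the integer division
// are added to the last partition.
std::vector<std::size_t> geometric_partition_sizes(std::size_t n,
                                                   std::size_t num_parts,
                                                   std::size_t factor)
{
  assert(num_parts > 0 && factor > 0);
  // units = 1 + factor + ... + factor^(num_parts - 1)
  std::size_t units = 0, term = 1;
  for (std::size_t k = 0; k < num_parts; ++k) {
    units += term;
    term *= factor;
  }
  const std::size_t base = n / units;

  std::vector<std::size_t> sizes(num_parts);
  std::size_t used = 0;
  term = 1;
  for (std::size_t k = 0; k < num_parts; ++k) {
    sizes[k] = base * term;  // each size is factor times the previous one
    used += sizes[k];
    term *= factor;
  }
  sizes.back() += n - used;  // remainder goes to the last partition
  return sizes;
}
```

For example, 15 elements in 4 partitions with factor 2 gives sizes 1, 2, 4, 8 with no remainder.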
Commit f1dc134
Add reuse tuning of TRIAD_PARTED_FUSED
This tuning provides a best-case scenario in which the overhead of capturing the state and synchronizing on every rep is removed.
Commit 84e1e6c
Commit 36fb292
Commit 8c06b66
Add len to triad_holder and add gpu tuning
The new GPU tuning is an AOS version using triad_holder. It is added alongside the existing SOA tuning.
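To illustrate the AOS/SOA distinction, here is a host-side sketch assuming `triad_holder` stores per-partition pointers plus the new `len`; the exact fields are assumptions, and `triad_holders_soa` is a hypothetical SOA counterpart. In the AOS form each partition's data travels together in one struct; in the SOA form each field lives in its own array.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// AOS: one struct per partition (fields are assumptions for illustration).
struct triad_holder {
  double* a;
  const double* b;
  const double* c;
  std::size_t len;  // partition length, as added in this commit
};

// SOA: one array per field (hypothetical counterpart).
struct triad_holders_soa {
  std::vector<double*> a;
  std::vector<const double*> b;
  std::vector<const double*> c;
  std::vector<std::size_t> len;
};

// Fused AOS pass: every partition is handled by one call, reading each
// partition's pointers and length from a single struct.
void triad_parted_fused_aos(const std::vector<triad_holder>& holders, double alpha)
{
  for (const triad_holder& h : holders) {
    for (std::size_t i = 0; i < h.len; ++i) {
      h.a[i] = h.b[i] + alpha * h.c[i];
    }
  }
}

// Fused SOA pass: same work, reading each field from its own array.
void triad_parted_fused_soa(const triad_holders_soa& s, double alpha)
{
  assert(s.a.size() == s.b.size() && s.b.size() == s.c.size() &&
         s.c.size() == s.len.size());
  for (std::size_t p = 0; p < s.len.size(); ++p) {
    for (std::size_t i = 0; i < s.len[p]; ++i) {
      s.a[p][i] = s.b[p][i] + alpha * s.c[p][i];
    }
  }
}
```

On a GPU the layout choice matters because it changes how the per-partition metadata is loaded; on the host both passes compute the same result.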
Commit 7c94b83
Add a smart memory pool tuning
This copies the basic mempool from RAJA and adds the ability to synchronize as necessary, avoiding host-device race conditions when memory is needed on the host but all of the memory is in use on the device.
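The synchronize-on-exhaustion idea can be modeled with a small host-only sketch. `SyncingPool` is hypothetical and is not RAJA's mempool API: it hands out fixed-size buffers, treats every buffer handed out since the last synchronization as possibly still in use by device work, and calls a caller-supplied `synchronize` callback (standing in for a device or stream synchronize) only when no buffer is free.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical pool: buffers stay "busy" from acquire() until the next
// synchronization, because device work may still be reading them.
class SyncingPool {
public:
  SyncingPool(std::size_t num_buffers, std::size_t buffer_bytes,
              std::function<void()> synchronize)
    : storage_(num_buffers, std::vector<unsigned char>(buffer_bytes)),
      busy_(num_buffers, false),
      synchronize_(std::move(synchronize)) {}

  // Returns a buffer the host can safely write.
  unsigned char* acquire() {
    for (int attempt = 0; attempt < 2; ++attempt) {
      for (std::size_t i = 0; i < busy_.size(); ++i) {
        if (!busy_[i]) {
          busy_[i] = true;
          return storage_[i].data();
        }
      }
      // Every buffer may be in use by outstanding device work: wait for
      // that work to finish, after which all buffers are reusable.
      synchronize_();
      ++sync_count_;
      std::fill(busy_.begin(), busy_.end(), false);
    }
    return nullptr;  // only reached when the pool has no buffers
  }

  std::size_t sync_count() const { return sync_count_; }

private:
  std::vector<std::vector<unsigned char>> storage_;
  std::vector<bool> busy_;
  std::function<void()> synchronize_;
  std::size_t sync_count_ = 0;
};
```

In this sketch the host stalls only when the pool is actually exhausted, rather than on every acquisition.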
Commit 1bbd169
Commit b1d4e24
Add option to shuffle_partition_sizes
It is on by default, so the partition sizes are not always in non-decreasing order.
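A sketch of what the option toggles, assuming a hypothetical `maybe_shuffle_sizes` helper that permutes the sizes with a seeded `std::mt19937` and `std::shuffle`; the suite's actual RNG and seeding are not shown in this PR view. Shuffling only reorders the sizes, so the boundaries built from them still cover every element exactly once.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical: when shuffle_partition_sizes is on, permute the sizes so
// they are not always in non-decreasing order. The total is unchanged.
std::vector<std::size_t> maybe_shuffle_sizes(std::vector<std::size_t> sizes,
                                             bool shuffle_partition_sizes,
                                             unsigned seed)
{
  if (shuffle_partition_sizes) {
    std::mt19937 rng(seed);  // seeded for reproducible runs
    std::shuffle(sizes.begin(), sizes.end(), rng);
  }
  return sizes;
}

// Convert sizes to boundaries: parts[p] is the first index of partition p
// and parts.back() is the total element count.
std::vector<std::size_t> sizes_to_boundaries(const std::vector<std::size_t>& sizes)
{
  std::vector<std::size_t> parts(sizes.size() + 1, 0);
  for (std::size_t p = 0; p < sizes.size(); ++p) {
    parts[p + 1] = parts[p] + sizes[p];
  }
  return parts;
}
```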
Commit 00fb6cf
Commit cf08b3e
Commit 72fd10c
Commit 94bd68f
Commit 37e4475
This uses a scan and a binary search to assign work to blocks instead of using a 2D grid, which avoids launching blocks that have no work.
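The scheduling idea can be sketched on the host, assuming each partition needs `ceil(len / block_size)` blocks. An exclusive scan of those counts gives each partition's first block id, and a binary search (`std::upper_bound`) maps a 1D block id back to its partition. In a 2D grid whose x-dimension presumably has to cover the largest partition, blocks past the end of shorter partitions would be idle; here the launch has exactly `offsets.back()` blocks. All names are illustrative, not the PR's kernel code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Exclusive scan of per-partition block counts:
// offsets[p] = first block id of partition p, offsets.back() = total blocks.
std::vector<std::size_t> scan_block_offsets(const std::vector<std::size_t>& lens,
                                            std::size_t block_size)
{
  std::vector<std::size_t> offsets(lens.size() + 1, 0);
  for (std::size_t p = 0; p < lens.size(); ++p) {
    const std::size_t nblocks = (lens[p] + block_size - 1) / block_size;  // ceil
    offsets[p + 1] = offsets[p] + nblocks;
  }
  return offsets;
}

// Binary search: the owning partition is the last p with offsets[p] <= block_id,
// so empty partitions (which own no blocks) are skipped automatically.
std::size_t find_partition(const std::vector<std::size_t>& offsets,
                           std::size_t block_id)
{
  auto it = std::upper_bound(offsets.begin(), offsets.end(), block_id);
  return static_cast<std::size_t>(it - offsets.begin()) - 1;
}

// Host emulation of a 1D launch: each block finds its partition, then each
// of its block_size "threads" handles one element of that partition.
void triad_fused_scan(std::vector<double>& a, const std::vector<double>& b,
                      const std::vector<double>& c, double alpha,
                      const std::vector<std::size_t>& begins,
                      const std::vector<std::size_t>& lens,
                      std::size_t block_size)
{
  const std::vector<std::size_t> offsets = scan_block_offsets(lens, block_size);
  for (std::size_t block = 0; block < offsets.back(); ++block) {
    const std::size_t p = find_partition(offsets, block);
    const std::size_t local_block = block - offsets[p];
    for (std::size_t t = 0; t < block_size; ++t) {
      const std::size_t i = local_block * block_size + t;  // index within partition
      if (i < lens[p]) {
        const std::size_t g = begins[p] + i;  // global index
        a[g] = b[g] + alpha * c[g];
      }
    }
  }
}
```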
Commit c2de49f
Add block wide search impl to triad_parted_fused_scan_aos
This is faster with CUDA but slower with HIP.
Commit a072b3f
Commit 876a16b
Use device memory for hip triad parted fused
This has a minimal effect.
Commit 431c80e
Use cuda managed device preferred host accessed with triad parted fused
This has a large effect and makes a block size of 256 as good as or better than 1024.
Commit 9141573
Commit bbe8272
Commit 8f884f5
Commit 0e567b1
add TRIAD_PARTED stream (non-omp) tuning
reorder TRIAD_PARTED gpu tuning declarations
Commit ddf9c9d
Commit 5926c63
Add gpu event tunings of TRIAD_PARTED
These tunings use events to "fork-join" the streams, as would be required in more realistic code, though it would not always have to be done as frequently.
Commit 54d8094
Commits on Jan 22, 2024
Commit 5ba0d3b
Commit 9c696d9
Commits on Jan 23, 2024
Commit 489f23f
Commit 18da3c7
Commit c75aa00
Commit 5337794
Commit b82f291
Commit 1f82e28
Commit 457b829
Commit 2a469a3
Commit 4f143c3
Commit 455baee
Commit f1d0120