RFC for histogram CPU implementation #1930

Merged
merged 34 commits into from
Jan 22, 2025
Commits
a03577e
initial rough commit
danhoeflinger Oct 30, 2024
ce117f5
minor improvements
danhoeflinger Oct 30, 2024
ccc001e
revision
danhoeflinger Nov 1, 2024
d518a14
Formatting, minor
danhoeflinger Nov 1, 2024
6e03468
spelling and grammar
danhoeflinger Nov 1, 2024
10c4e50
Minor improvements
danhoeflinger Nov 1, 2024
efa7c9b
subsection
danhoeflinger Nov 1, 2024
1ac82fd
Adding some alternative approaches
danhoeflinger Nov 1, 2024
02523c4
minor improvements
danhoeflinger Nov 1, 2024
ac7b654
line widths
danhoeflinger Nov 4, 2024
506fb62
fixing numbering.
danhoeflinger Nov 6, 2024
1c6cb47
putting in specifics for TBB / OpenMP
danhoeflinger Nov 6, 2024
ceee3e3
Update Atomic strategy
danhoeflinger Nov 12, 2024
0711090
more clarity about serial backend and policy
danhoeflinger Nov 12, 2024
3c5ad12
minor corrections
danhoeflinger Nov 12, 2024
06a734f
c++17 -> c++20 fix
danhoeflinger Nov 13, 2024
b858a0e
Updates after some experimentation and thought
danhoeflinger Dec 16, 2024
53f4643
improvements from feedback
danhoeflinger Dec 20, 2024
d718e0e
thread enumerable storage +
danhoeflinger Dec 20, 2024
bb9e6f9
remove general language keep specifics to histogram
danhoeflinger Dec 20, 2024
17e0510
SIMD naming
danhoeflinger Dec 20, 2024
9614209
spelling
danhoeflinger Dec 20, 2024
2964a9e
clarifying thread enumerable storage
danhoeflinger Dec 20, 2024
9287fd2
minor improvements
danhoeflinger Dec 30, 2024
cdf5092
spelling
danhoeflinger Dec 30, 2024
215c2b7
adding link to implementation
danhoeflinger Dec 30, 2024
04d5127
rename to __enumerable_thread_local_storage
danhoeflinger Jan 15, 2025
fe1efa2
Added sections on complexity
danhoeflinger Jan 15, 2025
60ec0e5
spelling
danhoeflinger Jan 15, 2025
54e16b6
wording adjustments
danhoeflinger Jan 15, 2025
77435a3
minor formatting
danhoeflinger Jan 15, 2025
52bab0d
describe fall back to serial implementation
danhoeflinger Jan 16, 2025
b25411b
rename rfc directory
danhoeflinger Jan 16, 2025
5d23f2a
adding discussion of input sizes
danhoeflinger Jan 16, 2025
spelling
Signed-off-by: Dan Hoeflinger <dan.hoeflinger@intel.com>
danhoeflinger committed Dec 30, 2024

commit cdf50929da2e3c3fe9042ec72796676086d0c6a5
2 changes: 1 addition & 1 deletion rfcs/proposed/host_backend_histogram/README.md
akukanov marked this conversation as resolved.
@@ -157,7 +157,7 @@ With this new structure we will use the following algorithm:

1) Run a `parallel_for` pattern which performs a `histogram` on the input sequence where each thread accumulates into
its own temporary histogram returned by `__thread_enumerable_storage`. The parallelism is divided on the input
-element axis, and we rely upon existing `parallel_for` to implement chunksize and thread composibility.
+element axis, and we rely upon existing `parallel_for` to implement chunksize and thread composability.
2) Run a second `parallel_for` over the `histogram` output sequence which accumulates all temporary copies of the
histogram created within `__thread_enumerable_storage` into the output histogram sequence. The parallelism is divided
on the histogram bin axis, and each chunk loops through all temporary histograms to accumulate into the output
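The two passes described in this hunk can be illustrated with a minimal standalone sketch. It is not the oneDPL implementation: it uses plain `std::thread` with a vector of per-thread histograms as a stand-in for `__thread_enumerable_storage`, fixed even chunking in place of the existing `parallel_for`, and even-width bins over `[lo, hi)`. The function name `two_pass_histogram` and all of its parameters are hypothetical and chosen only for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical helper (not a oneDPL API). Assumes hi > lo and num_bins > 0.
// Values outside [lo, hi) are ignored.
std::vector<std::uint64_t> two_pass_histogram(const std::vector<int>& input, int lo, int hi,
                                              std::size_t num_bins, std::size_t num_threads)
{
    num_threads = std::max<std::size_t>(num_threads, 1);

    // Stand-in for thread-local storage: one private histogram per worker.
    std::vector<std::vector<std::uint64_t>> local(num_threads,
                                                  std::vector<std::uint64_t>(num_bins, 0));
    std::vector<std::uint64_t> output(num_bins, 0);

    // Pass 1: parallelism is divided on the input element axis.
    // Each worker accumulates into its own temporary histogram, so no atomics are needed.
    {
        std::vector<std::thread> workers;
        const std::size_t chunk = (input.size() + num_threads - 1) / num_threads;
        for (std::size_t t = 0; t < num_threads; ++t)
        {
            const std::size_t begin = std::min(t * chunk, input.size());
            const std::size_t end = std::min(begin + chunk, input.size());
            workers.emplace_back([&, t, begin, end] {
                const std::int64_t range = static_cast<std::int64_t>(hi) - lo;
                for (std::size_t i = begin; i < end; ++i)
                {
                    const int v = input[i];
                    if (v < lo || v >= hi)
                        continue;
                    const std::int64_t offset = static_cast<std::int64_t>(v) - lo;
                    const std::int64_t bin = offset * static_cast<std::int64_t>(num_bins) / range;
                    ++local[t][static_cast<std::size_t>(bin)];
                }
            });
        }
        for (auto& w : workers)
            w.join();
    }

    // Pass 2: parallelism is divided on the histogram bin axis.
    // Each worker owns a disjoint range of bins and loops over all temporary histograms.
    {
        std::vector<std::thread> workers;
        const std::size_t bin_chunk = (num_bins + num_threads - 1) / num_threads;
        for (std::size_t t = 0; t < num_threads; ++t)
        {
            const std::size_t begin = std::min(t * bin_chunk, num_bins);
            const std::size_t end = std::min(begin + bin_chunk, num_bins);
            workers.emplace_back([&, begin, end] {
                for (const auto& hist : local)
                    for (std::size_t b = begin; b < end; ++b)
                        output[b] += hist[b];
            });
        }
        for (auto& w : workers)
            w.join();
    }
    return output;
}
```

Because every worker in the second pass writes to a disjoint range of output bins, neither pass requires synchronization beyond the joins between them. The cost is the extra memory for one temporary histogram per thread.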