IISWC
The figure below illustrates the percentage of time spent in data movement latency, PIM kernel execution, and host execution for various architectures.
To reproduce the bars for benchmarks that do not require host execution, follow these steps (shown for vector addition):
-
Navigate to the directory of your cloned repository:
cd <PIMeval-PIMbench-dir>
-
Clean previous builds and compile the project:
make clean
make -j
-
Change to the directory containing the PIM vector addition benchmarks:
cd PIMbench/vec-add/PIM/
-
Execute the benchmark for Bit-Serial architecture:
./vec-add.out -l 2035544320 -c ../../../configs/iiswc/PIMeval_BitSerial_Rank32.cfg
-
Upon successful execution, the terminal will output statistics similar to the following:
-
Calculate the total runtime and the percentages for data copy and PIM kernel execution from the highlighted values in the generated statistics:
- Total Runtime: Add the data copy and PIM kernel times:
27.8 ms + 0.014 ms = 27.814 ms
- Percentage Calculations:
  - Data Copy: (27.8 ms / 27.814 ms) * 100 ≈ 99.95%
  - PIM Kernel: (0.014 ms / 27.814 ms) * 100 ≈ 0.05%
-
Repeat steps 4–6 for the Fulcrum and Bank-Level architectures by changing the `-c` parameter:
- For Fulcrum:
./vec-add.out -l 2035544320 -c ../../../configs/iiswc/PIMeval_Fulcrum_Rank32.cfg
- For Bank-Level:
./vec-add.out -l 2035544320 -c ../../../configs/iiswc/PIMeval_Bank_Rank32.cfg
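The percentage calculation in step 6 can also be scripted. The following is a minimal Python sketch, not part of the PIMeval-PIMbench repository: the function name `time_breakdown` and the dictionary keys are hypothetical, and the times are the example values from the steps above, entered by hand rather than parsed from the simulator output.

```python
def time_breakdown(times_ms):
    """Return each component's share of the total runtime, in percent."""
    total = sum(times_ms.values())
    if total <= 0:
        raise ValueError("total runtime must be positive")
    return {name: 100.0 * t / total for name, t in times_ms.items()}


# Example values from the vector addition run above (Bit-Serial, Rank32).
# The key names are illustrative only; they are not simulator output fields.
vec_add_ms = {"data_copy": 27.8, "pim_kernel": 0.014}
shares = time_breakdown(vec_add_ms)

print(f"total: {sum(vec_add_ms.values()):.3f} ms")
for name, pct in shares.items():
    print(f"{name}: {pct:.2f}%")
```

Adding a third entry for host runtime to the dictionary extends the same calculation to benchmarks that spend time on the host.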
To reproduce the bars for benchmarks that require host execution, follow these steps (shown for radix sort):
-
Navigate to the directory of your cloned repository:
cd <PIMeval-PIMbench-dir>
-
Clean previous builds and compile the project:
make clean
make -j
-
Change to the directory containing the PIM radix sort benchmarks:
cd PIMbench/radix-sort/PIM/
-
Execute the benchmark for Bit-Serial architecture:
./radix-sort.out -n 6710886 -c ../../../configs/iiswc/PIMeval_BitSerial_Rank32.cfg
-
Upon successful execution, the terminal will output statistics similar to the following:
-
Calculate the total runtime and the percentages for data copy, PIM kernel execution, and host execution from the highlighted values in the generated statistics:
- Total Runtime: Add the data copy time, PIM kernel time, and host runtime:
1.22 ms + 9.5 ms + 1179.2 ms = 1189.92 ms
- Percentage Calculations:
  - Data Copy: (1.22 ms / 1189.92 ms) * 100 ≈ 0.1%
  - PIM Kernel: (9.5 ms / 1189.92 ms) * 100 ≈ 0.8%
  - Host: (1179.2 ms / 1189.92 ms) * 100 ≈ 99.1%
-
Repeat steps 4–6 for the Fulcrum and Bank-Level architectures by changing the `-c` parameter:
- For Fulcrum:
./radix-sort.out -n 6710886 -c ../../../configs/iiswc/PIMeval_Fulcrum_Rank32.cfg
- For Bank-Level:
./radix-sort.out -n 6710886 -c ../../../configs/iiswc/PIMeval_Bank_Rank32.cfg
By following these steps, you will be able to reproduce the figures showing the breakdown of time spent in data movement latency, PIM kernel execution, and host execution.
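Running each benchmark against all three configurations by hand is repetitive. The sketch below is not part of the repository and only assembles the command lines listed above. The helper name `sweep_commands` is hypothetical, while the binaries, flags, input sizes, and config paths are copied from the steps. Each command is assumed to be run from the corresponding benchmark's PIM directory.

```python
import shlex

# Relative config location and file names, as used in the commands above.
CONFIG_DIR = "../../../configs/iiswc"
ARCH_CONFIGS = {
    "Bit-Serial": "PIMeval_BitSerial_Rank32.cfg",
    "Fulcrum": "PIMeval_Fulcrum_Rank32.cfg",
    "Bank-Level": "PIMeval_Bank_Rank32.cfg",
}


def sweep_commands(binary, size_flag, size):
    """Build one argument list per architecture for a benchmark binary."""
    return {
        arch: [binary, size_flag, str(size), "-c", f"{CONFIG_DIR}/{cfg}"]
        for arch, cfg in ARCH_CONFIGS.items()
    }


vec_add = sweep_commands("./vec-add.out", "-l", 2035544320)
radix_sort = sweep_commands("./radix-sort.out", "-n", 6710886)

# Print the radix sort commands so they can be inspected or pasted into a shell.
for arch, argv in radix_sort.items():
    print(f"{arch}: {shlex.join(argv)}")
```

Each argument list can be handed to `subprocess.run` with `cwd` set to the benchmark's PIM directory. The timing values still have to be read from the printed statistics.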
The figure below shows the speedup of each PIM architecture over the CPU.
To reproduce the yellow bar (speedup over the CPU with kernel and data movement latency), follow these steps (shown for radix sort):
-
Change to the directory containing the radix sort benchmark:
cd <PIMeval-PIMbench-dir>/PIMbench/radix-sort/PIM/
-
Execute the benchmark for Bit-Serial architecture:
./radix-sort.out -n 6710886 -c ../../../configs/iiswc/PIMeval_BitSerial_Rank32.cfg
-
Upon successful execution, the terminal will output statistics similar to the following:
-
Calculate the total runtime from the highlighted values in the generated statistics:
- Total Runtime: Add the data copy time, PIM kernel time, and host runtime:
1.22 ms + 9.5 ms + 1179.2 ms = 1189.92 ms
-
Execute the CPU version of radix sort; details can be found here. For this example, assume the runtime is 2195.4 ms.
-
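Compute the speedup by dividing the CPU runtime by the total PIM runtime:
$\text{Speedup} = \frac{\text{CPU Execution Time}}{\text{Total PIM Runtime}} = \frac{2195.4\ \text{ms}}{1189.92\ \text{ms}} \approx 1.8$
This value corresponds to the yellow bar in the figure.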
-
Repeat steps 2–6 for the Fulcrum and Bank-Level architectures by changing the `-c` parameter:
- For Fulcrum:
./radix-sort.out -n 6710886 -c ../../../configs/iiswc/PIMeval_Fulcrum_Rank32.cfg
- For Bank-Level:
./radix-sort.out -n 6710886 -c ../../../configs/iiswc/PIMeval_Bank_Rank32.cfg
To reproduce the green bar (speedup over the CPU with kernel latency only, excluding data movement), follow these steps (shown for radix sort):
-
Change to the directory containing the radix sort benchmark:
cd <PIMeval-PIMbench-dir>/PIMbench/radix-sort/PIM/
-
Execute the benchmark for Bit-Serial architecture:
./radix-sort.out -n 6710886 -c ../../../configs/iiswc/PIMeval_BitSerial_Rank32.cfg
-
Upon successful execution, the terminal will output statistics similar to the following:
-
Calculate the total runtime from the highlighted values in the generated statistics:
- Total Runtime: Add the PIM kernel time and host runtime:
9.5 ms + 1179.2 ms = 1188.7 ms
-
Execute the CPU version of radix sort; details can be found here. For this example, assume the runtime is 2195.4 ms.
-
Compute the speedup by dividing the CPU runtime by the total PIM runtime:
$\text{Speedup} = \frac{\text{CPU Execution Time}}{\text{Total PIM Runtime}} = \frac{2195.4\ \text{ms}}{1188.7\ \text{ms}} \approx 1.8$
This value corresponds to the green bar in the figure.
-
Repeat steps 2–6 for the Fulcrum and Bank-Level architectures by changing the `-c` parameter:
- For Fulcrum:
./radix-sort.out -n 6710886 -c ../../../configs/iiswc/PIMeval_Fulcrum_Rank32.cfg
- For Bank-Level:
./radix-sort.out -n 6710886 -c ../../../configs/iiswc/PIMeval_Bank_Rank32.cfg
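The yellow and green bars differ only in whether the data copy time is counted in the total PIM runtime. The following Python sketch makes that explicit. It is not part of the repository: the function name `speedup` and its parameter names are hypothetical, the PIM times are the example values from the steps above, and the CPU runtime is the assumed 2195.4 ms.

```python
def speedup(cpu_ms, kernel_ms, host_ms, data_copy_ms=0.0):
    """CPU runtime divided by total PIM runtime.

    Total PIM runtime = kernel + host, plus data copy when it is supplied.
    """
    pim_total_ms = kernel_ms + host_ms + data_copy_ms
    if pim_total_ms <= 0:
        raise ValueError("total PIM runtime must be positive")
    return cpu_ms / pim_total_ms


CPU_MS = 2195.4  # assumed CPU runtime from the text, not a measured value

# Yellow bar: kernel + host + data movement.
yellow = speedup(CPU_MS, kernel_ms=9.5, host_ms=1179.2, data_copy_ms=1.22)
# Green bar: kernel + host only.
green = speedup(CPU_MS, kernel_ms=9.5, host_ms=1179.2)

print(f"yellow bar speedup: {yellow:.3f}")
print(f"green bar speedup:  {green:.3f}")
```

For radix sort on Bit-Serial the two values are nearly identical because host execution dominates the runtime. Benchmarks dominated by data copy, such as vector addition, would show a much larger gap between the bars.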