IISWC

Reproducing The Results

Figure 7

The figure below illustrates the percentage of time spent in data movement latency, PIM kernel execution, and host execution for various architectures.

[Image: Figure 7, runtime breakdown by architecture]

To reproduce the bars for benchmarks that do not require host execution, follow these steps (shown for vector addition):

Navigate to the directory of your cloned repository:
```
cd <PIMeval-PIMbench-dir>
```
Clean previous builds and compile the project:
```
make clean
make -j
```
Change to the directory containing the PIM vector addition benchmarks:
```
cd PIMbench/vec-add/PIM/
```

Execute the benchmark for the Bit-Serial architecture:

./vec-add.out -l 2035544320 -c ../../../configs/iiswc/PIMeval_BitSerial_Rank32.cfg

Upon successful execution, the terminal prints runtime statistics, including the data copy time and the PIM kernel time.

Using those two values, calculate the total runtime and the percentage spent in each:
- Total Runtime: Add the data copy and PIM kernel times: 27.8 ms + 0.014 ms = 27.814 ms.
- Percentage Calculations:
  - Data Copy: (27.8 ms / 27.814 ms) * 100 ≈ 99.95%
  - PIM Kernel: (0.014 ms / 27.814 ms) * 100 ≈ 0.05%
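
The arithmetic above can be scripted once the two times have been read off the terminal output. The following is a minimal sketch, not part of PIMeval or PIMbench: `pim_breakdown` is a hypothetical helper name, and the two arguments are assumed to be the data copy time and the PIM kernel time in milliseconds, entered by hand.

```shell
# Sketch: percentage breakdown for a benchmark with no host execution.
# pim_breakdown is a hypothetical helper, not a PIMeval/PIMbench tool.
# $1 = data copy time (ms), $2 = PIM kernel time (ms)
pim_breakdown() {
  awk -v copy="$1" -v kernel="$2" 'BEGIN {
    total = copy + kernel
    printf "Total Runtime: %.3f ms\n", total
    printf "Data Copy: %.2f%%\n", copy / total * 100
    printf "PIM Kernel: %.2f%%\n", kernel / total * 100
  }'
}

# Values from the vector addition example above.
pim_breakdown 27.8 0.014
```

Substitute the times reported by your own run; the helper does not parse the simulator output itself.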

Repeat the execution and calculation steps for the Fulcrum and Bank-Level architectures by changing the -c parameter:

For Fulcrum:

./vec-add.out -l 2035544320 -c ../../../configs/iiswc/PIMeval_Fulcrum_Rank32.cfg

For Bank-Level:

./vec-add.out -l 2035544320 -c ../../../configs/iiswc/PIMeval_Bank_Rank32.cfg
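
Since the three runs differ only in the configuration file, they can be driven from one loop. The sketch below assumes the current directory is three levels below the repository root, as in the commands above. `for_each_arch` and `CONFIG_DIR` are hypothetical names introduced here for illustration; the config file names are the three used in this guide.

```shell
# Sketch: run one benchmark command once per architecture config.
# for_each_arch and CONFIG_DIR are hypothetical names, not PIMbench tooling.
# CONFIG_DIR defaults to the relative path used by the commands in this guide.
CONFIG_DIR="${CONFIG_DIR:-../../../configs/iiswc}"

for_each_arch() {
  for arch in BitSerial Fulcrum Bank; do
    echo "== ${arch} =="
    # "$@" is the benchmark command without its -c option.
    "$@" -c "${CONFIG_DIR}/PIMeval_${arch}_Rank32.cfg"
  done
}

# Usage, from PIMbench/vec-add/PIM/:
#   for_each_arch ./vec-add.out -l 2035544320
```

Each run still prints its own statistics, so the percentage calculation has to be repeated per architecture.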

To reproduce the bars for benchmarks that require host execution, follow these steps (shown for radix sort):

Navigate to the directory of your cloned repository:
```
cd <PIMeval-PIMbench-dir>
```
Clean previous builds and compile the project:
```
make clean
make -j
```
Change to the directory containing the PIM radix sort benchmarks:
```
cd PIMbench/radix-sort/PIM/
```

Execute the benchmark for the Bit-Serial architecture:

./radix-sort.out -n 6710886 -c ../../../configs/iiswc/PIMeval_BitSerial_Rank32.cfg

Upon successful execution, the terminal prints runtime statistics, including the data copy time, the PIM kernel time, and the host runtime.

Using those three values, calculate the total runtime and the percentage spent in each:
- Total Runtime: Add the data copy time, PIM kernel time, and host runtime: 1.22 ms + 9.5 ms + 1179.2 ms = 1189.92 ms.
- Percentage Calculations:
  - Data Copy: (1.22 ms / 1189.92 ms) * 100 ≈ 0.1%
  - PIM Kernel: (9.5 ms / 1189.92 ms) * 100 ≈ 0.8%
  - Host: (1179.2 ms / 1189.92 ms) * 100 ≈ 99.1%
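
With three components the same calculation generalizes to any number of labeled times. The sketch below is one way to do that; `runtime_breakdown` is a hypothetical helper name, and the `label=milliseconds` argument format is an assumption made for this example, not something the simulator emits.

```shell
# Sketch: percentage breakdown for any number of runtime components.
# runtime_breakdown is a hypothetical helper; each argument is "label=ms".
runtime_breakdown() {
  printf '%s\n' "$@" | awk -F= '
    { label[NR] = $1; ms[NR] = $2; total += $2 }
    END {
      printf "Total Runtime: %.2f ms\n", total
      for (i = 1; i <= NR; i++)
        printf "%s: %.1f%%\n", label[i], ms[i] / total * 100
    }'
}

# Values from the radix sort example above.
runtime_breakdown "Data Copy=1.22" "PIM Kernel=9.5" "Host=1179.2"
```

For these inputs the host share dominates, which is why the radix sort bar in the figure is almost entirely host execution.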

Repeat the execution and calculation steps for the Fulcrum and Bank-Level architectures by changing the -c parameter:

For Fulcrum:

./radix-sort.out -n 6710886 -c ../../../configs/iiswc/PIMeval_Fulcrum_Rank32.cfg

For Bank-Level:

./radix-sort.out -n 6710886 -c ../../../configs/iiswc/PIMeval_Bank_Rank32.cfg

By following these steps, you will be able to reproduce the figures showing the breakdown of time spent in data movement latency, PIM kernel execution, and host execution.

Figure 9

The figure below shows the speedup of each PIM architecture over the CPU.

[Image: Figure 9, speedup of each PIM architecture over CPU]

Reproduce yellow bar

To reproduce the yellow bar, that is, the speedup over CPU when both kernel and data movement latency are counted, follow these steps (shown for radix sort):

Change to the directory containing the radix sort benchmark:
```
cd PIMbench/radix-sort/PIM/
```

Execute the benchmark for the Bit-Serial architecture:

./radix-sort.out -n 6710886 -c ../../../configs/iiswc/PIMeval_BitSerial_Rank32.cfg

Upon successful execution, the terminal prints runtime statistics, including the data copy time, the PIM kernel time, and the host runtime.

Calculate the total runtime from the generated statistics:
- Total Runtime: Add the data copy time, PIM kernel time, and host runtime: 1.22 ms + 9.5 ms + 1179.2 ms = 1189.92 ms.

Execute the CPU version of radix sort. Details can be found here. For this example, assume the CPU runtime is 2195.4 ms.

Compute the speedup by dividing the CPU runtime by the total PIM runtime:

$\text{Speedup} = \frac{\text{CPU Execution Time}}{\text{Total PIM Runtime}} = \frac{2195.4\ \text{ms}}{1189.92\ \text{ms}} \approx 1.8$

This value corresponds to the yellow bar in the figure.

Repeat the steps above, from executing the benchmark through computing the speedup, for the Fulcrum and Bank-Level architectures by changing the -c parameter:

For Fulcrum:

./radix-sort.out -n 6710886 -c ../../../configs/iiswc/PIMeval_Fulcrum_Rank32.cfg

For Bank-Level:

./radix-sort.out -n 6710886 -c ../../../configs/iiswc/PIMeval_Bank_Rank32.cfg

Reproduce green bar

To reproduce the green bar, that is, the speedup over CPU when only kernel latency is counted and data movement is excluded, follow these steps (shown for radix sort):

Change to the directory containing the radix sort benchmark:
```
cd PIMbench/radix-sort/PIM/
```

Execute the benchmark for the Bit-Serial architecture:

./radix-sort.out -n 6710886 -c ../../../configs/iiswc/PIMeval_BitSerial_Rank32.cfg

Upon successful execution, the terminal prints runtime statistics, including the data copy time, the PIM kernel time, and the host runtime.

Calculate the total runtime from the generated statistics, leaving out the data copy time:
- Total Runtime: Add the PIM kernel time and host runtime: 9.5 ms + 1179.2 ms = 1188.7 ms.

Execute the CPU version of radix sort. Details can be found here. For this example, assume the CPU runtime is 2195.4 ms.

Compute the speedup by dividing the CPU runtime by the total PIM runtime:

$\text{Speedup} = \frac{\text{CPU Execution Time}}{\text{Total PIM Runtime}} = \frac{2195.4\ \text{ms}}{1188.7\ \text{ms}} \approx 1.8$

This value corresponds to the green bar in the figure.
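
The yellow and green bars use the same formula and differ only in whether the data copy time is part of the denominator. The sketch below makes that explicit. `speedup` is a hypothetical helper name, the first argument is assumed to be the CPU runtime in milliseconds, and the remaining arguments are whichever PIM-side times should be included.

```shell
# Sketch: speedup = CPU runtime / sum of the PIM-side components.
# speedup is a hypothetical helper, not a PIMeval/PIMbench tool.
# $1 = CPU runtime (ms); remaining arguments = PIM-side times (ms) to include.
speedup() {
  cpu_ms="$1"; shift
  printf '%s\n' "$@" | awk -v cpu="$cpu_ms" '
    { total += $1 }
    END { printf "%.3f\n", cpu / total }'
}

# Yellow bar: data copy + PIM kernel + host.
speedup 2195.4 1.22 9.5 1179.2
# Green bar: PIM kernel + host only.
speedup 2195.4 9.5 1179.2
```

For the radix sort numbers in this guide the two results both round to about 1.8, because the data copy time is a very small part of the total.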

Repeat the steps above, from executing the benchmark through computing the speedup, for the Fulcrum and Bank-Level architectures by changing the -c parameter:

For Fulcrum:

./radix-sort.out -n 6710886 -c ../../../configs/iiswc/PIMeval_Fulcrum_Rank32.cfg

For Bank-Level:

./radix-sort.out -n 6710886 -c ../../../configs/iiswc/PIMeval_Bank_Rank32.cfg
