Skip to content

statycc/loop-fission

Repository files navigation

Loop Fission Benchmarks

Usability Latest run

This repository contains the source code for benchmarking the loop fission algorithm presented in "Distributing and Parallelizing Non-canonical Loops".

This is a curated benchmark suite. The included benchmarks have been selected specifically for offering opportunity to perform loop fission transformation. For each benchmark, we measure difference in clock time, after loop fission and parallelization using OpenMP.

We compare the original baseline programs to those generated by our loop fission technique. We also compare our technique to an alternative loop fission technique implemented in ROSE compiler's LoopProcessor tool.

Getting started

  • For an instructed walk-through and reproduction of paper experiments, access the artifact published on zenodo and follow the readme instructions.
  • For running experiments on native hardware or extending this source code, clone this repository.

Files and organization

Benchmark directories

Directory Description
original baseline, unmodified programs
fission transformed and parallelized programs, using our method
alt transformed and parallelized programs, using alternative method
A more detailed explanation of benchmark transformations


In this repository, transformations of benchmarks have already been applied in the appropriate directories, fission and alt, and we measure differences between those transformations and original. This section explains briefly how the transformations are generated in the first place.

Fission benchmarks have been transformed manually following our loop fission algorithm. These transformations are not reproducible mechanically. The programs have been parallelized by hand. We use manual approach here because automatic parallelization of while loops is not supported by any tool we are aware of. Programs with for loops could be parallelized automatically, and as shown in the next case.

Alt benchmarks are the results of applying automatic transformation using ROSE compiler. They are also automatically parallelized using the same tool. Benchmark remap fails during transformation, and we measure no difference between the original version. Because the tool transforms all code, including the timing code of the benchmark template, as last step we restore the original benchmark template. Detailed steps for re-generating the alt-benchmarks are here.

Benchmark descriptions

All programs are written in C language. We include programs with both for and while loops.

Benchmark Description for while Origin
3mm 3D matrix multiplication PolyBench/C
bicg BiCG sub kernel of BiCGStab linear solver PolyBench/C
colormap TIFF image conversion of photometric palette MiBench
conjgrad Conjugate gradient routine NAS-CG
cp50 Ghostscript/CP50 color print routine MiBench
deriche Edge detection filter PolyBench/C
fdtd-2d 2-D finite different time domain kernel PolyBench/C
gemm Matrix-multiply C=alpha.A.B+beta.C PolyBench/C
gesummv Scalar, vector and matrix multiplication PolyBench/C
mvt Matrix vector product and transpose PolyBench/C
remap 4D matrix memory remapping NAS-UA
tblshift TIFF PixarLog compression main table bit shift MiBench

Other directories and files

  • headers/ header files for benchmark programs.

  • utilities/ e.g. the timing script, obtained from PolyBench/C benchmark suite version 4.2.

  • plot.py is used for generating tables and plots from results.

  • run.sh is a wrapper for the timing script; it enables benchmarking directories.

  • ref_eval referential result; the evaluation on which we base results of the paper.

Reproducing results

System requirements: Linux/OSX host, C compiler with OpenMP support, Make, and Python 3.8+ for plotting. The system should have multiple cores for parallelism, but 4 cores is sufficient for this experiment.

Benchmarking proceeds in two phases: first capture the timing results, then generate plots and tables from those results. Details of these steps follow next.

Running the benchmarks

✳️ System should include a C compiler that supports OpenMP pragmas (defaults to GCC).

To specify an alternative compiler, append to the make commands CC=[compiler_name].

Small evaluation — time partial benchmarks — ⏲️ ~ 10 min.

make small

Run all benchmarks — time execution of all benchmarks — ⏲️ ~ 2-4 h, by OS.

make all

Run specific benchmark — ⏲️ 1-60 min, varies by benchmark and OS.

make [benchmark_name]

Custom execution — call the run.sh script directly:

./run.sh 

Available arguments for timing

ARGUMENT DESCRIPTION: options DEFAULT
-c system compiler to use gcc
-d which directory: original, fission, alt original
-o optimization level: O0, O1, O2, O3, ... O0
-v max. variance (%) when timing results: > 0.0 5.0
-s data size: MINI, SMALL, MEDIUM, LARGE, EXTRALARGE, STANDARD STANDARD
-p only specific benchmark: 3mm, bicg, deriche ... not set

If necessary, change permissions: chmod u+r+x ./run.sh.

Locating and interpreting results

The results can be found in eval/results directory, categorized by source directory name, optimization level and data size. Two files will be generated for each category, for each run:

  1. [args]_model.txt - machine + processor snapshot, meta data

  2. [args].txt - actual results of timing.

Data labels, in order:

  • program: name of evaluated program
  • variance (%): variance of recorded execution times
  • time (s): average runtime (clock time), in seconds
  • UTC timestamp: time at completion, in seconds since epoch

Data labels are always the same. They are not included in timing results file.

Timing options are same as default:

  • perform 5 executions/program
  • take average of 3 runs (min and max time are excluded)
  • variance controls the %-difference allowed between the 3 median runs

Generating plots and tables

After capturing results, use the plotting script to generate tables or graphs. This step requires Python version 3.8 or higher.

  1. Install dependencies

    python3 -m pip install -q -r requirements.txt
    
  2. Generate tables or plots

    make plots
    

To customize plot options, call the plot.py directly with selected arguments.

Available arguments for plotting

python3 plot.py --help
ARGUMENT DESCRIPTION : options DEFAULT
--data data choice: time, speedup time
--input path to results (input) directory eval/results
--out path to output directory eval/plots
--fmt output format: tex, md, plot md
--digits number of digits for tabular values: 0...15 6
--ss source directory for calculating speedup original
--st target directory for calculating speedup (all when not set) not set
--millis display table of times in milliseconds (otherwise in seconds) not set
--show show generated plot or table not set
--dir_filter include directories (comma-separated list): original, fission, alt not set
--prog_filter include benchmarks (comma-separated list): 3mm, bicg, deriche ... not set
--help show help message and exit not set

e.g. to generate a text-based speedup table, and display it, run:

python3 plot.py -d speedup -f md --digits 2  --show