- First
- Second
- Third
- Fourth
- Bitonic Sort:
- Sample Sort:
- Merge Sort:
- Radix Sort:
- For MPI programs, include MPI calls you will use to coordinate between processes
- Input sizes, Input types
- Strong scaling (same problem size, increase number of processors/nodes)
- Weak scaling (increase problem size, increase number of processors)
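In symbols (our shorthand, not part of the assignment text; $n$ is the total input size and $p$ the number of processes):

$$\text{strong scaling: } n \text{ fixed as } p \uparrow \qquad\qquad \text{weak scaling: } \frac{n}{p} \text{ fixed as } p \uparrow$$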
Please use the Caliper build `/scratch/group/csce435-f24/Caliper/caliper/share/cmake/caliper` (same as the lab2 `build.sh`) to collect Caliper files for each experiment you run.
Your Caliper annotations should result in the following calltree (use `Thicket.tree()` to see the calltree):
```
main
|_ data_init_X      # X = runtime OR io
|_ comm
|  |_ comm_small
|  |_ comm_large
|_ comp
|  |_ comp_small
|  |_ comp_large
|_ correctness_check
```
Required region annotations:
- `main` - top-level main function.
- `data_init_X` - the function where input data is generated or read in from file. Use `data_init_runtime` if you are generating the data during the program, and `data_init_io` if you are reading the data from a file.
- `correctness_check` - function for checking the correctness of the algorithm output (e.g., checking if the resulting data is sorted).
- `comm` - All communication-related functions in your algorithm should be nested under the `comm` region.
  - Inside the `comm` region, you should create regions to indicate how much data you are communicating (i.e., `comm_small` if you are sending or broadcasting a few values, `comm_large` if you are sending all of your local values).
  - Notice that auxiliary functions like `MPI_Init` are not under here.
- `comp` - All computation functions within your algorithm should be nested under the `comp` region.
  - Inside the `comp` region, you should create regions to indicate how much data you are computing on (i.e., `comp_small` if you are sorting a few values like the splitters, `comp_large` if you are sorting values in the array).
  - Notice that auxiliary functions like `data_init` are not under here.
- `MPI_X` - You will also see MPI regions in the calltree if using the appropriate MPI profiling configuration (see `Builds/`). Examples shown below.
All functions will be called from `main` and most will be grouped under either the `comm` or `comp` regions, representing communication and computation, respectively. You should be timing as many significant functions in your code as possible. Do not time print statements or other insignificant operations that may skew the performance measurements.
Nesting Code Regions Example - all computation code regions should be nested in the `comp` parent code region as follows:
```cpp
CALI_MARK_BEGIN("comp");
CALI_MARK_BEGIN("comp_small");
sort_pivots(pivot_arr);
CALI_MARK_END("comp_small");
CALI_MARK_END("comp");

// Other non-computation code
...

CALI_MARK_BEGIN("comp");
CALI_MARK_BEGIN("comp_large");
sort_values(arr);
CALI_MARK_END("comp_large");
CALI_MARK_END("comp");
```
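Communication regions follow the same pattern. Below is a minimal sketch (our illustration, not assignment-provided code; the buffer names and the use of `MPI_Scatter` are assumptions) of nesting a large transfer under the `comm` parent region:

```cpp
// Hypothetical example: distributing each rank's share of the full array
// counts as large communication, so it is nested under comm -> comm_large.
CALI_MARK_BEGIN("comm");
CALI_MARK_BEGIN("comm_large");
MPI_Scatter(global_arr, local_n, MPI_INT,   // send buffer (significant on root)
            local_arr,  local_n, MPI_INT,   // receive buffer on every rank
            0, MPI_COMM_WORLD);
CALI_MARK_END("comm_large");
CALI_MARK_END("comm");
```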
```
# MPI Mergesort
4.695 main
├─ 0.001 MPI_Comm_dup
├─ 0.000 MPI_Finalize
├─ 0.000 MPI_Finalized
├─ 0.000 MPI_Init
├─ 0.000 MPI_Initialized
├─ 2.599 comm
│  ├─ 2.572 MPI_Barrier
│  └─ 0.027 comm_large
│     ├─ 0.011 MPI_Gather
│     └─ 0.016 MPI_Scatter
├─ 0.910 comp
│  └─ 0.909 comp_large
├─ 0.201 data_init_runtime
└─ 0.440 correctness_check
```
Have the following code in your programs to collect metadata:
```cpp
adiak::init(NULL);
adiak::launchdate();    // launch date of the job
adiak::libraries();     // Libraries used
adiak::cmdline();       // Command line used to launch the job
adiak::clustername();   // Name of the cluster
adiak::value("algorithm", algorithm); // The name of the algorithm you are using (e.g., "merge", "bitonic")
adiak::value("programming_model", programming_model); // e.g., "mpi"
adiak::value("data_type", data_type); // The datatype of input elements (e.g., double, int, float)
adiak::value("size_of_data_type", size_of_data_type); // sizeof(datatype) of input elements in bytes (e.g., 1, 2, 4)
adiak::value("input_size", input_size); // The number of elements in the input dataset (e.g., 1000)
adiak::value("input_type", input_type); // For sorting, this would be choices: ("Sorted", "ReverseSorted", "Random", "1_perc_perturbed")
adiak::value("num_procs", num_procs); // The number of processors (MPI ranks)
adiak::value("scalability", scalability); // The scalability of your algorithm. choices: ("strong", "weak")
adiak::value("group_num", group_number); // The number of your group (integer, e.g., 1, 10)
adiak::value("implementation_source", implementation_source); // Where you got the source code of your algorithm. choices: ("online", "ai", "handwritten")
```
These values will show up in `Thicket.metadata` when the Caliper file is read into Thicket.

See the `Builds/` directory to find the correct Caliper configurations to get the performance metrics. They will show up in `Thicket.dataframe` when the Caliper file is read into Thicket.
Include a detailed analysis of computation performance and communication performance. Include figures and an explanation of your analysis.
For input_sizes:
- 2^16, 2^18, 2^20, 2^22, 2^24, 2^26, 2^28
For input_types:
- Sorted, Random, ReverseSorted, 1% perturbed
MPI num_procs:
- 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024
This should result in 4 × 7 × 10 = 280 Caliper files for your MPI experiments.
To automate running a set of experiments, parameterize your program.
- input_type: "Sorted" could generate a sorted input to pass into your algorithms
- algorithm: You can have a switch statement that calls the different algorithms and sets the Adiak variables accordingly
- num_procs: How many MPI ranks you are using
When your program works with these parameters, you can write a shell script that will run a for loop over the parameters above (e.g., on 64 processors, perform runs that invoke algorithm2 for Sorted, ReverseSorted, and Random data).
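As one possible shape for this parameterization (a minimal sketch under stated assumptions: the argument order, the `run_*` helper names, and their stub bodies are all invented here for illustration):

```cpp
#include <mpi.h>
#include <cstdlib>
#include <string>

// Hypothetical stand-ins for the real sorting implementations.
void run_merge_sort(int input_size, const std::string& input_type)   { /* ... */ }
void run_bitonic_sort(int input_size, const std::string& input_type) { /* ... */ }

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    // Assumed invocation: mpirun -np <num_procs> ./sort <algorithm> <input_size> <input_type>
    if (argc < 4) { MPI_Finalize(); return 1; }
    std::string algorithm  = argv[1];            // e.g., "merge", "bitonic"
    int         input_size = std::atoi(argv[2]); // e.g., 1 << 16
    std::string input_type = argv[3];            // e.g., "Sorted", "Random"

    // Dispatch on the algorithm name and set the Adiak values accordingly.
    if (algorithm == "merge") {
        run_merge_sort(input_size, input_type);
    } else if (algorithm == "bitonic") {
        run_bitonic_sort(input_size, input_type);
    } // ... sample, radix, etc.

    MPI_Finalize();
    return 0;
}
```

A shell script can then loop over these arguments (and over `num_procs` in the job launcher) to generate all of the runs above.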
Time
- Min time/rank
- Max time/rank
- Avg time/rank
- Total time
- Variance time/rank
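For reference, the per-rank variance is the usual population variance over the $P$ ranks' times (our gloss; the assignment does not pin down the exact estimator):

$$\bar{t} = \frac{1}{P}\sum_{i=1}^{P} t_i, \qquad \operatorname{Var}(t) = \frac{1}{P}\sum_{i=1}^{P}\left(t_i - \bar{t}\right)^2$$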
Plots for the presentation should be as follows:
- For each implementation:
  - For each of comp_large, comm, and main:
    - Strong scaling plots for each input_size with lines for input_type (7 plots - 4 lines each)
    - Strong scaling speedup plot for each input_type (4 plots)
    - Weak scaling plots for each input_type (4 plots)
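For the speedup plots, one common convention (an assumption on our part, since the baseline is not specified above; with runs starting at $p = 2$, the smallest run is the natural baseline) is

$$\text{speedup}(p) = \frac{T_{2}}{T_{p}},$$

where $T_p$ is the measured time of the region on $p$ processes.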
Analyze these plots and choose a subset to present and explain in your presentation.
Submit a zip named `TeamX.zip`, where `X` is your team number. The zip should contain the following files:
- Algorithms: Directory of source code of your algorithms.
- Data: All `.cali` files used to generate the plots, separated by algorithm/implementation.
- Jupyter notebook: The Jupyter notebook(s) used to generate the plots for the report.
- Report.md