
DaCe_AutoOpt_DL: Optimizing Data-Centric Applications - A Machine Learning Approach to Optimizations in DaCe

Description

This repository contains the code, documentation, and all relevant files for the Master's thesis Optimizing Data-Centric Applications - A Machine Learning Approach to Optimizations in DaCe. In this thesis, we provide a cost model implementation for the DaCe parallel programming framework alongside a Beam search implementation. For data generation, we utilize the MLIR-Forge implementation by Berke Ates et al. Our cost model is based on the Tiramisu cost model architecture, adjusted to work within the DaCe framework. Our Beam search algorithm can be configured to use either real-time runtime measurements or cost model predictions.

Installation

To install and run the project, follow the steps below:

git clone --recurse-submodules https://spclgitlab.ethz.ch/dofilip/dace_autoopt_dl.git
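
Then change into the cloned repository (by default, git names the directory after the repository):

cd dace_autoopt_dl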

Repository Structure

  • data: Plotting data points, plots, and a small data set collection for testing
  • notes: Text files for notes
  • papers: Annotated papers used within the thesis
  • src: All source files for this thesis
    • daisytuner_evaluation: Benchmark files for the Daisytuner evaluation
    • dataset_generation: Script files to generate a set of base SDFGs with MLIR-Forge
    • model: Source files for the cost model implementation
    • paper_examples: Code examples used within the thesis writing
    • pass_application: Implementation of the transformation pass and benchmarking infrastructure
    • plotting: Scripts to plot various data points from our cost model analysis
    • scripts: Auxiliary scripts for data point generation
    • search_space_exploration: Beam search implementation files
    • workload_analysis: SDFG structure analysis scripts

More information about the repository setup can be found in the sections below.

Custom Dataset

If you want to generate a custom dataset for training, set up MLIR-Forge as follows:

cd MLIR-Smith

Then follow the MLIR-Forge instructions to build all necessary components.

Usage

Training

To train the model, run python training.py in src/model. Make sure the training data has been initialized first (see Dataset Generation below).
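
For example, from the repository root:

cd src/model
python training.py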

Dataset Generation

To generate a training data set based on some SDFG directory, run generate_test_dataset.py in the src/scripts directory. This generates the base graphs and their transformed graphs in src/model/train_graphs/. Running python data_loader.py in src/model then initializes the training data and stores it in src/model/train_data.
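
A typical invocation from the repository root looks like (check each script for any arguments it expects, such as the SDFG directory):

cd src/scripts
python generate_test_dataset.py
cd ../model
python data_loader.py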

MLIR-Forge Dataset Generation

To generate a training data set based on random programs produced by MLIR-Forge, first ensure that MLIR-Forge is properly installed. Next, generate base SDFGs using src/dataset_generation/dataset_generation.sh. You can parameterize MLIR-Forge via the gen_config file. Lastly, run generate_gen_dataset.py in the src/scripts directory.
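
For example, assuming the script is executable and you start from the repository root:

./src/dataset_generation/dataset_generation.sh
cd src/scripts
python generate_gen_dataset.py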

Workload Analysis

Training our cost model requires a large dataset of SDFGs. However, randomly generated SDFGs do not resemble the workload that scientific applications exhibit. For this reason, we carry out a structure analysis on NPBench SDFGs and, in our SDFG generation pipeline, set the generation parameters of MLIR-Forge according to this analysis.

The workload_analysis directory holds two files, analyze_sdfg.py and analyze_python.py, which can be used to analyze the NPBench programs in SDFG and Python representation, respectively. The SDFGs for all NPBench programs are generated with npbench_to_sdfg.py and stored in npbench_sdfgs/. Running python analyze_sdfg.py carries out a workload analysis of the npbench_sdfgs/ directory and writes the results to SDFG_analysis.txt.
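
For example, from src/workload_analysis:

python npbench_to_sdfg.py
python analyze_sdfg.py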

Beam search

The src/search_space_exploration directory contains all relevant files for conducting the Beam search. Run python beam_search.py +beam.sdfg_filename=FILENAME +beam.batch_number=BATCH_NR to conduct a Beam search on an SDFG file in a specific batch (use batch number 0). Please note that you need to set up the search_graphs/base_graphs directory prior to the Beam search; this directory must contain the SDFGs that you want to run the Beam search on, and the supplied file path should point to an SDFG within this directory.
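
For example, with a hypothetical file my_program.sdfg placed in search_graphs/base_graphs/:

python beam_search.py +beam.sdfg_filename=my_program.sdfg +beam.batch_number=0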

Authors and acknowledgment

Author: Filip Dobrosavljevic
Advisors: Andrei Ivanov, Lukas Gianinazzi, and Afif Boudaoud
Supervisor: Prof. Dr. Torsten Hoefler

Acknowledgements: Thanks to Lukas Truemper for providing parts of the benchmarking infrastructure and a data set of SDFGs for our cost model training.

License

TODO
