Performance benchmarks for coupled simulation and AI workflows on HPC systems
The goal of SimAI-Bench is to host a series of micro, mini, and full benchmarks for various coupled simulation and AI/ML workflow motifs designed for leadership HPC systems. These benchmarks may be used to evaluate the performance and scalability of different workflow libraries or compare the same workflow motif across different hardware.
The benchmarks below are named according to Brewer et al., 2024, who organized coupled workflows into six motifs based on their goals and data transfer patterns.
The first mini-benchmark is representative of an online training and inference workflow for developing mesh-based ML surrogate models from ongoing HPC simulations. Referring to Brewer et al., 2024, this falls into motif #6: Adaptive training of large AI models.
The focus of this benchmark is to replicate the data transfer patterns and key components of a realistic workflow while removing the complexities involved in compiling and running full simulation codes. It is composed of a mock parallel simulation that runs a time step loop, advancing the system dynamics, and at pre-determined intervals (selected at runtime) transfers training data to either the training component or a staging area. The simulation also receives model checkpoints and performs inference with the ML surrogate, comparing the predicted solution field with the true one to determine whether more training is needed or the required accuracy has been met, at which point the workflow ends. The distributed ML training component trains the ML surrogate on the incoming data sent by the simulation, checking for updates each epoch and loading new data as it is generated. The training component periodically saves model checkpoints and sends them to the simulation. Depending on the implementation used, a third component is also present in the form of a staging area (e.g., SmartSim Orchestrator or Dragon Distributed Dictionary).
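The control flow described above can be sketched in a few lines of plain Python. This is a minimal single-process illustration, not the benchmark's code: the staging area is a plain dictionary rather than a SmartSim Orchestrator or Dragon Distributed Dictionary, training is mocked by halving an error value each time new data arrives, and every name here (`run_mock_workflow`, `send_interval`, `tolerance`, the key names) is hypothetical.

```python
import random


def run_mock_workflow(max_steps=100, send_interval=10, tolerance=0.05, seed=0):
    """Single-process stand-in for the simulation / trainer / staging-area loop."""
    rng = random.Random(seed)
    staging = {}           # assumption: a dict stands in for the staging area
    surrogate_error = 1.0  # assumption: mocked surrogate error, not a real model
    snapshots_sent = 0

    for step in range(1, max_steps + 1):
        # Simulation: advance the (mock) system dynamics by one time step.
        field = [rng.random() for _ in range(8)]

        if step % send_interval != 0:
            continue

        # Simulation: send a training snapshot to the staging area.
        staging[f"train_data_{step}"] = field
        snapshots_sent += 1

        # Trainer: pick up the new data and publish a checkpoint.
        # Training is mocked by halving the error for each new snapshot.
        surrogate_error *= 0.5
        staging["checkpoint"] = {"step": step, "error": surrogate_error}

        # Simulation: read the checkpoint and compare against the tolerance.
        if staging["checkpoint"]["error"] <= tolerance:
            return {"converged": True, "final_step": step,
                    "snapshots_sent": snapshots_sent}

    return {"converged": False, "final_step": max_steps,
            "snapshots_sent": snapshots_sent}


result = run_mock_workflow()
print(result)
```

In the real workflow the simulation and trainer run concurrently as separate parallel components; the sequential loop here only shows the order of the data exchanges and the stopping condition.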
The benchmark is currently implemented using the SmartSim/SmartRedis and Dragon libraries, each providing both the workflow driver API and the client API. Both implementations use a staging area to store metadata, training data, and model checkpoints. The implementation is selected at runtime. Additionally, the benchmark can be launched with either a colocated deployment strategy (i.e., all components run on the same set of nodes) or a clustered strategy (i.e., each component runs on its own distinct set of nodes). The deployment strategy is also selected at runtime.
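To make the two deployment strategies concrete, the sketch below maps a node list onto the three components for each strategy. It is an illustration only: the flag names `--backend` and `--deployment`, the function names, and the round-robin split used for the clustered case are all assumptions and do not reflect the benchmark's actual command-line interface or node allocation logic.

```python
import argparse

COMPONENTS = ("simulation", "training", "staging")


def build_parser():
    # Hypothetical flags; the real benchmark may expose different options.
    parser = argparse.ArgumentParser(description="Mock SimAI-Bench launcher")
    parser.add_argument("--backend", choices=["smartsim", "dragon"],
                        default="smartsim")
    parser.add_argument("--deployment", choices=["colocated", "clustered"],
                        default="colocated")
    return parser


def assign_nodes(nodes, deployment):
    """Return a {component: [nodes]} layout for the chosen strategy."""
    if deployment == "colocated":
        # Every component runs on the same full set of nodes.
        return {name: list(nodes) for name in COMPONENTS}
    if deployment == "clustered":
        if len(nodes) < len(COMPONENTS):
            raise ValueError("clustered deployment needs one node per component")
        # Assumption: a simple round-robin split into disjoint node sets.
        return {name: list(nodes[i::len(COMPONENTS)])
                for i, name in enumerate(COMPONENTS)}
    raise ValueError(f"unknown deployment strategy: {deployment}")


args = build_parser().parse_args(["--backend", "dragon",
                                  "--deployment", "clustered"])
layout = assign_nodes(["n0", "n1", "n2", "n3"], args.deployment)
print(args.backend, layout)
```

A colocated layout trades contention for locality, since data moves between components on the same node, while a clustered layout isolates the components at the cost of inter-node transfers.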
The purpose of the benchmark is to focus on the data transfer between the different components and to provide figures of merit (FOMs) that capture the transfer overhead on the simulation and training components. In this way, the benchmark measures the computational efficiency of the workflow for a given library, deployment strategy, and HPC system.
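One plausible way to quantify transfer overhead is the fraction of a component's wall time spent in data transfers. The sketch below is an assumption about how such a figure could be computed, not the FOM definition used by the benchmark; `OverheadTimer` and its methods are hypothetical names.

```python
import time
from contextlib import contextmanager


class OverheadTimer:
    """Accumulates wall time separately for compute and transfer sections."""

    def __init__(self):
        self.totals = {"compute": 0.0, "transfer": 0.0}

    @contextmanager
    def section(self, kind):
        # Time the enclosed block and add it to the matching bucket.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[kind] += time.perf_counter() - start

    def transfer_fraction(self):
        # Assumed FOM: transfer time divided by total measured time.
        total = sum(self.totals.values())
        return self.totals["transfer"] / total if total > 0 else 0.0


timer = OverheadTimer()
with timer.section("compute"):
    sum(i * i for i in range(10_000))   # stand-in for a time step
with timer.section("transfer"):
    payload = bytes(1024)               # stand-in for a send to the staging area
print(f"transfer fraction: {timer.transfer_fraction():.3f}")
```

Wrapping each send, receive, and compute phase in a `section` call keeps the instrumentation out of the workflow logic, so the same timer could be reused in both the simulation and training components.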
- PyTorch
- PyTorch Geometric and PyTorch Cluster
- SmartSim and SmartRedis
- Dragon
- mpi4py
- MPIPartition
- 0.0.1
- Added an online training workflow for ML surrogate models implemented with SmartSim and Dragon
- Tested on ALCF Polaris
- Fork it (https://github.com/argonne-lcf/SimAI-Bench/fork)
- Clone it (`git clone https://github.com/username/SimAI-Bench.git`)
- Create your feature branch (`git checkout -b feature/fooBar`)
- Commit your changes (`git commit -am 'Add some fooBar'`)
- Push to the branch (`git push -u origin feature/fooBar`)
- Create a new Pull Request
Riccardo Balin, Argonne National Lab, rbalin@anl.gov
Shivam Barwey, Argonne National Lab, sbarwey@anl.gov