πŸš€ Automated deployment stack for AMD MI300 GPUs with optimized ML/DL frameworks and HPC-ready configurations

AMD MI300 ML Stack

This repository provides a comprehensive setup for machine learning environments on AMD MI300 GPUs. It includes automated deployment scripts and configurations for popular ML frameworks, optimized for the MI300 architecture.

Prerequisites

  • AMD MI300 GPU
  • ROCm 6.2 or later
  • Python 3.10.11
  • Conda package manager
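Before running the installer, a quick pre-flight check can confirm the required tooling is on PATH. This helper is a sketch, not part of the repository; it reports missing tools instead of failing hard, since rocm-smi only exists on machines where ROCm is installed:

```shell
# Sketch: report which prerequisites are visible on PATH.
# (rocm-smi ships with ROCm; conda and python3 cover the rest.)
status=""
for tool in rocm-smi conda python3; do
  if command -v "$tool" >/dev/null 2>&1; then
    status="$status found:$tool"
  else
    status="$status missing:$tool"
  fi
done
echo "$status"
```

Any `missing:` entry points at a prerequisite to install before continuing.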

Quick Start

  1. Clone the repository:
git clone https://github.com/AI-DarwinLabs/amd-mi300-ml-stack.git
cd amd-mi300-ml-stack
  2. Run the installation script:
bash install.sh

Environment Setup

The installation script creates a conda environment with Python 3.10.11 and installs the following optimized packages:

  • DeepSpeed 0.15.4 (custom build)
  • Bitsandbytes 0.44.1 (optimized for AMD)
  • PyTorch 2.5.0 (ROCm build)
  • Flash Attention (ROCm version)
  • Axolotl 0.5.2 (modified for MI300)
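Once the script finishes, you can sanity-check what landed in the environment. This one-liner is a sketch and assumes the conda environment created by install.sh is active and the packages are visible to pip under these names:

```shell
# Sketch: show installed versions of the stack's packages (if any).
versions=$(pip list 2>/dev/null | grep -Ei 'deepspeed|bitsandbytes|torch|flash|axolotl')
echo "${versions:-packages not installed yet}"
```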

HPC/Slurm Configuration

When running on HPC systems with Slurm, add these environment variables to your job scripts:

# Device Visibility
export CUDA_VISIBLE_DEVICES=0,1,2,3
export TORCH_CUDA_ARCH_LIST="9.0"
export HIP_VISIBLE_DEVICES=0,1,2,3
export ROCR_VISIBLE_DEVICES=0,1,2,3

# RCCL Optimizations
export RCCL_ENABLE_DIRECT=1
export RCCL_ENABLE_NUMA_BINDING=1
export RCCL_ENABLE_SYNC_MEMOPS=1
export RCCL_TRANSPORT=SHM
export RCCL_DEBUG=INFO
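As an example, a minimal Slurm batch script could combine the two blocks above; the job name, resource counts, and train.py entrypoint below are placeholders, not part of this repository:

```shell
#!/bin/bash
#SBATCH --job-name=mi300-train   # hypothetical job name
#SBATCH --gpus=4                 # one task spanning four MI300 GPUs
#SBATCH --time=02:00:00

# Device visibility (as above)
export CUDA_VISIBLE_DEVICES=0,1,2,3
export TORCH_CUDA_ARCH_LIST="9.0"
export HIP_VISIBLE_DEVICES=0,1,2,3
export ROCR_VISIBLE_DEVICES=0,1,2,3

# RCCL optimizations (as above)
export RCCL_ENABLE_DIRECT=1
export RCCL_ENABLE_NUMA_BINDING=1
export RCCL_ENABLE_SYNC_MEMOPS=1
export RCCL_TRANSPORT=SHM
export RCCL_DEBUG=INFO

srun python train.py             # replace with your training entrypoint
```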

Repository Structure

.
β”œβ”€β”€ install.sh              # Main installation script
β”œβ”€β”€ environment.yml         # Conda environment specification
β”œβ”€β”€ scripts/                # Installation scripts for individual components
β”‚   β”œβ”€β”€ install_deepspeed.sh
β”‚   β”œβ”€β”€ install_bitsandbytes.sh
β”‚   β”œβ”€β”€ install_torch.sh
β”‚   β”œβ”€β”€ install_flash_attention.sh
β”‚   └── install_axolotl.sh
└── vendor/                 # Third-party packages and modifications
    β”œβ”€β”€ DeepSpeed-0.15.4.tar.gz
    β”œβ”€β”€ axolotl/
    └── flash-attention/

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
