AMD MI300 ML Stack

This repository provides automated deployment scripts and configurations for setting up machine learning environments on AMD MI300 GPUs, including builds of popular ML frameworks optimized for the MI300 architecture.

Prerequisites

AMD MI300 GPU
ROCm 6.2 or later
Python 3.10.11
Conda package manager
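
Before running the installer, it can help to confirm that these prerequisites are in place. The following is a minimal sketch, not part of this repository: the helper names `version_ge` and `have_cmd` are hypothetical, the ROCm version file path `/opt/rocm/.info/version` is an assumption about a default ROCm install, and the version comparison relies on GNU `sort -V`.

```shell
# Sketch of a pre-flight check; helper names and the ROCm path are assumptions.

# Succeed if version $1 is greater than or equal to version $2 (GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Succeed if the named command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

for cmd in git conda; do
    if have_cmd "$cmd"; then
        echo "ok: $cmd found"
    else
        echo "missing: $cmd" >&2
    fi
done

# Assumption: a default ROCm install records its version in this file.
rocm_version_file="/opt/rocm/.info/version"
if [ -r "$rocm_version_file" ]; then
    # Drop any build suffix after the first "-".
    rocm_version="$(cut -d- -f1 "$rocm_version_file")"
    if version_ge "$rocm_version" "6.2"; then
        echo "ok: ROCm $rocm_version"
    else
        echo "ROCm $rocm_version is older than 6.2" >&2
    fi
else
    echo "cannot read $rocm_version_file; check the ROCm version manually" >&2
fi
```

The script only reports what it finds and never exits early, so it is safe to run on a login node that has no GPU attached.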

Quick Start

Clone the repository:

git clone https://github.com/AI-DarwinLabs/amd-mi300-ml-stack.git
cd amd-mi300-ml-stack

Run the installation script:

bash install.sh

Environment Setup

The installation script creates a conda environment with Python 3.10.11 and installs the following optimized packages:

DeepSpeed 0.15.4 (custom build)
Bitsandbytes 0.44.1 (optimized for AMD)
PyTorch ROCm 2.5.0
Flash Attention (ROCm version)
Axolotl 0.5.2 (modified for MI300)
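
Once the installer has finished and the conda environment is active, the versions listed above can be compared against what pip reports. This is a rough sketch rather than a script shipped with the repository: `version_matches` and `installed_version` are hypothetical helpers, and the pip distribution names (`deepspeed`, `bitsandbytes`, `torch`, `axolotl`) are assumptions that may differ for the custom builds.

```shell
# Sketch: compare installed package versions with the expected ones.

# Succeed if the installed version ($1) starts with the expected version ($2),
# so local build suffixes such as "2.5.0+rocm6.2" still match "2.5.0".
version_matches() {
    case "$1" in
        "$2"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the installed version of a pip distribution, or nothing if absent.
installed_version() {
    python -m pip show "$1" 2>/dev/null | awk '/^Version:/ {print $2}'
}

# name=version pairs from the list above; the names are assumed pip names.
for spec in deepspeed=0.15.4 bitsandbytes=0.44.1 torch=2.5.0 axolotl=0.5.2; do
    name="${spec%%=*}"
    want="${spec#*=}"
    got="$(installed_version "$name")"
    if [ -z "$got" ]; then
        echo "$name: not installed"
    elif version_matches "$got" "$want"; then
        echo "$name: $got (ok)"
    else
        echo "$name: $got (expected $want)"
    fi
done
```

Prefix matching is deliberately loose, because custom or ROCm builds often append a local version suffix to the base version.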

HPC/Slurm Configuration

When running on HPC systems with Slurm, add these environment variables to your job scripts:

# Device Visibility
export CUDA_VISIBLE_DEVICES=0,1,2,3
export TORCH_CUDA_ARCH_LIST="9.0"
export HIP_VISIBLE_DEVICES=0,1,2,3
export ROCR_VISIBLE_DEVICES=0,1,2,3

# RCCL Optimizations
export RCCL_ENABLE_DIRECT=1
export RCCL_ENABLE_NUMA_BINDING=1
export RCCL_ENABLE_SYNC_MEMOPS=1
export RCCL_TRANSPORT=SHM
export RCCL_DEBUG=INFO
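
The device lists above are hard-coded for four GPUs. As a hedged alternative, the list can be derived from the Slurm allocation. The sketch below assumes Slurm sets `SLURM_GPUS_ON_NODE` for the job (it does so when GPUs are requested) and that the job sees its GPUs numbered from 0, for example on an exclusive node; `gpu_list` is a hypothetical helper, and clusters that already set the visibility variables themselves may not want them overridden.

```shell
# Sketch: build the device list from the Slurm allocation instead of
# hard-coding it. Assumes device indices visible to the job start at 0.

# Print a comma-separated index list "0,1,...,N-1" for N devices.
gpu_list() {
    i=0
    list=""
    while [ "$i" -lt "$1" ]; do
        list="${list:+$list,}$i"
        i=$((i + 1))
    done
    echo "$list"
}

# Fall back to 4 GPUs (the values used above) outside a Slurm job.
num_gpus="${SLURM_GPUS_ON_NODE:-4}"
devices="$(gpu_list "$num_gpus")"

export CUDA_VISIBLE_DEVICES="$devices"
export HIP_VISIBLE_DEVICES="$devices"
export ROCR_VISIBLE_DEVICES="$devices"
echo "Using devices: $devices"
```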

Repository Structure

.
├── install.sh              # Main installation script
├── environment.yml         # Conda environment specification
├── scripts/                # Installation scripts for individual components
│   ├── install_deepspeed.sh
│   ├── install_bitsandbytes.sh
│   ├── install_torch.sh
│   ├── install_flash_attention.sh
│   └── install_axolotl.sh
└── vendor/                 # Third-party packages and modifications
    ├── DeepSpeed-0.15.4.tar.gz
    ├── axolotl/
    └── flash-attention/

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.
