This repository sets up a machine learning environment on AMD MI300 GPUs. It provides automated installation scripts and configurations for popular ML frameworks, built for the AMD MI300 architecture.
- AMD MI300 GPU
- ROCm 6.2 or later
- Python 3.10.11
- Conda package manager
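Before running the installer, it can help to confirm that the required tools are on the PATH. The following is a minimal sketch and not part of the repository: `have_cmd` is a hypothetical helper, and the sketch assumes that ROCm is detectable through its `rocm-smi` command-line tool.

```shell
#!/usr/bin/env bash
# Preflight sketch: report which prerequisite tools are available on PATH.
# have_cmd is a hypothetical helper, not something the repository ships.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# git is needed to clone, conda to create the environment,
# and rocm-smi is assumed here as a marker for a ROCm installation.
for tool in git conda rocm-smi; do
    if have_cmd "$tool"; then
        echo "found:   $tool"
    else
        echo "missing: $tool"
    fi
done
```

A "missing" line only means the tool is not on the current PATH; on HPC systems it may become available after loading the relevant module.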
- Clone the repository:
git clone https://github.com/AI-DarwinLabs/amd-mi300-ml-stack.git
cd amd-mi300-ml-stack
- Run the installation script:
bash install.sh
The installation script will create a conda environment with Python 3.10.11 and install the following optimized packages:
- DeepSpeed 0.15.4 (custom build)
- Bitsandbytes 0.44.1 (optimized for AMD)
- PyTorch ROCm 2.5.0
- Flash Attention (ROCm version)
- Axolotl 0.5.2 (modified for MI300)
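Once the installation script finishes, one way to see which versions ended up in the environment is to query pip for each package. This is a minimal sketch, not part of the repository: `report_version` is a hypothetical helper, and it assumes that the conda environment is activated and that the custom builds keep the upstream distribution names (`torch`, `deepspeed`, `bitsandbytes`, `flash-attn`, `axolotl`).

```shell
#!/usr/bin/env bash
# Print "<package>: <version>" for one pip distribution,
# or "<package>: not installed" when pip does not know it.
# report_version is a hypothetical helper, not something the repository ships.
report_version() {
    local pkg="$1"
    local version
    version="$(python3 -m pip show "$pkg" 2>/dev/null | awk '/^Version:/ {print $2}')" || true
    echo "${pkg}: ${version:-not installed}"
}

# Distribution names are assumptions; adjust them if your build uses different ones.
for pkg in torch deepspeed bitsandbytes flash-attn axolotl; do
    report_version "$pkg"
done
```

Compare the printed versions against the list above; a package reported as "not installed" may simply be registered under a different distribution name.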
When running on HPC systems with Slurm, add these environment variables to your job scripts:
# Device Visibility
export CUDA_VISIBLE_DEVICES=0,1,2,3
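# Note: TORCH_CUDA_ARCH_LIST targets NVIDIA compute capabilities; ROCm extension builds read PYTORCH_ROCM_ARCH (gfx942 on MI300)
export TORCH_CUDA_ARCH_LIST="9.0"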
export HIP_VISIBLE_DEVICES=0,1,2,3
export ROCR_VISIBLE_DEVICES=0,1,2,3
# RCCL Optimizations
export RCCL_ENABLE_DIRECT=1
export RCCL_ENABLE_NUMA_BINDING=1
export RCCL_ENABLE_SYNC_MEMOPS=1
export RCCL_TRANSPORT=SHM
export RCCL_DEBUG=INFO
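The three device visibility variables above list the same indices, so they are easy to let drift apart when the GPU count of a job changes. The sketch below builds the list once from the number of allocated GPUs. It is not part of the repository: `device_list` is a hypothetical helper, the sketch assumes Slurm exposes the count through `SLURM_GPUS_ON_NODE`, and the fallback of 4 mirrors the example values above.

```shell
#!/usr/bin/env bash
# Build a comma-separated index list "0,1,...,N-1" for N GPUs.
# device_list is a hypothetical helper, not something the repository ships.
device_list() {
    local count="$1"
    local list=""
    local i=0
    while [ "$i" -lt "$count" ]; do
        list="${list:+$list,}$i"
        i=$((i + 1))
    done
    echo "$list"
}

# Assumption: Slurm sets SLURM_GPUS_ON_NODE for GPU allocations; fall back to 4 otherwise.
devices="$(device_list "${SLURM_GPUS_ON_NODE:-4}")"

# Keep all three visibility variables in sync.
export CUDA_VISIBLE_DEVICES="$devices"
export HIP_VISIBLE_DEVICES="$devices"
export ROCR_VISIBLE_DEVICES="$devices"
echo "visible devices: $devices"
```

Placed near the top of a job script, this replaces the three hard-coded `0,1,2,3` assignments while leaving the remaining exports unchanged.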
.
├── install.sh # Main installation script
├── environment.yml # Conda environment specification
├── scripts/ # Installation scripts for individual components
│   ├── install_deepspeed.sh
│   ├── install_bitsandbytes.sh
│   ├── install_torch.sh
│   ├── install_flash_attention.sh
│   └── install_axolotl.sh
└── vendor/ # Third-party packages and modifications
    ├── DeepSpeed-0.15.4.tar.gz
    ├── axolotl/
    └── flash-attention/
Contributions are welcome! Please feel free to submit a Pull Request.