Notes from my journey to advance and democratize artificial intelligence with Microsoft's open-source distributed AI training framework DeepSpeed and its ZeRO optimizer. This repo records the problems I ran into while using ZeRO and DeepSpeed, and how I solved them.
| ID | ID-2 | Name | Description | Doc | Source |
| --- | --- | --- | --- | --- | --- |
| 00-Install | - | Install | Installing DeepSpeed | Install.md | src |
| 01-Training | - | - | Model training | Readme.md | - |
| 01-Training | 00 | Startup | Getting started with training | Startup.md | src |
| 01-Training | 01 | Transformer | The basic Transformer model | Transformer.md | src |
| 01-Training | 02 | GPT-2 | The basic GPT-2 model | GPT2.md | src |
| 01-Training | 03 | ZeRO-Offload/ZeRO++ | ZeRO-Offload and ZeRO++ | ZeROPlusPlus.md | src |
| 01-Training | 04 | LLAMA | The LLAMA model | LLAMA.md | src |
| 01-Training | 05 | DeepSpeed-Chat | DeepSpeed-Chat | DeepSpeed-Chat.md | src |
| 01-Training | 06 | Megatron | Megatron | Megatron.md | src |
| 01-Training | 09 | NCCL | NCCL-related issues | NCCL.md | src |
| 02-Optimization | - | - | Optimization | Readme.md | - |
| 02-Optimization | 00 | Accelerate | The Accelerate toolkit | Accelerate.md | src |
| 02-Optimization | 01 | LLM Accelerating | LLM acceleration basics | LLM-Accelerating.md | src |
| 02-Optimization | 02 | Inference | Inference optimization | Inference.md | src |
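
Most of the training notes above revolve around getting a DeepSpeed engine running with a ZeRO config. As a quick orientation, here is a minimal sketch of a single ZeRO stage-2 training step; the toy model, config values, and synthetic data are illustrative assumptions, not taken from any example in this repo:

```python
# Minimal DeepSpeed + ZeRO stage-2 sketch (toy model and config for illustration).
import torch
import deepspeed

ds_config = {
    "train_batch_size": 8,
    "fp16": {"enabled": True},
    # ZeRO stage 2 partitions optimizer state and gradients across ranks.
    "zero_optimization": {"stage": 2},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

model = torch.nn.Linear(1024, 1024)  # placeholder model

# deepspeed.initialize wraps the model in an engine that handles ZeRO
# partitioning, mixed precision, and gradient accumulation.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

# Synthetic fp16 input on the engine's device; a real run would use a DataLoader.
x = torch.randn(8, 1024, device=engine.device, dtype=torch.half)
loss = engine(x).float().pow(2).mean()
engine.backward(loss)  # engine.backward()/engine.step() replace the usual
engine.step()          # loss.backward()/optimizer.step() pair
```

Launched with the standard `deepspeed train.py` entry point, the engine distributes optimizer state and gradients across ranks as described in the ZeRO paper referenced below.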
- ZeRO: Memory Optimizations Toward Training Trillion Parameter Models. URL: https://arxiv.org/pdf/1910.02054.pdf
- ZeRO-Offload: Democratizing Billion-Scale Model Training. URL: https://arxiv.org/pdf/2101.06840.pdf
- DeepSpeed: A deep learning optimization library. URL: https://github.com/microsoft/DeepSpeed
- Official Zhihu account of Microsoft's DeepSpeed team. URL: https://www.zhihu.com/people/deepspeed
- DeepSpeed Examples. URL: https://github.com/microsoft/DeepSpeedExamples