Fully open reproduction of DeepSeek-R1
Deita: Data-Efficient Instruction Tuning for Alignment [ICLR2024]
A replication of DeepSeek-R1-Zero and DeepSeek-R1 training on small models with limited data
Qwen2.5-Coder is the code version of Qwen2.5, the large language model series developed by the Qwen team at Alibaba Cloud.
Ongoing research training transformer language models at scale, including: BERT & GPT-2
This repository is a curated collection of resources, tutorials, and practical examples designed to guide you through the journey of mastering CUDA programming.
Problem statements on System Design and Software Architecture as part of Arpit's System Design Masterclass
🔥Highlighting the top ML papers every week.
Distribute and run LLMs with a single file.
A small self-contained alternative to readline and libedit
Create beautiful terminal-based code tutorials with syntax highlighting and interactive navigation.
Unofficial implementation of Titans, SOTA memory for transformers, in PyTorch
Official repository for our work on micro-budget training of large-scale diffusion models.
A Reinforcement Learning agent that learns how to solve maze missions in Minecraft.
Minimalistic 4D-parallelism distributed training framework for educational purposes
What would you do with 1000 H100s...
A list of projects to work on while reading books on Natural Language Processing & Transformers.
Minimalistic large language model 3D-parallelism training
All the resources you need to get to Senior Engineer and beyond
My Digital Palace - A Personal Journal for Reflection - A place to store all my thoughts
DevOps Roadmap for 2025, with learning resources
Modeling, training, eval, and inference code for OLMo