The main goal of this small project is to teach myself how things are built from scratch, and I hope it convinces at least one person that they can build anything from scratch too. Andrej Karpathy's llm.c and micrograd were the projects that motivated me to build this.
- Multi-dimensional arrays and tensors are just flat 1-dimensional arrays plus strides, which tell us how far to step through memory to reach the next row or column.
- Learned a lot about the C language, including memory management, parallel processing, and memory access patterns. This is only the second thing I have built in C, the first being a basic password manager.
- Derived backpropagation for layers such as LayerNorm and Attention, which improved my mathematical ability a lot.
- Learned how to memory-map files and use them as a sort of virtual memory. Keeping the activations and parameters in RAM was hard because they are humongous, roughly 20 GB.
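
To make the point about strides concrete, here is a minimal sketch of a 2-D view over a flat buffer. The `Tensor2D` struct and the `tensor2d_*` helpers are hypothetical names made up for illustration, not code from this project. Strides are counted in elements, and a transpose is just a swap of shape and strides with no data copied.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 2-D view over a flat float buffer (illustrative only). */
typedef struct {
    float *data;        /* flat 1-D storage, not owned by the view */
    size_t shape[2];    /* number of rows, number of columns */
    size_t strides[2];  /* elements to skip per step along each axis */
} Tensor2D;

/* Row-major view: moving one row skips `cols` elements, one column skips 1. */
Tensor2D tensor2d_row_major(float *data, size_t rows, size_t cols) {
    Tensor2D t = { data, { rows, cols }, { cols, 1 } };
    return t;
}

/* Transpose without copying: swap the shape and the strides. */
Tensor2D tensor2d_transpose(Tensor2D t) {
    Tensor2D r = { t.data,
                   { t.shape[1], t.shape[0] },
                   { t.strides[1], t.strides[0] } };
    return r;
}

/* Element (i, j) lives at flat offset i * strides[0] + j * strides[1]. */
float tensor2d_get(Tensor2D t, size_t i, size_t j) {
    assert(i < t.shape[0] && j < t.shape[1]);
    return t.data[i * t.strides[0] + j * t.strides[1]];
}
```

With a buffer `{0, 1, 2, 3, 4, 5}` viewed as 2x3, `tensor2d_get(a, 1, 2)` and `tensor2d_get(tensor2d_transpose(a), 2, 1)` both read the same slot of the flat array.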
It was fun building something like this.
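
The file-mapping idea from the list above can be sketched roughly like this. It assumes a POSIX system (`open`, `ftruncate`, `mmap`); `map_floats` and `unmap_floats` are hypothetical helper names, and the project's real code may do this differently. The file is grown to the requested size and mapped shared, so the OS pages data in and out instead of everything having to fit in RAM at once.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map `count` floats backed by the file at `path` (created if missing).
 * Returns NULL on failure. Hypothetical helper for illustration. */
float *map_floats(const char *path, size_t count) {
    size_t bytes = count * sizeof(float);
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0) return NULL;

    /* Make sure the file is large enough to back the whole mapping. */
    if (ftruncate(fd, (off_t)bytes) != 0) {
        close(fd);
        return NULL;
    }

    /* MAP_SHARED: writes through the pointer end up in the file. */
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : (float *)p;
}

/* Release a mapping created by map_floats. Returns 0 on success. */
int unmap_floats(float *p, size_t count) {
    return munmap(p, count * sizeof(float));
}
```

The returned pointer can be indexed like any ordinary `float` array, which is what makes this feel like extra memory.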
Some compiler flags to optimize performance: `-O3 -march=native -funroll-loops -fopenmp`
- `-O3`: Aggressive optimizations
- `-march=native`: CPU-specific optimizations
- `-funroll-loops`: Loop unrolling for potential speed improvements
- `-fopenmp`: OpenMP support for parallel processing
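
As a rough illustration of what `-fopenmp` enables, here is a naive matrix multiplication with its outer loop split across threads. `matmul_rows_parallel` is a hypothetical function written for this example, not the project's kernel. Each thread writes its own rows of `out`, so no locking is needed. Without `-fopenmp` the pragma is ignored and the loop simply runs serially.

```c
#include <assert.h>
#include <stddef.h>

/* out (m x n) = a (m x k) * b (k x n), all row-major.
 * Illustrative sketch: rows of the output are distributed across threads. */
void matmul_rows_parallel(const float *a, const float *b, float *out,
                          int m, int k, int n) {
    #pragma omp parallel for
    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            float acc = 0.0f;
            for (int p = 0; p < k; p++) {
                acc += a[i * k + p] * b[p * n + j];
            }
            out[i * n + j] = acc;
        }
    }
}
```

Compiling with something like `gcc -O3 -march=native -funroll-loops -fopenmp file.c` would turn the pragma on.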
This implementation is far from optimal; there are lots of things to improve.
- Improve the Matrix Multiplication.
- Improve the Attention Mechanism and its backprop, as it consumes a lot of training time.