Matrix-Multiplication-SIMD-Intrinsics-and-FPU

NxN matrix multiplication in C, implemented with SIMD intrinsics (MMX, SSE, SSE2, AVX, etc.) and with the FPU via inline assembly.

Find the final documentation of this work as a PDF file here
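To give a feel for the intrinsics approach, here is a minimal sketch of an NxN single-precision multiply using SSE. It is not taken from this repository: the function names `matmul_scalar` and `matmul_sse`, the row-major layout, and the requirement that N be a multiple of 4 are all assumptions made for illustration. The idea is to broadcast one element of A and multiply it against four adjacent elements of a row of B, so four entries of C are accumulated at once.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics (__m128, _mm_*_ps) */

/* Hypothetical scalar reference: C = A * B, all n x n, row-major. */
void matmul_scalar(const float *a, const float *b, float *c, int n)
{
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            float sum = 0.0f;
            for (int k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

/* Hypothetical SSE version: computes four columns of C per iteration.
 * Assumes n is a positive multiple of 4; unaligned loads and stores are
 * used so the buffers need no special alignment. */
void matmul_sse(const float *a, const float *b, float *c, int n)
{
    assert(n > 0 && n % 4 == 0);

    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j += 4) {
            __m128 acc = _mm_setzero_ps();           /* C[i][j..j+3] */
            for (int k = 0; k < n; k++) {
                /* Broadcast A[i][k] into all four lanes. */
                __m128 a_ik = _mm_set1_ps(a[i * n + k]);
                /* Load B[k][j..j+3]. */
                __m128 b_row = _mm_loadu_ps(&b[k * n + j]);
                /* acc += A[i][k] * B[k][j..j+3] */
                acc = _mm_add_ps(acc, _mm_mul_ps(a_ik, b_row));
            }
            _mm_storeu_ps(&c[i * n + j], acc);
        }
    }
}
```

Calling `matmul_sse(a, b, c, n)` on three `n * n` float arrays fills `c` with the product. Because each lane sums over `k` in the same order as the scalar loop, the two functions should agree element for element on typical inputs. The same pattern extends to AVX by widening the vector type to eight floats per register.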