The first three assignments use the AVX-512 SIMD instruction set for vectorization, OpenMP for multithreading, prefetching, locality of reference, thread pinning, loop unrolling, fused multiply-add, blocking, and mmap.
The last assignment was done in CUDA.

Run: ./runner_script.sh
Matrix-vector multiplication: the optimizations are described in the report.
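
As a rough illustration of how a few of the techniques listed above (OpenMP threading, loop unrolling, fused multiply-add) can combine in a matrix-vector product, here is a minimal portable sketch. It is not the code from this repository: the function name `matvec`, the row-major layout, and the unroll factor of 4 are assumptions made for the example, and it uses `std::fma` rather than AVX-512 intrinsics so that it compiles on any machine.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the repository): y = A * x for a
// row-major rows x cols matrix stored in a flat vector.
std::vector<double> matvec(const std::vector<double>& a,
                           const std::vector<double>& x,
                           std::size_t rows, std::size_t cols) {
    assert(a.size() == rows * cols);
    assert(x.size() == cols);
    std::vector<double> y(rows, 0.0);

    // Rows are independent, so each thread can take a chunk of rows.
    // Without -fopenmp the pragma is ignored and the loop runs serially.
    #pragma omp parallel for schedule(static)
    for (long long i = 0; i < static_cast<long long>(rows); ++i) {
        const double* row = a.data() + static_cast<std::size_t>(i) * cols;

        // Four independent accumulators: unrolling by 4 shortens the
        // dependency chain between consecutive fused multiply-adds.
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        std::size_t j = 0;
        for (; j + 4 <= cols; j += 4) {
            s0 = std::fma(row[j],     x[j],     s0);
            s1 = std::fma(row[j + 1], x[j + 1], s1);
            s2 = std::fma(row[j + 2], x[j + 2], s2);
            s3 = std::fma(row[j + 3], x[j + 3], s3);
        }

        // Combine the partial sums, then handle the leftover columns.
        double sum = (s0 + s1) + (s2 + s3);
        for (; j < cols; ++j) {
            sum = std::fma(row[j], x[j], sum);
        }
        y[static_cast<std::size_t>(i)] = sum;
    }
    return y;
}
```

Compiling with something like `-O3 -fopenmp` lets the compiler vectorize the inner loop and spread rows across threads; whether that matches the report's approach is not something this sketch claims.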

2D image convolution: optimized down to 1.7 s. Roofline analysis (Intel Advisor) gave 1.5 s.
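
For readers unfamiliar with the operation, the following is a minimal sketch of a 2D convolution written so that the innermost loop walks contiguous memory, which is the kind of access pattern that vectorizes well. The function name `conv2d_valid`, the row-major layout, the lack of padding ("valid" output only), and the use of an unflipped kernel (cross-correlation, as is common in image processing) are all assumptions for illustration, not details taken from the assignment code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the repository): "valid" 2D convolution of a
// row-major h x w image with a k x k kernel. The kernel is not flipped.
std::vector<float> conv2d_valid(const std::vector<float>& img,
                                std::size_t h, std::size_t w,
                                const std::vector<float>& ker,
                                std::size_t k) {
    assert(img.size() == h * w);
    assert(ker.size() == k * k);
    assert(k >= 1 && k <= h && k <= w);

    const std::size_t oh = h - k + 1;  // output height
    const std::size_t ow = w - k + 1;  // output width
    std::vector<float> out(oh * ow, 0.0f);

    // Output rows are independent, so they can be split across threads.
    #pragma omp parallel for schedule(static)
    for (long long yy = 0; yy < static_cast<long long>(oh); ++yy) {
        const std::size_t y = static_cast<std::size_t>(yy);
        float* out_row = out.data() + y * ow;

        for (std::size_t ky = 0; ky < k; ++ky) {
            const float* in_row = img.data() + (y + ky) * w;
            for (std::size_t kx = 0; kx < k; ++kx) {
                const float c = ker[ky * k + kx];
                // Contiguous reads and writes: a good candidate for SIMD.
                for (std::size_t x = 0; x < ow; ++x) {
                    out_row[x] += c * in_row[x + kx];
                }
            }
        }
    }
    return out;
}
```

Keeping the kernel coefficient loop outside the pixel loop means each pass streams over one input row and one output row, which favors locality of reference over the more obvious "for each pixel, for each kernel tap" ordering.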

Matrix-matrix multiplication: optimized from 10 s (naive: A * B transpose) to 52 ms. main.cpp compares the two best versions: 170 ms vs 52 ms.
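
To make the naive baseline and the idea of blocking concrete, here is a small sketch. It assumes the product being computed is C = A * B^T on square row-major matrices, so that both operands are read along rows; the function names, the default block size of 64, and the square shape are assumptions for the example and do not reproduce the versions in main.cpp.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical baseline: c[i][j] = dot(row i of A, row j of B), i.e. A * B^T.
std::vector<double> matmul_bt_naive(const std::vector<double>& a,
                                    const std::vector<double>& b,
                                    std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<double> c(n * n, 0.0);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k)
                sum += a[i * n + k] * b[j * n + k];
            c[i * n + j] = sum;
        }
    return c;
}

// Hypothetical blocked version: the same arithmetic, but performed tile by
// tile so that a bs x bs tile of A and of B is reused while still in cache.
std::vector<double> matmul_bt_blocked(const std::vector<double>& a,
                                      const std::vector<double>& b,
                                      std::size_t n, std::size_t bs = 64) {
    assert(a.size() == n * n && b.size() == n * n);
    assert(bs > 0);
    std::vector<double> c(n * n, 0.0);
    for (std::size_t ii = 0; ii < n; ii += bs)
        for (std::size_t jj = 0; jj < n; jj += bs)
            for (std::size_t kk = 0; kk < n; kk += bs) {
                // Clamp tile bounds so n need not be a multiple of bs.
                const std::size_t i_end = std::min(ii + bs, n);
                const std::size_t j_end = std::min(jj + bs, n);
                const std::size_t k_end = std::min(kk + bs, n);
                for (std::size_t i = ii; i < i_end; ++i)
                    for (std::size_t j = jj; j < j_end; ++j) {
                        double sum = c[i * n + j];  // partial sum so far
                        for (std::size_t k = kk; k < k_end; ++k)
                            sum += a[i * n + k] * b[j * n + k];
                        c[i * n + j] = sum;
                    }
            }
    return c;
}
```

The block size is a tuning parameter: it would normally be chosen so that the working tiles fit in a given cache level, and the best value depends on the machine.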

CUDA convolution: 6777 ms on a 4096x4096 matrix with a 3x3 kernel.