F16 intrinsics standalone #5

Narsil · 2023-08-01T11:51:39Z

This is a very dirty PR, more a POC than anything else at this point.

It seems to work and be correct. (It passes in every scenario I tried.)
It is faster than without the intrinsics.

half-rs is using a fork (starkat99/half-rs#98) to get some intrinsics for pure f16 computation that don't exist yet.

Then I hackishly added them into gemm:

Copy-pasted the code for f16 gemm (which does f16 -> f32simd -> matmul -> f16) to do purely f16 -> f16.
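The difference between the two data paths can be sketched on a single dot product. This is only an illustration, not the PR's code: to stay dependency-free it uses `f32` as a stand-in for f16 and `f64` as a stand-in for f32, and the names `Narrow`, `Wide`, `dot_widened` and `dot_native` are hypothetical.

```rust
// Stand-in types (assumptions for illustration): `Narrow` plays the role of
// f16 and `Wide` the role of f32. The PR itself works on real f16 data.
type Narrow = f32;
type Wide = f64;

/// Existing path: widen each input, multiply-accumulate in the wide type,
/// then narrow the result once at the end.
fn dot_widened(a: &[Narrow], b: &[Narrow]) -> Narrow {
    let mut acc: Wide = 0.0;
    for (&x, &y) in a.iter().zip(b) {
        acc += Wide::from(x) * Wide::from(y);
    }
    acc as Narrow
}

/// New path: multiply-accumulate directly in the narrow type, no conversions.
fn dot_native(a: &[Narrow], b: &[Narrow]) -> Narrow {
    let mut acc: Narrow = 0.0;
    for (&x, &y) in a.iter().zip(b) {
        acc += x * y;
    }
    acc
}

fn main() {
    let a: [Narrow; 4] = [1.0, 2.0, 3.0, 4.0];
    let b: [Narrow; 4] = [0.5, 0.25, 2.0, 1.0];
    println!("widened: {}", dot_widened(&a, &b));
    println!("native:  {}", dot_native(&a, &b));
}
```

The native path skips the conversions, but it also accumulates at lower precision, so on longer or less friendly inputs the two results can differ slightly.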

The code requires black_box atm for the compiler to be happy. This is most likely an error of mine in the half-rs intrinsics implementation (I used the arm! macro but do not understand how that affects the compiler).
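For readers unfamiliar with it, `std::hint::black_box` is an identity function that asks the compiler to treat its argument as opaque. A minimal sketch of that kind of call follows; the `scale` helper and its values are hypothetical and not taken from the PR.

```rust
use std::hint::black_box;

// Hypothetical helper, not taken from the PR: scales every element in place.
fn scale(values: &mut [f32], factor: f32) {
    // `black_box` returns its argument unchanged, but asks the compiler to
    // treat the value as opaque instead of reasoning about it.
    let factor = black_box(factor);
    for v in values.iter_mut() {
        *v *= factor;
    }
}

fn main() {
    let mut data = [1.0_f32, 2.0, 3.0];
    scale(&mut data, 2.0);
    println!("{:?}", data);
}
```

The hint is best-effort and can block optimizations, so it is a workaround rather than something to keep in a hot loop.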

I didn't re-optimize this afterwards to make sure cache lines were adapted or anything of the sort.

Current results:

GGML WITHOUT ACCELERATE (f32xf16) -> f32 :  220ms (1 thread) - 197ms (8 threads)
GEMM (f16xf16) -> f16:   136ms (1 thread) - 68ms (8 threads)
M, N, K :  4096 x 128 x 11108

For reference, Accelerate seems to do ~25ms for the same op, and threading seems to decrease performance on it (which I guess is because Accelerate already uses threading underneath).

~-25% overall: 97ms (1 thread), 52ms (8 threads)
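To put these timings on a common scale, they can be converted to throughput. The sketch below assumes the usual 2·M·N·K operation count for a matrix product and assumes the ~-25% figure is relative to the 136ms / 68ms GEMM numbers above; the function names are hypothetical.

```rust
/// Floating-point operations in an (M x K) by (K x N) matrix product,
/// counting one multiply and one add per inner-loop step.
fn gemm_flops(m: u64, n: u64, k: u64) -> u64 {
    2 * m * n * k
}

/// Throughput in GFLOP/s for a run that took `millis` milliseconds.
fn gflops_per_sec(flops: u64, millis: f64) -> f64 {
    flops as f64 / (millis * 1e-3) / 1e9
}

/// Relative change in run time, in percent (negative means faster).
fn percent_change(before_ms: f64, after_ms: f64) -> f64 {
    (after_ms - before_ms) / before_ms * 100.0
}

fn main() {
    // Shape quoted in the description: M, N, K = 4096 x 128 x 11108.
    let flops = gemm_flops(4096, 128, 11108);
    let runs = [
        ("gemm f16, 1 thread", 136.0),
        ("gemm f16, 8 threads", 68.0),
        ("gemm f16 updated, 1 thread", 97.0),
        ("gemm f16 updated, 8 threads", 52.0),
        ("Accelerate", 25.0),
    ];
    for (label, ms) in runs {
        println!("{label}: {:.1} GFLOP/s", gflops_per_sec(flops, ms));
    }
    println!("1 thread:  {:.1}%", percent_change(136.0, 97.0));
    println!("8 threads: {:.1}%", percent_change(68.0, 52.0));
}
```

Under those assumptions the change works out to roughly -29% on one thread and -24% on eight, which is consistent with the rounded ~-25%.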

Narsil added 3 commits August 1, 2023 10:39

Using m1 intrinsics for f16xf16

c7a1ceb

Removing black box.

a8f0280

Cleanup.

c2d2173

Narsil requested a review from LaurentMazare August 1, 2023 11:51

Following @sarah-ek's advice, adding more registers helped!

c39304a

~-25% overall 97ms (1 thread) 52ms (8 threads)
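The commit message does not spell out what changed. One common way that "more registers" helps a gemm kernel is a larger accumulator tile in the micro-kernel, so each loaded value is reused more often. The sketch below shows that general technique with plain `f32` and a hypothetical `microkernel` function and data layout; it is an assumption about the idea, not the actual commit.

```rust
/// Computes an MR x NR tile of C = A * B, keeping the whole tile in a local
/// accumulator array that the compiler can map to registers.
/// Hypothetical layout: `a` is MR x k row-major, `b` is k x NR row-major.
fn microkernel<const MR: usize, const NR: usize>(
    k: usize,
    a: &[f32],
    b: &[f32],
) -> [[f32; NR]; MR] {
    let mut acc = [[0.0_f32; NR]; MR];
    for p in 0..k {
        for i in 0..MR {
            // Each loaded element of A is reused across NR accumulators.
            let a_ip = a[i * k + p];
            for j in 0..NR {
                acc[i][j] += a_ip * b[p * NR + j];
            }
        }
    }
    acc
}

fn main() {
    // A is 2 x 3, B is 3 x 2, so the tile is 2 x 2 with k = 3.
    let a = [1.0_f32, 2.0, 3.0, 4.0, 5.0, 6.0];
    let b = [1.0_f32, 0.0, 0.0, 1.0, 1.0, 1.0];
    let c = microkernel::<2, 2>(3, &a, &b);
    println!("{:?}", c);
}
```

Raising `MR` and `NR` increases reuse per load until the tile no longer fits in the register file, which is why the best tile size depends on the target CPU.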

LaurentMazare approved these changes Aug 1, 2023


Narsil merged commit e7ef6f9 into main Aug 1, 2023
3 checks passed

Narsil mentioned this pull request Aug 3, 2023

Apple silicon (MPS backends) support? huggingface/candle#313

Closed
