Multihead arch with cutlass fused multihead attention #1976

Draft · wants to merge 71 commits into base: master
Conversation

almaudoh (Contributor)

CUTLASS implementation of a fused multihead attention layer, giving about a 10% speedup on an A100.

@todo

  • Fusing biases
  • Fusing layernorms
  • Fusing FFNs
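
For reference, the computation the fused kernel replaces is standard scaled-dot-product attention per head, O = softmax(QK^T / sqrt(d)) V, normally run as GEMM + softmax + GEMM in separate launches. Below is a minimal single-head CPU sketch of that math; names and shapes are illustrative only, not the PR's actual code:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Reference (unfused) scaled-dot-product attention for one head:
// O = softmax(Q * K^T / sqrt(d)) * V.
// Q, K, V, O are [seq_len x d], row-major. A fused kernel computes the
// same result in a single launch instead of GEMM + softmax + GEMM.
void attention_ref(const float* Q, const float* K, const float* V,
                   float* O, int seq_len, int d) {
  const float scale = 1.0f / std::sqrt(static_cast<float>(d));
  std::vector<float> row(seq_len);
  for (int i = 0; i < seq_len; ++i) {
    // Scores of query i against all keys, with a numerically stable softmax.
    float max_s = -1e30f;
    for (int j = 0; j < seq_len; ++j) {
      float s = 0.0f;
      for (int k = 0; k < d; ++k) s += Q[i * d + k] * K[j * d + k];
      row[j] = s * scale;
      max_s = std::max(max_s, row[j]);
    }
    float sum = 0.0f;
    for (int j = 0; j < seq_len; ++j) {
      row[j] = std::exp(row[j] - max_s);
      sum += row[j];
    }
    // Weighted sum of value rows, normalized by the softmax denominator.
    for (int k = 0; k < d; ++k) {
      float acc = 0.0f;
      for (int j = 0; j < seq_len; ++j) acc += row[j] * V[j * d + k];
      O[i * d + k] = acc / sum;
    }
  }
}
```

The usual motivation for fusing is avoiding a round trip of the seq_len × seq_len score matrix through global memory between the three stages.
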

ankan-ban and others added 30 commits March 22, 2022 22:28
- Skip-connection add before layer norm now has a scaling factor (alpha); see the sketch after this commit list.
- Replace the conv layer of the value and MLH heads with an embedding layer when the attention body is used.
- Will be removed once it's fixed.
- Also fix scratch-space calculation: a factor of sizeof(DataType) was missing.
- To handle bigger/wider networks.
- 1.3% improvement in BT2 on RTX 4090.
- 15.6% improvement on a test BT3 network with 64 heads.
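
The first commit above changes the pre-layernorm residual to include a scaling factor, i.e. y = LayerNorm(alpha * skip + x). A minimal sketch of that step follows; the function name and the assumption that alpha scales the skip branch (DeepNorm-style) are mine, not confirmed by the PR:

```cpp
#include <cmath>

// Skip-connection add before layer norm, with a scaling factor alpha
// on the skip input (assumed placement): y = LayerNorm(alpha * skip + x).
// Operates on one row of n elements; gamma/beta are the learned LN params.
void skip_add_layernorm(const float* x, const float* skip, float alpha,
                        const float* gamma, const float* beta,
                        float* y, int n, float eps = 1e-6f) {
  float mean = 0.0f;
  for (int i = 0; i < n; ++i) {
    y[i] = alpha * skip[i] + x[i];  // scaled residual add
    mean += y[i];
  }
  mean /= n;
  float var = 0.0f;
  for (int i = 0; i < n; ++i) {
    const float d = y[i] - mean;
    var += d * d;
  }
  var /= n;
  const float inv_std = 1.0f / std::sqrt(var + eps);
  for (int i = 0; i < n; ++i)
    y[i] = gamma[i] * ((y[i] - mean) * inv_std) + beta[i];
}
```

The scratch-space fix is simpler in nature: a buffer size computed in element counts must be multiplied by sizeof(DataType) to get the allocation size in bytes.
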
almaudoh-1 and others added 26 commits October 5, 2023 02:11
@almaudoh almaudoh changed the title Multihead arch cuda cutlass fmha Multihead arch cuda cutlass fused multihead attention Mar 11, 2024
@almaudoh almaudoh changed the title Multihead arch cuda cutlass fused multihead attention Multihead arch with cutlass fused multihead attention Mar 11, 2024