G2 is a TPC (Tensor Processor Core) optimized attention library for the transformer decoder on Intel Gaudi2.
- Clone the source code inside an Intel Gaudi2 docker container:

```shell
git clone https://github.com/ZhaiFeiyue/g2.git
```

- Build and install (from the cloned repo root):

```shell
pip install .
```

  or install directly from GitHub:

```shell
pip install git+https://github.com/ZhaiFeiyue/g2.git#egg=g2attn
```

- Run the tests:

```shell
cd tests
./run.sh
```
The QK BMM of the transformer decoder can be illustrated as follows: the Q shape is [B, M, 1, H], the K shape is [B, M, T, H], and the output shape is [B, M, 1, T],
where:
- B is the batch size.
- M is the number of heads.
- H is the head dimension.
- T is the number of cached tokens for K.
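
For reference, below is a minimal PyTorch sketch of the two decode-time BMMs these kernels target; the shapes follow the definitions above, and the concrete sizes are only an example (this is plain PyTorch, not the g2attn API).

```python
import torch

B, M, T, H = 64, 32, 1024, 128   # batch, heads, cached tokens, head dim

q = torch.randn(B, M, 1, H, dtype=torch.bfloat16)   # single decode query token
k = torch.randn(B, M, T, H, dtype=torch.bfloat16)   # cached keys
v = torch.randn(B, M, T, H, dtype=torch.bfloat16)   # cached values

scores = torch.matmul(q, k.transpose(-1, -2)) / H**0.5   # QK BMM -> [B, M, 1, T]
probs = torch.softmax(scores.float(), dim=-1).to(torch.bfloat16)
out = torch.matmul(probs, v)                              # ScoreV BMM -> [B, M, 1, H]

print(scores.shape, out.shape)   # [64, 32, 1, 1024] [64, 32, 1, 128]
```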
Intel Gaudi2 is a systolic-array based AI accelerator with a peak throughput of roughly 410 TOPS in BF16. But when computing the QK BMM (and likewise the ScoreV BMM) of a decoding step, the effective throughput is only about 3.2 TOPS, because only one row of Q is valid; the ratio of the two numbers implies the MME is roughly 1/128 utilized (410 / 128 ≈ 3.2).
This project aims to tackle that problem by leveraging the compute throughput (TOPS) of the TPCs.
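
A rough back-of-the-envelope for that 3.2T figure, assuming (from the ratio of the two numbers above, not from an official spec) that the MME consumes Q in tiles of 128 rows:

```python
# Decode-time QK BMM: only one row of each Q tile carries real data.
PEAK_BF16_TOPS = 410      # peak BF16 throughput quoted above
ASSUMED_TILE_ROWS = 128   # assumption: rows per MME pass, implied by 410 / 3.2
VALID_ROWS = 1            # a single query token per decode step

utilization = VALID_ROWS / ASSUMED_TILE_ROWS
print(f"utilization ~{utilization:.2%}, effective ~{PEAK_BF16_TOPS * utilization:.1f} TOPS")
# -> utilization ~0.78%, effective ~3.2 TOPS
```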

QK BMM latency, TPC vs MME:

BS | Heads | KV length | Head dim | Dtype | TPC latency (us) | MME latency (us) |
---|---|---|---|---|---|---|
64 | 32 | 128 | 128 | BF16 | 63 | 41 |
64 | 32 | 256 | 128 | BF16 | 62 | 77 |
64 | 32 | 512 | 128 | BF16 | 126 | 69 |
64 | 32 | 1024 | 128 | BF16 | 244 | 192 |
64 | 32 | 2048 | 128 | BF16 | 498 | 467 |
64 | 32 | 4096 | 128 | BF16 | 1007 | 935 |
64 | 32 | 8192 | 128 | BF16 | 2011 | 1894 |

ScoreV BMM latency, TPC vs MME:

BS | Heads | KV length | Head dim | Dtype | TPC latency (us) | MME latency (us) |
---|---|---|---|---|---|---|
64 | 32 | 128 | 128 | BF16 | 63 | 42 |
64 | 32 | 256 | 128 | BF16 | 123 | 81 |
64 | 32 | 512 | 128 | BF16 | 247 | 160 |
64 | 32 | 1024 | 128 | BF16 | 482 | 311 |
64 | 32 | 2048 | 128 | BF16 | 981 | 639 |
64 | 32 | 4096 | 128 | BF16 | 1965 | 1279 |
64 | 32 | 8192 | 128 | BF16 | 3911 | 2539 |
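
As a sanity check on the tables, the latencies can be converted into achieved TFLOPS using the usual 2·B·M·T·H FLOP count per BMM (the count is the same for QK and ScoreV). The sketch below uses the 8192-token row of the first table; both engines sit far below the 410 TOPS peak in this regime.

```python
# Convert one benchmark row into achieved TFLOPS (2*B*M*1*T*H FLOPs per BMM).
B, M, T, H = 64, 32, 8192, 128
flops = 2 * B * M * 1 * T * H            # ~4.3 GFLOP per decode step

for engine, latency_us in [("TPC", 2011), ("MME", 1894)]:
    tflops = flops / (latency_us * 1e-6) / 1e12
    print(f"{engine}: ~{tflops:.2f} TFLOPS")
# -> TPC: ~2.14 TFLOPS, MME: ~2.27 TFLOPS
```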

Known limitations:
- Releases newer than 1.12 are not supported.
- The ScoreV BMM performance is poor.