x86 optimization for gemm int8 #5763

nihui · 2024-10-31T03:21:45Z

codecov-commenter · 2024-12-16T07:17:10Z

Codecov Report

Attention: Patch coverage is 91.47803% with 64 lines in your changes missing coverage. Please review.

Project coverage is 95.08%. Comparing base (a9553fc) to head (2df658c).
Report is 1 commit behind head on master.

Files with missing lines	Patch %	Lines
src/layer/x86/gemm_x86.cpp	92.06%	30 Missing ⚠️
src/layer/x86/gemm_x86_avx2.cpp	0.00%	30 Missing ⚠️
src/layer/x86/gemm_x86_xop.cpp	0.00%	3 Missing ⚠️
src/layer/x86/x86_usability.h	98.30%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #5763      +/-   ##
==========================================
+ Coverage   94.93%   95.08%   +0.14%     
==========================================
  Files         820      824       +4     
  Lines      267315   276713    +9398     
==========================================
+ Hits       253778   263111    +9333     
- Misses      13537    13602      +65


nihui · 2024-12-16T10:58:22Z

A simple gemm int8 benchmark test:

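#include "benchmark.h" // ncnn::get_current_time
#include "layer.h"     // ncnn::Layer, ncnn::create_layer
#include "testutil.h"  // RandomMat; assumes ncnn's tests/testutil.h is on the include path

#include <stdio.h>
#include <vector>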

static void benchmark_gemm_int8(int M, int N, int K)
{
    ncnn::Mat A = RandomMat(K, M);
    ncnn::Mat BT = RandomMat(K, N);

    ncnn::ParamDict pd;
    pd.set(0, 1.f); // alpha
    pd.set(1, 1.f); // beta
    pd.set(2, 0); // transA
    pd.set(3, 1); // transB
    pd.set(4, 0); // constantA
    pd.set(5, 0); // constantB
    pd.set(6, 1); // constantC
    pd.set(7, M); // constantM
    pd.set(8, N); // constantN
    pd.set(9, K); // constantK
    pd.set(10, -1); // broadcast_type_C
    pd.set(11, 0); // output_N1M
    pd.set(13, 0); // output_elemtype
    pd.set(14, 0); // output_transpose
    pd.set(18, 2); // int8_scale_term

    ncnn::Option opt;
    opt.num_threads = 1;

    ncnn::Layer* gemm = ncnn::create_layer("Gemm");

    gemm->load_param(pd);

    gemm->load_model(ncnn::ModelBinFromMatArray(0));

    gemm->create_pipeline(opt);

    std::vector<ncnn::Mat> inputs(2);
    inputs[0] = A;
    inputs[1] = BT;
    std::vector<ncnn::Mat> outputs(1);

    double mint = 999999999; // best (minimum) time over all runs, in ms

    for (int i = 0; i < 10; i++)
    {
        double t0 = ncnn::get_current_time();

        gemm->forward(inputs, outputs, opt);

        double t1 = ncnn::get_current_time();

        double t = t1 - t0;

        fprintf(stderr, "%.2f\n", t);

        if (t < mint)
            mint = t;
    }

    fprintf(stderr, "mint = %.2f\n", mint);

    ncnn::Mat out = outputs[0];

    gemm->destroy_pipeline(opt);

    delete gemm;
}
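
For readers unfamiliar with the term, the following is a minimal scalar sketch of what dynamic quantization in an int8 gemm generally involves: scales are computed from the fp32 inputs at run time, the inputs are rounded to int8, products are accumulated in int32, and the result is scaled back to fp32. It is not the code from this PR. The helper names `quantize_row` and `gemm_int8_dynamic` are hypothetical, and the choice of one scale per row of A and per row of BT is an assumption made for illustration; the x86 kernels in the PR may use a different scale granularity and rounding.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper, not an ncnn API: quantize one row to int8 using a
// scale derived from the row's own absolute maximum. Returns the scale.
static float quantize_row(const float* src, int n, std::vector<int8_t>& dst)
{
    float absmax = 0.f;
    for (int i = 0; i < n; i++)
        absmax = std::max(absmax, std::fabs(src[i]));

    // map [-absmax, absmax] onto [-127, 127]; an all-zero row keeps scale 1
    const float scale = absmax == 0.f ? 1.f : 127.f / absmax;

    dst.resize(n);
    for (int i = 0; i < n; i++)
    {
        int q = (int)std::lround(src[i] * scale);
        dst[i] = (int8_t)std::min(127, std::max(-127, q));
    }
    return scale;
}

// Hypothetical reference gemm: C = A * BT^T, where A is M x K and BT is N x K,
// both row-major (the transA=0 transB=1 layout used in the benchmark).
static std::vector<float> gemm_int8_dynamic(const std::vector<float>& A, const std::vector<float>& BT, int M, int N, int K)
{
    std::vector<std::vector<int8_t> > Aq(M), Bq(N);
    std::vector<float> As(M), Bs(N);
    for (int i = 0; i < M; i++)
        As[i] = quantize_row(&A[i * K], K, Aq[i]);
    for (int j = 0; j < N; j++)
        Bs[j] = quantize_row(&BT[j * K], K, Bq[j]);

    std::vector<float> C(M * N);
    for (int i = 0; i < M; i++)
    {
        for (int j = 0; j < N; j++)
        {
            int32_t sum = 0; // int8 products are accumulated in int32
            for (int k = 0; k < K; k++)
                sum += (int)Aq[i][k] * (int)Bq[j][k];

            // dequantize: undo both input scales
            C[i * N + j] = sum / (As[i] * Bs[j]);
        }
    }
    return C;
}
```

With A = {1, 2, 3, 4} as a 2 x 2 matrix and BT set to the 2 x 2 identity, the result matches A to within the int8 rounding error, which is the accuracy trade-off the int8 path accepts in exchange for speed.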

nihui · 2024-12-16T11:17:32Z

EPYC 9754, single thread, gemm int8 with dynamic quantization

M = N = K = 5000

transA=0 transB=1	fp32 time (ms)	int8 time (ms)	int8 speed vs fp32 (fp32 time / int8 time)
naive	46371.86	14472.54	320.41%
+sse2	10214.79	4485.26	227.73%
+avx	3327.28	4126.75	80.62%
+avx2	3131.61	2278.17	137.46%
+avx512	2788.72	1832.91	152.15%
+avx512vnni	2788.72	822.01	339.26%

nihui and others added 3 commits October 28, 2024 14:54

x86 sse2/xop/avx/avx2 optimization for gemm int8

8076f4e

apply code-format changes

9c85f62

Merge branch 'Tencent:master' into gemm-quantize-x86

952bedd

github-actions bot added the x86 label Oct 31, 2024

w

502d58b

github-actions bot added the layer label Nov 4, 2024

nihui and others added 6 commits November 4, 2024 08:46

apply code-format changes

5721ca7

w

2c4bf75

apply code-format changes

58cfbb8

fix

c49bb4f

add c

c8b2c31

apply code-format changes

99a3a11

nihui closed this Nov 14, 2024

nihui reopened this Nov 14, 2024

nihui and others added 8 commits November 14, 2024 15:04

Merge branch 'master' into gemm-quantize-x86

3184d8b

fix

d314a14

fix avx

a321287

fix avx

459cf4c

divps

1bec95e

dispatch avxvnni

de33f23

Merge branch 'master' into gemm-quantize-x86

b2b0b96

skip round problem

aa727e4

github-actions bot added the test label Nov 28, 2024

nihui and others added 2 commits November 28, 2024 11:33

fix for x86 32bit

2143260

apply code-format changes

ea43b93

nihui closed this Nov 28, 2024

nihui reopened this Nov 28, 2024

nihui added 3 commits November 29, 2024 02:45

no sse fix

f32b4b4

comp avxvnni

15ebb61

fix vs2019 ice

1069d2f

nihui and others added 18 commits December 11, 2024 09:18

a

a3bf66f

aa

6347dd1

skip more halfway cases

82f1a2b

a

494f042

ua fix

4feceac

f

0df3241

f

fcdae8f

sde on ubuntu24

e52daad

fix avxvnni dispatch

9d22cd6

gcov14

65d185f

Merge branch 'master' into gemm-quantize-x86

7a7d918

opt avxvnniint8

165f2d4

apply code-format changes

2c08968

avxvnniint8 without wshift

9d759d9

fix

0d0abf2

fix dispatch

cdba870

ooops

fa88315

ooops

301f902

nihui added 6 commits December 16, 2024 07:20

fix

b69bb0f

opt unpack aligned cvt

f3f1fb5

opt avx512 scatter

33781f0

opt++

786247d

f

8126a5e

cc

2df658c

nihui changed the title ~~[WIP] x86 optimization for gemm int8~~ x86 optimization for gemm int8 Dec 16, 2024

cc

0c76a3d

nihui merged commit 44e0d95 into Tencent:master Dec 17, 2024
72 of 78 checks passed
