x86 optimization for gemm int8 #5763

Merged: 75 commits merged into Tencent:master on Dec 17, 2024

nihui
Member

@nihui nihui commented Oct 31, 2024

  • sse2 madd
  • sse4.1 cvt epi16
  • 64bit 16 registers
  • xop maddd
  • avx pack8
  • avx2 madd
  • avx2 gather
  • avx512 pack16 + 32 registers
  • avx512 scatter
  • avx512 vnni
  • avx vnni
  • avx vnni int8 (depends on avx vnni int8, avx vnni int16, avx ne convert infrastructure #5749)
  • avx10.1 + scatter + 32 registers (TODO infrastructure)
  • avx10.2 + avx512 vnni int8 (TODO infrastructure)
  • opt pack a
  • opt pack at
  • opt pack b
  • opt pack bt
  • opt unpack out
  • opt unpack aligned load + cvt ps

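As a rough illustration of the first item (sse2 madd), the sketch below computes an int8 dot product by sign-extending to int16 and letting `_mm_madd_epi16` multiply and pairwise-add into int32 lanes. It is not the kernel from this PR: the function names `dot_int8` and `dot_int8_scalar` are hypothetical, and the code falls back to a scalar loop when SSE2 is not available.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Scalar reference: sum of a[i] * b[i] accumulated in int32.
static int32_t dot_int8_scalar(const int8_t* a, const int8_t* b, size_t n)
{
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (int32_t)a[i] * (int32_t)b[i];
    return sum;
}

// Hypothetical helper, not taken from the PR.
static int32_t dot_int8(const int8_t* a, const int8_t* b, size_t n)
{
    size_t i = 0;
    int32_t sum = 0;
#if defined(__SSE2__)
    const __m128i zero = _mm_setzero_si128();
    __m128i acc = zero;
    for (; i + 16 <= n; i += 16)
    {
        __m128i va = _mm_loadu_si128((const __m128i*)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i*)(b + i));

        // SSE2 has no int8 -> int16 sign-extension instruction, so build
        // a sign mask (0xFF for negative bytes) and interleave with it.
        __m128i sa = _mm_cmpgt_epi8(zero, va);
        __m128i sb = _mm_cmpgt_epi8(zero, vb);
        __m128i a_lo = _mm_unpacklo_epi8(va, sa);
        __m128i a_hi = _mm_unpackhi_epi8(va, sa);
        __m128i b_lo = _mm_unpacklo_epi8(vb, sb);
        __m128i b_hi = _mm_unpackhi_epi8(vb, sb);

        // madd: multiply int16 pairs and add adjacent products into int32.
        acc = _mm_add_epi32(acc, _mm_madd_epi16(a_lo, b_lo));
        acc = _mm_add_epi32(acc, _mm_madd_epi16(a_hi, b_hi));
    }

    // Horizontal sum of the four int32 lanes.
    int32_t lanes[4];
    _mm_storeu_si128((__m128i*)lanes, acc);
    sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    // Scalar tail (or the whole input when SSE2 is unavailable).
    for (; i < n; i++)
        sum += (int32_t)a[i] * (int32_t)b[i];
    return sum;
}
```

The later items in the list (sse4.1 cvt epi16, avx2 madd, avx512 vnni) can be read as progressively cheaper ways of doing the same widen, multiply and accumulate steps.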
@github-actions github-actions bot added the x86 label Oct 31, 2024
@github-actions github-actions bot added the layer label Nov 4, 2024
@nihui nihui closed this Nov 14, 2024
@nihui nihui reopened this Nov 14, 2024
@github-actions github-actions bot added the test label Nov 28, 2024
@nihui nihui closed this Nov 28, 2024
@nihui nihui reopened this Nov 28, 2024
@codecov-commenter

codecov-commenter commented Dec 16, 2024

Codecov Report

Attention: Patch coverage is 91.47803% with 64 lines in your changes missing coverage. Please review.

Project coverage is 95.08%. Comparing base (a9553fc) to head (2df658c).
Report is 1 commit behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/layer/x86/gemm_x86.cpp | 92.06% | 30 Missing ⚠️ |
| src/layer/x86/gemm_x86_avx2.cpp | 0.00% | 30 Missing ⚠️ |
| src/layer/x86/gemm_x86_xop.cpp | 0.00% | 3 Missing ⚠️ |
| src/layer/x86/x86_usability.h | 98.30% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5763      +/-   ##
==========================================
+ Coverage   94.93%   95.08%   +0.14%     
==========================================
  Files         820      824       +4     
  Lines      267315   276713    +9398     
==========================================
+ Hits       253778   263111    +9333     
- Misses      13537    13602      +65     


@nihui nihui changed the title from "[WIP] x86 optimization for gemm int8" to "x86 optimization for gemm int8" Dec 16, 2024
@nihui
Member Author

nihui commented Dec 16, 2024

a simple gemm int8 benchmark test

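#include "benchmark.h"
#include "layer.h"
#include "modelbin.h"
#include "testutil.h" // RandomMat; assumed to come from ncnn's tests/testutil.h

#include <stdio.h>
#include <vector>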

static void benchmark_gemm_int8(int M, int N, int K)
{
    ncnn::Mat A = RandomMat(K, M);
    ncnn::Mat BT = RandomMat(K, N);

    ncnn::ParamDict pd;
    pd.set(0, 1.f); // alpha
    pd.set(1, 1.f); // beta
    pd.set(2, 0); // transA
    pd.set(3, 1); // transB
    pd.set(4, 0); // constantA
    pd.set(5, 0); // constantB
    pd.set(6, 1); // constantC
    pd.set(7, M); // constantM
    pd.set(8, N); // constantN
    pd.set(9, K); // constantK
    pd.set(10, -1); // broadcast_type_C
    pd.set(11, 0); // output_N1M
    pd.set(13, 0); // output_elemtype
    pd.set(14, 0); // output_transpose
    pd.set(18, 2); // int8_scale_term

    ncnn::Option opt;
    opt.num_threads = 1;

    ncnn::Layer* gemm = ncnn::create_layer("Gemm");

    gemm->load_param(pd);

    gemm->load_model(ncnn::ModelBinFromMatArray(0));

    gemm->create_pipeline(opt);

    std::vector<ncnn::Mat> inputs(2);
    inputs[0] = A;
    inputs[1] = BT;
    std::vector<ncnn::Mat> outputs(1);

    double mint = 999999999;

    // run 10 times and keep the fastest run
    for (int i = 0; i < 10; i++)
    {
        double t0 = ncnn::get_current_time();

        gemm->forward(inputs, outputs, opt);

        double t1 = ncnn::get_current_time();

        double t = t1 - t0;

        fprintf(stderr, "%.2f\n", t);

        if (t < mint)
            mint = t;
    }

    fprintf(stderr, "mint = %.2f\n", mint);

    ncnn::Mat out = outputs[0];

    gemm->destroy_pipeline(opt);

    delete gemm;
}
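
The benchmark above reports the fastest of 10 runs rather than the mean, which reduces noise from warm-up and scheduling. A minimal, library-independent sketch of the same pattern is shown here; `best_of_ms` is a hypothetical helper name and it uses `std::chrono` instead of `ncnn::get_current_time()`.

```cpp
#include <chrono>
#include <limits>

// Hypothetical helper: call f() `runs` times and return the fastest
// wall-clock duration in milliseconds.
template<typename F>
static double best_of_ms(F&& f, int runs)
{
    double best = std::numeric_limits<double>::max();
    for (int i = 0; i < runs; i++)
    {
        auto t0 = std::chrono::steady_clock::now();
        f();
        auto t1 = std::chrono::steady_clock::now();

        double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
        if (ms < best)
            best = ms;
    }
    return best;
}
```

With such a helper, the timing loop would reduce to something like `best_of_ms([&] { gemm->forward(inputs, outputs, opt); }, 10)`.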

@nihui
Member Author

nihui commented Dec 16, 2024

EPYC 9754 single thread gemm int8 dynamic-quant

M = N = K = 5000

| transA=0 transB=1 | fp32 time (ms) | int8 time (ms) | int8 speed versus fp32 (fp32 time / int8 time) |
| --- | --- | --- | --- |
| naive | 46371.86 | 14472.54 | 320.41% |
| +sse2 | 10214.79 | 4485.26 | 227.73% |
| +avx | 3327.28 | 4126.75 | 80.62% |
| +avx2 | 3131.61 | 2278.17 | 137.46% |
| +avx512 | 2788.72 | 1832.91 | 152.15% |
| +avx512vnni | 2788.72 | 822.01 | 339.26% |
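
The run above is labelled dynamic-quant, meaning the fp32 inputs are quantized to int8 at inference time. The sketch below shows one common way this can be done (a per-row absmax scale of 127 / absmax, then an int32 dot product that is rescaled back to float). This is an assumption for illustration, not the code from this PR; `QuantizedRow`, `quantize_row_absmax` and `dot_dequantized` are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container: int8 values plus the scale used to produce them.
struct QuantizedRow
{
    std::vector<int8_t> data;
    float scale;
};

// Quantize one row with scale = 127 / max(|x|) (assumed scheme).
static QuantizedRow quantize_row_absmax(const std::vector<float>& row)
{
    float absmax = 0.f;
    for (float v : row)
        absmax = std::max(absmax, std::fabs(v));

    QuantizedRow q;
    q.scale = absmax == 0.f ? 1.f : 127.f / absmax;
    q.data.resize(row.size());
    for (size_t i = 0; i < row.size(); i++)
    {
        long v = std::lround(row[i] * q.scale);
        v = std::min(127L, std::max(-127L, v)); // clamp to symmetric int8 range
        q.data[i] = (int8_t)v;
    }
    return q;
}

// Integer dot product, then undo both scales to get back to float.
static float dot_dequantized(const QuantizedRow& a, const QuantizedRow& b)
{
    int32_t acc = 0;
    for (size_t i = 0; i < a.data.size(); i++)
        acc += (int32_t)a.data[i] * (int32_t)b.data[i];
    return (float)acc / (a.scale * b.scale);
}
```

The quantization pass is extra work that the fp32 path does not have, so the int8 timings in the table presumably include that cost.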

@nihui nihui merged commit 44e0d95 into Tencent:master Dec 17, 2024
72 of 78 checks passed