arm optimization for convolution int8 winograd unified elempack #5087

Merged (32 commits, merged on Oct 22, 2023)

nihui (Member) commented Oct 13, 2023

  • f23
  • f43
  • armv7
  • aarch64
  • opt kernel transform
  • opt input transform
  • opt output transform
  • clean old code
  • unroll 8x12 on aarch64
  • unroll 8x6 on armv7
  • armv7 asm dot
  • aarch64 asm dot
  • armv6 asm dot
  • winograd strategy
  • coverage++
  • opt pipeline
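The f23 and f43 items presumably refer to the Winograd F(2,3) and F(4,3) variants. Integer Winograd is practical because the transforms can be kept integral. The following 1D F(2,3) sketch uses the standard Lavin-Gray transform matrices, scales the kernel transform G by 2 so every coefficient is an integer, and halves the output at the end. It is only an illustration, not the kernel from this PR. The function names are hypothetical, and a real implementation works on 2D tiles (the 1D transform is applied along both axes) with packed elempack layouts and NEON/asm dot products.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helpers illustrating 1D Winograd F(2,3) on int8 data.
// Standard transforms: B^T = [[1,0,-1,0],[0,1,1,0],[0,-1,1,0],[0,1,0,-1]],
// G = [[1,0,0],[1/2,1/2,1/2],[1/2,-1/2,1/2],[0,0,1]], A^T = [[1,1,1,0],[0,1,-1,-1]].

// Kernel transform with G scaled by 2, so all entries are integers.
std::array<int32_t, 4> kernel_transform_f23(const std::array<int8_t, 3>& g)
{
    return {2 * g[0], g[0] + g[1] + g[2], g[0] - g[1] + g[2], 2 * g[2]};
}

// Input transform B^T d: additions and subtractions only.
std::array<int32_t, 4> input_transform_f23(const std::array<int8_t, 4>& d)
{
    return {d[0] - d[2], d[1] + d[2], d[2] - d[1], d[1] - d[3]};
}

// Two outputs of a 3-tap correlation over four inputs, using 4 multiplies instead of 6.
std::array<int32_t, 2> winograd_f23(const std::array<int8_t, 4>& d, const std::array<int8_t, 3>& g)
{
    const std::array<int32_t, 4> u = kernel_transform_f23(g);
    const std::array<int32_t, 4> v = input_transform_f23(d);

    int32_t m[4];
    for (int i = 0; i < 4; i++)
        m[i] = u[i] * v[i]; // element-wise product in the transformed domain

    // Output transform A^T m; the scaled kernel makes both sums exactly 2x the true result.
    const int32_t y0 = m[0] + m[1] + m[2];
    const int32_t y1 = m[1] - m[2] - m[3];
    assert(y0 % 2 == 0 && y1 % 2 == 0);
    return {y0 / 2, y1 / 2};
}
```

For example, `winograd_f23({10, -20, 30, -40}, {5, -6, 7})` returns `{380, -560}`, which matches the direct correlation 10*5 + (-20)*(-6) + 30*7 and (-20)*5 + 30*(-6) + (-40)*7.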

github-actions bot added the arm label Oct 13, 2023
codecov-commenter commented Oct 13, 2023

Codecov Report

Merging #5087 (dedf148) into master (7b02425) will increase coverage by 0.12%.
Report is 8 commits behind head on master.
The diff coverage is 86.48%.

@@            Coverage Diff             @@
##           master    #5087      +/-   ##
==========================================
+ Coverage   94.80%   94.92%   +0.12%     
==========================================
  Files         766      768       +2     
  Lines      233813   238832    +5019     
==========================================
+ Hits       221660   226707    +5047     
+ Misses      12153    12125      -28     

Files                                           Coverage Δ
src/layer/arm/convolution_3x3_winograd_int8.h   86.73% <ø> (ø)
src/layer/arm/convolution_arm.cpp               94.69% <95.00%> (+0.01%) ⬆️
src/net.cpp                                     68.53% <76.47%> (+0.14%) ⬆️

... and 17 files with indirect coverage changes
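The headline percentages follow from the line counts in the diff table. Coverage is hits divided by tracked lines, and here hits + misses equals lines, so there are no partial lines. The following sketch reproduces the 94.80% and 94.92% figures. `coverage_pct` is a hypothetical helper, and truncating to two decimals is an assumption about how Codecov rounds.

```cpp
#include <cmath>

// Line coverage in percent, truncated (not rounded) to two decimals.
// Assumption: this mirrors Codecov's default rounding for the report above.
double coverage_pct(long hits, long lines)
{
    const double pct = 100.0 * static_cast<double>(hits) / static_cast<double>(lines);
    return std::floor(pct * 100.0) / 100.0;
}
```

With master's 221660 hits out of 233813 lines this gives 94.80, and with the PR's 226707 out of 238832 it gives 94.92. The 0.12 percentage-point difference is the reported +0.12%.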

nihui closed this Oct 15, 2023
nihui reopened this Oct 15, 2023
nihui closed this Oct 16, 2023
nihui reopened this Oct 16, 2023
nihui (Member, Author) commented Oct 19, 2023

rpi b+ ARM1176JZF-S
cpuinfo
pi@raspberrypi:~/ncnn-nihui/build/benchmark $ cat /proc/cpuinfo 
processor       : 0
model name      : ARMv6-compatible processor rev 7 (v6l)
BogoMIPS        : 697.95
Features        : half thumb fastmult vfp edsp java tls 
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xb76
CPU revision    : 7

Hardware        : BCM2835
Revision        : 0010
Serial          : 00000000caad5d6c
Model           : Raspberry Pi Model B Plus Rev 1.2
benchncnn 4 1 0 -1 1
pi@raspberrypi:~/ncnn-nihui/build/benchmark $ ./benchncnn 4 1 0 -1 1
loop_count = 4
num_threads = 1
powersave = 0
gpu_device = -1
cooling_down = 1
          squeezenet  min = 4329.51  max = 4357.80  avg = 4341.54
     squeezenet_int8  min = 2419.66  max = 2507.00  avg = 2443.90
           mobilenet  min = 8155.59  max = 8353.52  avg = 8233.01
      mobilenet_int8  min = 3710.09  max = 3722.64  avg = 3716.83
        mobilenet_v2  min = 4527.67  max = 4534.54  avg = 4531.17
        mobilenet_v3  min = 3740.68  max = 3874.33  avg = 3777.30
          shufflenet  min = 2184.64  max = 2189.82  avg = 2186.83
       shufflenet_v2  min = 2084.47  max = 2098.41  avg = 2092.13
             mnasnet  min = 4322.04  max = 4327.11  avg = 4324.81
     proxylessnasnet  min = 4845.69  max = 4985.55  avg = 4904.14
     efficientnet_b0  min = 6982.16  max = 7144.19  avg = 7037.88
   efficientnetv2_b0  min = 8249.22  max = 8464.07  avg = 8304.05
        regnety_400m  min = 5569.03  max = 5791.17  avg = 5630.66
           blazeface  min =  574.71  max =  582.05  avg =  578.40
           googlenet  min = 14143.84  max = 14357.70  avg = 14208.96
      googlenet_int8  min = 8094.48  max = 8317.30  avg = 8154.98
            resnet18  min = 12582.33  max = 12742.96  avg = 12636.53
       resnet18_int8  min = 6881.77  max = 7095.92  avg = 6938.09
             alexnet  min = 9076.03  max = 9204.58  avg = 9134.50
            resnet50  min = 37487.75  max = 37760.14  avg = 37679.77
       resnet50_int8  min = 18935.19  max = 19190.13  avg = 19008.13
      squeezenet_ssd  min = 8980.34  max = 8999.25  avg = 8987.50
 squeezenet_ssd_int8  min = 5928.54  max = 5947.80  avg = 5937.39
       mobilenet_ssd  min = 16174.84  max = 16386.42  avg = 16232.88
  mobilenet_ssd_int8  min = 7729.10  max = 7806.16  avg = 7767.07
      mobilenet_yolo  min = 38328.63  max = 38581.02  avg = 38499.40
  mobilenetv2_yolov3  min = 17110.36  max = 17337.74  avg = 17182.45
         yolov4-tiny  min = 21850.60  max = 22076.44  avg = 21963.88
           nanodet_m  min = 5271.22  max = 5403.54  avg = 5306.32
    yolo-fastest-1.1  min = 2242.89  max = 2248.87  avg = 2245.37
      yolo-fastestv2  min = 1911.73  max = 1934.84  avg = 1923.62
          FastestDet  min = 2063.00  max = 2109.80  avg = 2086.33
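Comparing these logs by eye is tedious. Each result line has the shape `name  min = X  max = Y  avg = Z`, with times in milliseconds. The following minimal sketch parses one line with `std::sscanf`. `BenchResult` and `parse_bench_line` are hypothetical names, and the format is inferred from the output above, not taken from benchncnn's source.

```cpp
#include <cstdio>
#include <optional>
#include <string>

struct BenchResult
{
    std::string name;
    double min_ms;
    double max_ms;
    double avg_ms;
};

// Parses one benchncnn result line such as
// "          squeezenet  min = 4329.51  max = 4357.80  avg = 4341.54".
// Header lines like "loop_count = 4" do not match and yield std::nullopt.
std::optional<BenchResult> parse_bench_line(const std::string& line)
{
    char name[64];
    BenchResult r{};
    // A space in the format matches any run of whitespace, including none,
    // so the column alignment padding does not matter.
    const int n = std::sscanf(line.c_str(), "%63s min = %lf max = %lf avg = %lf",
                              name, &r.min_ms, &r.max_ms, &r.avg_ms);
    if (n != 4)
        return std::nullopt;
    r.name = name;
    return r;
}
```

Feeding every line of a log through `parse_bench_line` and keeping only the successful results gives one table per run, which can then be joined by model name.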

rpi 2b Cortex-A7
cpuinfo
pi@raspberrypi:~/ncnn-nihui/build/benchmark $ cat /proc/cpuinfo 
processor       : 0
model name      : ARMv7 Processor rev 5 (v7l)
BogoMIPS        : 57.60
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm 
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xc07
CPU revision    : 5

processor       : 1
model name      : ARMv7 Processor rev 5 (v7l)
BogoMIPS        : 57.60
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm 
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xc07
CPU revision    : 5

processor       : 2
model name      : ARMv7 Processor rev 5 (v7l)
BogoMIPS        : 57.60
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm 
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xc07
CPU revision    : 5

processor       : 3
model name      : ARMv7 Processor rev 5 (v7l)
BogoMIPS        : 57.60
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm 
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xc07
CPU revision    : 5

Hardware        : BCM2835
Revision        : a01041
Serial          : 0000000049ec5540
Model           : Raspberry Pi 2 Model B Rev 1.1
benchncnn 4 1 0 -1 1 (baseline)
pi@raspberrypi:~/ncnn-nihui/build/benchmark $ ./benchncnn 4 1 0 -1 1
loop_count = 4
num_threads = 1
powersave = 0
gpu_device = -1
cooling_down = 1
          squeezenet  min =  518.58  max =  532.61  avg =  522.45
     squeezenet_int8  min =  364.08  max =  365.41  avg =  364.61
           mobilenet  min =  948.61  max =  949.18  avg =  948.77
      mobilenet_int8  min =  559.51  max =  559.77  avg =  559.61
        mobilenet_v2  min =  621.26  max =  621.44  avg =  621.39
        mobilenet_v3  min =  502.36  max =  503.00  avg =  502.58
          shufflenet  min =  308.57  max =  308.65  avg =  308.61
       shufflenet_v2  min =  287.93  max =  288.93  avg =  288.24
             mnasnet  min =  608.35  max =  608.66  avg =  608.48
     proxylessnasnet  min =  706.64  max =  707.23  avg =  706.79
     efficientnet_b0  min = 1058.97  max = 1059.43  avg = 1059.13
   efficientnetv2_b0  min = 1216.76  max = 1217.83  avg = 1217.24
        regnety_400m  min =  749.11  max =  749.31  avg =  749.23
           blazeface  min =   81.88  max =   82.37  avg =   82.18
           googlenet  min = 1762.08  max = 1763.22  avg = 1762.61
      googlenet_int8  min = 1150.01  max = 1152.68  avg = 1151.58
            resnet18  min = 1521.20  max = 1522.94  avg = 1522.17
       resnet18_int8  min =  875.23  max =  877.04  avg =  876.09
             alexnet  min = 1033.56  max = 1034.10  avg = 1033.76
            resnet50  min = 4479.07  max = 4480.21  avg = 4479.75
       resnet50_int8  min = 2631.74  max = 2632.86  avg = 2632.23
      squeezenet_ssd  min = 1169.08  max = 1170.23  avg = 1169.70
 squeezenet_ssd_int8  min =  844.51  max =  846.18  avg =  845.30
       mobilenet_ssd  min = 1929.58  max = 1931.11  avg = 1930.62
  mobilenet_ssd_int8  min = 1127.38  max = 1129.86  avg = 1128.17
      mobilenet_yolo  min = 4382.64  max = 4386.28  avg = 4384.19
  mobilenetv2_yolov3  min = 2192.50  max = 2193.69  avg = 2193.20
         yolov4-tiny  min = 2651.63  max = 2654.14  avg = 2652.88
           nanodet_m  min =  721.11  max =  723.02  avg =  721.89
    yolo-fastest-1.1  min =  321.80  max =  323.82  avg =  322.81
      yolo-fastestv2  min =  265.26  max =  265.78  avg =  265.52
          FastestDet  min =  296.89  max =  297.26  avg =  297.07
benchncnn 4 1 0 -1 1
pi@raspberrypi:~/ncnn-nihui/build/benchmark $ ./benchncnn 4 1 0 -1 1
loop_count = 4
num_threads = 1
powersave = 0
gpu_device = -1
cooling_down = 1
          squeezenet  min =  512.18  max =  512.66  avg =  512.44
     squeezenet_int8  min =  357.68  max =  358.05  avg =  357.87
           mobilenet  min =  937.70  max =  938.07  avg =  937.94
      mobilenet_int8  min =  532.32  max =  532.88  avg =  532.61
        mobilenet_v2  min =  613.82  max =  614.09  avg =  613.98
        mobilenet_v3  min =  496.76  max =  501.53  avg =  500.15
          shufflenet  min =  308.55  max =  309.07  avg =  308.91
       shufflenet_v2  min =  288.57  max =  289.06  avg =  288.83
             mnasnet  min =  608.06  max =  608.38  avg =  608.18
     proxylessnasnet  min =  705.81  max =  706.11  avg =  705.95
     efficientnet_b0  min = 1056.65  max = 1057.76  avg = 1057.18
   efficientnetv2_b0  min = 1214.17  max = 1214.83  avg = 1214.61
        regnety_400m  min =  746.31  max =  747.92  avg =  747.08
           blazeface  min =   82.37  max =   82.63  avg =   82.51
           googlenet  min = 1761.57  max = 1771.29  avg = 1766.47
      googlenet_int8  min = 1124.74  max = 1126.94  avg = 1125.83
            resnet18  min = 1521.04  max = 1523.23  avg = 1522.06
       resnet18_int8  min =  808.57  max =  813.74  avg =  810.75
             alexnet  min = 1032.23  max = 1033.45  avg = 1032.67
            resnet50  min = 4473.51  max = 4491.04  avg = 4479.68
       resnet50_int8  min = 2571.11  max = 2582.15  avg = 2574.05
      squeezenet_ssd  min = 1168.41  max = 1173.20  avg = 1171.39
 squeezenet_ssd_int8  min =  762.80  max =  763.85  avg =  763.35
       mobilenet_ssd  min = 1925.42  max = 1927.27  avg = 1926.13
  mobilenet_ssd_int8  min = 1086.77  max = 1093.43  avg = 1089.58
      mobilenet_yolo  min = 4368.95  max = 4369.57  avg = 4369.36
  mobilenetv2_yolov3  min = 2207.41  max = 2208.40  avg = 2208.02
         yolov4-tiny  min = 2665.51  max = 2666.58  avg = 2665.86
           nanodet_m  min =  722.40  max =  725.39  avg =  723.56
    yolo-fastest-1.1  min =  327.61  max =  328.95  avg =  328.42
      yolo-fastestv2  min =  267.36  max =  268.07  avg =  267.80
          FastestDet  min =  300.51  max =  301.31  avg =  300.88
benchncnn 4 4 0 -1 1 (baseline)
pi@raspberrypi:~/ncnn-nihui/build/benchmark $ ./benchncnn 4 4 0 -1 1
loop_count = 4
num_threads = 4
powersave = 0
gpu_device = -1
cooling_down = 1
          squeezenet  min =  179.14  max =  179.54  avg =  179.26
     squeezenet_int8  min =  137.49  max =  137.74  avg =  137.59
           mobilenet  min =  296.58  max =  297.12  avg =  296.81
      mobilenet_int8  min =  155.55  max =  156.80  avg =  155.91
        mobilenet_v2  min =  237.43  max =  239.10  avg =  238.03
        mobilenet_v3  min =  189.16  max =  190.10  avg =  189.48
          shufflenet  min =  131.31  max =  131.69  avg =  131.47
       shufflenet_v2  min =  113.94  max =  114.03  avg =  114.00
             mnasnet  min =  217.68  max =  218.13  avg =  217.99
     proxylessnasnet  min =  238.68  max =  239.56  avg =  239.00
     efficientnet_b0  min =  339.77  max =  340.28  avg =  339.97
   efficientnetv2_b0  min =  375.56  max =  376.72  avg =  376.24
        regnety_400m  min =  285.06  max =  285.96  avg =  285.51
           blazeface  min =   34.55  max =   34.74  avg =   34.66
           googlenet  min =  556.23  max =  558.14  avg =  557.35
      googlenet_int8  min =  375.95  max =  377.53  avg =  376.67
            resnet18  min =  517.93  max =  519.53  avg =  518.58
       resnet18_int8  min =  285.20  max =  286.37  avg =  285.64
             alexnet  min =  343.10  max =  344.66  avg =  343.96
            resnet50  min = 1351.43  max = 1366.64  avg = 1355.37
       resnet50_int8  min =  795.59  max =  796.60  avg =  796.07
      squeezenet_ssd  min =  451.83  max =  454.66  avg =  452.87
 squeezenet_ssd_int8  min =  339.62  max =  340.85  avg =  340.30
       mobilenet_ssd  min =  604.77  max =  605.58  avg =  605.14
  mobilenet_ssd_int8  min =  318.34  max =  318.85  avg =  318.52
      mobilenet_yolo  min = 1352.10  max = 1363.70  avg = 1355.33
  mobilenetv2_yolov3  min =  789.76  max =  791.34  avg =  790.52
         yolov4-tiny  min =  911.63  max =  912.88  avg =  912.20
           nanodet_m  min =  294.92  max =  295.78  avg =  295.24
    yolo-fastest-1.1  min =  160.43  max =  161.47  avg =  160.93
      yolo-fastestv2  min =  125.98  max =  126.43  avg =  126.14
          FastestDet  min =  131.26  max =  131.43  avg =  131.32
benchncnn 4 4 0 -1 1
pi@raspberrypi:~/ncnn-nihui/build/benchmark $ ./benchncnn 4 4 0 -1 1
loop_count = 4
num_threads = 4
powersave = 0
gpu_device = -1
cooling_down = 1
          squeezenet  min =  178.10  max =  178.57  avg =  178.24
     squeezenet_int8  min =  134.06  max =  134.18  avg =  134.11
           mobilenet  min =  296.53  max =  296.84  avg =  296.67
      mobilenet_int8  min =  149.91  max =  150.23  avg =  150.04
        mobilenet_v2  min =  236.43  max =  237.30  avg =  236.78
        mobilenet_v3  min =  189.10  max =  189.76  avg =  189.32
          shufflenet  min =  130.70  max =  131.15  avg =  130.89
       shufflenet_v2  min =  114.91  max =  115.15  avg =  115.05
             mnasnet  min =  217.05  max =  217.45  avg =  217.29
     proxylessnasnet  min =  238.07  max =  238.41  avg =  238.20
     efficientnet_b0  min =  338.96  max =  339.72  avg =  339.36
   efficientnetv2_b0  min =  374.33  max =  375.66  avg =  374.75
        regnety_400m  min =  285.02  max =  285.41  avg =  285.22
           blazeface  min =   33.55  max =   33.92  avg =   33.67
           googlenet  min =  554.78  max =  556.07  avg =  555.50
      googlenet_int8  min =  368.47  max =  369.42  avg =  369.05
            resnet18  min =  515.32  max =  516.09  avg =  515.55
       resnet18_int8  min =  275.26  max =  275.63  avg =  275.45
             alexnet  min =  340.04  max =  341.64  avg =  340.73
            resnet50  min = 1344.16  max = 1344.86  avg = 1344.67
       resnet50_int8  min =  789.21  max =  790.96  avg =  789.78
      squeezenet_ssd  min =  449.68  max =  466.64  avg =  455.10
 squeezenet_ssd_int8  min =  309.57  max =  311.24  avg =  310.20
       mobilenet_ssd  min =  603.28  max =  604.08  avg =  603.58
  mobilenet_ssd_int8  min =  307.20  max =  308.63  avg =  307.83
      mobilenet_yolo  min = 1347.47  max = 1365.64  avg = 1355.48
  mobilenetv2_yolov3  min =  793.39  max =  794.62  avg =  794.04
         yolov4-tiny  min =  907.81  max =  928.53  avg =  915.57
           nanodet_m  min =  292.37  max =  293.11  avg =  292.82
    yolo-fastest-1.1  min =  158.16  max =  159.08  avg =  158.62
      yolo-fastestv2  min =  126.99  max =  127.27  avg =  127.10
          FastestDet  min =  133.80  max =  133.89  avg =  133.86
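For each pair of runs, the per-model speedup is the baseline average divided by the new average, so values above 1 mean the PR build is faster. The following sketch applies this to the int8 rows of the Cortex-A7 single-thread runs above. The numbers are copied from those logs; `speedup`, `Row` and `print_speedups` are hypothetical names.

```cpp
#include <cstdio>

// Speedup of the new run over the baseline: > 1 means faster.
double speedup(double baseline_avg_ms, double new_avg_ms)
{
    return baseline_avg_ms / new_avg_ms;
}

struct Row
{
    const char* name;
    double baseline_avg_ms; // "(baseline)" run
    double new_avg_ms;      // run with this PR
};

// int8 rows from "benchncnn 4 1 0 -1 1" on the Raspberry Pi 2B (Cortex-A7).
const Row kRpi2bInt8[] = {
    {"squeezenet_int8", 364.61, 357.87},
    {"mobilenet_int8", 559.61, 532.61},
    {"googlenet_int8", 1151.58, 1125.83},
    {"resnet18_int8", 876.09, 810.75},
    {"resnet50_int8", 2632.23, 2574.05},
    {"squeezenet_ssd_int8", 845.30, 763.35},
    {"mobilenet_ssd_int8", 1128.17, 1089.58},
};

// Prints one "name  ratio" line per model.
void print_speedups()
{
    for (const Row& r : kRpi2bInt8)
        std::printf("%-20s %.3fx\n", r.name, speedup(r.baseline_avg_ms, r.new_avg_ms));
}
```

For instance, resnet18_int8 drops from 876.09 ms to 810.75 ms, about 1.08x, and squeezenet_ssd_int8 improves by about 1.11x.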

rpi 3b+ Cortex-A53
cpuinfo
pi@raspberrypi:~/ncnn-nihui/build/benchmark $ cat /proc/cpuinfo 
processor       : 0
BogoMIPS        : 38.40
Features        : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd03
CPU revision    : 4

processor       : 1
BogoMIPS        : 38.40
Features        : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd03
CPU revision    : 4

processor       : 2
BogoMIPS        : 38.40
Features        : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd03
CPU revision    : 4

processor       : 3
BogoMIPS        : 38.40
Features        : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd03
CPU revision    : 4

Hardware        : BCM2835
Revision        : a020d3
Serial          : 0000000007cc3d4b
Model           : Raspberry Pi 3 Model B Plus Rev 1.3
benchncnn 4 1 0 -1 1 (baseline)
pi@raspberrypi:~/ncnn-nihui/build/benchmark $ ./benchncnn 4 1 0 -1 1
loop_count = 4
num_threads = 1
powersave = 0
gpu_device = -1
cooling_down = 1
          squeezenet  min =  125.44  max =  125.68  avg =  125.57
     squeezenet_int8  min =  131.01  max =  131.68  avg =  131.34
           mobilenet  min =  204.91  max =  205.13  avg =  205.05
      mobilenet_int8  min =  168.36  max =  168.92  avg =  168.59
        mobilenet_v2  min =  156.91  max =  158.43  avg =  157.34
        mobilenet_v3  min =  125.69  max =  126.67  avg =  126.19
          shufflenet  min =   84.08  max =   84.20  avg =   84.16
       shufflenet_v2  min =   73.47  max =   73.65  avg =   73.57
             mnasnet  min =  150.57  max =  151.12  avg =  150.80
     proxylessnasnet  min =  204.80  max =  205.81  avg =  205.31
     efficientnet_b0  min =  233.95  max =  234.72  avg =  234.42
   efficientnetv2_b0  min =  264.98  max =  266.89  avg =  265.74
        regnety_400m  min =  185.89  max =  186.31  avg =  186.09
           blazeface  min =   22.95  max =   23.32  avg =   23.07
           googlenet  min =  449.79  max =  450.84  avg =  450.43
      googlenet_int8  min =  409.38  max =  410.16  avg =  409.66
            resnet18  min =  376.04  max =  378.51  avg =  377.46
       resnet18_int8  min =  288.59  max =  289.93  avg =  289.32
             alexnet  min =  272.49  max =  272.99  avg =  272.72
               vgg16  min = 1907.40  max = 1911.05  avg = 1909.04
          vgg16_int8  min = 1580.93  max = 1660.43  avg = 1607.75
            resnet50  min =  975.32  max =  979.14  avg =  977.05
       resnet50_int8  min =  836.57  max =  836.71  avg =  836.63
      squeezenet_ssd  min =  329.74  max =  332.33  avg =  330.93
 squeezenet_ssd_int8  min =  304.49  max =  305.34  avg =  304.98
       mobilenet_ssd  min =  425.41  max =  426.72  avg =  426.16
  mobilenet_ssd_int8  min =  347.71  max =  348.18  avg =  347.93
      mobilenet_yolo  min =  936.43  max =  938.41  avg =  937.25
  mobilenetv2_yolov3  min =  540.89  max =  542.19  avg =  541.51
         yolov4-tiny  min =  665.50  max =  669.36  avg =  667.41
           nanodet_m  min =  187.51  max =  189.26  avg =  188.07
    yolo-fastest-1.1  min =   99.38  max =  100.14  avg =   99.88
      yolo-fastestv2  min =   80.79  max =   81.33  avg =   81.16
  vision_transformer  min = 10579.97  max = 10647.02  avg = 10609.02
          FastestDet  min =   88.84  max =   89.06  avg =   88.96
benchncnn 4 1 0 -1 1
pi@raspberrypi:~/ncnn-nihui/build/benchmark $ ./benchncnn 4 1 0 -1 1
loop_count = 4
num_threads = 1
powersave = 0
gpu_device = -1
cooling_down = 1
          squeezenet  min =  124.92  max =  125.45  avg =  125.07
     squeezenet_int8  min =  116.38  max =  116.66  avg =  116.48
           mobilenet  min =  204.22  max =  204.41  avg =  204.34
      mobilenet_int8  min =  166.71  max =  167.54  avg =  167.14
        mobilenet_v2  min =  158.84  max =  159.33  avg =  159.13
        mobilenet_v3  min =  125.46  max =  126.25  avg =  125.92
          shufflenet  min =   82.75  max =   82.92  avg =   82.84
       shufflenet_v2  min =   73.46  max =   73.63  avg =   73.54
             mnasnet  min =  150.50  max =  150.85  avg =  150.69
     proxylessnasnet  min =  207.24  max =  207.76  avg =  207.54
     efficientnet_b0  min =  233.75  max =  234.29  avg =  234.09
   efficientnetv2_b0  min =  265.64  max =  266.20  avg =  265.82
        regnety_400m  min =  186.21  max =  186.50  avg =  186.35
           blazeface  min =   23.49  max =   23.52  avg =   23.50
           googlenet  min =  453.07  max =  454.25  avg =  453.49
      googlenet_int8  min =  383.37  max =  384.04  avg =  383.80
            resnet18  min =  377.41  max =  458.65  avg =  397.92
       resnet18_int8  min =  259.24  max =  260.11  avg =  259.63
             alexnet  min =  273.15  max =  273.60  avg =  273.34
               vgg16  min = 1910.45  max = 1928.25  avg = 1915.65
          vgg16_int8  min = 1512.30  max = 1518.76  avg = 1515.76
            resnet50  min =  976.53  max =  979.01  avg =  977.22
       resnet50_int8  min =  802.85  max =  805.10  avg =  803.67
      squeezenet_ssd  min =  331.39  max =  331.99  avg =  331.65
 squeezenet_ssd_int8  min =  268.93  max =  270.05  avg =  269.48
       mobilenet_ssd  min =  426.92  max =  428.40  avg =  427.58
  mobilenet_ssd_int8  min =  339.42  max =  340.46  avg =  339.92
      mobilenet_yolo  min =  942.63  max =  943.88  avg =  943.03
  mobilenetv2_yolov3  min =  541.70  max =  542.87  avg =  542.04
         yolov4-tiny  min =  669.26  max =  673.23  avg =  671.31
           nanodet_m  min =  187.01  max =  188.24  avg =  187.48
    yolo-fastest-1.1  min =  101.10  max =  102.28  avg =  101.86
      yolo-fastestv2  min =   81.13  max =   81.47  avg =   81.34
  vision_transformer  min = 10558.38  max = 10655.02  avg = 10614.41
          FastestDet  min =   89.90  max =   90.13  avg =   90.02
benchncnn 4 4 0 -1 1 (baseline)
pi@raspberrypi:~/ncnn-nihui/build/benchmark $ ./benchncnn 4 4 0 -1 1
loop_count = 4
num_threads = 4
powersave = 0
gpu_device = -1
cooling_down = 1
          squeezenet  min =   83.58  max =   84.09  avg =   83.90
     squeezenet_int8  min =   75.52  max =   76.47  avg =   75.96
           mobilenet  min =  107.48  max =  109.39  avg =  108.03
      mobilenet_int8  min =   63.97  max =   64.08  avg =   64.03
        mobilenet_v2  min =  109.03  max =  109.26  avg =  109.10
        mobilenet_v3  min =   83.85  max =   84.85  avg =   84.40
          shufflenet  min =   60.02  max =   60.18  avg =   60.10
       shufflenet_v2  min =   45.69  max =   45.99  avg =   45.79
             mnasnet  min =   90.54  max =   91.26  avg =   90.99
     proxylessnasnet  min =  104.88  max =  107.08  avg =  105.61
     efficientnet_b0  min =  134.28  max =  134.92  avg =  134.58
   efficientnetv2_b0  min =  145.43  max =  146.74  avg =  146.39
        regnety_400m  min =  116.00  max =  116.36  avg =  116.20
           blazeface  min =   15.59  max =   15.69  avg =   15.64
           googlenet  min =  220.78  max =  221.44  avg =  221.21
      googlenet_int8  min =  179.20  max =  181.56  avg =  180.38
            resnet18  min =  275.70  max =  277.07  avg =  276.28
       resnet18_int8  min =  152.83  max =  154.78  avg =  153.80
             alexnet  min =  160.33  max =  163.19  avg =  161.81
               vgg16  min = 1042.99  max = 1112.94  avg = 1067.10
          vgg16_int8  min =  683.60  max =  725.77  avg =  697.55
            resnet50  min =  564.26  max =  567.52  avg =  565.90
       resnet50_int8  min =  374.21  max =  376.66  avg =  375.07
      squeezenet_ssd  min =  248.07  max =  248.97  avg =  248.52
 squeezenet_ssd_int8  min =  193.81  max =  194.61  avg =  194.22
       mobilenet_ssd  min =  240.68  max =  242.19  avg =  241.50
  mobilenet_ssd_int8  min =  134.09  max =  134.81  avg =  134.42
      mobilenet_yolo  min =  502.81  max =  504.28  avg =  503.59
  mobilenetv2_yolov3  min =  360.16  max =  361.24  avg =  360.77
         yolov4-tiny  min =  403.21  max =  410.73  avg =  407.58
           nanodet_m  min =  123.88  max =  124.39  avg =  124.06
    yolo-fastest-1.1  min =   82.75  max =   83.87  avg =   83.30
      yolo-fastestv2  min =   64.19  max =   66.54  avg =   64.87
  vision_transformer  min = 3052.05  max = 3128.10  avg = 3092.54
          FastestDet  min =   67.35  max =   68.19  avg =   67.66
benchncnn 4 4 0 -1 1
pi@raspberrypi:~/ncnn-nihui/build/benchmark $ ./benchncnn 4 4 0 -1 1
loop_count = 4
num_threads = 4
powersave = 0
gpu_device = -1
cooling_down = 1
          squeezenet  min =   84.72  max =   85.57  avg =   84.99
     squeezenet_int8  min =   67.33  max =   67.64  avg =   67.50
           mobilenet  min =  107.52  max =  108.02  avg =  107.68
      mobilenet_int8  min =   60.82  max =   61.63  avg =   61.05
        mobilenet_v2  min =  109.57  max =  113.92  avg =  110.87
        mobilenet_v3  min =   83.79  max =   85.36  avg =   84.54
          shufflenet  min =   59.94  max =   60.20  avg =   60.07
       shufflenet_v2  min =   45.78  max =   45.98  avg =   45.86
             mnasnet  min =   94.13  max =   94.81  avg =   94.52
     proxylessnasnet  min =  106.48  max =  106.58  avg =  106.55
     efficientnet_b0  min =  132.03  max =  132.22  avg =  132.11
   efficientnetv2_b0  min =  143.59  max =  145.08  avg =  144.44
        regnety_400m  min =  118.39  max =  128.78  avg =  121.15
           blazeface  min =   15.49  max =   15.56  avg =   15.51
           googlenet  min =  223.50  max =  225.50  avg =  224.04
      googlenet_int8  min =  175.51  max =  176.37  avg =  175.98
            resnet18  min =  265.24  max =  266.84  avg =  266.01
       resnet18_int8  min =  147.70  max =  148.40  avg =  148.20
             alexnet  min =  161.68  max =  164.01  avg =  162.73
               vgg16  min = 1048.43  max = 1076.76  avg = 1056.07
          vgg16_int8  min =  657.45  max =  692.69  avg =  670.58
            resnet50  min =  560.62  max =  562.66  avg =  561.75
       resnet50_int8  min =  371.20  max =  375.11  avg =  372.68
      squeezenet_ssd  min =  244.91  max =  247.88  avg =  246.46
 squeezenet_ssd_int8  min =  171.17  max =  175.52  avg =  172.57
       mobilenet_ssd  min =  241.25  max =  243.41  avg =  242.23
  mobilenet_ssd_int8  min =  128.81  max =  129.24  avg =  129.02
      mobilenet_yolo  min =  487.40  max =  490.34  avg =  488.26
  mobilenetv2_yolov3  min =  358.78  max =  362.31  avg =  360.18
         yolov4-tiny  min =  406.67  max =  420.51  avg =  412.07
           nanodet_m  min =  124.22  max =  124.72  avg =  124.44
    yolo-fastest-1.1  min =   83.45  max =   84.02  avg =   83.66
      yolo-fastestv2  min =   64.01  max =   64.39  avg =   64.20
  vision_transformer  min = 3109.58  max = 3176.68  avg = 3145.96
          FastestDet  min =   65.57  max =   65.87  avg =   65.78
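To summarize several models in one number, a common choice is the geometric mean of the per-model ratios, which keeps one large network from dominating the result. The following sketch applies it to the int8 rows of the Cortex-A53 single-thread runs above. `geomean_speedup` is a hypothetical helper, and using a geometric mean is our choice for illustration, not something the PR reports.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Geometric mean of baseline/new ratios, computed in log space.
// Assumes both vectors are non-empty and have the same length.
double geomean_speedup(const std::vector<double>& baseline_ms, const std::vector<double>& new_ms)
{
    double log_sum = 0.0;
    for (std::size_t i = 0; i < baseline_ms.size(); i++)
        log_sum += std::log(baseline_ms[i] / new_ms[i]);
    return std::exp(log_sum / static_cast<double>(baseline_ms.size()));
}

// avg values of the int8 models, "benchncnn 4 1 0 -1 1" on the Raspberry Pi 3B+ (Cortex-A53), in order:
// squeezenet, mobilenet, googlenet, resnet18, vgg16, resnet50, squeezenet_ssd, mobilenet_ssd.
const std::vector<double> kRpi3bBaseline = {131.34, 168.59, 409.66, 289.32, 1607.75, 836.63, 304.98, 347.93};
const std::vector<double> kRpi3bNew = {116.48, 167.14, 383.80, 259.63, 1515.76, 803.67, 269.48, 339.92};
```

On these eight models the result is roughly 1.07x. The per-model ratios range from about 1.01x for mobilenet_int8 to about 1.13x for squeezenet_int8 and squeezenet_ssd_int8.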

rpi 4b Cortex-A72
cpuinfo
pi@raspberrypi:~/ncnn-nihui/build/benchmark $ cat /proc/cpuinfo 
processor       : 0
BogoMIPS        : 108.00
Features        : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd08
CPU revision    : 3

processor       : 1
BogoMIPS        : 108.00
Features        : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd08
CPU revision    : 3

processor       : 2
BogoMIPS        : 108.00
Features        : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd08
CPU revision    : 3

processor       : 3
BogoMIPS        : 108.00
Features        : fp asimd evtstrm crc32 cpuid
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x0
CPU part        : 0xd08
CPU revision    : 3

Hardware        : BCM2835
Revision        : c03115
Serial          : 1000000051cdb36a
Model           : Raspberry Pi 4 Model B Rev 1.5
benchncnn 4 1 0 -1 1 (baseline)
pi@raspberrypi:~/ncnn-nihui/build/benchmark $ ./benchncnn 4 1 0 -1 1
loop_count = 4
num_threads = 1
powersave = 0
gpu_device = -1
cooling_down = 1
          squeezenet  min =   71.83  max =   71.95  avg =   71.87
     squeezenet_int8  min =   63.69  max =   63.83  avg =   63.76
           mobilenet  min =  121.30  max =  121.48  avg =  121.39
      mobilenet_int8  min =   80.31  max =   81.24  avg =   80.65
        mobilenet_v2  min =   90.14  max =   90.33  avg =   90.23
        mobilenet_v3  min =   70.17  max =   71.18  avg =   70.76
          shufflenet  min =   45.08  max =   45.19  avg =   45.15
       shufflenet_v2  min =   39.69  max =   39.73  avg =   39.71
             mnasnet  min =   82.87  max =   84.21  avg =   83.23
     proxylessnasnet  min =   98.64  max =   99.13  avg =   98.90
     efficientnet_b0  min =  133.21  max =  133.41  avg =  133.32
   efficientnetv2_b0  min =  153.54  max =  153.91  avg =  153.76
        regnety_400m  min =  103.38  max =  103.66  avg =  103.47
           blazeface  min =   12.60  max =   12.67  avg =   12.64
           googlenet  min =  236.86  max =  237.33  avg =  237.16
      googlenet_int8  min =  197.63  max =  198.67  avg =  198.23
            resnet18  min =  183.85  max =  184.18  avg =  184.04
       resnet18_int8  min =  139.71  max =  140.31  avg =  140.09
             alexnet  min =  141.91  max =  141.98  avg =  141.94
               vgg16  min = 1049.27  max = 1056.14  avg = 1052.68
          vgg16_int8  min =  831.77  max =  867.64  avg =  841.85
            resnet50  min =  523.68  max =  524.41  avg =  524.06
       resnet50_int8  min =  403.59  max =  404.71  avg =  404.15
      squeezenet_ssd  min =  174.45  max =  174.75  avg =  174.64
 squeezenet_ssd_int8  min =  150.50  max =  150.85  avg =  150.63
       mobilenet_ssd  min =  243.19  max =  243.54  avg =  243.39
  mobilenet_ssd_int8  min =  161.94  max =  162.16  avg =  162.05
      mobilenet_yolo  min =  553.20  max =  555.25  avg =  553.77
  mobilenetv2_yolov3  min =  298.91  max =  299.23  avg =  299.07
         yolov4-tiny  min =  363.05  max =  364.83  avg =  363.93
           nanodet_m  min =  104.99  max =  105.31  avg =  105.12
    yolo-fastest-1.1  min =   53.94  max =   54.23  avg =   54.11
      yolo-fastestv2  min =   45.58  max =   45.76  avg =   45.64
  vision_transformer  min = 5867.75  max = 5874.66  avg = 5870.28
          FastestDet  min =   48.69  max =   49.20  avg =   49.03
benchncnn 4 1 0 -1 1
pi@raspberrypi:~/ncnn-nihui/build/benchmark $ ./benchncnn 4 1 0 -1 1
loop_count = 4
num_threads = 1
powersave = 0
gpu_device = -1
cooling_down = 1
          squeezenet  min =   71.53  max =   71.93  avg =   71.66
     squeezenet_int8  min =   60.92  max =   61.08  avg =   61.00
           mobilenet  min =  120.02  max =  120.18  avg =  120.11
      mobilenet_int8  min =   79.57  max =   79.67  avg =   79.61
        mobilenet_v2  min =   88.56  max =   88.66  avg =   88.60
        mobilenet_v3  min =   70.27  max =   70.61  avg =   70.46
          shufflenet  min =   45.22  max =   45.70  avg =   45.38
       shufflenet_v2  min =   39.48  max =   39.58  avg =   39.52
             mnasnet  min =   81.78  max =   82.04  avg =   81.90
     proxylessnasnet  min =   99.43  max =   99.75  avg =   99.59
     efficientnet_b0  min =  132.76  max =  133.44  avg =  133.18
   efficientnetv2_b0  min =  153.48  max =  154.17  avg =  153.78
        regnety_400m  min =  103.26  max =  103.51  avg =  103.40
           blazeface  min =   12.50  max =   12.59  avg =   12.53
           googlenet  min =  236.49  max =  236.92  avg =  236.78
      googlenet_int8  min =  193.24  max =  193.44  avg =  193.34
            resnet18  min =  183.06  max =  183.54  avg =  183.31
       resnet18_int8  min =  136.08  max =  136.29  avg =  136.16
             alexnet  min =  141.42  max =  142.11  avg =  141.63
               vgg16  min = 1046.01  max = 1050.32  avg = 1047.36
          vgg16_int8  min =  850.98  max =  853.93  avg =  851.95
            resnet50  min =  523.67  max =  523.92  avg =  523.77
       resnet50_int8  min =  393.57  max =  393.89  avg =  393.75
      squeezenet_ssd  min =  175.02  max =  175.40  avg =  175.18
 squeezenet_ssd_int8  min =  141.46  max =  144.58  avg =  142.35
       mobilenet_ssd  min =  246.40  max =  248.00  avg =  247.12
  mobilenet_ssd_int8  min =  160.86  max =  161.14  avg =  160.98
      mobilenet_yolo  min =  552.93  max =  553.56  avg =  553.20
  mobilenetv2_yolov3  min =  299.88  max =  300.35  avg =  300.19
         yolov4-tiny  min =  364.75  max =  366.17  avg =  365.42
           nanodet_m  min =  105.38  max =  106.46  avg =  105.77
    yolo-fastest-1.1  min =   54.43  max =   54.94  avg =   54.65
      yolo-fastestv2  min =   45.59  max =   45.83  avg =   45.71
  vision_transformer  min = 5868.41  max = 5871.90  avg = 5870.07
          FastestDet  min =   49.71  max =   49.75  avg =   49.73
benchncnn 4 4 0 -1 1 (baseline)
pi@raspberrypi:~/ncnn-nihui/build/benchmark $ ./benchncnn 4 4 0 -1 1
loop_count = 4
num_threads = 4
powersave = 0
gpu_device = -1
cooling_down = 1
          squeezenet  min =   44.07  max =   44.38  avg =   44.23
     squeezenet_int8  min =   39.82  max =   39.88  avg =   39.84
           mobilenet  min =   58.19  max =   58.50  avg =   58.39
      mobilenet_int8  min =   32.58  max =   32.63  avg =   32.60
        mobilenet_v2  min =   59.01  max =   59.78  avg =   59.24
        mobilenet_v3  min =   44.29  max =   44.45  avg =   44.36
          shufflenet  min =   30.98  max =   31.20  avg =   31.09
       shufflenet_v2  min =   22.21  max =   22.39  avg =   22.31
             mnasnet  min =   48.97  max =   49.46  avg =   49.13
     proxylessnasnet  min =   50.65  max =   50.86  avg =   50.76
     efficientnet_b0  min =   72.22  max =   72.54  avg =   72.34
   efficientnetv2_b0  min =   74.86  max =   75.75  avg =   75.27
        regnety_400m  min =   63.04  max =   63.10  avg =   63.07
           blazeface  min =    8.00  max =    8.02  avg =    8.01
           googlenet  min =  113.12  max =  113.97  avg =  113.61
      googlenet_int8  min =   95.22  max =   95.37  avg =   95.27
            resnet18  min =  119.41  max =  119.90  avg =  119.65
       resnet18_int8  min =   78.21  max =   78.27  avg =   78.25
             alexnet  min =   77.03  max =   78.75  avg =   77.89
               vgg16  min =  539.93  max =  556.65  avg =  544.59
          vgg16_int8  min =  377.55  max =  380.28  avg =  378.86
            resnet50  min =  259.10  max =  259.43  avg =  259.22
       resnet50_int8  min =  191.10  max =  191.54  avg =  191.23
      squeezenet_ssd  min =  124.25  max =  124.86  avg =  124.57
 squeezenet_ssd_int8  min =   97.36  max =   97.65  avg =   97.53
       mobilenet_ssd  min =  118.11  max =  118.85  avg =  118.47
  mobilenet_ssd_int8  min =   66.42  max =   66.95  avg =   66.73
      mobilenet_yolo  min =  269.19  max =  269.79  avg =  269.51
  mobilenetv2_yolov3  min =  179.84  max =  180.22  avg =  180.01
         yolov4-tiny  min =  203.21  max =  208.08  avg =  206.24
           nanodet_m  min =   67.69  max =   67.88  avg =   67.79
    yolo-fastest-1.1  min =   44.17  max =   44.71  avg =   44.38
      yolo-fastestv2  min =   34.30  max =   36.16  avg =   34.97
  vision_transformer  min = 2119.71  max = 2149.15  avg = 2131.72
          FastestDet  min =   35.28  max =   35.41  avg =   35.36
benchncnn 4 4 0 -1 1
pi@raspberrypi:~/ncnn-nihui/build/benchmark $ ./benchncnn 4 4 0 -1 1
loop_count = 4
num_threads = 4
powersave = 0
gpu_device = -1
cooling_down = 1
          squeezenet  min =   44.31  max =   45.08  avg =   44.85
     squeezenet_int8  min =   38.85  max =   39.14  avg =   39.03
           mobilenet  min =   58.65  max =   59.03  avg =   58.84
      mobilenet_int8  min =   32.90  max =   32.97  avg =   32.93
        mobilenet_v2  min =   58.50  max =   58.69  avg =   58.61
        mobilenet_v3  min =   44.71  max =   44.90  avg =   44.82
          shufflenet  min =   31.84  max =   31.97  avg =   31.88
       shufflenet_v2  min =   22.96  max =   23.13  avg =   23.05
             mnasnet  min =   49.87  max =   50.10  avg =   49.98
     proxylessnasnet  min =   50.97  max =   51.21  avg =   51.05
     efficientnet_b0  min =   71.03  max =   71.52  avg =   71.24
   efficientnetv2_b0  min =   75.30  max =   75.42  avg =   75.37
        regnety_400m  min =   62.71  max =   62.79  avg =   62.75
           blazeface  min =    8.10  max =    8.19  avg =    8.16
           googlenet  min =  112.95  max =  113.65  avg =  113.20
      googlenet_int8  min =   92.81  max =   96.24  avg =   94.30
            resnet18  min =  120.29  max =  120.84  avg =  120.51
       resnet18_int8  min =   71.66  max =   72.21  avg =   71.95
             alexnet  min =   77.98  max =   78.20  avg =   78.11
               vgg16  min =  543.02  max =  551.09  avg =  547.39
          vgg16_int8  min =  365.33  max =  367.00  avg =  366.16
            resnet50  min =  256.01  max =  257.39  avg =  256.75
       resnet50_int8  min =  184.14  max =  184.38  avg =  184.26
      squeezenet_ssd  min =  123.26  max =  123.85  avg =  123.56
 squeezenet_ssd_int8  min =   93.95  max =   97.51  avg =   95.67
       mobilenet_ssd  min =  118.43  max =  119.24  avg =  118.95
  mobilenet_ssd_int8  min =   66.81  max =   66.95  avg =   66.89
      mobilenet_yolo  min =  260.28  max =  261.86  avg =  261.34
  mobilenetv2_yolov3  min =  179.18  max =  179.46  avg =  179.34
         yolov4-tiny  min =  205.35  max =  218.31  avg =  209.97
           nanodet_m  min =   67.85  max =   68.46  avg =   68.13
    yolo-fastest-1.1  min =   44.10  max =   44.52  avg =   44.31
      yolo-fastestv2  min =   34.76  max =   35.28  avg =   35.02
  vision_transformer  min = 2120.09  max = 2200.03  avg = 2168.78
          FastestDet  min =   35.89  max =   36.27  avg =   36.06
pi@raspberrypi:~/vkpeak/build $ ./vkpeak 0
device       = V3D 4.2.14

fp32-scalar  = 1.66 GFLOPS
fp32-vec4    = 2.12 GFLOPS

fp16-scalar  = 0.00 GFLOPS
fp16-vec4    = 0.00 GFLOPS
fp16-matrix  = 0.00 GFLOPS

fp64-scalar  = 0.00 GFLOPS
fp64-vec4    = 0.00 GFLOPS

int32-scalar = 1.18 GIOPS
int32-vec4   = 1.52 GIOPS

int16-scalar = 0.00 GIOPS
int16-vec4   = 0.00 GIOPS
pi@raspberrypi:~/vkpeak/build $ ./vkpeak 1
device       = llvmpipe (LLVM 15.0.6, 128 bits)

fp32-scalar  = 3.61 GFLOPS
fp32-vec4    = 19.97 GFLOPS

fp16-scalar  = 1.31 GFLOPS
fp16-vec4    = 3.86 GFLOPS
fp16-matrix  = 0.00 GFLOPS

fp64-scalar  = 3.46 GFLOPS
fp64-vec4    = 3.27 GFLOPS

int32-scalar = 4.76 GIOPS
int32-vec4   = 20.13 GIOPS

int16-scalar = 4.88 GIOPS
int16-vec4   = 24.33 GIOPS

@github-actions github-actions bot added the core label Oct 20, 2023
@nihui nihui force-pushed the arm-conv-unified-elempack-5 branch from b06133d to adede93 October 20, 2023 03:38
@nihui
Member Author

nihui commented Oct 21, 2023

nihui@rock-3a:~/ncnn-nihui/build/benchmark$ cat /proc/cpuinfo 
processor       : 0
BogoMIPS        : 48.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x2
CPU part        : 0xd05
CPU revision    : 0

processor       : 1
BogoMIPS        : 48.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x2
CPU part        : 0xd05
CPU revision    : 0

processor       : 2
BogoMIPS        : 48.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x2
CPU part        : 0xd05
CPU revision    : 0

processor       : 3
BogoMIPS        : 48.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x2
CPU part        : 0xd05
CPU revision    : 0

benchncnn 4 1 0 -1 1 (baseline)
nihui@rock-3a:~/ncnn-nihui/build/benchmark$ ./benchncnn 4 1 0 -1 1
loop_count = 4
num_threads = 1
powersave = 0
gpu_device = -1
cooling_down = 1
          squeezenet  min =   44.38  max =   45.31  avg =   44.89
     squeezenet_int8  min =   55.54  max =   56.40  avg =   55.91
           mobilenet  min =   65.24  max =   65.46  avg =   65.34
      mobilenet_int8  min =   57.23  max =   57.80  avg =   57.59
        mobilenet_v2  min =   55.47  max =   55.66  avg =   55.56
        mobilenet_v3  min =   43.68  max =   43.86  avg =   43.78
          shufflenet  min =   32.36  max =   32.62  avg =   32.52
       shufflenet_v2  min =   31.01  max =   31.96  avg =   31.35
             mnasnet  min =   50.52  max =   50.64  avg =   50.58
     proxylessnasnet  min =   67.47  max =   67.81  avg =   67.62
     efficientnet_b0  min =   77.74  max =   78.06  avg =   77.95
   efficientnetv2_b0  min =  130.93  max =  132.17  avg =  131.70
        regnety_400m  min =   59.21  max =   59.41  avg =   59.35
           blazeface  min =   11.42  max =   11.48  avg =   11.45
           googlenet  min =  196.16  max =  197.09  avg =  196.62
      googlenet_int8  min =  198.34  max =  198.79  avg =  198.62
            resnet18  min =  123.26  max =  124.19  avg =  123.65
       resnet18_int8  min =  157.07  max =  157.51  avg =  157.33
             alexnet  min =  125.93  max =  126.18  avg =  126.01
               vgg16  min =  776.19  max =  779.18  avg =  777.75
          vgg16_int8  min = 1096.79  max = 1097.76  avg = 1097.21
            resnet50  min =  298.46  max =  299.58  avg =  299.07
       resnet50_int8  min =  338.11  max =  339.65  avg =  338.83
      squeezenet_ssd  min =  155.76  max =  159.51  avg =  156.86
 squeezenet_ssd_int8  min =  180.52  max =  182.09  avg =  181.21
       mobilenet_ssd  min =  148.81  max =  149.52  avg =  149.21
  mobilenet_ssd_int8  min =  135.44  max =  136.93  avg =  136.20
      mobilenet_yolo  min =  317.86  max =  319.50  avg =  318.69
  mobilenetv2_yolov3  min =  183.94  max =  184.17  avg =  184.09
         yolov4-tiny  min =  269.07  max =  270.17  avg =  269.46
           nanodet_m  min =   91.18  max =   91.85  avg =   91.58
    yolo-fastest-1.1  min =   38.18  max =   38.33  avg =   38.23
      yolo-fastestv2  min =   35.42  max =   35.74  avg =   35.60
  vision_transformer  min = 6598.21  max = 6616.54  avg = 6606.37
          FastestDet  min =   33.90  max =   34.40  avg =   34.10
benchncnn 4 1 0 -1 1
nihui@rock-3a:~/ncnn-nihui/build/benchmark$ ./benchncnn 4 1 0 -1 1
loop_count = 4
num_threads = 1
powersave = 0
gpu_device = -1
cooling_down = 1
          squeezenet  min =   45.37  max =   45.93  avg =   45.60
     squeezenet_int8  min =   46.81  max =   47.17  avg =   46.96
           mobilenet  min =   66.77  max =   66.87  avg =   66.83
      mobilenet_int8  min =   53.69  max =   54.24  avg =   53.98
        mobilenet_v2  min =   54.11  max =   54.49  avg =   54.27
        mobilenet_v3  min =   43.43  max =   43.61  avg =   43.56
          shufflenet  min =   32.09  max =   32.62  avg =   32.41
       shufflenet_v2  min =   31.06  max =   31.31  avg =   31.20
             mnasnet  min =   49.70  max =   49.91  avg =   49.82
     proxylessnasnet  min =   66.90  max =   67.11  avg =   67.01
     efficientnet_b0  min =   78.60  max =   78.83  avg =   78.69
   efficientnetv2_b0  min =   97.87  max =   99.10  avg =   98.64
        regnety_400m  min =   59.66  max =   60.01  avg =   59.82
           blazeface  min =   11.51  max =   11.55  avg =   11.53
           googlenet  min =  177.28  max =  177.93  avg =  177.56
      googlenet_int8  min =  185.73  max =  186.04  avg =  185.89
            resnet18  min =  122.60  max =  123.74  avg =  123.05
       resnet18_int8  min =  140.17  max =  140.45  avg =  140.27
             alexnet  min =  114.62  max =  115.22  avg =  114.90
               vgg16  min =  770.81  max =  777.07  avg =  774.88
          vgg16_int8  min = 1089.85  max = 1090.91  avg = 1090.35
            resnet50  min =  299.53  max =  300.23  avg =  299.76
       resnet50_int8  min =  302.53  max =  303.14  avg =  302.81
      squeezenet_ssd  min =  136.26  max =  136.93  avg =  136.61
 squeezenet_ssd_int8  min =  142.32  max =  143.48  avg =  143.11
       mobilenet_ssd  min =  139.15  max =  139.83  avg =  139.47
  mobilenet_ssd_int8  min =  116.48  max =  118.21  avg =  117.30
      mobilenet_yolo  min =  317.63  max =  318.43  avg =  318.14
  mobilenetv2_yolov3  min =  183.71  max =  184.13  avg =  183.92
         yolov4-tiny  min =  266.47  max =  267.37  avg =  266.97
           nanodet_m  min =   85.67  max =   86.53  avg =   86.10
    yolo-fastest-1.1  min =   37.85  max =   38.21  avg =   37.98
      yolo-fastestv2  min =   33.43  max =   33.80  avg =   33.65
  vision_transformer  min = 6559.39  max = 6573.34  avg = 6565.83
          FastestDet  min =   34.47  max =   34.80  avg =   34.61
benchncnn 4 4 0 -1 1 (baseline)
nihui@rock-3a:~/ncnn-nihui/build/benchmark$ ./benchncnn 4 4 0 -1 1
loop_count = 4
num_threads = 4
powersave = 0
gpu_device = -1
cooling_down = 1
          squeezenet  min =   23.42  max =   23.89  avg =   23.71
     squeezenet_int8  min =   24.47  max =   24.91  avg =   24.69
           mobilenet  min =   30.10  max =   30.60  avg =   30.28
      mobilenet_int8  min =   26.23  max =   26.84  avg =   26.66
        mobilenet_v2  min =   31.98  max =   32.24  avg =   32.14
        mobilenet_v3  min =   24.74  max =   25.15  avg =   24.88
          shufflenet  min =   19.74  max =   19.89  avg =   19.81
       shufflenet_v2  min =   16.50  max =   16.98  avg =   16.68
             mnasnet  min =   27.65  max =   27.79  avg =   27.70
     proxylessnasnet  min =   31.06  max =   31.42  avg =   31.25
     efficientnet_b0  min =   40.07  max =   40.52  avg =   40.28
   efficientnetv2_b0  min =   49.89  max =   50.11  avg =   50.00
        regnety_400m  min =   39.12  max =   39.66  avg =   39.38
           blazeface  min =    6.24  max =    6.45  avg =    6.34
           googlenet  min =   74.92  max =   75.38  avg =   75.08
      googlenet_int8  min =   74.11  max =   74.48  avg =   74.34
            resnet18  min =   59.11  max =   60.90  avg =   59.90
       resnet18_int8  min =   53.29  max =   53.66  avg =   53.49
             alexnet  min =   51.18  max =   51.69  avg =   51.40
               vgg16  min =  323.70  max =  325.36  avg =  324.31
          vgg16_int8  min =  367.23  max =  368.45  avg =  367.89
            resnet50  min =  131.28  max =  133.13  avg =  131.95
       resnet50_int8  min =  122.78  max =  123.33  avg =  123.05
      squeezenet_ssd  min =   76.11  max =   76.95  avg =   76.42
 squeezenet_ssd_int8  min =   73.98  max =   76.13  avg =   74.74
       mobilenet_ssd  min =   69.24  max =   69.76  avg =   69.49
  mobilenet_ssd_int8  min =   57.61  max =   61.64  avg =   59.20
      mobilenet_yolo  min =  146.18  max =  148.11  avg =  146.88
  mobilenetv2_yolov3  min =  102.27  max =  102.60  avg =  102.44
         yolov4-tiny  min =  130.68  max =  131.43  avg =  131.06
           nanodet_m  min =   49.83  max =   50.18  avg =   49.97
    yolo-fastest-1.1  min =   25.74  max =   26.43  avg =   26.05
      yolo-fastestv2  min =   20.97  max =   21.26  avg =   21.15
  vision_transformer  min = 2043.76  max = 2051.95  avg = 2048.57
          FastestDet  min =   21.40  max =   22.05  avg =   21.61
benchncnn 4 4 0 -1 1
nihui@rock-3a:~/ncnn-nihui/build/benchmark$ ./benchncnn 4 4 0 -1 1
loop_count = 4
num_threads = 4
powersave = 0
gpu_device = -1
cooling_down = 1
          squeezenet  min =   23.30  max =   23.91  avg =   23.59
     squeezenet_int8  min =   28.16  max =   28.67  avg =   28.50
           mobilenet  min =   29.95  max =   30.22  avg =   30.06
      mobilenet_int8  min =   29.47  max =   29.86  avg =   29.62
        mobilenet_v2  min =   30.78  max =   31.53  avg =   31.16
        mobilenet_v3  min =   25.19  max =   25.30  avg =   25.24
          shufflenet  min =   18.78  max =   19.25  avg =   19.04
       shufflenet_v2  min =   15.77  max =   16.06  avg =   15.94
             mnasnet  min =   26.66  max =   27.00  avg =   26.85
     proxylessnasnet  min =   31.10  max =   31.28  avg =   31.23
     efficientnet_b0  min =   39.07  max =   39.23  avg =   39.15
   efficientnetv2_b0  min =   55.77  max =   55.99  avg =   55.87
        regnety_400m  min =   37.50  max =   37.70  avg =   37.62
           blazeface  min =    6.03  max =    6.15  avg =    6.10
           googlenet  min =   80.91  max =   81.49  avg =   81.27
      googlenet_int8  min =   81.17  max =   83.89  avg =   81.99
            resnet18  min =   58.82  max =   59.08  avg =   58.93
       resnet18_int8  min =   64.44  max =   65.76  avg =   65.14
             alexnet  min =   53.95  max =   54.06  avg =   54.02
               vgg16  min =  326.08  max =  327.17  avg =  326.52
          vgg16_int8  min =  369.17  max =  371.39  avg =  370.47
            resnet50  min =  131.21  max =  131.96  avg =  131.55
       resnet50_int8  min =  144.80  max =  146.79  avg =  145.67
      squeezenet_ssd  min =   79.68  max =   81.28  avg =   80.27
 squeezenet_ssd_int8  min =   88.48  max =   90.42  avg =   89.48
       mobilenet_ssd  min =   70.79  max =   71.03  avg =   70.88
  mobilenet_ssd_int8  min =   63.61  max =   63.77  avg =   63.70
      mobilenet_yolo  min =  146.35  max =  152.99  avg =  148.55
  mobilenetv2_yolov3  min =  104.11  max =  104.23  avg =  104.16
         yolov4-tiny  min =  132.30  max =  132.87  avg =  132.52
           nanodet_m  min =   49.84  max =   51.39  avg =   50.29
    yolo-fastest-1.1  min =   24.49  max =   24.69  avg =   24.56
      yolo-fastestv2  min =   21.28  max =   21.67  avg =   21.50
  vision_transformer  min = 2033.62  max = 2037.07  avg = 2035.40
          FastestDet  min =   21.06  max =   21.47  avg =   21.29

@nihui
Member Author

nihui commented Oct 21, 2023

nihui@rock-5b:~/ncnn-nihui/build/benchmark$ cat /proc/cpuinfo 
processor       : 0
BogoMIPS        : 48.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x2
CPU part        : 0xd05
CPU revision    : 0

processor       : 1
BogoMIPS        : 48.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x2
CPU part        : 0xd05
CPU revision    : 0

processor       : 2
BogoMIPS        : 48.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x2
CPU part        : 0xd05
CPU revision    : 0

processor       : 3
BogoMIPS        : 48.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x2
CPU part        : 0xd05
CPU revision    : 0

processor       : 4
BogoMIPS        : 48.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x4
CPU part        : 0xd0b
CPU revision    : 0

processor       : 5
BogoMIPS        : 48.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x4
CPU part        : 0xd0b
CPU revision    : 0

processor       : 6
BogoMIPS        : 48.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x4
CPU part        : 0xd0b
CPU revision    : 0

processor       : 7
BogoMIPS        : 48.00
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x4
CPU part        : 0xd0b
CPU revision    : 0

Serial          : d26a2a2027037f58
nihui@rock-5b:~/ncnn-nihui/build/benchmark$ taskset -c 6 ./benchncnn 10 1 0 -1 0
loop_count = 10
num_threads = 1
powersave = 0
gpu_device = -1
cooling_down = 0
          squeezenet  min =   10.60  max =   10.65  avg =   10.63
     squeezenet_int8  min =   11.29  max =   11.32  avg =   11.30
           mobilenet  min =   18.25  max =   18.29  avg =   18.27
      mobilenet_int8  min =   13.89  max =   13.96  avg =   13.93
        mobilenet_v2  min =   12.44  max =   12.50  avg =   12.47
        mobilenet_v3  min =   10.23  max =   10.27  avg =   10.25
          shufflenet  min =    6.96  max =    6.98  avg =    6.98
       shufflenet_v2  min =    7.05  max =    7.07  avg =    7.06
             mnasnet  min =   11.55  max =   11.59  avg =   11.57
     proxylessnasnet  min =   14.53  max =   14.61  avg =   14.56
     efficientnet_b0  min =   22.29  max =   22.33  avg =   22.32
   efficientnetv2_b0  min =   36.33  max =   36.42  avg =   36.37
        regnety_400m  min =   15.38  max =   15.44  avg =   15.41
           blazeface  min =    2.94  max =    2.96  avg =    2.95
           googlenet  min =   50.45  max =   50.71  avg =   50.58
      googlenet_int8  min =   46.06  max =   46.15  avg =   46.12
            resnet18  min =   28.01  max =   28.08  avg =   28.05
       resnet18_int8  min =   49.66  max =   49.82  avg =   49.72
             alexnet  min =   36.27  max =   36.32  avg =   36.29
               vgg16  min =  168.13  max =  170.27  avg =  169.50
          vgg16_int8  min =  417.74  max =  418.87  avg =  418.34
            resnet50  min =   83.36  max =   83.95  avg =   83.67
       resnet50_int8  min =   98.34  max =   98.55  avg =   98.42
      squeezenet_ssd  min =   34.02  max =   35.15  avg =   34.21
 squeezenet_ssd_int8  min =   44.48  max =   44.58  avg =   44.54
       mobilenet_ssd  min =   41.57  max =   41.69  avg =   41.64
  mobilenet_ssd_int8  min =   31.43  max =   31.58  avg =   31.52
      mobilenet_yolo  min =   88.36  max =   88.69  avg =   88.50
  mobilenetv2_yolov3  min =   46.48  max =   46.51  avg =   46.49
         yolov4-tiny  min =   58.36  max =   58.53  avg =   58.43
           nanodet_m  min =   19.14  max =   19.22  avg =   19.16
    yolo-fastest-1.1  min =    6.96  max =    7.01  avg =    6.98
      yolo-fastestv2  min =    6.53  max =    6.55  avg =    6.54
  vision_transformer  min = 1488.01  max = 1489.93  avg = 1488.83
          FastestDet  min =    6.30  max =    6.32  avg =    6.31
nihui@rock-5b:~/ncnn-nihui/build/benchmark$ taskset -c 6 ./benchncnn1 10 1 0 -1 0
loop_count = 10
num_threads = 1
powersave = 0
gpu_device = -1
cooling_down = 0
          squeezenet  min =   10.60  max =   13.07  avg =   12.23
     squeezenet_int8  min =   10.51  max =   10.58  avg =   10.54
           mobilenet  min =   18.23  max =   18.28  avg =   18.25
      mobilenet_int8  min =   13.71  max =   13.80  avg =   13.76
        mobilenet_v2  min =   12.34  max =   12.45  avg =   12.40
        mobilenet_v3  min =   10.17  max =   10.19  avg =   10.18
          shufflenet  min =    7.03  max =    7.05  avg =    7.03
       shufflenet_v2  min =    7.10  max =    7.13  avg =    7.12
             mnasnet  min =   11.55  max =   11.59  avg =   11.57
     proxylessnasnet  min =   14.53  max =   14.58  avg =   14.55
     efficientnet_b0  min =   22.29  max =   22.33  avg =   22.31
   efficientnetv2_b0  min =   25.97  max =   26.05  avg =   26.01
        regnety_400m  min =   15.43  max =   15.48  avg =   15.46
           blazeface  min =    2.93  max =    2.94  avg =    2.94
           googlenet  min =   44.46  max =   44.61  avg =   44.53
      googlenet_int8  min =   44.10  max =   44.13  avg =   44.11
            resnet18  min =   28.00  max =   28.08  avg =   28.04
       resnet18_int8  min =   33.47  max =   33.67  avg =   33.56
             alexnet  min =   32.65  max =   32.74  avg =   32.69
               vgg16  min =  168.25  max =  169.44  avg =  168.81
          vgg16_int8  min =  266.49  max =  270.18  avg =  268.18
            resnet50  min =   83.52  max =   83.95  avg =   83.79
       resnet50_int8  min =   72.38  max =   72.58  avg =   72.51
      squeezenet_ssd  min =   29.14  max =   29.30  avg =   29.20
 squeezenet_ssd_int8  min =   31.59  max =   31.80  avg =   31.72
       mobilenet_ssd  min =   38.76  max =   38.89  avg =   38.82
  mobilenet_ssd_int8  min =   29.06  max =   29.22  avg =   29.13
      mobilenet_yolo  min =   88.23  max =   88.59  avg =   88.42
  mobilenetv2_yolov3  min =   46.44  max =   46.53  avg =   46.48
         yolov4-tiny  min =   58.29  max =   58.49  avg =   58.39
           nanodet_m  min =   17.69  max =   17.78  avg =   17.73
    yolo-fastest-1.1  min =    6.94  max =    6.98  avg =    6.95
      yolo-fastestv2  min =    6.08  max =    6.12  avg =    6.11
  vision_transformer  min = 1487.93  max = 1489.86  avg = 1489.06
          FastestDet  min =    6.34  max =    6.38  avg =    6.36
nihui@rock-5b:~/ncnn-nihui/build/benchmark$ taskset -c 6,7,4,5 ./benchncnn 10 4 0 -1 0
loop_count = 10
num_threads = 4
powersave = 0
gpu_device = -1
cooling_down = 0
          squeezenet  min =    3.65  max =    3.70  avg =    3.67
     squeezenet_int8  min =    4.35  max =    4.45  avg =    4.39
           mobilenet  min =    5.45  max =    5.54  avg =    5.48
      mobilenet_int8  min =    4.45  max =    4.51  avg =    4.48
        mobilenet_v2  min =    4.73  max =    4.81  avg =    4.78
        mobilenet_v3  min =    4.21  max =    4.25  avg =    4.23
          shufflenet  min =    3.41  max =    3.43  avg =    3.42
       shufflenet_v2  min =    2.84  max =    2.89  avg =    2.87
             mnasnet  min =    4.22  max =    4.27  avg =    4.24
     proxylessnasnet  min =    4.91  max =    4.95  avg =    4.93
     efficientnet_b0  min =    7.50  max =    7.54  avg =    7.51
   efficientnetv2_b0  min =   11.35  max =   11.43  avg =   11.38
        regnety_400m  min =    7.96  max =    8.03  avg =    8.00
           blazeface  min =    1.36  max =    1.39  avg =    1.37
           googlenet  min =   16.09  max =   16.17  avg =   16.13
      googlenet_int8  min =   15.37  max =   15.46  avg =   15.41
            resnet18  min =   10.08  max =   10.15  avg =   10.11
       resnet18_int8  min =   14.62  max =   14.66  avg =   14.64
             alexnet  min =   13.75  max =   13.79  avg =   13.78
               vgg16  min =   63.06  max =   63.26  avg =   63.19
          vgg16_int8  min =  115.52  max =  115.82  avg =  115.65
            resnet50  min =   27.07  max =   27.14  avg =   27.10
       resnet50_int8  min =   30.78  max =   30.88  avg =   30.83
      squeezenet_ssd  min =   14.15  max =   14.46  avg =   14.22
 squeezenet_ssd_int8  min =   16.86  max =   16.93  avg =   16.89
       mobilenet_ssd  min =   13.47  max =   13.63  avg =   13.53
  mobilenet_ssd_int8  min =   10.58  max =   10.65  avg =   10.61
      mobilenet_yolo  min =   31.34  max =   31.66  avg =   31.50
  mobilenetv2_yolov3  min =   19.10  max =   19.13  avg =   19.11
         yolov4-tiny  min =   24.05  max =   24.18  avg =   24.12
           nanodet_m  min =    7.55  max =    7.63  avg =    7.60
    yolo-fastest-1.1  min =    3.84  max =    3.88  avg =    3.85
      yolo-fastestv2  min =    3.33  max =    3.36  avg =    3.34
  vision_transformer  min =  441.94  max =  448.76  avg =  445.49
          FastestDet  min =    3.19  max =    3.22  avg =    3.20
nihui@rock-5b:~/ncnn-nihui/build/benchmark$ taskset -c 6,7,4,5 ./benchncnn1 10 4 0 -1 0
loop_count = 10
num_threads = 4
powersave = 0
gpu_device = -1
cooling_down = 0
          squeezenet  min =    3.67  max =    3.74  avg =    3.70
     squeezenet_int8  min =    3.83  max =    3.91  avg =    3.88
           mobilenet  min =    5.42  max =    5.46  avg =    5.44
      mobilenet_int8  min =    4.41  max =    4.44  avg =    4.43
        mobilenet_v2  min =    4.72  max =    4.83  avg =    4.77
        mobilenet_v3  min =    4.18  max =    4.27  avg =    4.21
          shufflenet  min =    3.40  max =    3.45  avg =    3.42
       shufflenet_v2  min =    2.84  max =    2.88  avg =    2.86
             mnasnet  min =    4.21  max =    4.24  avg =    4.22
     proxylessnasnet  min =    4.91  max =    4.95  avg =    4.93
     efficientnet_b0  min =    7.45  max =    7.48  avg =    7.47
   efficientnetv2_b0  min =    8.65  max =    8.78  avg =    8.69
        regnety_400m  min =    7.87  max =    7.92  avg =    7.89
           blazeface  min =    1.35  max =    1.39  avg =    1.37
           googlenet  min =   14.60  max =   15.10  avg =   14.69
      googlenet_int8  min =   14.56  max =   14.67  avg =   14.60
            resnet18  min =   10.09  max =   10.15  avg =   10.12
       resnet18_int8  min =   10.18  max =   10.27  avg =   10.23
             alexnet  min =   12.84  max =   12.90  avg =   12.88
               vgg16  min =   63.28  max =   63.79  avg =   63.51
          vgg16_int8  min =   75.67  max =   76.12  avg =   75.84
            resnet50  min =   27.14  max =   27.24  avg =   27.19
       resnet50_int8  min =   22.88  max =   22.95  avg =   22.91
      squeezenet_ssd  min =   12.88  max =   13.01  avg =   12.93
 squeezenet_ssd_int8  min =   12.45  max =   12.84  avg =   12.62
       mobilenet_ssd  min =   12.78  max =   12.87  avg =   12.83
  mobilenet_ssd_int8  min =    9.88  max =    9.97  avg =    9.92
      mobilenet_yolo  min =   31.53  max =   31.89  avg =   31.63
  mobilenetv2_yolov3  min =   18.92  max =   18.99  avg =   18.94
         yolov4-tiny  min =   24.05  max =   24.21  avg =   24.12
           nanodet_m  min =    7.12  max =    7.18  avg =    7.15
    yolo-fastest-1.1  min =    3.83  max =    3.87  avg =    3.84
      yolo-fastestv2  min =    3.20  max =    3.24  avg =    3.23
  vision_transformer  min =  438.36  max =  444.69  avg =  441.64
          FastestDet  min =    3.16  max =    3.20  avg =    3.18
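To compare two of the runs above model by model (for example `benchncnn` against `benchncnn1` on the same cores), the result lines can be parsed and their `avg` columns divided. The following is a minimal Python sketch and is not part of ncnn. The helper names `parse_benchncnn` and `speedups` are hypothetical, and the sketch assumes every result line has the `name  min = X  max = Y  avg = Z` shape shown in these logs. Which run counts as the baseline is left to the reader: the ratio is the first run's `avg` divided by the second run's.

```python
import re

# Matches one benchncnn result line, e.g.
# "          squeezenet  min =   10.60  max =   10.65  avg =   10.63"
# Header lines such as "loop_count = 10" do not match and are skipped.
LINE_RE = re.compile(
    r"^\s*(?P<name>\S+)\s+min\s*=\s*(?P<min>[\d.]+)\s+"
    r"max\s*=\s*(?P<max>[\d.]+)\s+avg\s*=\s*(?P<avg>[\d.]+)\s*$"
)


def parse_benchncnn(text):
    """Return {model name: avg value as printed} for every result line in text."""
    results = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            results[m.group("name")] = float(m.group("avg"))
    return results


def speedups(run_a, run_b):
    """avg(run_a) / avg(run_b) for each model present in both runs.

    A value above 1.0 means the model ran faster in run_b.
    """
    a = parse_benchncnn(run_a)
    b = parse_benchncnn(run_b)
    return {name: a[name] / b[name] for name in a if name in b}


if __name__ == "__main__":
    # Two lines copied from the rock-5b single-thread runs above.
    run_a = """loop_count = 10
    efficientnetv2_b0  min =   36.33  max =   36.42  avg =   36.37
      squeezenet_int8  min =   11.29  max =   11.32  avg =   11.30"""
    run_b = """loop_count = 10
    efficientnetv2_b0  min =   25.97  max =   26.05  avg =   26.01
      squeezenet_int8  min =   10.51  max =   10.58  avg =   10.54"""
    for name, ratio in sorted(speedups(run_a, run_b).items()):
        print(f"{name:>20}  {ratio:.2f}x")
```

Pasting two whole runs from this page into `run_a` and `run_b` gives a per-model table; models missing from either run are simply dropped.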

@nihui nihui changed the title [WIP] arm optimization for convolution int8 winograd unified elempack arm optimization for convolution int8 winograd unified elempack Oct 21, 2023
@nihui
Member Author

nihui commented Oct 21, 2023

nihui@jetson-nano:~/ncnn-nihui/build/benchmark$ cat /proc/cpuinfo 
processor       : 0
model name      : ARMv8 Processor rev 1 (v8l)
BogoMIPS        : 38.40
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x1
CPU part        : 0xd07
CPU revision    : 1

processor       : 1
model name      : ARMv8 Processor rev 1 (v8l)
BogoMIPS        : 38.40
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x1
CPU part        : 0xd07
CPU revision    : 1

processor       : 2
model name      : ARMv8 Processor rev 1 (v8l)
BogoMIPS        : 38.40
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x1
CPU part        : 0xd07
CPU revision    : 1

processor       : 3
model name      : ARMv8 Processor rev 1 (v8l)
BogoMIPS        : 38.40
Features        : fp asimd evtstrm aes pmull sha1 sha2 crc32
CPU implementer : 0x41
CPU architecture: 8
CPU variant     : 0x1
CPU part        : 0xd07
CPU revision    : 1

nihui@jetson-nano:~/ncnn-nihui/build/benchmark$ ./benchncnn 4 1 0 -1 0
loop_count = 4
num_threads = 1
powersave = 0
gpu_device = -1
cooling_down = 0
          squeezenet  min =   77.37  max =   77.44  avg =   77.40
     squeezenet_int8  min =   66.10  max =   66.14  avg =   66.11
           mobilenet  min =  132.13  max =  132.18  avg =  132.16
      mobilenet_int8  min =   94.80  max =   94.86  avg =   94.83
        mobilenet_v2  min =   88.73  max =   88.86  avg =   88.78
        mobilenet_v3  min =   73.28  max =   73.65  avg =   73.48
          shufflenet  min =   45.16  max =   45.32  avg =   45.25
       shufflenet_v2  min =   41.80  max =   41.94  avg =   41.87
             mnasnet  min =   86.44  max =   86.81  avg =   86.65
     proxylessnasnet  min =  108.43  max =  108.60  avg =  108.53
     efficientnet_b0  min =  161.46  max =  161.70  avg =  161.58
   efficientnetv2_b0  min =  180.58  max =  181.27  avg =  180.91
        regnety_400m  min =  112.54  max =  112.73  avg =  112.66
           blazeface  min =   12.58  max =   12.62  avg =   12.61
           googlenet  min =  276.04  max =  276.17  avg =  276.09
      googlenet_int8  min =  227.33  max =  227.72  avg =  227.52
            resnet18  min =  202.54  max =  202.70  avg =  202.64
       resnet18_int8  min =  165.16  max =  165.54  avg =  165.32
             alexnet  min =  170.57  max =  170.69  avg =  170.62
               vgg16  min = 1123.28  max = 1126.70  avg = 1124.77
          vgg16_int8  min =  965.31  max =  972.61  avg =  968.39
            resnet50  min =  592.32  max =  592.42  avg =  592.38
       resnet50_int8  min =  455.41  max =  455.66  avg =  455.48
      squeezenet_ssd  min =  177.79  max =  178.11  avg =  177.92
 squeezenet_ssd_int8  min =  161.39  max =  161.95  avg =  161.74
       mobilenet_ssd  min =  266.80  max =  266.99  avg =  266.90
  mobilenet_ssd_int8  min =  188.93  max =  189.50  avg =  189.21
      mobilenet_yolo  min =  603.29  max =  603.32  avg =  603.30
  mobilenetv2_yolov3  min =  313.47  max =  313.87  avg =  313.71
         yolov4-tiny  min =  390.29  max =  390.75  avg =  390.51
           nanodet_m  min =  106.59  max =  106.77  avg =  106.70
    yolo-fastest-1.1  min =   47.75  max =   48.37  avg =   48.03
      yolo-fastestv2  min =   41.13  max =   41.39  avg =   41.29
  vision_transformer  min = 8513.80  max = 8547.08  avg = 8533.60
          FastestDet  min =   45.40  max =   45.45  avg =   45.43
nihui@jetson-nano:~/ncnn-nihui/build/benchmark$ ./benchncnn 4 1 0 -1 0
loop_count = 4
num_threads = 1
powersave = 0
gpu_device = -1
cooling_down = 0
          squeezenet  min =   77.48  max =   77.67  avg =   77.59
     squeezenet_int8  min =   63.31  max =   63.47  avg =   63.39
           mobilenet  min =  132.30  max =  132.47  avg =  132.39
      mobilenet_int8  min =   94.99  max =   95.06  avg =   95.02
        mobilenet_v2  min =   89.00  max =   89.06  avg =   89.04
        mobilenet_v3  min =   73.08  max =   73.45  avg =   73.26
          shufflenet  min =   45.13  max =   45.19  avg =   45.16
       shufflenet_v2  min =   41.88  max =   42.07  avg =   41.98
             mnasnet  min =   87.02  max =   87.12  avg =   87.06
     proxylessnasnet  min =  107.77  max =  107.91  avg =  107.86
     efficientnet_b0  min =  161.27  max =  161.47  avg =  161.32
   efficientnetv2_b0  min =  180.51  max =  181.20  avg =  180.82
        regnety_400m  min =  112.25  max =  112.30  avg =  112.28
           blazeface  min =   12.60  max =   12.64  avg =   12.62
           googlenet  min =  276.19  max =  276.53  avg =  276.36
      googlenet_int8  min =  221.04  max =  221.20  avg =  221.12
            resnet18  min =  202.47  max =  202.53  avg =  202.49
       resnet18_int8  min =  154.02  max =  154.22  avg =  154.13
             alexnet  min =  171.36  max =  171.41  avg =  171.39
               vgg16  min = 1124.74  max = 1126.51  avg = 1125.52
          vgg16_int8  min =  956.88  max =  957.58  avg =  957.28
            resnet50  min =  591.80  max =  592.08  avg =  591.91
       resnet50_int8  min =  441.40  max =  441.83  avg =  441.70
      squeezenet_ssd  min =  176.70  max =  176.91  avg =  176.76
 squeezenet_ssd_int8  min =  147.00  max =  148.69  avg =  147.47
       mobilenet_ssd  min =  266.65  max =  267.12  avg =  266.92
  mobilenet_ssd_int8  min =  189.57  max =  189.72  avg =  189.65
      mobilenet_yolo  min =  602.68  max =  602.99  avg =  602.87
  mobilenetv2_yolov3  min =  312.57  max =  312.72  avg =  312.67
         yolov4-tiny  min =  389.42  max =  390.11  avg =  389.78
           nanodet_m  min =  107.55  max =  108.04  avg =  107.75
    yolo-fastest-1.1  min =   47.47  max =   47.76  avg =   47.62
      yolo-fastestv2  min =   40.91  max =   41.05  avg =   41.00
  vision_transformer  min = 8509.11  max = 8545.08  avg = 8531.10
          FastestDet  min =   45.52  max =   45.64  avg =   45.58
nihui@jetson-nano:~/ncnn-nihui/build/benchmark$ ./benchncnn 4 4 0 -1 0
loop_count = 4
num_threads = 4
powersave = 0
gpu_device = -1
cooling_down = 0
          squeezenet  min =   28.64  max =   29.01  avg =   28.79
     squeezenet_int8  min =   25.67  max =   25.84  avg =   25.75
           mobilenet  min =   40.41  max =   40.49  avg =   40.43
      mobilenet_int8  min =   27.25  max =   27.30  avg =   27.28
        mobilenet_v2  min =   33.08  max =   33.20  avg =   33.15
        mobilenet_v3  min =   28.17  max =   28.55  avg =   28.40
          shufflenet  min =   19.73  max =   19.88  avg =   19.81
       shufflenet_v2  min =   16.35  max =   17.24  avg =   16.64
             mnasnet  min =   30.87  max =   31.14  avg =   30.98
     proxylessnasnet  min =   35.63  max =   35.87  avg =   35.70
     efficientnet_b0  min =   53.29  max =   53.60  avg =   53.41
   efficientnetv2_b0  min =   59.26  max =   59.72  avg =   59.47
        regnety_400m  min =   53.44  max =   53.72  avg =   53.53
           blazeface  min =    5.77  max =    5.80  avg =    5.78
           googlenet  min =   93.07  max =   93.81  avg =   93.29
      googlenet_int8  min =   74.50  max =   74.64  avg =   74.58
            resnet18  min =   76.23  max =   76.63  avg =   76.43
       resnet18_int8  min =   53.39  max =   53.65  avg =   53.53
             alexnet  min =   57.93  max =   58.32  avg =   58.10
               vgg16  min =  367.77  max =  371.39  avg =  369.61
          vgg16_int8  min =  279.77  max =  281.11  avg =  280.32
            resnet50  min =  195.56  max =  196.48  avg =  196.03
       resnet50_int8  min =  143.39  max =  144.07  avg =  143.64
      squeezenet_ssd  min =   74.24  max =   74.66  avg =   74.41
 squeezenet_ssd_int8  min =   61.37  max =   61.89  avg =   61.68
       mobilenet_ssd  min =   83.52  max =   83.75  avg =   83.65
  mobilenet_ssd_int8  min =   57.25  max =   57.77  avg =   57.49
      mobilenet_yolo  min =  189.17  max =  189.74  avg =  189.36
  mobilenetv2_yolov3  min =  117.01  max =  117.27  avg =  117.10
         yolov4-tiny  min =  143.93  max =  144.31  avg =  144.17
           nanodet_m  min =   44.71  max =   45.51  avg =   44.96
    yolo-fastest-1.1  min =   24.30  max =   24.54  avg =   24.40
      yolo-fastestv2  min =   19.47  max =   19.83  avg =   19.62
  vision_transformer  min = 2450.92  max = 2469.70  avg = 2459.10
          FastestDet  min =   19.72  max =   19.97  avg =   19.82
nihui@jetson-nano:~/ncnn-nihui/build/benchmark$ ./benchncnn 4 4 0 -1 0
loop_count = 4
num_threads = 4
powersave = 0
gpu_device = -1
cooling_down = 0
          squeezenet  min =   28.57  max =   29.00  avg =   28.78
     squeezenet_int8  min =   24.30  max =   24.60  avg =   24.42
           mobilenet  min =   40.31  max =   40.71  avg =   40.49
      mobilenet_int8  min =   27.45  max =   27.77  avg =   27.58
        mobilenet_v2  min =   33.24  max =   33.63  avg =   33.43
        mobilenet_v3  min =   28.07  max =   28.29  avg =   28.21
          shufflenet  min =   19.72  max =   20.43  avg =   19.98
       shufflenet_v2  min =   16.26  max =   16.46  avg =   16.35
             mnasnet  min =   31.05  max =   32.11  avg =   31.42
     proxylessnasnet  min =   35.28  max =   35.42  avg =   35.35
     efficientnet_b0  min =   52.48  max =   52.69  avg =   52.60
   efficientnetv2_b0  min =   59.39  max =   59.62  avg =   59.47
        regnety_400m  min =   53.06  max =   53.22  avg =   53.13
           blazeface  min =    5.60  max =    5.83  avg =    5.69
           googlenet  min =   93.23  max =   93.61  avg =   93.40
      googlenet_int8  min =   71.82  max =   71.90  avg =   71.85
            resnet18  min =   76.03  max =   76.70  avg =   76.31
       resnet18_int8  min =   50.57  max =   50.68  avg =   50.63
             alexnet  min =   57.68  max =   57.95  avg =   57.79
               vgg16  min =  370.22  max =  371.53  avg =  371.02
          vgg16_int8  min =  273.72  max =  274.66  avg =  274.09
            resnet50  min =  195.35  max =  195.94  avg =  195.57
       resnet50_int8  min =  138.93  max =  145.04  avg =  140.69
      squeezenet_ssd  min =   73.53  max =   74.06  avg =   73.78
 squeezenet_ssd_int8  min =   59.61  max =   60.78  avg =   60.06
       mobilenet_ssd  min =   83.81  max =   84.02  avg =   83.92
  mobilenet_ssd_int8  min =   57.12  max =   57.26  avg =   57.17
      mobilenet_yolo  min =  190.13  max =  190.55  avg =  190.38
  mobilenetv2_yolov3  min =  117.24  max =  120.55  avg =  118.42
         yolov4-tiny  min =  143.97  max =  144.43  avg =  144.20
           nanodet_m  min =   44.74  max =   48.01  avg =   45.58
    yolo-fastest-1.1  min =   24.32  max =   24.50  avg =   24.43
      yolo-fastestv2  min =   19.65  max =   19.80  avg =   19.72
  vision_transformer  min = 2467.41  max = 2527.86  avg = 2503.97
          FastestDet  min =   19.78  max =   19.96  avg =   19.83
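Each benchncnn run above prints one `min/max/avg` line per model, in milliseconds. The thread does not say which run of each pair was taken before the change and which after. The sketch below therefore only reports the ratio of one run's averages to another's, for example the two single-thread runs above. `parse_bench` and `compare` are hypothetical helpers, not part of ncnn. They assume the exact line format shown in these logs.

```python
import re

# Hypothetical helper, not part of ncnn. Matches one benchncnn result line, e.g.
# "       resnet18_int8  min =  165.16  max =  165.54  avg =  165.32"
LINE_RE = re.compile(
    r"^\s*(?P<name>\S+)\s+min\s*=\s*(?P<min>[\d.]+)\s+"
    r"max\s*=\s*(?P<max>[\d.]+)\s+avg\s*=\s*(?P<avg>[\d.]+)\s*$"
)


def parse_bench(text):
    """Return {model_name: avg_ms} for every result line in a benchncnn log.

    Header lines such as "loop_count = 4" and shell prompts do not match and are skipped.
    """
    results = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            results[m.group("name")] = float(m.group("avg"))
    return results


def compare(run_a, run_b):
    """Return {model_name: avg_b / avg_a} for models present in both logs.

    A ratio below 1.0 means the model ran faster in run_b than in run_a.
    """
    a = parse_bench(run_a)
    b = parse_bench(run_b)
    return {name: b[name] / a[name] for name in a if name in b}


# Usage: two lines copied from the single-thread runs above.
run1 = """
        resnet18_int8  min =  165.16  max =  165.54  avg =  165.32
        resnet50_int8  min =  455.41  max =  455.66  avg =  455.48
"""
run2 = """
        resnet18_int8  min =  154.02  max =  154.22  avg =  154.13
        resnet50_int8  min =  441.40  max =  441.83  avg =  441.70
"""
for name, ratio in compare(run1, run2).items():
    print(f"{name:>16}  {ratio:.3f}")
```

The sketch compares `avg` rather than `min` because that is the summary figure benchncnn prints last. Switching to `m.group("min")` would give a figure that is less sensitive to occasional slow iterations.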

nihui merged commit 80b3b9c into Tencent:master on Oct 22, 2023
94 checks passed