
shufflenetV2: an extremely light-weight architecture | Implementation #3750

Open
gmayday1997 opened this issue Aug 11, 2019 · 69 comments
Labels: want enhancement (Want to improve accuracy, speed or functionality)

@gmayday1997

shufflenetV2: Practical Guidelines for Efficient CNN Architecture Design
paper: https://arxiv.org/abs/1807.11164
source code(caffe): https://github.com/miaow1988/ShuffleNet_V2_pytorch_caffe
I have implemented the channel_shuffle and channel_slice layers. Anyone who is interested in this work can try them.

How to use

basicunit darknetcfg
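
For reference, here is a minimal sketch of one ShuffleNetV2 basic unit written as a darknet.CG cfg fragment. It is pieced together from the channel_slice cfg example and the network printout posted later in this thread; the filter counts and activations are illustrative, and the exact fields accepted by [channel_shuffle] are not shown anywhere in the thread, so that block is only a placeholder.

# Sketch of one basic unit on a 32-channel input (illustrative values).
# Identity branch: channels 0..16 of the previous layer.
[channel_slice]
from=-1
axis=1
start=0
end=16

# Processed branch: channels 16..32 of the same layer.
[channel_slice]
from=-2
axis=1
start=16
end=32

# 1x1 conv on the processed branch.
[convolutional]
batch_normalize=1
filters=16
size=1
stride=1
pad=0
activation=relu

# 3x3 depthwise conv, no activation.
[convolutional]
batch_normalize=1
filters=16
groups=16
size=3
stride=1
pad=1
activation=linear

# 1x1 conv.
[convolutional]
batch_normalize=1
filters=16
size=1
stride=1
pad=0
activation=relu

# Concatenate the identity branch (-5) with the processed branch (-1).
[route]
layers = -5,-1

# Mix channels across the two halves; the fields this layer takes are not shown in this thread.
[channel_shuffle]
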
@AlexeyAB added the "want enhancement" label on Aug 11, 2019
@WongKinYiu
Collaborator

WongKinYiu commented Aug 15, 2019

@gmayday1997 hello,

I have tried several models with channel_shuffle layers; all of them got NaN after training for 30~80k epochs.
Could you provide the learning rate schedule for training on the ImageNet dataset?
(Models with only channel_split layers are sometimes OK, but their training speed is quite a bit slower than models without channel_split layers.)

thanks a lot.

@gmayday1997
Author

Hi @WongKinYiu, I have trained a model with channel_shuffle and channel_slice for 100k epochs; it got 56% top-5 precision. Training is still in progress.

   layer   filters  size/strd(dil)      input                output
   0 conv     16       3 x 3/ 1    224 x 224 x   3 ->  224 x 224 x  16 0.043 BF
   1 max               2 x 2/ 2    224 x 224 x  16 ->  112 x 112 x  16 0.001 BF
   2 conv     16       1 x 1/ 1    112 x 112 x  16 ->  112 x 112 x  16 0.006 BF
   3 max               2 x 2/ 2    112 x 112 x  16 ->   56 x  56 x  16 0.000 BF
   4 conv     32       1 x 1/ 1     56 x  56 x  16 ->   56 x  56 x  32 0.003 BF
   5 conv     32/  32  3 x 3/ 1     56 x  56 x  32 ->   56 x  56 x  32 0.002 BF
   6 conv     32       1 x 1/ 1     56 x  56 x  32 ->   56 x  56 x  32 0.006 BF
   7 channel_slice             56 x  56 x  32   ->    56 x  56 x  16 
   8 channel_slice             56 x  56 x  32   ->    56 x  56 x  16 
   9 conv     16       1 x 1/ 1     56 x  56 x  16 ->   56 x  56 x  16 0.002 BF
  10 conv     16/  16  3 x 3/ 1     56 x  56 x  16 ->   56 x  56 x  16 0.001 BF
  11 conv     16       1 x 1/ 1     56 x  56 x  16 ->   56 x  56 x  16 0.002 BF
  12 route  7 11
  13 channel_shuffle                56 x  56 x  32   ->    56 x  56 x  32 
  14 channel_slice             56 x  56 x  32   ->    56 x  56 x  16 
  15 channel_slice             56 x  56 x  32   ->    56 x  56 x  16 
  16 conv     16       1 x 1/ 1     56 x  56 x  16 ->   56 x  56 x  16 0.002 BF
  17 conv     16/  16  3 x 3/ 1     56 x  56 x  16 ->   56 x  56 x  16 0.001 BF
  18 conv     16       1 x 1/ 1     56 x  56 x  16 ->   56 x  56 x  16 0.002 BF
  19 route  14 18
  20 channel_shuffle                56 x  56 x  32   ->    56 x  56 x  32 
  21 channel_slice             56 x  56 x  32   ->    56 x  56 x  16 
  22 channel_slice             56 x  56 x  32   ->    56 x  56 x  16 
  23 conv     16       1 x 1/ 1     56 x  56 x  16 ->   56 x  56 x  16 0.002 BF
  24 conv     16/  16  3 x 3/ 1     56 x  56 x  16 ->   56 x  56 x  16 0.001 BF
  25 conv     16       1 x 1/ 1     56 x  56 x  16 ->   56 x  56 x  16 0.002 BF
  26 route  21 25
  27 channel_shuffle                56 x  56 x  32   ->    56 x  56 x  32 
  28 channel_slice             56 x  56 x  32   ->    56 x  56 x  16 
  29 channel_slice             56 x  56 x  32   ->    56 x  56 x  16 
  30 conv     16       1 x 1/ 1     56 x  56 x  16 ->   56 x  56 x  16 0.002 BF
  31 conv     16/  16  3 x 3/ 1     56 x  56 x  16 ->   56 x  56 x  16 0.001 BF
  32 conv     16       1 x 1/ 1     56 x  56 x  16 ->   56 x  56 x  16 0.002 BF
  33 route  28 32
  34 conv     32       1 x 1/ 1     56 x  56 x  32 ->   56 x  56 x  32 0.006 BF
  35 conv     32/  32  3 x 3/ 1     56 x  56 x  32 ->   56 x  56 x  32 0.002 BF
  36 conv     32       1 x 1/ 1     56 x  56 x  32 ->   56 x  56 x  32 0.006 BF
  37 route  36 6
  38 max               2 x 2/ 2     56 x  56 x  64 ->   28 x  28 x  64 0.000 BF
  39 conv     64       1 x 1/ 1     28 x  28 x  64 ->   28 x  28 x  64 0.006 BF
  40 conv     64/  64  3 x 3/ 1     28 x  28 x  64 ->   28 x  28 x  64 0.001 BF
  41 conv     64       1 x 1/ 1     28 x  28 x  64 ->   28 x  28 x  64 0.006 BF
  42 channel_slice             28 x  28 x  64   ->    28 x  28 x  32 
  43 channel_slice             28 x  28 x  64   ->    28 x  28 x  32 
  44 conv     32       1 x 1/ 1     28 x  28 x  32 ->   28 x  28 x  32 0.002 BF
  45 conv     32/  32  3 x 3/ 1     28 x  28 x  32 ->   28 x  28 x  32 0.000 BF
  46 conv     32       1 x 1/ 1     28 x  28 x  32 ->   28 x  28 x  32 0.002 BF
  47 route  42 46
  48 channel_shuffle                28 x  28 x  64   ->    28 x  28 x  64 
  49 channel_slice             28 x  28 x  64   ->    28 x  28 x  32 
  50 channel_slice             28 x  28 x  64   ->    28 x  28 x  32 
  51 conv     32       1 x 1/ 1     28 x  28 x  32 ->   28 x  28 x  32 0.002 BF
  52 conv     32/  32  3 x 3/ 1     28 x  28 x  32 ->   28 x  28 x  32 0.000 BF
  53 conv     32       1 x 1/ 1     28 x  28 x  32 ->   28 x  28 x  32 0.002 BF
  54 route  49 53
  55 channel_shuffle                28 x  28 x  64   ->    28 x  28 x  64 
  56 channel_slice             28 x  28 x  64   ->    28 x  28 x  32 
  57 channel_slice             28 x  28 x  64   ->    28 x  28 x  32 
  58 conv     32       1 x 1/ 1     28 x  28 x  32 ->   28 x  28 x  32 0.002 BF
  59 conv     32/  32  3 x 3/ 1     28 x  28 x  32 ->   28 x  28 x  32 0.000 BF
  60 conv     32       1 x 1/ 1     28 x  28 x  32 ->   28 x  28 x  32 0.002 BF
  61 route  56 60
  62 channel_shuffle                28 x  28 x  64   ->    28 x  28 x  64 
  63 channel_slice             28 x  28 x  64   ->    28 x  28 x  32 
  64 channel_slice             28 x  28 x  64   ->    28 x  28 x  32 
  65 conv     32       1 x 1/ 1     28 x  28 x  32 ->   28 x  28 x  32 0.002 BF
  66 conv     32/  32  3 x 3/ 1     28 x  28 x  32 ->   28 x  28 x  32 0.000 BF
  67 conv     32       1 x 1/ 1     28 x  28 x  32 ->   28 x  28 x  32 0.002 BF
  68 route  63 67
  69 conv     64       1 x 1/ 1     28 x  28 x  64 ->   28 x  28 x  64 0.006 BF
  70 conv     64/  64  3 x 3/ 1     28 x  28 x  64 ->   28 x  28 x  64 0.001 BF
  71 conv     64       1 x 1/ 1     28 x  28 x  64 ->   28 x  28 x  64 0.006 BF
  72 route  71 41
  73 max               2 x 2/ 2     28 x  28 x 128 ->   14 x  14 x 128 0.000 BF
  74 conv    128       1 x 1/ 1     14 x  14 x 128 ->   14 x  14 x 128 0.006 BF
  75 conv    128       3 x 3/ 1     14 x  14 x 128 ->   14 x  14 x 128 0.058 BF
  76 conv    128       1 x 1/ 1     14 x  14 x 128 ->   14 x  14 x 128 0.006 BF
  77 channel_slice             14 x  14 x 128   ->    14 x  14 x  64 
  78 channel_slice             14 x  14 x 128   ->    14 x  14 x  64 
  79 conv     64       1 x 1/ 1     14 x  14 x  64 ->   14 x  14 x  64 0.002 BF
  80 conv     64/  64  3 x 3/ 1     14 x  14 x  64 ->   14 x  14 x  64 0.000 BF
  81 conv     64       1 x 1/ 1     14 x  14 x  64 ->   14 x  14 x  64 0.002 BF
  82 route  77 81
  83 channel_shuffle                14 x  14 x 128   ->    14 x  14 x 128 
  84 channel_slice             14 x  14 x 128   ->    14 x  14 x  64 
  85 channel_slice             14 x  14 x 128   ->    14 x  14 x  64 
  86 conv     64       1 x 1/ 1     14 x  14 x  64 ->   14 x  14 x  64 0.002 BF
  87 conv     64/  64  3 x 3/ 1     14 x  14 x  64 ->   14 x  14 x  64 0.000 BF
  88 conv     64       1 x 1/ 1     14 x  14 x  64 ->   14 x  14 x  64 0.002 BF
  89 route  84 88
  90 channel_shuffle                14 x  14 x 128   ->    14 x  14 x 128 
  91 channel_slice             14 x  14 x 128   ->    14 x  14 x  64 
  92 channel_slice             14 x  14 x 128   ->    14 x  14 x  64 
  93 conv     64       1 x 1/ 1     14 x  14 x  64 ->   14 x  14 x  64 0.002 BF
  94 conv     64/  64  3 x 3/ 1     14 x  14 x  64 ->   14 x  14 x  64 0.000 BF
  95 conv     64       1 x 1/ 1     14 x  14 x  64 ->   14 x  14 x  64 0.002 BF
  96 route  91 95
  97 channel_shuffle                14 x  14 x 128   ->    14 x  14 x 128 
  98 channel_slice             14 x  14 x 128   ->    14 x  14 x  64 
  99 channel_slice             14 x  14 x 128   ->    14 x  14 x  64 
 100 conv     64       1 x 1/ 1     14 x  14 x  64 ->   14 x  14 x  64 0.002 BF
 101 conv     64/  64  3 x 3/ 1     14 x  14 x  64 ->   14 x  14 x  64 0.000 BF
 102 conv     64       1 x 1/ 1     14 x  14 x  64 ->   14 x  14 x  64 0.002 BF
 103 route  98 102
 104 conv    128       1 x 1/ 1     14 x  14 x 128 ->   14 x  14 x 128 0.006 BF
 105 conv    128/ 128  3 x 3/ 1     14 x  14 x 128 ->   14 x  14 x 128 0.000 BF
 106 conv    128       1 x 1/ 1     14 x  14 x 128 ->   14 x  14 x 128 0.006 BF
 107 route  106 76
 108 max               2 x 2/ 2     14 x  14 x 256 ->    7 x   7 x 256 0.000 BF
 109 conv    256       1 x 1/ 1      7 x   7 x 256 ->    7 x   7 x 256 0.006 BF
 110 conv    256/ 256  3 x 3/ 1      7 x   7 x 256 ->    7 x   7 x 256 0.000 BF
 111 conv    256       1 x 1/ 1      7 x   7 x 256 ->    7 x   7 x 256 0.006 BF
 112 channel_slice              7 x   7 x 256   ->     7 x   7 x 128 
 113 channel_slice              7 x   7 x 256   ->     7 x   7 x 128 
 114 conv    128       1 x 1/ 1      7 x   7 x 128 ->    7 x   7 x 128 0.002 BF
 115 conv    128/ 128  3 x 3/ 1      7 x   7 x 128 ->    7 x   7 x 128 0.000 BF
 116 conv    128       1 x 1/ 1      7 x   7 x 128 ->    7 x   7 x 128 0.002 BF
 117 route  112 116
 118 channel_shuffle                 7 x   7 x 256   ->     7 x   7 x 256 
 119 channel_slice              7 x   7 x 256   ->     7 x   7 x 128 
 120 channel_slice              7 x   7 x 256   ->     7 x   7 x 128 
 121 conv    128       1 x 1/ 1      7 x   7 x 128 ->    7 x   7 x 128 0.002 BF
 122 conv    128/ 128  3 x 3/ 1      7 x   7 x 128 ->    7 x   7 x 128 0.000 BF
 123 conv    128       1 x 1/ 1      7 x   7 x 128 ->    7 x   7 x 128 0.002 BF
 124 route  119 123
 125 channel_shuffle                 7 x   7 x 256   ->     7 x   7 x 256 
 126 channel_slice              7 x   7 x 256   ->     7 x   7 x 128 
 127 channel_slice              7 x   7 x 256   ->     7 x   7 x 128 
 128 conv    128       1 x 1/ 1      7 x   7 x 128 ->    7 x   7 x 128 0.002 BF
 129 conv    128/ 128  3 x 3/ 1      7 x   7 x 128 ->    7 x   7 x 128 0.000 BF
 130 conv    128       1 x 1/ 1      7 x   7 x 128 ->    7 x   7 x 128 0.002 BF
 131 route  126 130
 132 channel_shuffle                 7 x   7 x 256   ->     7 x   7 x 256 
 133 channel_slice              7 x   7 x 256   ->     7 x   7 x 128 
 134 channel_slice              7 x   7 x 256   ->     7 x   7 x 128 
 135 conv    128       1 x 1/ 1      7 x   7 x 128 ->    7 x   7 x 128 0.002 BF
 136 conv    128/ 128  3 x 3/ 1      7 x   7 x 128 ->    7 x   7 x 128 0.000 BF
 137 conv    128       1 x 1/ 1      7 x   7 x 128 ->    7 x   7 x 128 0.002 BF
 138 route  133 137
 139 conv    256       1 x 1/ 1      7 x   7 x 256 ->    7 x   7 x 256 0.006 BF
 140 conv    256/ 256  3 x 3/ 1      7 x   7 x 256 ->    7 x   7 x 256 0.000 BF
 141 conv    256       1 x 1/ 1      7 x   7 x 256 ->    7 x   7 x 256 0.006 BF
 142 route  141 110
 143 conv    512       1 x 1/ 1      7 x   7 x 512 ->    7 x   7 x 512 0.026 BF
 144 conv    512/ 512  3 x 3/ 1      7 x   7 x 512 ->    7 x   7 x 512 0.000 BF
 145 conv    512       1 x 1/ 1      7 x   7 x 512 ->    7 x   7 x 512 0.026 BF
 146 conv   1000       1 x 1/ 1      7 x   7 x 512 ->    7 x   7 x1000 0.050 BF
 147 avg                             7 x   7 x1000 ->   1000
 148 softmax                                        1000
 149 cost                                           1000
Total BFLOPS 0.375 
 Allocate additional workspace_size = 1.64 MB

Here are the cfg and weights.
shuffle_imagenet.cfg.txt
shuffle.weights[google] OR [baidupan]

@WongKinYiu
Collaborator

WongKinYiu commented Aug 15, 2019

@gmayday1997 Thank you for sharing the cfg file.

I checked my cfgs; it seems all models with the leaky ReLU activation function fail, while all models with the swish activation function can converge.
I will do more experiments to confirm the reason.

The main differences between your cfg and mine are:

  1. I use the leaky ReLU activation function.
  2. I use warm-up for the first 2000 epochs.
  3. There is no SSE cost layer in my cfgs.
  4. I use the down-sampling module proposed in ShuffleNetV2.

@gmayday1997
Author

@WongKinYiu Yes, you are right. In fact, I tried to implement the proposed down-sampling module, but it seems hard to converge. Do you mind sharing your cfg file?

@WongKinYiu
Collaborator

WongKinYiu commented Aug 15, 2019

@gmayday1997 Here is the cfg file: SNet49.cfg.txt
I implemented the SNet49 of ThunderNet. #3380 (comment)
It gets NaN after training for 80k epochs.

@gmayday1997
Author

Hi @WongKinYiu,
Thank you for sharing.
I found that there is no activation function in some layers (activation=linear). I am not sure, but it may hurt gradient propagation in the absence of a shortcut layer.

[convolutional]
filters=30
groups=30
size=3
stride=1
pad=1
batch_normalize=1
activation=linear

@WongKinYiu
Collaborator

WongKinYiu commented Aug 15, 2019

@gmayday1997 Hello,

The depthwise convolutional layers of ShuffleNetV2 do not have an activation function; only the 1x1 convolutional layers use the ReLU activation function.

@gmayday1997
Author

@WongKinYiu You are right, there is indeed no activation function in the depthwise convolution module. Thank you for pointing this out.
Can you share the top-5 precision before the model crashed?

@WongKinYiu
Collaborator

WongKinYiu commented Aug 15, 2019

@gmayday1997 I am sorry, but I cannot provide that information.
I deleted all of the weight files after it got NaN,
and the loss never went down during training.

@gmayday1997
Author

@WongKinYiu So the model never learned anything during training.
What is your opinion on whether a route layer is fully equivalent to the split layer used in the dw module?
I really doubt that; maybe I missed some details.

@WongKinYiu
Collaborator

WongKinYiu commented Aug 15, 2019

@gmayday1997
Yes, I previously used an equivalent architecture composed of route layers instead of channel split layers, and it works fine. (Also, the training speed is quite a bit faster than for the model using channel split layers.)

module using route:
[conv * 16]
[route -2]
[conv * 16]

module using channel split:
[conv * 32]
[channel split from -1, 0~16]
[channel split from -2, 16~32]

Maybe I will check the code of the channel split and channel shuffle layers after my busy weeks.
Or, if you update the code in the coming days, I can train it and check whether the performance is normal.
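
A minimal cfg sketch of that route-based pattern, with illustrative filter counts and activation (WongKinYiu's actual cfg is not shared in this thread): instead of producing 32 channels with one conv and then splitting them, each 16-channel branch gets its own 1x1 conv applied to the same shared input.

# Branch A: 16 channels computed directly from the shared input.
[convolutional]
batch_normalize=1
filters=16
size=1
stride=1
pad=0
activation=leaky

# Jump back to the shared input feature map.
[route]
layers=-2

# Branch B: another 16 channels from the same input (the rest of the unit would follow here).
[convolutional]
batch_normalize=1
filters=16
size=1
stride=1
pad=0
activation=leaky

# Concatenate branch A (-3) and branch B (-1): 32 channels total.
[route]
layers=-3,-1

Two independent 16-filter convs on the same input can express the same functions as one 32-filter conv whose output is split in half, which is why the two forms are treated as roughly equivalent in this discussion.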

@gmayday1997
Author

gmayday1997 commented Aug 15, 2019

@WongKinYiu Sorry for the late reply.
I meant: is the split layer (in Caffe) equivalent to route?
If you find any errors in the channel_shuffle or channel_slice code, please don't hesitate to tell me. Thanks in advance!

@WongKinYiu
Collaborator

I do not use Caffe, but the behavior of split and route is almost the same.
OK.

@dexception

(quoting @gmayday1997's earlier comment: 100k epochs of training with channel_shuffle and channel_slice, 56% top-5 precision, with the full architecture printout and the shuffle_imagenet.cfg.txt / shuffle.weights links)

I am assuming the weights are trained on ImageNet.
The original ShuffleNetV2 model scores about:

69.06% Top 1 Accuracy
88.77% Top 5 Accuracy

@WongKinYiu
Collaborator

@gmayday1997 I have finished training your provided cfg and some alternatives.

  1. shufflenet with swish activation function
    shuffle_swish.cfg.txt
    it got 31.8% top-1 acc and 55.6% top-5 acc.
  2. shufflenet with swish activation function + warm up learning rate scheduler
    shuffle_swish_warmup.txt
    it got 33.4% top-1 acc and 57.4% top-5 acc.
  3. shufflenet with leaky relu activation function
    shuffle_leaky.cfg.txt
    it got 29.0% top-1 acc and 52.1% top-5 acc.
  4. shufflenet with leaky relu activation function + warm up learning rate scheduler
    shuffle_leaky_warmup.txt
    it got 31.5% top-1 acc and 55.1% top-5 acc.

@gmayday1997
Author

@WongKinYiu Wow, thank you for sharing such valuable experimental comparisons. It seems that warm-up learning gives the major improvement.
Since I have only one GPU, I will post some results after my other projects are finished. Thank you again.

@AlexeyAB
Owner

@WongKinYiu Will you try to train a detector with a ShuffleNet backbone?

@WongKinYiu
Collaborator

@AlexeyAB Hello,
Currently, models with channel_split or channel_shuffle layers take a very long time to train;
on my machine they take almost five times as long as models without channel_split or channel_shuffle layers.
It may take 40 days to train an ImageNet pre-trained model.

So I won't train a detector with a ShuffleNetV2 backbone now,
but if the training speed issue can be solved, I'd like to do it.

Thanks.

@beHappy666

@gmayday1997 Hi, I have a question about the slice layer: is it the same as the Slice layer in Caffe, which is used to cut channels?

@gmayday1997
Author

@beHappy666 Yes. Since darknet doesn't support multiple outputs per layer, we need to pay attention to propagating the gradients correctly.

[convolutional]
batch_normalize=1
filters=32
size=1
stride=1
pad=0
activation=swish

[channel_slice]
from=-1
axis=1
start=0
end=16

[channel_slice]
from=-2
axis=1
start=16
end=32

We use "from" to indicate which feature blob is sliced, and "start"/"end" to set the slice points.
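
As a side note, in darknet builds whose [route] layer supports the groups/group_id fields (an assumption about the reader's darknet version; these fields are not used anywhere in this thread), a similar two-way split can be written without a dedicated channel_slice layer; each route then picks one half of the channels of the referenced layer:

# Second half of the channels of the previous layer.
[route]
layers=-1
groups=2
group_id=1

# First half of the channels of the same layer (the previous route is now -1, so the source is -2).
[route]
layers=-2
groups=2
group_id=0

Whether this behaves identically to darknet.CG's channel_slice, including the backward pass, would still need to be verified.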

@LukeAI

LukeAI commented Aug 20, 2019

(quoting @WongKinYiu's results above for shuffle_swish, shuffle_swish_warmup, shuffle_leaky, and shuffle_leaky_warmup)

What is the inference time on these guys?

@WongKinYiu
Collaborator

@LukeAI
You can just download and examine them;
the inference time will differ from machine to machine.

@beHappy666

@gmayday1997 OK, thank you.

@pcorner

pcorner commented Aug 21, 2019 via email

@WongKinYiu
Collaborator

@pcorner Do you mean https://youtu.be/sY4tLRI6pYc ?

@pcorner

pcorner commented Aug 21, 2019 via email

@jamessmith90

(quoting @gmayday1997's earlier comment: 100k epochs of training with channel_shuffle and channel_slice, 56% top-5 precision, with the full architecture printout and the shuffle_imagenet.cfg.txt / shuffle.weights links)

Can you update this information?

@dexception

@AlexeyAB
Will this be merged?

@AlexeyAB
Owner

@dexception @gmayday1997 @WongKinYiu

We just need to understand whether it is necessary.

What Top-1/Top-5 accuracy or mAP can be achieved with ShuffleNet?

@WongKinYiu
Collaborator

WongKinYiu commented Sep 20, 2019

@AlexeyAB

framework    type              top-1 acc.
Darknet.CG   split+shuffle     60.4%
Darknet      equivalent split  69.2%
Pytorch      split+shuffle     69.54%
Pytorch      equivalent split  69.48%

So I think there are some problems in Darknet.CG,
but I have not found the problems in the C code of the channel split and channel shuffle layers in Darknet.CG.

Update:
channel split works fine (but slowly);
maybe the problem is in the channel shuffle layer.

@AlexeyAB
Owner

@WongKinYiu Yes, it seems something is wrong with Darknet.CG.

Can you provide the model (cfg + weights) for 'Darknet | equivalent split | 69.2%' there? #3874
Since there is only 52% Top-1:

Model BFLOPs Inference Time (ms) Top1, % URL
shufflenetv2 and weights 0.375 32 52% URL

@WongKinYiu
Collaborator

I cannot share the cfg file currently.
The BFLOPs of the model are about 0.8, so 69% top-1 accuracy is normal.

@AlexeyAB
Owner

@WongKinYiu
Collaborator

Yes, it is different.
I do not use depth-wise convolutional layers.
The model is modified from the paper at https://github.com/WongKinYiu/PartialResidualNetworks.

@AlexeyAB
Owner

@WongKinYiu

Do you have a plan to measure inference time or FPS on CPU and GPU, in addition to the BFLOPs, for all of these https://github.com/WongKinYiu/PartialResidualNetworks models?

@WongKinYiu
Collaborator

@AlexeyAB OK, I have updated the FPS information.

@dexception

@WongKinYiu

Have you integrated YOLO with ShuffleNetV2?
If yes, can you share what FPS you are getting with YOLO-ShuffleNetV2?

@WongKinYiu
Collaborator

@dexception

No, I haven't.
If you could help make the accuracy of ShuffleNetV2 on ImageNet normal,
I would like to integrate it.

@dexception

We haven't looked at operator fusion and are still dependent on TVM or TensorRT for that. It wouldn't be a bad idea to look into it; the benefits would apply to almost everything we are doing.

@AlexeyAB
Is it possible to implement this in this repo?
Is it too much of an effort?

@AlexeyAB
Owner

@dexception
I don't know; should we do this?

If the model 'Darknet | equivalent split | 69.2%' (which already works with this repo) gives us the same accuracy and the same speed, then why should we implement channel_split + channel_shuffle for ShuffleNet?

@WongKinYiu
Can you provide FPS on GPU and CPU for these models?
#3750 (comment)

@WongKinYiu
Collaborator

@AlexeyAB

GPU: ~120 FPS; CPU: ~7 FPS,
with the PRN head applied and an input size of 416x416.

@AlexeyAB
Owner

@WongKinYiu Thanks.

But what GPU/CPU FPS do the other models get?
Is 'Darknet | equivalent split | 69.2%' faster or slower than the other models from this table?

@WongKinYiu
Collaborator

@AlexeyAB
Yes, it is the fastest model in the table.
GPU: GTX 1080 Ti, CPU: i7-6700.

I am at ICIP now, so I cannot provide the exact FPS of those models immediately.

@AlexeyAB
Owner

@dexception
@WongKinYiu In this case, we should not implement the channel_split + channel_shuffle layers. We will just wait until you put this model in open access, since, as I understand it, it already works in this repository without any changes to the source code.

@dexception

@AlexeyAB
At a minimum we should be able to get a 1.5x increase in FPS.

https://github.com/NVIDIA/TensorRT is now open source, so it wouldn't be a bad idea.

@deimsdeutsch

@WongKinYiu
Can you share the cfg file? I would like to try it on my dataset.

@WongKinYiu
Collaborator

@deimsdeutsch

Sorry, I cannot share the cfg file for #3750 (comment).

For the cfg of ShuffleNetV2, you can check #3750 (comment).

But currently, I do not suggest you train these models.
The channel split and channel shuffle layers seem to have some problems. #3750 (comment)

Maybe MobileNetV2 is more stable on darknet right now:
https://github.com/WePCf/darknet-mobilenet-v2 for your reference.

@deimsdeutsch

@WongKinYiu
MobileNetV2 seems to suffer from the same issue with the grouped convolution implementation.

@WongKinYiu
Collaborator

WongKinYiu commented Sep 26, 2019

@dexception Yes.

For general GPUs, ResNet-18 is a good choice;
it can run at >140 FPS on a GPU and gets 28.1 AP@.5:.95 on COCO using CenterNet.
https://github.com/xingyizhou/CenterNet

I am also training ResNet-18-based models now.

@deimsdeutsch

Since you mentioned ResNet-18: NVIDIA is using ResNet-10 with DeepStream 4.
On a Tesla T4 they can manage 35-68 streams, all of them 1080p, running at 30 FPS.

Here is the model they are using:
https://ngc.nvidia.com/catalog/models/nvidia:tlt_iva_object_detection_resnet10

@WongKinYiu
Collaborator

Thank you for sharing the information.

@dexception

@deimsdeutsch
Nice share. I just ran my demo, and this is where all the magic is happening.
Custom plugins for TensorRT are where you should all dig in and stay up all night.

@spaul13

spaul13 commented Jan 31, 2020

@AlexeyAB @gmayday1997 @WongKinYiu @dexception @beHappy666 While using shuffle_swish.cfg (provided by @WongKinYiu) as my configuration file, I am getting the following error.

setting up CUDA Devices compute_capability = 610, cudnn_half = 0
layer filters size/strd(dil) input output
0 conv 16 3 x 3/ 1 224 x 224 x 3 -> 224 x 224 x 16 0.043 BF
1 max 2x 2/ 2 224 x 224 x 16 -> 112 x 112 x 16 0.001 BF
2 conv 16 1 x 1/ 1 112 x 112 x 16 -> 112 x 112 x 16 0.006 BF
3 max 2x 2/ 2 112 x 112 x 16 -> 56 x 56 x 16 0.000 BF
4 conv 32 1 x 1/ 1 56 x 56 x 16 -> 56 x 56 x 32 0.003 BF
5 conv 32/ 32 3 x 3/ 1 56 x 56 x 32 -> 56 x 56 x 32 0.002 BF
6 conv 32 1 x 1/ 1 56 x 56 x 32 -> 56 x 56 x 32 0.006 BF
7 Type not recognized: [channel_slice]
Unused field: 'from = -1'
Unused field: 'axis = 1'
Unused field: 'start = 0'
Unused field: 'end = 16'
8 Type not recognized: [channel_slice]
Unused field: 'from = -2'
Unused field: 'axis = 1'
Unused field: 'start = 16'
Unused field: 'end = 32'
9 Layer before convolutional layer must output image.: No error
Assertion failed: 0, file c:\yolo\darknet\src\utils.c, line 293

Can anyone please tell me how to get rid of this error?
shuffle_swish.cfg.txt

@WongKinYiu
Collaborator

Hello, this cfg seems to be for https://github.com/gmayday1997/darknet.CG, not for this repository.

@spaul13

spaul13 commented Jan 31, 2020

@WongKinYiu, @AlexeyAB is there a shuffle_swish.cfg file for this darknet repo?

@AlexeyAB
Owner

@spaul13 No, since I have not seen evidence that this is better than SOTA models.
