EfficientNet | Implementation ? #3380

Closed
dexception opened this issue Jun 11, 2019 · 218 comments

Comments

@dexception

https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet
https://ai.googleblog.com/2019/05/efficientnet-improving-accuracy-and.html
https://www.youtube.com/watch?v=3svIm5UC94I

This is good.

@AlexeyAB AlexeyAB added the want enhancement Want to improve accuracy, speed or functionality label Jun 11, 2019
@nseidl

nseidl commented Jun 12, 2019

+1

@AlexeyAB
Owner

AlexeyAB commented Jun 12, 2019

Paper: https://arxiv.org/abs/1905.11946v2

Classifier

The official EfficientNetB0 (224x224), 0.78 BFLOPS - 0.39 FMA, 5.3M params, trained with the official code https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet with a batch size of 256, has lower accuracy: 70.0% Top1 and 88.9% Top5


Detector - 3.7 BFLOPs, 45.0 mAP@0.5 on COCO test-dev.



efficientnet-lite3-leaky.cfg: top-1 73.0%, top-5 92.4%. - change relu6 to leaky: activation=leaky https://github.com/AlexeyAB/darknet/blob/master/cfg/efficientnet-lite3.cfg


Classifiers: - Can be trained on ImageNet(ILSVRC2012) by using 4 x GPU 2080 TI:


Training command:
./darknet classifier train cfg/imagenet1k_c.data cfg/efficientnet_b0.cfg -topk

Continue training:
./darknet classifier train cfg/imagenet1k_c.data cfg/efficientnet_b0.cfg backup/efficientnet_b0_last.weights -topk

Content of imagenet1k_c.data:

classes=1000
train  = data/imagenet1k.train_c.list
valid  = data/inet.val_c.list
backup = backup
labels = data/imagenet.labels.list
names  = data/imagenet.shortnames.list
top=5
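
The .data file above is a flat list of `key = value` lines. As a rough illustration of how such a file can be read, here is a minimal Python sketch. `parse_data_cfg` is a hypothetical helper written for this example, not darknet's own parser, and it assumes that blank lines and lines starting with `#` can be skipped.

```python
def parse_data_cfg(text):
    """Parse darknet-style 'key = value' lines into a dict of strings.

    Hypothetical helper for illustration; not part of darknet.
    """
    options = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and comments are ignored
        key, _, value = line.partition("=")
        options[key.strip()] = value.strip()
    return options


# The same content as imagenet1k_c.data shown above.
IMAGENET1K_C = """\
classes=1000
train  = data/imagenet1k.train_c.list
valid  = data/inet.val_c.list
backup = backup
labels = data/imagenet.labels.list
names  = data/imagenet.shortnames.list
top=5
"""

cfg = parse_data_cfg(IMAGENET1K_C)
print(cfg["classes"], cfg["train"], cfg["top"])
```

All values stay strings here; converting `classes` and `top` to integers is left to the caller.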

Dataset - each image in imagenet1k.train_c.list and inet.val_c.list has one of 1000 labels from imagenet.labels.list, for example n01440764

More: http://www.image-net.org/challenges/LSVRC/2012/nonpub-downloads


Models: https://github.com/tensorflow/tpu/blob/05f7b15cdf0ae36bac84beb4aef0a09983ce8f66/models/official/efficientnet/efficientnet_builder.py#L28-L39

      # (width_coefficient, depth_coefficient, resolution, dropout_rate)
      'efficientnet-b0': (1.0, 1.0, 224, 0.2),
      'efficientnet-b1': (1.0, 1.1, 240, 0.2),
      'efficientnet-b2': (1.1, 1.2, 260, 0.3),
      'efficientnet-b3': (1.2, 1.4, 300, 0.3),
      'efficientnet-b4': (1.4, 1.8, 380, 0.4),
      'efficientnet-b5': (1.6, 2.2, 456, 0.4),
      'efficientnet-b6': (1.8, 2.6, 528, 0.5),
      'efficientnet-b7': (2.0, 3.1, 600, 0.5),
CLICK ME - EfficientNet B0 model details
#alpha=1.2, beta=1.1, gamma=1.15
#d=pow(alpha, fi), w=pow(beta, fi), r=pow(gamma, fi)
#fi=0, d=1.0, w=1.0, r=1.0 - theoretically
# in practice: https://github.com/tensorflow/tpu/blob/05f7b15cdf0ae36bac84beb4aef0a09983ce8f66/models/official/efficientnet/efficientnet_builder.py#L28-L40
# 'efficientnet-b0': (1.0, 1.0, 224, 0.2):
#  width=1.0,  depth=1.0, resolution=224,  dropout=0.2

BLOCKS 1 - 7:
      'r1_k3_s11_e1_i32_o16_se0.25', 'r2_k3_s22_e6_i16_o24_se0.25',
      'r2_k5_s22_e6_i24_o40_se0.25', 'r3_k3_s22_e6_i40_o80_se0.25',
      'r3_k5_s11_e6_i80_o112_se0.25', 'r4_k5_s22_e6_i112_o192_se0.25',
      'r1_k3_s11_e6_i192_o320_se0.25',

In details: https://github.com/tensorflow/tpu/blob/05f7b15cdf0ae36bac84beb4aef0a09983ce8f66/models/official/efficientnet/efficientnet_builder.py#L61-L69

BLOCK-1

# r1_k3_s11_e1_i32_o16_se0.25
#    return efficientnet_model.BlockArgs(
#        kernel_size=int(options['k']),				kernel_size = 3
#        num_repeat=int(options['r']),				num_repeat = 1
#        input_filters=int(options['i']),			input_filters = 32
#        output_filters=int(options['o']),			output_filters = 16
#        expand_ratio=int(options['e']),			expand_ratio = 1
#        id_skip=('noskip' not in block_string),
#        se_ratio=float(options['se']) if 'se' in options else None,	se_ratio = 0.25
#        strides=[int(options['s'][0]), int(options['s'][1])])			strides = 1,1


BLOCK-2

# r2_k3_s22_e6_i16_o24_se0.25
#    return efficientnet_model.BlockArgs(
#        kernel_size=int(options['k']),				kernel_size = 3
#        num_repeat=int(options['r']),				num_repeat = 2
#        input_filters=int(options['i']),			input_filters = 16
#        output_filters=int(options['o']),			output_filters = 24
#        expand_ratio=int(options['e']),			expand_ratio = 6
#        id_skip=('noskip' not in block_string),
#        se_ratio=float(options['se']) if 'se' in options else None,	se_ratio = 0.25
#        strides=[int(options['s'][0]), int(options['s'][1])])			strides = 2,2



BLOCK-3

# r2_k5_s22_e6_i24_o40_se0.25
#    return efficientnet_model.BlockArgs(
#        kernel_size=int(options['k']),				kernel_size = 5
#        num_repeat=int(options['r']),				num_repeat = 2
#        input_filters=int(options['i']),			input_filters = 24
#        output_filters=int(options['o']),			output_filters = 40
#        expand_ratio=int(options['e']),			expand_ratio = 6
#        id_skip=('noskip' not in block_string),
#        se_ratio=float(options['se']) if 'se' in options else None,	se_ratio = 0.25
#        strides=[int(options['s'][0]), int(options['s'][1])])			strides = 2,2



BLOCK-4

# r3_k3_s22_e6_i40_o80_se0.25
#    return efficientnet_model.BlockArgs(
#        kernel_size=int(options['k']),				kernel_size = 3
#        num_repeat=int(options['r']),				num_repeat = 3
#        input_filters=int(options['i']),			input_filters = 40
#        output_filters=int(options['o']),			output_filters = 80
#        expand_ratio=int(options['e']),			expand_ratio = 6
#        id_skip=('noskip' not in block_string),
#        se_ratio=float(options['se']) if 'se' in options else None,	se_ratio = 0.25
#        strides=[int(options['s'][0]), int(options['s'][1])])			strides = 2,2



BLOCK-5

# r3_k5_s11_e6_i80_o112_se0.25
#    return efficientnet_model.BlockArgs(
#        kernel_size=int(options['k']),				kernel_size = 5
#        num_repeat=int(options['r']),				num_repeat = 3
#        input_filters=int(options['i']),			input_filters = 80
#        output_filters=int(options['o']),			output_filters = 112
#        expand_ratio=int(options['e']),			expand_ratio = 6
#        id_skip=('noskip' not in block_string),
#        se_ratio=float(options['se']) if 'se' in options else None,	se_ratio = 0.25
#        strides=[int(options['s'][0]), int(options['s'][1])])			strides = 1,1



BLOCK-6

# r4_k5_s22_e6_i112_o192_se0.25
#    return efficientnet_model.BlockArgs(
#        kernel_size=int(options['k']),				kernel_size = 5
#        num_repeat=int(options['r']),				num_repeat = 4
#        input_filters=int(options['i']),			input_filters = 112
#        output_filters=int(options['o']),			output_filters = 192
#        expand_ratio=int(options['e']),			expand_ratio = 6
#        id_skip=('noskip' not in block_string),
#        se_ratio=float(options['se']) if 'se' in options else None,	se_ratio = 0.25
#        strides=[int(options['s'][0]), int(options['s'][1])])			strides = 2,2



BLOCK-7

# r1_k3_s11_e6_i192_o320_se0.25
#    return efficientnet_model.BlockArgs(
#        kernel_size=int(options['k']),				kernel_size = 3
#        num_repeat=int(options['r']),				num_repeat = 1
#        input_filters=int(options['i']),			input_filters = 192
#        output_filters=int(options['o']),			output_filters = 320
#        expand_ratio=int(options['e']),			expand_ratio = 6
#        id_skip=('noskip' not in block_string),
#        se_ratio=float(options['se']) if 'se' in options else None,	se_ratio = 0.25
#        strides=[int(options['s'][0]), int(options['s'][1])])			strides = 1,1
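
The block strings listed above (for example r1_k3_s11_e1_i32_o16_se0.25) can be decoded mechanically. The following is a minimal sketch, not the official parser: `decode_block_string` is a hypothetical helper, and the dict keys simply mirror the BlockArgs field names quoted above.

```python
import re


def decode_block_string(block_string):
    """Decode a string such as 'r1_k3_s11_e1_i32_o16_se0.25' into a dict.

    Hypothetical helper for illustration; keys follow the BlockArgs fields.
    """
    options = {}
    for op in block_string.split("_"):
        # Each option is a letter key followed by a numeric value.
        match = re.match(r"([a-z]+)([0-9.]+)$", op)
        if match:
            options[match.group(1)] = match.group(2)
    return {
        "num_repeat": int(options["r"]),
        "kernel_size": int(options["k"]),
        "strides": (int(options["s"][0]), int(options["s"][1])),
        "expand_ratio": int(options["e"]),
        "input_filters": int(options["i"]),
        "output_filters": int(options["o"]),
        "se_ratio": float(options["se"]) if "se" in options else None,
        "id_skip": "noskip" not in block_string,
    }


block1 = decode_block_string("r1_k3_s11_e1_i32_o16_se0.25")
block6 = decode_block_string("r4_k5_s22_e6_i112_o192_se0.25")
print(block1)
print(block6)
```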

efficientnet_b0_ext


CLICK ME - EfficientNet B3 model details
#alpha=1.2, beta=1.1, gamma=1.15
#d=pow(alpha, fi), w=pow(beta, fi), r=pow(gamma, fi)
#fi=3, d=1.73, w=1.33, r=1.52 - theoretically
# in practice: https://github.com/tensorflow/tpu/blob/05f7b15cdf0ae36bac84beb4aef0a09983ce8f66/models/official/efficientnet/efficientnet_builder.py#L28-L40
# 'efficientnet-b3': (1.2, 1.4, 300, 0.3):
#  width=1.2,  depth=1.4, resolution=300 (320),  dropout=0.3

# https://github.com/tensorflow/tpu/blob/05f7b15cdf0ae36bac84beb4aef0a09983ce8f66/models/official/efficientnet/efficientnet_model.py#L120-L125
# repeats_new = int(math.ceil(depth * repeats)) ### ceil - Rounds x upward,

# https://github.com/tensorflow/tpu/blob/05f7b15cdf0ae36bac84beb4aef0a09983ce8f66/models/official/efficientnet/efficientnet_builder.py#L134-L137
#      width_coefficient=width_coefficient,
#      depth_coefficient=depth_coefficient,
#      depth_divisor=8,
#      min_depth=None)
#
# https://github.com/tensorflow/tpu/blob/05f7b15cdf0ae36bac84beb4aef0a09983ce8f66/models/official/efficientnet/efficientnet_model.py#L101-L117
# multiplier = width_coefficient = 1.2
# divisor = 8
# min_depth = none
# min_depth = divisor = 8

filters = filters * 1.2
new_filters = max(8, (int(filters + 4) // 8) * 8)   ## //===floor in this case
if new_filters < 0.9 * filters:  new_filters += 8

16 *1.2=19.2
new_filters = max(8, int(19.2+4)//8  * 8) = 16; 16 < 0.9 * 19.2 = 17.28, so new_filters += 8 gives 24 (>16)

24 *1.2=28.8
new_filters = max(8, int(28.8+4)//8  * 8) = 32 (>24)

32 *1.2=38.4
new_filters = max(8, int(38.4+4)//8  * 8) = 40 (>32)

40 *1.2=48
new_filters = max(8, int(48+4)//8  * 8) = 48 (>40)

80 *1.2=96
new_filters = max(8, int(96+4)//8  * 8) = 96 (>80)

112 *1.2=134.4
new_filters = max(8, int(134.4+4)//8  * 8) = 136 (>112)

192 *1.2=230.4
new_filters = max(8, int(230.4+4)//8  * 8) = 232 (>192)

320 *1.2=384
new_filters = max(8, int(384+4)//8  * 8) = 384 (>320)



8 *1.2=9.6
new_filters = max(8, int(9.6+4)//8  * 8) = 8; 8 < 0.9 * 9.6 = 8.64, so new_filters += 8 gives 16 (>8)

64 *1.2=76.8
new_filters = max(8, int(76.8+4)//8  * 8) = 80 (>64)

96 *1.2=115.2
new_filters = max(8, int(115.2+4)//8  * 8) = 112 (>96)


144 *1.2=172.8
new_filters = max(8, int(172.8+4)//8  * 8) = 176 (>144)


384 *1.2=460.8
new_filters = max(8, int(460.8+4)//8  * 8) = 464 (>384)

576 *1.2=691.2
new_filters = max(8, int(691.2+4)//8  * 8) = 688 (>576)

960 *1.2=1152
new_filters = max(8, int(1152+4)//8  * 8) = 1152 (>960)

1280 *1.2=1536
new_filters = max(8, int(1536+4)//8  * 8) = 1536 (>1280)
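
The rounding rules quoted above (multiply by the width coefficient, round to a multiple of 8, add 8 if rounding lost more than 10%, and take the ceiling for repeats) can be written as two small Python functions. This is a sketch that follows the formulas quoted in this comment; the function names match the official code, but the bodies are re-typed here and should be checked against the linked efficientnet_model.py.

```python
import math


def round_filters(filters, width_coefficient, divisor=8, min_depth=None):
    """Scale a filter count and round it to a multiple of `divisor`."""
    if not width_coefficient:
        return filters
    filters *= width_coefficient
    min_depth = min_depth or divisor
    new_filters = max(min_depth, int(filters + divisor / 2) // divisor * divisor)
    # Do not let rounding down remove more than 10% of the scaled width.
    if new_filters < 0.9 * filters:
        new_filters += divisor
    return int(new_filters)


def round_repeats(repeats, depth_coefficient):
    """Scale the number of block repeats, rounding upward."""
    if not depth_coefficient:
        return repeats
    return int(math.ceil(depth_coefficient * repeats))


# B0 stem and block output widths, scaled with the B3 width coefficient 1.2.
b0_widths = (32, 16, 24, 40, 80, 112, 192, 320)
b3_widths = [round_filters(f, 1.2) for f in b0_widths]

# B0 repeats per block, scaled with the B3 depth coefficient 1.4.
b0_repeats = (1, 2, 2, 3, 3, 4, 1)
b3_repeats = [round_repeats(r, 1.4) for r in b0_repeats]

print(b3_widths)
print(b3_repeats)
```

Note that the 10% check matters for small widths: 16 filters scaled by 1.2 first round down to 16 and are then bumped to 24.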




BLOCKS 1 - 7: (for b0)
      'r2_k3_s11_e1_i32_o16_se0.25', 'r4_k3_s22_e6_i16_o24_se0.25',
      'r4_k5_s22_e6_i24_o40_se0.25', 'r6_k3_s22_e6_i40_o80_se0.25',
      'r6_k5_s11_e6_i80_o112_se0.25', 'r8_k5_s22_e6_i112_o192_se0.25',
      'r2_k3_s11_e6_i192_o320_se0.25',

In details: https://github.com/tensorflow/tpu/blob/05f7b15cdf0ae36bac84beb4aef0a09983ce8f66/models/official/efficientnet/efficientnet_builder.py#L61-L69

BLOCK-1

# r1_k3_s11_e1_i32_o16_se0.25
#    return efficientnet_model.BlockArgs(
#        kernel_size=int(options['k']),				kernel_size = 3
#        num_repeat=int(options['r']),				num_repeat = 2	//1
#        input_filters=int(options['i']),			input_filters = 40	//32
#        output_filters=int(options['o']),			output_filters = 24	//16
#        expand_ratio=int(options['e']),			expand_ratio = 1
#        id_skip=('noskip' not in block_string),
#        se_ratio=float(options['se']) if 'se' in options else None,	se_ratio = 0.25
#        strides=[int(options['s'][0]), int(options['s'][1])])			strides = 1,1


BLOCK-2

# r2_k3_s22_e6_i16_o24_se0.25
#    return efficientnet_model.BlockArgs(
#        kernel_size=int(options['k']),				kernel_size = 3
#        num_repeat=int(options['r']),				num_repeat = 3	//2
#        input_filters=int(options['i']),			input_filters = 24	//16
#        output_filters=int(options['o']),			output_filters = 32	//24
#        expand_ratio=int(options['e']),			expand_ratio = 6
#        id_skip=('noskip' not in block_string),
#        se_ratio=float(options['se']) if 'se' in options else None,	se_ratio = 0.25
#        strides=[int(options['s'][0]), int(options['s'][1])])			strides = 2,2



BLOCK-3

# r2_k5_s22_e6_i24_o40_se0.25
#    return efficientnet_model.BlockArgs(
#        kernel_size=int(options['k']),				kernel_size = 5
#        num_repeat=int(options['r']),				num_repeat = 3	//2
#        input_filters=int(options['i']),			input_filters = 32	//24
#        output_filters=int(options['o']),			output_filters = 48	//40
#        expand_ratio=int(options['e']),			expand_ratio = 6
#        id_skip=('noskip' not in block_string),
#        se_ratio=float(options['se']) if 'se' in options else None,	se_ratio = 0.25
#        strides=[int(options['s'][0]), int(options['s'][1])])			strides = 2,2



BLOCK-4

# r3_k3_s22_e6_i40_o80_se0.25
#    return efficientnet_model.BlockArgs(
#        kernel_size=int(options['k']),				kernel_size = 3
#        num_repeat=int(options['r']),				num_repeat = 5	//3
#        input_filters=int(options['i']),			input_filters = 48		//40
#        output_filters=int(options['o']),			output_filters = 96		//80
#        expand_ratio=int(options['e']),			expand_ratio = 6
#        id_skip=('noskip' not in block_string),
#        se_ratio=float(options['se']) if 'se' in options else None,	se_ratio = 0.25
#        strides=[int(options['s'][0]), int(options['s'][1])])			strides = 2,2



BLOCK-5

# r3_k5_s11_e6_i80_o112_se0.25
#    return efficientnet_model.BlockArgs(
#        kernel_size=int(options['k']),				kernel_size = 5
#        num_repeat=int(options['r']),				num_repeat = 5	//3
#        input_filters=int(options['i']),			input_filters = 96		//80
#        output_filters=int(options['o']),			output_filters = 136	//112
#        expand_ratio=int(options['e']),			expand_ratio = 6
#        id_skip=('noskip' not in block_string),
#        se_ratio=float(options['se']) if 'se' in options else None,	se_ratio = 0.25
#        strides=[int(options['s'][0]), int(options['s'][1])])			strides = 1,1



BLOCK-6

# r4_k5_s22_e6_i112_o192_se0.25
#    return efficientnet_model.BlockArgs(
#        kernel_size=int(options['k']),				kernel_size = 5
#        num_repeat=int(options['r']),				num_repeat = 6	//4
#        input_filters=int(options['i']),			input_filters = 136		//112
#        output_filters=int(options['o']),			output_filters = 232	//192
#        expand_ratio=int(options['e']),			expand_ratio = 6
#        id_skip=('noskip' not in block_string),
#        se_ratio=float(options['se']) if 'se' in options else None,	se_ratio = 0.25
#        strides=[int(options['s'][0]), int(options['s'][1])])			strides = 2,2



BLOCK-7

# r1_k3_s11_e6_i192_o320_se0.25
#    return efficientnet_model.BlockArgs(
#        kernel_size=int(options['k']),				kernel_size = 3
#        num_repeat=int(options['r']),				num_repeat = 2	//1
#        input_filters=int(options['i']),			input_filters = 232		//192
#        output_filters=int(options['o']),			output_filters = 384	//320
#        expand_ratio=int(options['e']),			expand_ratio = 6
#        id_skip=('noskip' not in block_string),
#        se_ratio=float(options['se']) if 'se' in options else None,	se_ratio = 0.25
#        strides=[int(options['s'][0]), int(options['s'][1])])			strides = 1,1

efficientnet_b3


CLICK ME - EfficientNet B4 model details
#alpha=1.2, beta=1.1, gamma=1.15
#d=pow(alpha, fi), w=pow(beta, fi), r=pow(gamma, fi)
#fi=4, d=2.07, w=1.46, r=1.75 - theoretically
# in practice: https://github.com/tensorflow/tpu/blob/05f7b15cdf0ae36bac84beb4aef0a09983ce8f66/models/official/efficientnet/efficientnet_builder.py#L28-L40
# efficientnet-b4': (1.4, 1.8, 380, 0.4):
#  width=1.4,  depth=1.8, resolution=380,  dropout=0.4

# https://github.com/tensorflow/tpu/blob/05f7b15cdf0ae36bac84beb4aef0a09983ce8f66/models/official/efficientnet/efficientnet_model.py#L120-L125
# repeats_new = int(math.ceil(depth * repeats)) ### ceil - Rounds x upward,

# https://github.com/tensorflow/tpu/blob/05f7b15cdf0ae36bac84beb4aef0a09983ce8f66/models/official/efficientnet/efficientnet_builder.py#L134-L137
#      width_coefficient=width_coefficient,
#      depth_coefficient=depth_coefficient,
#      depth_divisor=8,
#      min_depth=None)
#
# https://github.com/tensorflow/tpu/blob/05f7b15cdf0ae36bac84beb4aef0a09983ce8f66/models/official/efficientnet/efficientnet_model.py#L101-L117
# multiplier = width_coefficient = 1.4
# divisor = 8
# min_depth = none
# min_depth = divisor = 8

filters = filters * 1.4
new_filters = max(8, (int(filters + 4) // 8) * 8)   ## //===floor in this case
if new_filters < 0.9 * filters:  new_filters += 8

16 *1.4=22.4
new_filters = max(8, int(22.4+4)//8  * 8) = 24 (>16)

24 *1.4=33.6
new_filters = max(8, int(33.6+4)//8  * 8) = 32 (>24)

32 *1.4=44.8
new_filters = max(8, int(44.8+4)//8  * 8) = 48 (>32)

40 *1.4=56
new_filters = max(8, int(56+4)//8  * 8) = 56 (>40)

80 *1.4=112
new_filters = max(8, int(112+4)//8  * 8) = 112 (>80)

112 *1.4=156.8
new_filters = max(8, int(156.8+4)//8  * 8) = 160 (>112)

192 *1.4=268.8
new_filters = max(8, int(268.8+4)//8  * 8) = 272 (>192)

320 *1.4=448
new_filters = max(8, int(448+4)//8  * 8) = 448 (>320)



8 *1.4=11.2
new_filters = max(8, int(11.2+4)//8  * 8) = 8; 8 < 0.9 * 11.2 = 10.08, so new_filters += 8 gives 16 (>8)

64 *1.4=89.6
new_filters = max(8, int(89.6+4)//8  * 8) = 88 (>64)

96 *1.4=134.4
new_filters = max(8, int(134.4+4)//8  * 8) = 136 (>96)


144 *1.4=201.6
new_filters = max(8, int(201.6+4)//8  * 8) = 200 (>144)


384 *1.4=537.6
new_filters = max(8, int(537.6+4)//8  * 8) = 536 (>384)

576 *1.4=806.4
new_filters = max(8, int(806.4+4)//8  * 8) = 808 (>576)

960 *1.4=1344
new_filters = max(8, int(1344+4)//8  * 8) = 1344 (>960)

1280 *1.4=1792
new_filters = max(8, int(1792+4)//8  * 8) = 1792 (>1280)




BLOCKS 1 - 7:
      'r2_k3_s11_e1_i32_o16_se0.25', 'r4_k3_s22_e6_i16_o24_se0.25',
      'r4_k5_s22_e6_i24_o40_se0.25', 'r6_k3_s22_e6_i40_o80_se0.25',
      'r6_k5_s11_e6_i80_o112_se0.25', 'r8_k5_s22_e6_i112_o192_se0.25',
      'r2_k3_s11_e6_i192_o320_se0.25',

In details: https://github.com/tensorflow/tpu/blob/05f7b15cdf0ae36bac84beb4aef0a09983ce8f66/models/official/efficientnet/efficientnet_builder.py#L61-L69

BLOCK-1

# r1_k3_s11_e1_i32_o16_se0.25
#    return efficientnet_model.BlockArgs(
#        kernel_size=int(options['k']),				kernel_size = 3
#        num_repeat=int(options['r']),				num_repeat = 2	//1
#        input_filters=int(options['i']),			input_filters = 48	//32
#        output_filters=int(options['o']),			output_filters = 24	//16
#        expand_ratio=int(options['e']),			expand_ratio = 1
#        id_skip=('noskip' not in block_string),
#        se_ratio=float(options['se']) if 'se' in options else None,	se_ratio = 0.25
#        strides=[int(options['s'][0]), int(options['s'][1])])			strides = 1,1


BLOCK-2

# r2_k3_s22_e6_i16_o24_se0.25
#    return efficientnet_model.BlockArgs(
#        kernel_size=int(options['k']),				kernel_size = 3
#        num_repeat=int(options['r']),				num_repeat = 4	//2
#        input_filters=int(options['i']),			input_filters = 24	//16
#        output_filters=int(options['o']),			output_filters = 32	//24
#        expand_ratio=int(options['e']),			expand_ratio = 6
#        id_skip=('noskip' not in block_string),
#        se_ratio=float(options['se']) if 'se' in options else None,	se_ratio = 0.25
#        strides=[int(options['s'][0]), int(options['s'][1])])			strides = 2,2



BLOCK-3

# r2_k5_s22_e6_i24_o40_se0.25
#    return efficientnet_model.BlockArgs(
#        kernel_size=int(options['k']),				kernel_size = 5
#        num_repeat=int(options['r']),				num_repeat = 4	//2
#        input_filters=int(options['i']),			input_filters = 32	//24
#        output_filters=int(options['o']),			output_filters = 56	//40
#        expand_ratio=int(options['e']),			expand_ratio = 6
#        id_skip=('noskip' not in block_string),
#        se_ratio=float(options['se']) if 'se' in options else None,	se_ratio = 0.25
#        strides=[int(options['s'][0]), int(options['s'][1])])			strides = 2,2



BLOCK-4

# r3_k3_s22_e6_i40_o80_se0.25
#    return efficientnet_model.BlockArgs(
#        kernel_size=int(options['k']),				kernel_size = 3
#        num_repeat=int(options['r']),				num_repeat = 6	//3
#        input_filters=int(options['i']),			input_filters = 56		//40
#        output_filters=int(options['o']),			output_filters = 112	//80
#        expand_ratio=int(options['e']),			expand_ratio = 6
#        id_skip=('noskip' not in block_string),
#        se_ratio=float(options['se']) if 'se' in options else None,	se_ratio = 0.25
#        strides=[int(options['s'][0]), int(options['s'][1])])			strides = 2,2



BLOCK-5

# r3_k5_s11_e6_i80_o112_se0.25
#    return efficientnet_model.BlockArgs(
#        kernel_size=int(options['k']),				kernel_size = 5
#        num_repeat=int(options['r']),				num_repeat = 6	//3
#        input_filters=int(options['i']),			input_filters = 112		//80
#        output_filters=int(options['o']),			output_filters = 160	//112
#        expand_ratio=int(options['e']),			expand_ratio = 6
#        id_skip=('noskip' not in block_string),
#        se_ratio=float(options['se']) if 'se' in options else None,	se_ratio = 0.25
#        strides=[int(options['s'][0]), int(options['s'][1])])			strides = 1,1



BLOCK-6

# r4_k5_s22_e6_i112_o192_se0.25
#    return efficientnet_model.BlockArgs(
#        kernel_size=int(options['k']),				kernel_size = 5
#        num_repeat=int(options['r']),				num_repeat = 8	//4
#        input_filters=int(options['i']),			input_filters = 160		//112
#        output_filters=int(options['o']),			output_filters = 272	//192
#        expand_ratio=int(options['e']),			expand_ratio = 6
#        id_skip=('noskip' not in block_string),
#        se_ratio=float(options['se']) if 'se' in options else None,	se_ratio = 0.25
#        strides=[int(options['s'][0]), int(options['s'][1])])			strides = 2,2



BLOCK-7

# r1_k3_s11_e6_i192_o320_se0.25
#    return efficientnet_model.BlockArgs(
#        kernel_size=int(options['k']),				kernel_size = 3
#        num_repeat=int(options['r']),				num_repeat = 2	//1
#        input_filters=int(options['i']),			input_filters = 272		//192
#        output_filters=int(options['o']),			output_filters = 448	//320
#        expand_ratio=int(options['e']),			expand_ratio = 6
#        id_skip=('noskip' not in block_string),
#        se_ratio=float(options['se']) if 'se' in options else None,	se_ratio = 0.25
#        strides=[int(options['s'][0]), int(options['s'][1])])			strides = 1,1

efficientnet_b4_ext


In other words, to scale up the CNN, the depth of layers should increase 20%, the width 10% and the image resolution 15% to keep things as efficient as possible while expanding the implementation and improving the CNN accuracy.
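
The compound scaling described above can be checked numerically. This is a small sketch using the constants quoted earlier in this comment (alpha=1.2, beta=1.1, gamma=1.15); `compound_scale` is a hypothetical helper name, and the FLOPs factor uses the paper's constraint that alpha * beta^2 * gamma^2 is roughly 2.

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution bases


def compound_scale(phi):
    """Return theoretical (depth, width, resolution) multipliers for phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi


for phi in (0, 3, 4):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}")

# FLOPs grow roughly by this factor for each +1 step of phi.
flops_factor = ALPHA * BETA ** 2 * GAMMA ** 2
print(f"FLOPs factor per step: {flops_factor:.2f}")
```

These are the theoretical values; as noted above, the coefficients actually used in the official code (for example 1.2 / 1.4 / 300 for B3) differ from them.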

The MBConv block is nothing fancy but an Inverted Residual Block (used in MobileNetV2) with a Squeeze and Excite block injected sometimes.

image MBConv6

MobileNet_v2:

image


EfficientNet_b0:

1_OpvSpqMP61IO_9cp4mAXnA


image2


image


params flops

1_BvAqynrNCq5RjMesSPvPgg


image

@AlexeyAB AlexeyAB added ToDo RoadMap and removed want enhancement Want to improve accuracy, speed or functionality labels Jun 12, 2019
@dexception
Author

dexception commented Jun 15, 2019

Would like to share this link.

https://pypi.org/project/gluoncv2/

Interesting to see the imagenet-1k comparison chart.

Model | Top 1 Error | Top 5 Error | Params | Flops
DarkNet-53 | 21.41 | 5.56 | 41,609,928 | 7,133.86M
EfficientNet-B0b | 23.41 | 6.95 | 5,288,548 | 414.31M

With a difference of 2 percentage points in top-1 error, EfficientNet-B0b has about 1/8 of the parameters and about 1/17 of the FLOPs.
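
The ratios can be recomputed directly from the two table rows above. This is just the arithmetic as a short Python sketch; the dictionary names are made up for the example.

```python
# Values copied from the comparison table above.
darknet53 = {"top1_err": 21.41, "params": 41_609_928, "mflops": 7133.86}
effnet_b0b = {"top1_err": 23.41, "params": 5_288_548, "mflops": 414.31}

param_ratio = darknet53["params"] / effnet_b0b["params"]
flop_ratio = darknet53["mflops"] / effnet_b0b["mflops"]
top1_gap = effnet_b0b["top1_err"] - darknet53["top1_err"]

print(f"{param_ratio:.1f}x fewer parameters")
print(f"{flop_ratio:.1f}x fewer FLOPs")
print(f"{top1_gap:.2f} points higher top-1 error")
```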
Would love to see the inference time and accuracy as object detection.

Also a tiny version wouldn't be bad after all.
This is like running yolov3-tiny with yolov3 accuracy.

@AlexeyAB
Owner

@dexception
Have you ever seen a graphic representation of EfficientNet b1 - b7 models (other than b0), or their exact text description, like Caffe proto-files?

@dexception
Author

dexception commented Jun 18, 2019

@AlexeyAB

Keras, PyTorch, and MXNet implementations definitely exist:
https://github.com/qubvel/efficientnet
https://github.com/lukemelas/EfficientNet-PyTorch
https://github.com/titu1994/keras-efficientnets
https://github.com/zsef123/EfficientNets-PyTorch
https://github.com/DableUTeeF/keras-efficientnet
https://github.com/mnikitin/EfficientNet/blob/master/efficientnet_model.py

The code and the research paper differ, but the code is correct.
tensorflow/tpu#383

I don't think there is any caffe implementation as of yet.

@WongKinYiu
Collaborator

WongKinYiu commented Jun 18, 2019

Hello, I drew the model diagrams from the Keras implementation: https://github.com/qubvel/efficientnet .
Here are b0 and b1.

CLICK ME - EfficientNet B0 and B1 model diagrams

EfficientNetB0

EfficientNetB1

I use the code:
from efficientnet import EfficientNetB1
from keras.utils import plot_model

model = EfficientNetB1()
plot_model(model, to_file='EfficientNetB1.png')

@WongKinYiu
Collaborator

EfficientNet_b0: efficientnet_b0.cfg.txt - Accuracy: Top1 = 19.3%, Top5 = 40.6% (something goes wrong)

efficientnet_b0_ext

Maybe squeeze and excitation blocks are missing?

@AlexeyAB
Owner

AlexeyAB commented Jun 18, 2019

@WongKinYiu Thanks!

Can you also add model diagram for B4?

Maybe squeeze and excitation blocks are missing?

I think yes, there should be:

image

@AlexeyAB
Owner

@dexception Thanks!

@WongKinYiu
Collaborator

WongKinYiu commented Jun 18, 2019

Model diagram for EfficientNets.

CLICK ME - EfficientNet B0 model diagram

EfficientNetB0

CLICK ME - EfficientNet B1 model diagram

EfficientNetB1

CLICK ME - EfficientNet B2 model diagram

EfficientNetB2

CLICK ME - EfficientNet B3 model diagram

EfficientNetB3

CLICK ME - EfficientNet B4 model diagram

EfficientNetB4

CLICK ME - EfficientNet B5 model diagram

EfficientNetB5

CLICK ME - EfficientNet B6 model diagram

EfficientNetB6

CLICK ME - EfficientNet B7 model diagram

EfficientNetB7

@AlexeyAB
Owner

AlexeyAB commented Jun 19, 2019

@WongKinYiu Thanks!

It seems now it looks like your diagram:
efficientnet_b0.cfg.txt

  • top1 = 69.49%
  • top5 = 89.44%

The model should be trained for at least 1.6 M iterations with learning_rate=0.256 policy=step scale=0.97 step=10000 (an initial learning rate of 0.256 that decays by 0.97 every 2.4 epochs) to achieve Top1 = 76.3%, Top5 = 93.2%
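
To get a feel for this schedule, here is a small Python sketch of a stepwise decay with the values from the line above. It assumes the rate is multiplied by `scale` once every `step` iterations (integer division) and ignores any burn-in; `step_lr` is a hypothetical helper, so check darknet's own learning-rate code before relying on the exact shape.

```python
def step_lr(iteration, base_lr=0.256, scale=0.97, step=10000):
    """Learning rate after `iteration` batches under a stepwise decay.

    Assumption: the decay is applied in discrete steps and there is no
    burn-in (warm-up) phase.
    """
    return base_lr * scale ** (iteration // step)


for it in (0, 10_000, 500_000, 1_600_000):
    print(f"iteration {it:>9,}: lr = {step_lr(it):.6f}")
```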

Trained weights-file, 500 000 iterations with batch=120: https://drive.google.com/open?id=1MvX0skcmg87T_jn8kDf2Oc6raIb56xq9

chart


The only difference:

  • I use [dropout] instead of DropConnect

In your diagrams, Lambda is an [avgpool].

MBConv blocks include:

efficientnet_b0_ext

@WongKinYiu
Collaborator

WongKinYiu commented Jun 19, 2019

@AlexeyAB Good job! And thank you for sharing the cfg file.

I will also implement SNet of ThunderNet as backbone to compare with EfficientNet.

@AlexeyAB
Owner

AlexeyAB commented Jun 19, 2019

@WongKinYiu Yes, it is interesting that SNet+ThunderNet achieved the same accuracy (78.6% mAP@0.5) as Yolo v2, but with a 2-stage detector running at 24 FPS on an ARM CPU: https://paperswithcode.com/sota/object-detection-on-pascal-voc-2007
image

@WongKinYiu
Collaborator

WongKinYiu commented Jun 19, 2019

@AlexeyAB I also want to implement CEM (Context Enhancement Module) and SAM (Spatial Attention Module) of ThunderNet.

CEM + YOLOv3 got 41.2% mAP@0.5 with 2.85 BFLOPs.
CEM + SAM + YOLOv3 got 42.0% mAP@0.5 with 2.90 BFLOPs.

CEM:
image

SAM:
image

Results:
image

@LukeAI

LukeAI commented Jun 20, 2019

I'd be interested in running a trial with efficientnet and sharing the results - do you have a B6 or B7 version of the model? Do I use it in the same way as I would with any of the other cfg files? No need to manually calculate anchors and enter classes in the cfg?

@LukeAI

LukeAI commented Jun 20, 2019

Oh I see - efficientnet is a full Object Detector? But maybe the B7 model with a Yolo head... ?

@dexception
Author

@LukeAI
This is imagenet classification.

@LukeAI

LukeAI commented Jun 20, 2019

Ok so I realise that this is image classification - I have an image classification problem with 7 classes - if necessary I could resize all my images to 32x32 - how could I train/test on my dataset with the .cfg ?

@WongKinYiu
Collaborator

@LukeAI https://pjreddie.com/darknet/train-cifar/

@mdv3101

mdv3101 commented Jun 21, 2019

@AlexeyAB
Nice work on EfficientNet.
If implemented successfully this would give the fastest training and inference time among all implementations.

@dexception
Author

@AlexeyAB
Since we are already discussing the newer models here
#3114

This issue should be merged with this.
Because eventually we will have yolo-head with EfficientNet once the niggles are sorted out.

@AlexeyAB
Owner

AlexeyAB commented Nov 7, 2019

@dexception

  1. The original MixNet uses 4 depthwise conv-layers (3x3, 5x5, 7x7, 9x9) instead of 1 depthwise conv-layer, so it has approximately the same speed. But we can try to use only 4 groups for the MixNet conv-layers (3x3, 5x5, 7x7, 9x9), while EfficientNet and ShuffleNetv2 use 16-512 groups, so this modified MixNet can be faster on GPU.

  2. Mix_net has higher Top1 accuracy and lower Flops - so Mix_net can be faster on CPU: MixNet (Mix_Conv) - 0.360 (0.5) BFlops - 77.0% (71.5%) Top1 #4203

  • ShuffleNetV2 - xxxx params - 0.600 BFlops - 75.4% Top1 - xxxx Top5
  • MixNet-L - 7.3M params - 0.565 BFlops - 78.9% Top1 - 94.2% Top5
  • EfficientNetB1 - 7.8M params - 0.700 BFlops - 78.8% Top1 - 94.4% Top5

mixnet-flops

@syjeon121

syjeon121 commented Nov 21, 2019

This paper might be relevant to this issue for the detection task:
https://arxiv.org/pdf/1911.09070.pdf
EfficientDet: Scalable and Efficient Object Detection

@AlexeyAB
Owner

@syjeon121 Thanks! I added a separate issue: #4346

@dexception
Author

dexception commented Nov 29, 2019

@AlexeyAB
Q1. Can you please highlight which of these architectures use depthwise-convolution layers, so that we know they are for offline usage?

Q2. I would like to know which is the most efficient architecture that doesn't use depthwise convolution?

Also,
According to this link:
https://pypi.org/project/gluoncv2/

If we set the criteria for Min Top 1 Accuracy to 75%
MobileNetV3 L/224/1.0 seems to be better than Mixnet-S.

MobileNetV3 L/224/1.0 vs Mixnet-S
226M FLOPs vs 260M FLOPs
75.37% Top 1 accuracy vs 75.68% Top 1 accuracy.

@AlexeyAB
Owner

AlexeyAB commented Nov 29, 2019

@dexception

  1. All these models use grouped/depthwise-conv: ShuffleNet v1/v2, EfficientNet b0-b7, MixNet-S/M/L, MobileNet v2/v3, ...

What is the offline usage ?

  2. Do you have any paper with a comparison of MixNet vs MobileNetv3?

As I understand it, these are very close architectures. Yes, according to https://pypi.org/project/gluoncv2/, MobileNetV3 L/224/1.0 looks better than MixNet-S: it has 14% lower BFLOPS, although its Top5 error is 4% higher (7.69 / 7.39).

Model | Top1 | Top5 | Params | FLOPs/2 | Remarks
AlexNet | 40.46 | 17.70 | 62,378,344 | 1,132.33M | Training (log)
MixNet-S | 24.32 | 7.39 | 4,134,606 | 260.26M |
MixNet-M | 23.31 | 6.78 | 5,014,382 | 366.05M |
MobileNetV3 L/224/1.0 | 24.63 | 7.69 | 5,481,752 | 226.80M |

68056773-08741580-fd05-11e9-8c69-c2d60749d031

@LukeAI

LukeAI commented Nov 29, 2019

Is there a darknet implementation of MobileNetV3?

@dexception
Author

dexception commented Nov 29, 2019

> @dexception
>
> 1. All these models use grouped/depthwise-conv: ShuffleNet v1/v2, EfficientNet b0-b7, MixNet-S/M/L, MobileNet v2/v3, ...
>
> What is the offline usage ?
>
> 2. Do you have any paper with comparison MixNet vs MobileNetv3?
>
> As I understand it, these are very close architectures. Yes, according to https://pypi.org/project/gluoncv2/ MobileNetV3 L/224/1.0 (14% faster and 4% = 7.69 / 7.39 higher Top5-error) better than MixNet-S
>
> | Model | Top1 | Top5 | Params | FLOPs/2 | Remarks |
> |---|---|---|---|---|---|
> | AlexNet | 40.46 | 17.70 | 62,378,344 | 1,132.33M | Training (log) |
> | MixNet-S | 24.32 | 7.39 | 4,134,606 | 260.26M | |
> | MixNet-M | 23.31 | 6.78 | 5,014,382 | 366.05M | |
> | MobileNetV3 L/224/1.0 | 24.63 | 7.69 | 5,481,752 | 226.80M | |

By offline usage I meant cases where the user can wait for the output even if it arrives a little late, where accuracy matters more, and where we can use heavy models for detection.

Can you share which architecture is the most efficient, with a minimum Top 1 accuracy of 75%, if we exclude any that use depthwise convolution?

@AlexeyAB
Owner

AlexeyAB commented Nov 29, 2019

@LukeAI No.

@dexception

  1. First, you should know that these accuracies (MobileNetV3 Top1 75.37%, MixNet-S Top1 75.68% and EfficientNetB0 Top1 76.3%) can be achieved only if you train with a very large mini_batch size:

With a small mini_batch size, instead of Top1 76.3% we get: #3380 (comment)

  • Our EfficientNet B0 (224x224) 0.9 BFLOPS - 0.45 B_FMA (16ms / RTX 2070), 4.9M params - 71.3% Top1
  • Official EfficientNetB0 (224x224) 0.78 BFLOPS - 0.39 FMA, 5.3M params - 70.0% Top1
  2. Lower BFLOPS doesn't mean a faster model

So:

  1. to compare apples with apples we should compare results with the same mini_batch size
  2. if we are talking about speed rather than BFLOPS, then the best models:

More: #4406

@dexception
Author

dexception commented Nov 30, 2019

@AlexeyAB
Thanks for sharing the stats.
Do you have any cfg files related to PeleeNet with swish activation? I would like to train it on ImageNet.

@WongKinYiu
Amazing work. Are you planning to share the CSPPeleeNet - EFM (SAM) 512x512 model in the future?

@WongKinYiu
Collaborator

https://arxiv.org/abs/1911.11907
https://github.com/iamhankai/ghostnet

This may be better than MobileNetV3, EfficientNet, etc.

@dexception
Author

@AlexeyAB @WongKinYiu
It's funny, but I thought I should mention this: I have temporary access to 2x Tesla T4 for the next 10-20 days.
After that I have to return them, so I would love to train some of these new models you're talking about. Can I have some cfg files for ImageNet from you guys?

@AlexeyAB
Owner

@dexception

CSPPeleeNet is still private, and GhostNet isn't implemented yet.

In the meantime there are MixNet-M and MixNet-M-GPU, which have approximately the same architecture, accuracy (~76.5% Top1) and speed (~0.38 FMA BFlops) as large MobileNet v3:

  • MixNet-M: mixnet_m.cfg.txt - 0.759 BFlops (0.379 FMA) - 4.6 sec per iteration training - 45 ms inference

  • MixNet-M-GPU (minor modification for GPU): mixnet_m_gpu.cfg.txt - 1.0 BFlops (0.500 FMA) - 2.7 sec per iteration training - 45 ms inference

Now @WongKinYiu is training MixNet-M-GPU: #4203 (comment)

Or you can try to train with CutMix and Large mini-batch size https://github.com/WongKinYiu/CrossStagePartialNetworks#big-models


For example, you can try to train https://github.com/AlexeyAB/darknet/files/3838329/mixnet_m.cfg.txt

[net]
batch=120
subdivisions=2
height=224
width=224
cutmix=1

[net]
batch=256 # or 512
subdivisions=1
height=224
width=224
cutmix=1
optimized_memory=3
workspace_size_limit_MB=2000 # or 4000
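
The two [net] blocks above differ mainly in how many images go through the network at once. In darknet, `batch` images contribute to one weight update, and they are processed in chunks of `batch / subdivisions`. The sketch below just parses the two blocks as plain text and prints that chunk size. The helper names `parse_net_block` and `mini_batch` are invented for illustration and are not part of darknet.

```python
def parse_net_block(text):
    """Parse 'key=value' lines of a [net] block, dropping '#' comments."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # strip notes such as "# or 512"
        if not line or line.startswith("["):
            continue  # skip blank lines and the section header
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


def mini_batch(params):
    """Images processed per forward/backward pass: batch / subdivisions."""
    return int(params["batch"]) // int(params["subdivisions"])


FIRST = """
[net]
batch=120
subdivisions=2
height=224
width=224
cutmix=1
"""

SECOND = """
[net]
batch=256 # or 512
subdivisions=1
height=224
width=224
cutmix=1
optimized_memory=3
workspace_size_limit_MB=2000 # or 4000
"""

first = parse_net_block(FIRST)
second = parse_net_block(SECOND)

for name, params in (("first block", first), ("second block", second)):
    print(f"{name}: {mini_batch(params)} images per forward pass")
```

The second block pushes all 256 images through in one pass, which is why it also sets `optimized_memory` and `workspace_size_limit_MB` to keep GPU memory usage under control.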

@dexception
Author

@AlexeyAB
What about PeleeNet ? Do you have any cfg files for this ?

@hfassold

hfassold commented Dec 1, 2019

This paper might also help a bit: https://arxiv.org/pdf/1904.11486
The modification can be applied to any backbone.

@WongKinYiu
Collaborator

@hfassold

done in #3672

@AlexeyAB AlexeyAB removed the ToDo RoadMap label Dec 2, 2019
@litingsjj

@AlexeyAB Hi, about enet-coco.cfg:
139 upsample 2x 13 x 13 x 128 -> 26 x 26 x 128
140 Shortcut Layer: 90
I got this message:
139 upsample 2x 17 x 17 x 128 -> 34 x 34 x 128
140 Shortcut Layer: 90 w = 34, w2 = 34, h = 34, h2 = 34, c = 128, c2 = 576
Is there something wrong in the cfg file?

@kinglintianxia

@AlexeyAB Hi, in enet-coco.cfg, the last yolo layer has mask = 1,2,3.
Should it be mask = 0,1,2?

@Fetulhak

@AlexeyAB @WongKinYiu enet-coco.cfg trains much more slowly than yolov4.cfg. I am running both models in Google Colab. Why is that? I have seen that the enet-coco model has far fewer BFLOPs than yolov4.

@AlexeyAB
Owner

@Fetulhak It is because FLOPs usually don't mean anything for speed on most devices: mobile (GPU, NPU, APU, DSP), embedded (Jetson), and mid/high-end GPUs (RTX 3070, A100, ...).

FLOPs describe the computational cost, but they say nothing about how well the work can be parallelized or how much memory bandwidth it needs, and poor parallelization or high memory-bandwidth requirements can slow a model down many times over.

For example, EfficientDetD3 has 2.4x fewer FLOPs but is about 2x slower:

  • YOLOv4-CSP (640x640) — 47.5% AP — 70 FPS — 120 BFlops (60 FMA)
  • EfficientDetD3 (896x896) — 47.5% AP — 36 FPS — 50 BFlops (25 FMA)
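
The gap between FLOPs and real speed in the two bullets above can be made explicit with a few lines of arithmetic. This is only a sketch using the FPS and BFlops figures quoted above. The "achieved GFLOP/s" value is simply FPS times BFlops per frame, not a measured hardware counter.

```python
# Figures copied from the two bullets above.
detectors = {
    "YOLOv4-CSP (640x640)": {"fps": 70, "bflops": 120},
    "EfficientDetD3 (896x896)": {"fps": 36, "bflops": 50},
}

yolo = detectors["YOLOv4-CSP (640x640)"]
effdet = detectors["EfficientDetD3 (896x896)"]

# How many times fewer FLOPs per frame EfficientDetD3 needs.
flops_ratio = yolo["bflops"] / effdet["bflops"]
# How many times faster YOLOv4-CSP runs anyway.
speed_ratio = yolo["fps"] / effdet["fps"]

print(f"EfficientDetD3 needs {flops_ratio:.1f}x fewer FLOPs per frame")
print(f"YOLOv4-CSP still runs {speed_ratio:.2f}x faster")

# Work actually completed per second: frames/s * BFlops/frame = GFLOP/s.
for name, d in detectors.items():
    achieved = d["fps"] * d["bflops"]
    print(f"{name}: about {achieved} GFLOP/s achieved")
```

On the same GPU, the model with more FLOPs gets several times more useful work done per second, which is the sense in which FLOPs alone do not predict speed.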

@Fetulhak

Fetulhak commented Nov 10, 2021

@AlexeyAB Is it because the network architectures differ (enet-coco vs yolov4), so that the possibility of parallelization and the need for memory bandwidth also differ? I ask because I am using the same GPU to train both models.

@AlexeyAB
Owner

AlexeyAB commented Nov 10, 2021

@Fetulhak Yes, it is because of their network architecture difference. EfficientNet uses:

  • Depth-wise convolutional layers - they require a lot of memory bandwidth (like a regular conv), but a low number of FLOPs (far fewer than a regular conv)
  • Squeeze-and-excitation blocks - they require a low number of FLOPs, but can't be parallelized well
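
To illustrate the depth-wise point above, here is a rough sketch comparing a regular 3x3 conv with a depth-wise 3x3 conv on the same feature map. It assumes stride 1, "same" padding, fp32 tensors, and an idealized memory model in which the input, output and weights are each moved exactly once. The layer size (56x56x128) and the function name `conv_cost` are picked only for illustration.

```python
def conv_cost(h, w, c_in, c_out, k, depthwise=False, bytes_per_elem=4):
    """Return (flops, bytes_moved) for one stride-1, same-padded conv layer."""
    if depthwise:
        if c_in != c_out:
            raise ValueError("depth-wise conv keeps the channel count")
        macs = h * w * c_in * k * k          # one k*k filter per channel
        weights = c_in * k * k
    else:
        macs = h * w * c_in * c_out * k * k  # every output sees every input
        weights = c_in * c_out * k * k
    flops = 2 * macs                         # 1 FMA = 2 FLOPs
    elems = h * w * c_in + h * w * c_out + weights
    return flops, elems * bytes_per_elem


regular = conv_cost(56, 56, 128, 128, 3)
depthwise = conv_cost(56, 56, 128, 128, 3, depthwise=True)

for name, (flops, nbytes) in (("regular", regular), ("depth-wise", depthwise)):
    print(f"{name:10s} {flops / 1e6:9.1f} MFLOPs  "
          f"{nbytes / 1e6:5.2f} MB moved  "
          f"{flops / nbytes:7.1f} FLOPs per byte")
```

Under these assumptions both layers move roughly the same amount of data, but the depth-wise layer does about 128x less arithmetic per byte, so it is much more likely to be limited by memory bandwidth than by compute.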
