This is the Keras implementation of deepinsight/insightface, released under the MIT License. There is no restriction on academic or commercial usage.
The training data containing the annotation (and the models trained with these data) are available for non-commercial research purposes only.
- Current accuracy
- Usage
- Sub Center ArcFace
- Knowledge distillation
- Evaluating on IJB datasets
- TFLite model inference time test on ARM64
- Related Projects
- Citing
- Some comparing on EfficientNetV2_b0 with activation / data augmentation / loss function / others
- Model structures may change due to changing default behavior of building models.
- `IJBB` and `IJBC` are scored at `TAR@FAR=1e-4`
- Links in `Model backbone` are `h5` models in Google drive. Links in `Training` are training details.
- The `r18` / `r34` / `r50` / `r100` on `glint360k` are models with weights loaded from the official publication.
- `r50 magface` and `r100 magface` are ported from Github IrvingMeng/MagFace.
- `r100 4m adaface` and `r100 12m adaface` are ported from Github mk-minchul/AdaFace.
- Please note `WebFace4M` / `WebFace12M` pretrained models cannot be used for any commercial purposes: WebFace.
Model backbone | Training | lfw | cfp_fp | agedb_30 | IJBB | IJBC |
---|---|---|---|---|---|---|
Resnet34 | CASIA, E40 | 0.994667 | 0.949143 | 0.9495 | | |
Mobilenet emb256 | Emore,E110 | 0.996000 | 0.951714 | 0.959333 | 0.887147 | 0.911745 |
Mobilenet distill | MS1MV3,E50 | 0.997333 | 0.969 | 0.975333 | 0.91889 | 0.940328 |
se_mobile_facenet | MS1MV3,E50 | 0.997333 | 0.969286 | 0.973000 | 0.922103 | 0.941913 |
Ghostnet,S2,swish | MS1MV3,E50 | 0.997333 | 0.966143 | 0.973667 | 0.923661 | 0.941402 |
Ghostnet,S1,swish | MS1MV3,E67 | 0.997500 | 0.981429 | 0.978167 | 0.93739 | 0.953163 |
EfficientNetV2B0 | MS1MV3,E67 | 0.997833 | 0.976571 | 0.977333 | 0.940701 | 0.955259 |
Botnet50 relu GDC | MS1MV3,E52 | 0.9985 | 0.980286 | 0.979667 | 0.940019 | 0.95577 |
r50 swish | MS1MV3,E50 | 0.998333 | 0.989571 | 0.984333 | 0.950828 | 0.964463 |
se_r50 swish SD | MS1MV3,E67 | 0.9985 | 0.989429 | 0.9840 | 0.956378 | 0.968144 |
Resnet101V2 swish | MS1MV3,E50 | 0.9985 | 0.989143 | 0.9845 | 0.952483 | 0.966406 |
EfficientNetV2S | MS1MV3,E67 | 0.9985 | 0.991143 | 0.986167 | 0.956475 | 0.968605 |
EffV2S,AdamW | MS1MV3,E53 | 0.998500 | 0.991429 | 0.985833 | 0.957449 | 0.97065 |
EffV2S,MagFace | MS1MV3,E53 | 0.998500 | 0.991571 | 0.984667 | 0.958325 | 0.971212 |
r100,AdaFace | MS1MV3,E53 | 0.998667 | 0.992286 | 0.984333 | 0.961636 | 0.972849 |
r100,AdaFace | Glint360k,E53 | 0.998500 | 0.993000 | 0.986000 | 0.962415 | 0.974843 |
Ported Models | | | | | | |
r18 converted | Glint360k | 0.997500 | 0.977143 | 0.976500 | 0.936806 | 0.9533 |
r34 converted | Glint360k | 0.998167 | 0.987000 | 0.982833 | 0.951801 | 0.9656 |
r50 converted | Glint360k | 0.998333 | 0.991 | 0.9835 | 0.957157 | 0.970292 |
r100 converted | Glint360k | 0.9985 | 0.992286 | 0.985167 | 0.962512 | 0.974689 |
r50 magface | MS1MV2,E25 | 0.998167 | 0.981143 | 0.980500 | 0.943622 | |
r100 magface | MS1MV2,E25 | 0.998333 | 0.987429 | 0.983333 | 0.949562 | |
r100 4m AdaFace | WebFace4M,E26 | 0.998333 | 0.992857 | 0.978833 | 0.960954 | 0.974485 |
r100 12m AdaFace | WebFace12M,E26 | 0.998500 | 0.993286 | 0.981667 | 0.964752 | 0.977451 |
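The `IJBB` / `IJBC` columns above report `TAR@FAR=1e-4`, that is, the true accept rate at a threshold where at most 1 in 10000 impostor pairs is accepted. The following is a minimal pure-Python sketch of that computation on toy scores. The function name `tar_at_far` and the tie handling are assumptions for illustration only, it is not the code used by `IJB_evals.py`.

```python
def tar_at_far(scores, labels, far_target=1e-4):
    """Return (TAR, threshold) for the strictest threshold with FAR <= far_target.

    scores: similarity score of each pair, higher means more similar.
    labels: True for a genuine pair (same person), False for an impostor pair.
    """
    negatives = sorted((s for s, same in zip(scores, labels) if not same), reverse=True)
    positives = [s for s, same in zip(scores, labels) if same]
    if not negatives or not positives:
        raise ValueError("need at least one genuine and one impostor pair")
    # Number of impostor pairs that may be accepted at this FAR.
    allowed = int(far_target * len(negatives))
    # A pair is accepted only if its score is strictly above the threshold,
    # so the threshold is the score of the first impostor that must be rejected.
    threshold = negatives[allowed] if allowed < len(negatives) else float("-inf")
    tar = sum(s > threshold for s in positives) / len(positives)
    return tar, threshold


scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1]
labels = [True, True, True, False, False, False]
print(tar_at_far(scores, labels, far_target=0.0))
```

With only three impostor pairs the toy example cannot resolve a FAR of `1e-4`, real IJB evaluation uses millions of pairs.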
- Currently using `Tensorflow 2.9.1` with `cuda==11.2` `cudnn==8.1`
- python and tensorflow version
    # $ ipython
    # Python 3.8.5 (default, Sep 4 2020, 07:30:14)
    >>> tf.__version__
    # '2.9.1'
    >>> import tensorflow_addons as tfa
    >>> tfa.__version__
    Out[3]: '0.17.0'
  Or `tf-nightly`
    conda create -n tf-nightly python==3.8.5
    conda activate tf-nightly
    pip install tf-nightly tfa-nightly glob2 pandas tqdm scikit-image scikit-learn ipython
    # Not required
    pip install pip-search icecream opencv-python cupy-cuda112 tensorflow-datasets tabulate mxnet-cu112 torch
- Default import for ipython
    import os
    import sys
    import pandas as pd
    import numpy as np
    import tensorflow as tf
    from tensorflow import keras

    gpus = tf.config.experimental.list_physical_devices("GPU")
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
- All from scratch #71 is an explanation of the basic implementation line by line from scratch, depending only on basic packages like `tensorflow` / `numpy`.
- Training Data in this project is downloaded from Insightface Dataset Zoo
- Evaluating data is the `LFW` `CFP-FP` `AgeDB-30` bin files included in the `MS1M-ArcFace` dataset
- Any other data can also be used, as long as it is in the right format
- prepare_data.py script extracts data from mxnet record format to `folders`.
    # Convert `/datasets/faces_emore` to `/datasets/faces_emore_112x112_folders`
    CUDA_VISIBLE_DEVICES='-1' ./prepare_data.py -D /datasets/faces_emore
    # Convert evaluating bin files
    CUDA_VISIBLE_DEVICES='-1' ./prepare_data.py -D /datasets/faces_emore -T lfw.bin cfp_fp.bin agedb_30.bin
  Executing again will skip `dataset` conversion.
- Training dataset required is a `folder` including `person folders`, each `person folder` including multiple `face images`. Format like
    . # dataset folder
    ├── 0 # person folder
    │   ├── 100.jpg # face image
    │   ├── 101.jpg # face image
    │   └── 102.jpg # face image
    ├── 1 # person folder
    │   ├── 111.jpg
    │   ├── 112.jpg
    │   └── 113.jpg
    ├── 10
    │   ├── 707.jpg
    │   ├── 708.jpg
    │   └── 709.jpg
- Evaluating bin files include jpeg image data pairs, and a label indicating whether each pair is the same person, so there are twice as many images as labels
    # bins | issame_list
    img_1 img_2 | True
    img_3 img_4 | True
    img_5 img_6 | False
    img_7 img_8 | False
  Image data in bin files like `CFP-FP` `AgeDB-30` is not compatible with `tf.image.decode_jpeg`, so it needs to be reformatted, which is done by the `-T` parameter.
    ''' Throw error if not reformatted yet '''
    ValueError: Can't convert non-rectangular Python sequence to Tensor.
- Custom dataset: if it is in the same format as the required training dataset, meaning a dataset folder containing `person folders`, and each `person folder` containing `face images`, may run
    # For dataset folder name `/dataset/Foo`
    CUDA_VISIBLE_DEVICES='0' ./face_detector.py /dataset/Foo
  to detect and align face images. Target saving directory will be `/dataset/Foo_aligned_112_112`. Then this one can be used as `data_path` for `train.Train`.
- Cache file `{dataset_name}_shuffle.npz` is saved in first time training. Remove it if dataset content changed.
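The folder layout described above (one sub-folder per person, each holding that person's face images) can be turned into parallel lists of image paths and integer labels. This is a minimal sketch under stated assumptions: `scan_dataset_folder` is a hypothetical helper, and labelling persons by sorted folder order is an assumption, `data.py` in this project may assign labels differently.

```python
import os


def scan_dataset_folder(data_path, extensions=(".jpg", ".jpeg", ".png")):
    """Walk `data_path/<person>/<image>` and return (image_paths, labels)."""
    image_paths, labels = [], []
    # Each sub-directory is one person; plain files at the top level are ignored.
    person_dirs = sorted(
        name for name in os.listdir(data_path)
        if os.path.isdir(os.path.join(data_path, name))
    )
    for label, person in enumerate(person_dirs):
        person_path = os.path.join(data_path, person)
        for file_name in sorted(os.listdir(person_path)):
            if file_name.lower().endswith(extensions):
                image_paths.append(os.path.join(person_path, file_name))
                labels.append(label)
    return image_paths, labels
```

Usage would be along the lines of `image_paths, labels = scan_dataset_folder('/datasets/faces_emore_112x112_folders')`, after which `max(labels) + 1` is the number of classes.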
- Basic Modules
  - backbones basic model implementation of `mobilefacenet` / `mobilenetv3` / `efficientnet` / `botnet` / `ghostnet`. Most of them are copied from `keras.applications` source code and modified. Other backbones like `ResNet101V2` are loaded from `keras.applications` in `models.buildin_models`.
  - data.py loads image data as `tf.dataset` for training. `Triplet` dataset is different from others.
  - evals.py contains evaluating callback using `bin` files.
  - losses.py contains `softmax` / `arcface` / `centerloss` / `triplet` loss functions.
  - myCallbacks.py contains my other callbacks, like saving model / learning rate adjusting / save history.
  - models.py contains model build related functions, like `buildin_models` / `add_l2_regularizer_2_model` / `replace_ReLU_with_PReLU`.
  - train.py contains a `Train` class. It uses a `scheduler` to connect different `loss` / `optimizer` / `epochs`. The basic function is simply `basic_model` --> `build dataset` --> `add output layer` --> `add callbacks` --> `compile` --> `fit`.
- Other Modules
  - augment.py includes implementation of `RandAug` and `AutoAug`.
  - IJB_evals.py evaluates model accuracy using insightface/evaluation/IJB/ datasets.
  - data_distiller.py creates dataset for Knowledge distillation.
  - data_drop_top_k.py creates dataset after training with the Sub Center ArcFace method.
  - eval_folder.py runs model evaluation on any custom dataset folder, which is in the same format as the Training dataset.
  - face_detector.py contains face detectors. Currently 2 added, pure Keras one `YoloV5FaceDetector`, and ONNX one `SCRFD`.
  - plot.py contains a history plot function.
  - image_video_test.py can be used to test model using images or video camera inputs.
-
Training example
`train.Train` mostly functions as a scheduler.
    from tensorflow import keras
    import losses, train, models
    import tensorflow_addons as tfa

    # basic_model = models.buildin_models("ResNet101V2", dropout=0.4, emb_shape=512, output_layer="E")
    basic_model = models.buildin_models("MobileNet", dropout=0, emb_shape=256, output_layer="GDC")
    data_path = '/datasets/faces_emore_112x112_folders'
    eval_paths = ['/datasets/faces_emore/lfw.bin', '/datasets/faces_emore/cfp_fp.bin', '/datasets/faces_emore/agedb_30.bin']

    tt = train.Train(data_path, save_path='keras_mobilenet_emore.h5', eval_paths=eval_paths,
                     basic_model=basic_model, batch_size=512, random_status=0,
                     lr_base=0.1, lr_decay=0.5, lr_decay_steps=16, lr_min=1e-5)
    optimizer = tfa.optimizers.SGDW(learning_rate=0.1, momentum=0.9, weight_decay=5e-5)
    sch = [
        {"loss": losses.ArcfaceLoss(scale=16), "epoch": 5, "optimizer": optimizer},
        {"loss": losses.ArcfaceLoss(scale=32), "epoch": 5},
        {"loss": losses.ArcfaceLoss(scale=64), "epoch": 40},
        # {"loss": losses.ArcfaceLoss(), "epoch": 20, "triplet": 64, "alpha": 0.35},
    ]
    tt.train(sch, 0)
May use `tt.train_single_scheduler` to control the behavior in more detail.
-
Model basically contains two parts:
- Basic model is layers from `input` to `embedding`.
- Model is `Basic model` + `bottleneck` layer, like `softmax` / `arcface` layer. For triplet training, `Model` == `Basic model`. For combined `loss` training, it may have multiple outputs.
-
Saving strategy
- Model will save the latest one on every epoch end to local path `./checkpoints`, name is specified by `train.Train` `save_path`.
- basic_model will be saved monitoring on the last `eval_paths` evaluating `bin` item, and save the best only.
-
train.Train model parameters include `basic_model` / `model`. Combine them to initialize model from different sources. Sometimes may need `custom_objects` to load model.
basic_model | model | Used for |
---|---|---|
model structure | None | Scratch train |
basic model .h5 file | None | Continue training from a saved basic model |
None for 'embedding' layer or layer index of basic model output | model .h5 file | Continue training from last saved model |
None for 'embedding' layer or layer index of basic model output | model structure | Continue training from a modified model |
None | None | Reload model from "checkpoints/{save_path}" |
-
Scheduler is a list of dicts, each containing a training plan
- epoch indicates how many epochs will be trained. Required.
- loss indicates the loss function. If not provided, will try to use the previous one if `model.built` is `True`.
- optimizer is the optimizer used in this plan, `None` indicates using the last one.
- bottleneckOnly True / False, `True` will set `basic_model.trainable = False`, train the output layer only.
- centerloss float value, if set a non zero value, attach a `CenterLoss` to `logits_loss`, and the value means `loss_weight`.
- triplet float value, if set a non zero value, attach a `BatchHardTripletLoss` to `logits_loss`, and the value means `loss_weight`.
- alpha float value, default to `0.35`. Alpha value for `BatchHardTripletLoss` if attached.
- lossTopK indicates the `top K` value for Sub Center ArcFace method.
- distill indicates the `loss_weight` for `distiller_loss` using Knowledge distillation, default `7`.
- type `softmax` / `arcface` / `triplet` / `center`, but mostly this could be guessed from `loss`.
    # Scheduler examples
    sch = [
        {"loss": losses.scale_softmax, "optimizer": "adam", "epoch": 2},
        {"loss": keras.losses.CategoricalCrossentropy(label_smoothing=0.1), "centerloss": 0.01, "epoch": 2},
        {"loss": losses.ArcfaceLoss(scale=32.0, label_smoothing=0.1), "optimizer": keras.optimizers.SGD(0.1, momentum=0.9), "epoch": 2},
        {"loss": losses.BatchAllTripletLoss(0.3), "epoch": 2},
        {"loss": losses.BatchHardTripletLoss(0.25), "epoch": 2},
        {"loss": losses.CenterLoss(num_classes=85742, emb_shape=256), "epoch": 2},
        {"loss": losses.CurricularFaceLoss(), "epoch": 2},
    ]
Some more complicated combinations are also supported.
    # `softmax` + `centerloss`, `"centerloss": 0.1` means loss_weight
    sch = [{"loss": keras.losses.CategoricalCrossentropy(label_smoothing=0.1), "centerloss": 0.1, "epoch": 2}]
    # `softmax` / `arcface` + `triplet`, `"triplet": 64` means loss_weight
    sch = [{"loss": losses.ArcfaceLoss(scale=64), "triplet": 64, "alpha": 0.3, "epoch": 2}]
    # `triplet` + `centerloss`
    sch = [{"loss": losses.BatchHardTripletLoss(0.25), "centerloss": 0.01, "epoch": 2}]
    sch = [{"loss": losses.CenterLoss(num_classes=85742, emb_shape=256), "triplet": 10, "alpha": 0.25, "epoch": 2}]
    # `softmax` / `arcface` + `triplet` + `centerloss`
    sch = [{"loss": losses.ArcfaceLoss(), "centerloss": 1, "triplet": 32, "alpha": 0.2, "epoch": 2}]
-
Restore training from break point
    from tensorflow import keras
    import losses, train

    data_path = '/datasets/faces_emore_112x112_folders'
    eval_paths = ['/datasets/faces_emore/lfw.bin', '/datasets/faces_emore/cfp_fp.bin', '/datasets/faces_emore/agedb_30.bin']
    tt = train.Train(data_path, 'keras_mobilenet_emore.h5', eval_paths, model='./checkpoints/keras_mobilenet_emore.h5',
                     batch_size=512, random_status=0, lr_base=0.1, lr_decay=0.5, lr_decay_steps=16, lr_min=1e-5)
    sch = [
        # {"loss": losses.ArcfaceLoss(scale=16), "epoch": 5, "optimizer": optimizer},
        # {"loss": losses.ArcfaceLoss(scale=32), "epoch": 5},
        {"loss": losses.ArcfaceLoss(scale=64), "epoch": 35},
        # {"loss": losses.ArcfaceLoss(), "epoch": 20, "triplet": 64, "alpha": 0.35},
    ]
    tt.train(sch, initial_epoch=15)
-
Evaluation
    from tensorflow import keras
    import evals

    basic_model = keras.models.load_model('checkpoints/keras_mobilefacenet_256_basic_agedb_30_epoch_39_0.942500.h5', compile=False)
    ee = evals.eval_callback(basic_model, '/datasets/faces_emore/lfw.bin')
    ee.on_epoch_end(0)
    # >>>> lfw evaluation max accuracy: 0.993167, thresh: 0.316535, previous max accuracy: 0.000000, PCA accuray = 0.993167 ± 0.003905
    # >>>> Improved = 0.993167
For the training process, the default evaluating strategy is `on_epoch_end`. Setting an `eval_freq` greater than `1` in `train.Train` will also add an `on_batch_end` evaluation.
    # Change evaluating strategy to `on_epoch_end`, as well as `on_batch_end` every `1000` batches.
    tt = train.Train(data_path, 'keras_mobilefacenet_256.h5', eval_paths, basic_model=basic_model, eval_freq=1000)
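The `max accuracy` / `thresh` values in the evaluation log above come from comparing the two images of each bin-file pair. As a rough illustration only, the sketch below scores consecutive embedding pairs by cosine similarity and sweeps the threshold for the best verification accuracy. The function names are hypothetical, and the real `evals.eval_callback` may differ in details such as normalization, fold splitting or the threshold grid.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def best_threshold_accuracy(embeddings, issame_list):
    """Embeddings are stored pairwise: (0, 1) is the first pair, (2, 3) the second, ..."""
    if len(embeddings) != 2 * len(issame_list):
        raise ValueError("expected exactly two embeddings per label")
    sims = [
        cosine_similarity(embeddings[2 * i], embeddings[2 * i + 1])
        for i in range(len(issame_list))
    ]
    best_acc, best_thresh = 0.0, None
    # Every observed similarity is a candidate threshold.
    for thresh in sorted(sims):
        correct = sum((sim >= thresh) == same for sim, same in zip(sims, issame_list))
        acc = correct / len(issame_list)
        if acc > best_acc:
            best_acc, best_thresh = acc, thresh
    return best_acc, best_thresh


embeddings = [[1, 0], [1, 0.1], [1, 0], [0, 1], [0, 1], [0.1, 1], [1, 0], [-1, 0]]
issame_list = [True, False, True, False]
print(best_threshold_accuracy(embeddings, issame_list))
```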
- train.Train output_weight_decay controls `L2 regularizer` value added to `output_layer`.
  - `0` for None.
  - `(0, 1)` for specific value, actual added value will also be divided by `2`.
  - `>= 1` will be value multiplied by `L2 regularizer` value in `basic_model` if added.
- train.Train random_status controls data augmentation weights.
  - `-1` will disable all augmentation.
  - `0` will apply `random_flip_left_right` only.
  - `1` will also apply `random_brightness`.
  - `2` will also apply `random_contrast` and `random_saturation`.
  - `3` will also apply `random_crop`.
  - `>= 100` will apply `RandAugment` with `magnitude = 5 * random_status / 100`, so `random_status=100` means using `RandAugment` with `magnitude=5`.
- train.Train random_cutout_mask_area sets the ratio of images whose bottom `2/5` area is randomly cut out, which is regarded as ignoring the mask area.
- train.Train partial_fc_split set an int number like `2` / `4`, will build model and dataset with total classes split in `partial_fc_split` parts. Works also on a single GPU. Currently only the `ArcFace` loss family like `ArcFace` / `AirFaceLoss` / `CosFaceLoss` / `MagFaceLoss` supports it. Still under testing.
- models.buildin_models is mainly for adding output feature layer `GDC` / `E` or others to a backbone model. The first parameter `stem_model` can be:
  - String like `MobileNet` / `r50` / `ResNet50` or other names printed by `models.print_buildin_models()`.
  - Self built `keras.models.Model` instance. Like `keras.applications.MobileNet(input_shape=(112, 112, 3), include_top=False)`.
- models.add_l2_regularizer_2_model will add `l2_regularizer` to `dense` / `convolution` layers, or set `apply_to_batch_normal=True` to also add it to `PReLU` / `BatchNormalization` layers. The actual added `l2` value is divided by `2`.
    # Will add keras.regularizers.L2(5e-4) to `dense` / `convolution` layers.
    basic_model = models.add_l2_regularizer_2_model(basic_model, 1e-3, apply_to_batch_normal=False)
- Gently stop is a callback to stop training gently. Input an `n` and `<Enter>` anytime during training, and training will stop when that epoch ends.
- My history
  - This is a callback collecting training `loss`, `accuracy` and `evaluating accuracy`.
  - On every epoch end, backup to the path `save_path` defined in `train.Train` with suffix `_hist.json`.
  - Reload when initializing, if the backup `<save_path>_hist.json` file exists.
  - The saved `_hist.json` can be used for plotting using `plot.py`.
- eval_folder.py is used for testing evaluating accuracy on a custom test dataset:
    CUDA_VISIBLE_DEVICES='0' ./eval_folder.py -d {DATA_PATH} -m {BASIC_MODEL.h5}
  Or create own test bin file which can be used in `train.Train` `eval_paths`:
    CUDA_VISIBLE_DEVICES='0' ./eval_folder.py -d {DATA_PATH} -m {BASIC_MODEL.h5} -B {BIN_FILE.bin}
- image_video_test.py is used for testing model with either images or video inputs. May import or modify it for own usage.
    """ Comparing images """
    python image_video_test.py --images test1.jpg test2.jpg test3.jpg
    # >>>> image_path: test1.jpg, faces count: 1
    # >>>> image_path: test2.jpg, faces count: 1
    # >>>> image_path: test3.jpg, faces count: 1
    # cosine_similarities:
    # [[1.0000001 1.0000001 1.0000001]
    # [1.0000001 1.0000001 1.0000001]
    # [1.0000001 1.0000001 1.0000001]]

    """ Search in known users """
    python image_video_test.py --images test.jpg --known_users test
    # >>>> image_classes info:
    # 0 10
    # 1 10
    # ...
    # recognition_similarities: [0.47837412]
    # recognition_classes: ['9']
    # bbs: [[176.56265 54.588932 272.8746 181.40137 ]]
    # ccs: [0.8820559]
    # >>>> Saving result to: test_recognition_result.jpg

    """ Video test """
    python image_video_test.py --known_users test --video_source 0
-
`train.Train` parameters `lr_base` / `lr_decay` / `lr_decay_steps` / `lr_warmup_steps` set different decay strategies and their parameters.
-
`tt.lr_scheduler` can also be used to set learning rate scheduler directly.
    tt = train.Train(...)
    import myCallbacks
    tt.lr_scheduler = myCallbacks.CosineLrSchedulerEpoch(lr_base=1e-3, first_restart_step=16, warmup_steps=3)
-
lr_decay_steps controls different decay types.
- Default is `Exponential decay` with `lr_base=0.001, lr_decay=0.05`.
- For `CosineLrScheduler`, `steps_per_epoch` is set after the dataset has been initialized.
- For `CosineLrScheduler`, default value of `cooldown_steps=1`, means will train `1 epoch` using `lr_min` before each restart.
lr_decay_steps | decay type | mean of lr_decay_steps | mean of lr_decay |
---|---|---|---|
<= 1 | Exponential decay | | decay_rate |
> 1 | Cosine decay, will multiply with steps_per_epoch | first_restart_step, epoch | m_mul |
list | Constant decay | lr_decay_steps | decay_rate |
    # lr_decay_steps == 0, Exponential
    tt = train.Train(..., lr_base=0.001, lr_decay=0.05, ...)
    # 1 < lr_decay_steps, Cosine decay, first_restart_step = lr_decay_steps * steps_per_epoch
    # restart on epoch [16 * 1 + 1, 16 * 3 + 2, 16 * 7 + 3] == [17, 50, 115]
    tt = train.Train(..., lr_base=0.001, lr_decay=0.5, lr_decay_steps=16, lr_min=1e-7, ...)
    # 1 < lr_decay_steps, lr_min == lr_base * lr_decay, Cosine decay, no restart
    tt = train.Train(..., lr_base=0.001, lr_decay=1e-4, lr_decay_steps=24, lr_min=1e-7, ...)
    # lr_decay_steps is a list, Constant
    tt = train.Train(..., lr_base=0.1, lr_decay=0.1, lr_decay_steps=[3, 5, 7, 16, 20, 24], ...)
-
Example learning rates
    import numpy as np
    import matplotlib.pyplot as plt
    from myCallbacks import exp_scheduler, CosineLrScheduler, constant_scheduler

    epochs = np.arange(60)
    plt.figure(figsize=(14, 6))
    plt.plot(epochs, [exp_scheduler(ii, 0.001, 0.1, warmup_steps=10) for ii in epochs], label="lr=0.001, decay=0.1")
    plt.plot(epochs, [exp_scheduler(ii, 0.001, 0.05, warmup_steps=10) for ii in epochs], label="lr=0.001, decay=0.05")
    plt.plot(epochs, [constant_scheduler(ii, 0.001, [10, 20, 30, 40], 0.1) for ii in epochs], label="Constant, lr=0.001, decay_steps=[10, 20, 30, 40], decay_rate=0.1")

    steps_per_epoch = 100
    batchs = np.arange(60 * steps_per_epoch)
    aa = CosineLrScheduler(0.001, first_restart_step=50, lr_min=1e-6, warmup_steps=0, m_mul=1e-3, steps_per_epoch=steps_per_epoch)
    lrs = []
    for ii in epochs:
        aa.on_epoch_begin(ii)
        lrs.extend([aa.on_train_batch_begin(jj) for jj in range(steps_per_epoch)])
    plt.plot(batchs / steps_per_epoch, lrs, label="Cosine, first_restart_step=50, min=1e-6, m_mul=1e-3")

    bb = CosineLrScheduler(0.001, first_restart_step=16, lr_min=1e-7, warmup_steps=1, m_mul=0.4, steps_per_epoch=steps_per_epoch)
    lrs = []
    for ii in epochs:
        bb.on_epoch_begin(ii)
        lrs.extend([bb.on_train_batch_begin(jj) for jj in range(steps_per_epoch)])
    plt.plot(batchs / steps_per_epoch, lrs, label="Cosine restart, first_restart_step=16, min=1e-7, warmup=1, m_mul=0.4")

    plt.xlim(0, 60)
    plt.legend()
    plt.grid(True)
    plt.tight_layout()
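The restart epochs `[17, 50, 115]` quoted for `lr_decay_steps=16` can be reproduced with a few lines of arithmetic, and the `Constant decay` type is a plain stepwise schedule. The sketch below is dependency free and only illustrates the arithmetic. `cosine_restart_epochs` and `stepwise_lr` are hypothetical names, the cycle multiplier `t_mul=2` is inferred from the `16 * 1`, `16 * 3`, `16 * 7` pattern, and applying the decay from the boundary epoch onward is an assumption. None of this is the `myCallbacks` implementation.

```python
def cosine_restart_epochs(first_restart_step, num_restarts, t_mul=2, cooldown_steps=1):
    """Epoch indices at which each new cosine cycle starts."""
    restarts = []
    cycle_start, cycle_length = 0, first_restart_step
    for _ in range(num_restarts):
        # Each cycle is followed by `cooldown_steps` epochs at lr_min before restarting.
        cycle_start += cycle_length + cooldown_steps
        restarts.append(cycle_start)
        cycle_length *= t_mul
    return restarts


def stepwise_lr(epoch, lr_base, lr_decay_steps, decay_rate):
    """Constant decay: multiply by decay_rate once for every boundary already reached."""
    passed = sum(1 for step in lr_decay_steps if epoch >= step)
    return lr_base * decay_rate ** passed


print(cosine_restart_epochs(16, 3))  # [17, 50, 115]
print([stepwise_lr(ee, 0.1, [3, 5, 7, 16, 20, 24], 0.1) for ee in (0, 3, 5)])
```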
- Tensorflow Guide - Mixed precision
  - Enable `Mixed precision` at the beginning of all functional code by `keras.mixed_precision.set_global_policy("mixed_float16")`
  - In most training cases, it will have a `~2x` speedup and less GPU memory consumption.
- SGDW / AdamW tensorflow_addons AdamW.
    # !pip install tensorflow-addons
    !pip install tfa-nightly

    import tensorflow_addons as tfa
    optimizer = tfa.optimizers.SGDW(learning_rate=0.1, weight_decay=5e-4, momentum=0.9)
    optimizer = tfa.optimizers.AdamW(learning_rate=0.001, weight_decay=5e-5)
  `weight_decay` and `learning_rate` should share the same decay strategy. A callback `OptimizerWeightDecay` will set `weight_decay` according to `learning_rate`.
    opt = tfa.optimizers.AdamW(weight_decay=5e-5)
    sch = [{"loss": keras.losses.CategoricalCrossentropy(label_smoothing=0.1), "centerloss": True, "epoch": 60, "optimizer": opt}]
  - The different behavior of `mx.optimizer.SGD weight_decay` / `tfa.optimizers.SGDW weight_decay` / `L2_regularizer` is explained in the discussion.
  - PDF DECOUPLED WEIGHT DECAY REGULARIZATION
  - Train test of SGDW on cifar10
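One plausible reading of "`weight_decay` and `learning_rate` should share the same decay strategy" is that the weight decay is rescaled by the same factor as the learning rate on every update. The sketch below shows only that arithmetic. `scaled_weight_decay` is a hypothetical helper, and whether the `OptimizerWeightDecay` callback uses exactly this rule is an assumption.

```python
def scaled_weight_decay(wd_base, lr_base, lr_current):
    """Keep weight_decay / learning_rate constant as the learning rate decays."""
    return wd_base * lr_current / lr_base


lr_base, wd_base = 0.1, 5e-4
for lr_current in (0.1, 0.05, 0.01):
    wd_current = scaled_weight_decay(wd_base, lr_base, lr_current)
    print("lr = %g, weight_decay = %g" % (lr_current, wd_current))
```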
- RAdam / Lookahead / Ranger optimizer tensorflow_addons RectifiedAdam.
    # Rectified Adam, a.k.a. RAdam, [ON THE VARIANCE OF THE ADAPTIVE LEARNING RATE AND BEYOND](https://arxiv.org/pdf/1908.03265.pdf)
    optimizer = tfa.optimizers.RectifiedAdam()
    # SGD with Lookahead [Lookahead Optimizer: k steps forward, 1 step back](https://arxiv.org/pdf/1907.08610.pdf)
    optimizer = tfa.optimizers.Lookahead(keras.optimizers.SGD(0.1))
    # Ranger [Gradient Centralization: A New Optimization Technique for Deep Neural Networks](https://arxiv.org/pdf/2004.01461.pdf)
    optimizer = tfa.optimizers.Lookahead(tfa.optimizers.RectifiedAdam())
- Horovod usage is still under test. Tensorflow multi GPU training using distribute strategies vs Horovod
- Add an overall `tf.distribute.MirroredStrategy().scope()` `with` block. This is just working in my case... The `batch_size` will be multiplied by `count of GPUs`.
    with tf.distribute.MirroredStrategy().scope():
        basic_model = ...
        tt = train.Train(..., batch_size=1024, ...) # With 2 GPUs, `batch_size` will be 2048
        sch = [...]
        tt.train(sch, 0)
- Using build-in loss functions like `keras.losses.CategoricalCrossentropy` should specify the `reduction` parameter.
    sch = [{"loss": keras.losses.CategoricalCrossentropy(label_smoothing=0.1, reduction=tf.keras.losses.Reduction.NONE), "epoch": 25}]
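A likely reason for requiring `reduction=NONE` is that, under a distribution strategy, per-example losses have to be averaged over the global batch rather than over each replica's local batch. The toy sketch below illustrates the difference with plain Python lists. The function names are hypothetical and it does not use TensorFlow, it only shows the arithmetic.

```python
def global_mean_loss(per_replica_losses):
    """Sum of per-replica losses, each scaled by the global batch size."""
    global_batch_size = sum(len(losses) for losses in per_replica_losses)
    return sum(sum(losses) / global_batch_size for losses in per_replica_losses)


def summed_replica_means(per_replica_losses):
    """What happens if every replica averages locally and the results are summed."""
    return sum(sum(losses) / len(losses) for losses in per_replica_losses)


per_replica_losses = [[1.0, 3.0], [2.0, 2.0]]  # 2 replicas, 2 examples each
print(global_mean_loss(per_replica_losses))      # 2.0, the true mean over 4 examples
print(summed_replica_means(per_replica_losses))  # 4.0, scaled up by the replica count
```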
-
PDF Sub-center ArcFace: Boosting Face Recognition by Large-scale Noisy Web Faces
-
This is still under test, Multi GPU is NOT tested
-
As far as I can see, `Sub Center ArcFace` works like cleaning the dataset.
- In `lossTopK=3` case, it will train `3 sub classes` in each label, and each `sub class` is a `center`.
- Then choose a `dominant center`, and remove those samples that are too far away from this `center`.
- So it's better to train a `large model` to clean the `dataset`, and then train other models on the `cleaned dataset`.
-
Train Original MXNet version
    cd ~/workspace/insightface/recognition/SubCenter-ArcFace
    cp sample_config.py config.py
    sed -i 's/config.ckpt_embedding = True/config.ckpt_embedding = False/' config.py
    CUDA_VISIBLE_DEVICES='1' python train_parall.py --network r50 --per-batch-size 512
    # Iter[20] Batch [8540], accuracy 0.80078125, loss 1.311261, lfw 0.99817, cfp_fp 0.97557, agedb_30 0.98167

    CUDA_VISIBLE_DEVICES='1' python drop.py --data /datasets/faces_emore --model models/r50-arcface-emore/model,1 --threshold 75 --k 3 --output /datasets/faces_emore_topk3_1
    # header0 label [5822654. 5908396.] (5822653, 4)
    # total: 5800493

    sed -i 's/config.ckpt_embedding = False/config.ckpt_embedding = True/' config.py
    sed -i 's/config.loss_K = 3/config.loss_K = 1/' config.py
    sed -i 's#/datasets/faces_emore#/datasets/faces_emore_topk3_1#' config.py
    ls -1 /datasets/faces_emore/*.bin | xargs -I '{}' ln -s {} /datasets/faces_emore_topk3_1/
    CUDA_VISIBLE_DEVICES='1' python train_parall.py --network r50 --per-batch-size 512
    # 5800493
    # Iter[20] Batch [5400], accuracy 0.8222656, loss 1.469272, lfw 0.99833, cfp_fp 0.97986, agedb_30 0.98050
-
Keras version train mobilenet on CASIA test
    import os
    import tensorflow_addons as tfa
    import train, losses, models

    data_basic_path = '/datasets/faces_casia'
    data_path = data_basic_path + '_112x112_folders'
    eval_paths = [os.path.join(data_basic_path, ii) for ii in ['lfw.bin', 'cfp_fp.bin', 'agedb_30.bin']]

    """ First, Train with `lossTopK = 3` """
    basic_model = models.buildin_models("mobilenet", dropout=0, emb_shape=256, output_layer='E')
    tt = train.Train(data_path, save_path='TT_mobilenet_topk_bs256.h5', eval_paths=eval_paths,
                     basic_model=basic_model, model=None, lr_base=0.1, lr_decay=0.1, lr_decay_steps=[20, 30],
                     batch_size=256, random_status=0, output_wd_multiply=1)

    optimizer = tfa.optimizers.SGDW(learning_rate=0.1, weight_decay=5e-4, momentum=0.9)
    sch = [
        {"loss": losses.ArcfaceLoss(scale=16), "epoch": 5, "optimizer": optimizer, "lossTopK": 3},
        {"loss": losses.ArcfaceLoss(scale=32), "epoch": 5, "lossTopK": 3},
        {"loss": losses.ArcfaceLoss(scale=64), "epoch": 40, "lossTopK": 3},
    ]
    tt.train(sch, 0)

    """ Then drop non-dominant subcenters and high-confident noisy data, which is `>75 degrees` """
    import data_drop_top_k
    # data_drop_top_k.data_drop_top_k('./checkpoints/TT_mobilenet_topk_bs256.h5', '/datasets/faces_casia_112x112_folders/', limit=20)
    new_data_path = data_drop_top_k.data_drop_top_k(tt.model, tt.data_path)

    """ Train with the new dataset again, this time `lossTopK = 1` """
    tt.reset_dataset(new_data_path)
    optimizer = tfa.optimizers.SGDW(learning_rate=0.1, weight_decay=5e-4, momentum=0.9)
    sch = [
        {"loss": losses.ArcfaceLoss(scale=16), "epoch": 5, "optimizer": optimizer},
        {"loss": losses.ArcfaceLoss(scale=32), "epoch": 5},
        {"loss": losses.ArcfaceLoss(scale=64), "epoch": 40},
    ]
    tt.train(sch, 0)
-
`data_drop_top_k.py` can also be used as a script. `-M` and `-D` are required.
    $ CUDA_VISIBLE_DEVICES='-1' ./data_drop_top_k.py -h
    # usage: data_drop_top_k.py [-h] -M MODEL_FILE -D DATA_PATH [-d DEST_FILE]
    #                           [-t DEG_THRESH] [-L LIMIT]
    #
    # optional arguments:
    #   -h, --help            show this help message and exit
    #   -M MODEL_FILE, --model_file MODEL_FILE
    #                         Saved model file path, NOT basic_model (default: None)
    #   -D DATA_PATH, --data_path DATA_PATH
    #                         Original dataset path (default: None)
    #   -d DEST_FILE, --dest_file DEST_FILE
    #                         Dest file path to save the processed dataset npz
    #                         (default: None)
    #   -t DEG_THRESH, --deg_thresh DEG_THRESH
    #                         Thresh value in degree, [0, 180] (default: 75)
    #   -L LIMIT, --limit LIMIT
    #                         Test parameter, limit converting only the first [NUM]
    #                         ones (default: 0)

    $ CUDA_VISIBLE_DEVICES='-1' ./data_drop_top_k.py -M checkpoints/TT_mobilenet_topk_bs256.h5 -D /datasets/faces_casia_112x112_folders/ -L 20
-
[Discussions] SubCenter_training_Mobilenet_on_CASIA
Scenario | Max lfw | Max cfp_fp | Max agedb_30 |
---|---|---|---|
Baseline, topk 1 | 0.9822 | 0.8694 | 0.8695 |
TopK 3 | 0.9838 | 0.9044 | 0.8743 |
TopK 3->1 | 0.9838 | 0.8960 | 0.8768 |
TopK 3->1, bottleneckOnly, initial_epoch=0 | 0.9878 | 0.8920 | 0.8857 |
TopK 3->1, bottleneckOnly, initial_epoch=40 | 0.9835 | 0.9030 | 0.8763 |
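The cleaning step described in this section (pick the dominant sub-center of a class, then drop samples more than `75` degrees away from it) can be sketched on toy 2-D vectors as follows. All function names here are hypothetical and the voting rule for the dominant center is an assumption based on the description above, this is not the code in `data_drop_top_k.py`.

```python
import math


def angle_degrees(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))


def dominant_center_index(embeddings, sub_centers):
    """The sub-center that is the nearest one for the most samples of this class."""
    votes = [0] * len(sub_centers)
    for emb in embeddings:
        nearest = min(range(len(sub_centers)), key=lambda kk: angle_degrees(emb, sub_centers[kk]))
        votes[nearest] += 1
    return votes.index(max(votes))


def keep_sample(embedding, dominant_center, deg_thresh=75.0):
    return angle_degrees(embedding, dominant_center) <= deg_thresh


sub_centers = [[1, 0], [0, 1], [-1, 0]]  # lossTopK=3 means 3 sub-centers per class
embeddings = [[1, 0.1], [1, -0.1], [0.1, 1], [-1, 0.2]]
center = sub_centers[dominant_center_index(embeddings, sub_centers)]
print([keep_sample(emb, center) for emb in embeddings])  # [True, True, False, False]
```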
-
PDF Improving Face Recognition from Hard Samples via Distribution Distillation Loss
-
`data_distiller.py` works to extract `embedding` data from images and save locally. `MODEL_FILE` can be `Keras h5` / `pytorch jit pth` / `MXNet model`.
- --save_npz Default saving format is `.tfrecord`, which needs less memory while training.
- -D xxx.npz Convert `xxx.npz` to `xxx.tfrecord`.
- --use_fp16 Save embedding data in `float16` format, which needs half the disk space of the default `float32`.
    $ CUDA_VISIBLE_DEVICES='-1' ./data_distiller.py -h
    # usage: data_distiller.py [-h] -D DATA_PATH [-M MODEL_FILE] [-d DEST_FILE]
    #                          [-b BATCH_SIZE] [-L LIMIT] [--use_fp16] [--save_npz]
    #
    # optional arguments:
    #   -h, --help            show this help message and exit
    #   -D DATA_PATH, --data_path DATA_PATH
    #                         Data path, or npz file converting to tfrecord
    #                         (default: None)
    #   -M MODEL_FILE, --model_file MODEL_FILE
    #                         Model file, keras h5 / pytorch pth / mxnet (default:
    #                         None)
    #   -d DEST_FILE, --dest_file DEST_FILE
    #                         Dest file path to save the processed dataset (default:
    #                         None)
    #   -b BATCH_SIZE, --batch_size BATCH_SIZE
    #                         Batch size (default: 256)
    #   -L LIMIT, --limit LIMIT
    #                         Test parameter, limit converting only the first [NUM]
    #                         (default: -1)
    #   --use_fp16            Save using float16 (default: False)
    #   --save_npz            Save as npz file, default is tfrecord (default: False)

    $ CUDA_VISIBLE_DEVICES='0' ./data_distiller.py -M subcenter-arcface-logs/r100-arcface-msfdrop75/model,0 -D /datasets/faces_casia_112x112_folders/ -b 32 --use_fp16
    # >>>> Output: faces_casia_112x112_folders_shuffle_label_embs_normed_512.npz
-
Then this dataset can be used to train a new model.
- Just specify `data_path` as the new dataset path. If key `embeddings` is in, then it will be a `distiller train`.
- A new loss `distiller_loss_cosine` will be added to match this `embeddings` data, default `loss_weights = [1, 7]`. Parameter `distill` in `scheduler` sets this loss weight.
- Distill loss can be used alone or combined with `softmax` / `arcface` / `centerloss` / `triplet`.
- The `emb_shape` can differ from the `teacher`, in this case, a dense layer `distill_emb_map_layer` will be added between `basic_model` embedding layer output and `teacher` embedding data.
    import os
    import train, losses, models
    import tensorflow_addons as tfa

    data_basic_path = '/datasets/faces_casia'
    data_path = 'faces_casia_112x112_folders_shuffle_label_embs_512_fp16.tfrecord'
    eval_paths = [os.path.join(data_basic_path, ii) for ii in ['lfw.bin', 'cfp_fp.bin', 'agedb_30.bin']]

    basic_model = models.buildin_models("mobilenet", dropout=0.4, emb_shape=512, output_layer='E')
    tt = train.Train(data_path, save_path='TT_mobilenet_distill_bs400.h5', eval_paths=eval_paths,
                     basic_model=basic_model, model=None, lr_base=0.1, lr_decay=0.1, lr_decay_steps=[20, 30],
                     batch_size=400, random_status=0)

    optimizer = tfa.optimizers.SGDW(learning_rate=0.1, weight_decay=5e-4, momentum=0.9)
    sch = [
        {"loss": losses.ArcfaceLoss(scale=16), "epoch": 5, "optimizer": optimizer, "distill": 128},
        {"loss": losses.ArcfaceLoss(scale=32), "epoch": 5, "distill": 128},
        {"loss": losses.ArcfaceLoss(scale=64), "epoch": 40, "distill": 128},
    ]
    tt.train(sch, 0)
-
Knowledge distillation result of training Mobilenet on CASIA
Teacher | emb_shape | Dropout | Optimizer | Distill | Max lfw | Max cfp_fp | Max agedb_30 |
---|---|---|---|---|---|---|---|
None | 512 | 0 | SGDW | 0 | 0.9838 | 0.8730 | 0.8697 |
None | 512 | 0.4 | SGDW | 0 | 0.9837 | 0.8491 | 0.8745 |
r100 | 512 | 0 | SGDW | 7 | 0.9900 | 0.9111 | 0.9068 |
r100 | 512 | 0.4 | SGDW | 7 | 0.9905 | 0.9170 | 0.9112 |
r100 | 512 | 0.4 | SGDW | 128 | 0.9955 | 0.9376 | 0.9465 |
r100 | 512 | 0.4 | AdamW | 128 | 0.9920 | 0.9346 | 0.9387 |
r100 | 256 | 0 | SGDW | 128 | 0.9937 | 0.9337 | 0.9427 |
r100 | 256 | 0.4 | SGDW | 128 | 0.9942 | 0.9369 | 0.9448 |
-
Knowledge distillation using Mobilenet on MS1M dataset
Teacher | emb_shape | Dropout | Optimizer | Distill | Max lfw | Max cfp_fp | Max agedb_30 |
---|---|---|---|---|---|---|---|
r100 | 512 | 0.4 | SGDW | 128 | 0.997 | 0.964 | 0.972833 |
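To make the role of the `distill` weight in the tables above concrete, the sketch below combines a classification loss with a cosine distance between student and teacher embeddings, using the default `loss_weights = [1, 7]` mentioned in this section. The exact form of `distiller_loss_cosine` in `losses.py` is not shown in this document, so `1 - cosine similarity` is an assumption, and both function names are hypothetical.

```python
import math


def cosine_distill_loss(student_emb, teacher_emb):
    """Assumed form: 1 - cosine similarity, 0 when both embeddings point the same way."""
    dot = sum(x * y for x, y in zip(student_emb, teacher_emb))
    norm = math.sqrt(sum(x * x for x in student_emb)) * math.sqrt(sum(y * y for y in teacher_emb))
    return 1.0 - dot / norm


def combined_loss(classification_loss, student_emb, teacher_emb, distill_weight=7.0):
    # loss_weights = [1, distill_weight]
    return 1.0 * classification_loss + distill_weight * cosine_distill_loss(student_emb, teacher_emb)


print(cosine_distill_loss([1.0, 0.0], [2.0, 0.0]))  # 0.0, same direction
print(combined_loss(2.0, [1.0, 0.0], [0.0, 1.0]))   # 9.0, orthogonal embeddings
```

A larger `distill` value such as `128` simply makes the embedding-matching term dominate the classification term.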
- IJB_evals.py evaluates model accuracy using insightface/evaluation/IJB/ datasets.
- In case of placing the `IJB` dataset in `/media/SD/IJB_release`, basic usage will be:
    # Test mxnet model, default scenario N0D1F1
    CUDA_VISIBLE_DEVICES='1' python IJB_evals.py -m '/media/SD/IJB_release/pretrained_models/MS1MV2-ResNet100-Arcface/model,0' -d /media/SD/IJB_release -L

    # Test keras h5 model, default scenario N0D1F1
    CUDA_VISIBLE_DEVICES='1' python IJB_evals.py -m 'checkpoints/basic_model.h5' -d /media/SD/IJB_release -L

    # `-B` to run all 8 tests N{0,1}D{0,1}F{0,1}
    CUDA_VISIBLE_DEVICES='1' python IJB_evals.py -m 'checkpoints/basic_model.h5' -d /media/SD/IJB_release -B -L

    # `-N` to run 1N test
    CUDA_VISIBLE_DEVICES='1' python IJB_evals.py -m 'checkpoints/basic_model.h5' -d /media/SD/IJB_release -N -L

    # `-E` to save embeddings data
    CUDA_VISIBLE_DEVICES='1' python IJB_evals.py -m 'checkpoints/basic_model.h5' -d /media/SD/IJB_release -E
    # Then can be restored for other tests, add `-E` to save again
    python IJB_evals.py -R IJB_result/MS1MV2-ResNet100-Arcface_IJBB.npz -d /media/SD/IJB_release -B

    # Plot result only, this needs the `label` data, which can be saved using `-L` parameter.
    # Or should provide the label txt file.
    python IJB_evals.py --plot_only /media/SD/IJB_release/IJBB/result/*100*.npy /media/SD/IJB_release/IJBB/meta/ijbb_template_pair_label.txt
- See `-h` for detail usage.
    python IJB_evals.py -h
-
Test using TFLite Model Benchmark Tool
-
Platform
- CPU: `Qualcomm Technologies, Inc SDM630`
- System: `Android`
- Inference: `TFLite`
-
mobilenet_v2 comparing `original` / `dynamic` / `float16` / `uint8` conversion of `TFLite` model. Using header `GDC + emb_shape=512 + pointwise_conv=False`.
mobilenet_v2 | Size (MB) | threads=1 (ms) | threads=4 (ms) |
---|---|---|---|
original | 11.576 | 52.224 | 18.102 |
original xnn | 11.576 | 29.116 | 8.744 |
dynamic | 3.36376 | 38.497 | 20.008 |
dynamic xnn | 3.36376 | 37.433 | 19.234 |
float16 | 5.8267 | 53.986 | 19.191 |
float16 xnn | 5.8267 | 29.862 | 8.661 |
uint8 | 3.59032 | 27.247 | 10.783 |
-
mobilenet_v2 comparing different headers using `float16 conversion + xnn + threads=4`
emb_shape | output_layer | pointwise_conv | PReLU | Size (MB) | Time (ms) |
---|---|---|---|---|---|
256 | GDC | False | False | 5.17011 | 8.214 |
512 | GDC | False | False | 5.82598 | 8.436 |
256 | GDC | True | False | 6.06384 | 9.129 |
512 | GDC | True | False | 6.32542 | 9.357 |
256 | E | True | False | 9.98053 | 10.669 |
256 | E | False | False | 14.9618 | 11.502 |
512 | E | True | False | 14.174 | 11.958 |
512 | E | False | False | 25.4481 | 15.063 |
512 | GDC | False | True | 5.85275 | 10.481 |
-
Backbones comparing using `float16 conversion + xnn + threads=4`, header `GDC + emb_shape=512 + pointwise_conv=False`
Model | Size (MB) | Time (ms) |
---|---|---|
mobilenet_v3_small | 2.80058 | 4.211 |
mobilenet_v3_large | 6.95015 | 10.025 |
ghostnet strides=2 | 8.06546 | 11.125 |
mobilenet | 7.4905 | 11.836 |
se_mobilefacenet | 1.88518 | 18.713 |
mobilefacenet | 1.84267 | 20.443 |
EB0 | 9.40449 | 22.054 |
EB1 | 14.4268 | 31.881 |
ghostnet strides=1 | 8.16576 | 46.142 |
mobilenet_m1 | 7.02651 | 52.648 |
- Triplet Loss and Online Triplet Mining in TensorFlow
- TensorFlow Addons Losses: TripletSemiHardLoss
- TensorFlow Addons Layers: WeightNormalization
- Github deepinsight/insightface
- Github cavalleria/cavaface.pytorch
- Github titu1994/keras-squeeze-excite-network
- Github qubvel/EfficientNet
- Github QiaoranC/tf_ResNeSt_RegNet_model
- Partial FC: Training 10 Million Identities on a Single Machine
- Github IrvingMeng/MagFace
- BibTeX
    @misc{leondgarse,
      author = {Leondgarse},
      title = {Keras Insightface},
      year = {2022},
      publisher = {GitHub},
      journal = {GitHub repository},
      doi = {10.5281/zenodo.6506949},
      howpublished = {\url{https://github.com/leondgarse/Keras_insightface}}
    }
- Latest DOI: