Using the default configuration, the acc always equals 0 #74
Comments
Referring to the model (log) the author published: with a batch size of 512, this algorithm should reach acc > 0 after about 2k batches. Your batch size is 120, which is much smaller than 512, so I guess the algorithm hasn't seen enough samples yet (2000×120 vs 2000×512).
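To put numbers on that comparison, here is a quick back-of-the-envelope check (a minimal sketch, using the dataset size reported in the log below, not anything from the script itself):

```python
# Rough sanity check: how many samples has the model seen after 2000 batches,
# and what fraction of one pass over MS1M is that?
dataset_size = 3804846          # training images, from the log's id2range output

for batch_size in (512, 120):
    samples_seen = 2000 * batch_size
    print("batch size %3d: %7d samples (%.2f%% of one epoch)"
          % (batch_size, samples_seen, 100.0 * samples_seen / dataset_size))

# batch size 512: 1024000 samples (26.92% of one epoch)
# batch size 120:  240000 samples (6.31% of one epoch)
```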
@mike07026 is right. Also, try a smaller network and a larger batch size to obtain a stable result.
Hi, I understand this should differ from the original results, but the model has been training for over an epoch already without any increase in accuracy. Wouldn't you expect at least some improvement?
@eladrich A batch size of 32 is too small. I suggest using at least 128. My experiments use 512 (128×4).
That's right! CUDA_VISIBLE_DEVICES='0,1' python -u train_softmax.py --network r100 --loss-type 4 --margin-m 0.5 --data-dir ../datasets/faces_ms1m_112x112 --prefix ../model-r100 --per-batch-size 32 ![acc log](https://user-images.githubusercontent.com/33198334/37393912-740b3d52-27ad-11e8-8152-4b18533ce020.jpg)
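Note that --per-batch-size is per GPU, so the effective (global) batch size is per_batch_size × the number of visible GPUs. A minimal sketch of that relationship (the helper below is illustrative, not the script's actual code):

```python
import os

# Hypothetical helper: derive the global batch size the way train_softmax.py
# reports it in its Namespace log, i.e. batch_size = per_batch_size * ctx_num.
def global_batch_size(per_batch_size):
    ctx_num = len(os.environ.get("CUDA_VISIBLE_DEVICES", "0").split(","))
    return per_batch_size * ctx_num

os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"
print(global_batch_size(32))  # 64  -- the command above
print(global_batch_size(60))  # 120 -- the log below; nttstar's 512 is 128 * 4 GPUs
```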
@zhangjiekui, how long did it take to train?
@nttstar Is it not possible to train with a small batch size of 32 or 64 (data=MS1M-ArcFace, m=0.5)? Or is it just going to take longer? Do you have any idea why the SphereFace and CosFace losses converge faster (accuracy does not stay at 0 for very long) even with a small batch size?
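For context on that last question, the three losses differ only in how they modify the target-class logit before the scaled softmax (formulas from the respective papers; the code below is a minimal numpy sketch, not the repo's implementation):

```python
import numpy as np

# Target-logit modification for the angle theta between an embedding and its
# class weight (all three then scale by s, e.g. s=64, before softmax cross-entropy).
def sphereface(theta, m=4):      # multiplicative angular margin: cos(m*theta)
    return np.cos(m * theta)

def cosface(theta, m=0.35):      # additive cosine margin: cos(theta) - m
    return np.cos(theta) - m

def arcface(theta, m=0.5):       # additive angular margin: cos(theta + m)
    return np.cos(theta + m)

theta = np.deg2rad(60)  # a fairly typical early-training angle
print(np.cos(theta), sphereface(theta), cosface(theta), arcface(theta))
# ArcFace's cos(theta + m) cuts the target logit down hard at large angles, while
# SphereFace training typically anneals its margin (the beta/lambda parameters in
# the log below), which may explain why the ArcFace accuracy metric starts slower.
```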
Dear author, I used the default configuration hoping to reproduce the results, but the acc always shows 0 after 2000 batches. Is there anything wrong?
I only have 2 GPUs, so I changed the visible devices to 0,1:
```
CUDA_VISIBLE_DEVICES='0,1' python -u train_softmax.py --network r100 --loss-type 4 --margin-m 0.5 --data-dir ../datasets/faces_ms1m_112x112 --prefix ../model-r100
```
Logfile
```
lyh@lyh-dell:~/workspace/insightface/src$ MXNET_ENABLE_GPU_P2P=0 CUDA_VISIBLE_DEVICES='0,1' python -u train_softmax.py --network r100 --loss-type 4 --margin-m 0.5 --data-dir /data/faces_ms1m_112x112 --prefix ../model-r100
/home/lyh/anaconda2/lib/python2.7/site-packages/urllib3/contrib/pyopenssl.py:46: DeprecationWarning: OpenSSL.rand is deprecated - you should use os.urandom instead
import OpenSSL.SSL
gpu num: 2
num_layers 100
image_size [112, 112]
num_classes 85164
Called with argument: Namespace(batch_size=120, beta=1000.0, beta_freeze=0, beta_min=5.0, c2c_mode=-10, c2c_threshold=0.0, center_alpha=0.5, center_scale=0.003, ckpt=1, coco_scale=8.676161173096705, ctx_num=2, cutoff=0, data_dir='/data/faces_ms1m_112x112', easy_margin=0, emb_size=512, end_epoch=100000, gamma=0.12, image_channel=3, image_h=112, image_w=112, images_per_identity=0, incay=0.0, loss_type=4, lr=0.1, lr_steps='', margin=4, margin_a=0.0, margin_m=0.5, margin_s=64.0, margin_verbose=0, max_steps=0, mom=0.9, network='r100', noise_sgd=0.0, num_classes=85164, num_layers=100, output_c2c=0, patch='0_0_96_112_0', per_batch_size=60, power=1.0, prefix='../model-r100', pretrained='', rand_mirror=1, rescale_threshold=0, retrain=False, scale=0.9993, target='lfw,cfp_fp,agedb_30', triplet_alpha=0.3, triplet_bag_size=3600, triplet_max_ap=0.0, use_deformable=0, use_val=False, verbose=2000, version_input=1, version_output='E', version_se=0, version_unit=3, wd=0.0005)
init resnet 100
0 1 E 3
INFO:root:loading recordio /data/faces_ms1m_112x112/train.rec...
header0 label [ 3804847. 3890011.]
id2range 85164
0 0 3804846
c2c_stat [0, 85164]
3804846
rand_mirror 1
(120,)
loading bin 1000
loading bin 2000
loading bin 3000
loading bin 4000
loading bin 5000
loading bin 6000
loading bin 7000
loading bin 8000
loading bin 9000
loading bin 10000
loading bin 11000
loading bin 12000
(12000L, 3L, 112L, 112L)
ver lfw
loading bin 1000
loading bin 2000
loading bin 3000
loading bin 4000
loading bin 5000
loading bin 6000
loading bin 7000
loading bin 8000
loading bin 9000
loading bin 10000
loading bin 11000
loading bin 12000
loading bin 13000
loading bin 14000
(14000L, 3L, 112L, 112L)
ver cfp_fp
loading bin 1000
loading bin 2000
loading bin 3000
loading bin 4000
loading bin 5000
loading bin 6000
loading bin 7000
loading bin 8000
loading bin 9000
loading bin 10000
loading bin 11000
loading bin 12000
(12000L, 3L, 112L, 112L)
ver agedb_30
lr_steps [426666, 597333, 682666]
[16:29:38] src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:107: Running performance tests to find the best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)
/home/lyh/anaconda2/lib/python2.7/site-packages/mxnet/module/base_module.py:466: UserWarning: Optimizer created manually outside Module but rescale_grad is not normalized to 1.0/batch_size/num_workers (0.5 vs. 0.00833333333333). Is this intended?
optimizer_params=optimizer_params)
call reset()
INFO:root:Epoch[0] Batch [20] Speed: 211.74 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [40] Speed: 204.20 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [60] Speed: 200.59 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [80] Speed: 200.87 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [100] Speed: 204.17 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [120] Speed: 203.31 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [140] Speed: 202.20 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [160] Speed: 197.89 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [180] Speed: 196.42 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [200] Speed: 197.84 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [220] Speed: 199.84 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [240] Speed: 199.16 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [260] Speed: 199.51 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [280] Speed: 199.33 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [300] Speed: 199.42 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [320] Speed: 199.17 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [340] Speed: 199.13 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [360] Speed: 199.19 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [380] Speed: 199.55 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [400] Speed: 198.83 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [420] Speed: 196.18 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [440] Speed: 198.51 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [460] Speed: 199.76 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [480] Speed: 197.17 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [500] Speed: 198.87 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [520] Speed: 194.67 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [540] Speed: 192.28 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [560] Speed: 190.91 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [580] Speed: 193.99 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [600] Speed: 193.84 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [620] Speed: 190.89 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [640] Speed: 190.69 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [660] Speed: 191.62 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [680] Speed: 193.53 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [700] Speed: 189.48 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [720] Speed: 187.77 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [740] Speed: 189.28 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [760] Speed: 189.19 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [780] Speed: 190.49 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [800] Speed: 192.47 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [820] Speed: 195.38 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [840] Speed: 197.69 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [860] Speed: 197.44 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [880] Speed: 196.46 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [900] Speed: 194.44 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [920] Speed: 193.12 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [940] Speed: 197.04 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [960] Speed: 199.12 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [980] Speed: 196.32 samples/sec acc=0.000000
lr-batch-epoch: 0.1 999 0
INFO:root:Epoch[0] Batch [1000] Speed: 197.65 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1020] Speed: 198.27 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1040] Speed: 199.45 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1060] Speed: 198.46 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1080] Speed: 198.32 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1100] Speed: 197.28 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1120] Speed: 195.61 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1140] Speed: 199.11 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1160] Speed: 199.16 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1180] Speed: 196.99 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1200] Speed: 195.02 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1220] Speed: 197.64 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1240] Speed: 191.67 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1260] Speed: 187.53 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1280] Speed: 190.14 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1300] Speed: 189.53 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1320] Speed: 190.74 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1340] Speed: 189.12 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1360] Speed: 189.72 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1380] Speed: 188.79 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1400] Speed: 188.42 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1420] Speed: 189.92 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1440] Speed: 188.11 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1460] Speed: 190.44 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1480] Speed: 189.30 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1500] Speed: 190.49 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1520] Speed: 190.19 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1540] Speed: 189.77 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1560] Speed: 190.97 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1580] Speed: 188.68 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1600] Speed: 189.51 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1620] Speed: 190.87 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1640] Speed: 191.33 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1660] Speed: 194.82 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1680] Speed: 190.51 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1700] Speed: 191.16 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1720] Speed: 196.51 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1740] Speed: 192.80 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1760] Speed: 197.37 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1780] Speed: 199.04 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1800] Speed: 197.57 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1820] Speed: 199.10 samples/sec acc=0.000000
INFO:root:Epoch[0] Batch [1840] Speed: 199.07 samples/sec acc=0.000000
```
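One detail worth flagging in the log above: the UserWarning about rescale_grad. MXNet's base_module expects gradients rescaled by 1.0/batch_size/num_workers (here 1/120 ≈ 0.00833), but the optimizer was created with 0.5, which numerically matches 1/ctx_num for 2 GPUs. A quick check of the warning's own numbers (a sketch of the arithmetic only, not a claim about the script's internals):

```python
# Reproduce the numbers from the UserWarning above.
batch_size = 120          # per_batch_size=60 * ctx_num=2, from the Namespace log
num_workers = 1

expected = 1.0 / batch_size / num_workers
print(expected)           # 0.008333... -- what base_module.py checks against

actual = 0.5              # the value the warning reports; equals 1/ctx_num here
print(actual / expected)  # 60.0 -- gradients effectively scaled 60x larger
```

Whether this scaling is intended by the script, or contributes to the stuck accuracy, is not clear from the log alone.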