Memory errors: would appreciate more detailed requirements info in README #55
Refer to #32 if you do not have 24GB of memory on each GPU, or use smaller
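For context on the batch-size side of this advice: the Namespace logged in the next comment reports batch_size=8 with per_batch_size=8 and ctx_num=1, which is consistent with the global batch being the per-GPU batch times the number of GPUs. The sketch below assumes that relationship. It also estimates the size of the final classification weight, assuming its shape is num_classes × emb_size in float32, using the values from that log. It is a hypothetical back-of-the-envelope helper, not code from the repository, and it ignores gradients, momentum buffers, and activations.

```python
# Back-of-the-envelope helpers; both formulas are assumptions, not taken from train.py.


def global_batch_size(per_batch_size, num_gpus):
    """Assumed relation: each GPU processes per_batch_size samples per step."""
    return per_batch_size * num_gpus


def fc7_weight_bytes(num_classes, emb_size, bytes_per_param=4):
    """Size in bytes of an assumed (num_classes x emb_size) float32 weight matrix."""
    return num_classes * emb_size * bytes_per_param


if __name__ == "__main__":
    # Values from the log: per_batch_size=8, ctx_num=1, num_classes=85742, emb_size=512.
    print("global batch: %d" % global_batch_size(8, 1))
    size = fc7_weight_bytes(85742, 512)
    print("fc7 weight: %d bytes (%.1f MiB)" % (size, size / 2.0 ** 20))
```

With the logged values this prints a global batch of 8 and roughly 167.5 MiB for the weight alone, so the weight matrix by itself is far below 24GB.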
Hi, I have followed the instructions to install the environment (Python 2.7, CUDA 9.0). When running the script with

CUDA_VISIBLE_DEVICES=0 python -u train.py --network r50 --loss arcface --dataset emore --per-batch-size 8

I encountered the following problem too (I changed --per-batch-size to 8 and the network to r50, and the problem still exists):

/home/sirius/anaconda3/envs/insightFacePre/lib/python2.7/site-packages/mxnet/numpy_op_signature.py:61: UserWarning: Some mxnet.numpy operator signatures may not be displayed consistently with their counterparts in the official NumPy package due to too-low Python version 2.7.18 |Anaconda, Inc.| (default, Jun 4 2021, 14:47:46)
[GCC 7.3.0]. Python >= 3.5 is required to make the signatures display correctly.
.format(str(sys.version)))
gpu num: 1
prefix ./models/r50-arcface-emore/model
image_size [112, 112]
num_classes 85742
Called with argument: Namespace(batch_size=8, ckpt=3, ctx_num=1, dataset='emore', frequent=20, image_channel=3, kvstore='device', loss='arcface', lr=0.1, lr_steps='100000,160000,220000', models_root='./models', mom=0.9, network='r50', per_batch_size=8, pretrained='', pretrained_epoch=1, rescale_threshold=0, verbose=2000, wd=0.0005) {'loss_m1': 1.0, 'loss_m2': 0.5, 'loss_m3': 0.0, 'net_act': 'prelu', 'emb_size': 512, 'data_rand_mirror': True, 'num_layers': 50, 'loss_name': 'margin_softmax', 'val_targets': ['lfw', 'cfp_fp', 'agedb_30'], 'ce_loss': True, 'net_input': 1, 'image_shape': [112, 112, 3], 'net_blocks': [1, 4, 6, 2], 'fc7_lr_mult': 1.0, 'ckpt_embedding': True, 'net_unit': 3, 'net_output': 'E', 'count_flops': True, 'num_workers': 1, 'batch_size': 8, 'memonger': False, 'data_images_filter': 0, 'dataset': 'emore', 'num_classes': 85742, 'fc7_no_bias': False, 'loss': 'arcface', 'data_color': 0, 'loss_s': 64.0, 'dataset_path': '~/document/siriusShare/Clustering-Face/Data/faces_emore', 'data_cutoff': False, 'net_se': 0, 'net_multiplier': 1.0, 'fc7_wd_mult': 1.0, 'network': 'r50', 'per_batch_size': 8, 'net_name': 'fresnet', 'workspace': 256, 'max_steps': 0, 'bn_mom': 0.9}
0 1 E 3 prelu False
Network FLOPs: 12.6G
INFO:root:loading recordio ~/document/siriusShare/Clustering-Face/Data/faces_emore/train.rec...
Traceback (most recent call last):
File "train.py", line 377, in <module>
main()
File "train.py", line 374, in main
train_net(args)
File "train.py", line 242, in train_net
images_filter = config.data_images_filter,
File "/home/sirius/document/siriusShare/Clustering-Face/insightface-master/recognition/image_iter.py", line 38, in __init__
self.imgrec = recordio.MXIndexedRecordIO(path_imgidx, path_imgrec, 'r') # pylint: disable=redefined-variable-type
File "/home/sirius/anaconda3/envs/insightFacePre/lib/python2.7/site-packages/mxnet/recordio.py", line 245, in __init__
super(MXIndexedRecordIO, self).__init__(uri, flag)
File "/home/sirius/anaconda3/envs/insightFacePre/lib/python2.7/site-packages/mxnet/recordio.py", line 71, in __init__
self.open()
File "/home/sirius/anaconda3/envs/insightFacePre/lib/python2.7/site-packages/mxnet/recordio.py", line 248, in open
super(MXIndexedRecordIO, self).open()
File "/home/sirius/anaconda3/envs/insightFacePre/lib/python2.7/site-packages/mxnet/recordio.py", line 79, in open
check_call(_LIB.MXRecordIOReaderCreate(self.uri, ctypes.byref(self.handle)))
File "/home/sirius/anaconda3/envs/insightFacePre/lib/python2.7/site-packages/mxnet/base.py", line 255, in check_call
raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [14:54:34] src/io/local_filesys.cc:209: Check failed: allow_null: LocalFileSystem::Open "~/document/siriusShare/Clustering-Face/Data/faces_emore/train.rec": No such file or directory
Stack trace:
[bt] (0) /home/sirius/anaconda3/envs/insightFacePre/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x40ff258) [0x7fe08a76e258]
[bt] (1) /home/sirius/anaconda3/envs/insightFacePre/lib/python2.7/site-packages/mxnet/libmxnet.so(+0x40f6bca) [0x7fe08a765bca]
[bt] (2) /home/sirius/anaconda3/envs/insightFacePre/lib/python2.7/site-packages/mxnet/libmxnet.so(MXRecordIOReaderCreate+0x2d) [0x7fe089d82dad]
[bt] (3) /home/sirius/anaconda3/envs/insightFacePre/lib/python2.7/lib-dynload/../../libffi.so.7(+0x69dd) [0x7fe0e4ac09dd]
[bt] (4) /home/sirius/anaconda3/envs/insightFacePre/lib/python2.7/lib-dynload/../../libffi.so.7(+0x6067) [0x7fe0e4ac0067]
[bt] (5) /home/sirius/anaconda3/envs/insightFacePre/lib/python2.7/lib-dynload/_ctypes.so(_ctypes_callproc+0x4de) [0x7fe0e456a9de]
[bt] (6) /home/sirius/anaconda3/envs/insightFacePre/lib/python2.7/lib-dynload/_ctypes.so(+0x9b61) [0x7fe0e4560b61]
[bt] (7) /home/sirius/anaconda3/envs/insightFacePre/bin/../lib/libpython2.7.so.1.0(PyObject_Call+0x43) [0x7fe0e5f0db83]
[bt] (8) /home/sirius/anaconda3/envs/insightFacePre/bin/../lib/libpython2.7.so.1.0(PyEval_EvalFrameEx+0x3bb9) [0x7fe0e5fa4199]
[14:54:34] src/engine/engine.cc:55: MXNet start using engine: ThreadedEnginePerDevice

Thanks for your time; any help would be appreciated!
"~/document/siriusShare/Clustering-Face/Data/faces_emore/train.rec": No such file or directory
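The path in this error still begins with a literal `~`. Tilde expansion is normally performed by the shell, so a `~` inside a config value appears to reach MXNet unexpanded and is looked up as a relative directory named `~`. Below is a minimal pre-flight check, assuming the dataset directory should contain `train.rec` and `train.idx` (the `.idx` name is an assumption based on MXIndexedRecordIO taking an index file alongside the record file). The helper name `resolve_dataset_path` is hypothetical, and the code is kept Python 2.7 compatible to match the environment above.

```python
import os


def resolve_dataset_path(dataset_path, required=("train.rec", "train.idx")):
    """Expand '~' and environment variables, then check that the record files exist.

    Hypothetical helper: the required file names are an assumption about the
    dataset layout, not read from the InsightFace config.
    """
    # expanduser turns a leading '~' into the home directory; judging by the
    # error above, MXNet's file reader does not do this itself.
    path = os.path.abspath(os.path.expandvars(os.path.expanduser(dataset_path)))
    missing = [name for name in required
               if not os.path.isfile(os.path.join(path, name))]
    if missing:
        raise IOError("missing in %s: %s" % (path, ", ".join(missing)))
    return path


if __name__ == "__main__":
    try:
        # Path copied from the error message above.
        print(resolve_dataset_path("~/document/siriusShare/Clustering-Face/Data/faces_emore"))
    except IOError as err:
        print("dataset check failed: %s" % err)
```

Writing the absolute path (for example, one starting with `/home/sirius/...`) into the dataset config avoids relying on expansion at all.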
@nttstar Hi, thanks for your quick reply. I have successfully run the torch version of the code on one GPU from the comment
After following the instructions for adding the dataset, installing the dependencies, and making sure I'm in the src folder of the repository, I run the following to train InsightFace with LResNet100E-IR (modified because my machine has only one GPU):
However, I get the following output:
Could you add more detail to the README about the specific device, memory, and CUDA requirements?
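On the device side, the first log above prints `gpu num: 1` when launched with `CUDA_VISIBLE_DEVICES=0`. The sketch below counts the GPUs a job will be given by parsing that variable, assuming (as the log suggests but the thread does not confirm) that the training script derives its device count from `CUDA_VISIBLE_DEVICES`. `count_visible_gpus` is a hypothetical helper; it only parses the variable and does not query the driver.

```python
import os


def count_visible_gpus(environ=None):
    """Count the comma-separated device entries in CUDA_VISIBLE_DEVICES.

    Returns None when the variable is unset, because CUDA then exposes every
    GPU and the count has to come from the driver instead.
    """
    env = os.environ if environ is None else environ
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    # Skip empty entries so that an empty string counts as zero devices.
    return len([d for d in value.split(",") if d.strip()])


if __name__ == "__main__":
    print("visible GPUs: %s" % count_visible_gpus())
```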