Report Progress as a Fraction of Remaining Epochs #59

Closed
matthew-mcateer opened this issue Feb 22, 2018 · 5 comments

Comments

@matthew-mcateer

Is there any way to get the script to report the number of remaining epochs (or the number of remaining batches within an epoch)? I'm training MobileNet with the InsightFace method on the MS1M dataset. I'm on the 2nd epoch, but I have no idea how many more epochs remain.

@nttstar
Collaborator

nttstar commented Feb 22, 2018

Count the batches (iterations) instead of epochs.

@matthew-mcateer
Author

Ah, so that means the number of epochs in the job is just (100K (the default epoch end) * 128 (the batch size)) / number of data samples (10M images in the dataset), @nttstar?

@nttstar
Collaborator

nttstar commented Feb 23, 2018

If the total number of iterations is 100K, then the number of epochs equals 100K * 128 * GPU_NUM / sample_size.
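
As a rough illustration of the formula above, here is a minimal Python sketch. The function name and its parameters are hypothetical and are not part of the training script. The 10M sample count is the asker's estimate from earlier in the thread, and the real dataset size may differ.

```python
def epochs_from_iterations(total_iterations, per_gpu_batch_size, gpu_num, sample_size):
    """Number of passes over the dataset covered by a fixed iteration budget."""
    # Each iteration consumes per_gpu_batch_size * gpu_num samples in total.
    samples_seen = total_iterations * per_gpu_batch_size * gpu_num
    return samples_seen / sample_size


# Assumed values: 100K iterations, batch size 128 per GPU, and the asker's
# estimate of 10M images (not a confirmed dataset size).
one_gpu = epochs_from_iterations(100_000, 128, 1, 10_000_000)
four_gpus = epochs_from_iterations(100_000, 128, 4, 10_000_000)
print(f"1 GPU:  {one_gpu:.2f} epochs")
print(f"4 GPUs: {four_gpus:.2f} epochs")
```

Under these assumed numbers, the same iteration budget covers four times as many epochs on four GPUs as on one, because each iteration consumes four times as many samples.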

nttstar closed this as completed Feb 24, 2018

@matthew-mcateer
Author

But hang on, it appears that the number of batches per epoch is much less than 100K. What determines the number of batches per epoch?

Also, @nttstar, how long did it originally take to train the models given the examples in the README? We're currently training MobileNet using the ArcFace method on one NVIDIA Tesla GPU. We're about 3 days and 22 hours in, with 8 epochs and 9640 batches passed. The accuracy reported at each step only recently approached 0.010156. The accuracy also appears to be rising much more slowly than when the softmax method is used. Is this normal?

@nttstar
Collaborator

nttstar commented Feb 25, 2018

number of batches per Epoch = total_sample_size/total_batch_size
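
Combining this relation with the earlier advice to count batches, progress can be estimated as a fraction of a fixed iteration budget. The following is a minimal sketch, not the training script's own logic: the function names are hypothetical, the sample and iteration counts are made up for illustration, and floor division assumes that a final partial batch is dropped.

```python
def batches_per_epoch(total_sample_size, total_batch_size):
    # Floor division assumes a final partial batch is dropped (an assumption;
    # some data iterators pad or keep it instead).
    return total_sample_size // total_batch_size


def progress(completed_epochs, batches_into_epoch, per_epoch, total_iterations):
    """Return (fraction_done, iterations_remaining) for a fixed iteration budget."""
    done = completed_epochs * per_epoch + batches_into_epoch
    remaining = max(total_iterations - done, 0)
    return min(done / total_iterations, 1.0), remaining


# Hypothetical numbers for illustration only.
per_epoch = batches_per_epoch(total_sample_size=1_000_000, total_batch_size=128 * 4)
fraction, remaining = progress(2, 500, per_epoch, total_iterations=100_000)
print(f"{per_epoch} batches/epoch, {fraction:.1%} done, {remaining} iterations left")
```

Here `total_batch_size` is the per-GPU batch size multiplied by the number of GPUs, matching the 128*4 setup mentioned in the thread.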

Also, I only ran experiments with batch_size=512 (128*4). I'm not sure whether it works well with a smaller batch size, as in your case.
