Test time no speed increase with half-precision #402
Comments
I'll be investigating this shortly, but a few pointers:
Hi @SeanNaren,
What's the current state of this, and how can I help?
Hi,
Using the master branch, do you still run into this issue? I'd suggest re-training a model first to see if it works. Also, just to be clear, you are checking this without an LM?
Yes, I had checked this without the LM, just the DS2 inference time. I retrained the small model on the same toy dataset I used before and checked the inference time during validation for both models, one trained without mixed precision and one with it, but I was not able to find any significant difference in speed between them. :(
I would like to point out that half precision with APEX is only worth it if the GPU has a lot of tensor cores (RTX or Volta), see apex #297. It also looks like the APEX API used here is the old one; it is probably worth updating it.
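For reference, a minimal sketch of what the newer `apex.amp` API looks like; the tiny model, criterion, and dummy data below are placeholders, not the repo's actual training code:

```python
import torch
import torch.nn as nn
from apex import amp  # requires NVIDIA apex to be installed

# Tiny stand-in model and data; the real model/criterion live in the repo's train.py.
model = nn.Linear(161, 29).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=3e-4, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# Newer amp API: wrap model and optimizer once; "O1" selects mixed precision.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

inputs = torch.randn(32, 161).cuda()
targets = torch.randint(0, 29, (32,)).cuda()

for _ in range(10):
    loss = criterion(model(inputs), targets)
    optimizer.zero_grad()
    # Backpropagate through amp so the loss is scaled before backward().
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()
```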
I updated the apex code according to the new APIs, but there was still no change. :(
@vaibhav0195 could you run this script without editing anything: https://github.com/SeanNaren/deepspeech.pytorch/blob/master/benchmark.py and then re-run it with the mixed-precision option enabled?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hello, thanks for the support for mixed precision training. I was very excited to train my models in half precision.
I have an NVIDIA RTX 2080 Ti running CUDA 10.0, cuDNN 7.4, Ubuntu 16.04, Python 3.7, and PyTorch 1.0 (installed using pip).
So my questions are:
1.) I can see the speed increase during training (using half precision), but at test time, if I use the beam CTC decoder (with half precision), I don't get any speed increase :(
(It's the same as the model trained without half precision.)
2.) While using the LM and beam CTC decoder, my WER is worse than with the greedy decoder :( (see the decoder sketch after this message).
Can anyone please help me figure out what is causing these problems, or am I missing something?
Any help is appreciated.
Thanks
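Regarding question 2, a rough sketch of how a beam decoder with an LM is typically set up with the ctcdecode package; the label set, LM path, and the alpha/beta values here are placeholders, and the LM weight (alpha) and word bonus (beta) usually need tuning before the beam decoder beats greedy decoding. Note also that the beam search runs on the already-computed output probabilities, so the acoustic model's precision does not change the decoder's own workload.

```python
import torch
from ctcdecode import CTCBeamDecoder  # beam decoder package used by deepspeech.pytorch

labels = ["_", "'", "A", "B", "C", " "]   # placeholder label set; "_" is the CTC blank
decoder = CTCBeamDecoder(
    labels,
    model_path="lm.binary",               # placeholder path to a KenLM binary
    alpha=0.8,                            # LM weight; too high or too low hurts WER
    beta=1.0,                             # word insertion bonus
    beam_width=128,
    blank_id=labels.index("_"),
)

# probs: (batch, seq_len, num_labels) softmax output of the acoustic model.
probs = torch.rand(1, 100, len(labels)).softmax(dim=-1)
beam_results, beam_scores, timesteps, out_lens = decoder.decode(probs)
best = beam_results[0][0][: out_lens[0][0]]          # best hypothesis, as token ids
transcript = "".join(labels[int(i)] for i in best)
print(transcript)
```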
I think I found why mixed precision was not working: in the `load_model` method we aren't passing the `mixed_precision` option, so it defaults to false and mixed precision is not enabled at test time. Also, we are not saving the `mixed_precision` variable in the `serialize` method, so we can't tell whether a model was trained with mixed precision, and at test time every model is loaded without FP16 support.
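A minimal sketch of the kind of change described above, assuming a plain checkpoint dict; the real `serialize`/`load_model` functions in the repo store more fields and differ in detail:

```python
import torch

def serialize(model, optimizer, epoch, mixed_precision=False):
    # Store the training-time precision flag alongside the weights so it can
    # be recovered at test time (simplified checkpoint layout).
    return {
        "state_dict": model.state_dict(),
        "optim_dict": optimizer.state_dict(),
        "epoch": epoch,
        "mixed_precision": mixed_precision,
    }

def load_model(path, build_model):
    # build_model is a callable that returns a freshly constructed model.
    package = torch.load(path, map_location="cpu")
    model = build_model()
    model.load_state_dict(package["state_dict"])
    model = model.cuda()
    if package.get("mixed_precision", False):
        # Run inference in FP16 when the model was trained with mixed precision.
        model = model.half()
    return model
```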
UPDATE: even after I changed the above to save the `mixed_precision` variable with the model, nothing improved; it actually got slower :( My average inference time per file was 11 ms without mixed precision but 20 ms with it (for the beam CTC decoder with the same parameters). However, with the greedy decoder I get 4.2 ms per file for the mixed-precision model and 4.5 ms per file for the non-mixed-precision model.
Also, if I look at the model's inference time alone, without the decoder, I get 1.5 to 1.7 ms with mixed precision and 1.8 to 1.9 ms without.
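When comparing timings this small, it is worth synchronizing the GPU around the timed region so that only the forward pass is measured; a minimal timing sketch, with a stand-in model and input rather than the repo's benchmark code:

```python
import time
import torch
import torch.nn as nn

def time_inference(model, inputs, half=False, warmup=5, iters=50):
    model = model.cuda().eval()
    inputs = inputs.cuda()
    if half:
        model, inputs = model.half(), inputs.half()
    with torch.no_grad():
        for _ in range(warmup):
            model(inputs)                # warm-up passes, not timed
        torch.cuda.synchronize()         # make sure warm-up kernels have finished
        start = time.time()
        for _ in range(iters):
            model(inputs)
        torch.cuda.synchronize()         # wait for all timed kernels to finish
    return (time.time() - start) / iters * 1000.0  # ms per forward pass

# Stand-in acoustic model and batch (not the DS2 network).
model = nn.Sequential(nn.Linear(161, 1024), nn.ReLU(), nn.Linear(1024, 29))
x = torch.randn(32, 161)
print("fp32:", time_inference(model, x), "ms")
print("fp16:", time_inference(model, x, half=True), "ms")
```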