Test time no speed increase with half-precision #402

Closed
vaibhav0195 opened this issue Mar 26, 2019 · 10 comments
@vaibhav0195

vaibhav0195 commented Mar 26, 2019

Hello, thanks for adding support for mixed-precision training. I was very excited to train my models in half precision.

I have an NVIDIA RTX 2080 Ti running CUDA 10.0, cuDNN 7.4, Ubuntu 16.04, Python 3.7, and PyTorch 1.0 (installed using pip).

So my questions are:
1. I can see the speed increase during training (using half precision), but at test time, if I use the beam CTC decoder (with half precision), I don't get any speed increase :( (it's the same as the model trained without half precision).
2. When using the LM and beam CTC decoder, my WER is worse than with the greedy decoder :(.

Can anyone please help me figure out what is causing these problems, or am I missing something?

Any help is appreciated.
Thanks!


I think I found why mixed precision was not working: in the `load_model` method we aren't passing the `mixed_precision` option, so by default it is false and mixed precision is not enabled at test time. Also, we are not saving the `mixed_precision` variable in the `serialize` method, so we can't tell whether a model was trained with mixed precision, and at test time the model is always loaded without FP16 support.
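
For illustration, roughly what I mean (a minimal sketch; the function names and checkpoint keys here are placeholders, not the actual deepspeech.pytorch code):

```python
import torch

def serialize(model, optimizer, mixed_precision=False):
    # Store the flag alongside the weights so it can be recovered at load time.
    return {
        'state_dict': model.state_dict(),
        'optim_dict': optimizer.state_dict(),
        'mixed_precision': mixed_precision,  # hypothetical extra field
    }

def load_model(path, model):
    package = torch.load(path, map_location='cpu')
    model.load_state_dict(package['state_dict'])
    if package.get('mixed_precision', False):
        model = model.half()  # restore FP16 weights for inference
    return model
```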


UPDATE: I changed the above so that the `mixed_precision` variable is saved with the model, but it did not change anything; in fact, it made things even slower :(. My average inference time for the non-mixed-precision model was 11 ms per file, but for the mixed-precision one it was 20 ms :( (for the beam CTC decoder with the same parameters). However, for the greedy decoder I get 4.2 ms per file with the mixed-precision model and 4.5 ms per file with the non-mixed-precision model.
Also, if I only look at model inference time without the decoder, I get 1.5-1.7 ms for mixed precision and 1.8-1.9 ms for non-mixed precision.
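
For reference, this is roughly how I time the forward pass in isolation (a minimal sketch with a dummy model and input shape; `torch.cuda.synchronize()` is needed so asynchronous CUDA execution doesn't distort the numbers):

```python
import time
import torch

def time_forward(model, inputs, n_iters=100, warmup=10):
    with torch.no_grad():
        for _ in range(warmup):      # warm-up so cuDNN setup isn't measured
            model(inputs)
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(n_iters):
            model(inputs)
        torch.cuda.synchronize()
    return (time.time() - start) / n_iters * 1000.0  # ms per forward pass

# Dummy model/input, not the DS2 network.
model = torch.nn.Sequential(torch.nn.Linear(1024, 1024), torch.nn.ReLU()).cuda()
x = torch.randn(32, 1024, device='cuda')
print('FP32: %.2f ms' % time_forward(model, x))
print('FP16: %.2f ms' % time_forward(model.half(), x.half()))
```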

@SeanNaren
Owner

I'll be investigating this shortly, but a few pointers:

  1. We've noted that decoding time is usually considerably longer than the model inference time, so double-checking that this is the case would be a good idea!
  2. It is important that you grid search the alpha/beta parameters on a validation set; this can be done using this script! (A rough sketch of what such a search boils down to follows below.)
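
As a rough illustration of point 2, the search itself is just a loop over (alpha, beta) pairs on held-out data; `evaluate_wer` below is a stand-in for whatever decode-and-score routine you use, not the linked script itself:

```python
import itertools

def grid_search(evaluate_wer, alphas, betas):
    """Return (wer, alpha, beta) for the best-scoring pair.

    evaluate_wer(alpha, beta) should decode a validation set with the beam
    decoder configured with those LM weights and return its WER.
    """
    best = None
    for alpha, beta in itertools.product(alphas, betas):
        wer = evaluate_wer(alpha, beta)
        if best is None or wer < best[0]:
            best = (wer, alpha, beta)
    return best

# e.g. grid_search(my_decode_and_score, alphas=[0.5, 1.0, 1.5, 2.0], betas=[0.0, 0.5, 1.0])
```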

@vaibhav0195
Author

Hi @SeanNaren,
Thanks for your reply.
I double-checked the model inference time, since I also thought the decoding time should be greater than the model inference time, but there is no significant difference.
Thanks as well for the second point about speeding up my inference and tuning my decoder properly; I haven't looked into it yet, but my main concern is getting the tensor cores of my GPU to speed up the model inference time :(

@SeanNaren
Owner

What's the current state of this? How can I help?

@SeanNaren SeanNaren self-assigned this May 17, 2019
@vaibhav0195
Author

Hi,
I tried running a couple of commands provided by NVIDIA to check whether my tensor cores were being used, but I haven't had any luck so far.
I am willing to help, but I am kind of new to the field.

@SeanNaren
Owner

Using the master branch, do you still run into this issue? I'd suggest re-training a model first to see if it works. Also, just to be clear, you are checking this without an LM?

@vaibhav0195
Author

Yes, I had checked this without the LM, just the DS2 inference time.

I retrained the small model on the same toy dataset I used before and compared the inference time during validation for the two models, one trained without mixed precision and one with it, but I was not able to find any significant difference in speed between them. :(

@miguelvr
Contributor

I would like to point out that half precision with APEX is only worth it if the GPU has a lot of tensor cores (RTX or Volta), see apex #297.

However, it looks like the APEX API used here is the old one.

It is probably worth updating it.
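
For reference, the newer unified `apex.amp` interface looks roughly like this (a minimal sketch with a toy model; `O1` is just one common opt level, and this is not the exact change needed in this repo):

```python
import torch
from apex import amp

model = torch.nn.Linear(1024, 1024).cuda()   # toy stand-in for the DS2 network
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# One call patches the model and optimizer for mixed precision, replacing the
# older manual network.half() / FP16_Optimizer style usage.
model, optimizer = amp.initialize(model, optimizer, opt_level='O1')

inputs = torch.randn(32, 1024, device='cuda')
loss = model(inputs).float().mean()           # placeholder loss
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()                    # loss-scaled backward pass
optimizer.step()
```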

@vaibhav0195
Author

I updated the apex code according to the new APIs, but I still wasn't able to see any change. :(

@SeanNaren
Owner

@vaibhav0195 could you run this script without editing anything: https://github.com/SeanNaren/deepspeech.pytorch/blob/master/benchmark.py, then re-run it with the `--mixed-precision` flag and post the output?

@stale

stale bot commented Feb 27, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Feb 27, 2020
@stale stale bot closed this as completed Mar 12, 2020