Test time no speed increase with half-precision #402

Closed
vaibhav0195 opened this issue Mar 26, 2019 · 10 comments
@vaibhav0195

vaibhav0195 commented Mar 26, 2019

Hello, thanks for adding support for mixed-precision training. I was very excited to train my models in half precision.

I have an NVIDIA RTX 2080 Ti running CUDA 10.0, cuDNN 7.4, Ubuntu 16.04, Python 3.7, and PyTorch 1.0 (installed using pip).

So my questions are:
1. I can see the speed increase during training (using half precision), but at test time, if I use the beam CTC decoder (with half precision), I don't get any speed increase :( (it's the same as the model trained without half precision).
2. When using the LM and beam CTC decoder, my WER is worse than with the greedy decoder :(.

Can anyone please help me figure out what is causing these problems, or am I missing something?

Any help is appreciated.
Thanks!


I think I found why mixed precision was not working: in the `load_model` method we aren't passing the `mixed_precision` option, so by default it is false and mixed precision is not enabled at test time. Also, we are not saving the `mixed_precision` variable in the `serialize` method, so we can't tell whether a model was trained with mixed precision, and at test time the model is always loaded without FP16 support.
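
For illustration, roughly what I mean (a minimal sketch; the function names and checkpoint keys here are placeholders, not the actual deepspeech.pytorch code):

```python
import torch

def serialize(model, optimizer, mixed_precision=False):
    # Store the flag alongside the weights so it can be recovered at load time.
    return {
        'state_dict': model.state_dict(),
        'optim_dict': optimizer.state_dict(),
        'mixed_precision': mixed_precision,  # hypothetical extra field
    }

def load_model(path, model):
    package = torch.load(path, map_location='cpu')
    model.load_state_dict(package['state_dict'])
    if package.get('mixed_precision', False):
        model = model.half()  # restore FP16 weights for inference
    return model
```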


UPDATE: I changed the above so that the `mixed_precision` variable is saved with the model, but it did not change anything; in fact, it made things even slower :(. My average inference time for the non-mixed-precision model was 11 ms per file, but for the mixed-precision one it was 20 ms :( (for the beam CTC decoder with the same parameters). However, for the greedy decoder I get 4.2 ms per file with the mixed-precision model and 4.5 ms per file with the non-mixed-precision model.
Also, if I only look at model inference time without the decoder, I get 1.5-1.7 ms for mixed precision and 1.8-1.9 ms for non-mixed precision.
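
For reference, this is roughly how I time the forward pass in isolation (a minimal sketch with a dummy model and input shape; `torch.cuda.synchronize()` is needed so asynchronous CUDA execution doesn't distort the numbers):

```python
import time
import torch

def time_forward(model, inputs, n_iters=100, warmup=10):
    with torch.no_grad():
        for _ in range(warmup):      # warm-up so cuDNN setup isn't measured
            model(inputs)
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(n_iters):
            model(inputs)
        torch.cuda.synchronize()
    return (time.time() - start) / n_iters * 1000.0  # ms per forward pass

# Dummy model/input, not the DS2 network.
model = torch.nn.Sequential(torch.nn.Linear(1024, 1024), torch.nn.ReLU()).cuda()
x = torch.randn(32, 1024, device='cuda')
print('FP32: %.2f ms' % time_forward(model, x))
print('FP16: %.2f ms' % time_forward(model.half(), x.half()))
```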

@SeanNaren
Owner

I'll be investigating this shortly, but a few pointers:

  1. We've noted that decoding time is usually considerably longer than the model inference time, so double-checking that this is the case would be a good idea!
  2. It is important that you grid search the alpha/beta parameters on a validation set; this can be done using this script! (A rough sketch of what such a search boils down to follows below.)
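
As a rough illustration of point 2, the search itself is just a loop over (alpha, beta) pairs on held-out data; `evaluate_wer` below is a stand-in for whatever decode-and-score routine you use, not the linked script itself:

```python
import itertools

def grid_search(evaluate_wer, alphas, betas):
    """Return (wer, alpha, beta) for the best-scoring pair.

    evaluate_wer(alpha, beta) should decode a validation set with the beam
    decoder configured with those LM weights and return its WER.
    """
    best = None
    for alpha, beta in itertools.product(alphas, betas):
        wer = evaluate_wer(alpha, beta)
        if best is None or wer < best[0]:
            best = (wer, alpha, beta)
    return best

# e.g. grid_search(my_decode_and_score, alphas=[0.5, 1.0, 1.5, 2.0], betas=[0.0, 0.5, 1.0])
```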

@vaibhav0195
Author

Hi @SeanNaren,
Thanks for your reply.
I double-checked the model inference time, since I also thought the decoding time should be greater than the model inference time, but there is no significant difference.
Thanks as well for the second point about speeding up my inference and tuning my decoder properly; I haven't looked into it yet, but my main concern is getting the tensor cores of my GPU to speed up the model inference time :(

@SeanNaren
Owner

What's the current state of this? How can I help?

@SeanNaren SeanNaren self-assigned this May 17, 2019
@vaibhav0195
Author

Hi,
I tried running a couple of commands provided by NVIDIA to check whether my tensor cores were being used, but I haven't had any luck so far.
I am willing to help, but I am kind of new to the field.

@SeanNaren
Owner

Using the master branch, do you still run into this issue? I'd suggest re-training a model first to see if it works. Also, just to be clear, you are checking this without an LM?

@vaibhav0195
Author

Yes, I had checked this without the LM, just the DS2 inference time.

I retrained the small model on the same toy dataset I used before and compared the inference time during validation for the two models, one trained without mixed precision and one with it, but I was not able to find any significant difference in speed between them. :(

@miguelvr
Contributor

I would like to point out that half precision with APEX is only worth it if the GPU has a lot of tensor cores (RTX or Volta), see apex #297.

However, it looks like the APEX API used here is the old one.

It is probably worth updating it.
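
For reference, the newer unified `apex.amp` interface looks roughly like this (a minimal sketch with a toy model; `O1` is just one common opt level, and this is not the exact change needed in this repo):

```python
import torch
from apex import amp

model = torch.nn.Linear(1024, 1024).cuda()   # toy stand-in for the DS2 network
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# One call patches the model and optimizer for mixed precision, replacing the
# older manual network.half() / FP16_Optimizer style usage.
model, optimizer = amp.initialize(model, optimizer, opt_level='O1')

inputs = torch.randn(32, 1024, device='cuda')
loss = model(inputs).float().mean()           # placeholder loss
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()                    # loss-scaled backward pass
optimizer.step()
```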

@vaibhav0195
Author

I updated the apex code according to the new APIs, but I still wasn't able to see any change. :(

@SeanNaren
Owner

@vaibhav0195 could you run this script without editing anything: https://github.com/SeanNaren/deepspeech.pytorch/blob/master/benchmark.py, then re-run it with the `--mixed-precision` flag and post the output?

@stale

stale bot commented Feb 27, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Feb 27, 2020
@stale stale bot closed this as completed Mar 12, 2020