INF loss caused by ScaledLinear #534
Could you use the changes from #531, which have been merged into master, to …
What optimizer did you use? Did you use …
Yes, …
Yes, I learned about this PR in the WeChat group. But this may be a new problem caused by the decoder and joiner parts of the …
where: …
More debug info: …
Because it's sometimes possible to scale up one layer and scale down the next without affecting the output, this can happen.
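This rescaling invariance is easy to check numerically: for two stacked linear maps, multiplying the first weight matrix by a constant and dividing the second by the same constant leaves the composed output unchanged, while the per-layer scales drift apart. A minimal NumPy sketch (illustrative only, not the actual model code):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((8, 4))   # first linear layer
W2 = rng.standard_normal((3, 8))   # second linear layer
x = rng.standard_normal(4)

y = W2 @ (W1 @ x)                  # original composed output

c = 100.0                          # scale layer 1 up, layer 2 down by the same factor
y_rescaled = (W2 / c) @ ((W1 * c) @ x)

# The composed function is unchanged, so nothing in the loss pushes back
# against one layer's scale growing without bound.
assert np.allclose(y, y_rescaled)
```

(With a ReLU between the layers the same holds for any positive constant, since relu(c·z) = c·relu(z).)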
I've added the clamping operation in the …
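For reference, the kind of clamping discussed here can be sketched as follows; the function name and the (-10, 2) range are illustrative assumptions, not the exact icefall code. The idea is to bound the log-space `weight_scale` so that `exp(weight_scale)` stays comfortably inside float32 range:

```python
import math

def clamp_weight_scale(weight_scale: float, lo: float = -10.0, hi: float = 2.0) -> float:
    """Clamp a log-space scale parameter to a fixed range (illustrative sketch)."""
    return max(lo, min(hi, weight_scale))

# A runaway log-scale of 50 would give exp(50) ~ 5e21 as a multiplier;
# after clamping, the multiplier is at most exp(2) ~ 7.4.
assert clamp_weight_scale(50.0) == 2.0
assert clamp_weight_scale(-20.0) == -10.0
assert clamp_weight_scale(0.5) == 0.5
```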
It seems that the clamping operation does not affect training. Here are the values of the three scalar weights after each epoch.

Without clamping:

| Epoch | Decoder conv `weight_scale` | Joiner decoder_proj `weight_scale` | Simple lm_proj `weight_scale` |
|-------|------------------------------|-------------------------------------|--------------------------------|
| 1  | 2.532698154449463  | -1.3956085443496704 | 0.014985241927206516 |
| 2  | 2.4966623783111572 | -1.3323917388916016 | -0.04377373680472374 |
| 3  | 2.408905506134033  | -1.013999104499817  | -0.248096764087677 |
| 4  | 2.3934552669525146 | -0.9088234305381775 | -0.2796499729156494 |
| 5  | 2.463165283203125  | -1.2767215967178345 | -0.09834159910678864 |
| 6  | 2.5755698680877686 | -1.4661520719528198 | 0.08270139247179031 |
| 7  | 2.453735113143921  | -1.2604535818099976 | -0.11372770369052887 |
| 8  | 2.562387704849243  | -1.4478679895401    | 0.06404751539230347 |
| 9  | 2.539534568786621  | -1.40958833694458   | 0.02723226696252823 |
| 10 | 2.5854289531707764 | -1.48295259475708   | 0.09910309314727783 |
| 11 | 2.4235291481018066 | -1.1637898683547974 | -0.19667033851146698 |
| 12 | 2.5119268894195557 | -1.362194299697876  | -0.018590014427900314 |
| 13 | 2.4722437858581543 | -1.2908258438110352 | -0.08399543911218643 |
| 14 | 2.5198278427124023 | -1.3738354444503784 | -0.006699979770928621 |
| 15 | 2.4350645542144775 | -1.2236826419830322 | -0.15010575950145721 |
| 16 | 2.5459630489349365 | -1.4185402393341064 | 0.03665268048644066 |
| 17 | 2.260655403137207  | -0.8165771961212158 | -0.5119346976280212 |
| 18 | 2.5793089866638184 | -1.4762474298477173 | 0.0903153046965599 |
| 19 | 2.5563783645629883 | -1.4410549402236938 | 0.055296555161476135 |
| 20 | 2.5518383979797363 | -1.4319567680358887 | 0.04718981310725212 |
| 21 | 2.443829298019409  | -1.2422311305999756 | -0.13266611099243164 |
| 22 | 2.4006619453430176 | -0.8165771961212158 | -0.3719165623188019 |
| 23 | 2.4226510524749756 | -1.1069473028182983 | -0.22385886311531067 |
| 24 | 2.568734645843506  | -1.4594826698303223 | 0.0736895278096199 |
| 25 | 2.4270377159118652 | -1.1974050998687744 | -0.17232902348041534 |
| 26 | 2.489833116531372  | -1.3193844556808472 | -0.05526689812541008 |
| 27 | 2.525768756866455  | -1.3883850574493408 | 0.004825897980481386 |
| 28 | 2.4805877208709717 | -1.3054426908493042 | -0.07006088644266129 |
| 29 | 2.366048574447632  | -0.8165771961212158 | -0.4065343141555786 |
| 30 | 2.5044734477996826 | -1.3446654081344604 | -0.031459204852581024 |

With clamping:

| Epoch | Decoder conv `weight_scale` | Joiner decoder_proj `weight_scale` | Simple lm_proj `weight_scale` |
|-------|------------------------------|-------------------------------------|--------------------------------|
| 1  | 2.0                | -0.1262197494506836 | 1.346535086631775 |
| 2  | 2.0                | -0.13853846490383148 | 1.2230230569839478 |
| 3  | 1.9999276399612427 | -0.30957549810409546 | 0.7861649990081787 |
| 4  | 2.0                | -0.47377195954322815 | 0.5668039321899414 |
| 5  | 2.0                | -0.15966753661632538 | 1.1079970598220825 |
| 6  | 1.9998482465744019 | -0.11061322689056396 | 1.4907336235046387 |
| 7  | 1.9994806051254272 | -0.17074772715568542 | 1.074182152748108 |
| 8  | 1.9996674060821533 | -0.11555270105600357 | 1.4526734352111816 |
| 9  | 2.0                | -0.12586870789527893 | 1.371146321296692 |
| 10 | 2.0                | -0.10725975781679153 | 1.5275452136993408 |
| 11 | 1.999825358390808  | -0.23376572132110596 | 0.9220944046974182 |
| 12 | 2.0                | -0.1301610767841339 | 1.2763099670410156 |
| 13 | 1.9997310638427734 | -0.15640747547149658 | 1.1399130821228027 |
| 14 | 2.0                | -0.13257452845573425 | 1.3028491735458374 |
| 15 | 2.0                | -0.19641730189323425 | 1.002395510673523 |
| 16 | 1.9998756647109985 | -0.11937755346298218 | 1.3922357559204102 |
| 17 | 2.0                | -0.8165771961212158 | -0.21228034794330597 |
| 18 | 1.9999603033065796 | -0.1086028665304184 | 1.5089606046676636 |
| 19 | 2.0                | -0.12050770223140717 | 1.4338239431381226 |
| 20 | 1.9998784065246582 | -0.12354948371648788 | 1.4136296510696411 |
| 21 | 2.0                | -0.1823224574327469 | 1.037805199623108 |
| 22 | 1.9997987747192383 | -0.8165771961212158 | 0.3626188039779663 |
| 23 | 1.9996812343597412 | -0.2594310939311981 | 0.8760554790496826 |
| 24 | 2.0                | -0.11035250872373581 | 1.4716155529022217 |
| 25 | 1.9995653629302979 | -0.21520820260047913 | 0.9618409276008606 |
| 26 | 1.9997872114181519 | -0.13693207502365112 | 1.196391224861145 |
| 27 | 2.0                | -0.12907470762729645 | 1.3241920471191406 |
| 28 | 2.0                | -0.1475147157907486 | 1.1672462224960327 |
| 29 | 2.0                | -0.8165771961212158 | 0.10545660555362701 |
| 30 | 1.9997196197509766 | -0.13277317583560944 | 1.250410556793213 |

It seems that one scalar weight is in saturation: the clamping function is pulling it back every time. The WER is fine though, so this might not be a problem. Do I need to try another clamping range, e.g. (-10, 3)?
That threshold is fine. These models have extra degrees of freedom: one scale can get big while the other gets small.
After 30 epochs of model training, an inf loss appeared in a certain batch of training on my own dataset.
My training script is modified from egs/librispeech/pruned_transducer_stateless2.
The problem seems to be in the weight parameters of the simple_lm_proj layer, which is of type ScaledLinear.
The reason for this exception is that the real weight of the layer has exceeded the maximum representable range of float32 (converting to 64-bit floating point resolves it). The construction of the ScaledLinear module may need to be reconsidered.
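The overflow is easy to reproduce in isolation: float32 tops out near 3.4e38, so once the effective weight (the raw weight times `exp(weight_scale)`) crosses that bound it becomes inf in float32, while float64 still represents it. A minimal NumPy sketch (not the actual failing tensor):

```python
import numpy as np

scale = 89.0  # exp(89) ~ 4.5e38, just beyond the float32 max (~3.4e38)

w32 = np.exp(np.float32(scale))   # overflows to inf in float32
w64 = np.exp(np.float64(scale))   # still finite in float64

assert np.isinf(w32)
assert np.isfinite(w64)
```

Any inf entry in the weight matrix then propagates through the matmul and the loss, which matches the observed inf loss.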
Case to reproduce: