Fix un-intended regression in cudnn-fp16 backend running on GTX GPUs #1246

ankan-ban · 2020-04-28T15:06:06Z

Don't set `CUBLAS_TENSOR_OP_MATH` unconditionally.
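As a rough illustration of what a conditional setting could look like, here is a minimal sketch. It is not the code from this PR. The helper name `ShouldUseTensorOpMath` is hypothetical, and the condition itself is an assumption: tensor cores require compute capability 7.0 or newer, and GTX-branded boards (for example the GTX 16 series, which reports 7.5) do not have them.

```cpp
#include <string>

// Hypothetical helper, not taken from the PR: decide whether a device should
// be switched to CUBLAS_TENSOR_OP_MATH.
// Assumptions: tensor cores need compute capability >= 7.0, and boards whose
// name contains "GTX" lack them even when they report capability 7.5.
bool ShouldUseTensorOpMath(int cc_major, const std::string& device_name) {
  if (cc_major < 7) return false;  // Pascal and older: no tensor cores.
  return device_name.find("GTX") == std::string::npos;
}

// Possible call site (needs the CUDA runtime and cuBLAS, so shown as a comment):
//
//   cudaDeviceProp prop;
//   cudaGetDeviceProperties(&prop, device_id);
//   if (ShouldUseTensorOpMath(prop.major, prop.name)) {
//     cublasSetMathMode(cublas_handle, CUBLAS_TENSOR_OP_MATH);
//   }
```

Matching on the device name is a blunt heuristic. It is used here only because the compute capability alone cannot tell a GTX 16-series board from an RTX board.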

* Time management refactoring (LeelaChessZero#1195)
  * Appended files.
  * Compiles.
  * Compiles again.
  * Make smart pruning use smoothed nps.
  * Seems to be fully implemented.
  * Mistype.
  * One more bug.
  * Found discrepancy with documentation.
  * Bugfixes.
  * Don't smooth nps during the first move.
  * Too large default for timeuse decay.
  * Bugfix.
  * Fix build.
  * Relax defaults a bit. Add fixed to logging.
  * Remove "smooth" to "smooth-experimental" for now.
* MLH verbose stats - Issue 1200 (LeelaChessZero#1230)
  * Add M effect logic to output section
  * Fix missing prefixes and semicolons
  * Some fixes.
  * Slight format improvement?

  Co-authored-by: Tilps <Tilps@users.noreply.github.com>
* Start TempDecay only after a given number of moves (LeelaChessZero#1212)
  * Added TempDecayStartMove for starting temp decay only after a given number of moves. This allows keeping initial game up for a few moves and still use decay.
  * Doesn't allow temperature to fall below endgame temp during temp decay. Still allows initial temp to be below endgame temp.
  * Doesn't allow temperature to fall below endgame temp during temp decay. Still allows initial temp to be below endgame temp.
  * Hide temp options
  * renamed TempDecayStartMove to TempDecayDelayMoves

  Co-authored-by: Alexis Olson <AlexisOlson@gmail.com>
* Changelog for 0.25.0-rc2. (LeelaChessZero#1233)
  * Changelog for 0.25.0-rc2.
  * Add one more PR to the changelog.
* Cuda winograd (LeelaChessZero#1228)
  * custom winograd convolution for cuda backends
  * custom winograd fixes
    - fix a bug to make it work for non-SE networks
    - enable by default only with fp32.
  * address review comments
  * remove random line in comment
  * remove unused constants
    - W,H are hardcoded to 8 - because there are assumptions in the code based on that. No point in defining constants.
* cuda winograd fixes (LeelaChessZero#1238)
  * cuda winograd fixes
    - don't typecast directly to half datatype in CPU side code as older CUDA runtime doesn't support that.
    - don't use gemmEx version on GPUs older than Maxwell generation (not supported).
    - modify the check to enable custom_winograd setting. It should be faster in most cases - except presently on RTX GPUs when using fp16.
* Allow most parts of fen to be optional. (LeelaChessZero#1234)
  Default to white to move, no castling, no en passant, 0 rule50ply, 1 total move. Also convert other string to std::string and removing using.
* Fix UpdateNps to actually smooth the nps and correctly handle time_since_movestart_ms == 0 (LeelaChessZero#1243)
* Update changelog for 0.25.0 final release. (LeelaChessZero#1244)
* Always report at least 1 depth. (LeelaChessZero#1247)
* Fix un-intended regression for GTX GPUs (LeelaChessZero#1246)
* memory optimization for cudnn custom_winograd (LeelaChessZero#1250)
  * memory optimization for cudnn custom_winograd
    - don't save untransformed weights
    - print warning message when low memory is detected.
  * address review comments
  * fix warning message
  * fix total weight size calculation
    2 layers per residual block!
* keep pdb files only for release builds (LeelaChessZero#1256)
* doc update (LeelaChessZero#1267)
* Include verbose stats for the node. (LeelaChessZero#1268)
  Use printing lambdas for parts of the verbose output to share between the newly outputted node and its children.
* add alphazero time manager (LeelaChessZero#1201)
* Updated FLAGS.md with logfile flag (LeelaChessZero#1275)
* Fixed a typo in CONTRIBUTING.md (LeelaChessZero#1274)
* Update Readme about using git (LeelaChessZero#1265)
* Make `wl_` double. (LeelaChessZero#1280)
* Move move filter population to a constructor. (LeelaChessZero#1281)
* Filter out illegal searchmoves to avoid crashing. (LeelaChessZero#1282)
* Clear policy for terminal loss. (LeelaChessZero#1285)
* Allow smart pruning to terminate search if win is known. (LeelaChessZero#1284)
  * Allow smart pruning to terminate search if win is known.
  * Minor tweak, better safe than sorry.
* Fix bug where pv might not update for best move change. (LeelaChessZero#1286)
  * Fix bug where pv might not update.
  * Fix...

Co-authored-by: Alexander Lyashuk <crem@google.com>
Co-authored-by: Tilps <Tilps@users.noreply.github.com>
Co-authored-by: Naphthalin <40385638+Naphthalin@users.noreply.github.com>
Co-authored-by: Ankan Banerjee <ankan.ban@gmail.com>
Co-authored-by: Ed Lee <edilee@mozilla.com>
Co-authored-by: borg323 <39573933+borg323@users.noreply.github.com>
Co-authored-by: Hace <mellekoning@gmail.com>
Co-authored-by: Kip Hamiltons <48076495+KipHamiltons@users.noreply.github.com>
Co-authored-by: nguyenpham <axchess@yahoo.com>

@Naphthalin

* Same squashed commit list as in the commit message above, from "Time management refactoring (#1195)" through "Fix bug where pv might not update for best move change. (#1286)".
* Catch up to master (#6)
  * Repeats the same commit list and Co-authored-by trailers.
* Change defaults and unhide MLH options
* Update values per @Naphthalin's comments

Co-authored-by: Alexander Lyashuk <crem@google.com>
Co-authored-by: Tilps <Tilps@users.noreply.github.com>
Co-authored-by: Naphthalin <40385638+Naphthalin@users.noreply.github.com>
Co-authored-by: Ankan Banerjee <ankan.ban@gmail.com>
Co-authored-by: Ed Lee <edilee@mozilla.com>
Co-authored-by: borg323 <39573933+borg323@users.noreply.github.com>
Co-authored-by: Hace <mellekoning@gmail.com>
Co-authored-by: Kip Hamiltons <48076495+KipHamiltons@users.noreply.github.com>
Co-authored-by: nguyenpham <axchess@yahoo.com>

Fix un-intended regression for GTX GPUs

4342ca9

ankan-ban changed the title ~~Fix un-intended regression for GTX GPUs~~ Fix un-intended regression in cudnn-fp16 backend running on GTX GPUs Apr 28, 2020

borg323 approved these changes Apr 28, 2020


ankan-ban merged commit 938615a into LeelaChessZero:master Apr 29, 2020

AlexisOlson pushed a commit to AlexisOlson/lc0 that referenced this pull request May 10, 2020

Fix un-intended regression for GTX GPUs (LeelaChessZero#1246)

2942289
