ResNext3D layer selected incorrectly large WSS causing OOM failure #381
Comments
Perhaps we need to look at these changes: https://github.com/AMDComputeLibraries/MLOpen/pull/2340/files
@zjing14 Please have a look.
The multiplication by group_count was moved into miopenConvolutionBwdWeightsAlgoGEMM, but the multiplication by group_count outside was not removed. Will create a fix.
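To illustrate the kind of bug described above, here is a minimal sketch of a workspace size being scaled by `group_count` twice. All function names and numbers are hypothetical and are not MIOpen code; the sketch only assumes that the size is multiplied once inside a helper and once more by its caller.

```python
# Hypothetical illustration only: none of these functions exist in MIOpen.

def gemm_workspace_size(bytes_per_group: int, group_count: int) -> int:
    """Helper that already scales the workspace by group_count."""
    return bytes_per_group * group_count


def workspace_size_buggy(bytes_per_group: int, group_count: int) -> int:
    """Caller that still multiplies by group_count, so the factor is applied twice."""
    return gemm_workspace_size(bytes_per_group, group_count) * group_count


def workspace_size_fixed(bytes_per_group: int, group_count: int) -> int:
    """Caller after the fix: the outer multiplication is removed."""
    return gemm_workspace_size(bytes_per_group, group_count)


if __name__ == "__main__":
    per_group, groups = 1024, 32  # made-up values for illustration
    print(workspace_size_buggy(per_group, groups))
    print(workspace_size_fixed(per_group, groups))
```

Under this assumption, the reported size is inflated by a factor of `group_count`, which would be consistent with a grouped-convolution layer requesting far more workspace than it needs.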
Awesome thanks guys. Of course this means we have to regenerate the entire find-db.
@zjing14 Does this only affect 3D convolutions, or 2D as well?
Both, probably.
Actually, only GEMM Backward Data needs to be regenerated, which is ~1/15 of the full regeneration. The process looks like this:
I am hoping that Tuna has such a capability. @JehandadKhan
Yes, we have the ability.
#381 (comment) updated.
Implemented.
ROCm 3.8 Blocking issue.
JIRA tracking issue for ResNext3D:
http://ontrack-internal.amd.com/browse/SWDEV-246350
The issue is that in the default hybrid mode the Caffe2 model crashes with OOM. However, in normal find mode it passes.
Analysis:
Deleting the user Find-Db and running in default hybrid mode also fails with OOM.
MIOPEN_LOG_LEVEL=6 for default hybrid mode shows 1.3GB is requested by layer:
Find-Db is populated after a normal run with a clean ufdb with ~~the correct 5MB size~~ the incorrect size (1.3GB).
Driver command to reproduce the issue: