3D pooling regression in TF unit tests with MIOpen from ROCm 3.7RC1 #365

Closed
deven-amd opened this issue Jul 30, 2020 · 5 comments

@deven-amd
Contributor

Some of the TF unit tests are failing with the following error:

MIOpen Error: /root/driver/MLOpen/src/ocl/pooling_ocl.cpp:142: 3D pooling doesn't support workspace index mask mode
2020-07-30 17:06:17.649098: E tensorflow/stream_executor/rocm/rocm_dnn.cc:4530] failed to enqueue forward pooling (before backward) on stream: miopenStatusUnknownError

The same tests work fine with older MIOpen versions (from ROCm 3.5).

Output with MIOpen logging enabled seems to point to the following call:

...
MIOpen(HIP): miopenStatus_t miopenPoolingForward(miopenHandle_t, const miopenPoolingDescriptor_t, const void *, const miopenTensorDescriptor_t, const void *, const void *, const miopenTensorDescriptor_t, void *, bool, void *, size_t){
MIOpen(HIP): 	handle = stream: 0x3de39c0, device_id: 0
MIOpen(HIP): 	poolDesc = miopenPoolingMax, 1, 1, 1, 0, 0, 0, 1, 1, 1, 
MIOpen(HIP): 	alpha = 0x7fbb33ffd160
MIOpen(HIP): 	xDesc = 1, 1, 3, 5, 4
MIOpen(HIP): 	x = 0x7fbad5a02d00
MIOpen(HIP): 	beta = 0x7fbb33ffd164
MIOpen(HIP): 	yDesc = 1, 1, 3, 5, 4
MIOpen(HIP): 	y = 0x7fbad5a03200
MIOpen(HIP): 	do_backward = 1
MIOpen(HIP): 	workSpace = 0x7fbad5a03100
MIOpen(HIP): 	workSpaceSize = 240
MIOpen(HIP): }
MIOpen(HIP): Command [Pooling_logging_cmd] ./bin/MIOpenDriver pool -d 3 -M 0 -n 1 -c 1 -D 3 -H 5 -W 4 -Z 1 -y 1 -x 1 -o 0 -p 0 -q 0 -s 1 -v 1 -u 1 -m max -F 1 -t 1
MIOpen Error: /root/driver/MLOpen/src/ocl/pooling_ocl.cpp:142: 3D pooling doesn't support workspace index mask mode
2020-07-30 17:06:17.649098: E tensorflow/stream_executor/rocm/rocm_dnn.cc:4530] failed to enqueue forward pooling (before backward) on stream: miopenStatusUnknownError
MIOpen(HIP): miopenStatus_t miopenDestroyPoolingDescriptor(miopenPoolingDescriptor_t){
MIOpen(HIP): 	poolDesc = miopenPoolingMax, 1, 1, 1, 0, 0, 0, 1, 1, 1, 
MIOpen(HIP): }
MIOpen(HIP): miopenStatus_t miopenDestroyTensorDescriptor(miopenTensorDescriptor_t){
MIOpen(HIP): 	tensorDesc = 1, 1, 3, 5, 4
MIOpen(HIP): }
MIOpen(HIP): miopenStatus_t miopenDestroyTensorDescriptor(miopenTensorDescriptor_t){
MIOpen(HIP): 	tensorDesc = 1, 1, 3, 5, 4
MIOpen(HIP): }

...

But when I run that command standalone, it works fine:

root@ixt-rack-04:/root/tensorflow# /opt/rocm-3.7.0-3289/miopen/bin/MIOpenDriver pool -d 3 -M 0 -n 1 -c 1 -D 3 -H 5 -W 4 -Z 1 -y 1 -x 1 -o 0 -p 0 -q 0 -s 1 -v 1 -u 1 -m max -F 1 -t 1
MIOpenDriver pool -d 3 -M 0 -n 1 -c 1 -D 3 -H 5 -W 4 -Z 1 -y 1 -x 1 -o 0 -p 0 -q 0 -s 1 -v 1 -u 1 -m max -F 1 -t 1
GPU Kernel Time Forward Pooling Elapsed: 0.015852 ms
Forward Pooling Verifies on CPU and GPU
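
For anyone replaying failures like this one, here is a minimal Python sketch that pulls the logged `MIOpenDriver` command lines out of an MIOpen log so they can be rerun standalone. The function names `extract_driver_commands` and `flags_to_dict` are hypothetical helpers, and the pattern is based only on the single `Command [Pooling_logging_cmd]` line quoted above, so other MIOpen versions or backends may need a different pattern.

```python
import re
import shlex

# Assumption: the log line looks exactly like the one quoted in this issue,
# i.e. "MIOpen(HIP): Command [<tag>] <driver command line>".
_CMD_RE = re.compile(r"^MIOpen\(HIP\): Command \[(?P<tag>[^\]]+)\]\s+(?P<cmd>.+)$")


def extract_driver_commands(log_text):
    """Return a list of (tag, argv) pairs, one per logged driver command."""
    commands = []
    for line in log_text.splitlines():
        match = _CMD_RE.match(line.strip())
        if match:
            # shlex.split turns the command line into an argv-style list.
            commands.append((match.group("tag"), shlex.split(match.group("cmd"))))
    return commands


def flags_to_dict(argv):
    """Pair each dash-prefixed flag with the token that follows it."""
    flags = {}
    i = 0
    while i < len(argv) - 1:
        if argv[i].startswith("-"):
            flags[argv[i]] = argv[i + 1]
            i += 2
        else:
            i += 1
    return flags
```

The returned `argv` list can be handed to `subprocess.run` after replacing the relative `./bin/MIOpenDriver` with the path of the installed driver, which is how the command was rerun by hand above.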
@deven-amd
Contributor Author

This is probably unrelated, but I thought I should mention it.

The following MIOpen error seems to pop up in some unit tests:

grep: /tmp/miopen-MIOpenPoolingND.cl-c375-eb98-0d54-40dc/MIOpenPoolingND.cl.o.linked.bc.out: No such file or directory

It does not seem to lead to any errors, but it would be nice if we could prevent it from being displayed (assuming it is not a symptom of an error).

@ce1adon
Contributor

ce1adon commented Jul 30, 2020

@deven-amd Please check whether PR #366 fixes the failures.

For the issue quoted below, can you provide more info? Under what circumstances does this message appear?

this is probably unrelated but though I should mention it.

the following MIOpen error seems to pop up in some unit tests

grep: /tmp/miopen-MIOpenPoolingND.cl-c375-eb98-0d54-40dc/MIOpenPoolingND.cl.o.linked.bc.out: No such file or directory

does not seem to lead to any errors, but would be nice if we could prevent it from getting displayed (assuming it is not an error symptom)

@pfultz2
Contributor

pfultz2 commented Jul 30, 2020

the following MIOpen error seems to pop up in some unit tests

This was fixed in clang-ocl here: ROCm/clang-ocl#27

@deven-amd
Contributor Author

I applied the same change (as in PR #366) on the TF side, and all the unit test regressions are gone.

@daniellowell
Contributor

Implemented.
