Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[NHWC] Enable batch norm by refactoring OCL kernel #1244

Merged
merged 13 commits into from
Dec 23, 2021
Merged

[NHWC] Enable batch norm by refactoring OCL kernel #1244

merged 13 commits into from
Dec 23, 2021

Conversation

ce1adon
Copy link
Contributor

@ce1adon ce1adon commented Oct 28, 2021

No description provided.

@codecov

This comment has been minimized.

@junliume junliume added this to the ROCm 5.0 milestone Oct 28, 2021
@junliume junliume changed the title refactor ocl kernels for batchnorm nhwc [NHWC] Enable batch norm by refactoring OCL kernel Oct 28, 2021
@atamazov
Copy link
Contributor

@DrizztDoUrden Please review problem description changes.

Copy link
Contributor

@atamazov atamazov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some cosmetic changes + recommendations

UPDATE: "Cosmetic" changes are actually mandatory.

src/solver/batchnorm/forward_spatial_single.cpp Outdated Show resolved Hide resolved
src/kernels/MIOpenBatchNormBwdSpatial.cl Outdated Show resolved Hide resolved
src/kernels/MIOpenBatchNormBwdSpatial.cl Outdated Show resolved Hide resolved
src/kernels/MIOpenBatchNormBwdSpatial.cl Outdated Show resolved Hide resolved
src/kernels/MIOpenBatchNormBwdSpatial.cl Outdated Show resolved Hide resolved
@atamazov
Copy link
Contributor

@ce1adon What is your plan about testing?

@ce1adon
Copy link
Contributor Author

ce1adon commented Oct 28, 2021

@ce1adon What is your plan about testing?

As mentioned in the meeting, we will collaborate with the clients about the test.

@atamazov
Copy link
Contributor

@ce1adon It seems like I missed that. Let's discuss next Tuesday.

cc @junliume @JehandadKhan

@JehandadKhan
Copy link
Collaborator

@ce1adon I think we need to add at least some tests to exercise this kernel.

@junliume
Copy link
Collaborator

@ce1adon I think we need to add at least some tests to exercise this kernel.

+1
check this comment out from SWDEV-293294-comment

src/kernels/MIOpenBatchNormBwdSpatial.cl Outdated Show resolved Hide resolved
src/kernels/MIOpenBatchNormFwdTrainSpatial.cl Outdated Show resolved Hide resolved
@junliume
Copy link
Collaborator

ping @ce1adon for updates on this PR. Thanks!

@ce1adon
Copy link
Contributor Author

ce1adon commented Nov 12, 2021

ping @ce1adon for updates on this PR. Thanks!

Debugging accuracy issues in ctest.

@junliume junliume modified the milestones: ROCm 5.0, ROCm 5.1 Nov 18, 2021
@junliume junliume dismissed DrizztDoUrden’s stale review December 13, 2021 23:16

Review changes adopted

@junliume junliume dismissed atamazov’s stale review December 13, 2021 23:16

Please re-review :)

atamazov
atamazov previously approved these changes Dec 15, 2021
Copy link
Contributor

@atamazov atamazov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No objections!

src/kernels/MIOpenBatchNormBwdSpatial.cl Outdated Show resolved Hide resolved
atamazov
atamazov previously approved these changes Dec 16, 2021
Copy link
Contributor

@atamazov atamazov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

#endif

#if (MIO_LAYOUT_NHWC != 0) || (MIO_LAYOUT_NHWC != 1)
#error MIO_LAYOUT_NHWC must be 0 or 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Recommendation] Oops, I am afraid we need double quotes around error message.

Suggested change
#error MIO_LAYOUT_NHWC must be 0 or 1
#error "MIO_LAYOUT_NHWC must be 0 or 1"

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@atamazov I'm afraid I might have messed up something and trying to correct them:

[2021-12-16T21:18:03.373Z] 13/93 Test  #3: test_bn_3d_spatial_test ...............................***Failed  Error regular expression found in output. Regex=[FAILED]  1.34 sec
[2021-12-16T21:18:03.373Z] Choosing smaller input values for low dims
[2021-12-16T21:18:03.373Z] MIOpen(OpenCL): Error [BuildProgram] Build log: /tmp/comgr-547839/input/CompileSource:446:2: error: "MIO_LAYOUT_NHWC must be 0 or 1"
[2021-12-16T21:18:03.373Z] #error "MIO_LAYOUT_NHWC must be 0 or 1"
[2021-12-16T21:18:03.373Z]  ^
[2021-12-16T21:18:03.373Z] 1 error generated.
[2021-12-16T21:18:03.373Z] Error: Failed to compile opencl source (from CL or HIP source to LLVM IR).

Copy link
Collaborator

@junliume junliume Dec 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe #error macro is not accepted in COMGR or hipRTC?
Nope, I've seen other examples in other .cl files

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OMG we shall use &&, not || 🤕

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@atamazov ah, I'm so distracted and should have figured it out sooner :) I'm going to change it really soon.

@atamazov
Copy link
Contributor

@junliume

Error fix f85cedd

👍 👍 👍

Copy link
Contributor

@atamazov atamazov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@atamazov
Copy link
Contributor

@ce1adon If would be nice if you bless this PR as well ;)

@atamazov
Copy link
Contributor

@junliume The CI failure at run 19 was due to #1351

@junliume junliume merged commit adcef3b into develop Dec 23, 2021
@junliume junliume deleted the bn_nhwc branch April 14, 2022 16:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants