-
Notifications
You must be signed in to change notification settings - Fork 224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[NCHW][SWDEV-312112] fix local buffer alignment bug on transpose+nhwc kernel's solver #1324
Conversation
This comment has been minimized.
This comment has been minimized.
I can confirm that the specific command as listed in [SWDEV-312112] is fixed by this PR. Meanwhile, we still get RuntimeError: test_nn.py failed! FAILED (failures=13, skipped=78, expected failures=4) |
This seems Ok as a hotfix. However we should hide details in some suitable abstraction instead hacky code here and there. In this specific case, I can propose some class that can represents metrics of a workspace that logically consists of two sub-buffers, each with its own size and alignment. Very basic example: WorkspaceBuf2Traits wt(size_0, size_1, alignment_1);
...
wt.GetSize(); // total workspace size
wt.GetOffset1(); // offset of subbuffer 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conditionally approved.
A complicated example of handling sub-buffers can be found in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Post merge issue has been raised: #1326
@shaojiewang I assumed that #1324 would resolve #1317 and tried it. Unfortunately this is not so. It may worth double-checking #1324 -- maybe some of the fixes are still missing (and HIP is tolerant to these, but OCL doesn't)? |
Yes, I'll check it ASAP. |
AFAICS, OCL backend requires a larger alignment(128bytes). I'll prepare another PR to solve both #1317 and #1326 |
Good! Just in case -- the OCL problem is not of blocker priority anymore, so we have time)) |
The find-db NCHW records for
must be refreshed. |
root cause fix for swdev312112:
[nchw]fix local buffer alignment bug on transpose+nhwc kernel's solver.
added by carlus:
Above is what we are implementing in fwd when input is NCHW but use NHWC kernel with transpose. Basically there should be 4 buffers in workspace, where the first 3 buffers are for transpose the original inp/wei/out buffer. the last one is for some kernel that need use atomic fp32 to accumulate the result, then cast back to fp16. We always allocate a single workspace, and keep an offset inside each of the buffer.
So the problem here is the offset for 4# may not be aligned to multiple of 4 byte, since the previous 3 buffers are fp16. If 4# buffer are not aligned to 4 byte, fp32 atomic will crash.
This PR tries to solve this problem by adding some extra byte to make sure 4# offset is aligned to multiple of 4.