GPU Segfault on 3D Int8 Convolution #2593

Open
JehandadKhan opened this issue Dec 11, 2023 · 3 comments

@JehandadKhan
Collaborator

Consider the convolution described by the following driver command, run on the MI200 platform:

MIOpenDriver convint8 -n 1 -c 64 --in_d 128 -H 128 -W 128 -k 32 --fil_d 3 -y 3 -x 3 --pad_d 1 -p 1 -q 1 --conv_stride_d 1 -u 1 -v 1 --dilation_d 1 -l 1 -j 1 --spatial_dim 3 -m conv -g 1 -F 1 -t 1 -S 0

Running it results in the following:

Memory access fault by GPU node-6 (Agent handle: 0x1190f40) on address 0x7f3692c9e000. Reason: Unknown.
@JehandadKhan JehandadKhan self-assigned this Dec 11, 2023
@JehandadKhan
Collaborator Author

@atamazov Can you please take a look?

@atamazov
Contributor

atamazov commented Dec 11, 2023

@JehandadKhan The cause is integer overflow. Log from Navi21:

MIOpen(HIP): Info [ConvolutionForwardImmediate] solver_id = GemmFwdRest, workspace = 7247757312
... 
MIOpen(HIP): Info2 [Log] Kernel MIOpenUtilKernels4.cl Compile Time, ms: 155.193
MIOpen(HIP): Info2 [run] kernel_name = transpose_packed_MN2NM, global_work_dim = { 3623878656, 1, 1 }, local_work_dim = { 256, 1, 1 }

The solver calls

float transpose_packed_MN2NM(const Handle& handle,
                             int m,
                             int n,
                             int in_offset,
                             int out_offset,
                             ConstData_t in,
                             Data_t out,
                             miopenDataType_t type);

and passes 3623878656 as out_offset, which exceeds INT_MAX (2,147,483,647).
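A minimal standalone sketch (not MIOpen code) of what the `int` parameter boundary does to that offset. The helper names are hypothetical. It assumes a 32-bit `int` and a 64-bit `std::size_t`; since C++20 an out-of-range integral conversion is defined as modulo 2^32, so the value wraps to a negative number rather than being clamped.

```cpp
#include <climits>
#include <cstddef>

// Hypothetical helper: models what an int-typed parameter such as
// out_offset receives when the caller computes the offset in 64 bits.
constexpr int offset_as_seen_by_callee(std::size_t offset)
{
    // With 32-bit int this keeps only the low 32 bits (two's complement),
    // so values above INT_MAX come out negative.
    return static_cast<int>(offset);
}

// Hypothetical helper: true when the offset cannot be represented in int.
constexpr bool offset_overflows_int(std::size_t offset)
{
    return offset > static_cast<std::size_t>(INT_MAX);
}
```

For the logged value 3623878656, `offset_overflows_int` is true and the callee would see -671088640, i.e. an offset pointing far before the start of the output buffer, which would plausibly explain the memory access fault.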

There are two possible fixes: narrow the solver's applicability, or fix the offset math in transpose_packed_MN2NM(). The latter is preferable, of course.
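A sketch of the second option, as a CPU reference rather than the actual kernel: sizes, offsets and the flat indices are all carried in `std::size_t`, so nothing wraps at 2^31. The function name is hypothetical and the layout is an assumption (input row-major m x n at `in + in_offset`, output row-major n x m at `out + out_offset`); it has not been checked against MIOpenUtilKernels4.cl.

```cpp
#include <cstddef>

// Hypothetical CPU reference for an MN -> NM transpose with 64-bit indexing.
// Assumed layout: element (i, j) of the m x n input lives at in[in_offset + i * n + j];
// it is written to element (j, i) of the n x m output at out[out_offset + j * m + i].
template <typename T>
constexpr void transpose_mn2nm_ref(std::size_t m,
                                   std::size_t n,
                                   std::size_t in_offset,
                                   std::size_t out_offset,
                                   const T* in,
                                   T* out)
{
    for(std::size_t i = 0; i < m; ++i)
        for(std::size_t j = 0; j < n; ++j)
            // All index arithmetic is in std::size_t, so offsets such as
            // 3623878656 are used as-is instead of wrapping.
            out[out_offset + j * m + i] = in[in_offset + i * n + j];
}
```

In the GPU kernel the analogous change would be to compute the flat work-item index and the offsets in a 64-bit type (e.g. `ulong` / `size_t` in OpenCL C) instead of `int`, since index plus offset can exceed INT_MAX even when the grid itself fits in 32 bits.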

How urgent is this issue (please assign a label)?


Please note that:

  • The workspace (WS) size is larger than 4 GiB (7,247,757,312 bytes in the log above).
  • The grid size (3,623,878,656) is not far from the HIP limit (4,294,967,295).
  • I am not sure whether rocBLAS can handle matrices larger than 4 GiB, but that is another story.
  • ConvDirectNaiveConvFwd works fine.
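For reference, these numbers can be reproduced from the driver command. The sketch below assumes (an inference from the logged values, not from the GemmFwdRest source) that the GEMM path builds an int8 im2col matrix with C·Z·Y·X rows and Do·Ho·Wo columns, and that the workspace holds two buffers of that size. `GemmSizes`, `out_dim` and `gemm_sizes` are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical summary of the GEMM-path buffer sizes, in 64-bit arithmetic.
struct GemmSizes
{
    std::uint64_t im2col_elems; // rows * cols of the im2col matrix (int8 -> bytes)
    std::uint64_t workspace;    // bytes, assuming two buffers of that size
};

// Output extent of one spatial dimension (standard convolution formula).
constexpr std::uint64_t out_dim(std::uint64_t in, std::uint64_t fil, std::uint64_t pad,
                                std::uint64_t stride, std::uint64_t dil)
{
    return (in + 2 * pad - dil * (fil - 1) - 1) / stride + 1;
}

constexpr GemmSizes gemm_sizes()
{
    // From the command: -c 64, 128^3 input, 3x3x3 filter, pad 1, stride 1, dilation 1.
    const std::uint64_t c = 64, d = 128, h = 128, w = 128, z = 3, y = 3, x = 3;
    const std::uint64_t rows = c * z * y * x; // 1728
    const std::uint64_t cols = out_dim(d, z, 1, 1, 1) * out_dim(h, y, 1, 1, 1) *
                               out_dim(w, x, 1, 1, 1); // 128^3 = 2097152
    const std::uint64_t elems = rows * cols;
    return {elems, 2 * elems};
}
```

The element count comes out equal to both the logged out_offset and the logged grid size (3,623,878,656), and doubling it gives the logged workspace size (7,247,757,312), which is consistent with the assumption above.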

@atamazov
Contributor

@JehandadKhan A similar problem happens in CallGemm().
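If the first option (narrowing applicability) is taken, call sites like CallGemm() could narrow sizes and offsets through a checked conversion so that an oversized problem is rejected instead of silently wrapping. This is a hypothetical helper, not an existing MIOpen function, and the actual parameter types of CallGemm() have not been checked here.

```cpp
#include <climits>
#include <cstddef>
#include <optional>

// Hypothetical helper: narrow a 64-bit size or offset to int only if it fits.
// std::nullopt signals that the problem should be treated as not applicable.
constexpr std::optional<int> checked_to_int(std::size_t value)
{
    if(value > static_cast<std::size_t>(INT_MAX))
        return std::nullopt; // would overflow int
    return static_cast<int>(value);
}
```

Usage, for example in an applicability check: `if(!checked_to_int(offset)) return false;`.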
