GPU Segfault on 3D Int8 Convolution #2593

JehandadKhan · 2023-12-11T19:39:48Z

The convolution described by the following driver command crashes on the MI200 platform:

MIOpenDriver convint8 -n 1 -c 64 --in_d 128 -H 128 -W 128 -k 32 --fil_d 3 -y 3 -x 3 --pad_d 1 -p 1 -q 1 --conv_stride_d 1 -u 1 -v 1 --dilation_d 1 -l 1 -j 1 --spatial_dim 3 -m conv -g 1 -F 1 -t 1 -S 0

It fails with the following error:

Memory access fault by GPU node-6 (Agent handle: 0x1190f40) on address 0x7f3692c9e000. Reason: Unknown.


JehandadKhan · 2023-12-11T19:40:09Z

@atamazov Can you please take a look?

atamazov · 2023-12-11T22:41:28Z

@JehandadKhan The cause is an integer overflow. Log from Navi21:

MIOpen(HIP): Info [ConvolutionForwardImmediate] solver_id = GemmFwdRest, workspace = 7247757312
... 
MIOpen(HIP): Info2 [Log] Kernel MIOpenUtilKernels4.cl Compile Time, ms: 155.193
MIOpen(HIP): Info2 [run] kernel_name = transpose_packed_MN2NM, global_work_dim = { 3623878656, 1, 1 }, local_work_dim = { 256, 1, 1 }

The solver calls

float transpose_packed_MN2NM(const Handle& handle,
                             int m,
                             int n,
                             int in_offset,
                             int out_offset,
                             ConstData_t in,
                             Data_t out,
                             miopenDataType_t type);

and passes 3623878656 as out_offset, which exceeds INT_MAX (2147483647).
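To make the overflow concrete, here is a minimal sketch of the arithmetic. It rests on an assumption the thread does not confirm: that the offset equals the element count of an im2col-style buffer, c * fil_d * y * x * out_d * out_h * out_w. With padding 1, stride 1, dilation 1 and a 3x3x3 filter, the output spatial size equals the input (128 per dimension), and the product for the driver parameters above comes out to exactly the logged 3623878656. `Im2ColElements` and `FitsInInt` are hypothetical helpers, not MIOpen functions.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Hypothetical helper (not part of MIOpen): element count of an im2col-style
// buffer for a 3D convolution, computed in 64-bit so the product cannot wrap.
inline std::int64_t Im2ColElements(std::int64_t c,
                                   std::int64_t fil_d,
                                   std::int64_t fil_h,
                                   std::int64_t fil_w,
                                   std::int64_t out_d,
                                   std::int64_t out_h,
                                   std::int64_t out_w)
{
    assert(c > 0 && fil_d > 0 && fil_h > 0 && fil_w > 0);
    assert(out_d > 0 && out_h > 0 && out_w > 0);
    return c * fil_d * fil_h * fil_w * out_d * out_h * out_w;
}

// Hypothetical helper: true if the value survives being passed as an `int`.
inline bool FitsInInt(std::int64_t value)
{
    return value >= INT_MIN && value <= INT_MAX;
}
```

For this issue, `Im2ColElements(64, 3, 3, 3, 128, 128, 128)` evaluates to 3623878656 and `FitsInInt` returns false for it, so the value cannot pass through an `int` parameter intact.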

The fix could be either narrowing the solver's applicability or fixing the math in transpose_packed_MN2NM(); the latter is preferable of course.
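The two options could look roughly like the sketch below. Both parts are illustrative assumptions rather than MIOpen code: `OffsetsFitLegacyApi` is a hypothetical applicability guard for the existing `int` parameters, and `TransposePackedRef` is a generic CPU reference for a packed M x N to N x M transpose that carries sizes and offsets as `std::size_t`. It is not the actual kernel and does not claim to match its exact semantics.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>

// Option 1 (hypothetical guard): the solver would report "not applicable"
// when an offset cannot be represented by the legacy int parameters.
inline bool OffsetsFitLegacyApi(std::size_t in_offset, std::size_t out_offset)
{
    const std::size_t limit = static_cast<std::size_t>(INT_MAX);
    return in_offset <= limit && out_offset <= limit;
}

// Option 2 (hypothetical reference): do all index math in a 64-bit-capable
// type so that large offsets are handled instead of rejected.
// Reads a row-major M x N matrix at in + in_offset and writes its
// row-major N x M transpose at out + out_offset.
template <typename T>
void TransposePackedRef(const T* in,
                        T* out,
                        std::size_t m,
                        std::size_t n,
                        std::size_t in_offset,
                        std::size_t out_offset)
{
    assert(in != nullptr && out != nullptr);
    for(std::size_t i = 0; i < m; ++i)
        for(std::size_t j = 0; j < n; ++j)
            out[out_offset + j * m + i] = in[in_offset + i * n + j];
}
```

The guard is the smaller change but gives up on large problems like this one; widening the types keeps them supported, which is why the second option is the preferable one.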

How urgent is this issue (please assign a label)?

Please note that:

- The workspace size is > 4 GiB.
- The grid size (3,623,878,656) is not far from the HIP limit (4,294,967,295).
- I am not sure if rocBLAS is able to handle matrices > 4 GiB, but this is another story.
- ConvDirectNaiveConvFwd is working fine.
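The size remarks above can be checked with a few lines of arithmetic. This is only a sketch: the constants are taken from the log and from the limit quoted in this comment, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kFourGiB   = 1ULL << 32;    // 4,294,967,296 bytes
constexpr std::uint64_t kGridLimit = 4294967295ULL; // HIP grid limit quoted above

// True if a buffer of this many bytes is larger than 4 GiB.
inline bool ExceedsFourGiB(std::uint64_t bytes) { return bytes > kFourGiB; }

// Work-items left before the quoted grid limit (0 if already at or over it).
inline std::uint64_t GridHeadroom(std::uint64_t global_work_size)
{
    return global_work_size >= kGridLimit ? 0 : kGridLimit - global_work_size;
}

// Sanity check against the values from the log in this issue.
inline void CheckLoggedValues()
{
    const std::uint64_t workspace_bytes = 7247757312ULL; // "workspace = ..." line
    const std::uint64_t grid_size       = 3623878656ULL; // global_work_dim[0]
    assert(ExceedsFourGiB(workspace_bytes));
    assert(GridHeadroom(grid_size) > 0);
    // In this log the workspace is exactly twice the grid size.
    assert(workspace_bytes == 2 * grid_size);
}
```

With the logged values, the grid is 671,088,639 work-items below the limit, so a modestly larger problem of the same shape would exceed the grid limit as well.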

atamazov · 2023-12-15T22:59:35Z

@JehandadKhan A similar problem happens in CallGemm().

JehandadKhan self-assigned this Dec 11, 2023

JehandadKhan added complexity_middle urgency_high labels Dec 13, 2023

atamazov mentioned this issue Dec 15, 2023

Get rid of legacy 2GiB offset limits in CallGemm*() and transpose*() internal APIs and kernels. #2613

Merged

CAHEK7 mentioned this issue Dec 20, 2023

Argmax enhancement in case of inner dim reduce #2583

Merged
