[Codegen][ROCm] Mismatched Dtype of Workgroup/Workitem #15777

junrushao · 2023-09-18T23:57:46Z

This PR fixes a ROCm codegen error that the dtype of @llvm.amdgcn.workgroup.id* and @llvm.amdgcn.workitem.id.* are always i32 when generating LLVM IR, even if it's marked as T.int64 in TIR.

An example that triggers this issue:

@T.prim_func
def encode_kernel(A: T.handle("float16", "global"), max_abs_value: T.handle("float16", "global"), v: T.int64):
    T.func_attr({"calling_conv": 2, "target": T.target({"host": {"keys": ["cpu"], "kind": "llvm", "tag": ""}, "keys": ["rocm", "gpu"], "kind": "rocm", "max_num_threads": 256, "max_shared_memory_per_block": 65536, "max_threads_per_block": 1024, "mcpu": "gfx1100", "mtriple": "amdgcn-amd-amdhsa-hcc", "tag": "", "thread_warp_size": 32}), "tir.is_global_func": T.bool(True), "tir.kernel_launch_params": ["blockIdx.x", "threadIdx.x"], "tir.noalias": T.bool(True)})
    A_1 = T.decl_buffer((v * T.int64(8192),), "float16", data=A)
    max_abs_value_1 = T.decl_buffer((T.min(v, (v * T.int64(256) + T.int64(65535)) // T.int64(65536) * T.int64(256)) * T.int64(256),), "float16", data=max_abs_value)
    blockIdx_x = T.launch_thread("blockIdx.x", T.int64(256))
    threadIdx_x = T.launch_thread("threadIdx.x", T.int64(256))
    for i_j_fused_0, k in T.grid(T.shift_right(v + T.int64(255), T.int64(8)), T.int64(32)):
        if i_j_fused_0 * T.int64(256) + blockIdx_x - v < T.int64(0):
            if k == T.int64(0):
                max_abs_value_1[i_j_fused_0 * T.int64(65536) + blockIdx_x * T.int64(256) + threadIdx_x] = T.float16(-65504)
            max_abs_value_1[i_j_fused_0 * T.int64(65536) + blockIdx_x * T.int64(256) + threadIdx_x] = T.max(max_abs_value_1[i_j_fused_0 * T.int64(65536) + blockIdx_x * T.int64(256) + threadIdx_x], T.call_pure_extern("float16", "__ocml_fabs_f16", A_1[i_j_fused_0 * T.int64(2097152) + blockIdx_x * T.int64(8192) + threadIdx_x * T.int64(32) + k]))

This PR fixes a ROCm codegen error that the dtype of `@llvm.amdgcn.workgroup.id*` and `@llvm.amdgcn.workitem.id.*` are always i32 when generating LLVM IR, even if it's marked as T.int64 in TIR. An example that triggers this issue: ```python @T.prim_func def encode_kernel(A: T.handle("float16", "global"), max_abs_value: T.handle("float16", "global"), v: T.int64): T.func_attr({"calling_conv": 2, "target": T.target({"host": {"keys": ["cpu"], "kind": "llvm", "tag": ""}, "keys": ["rocm", "gpu"], "kind": "rocm", "max_num_threads": 256, "max_shared_memory_per_block": 65536, "max_threads_per_block": 1024, "mcpu": "gfx1100", "mtriple": "amdgcn-amd-amdhsa-hcc", "tag": "", "thread_warp_size": 32}), "tir.is_global_func": T.bool(True), "tir.kernel_launch_params": ["blockIdx.x", "threadIdx.x"], "tir.noalias": T.bool(True)}) A_1 = T.decl_buffer((v * T.int64(8192),), "float16", data=A) max_abs_value_1 = T.decl_buffer((T.min(v, (v * T.int64(256) + T.int64(65535)) // T.int64(65536) * T.int64(256)) * T.int64(256),), "float16", data=max_abs_value) blockIdx_x = T.launch_thread("blockIdx.x", T.int64(256)) threadIdx_x = T.launch_thread("threadIdx.x", T.int64(256)) for i_j_fused_0, k in T.grid(T.shift_right(v + T.int64(255), T.int64(8)), T.int64(32)): if i_j_fused_0 * T.int64(256) + blockIdx_x - v < T.int64(0): if k == T.int64(0): max_abs_value_1[i_j_fused_0 * T.int64(65536) + blockIdx_x * T.int64(256) + threadIdx_x] = T.float16(-65504) max_abs_value_1[i_j_fused_0 * T.int64(65536) + blockIdx_x * T.int64(256) + threadIdx_x] = T.max(max_abs_value_1[i_j_fused_0 * T.int64(65536) + blockIdx_x * T.int64(256) + threadIdx_x], T.call_pure_extern("float16", "__ocml_fabs_f16", A_1[i_j_fused_0 * T.int64(2097152) + blockIdx_x * T.int64(8192) + threadIdx_x * T.int64(32) + k])) ```

junrushao marked this pull request as ready for review September 18, 2023 23:57

jinhongyii approved these changes Sep 19, 2023

View reviewed changes

junrushao merged commit 9613385 into apache:main Sep 20, 2023

ysh329 mentioned this pull request Oct 18, 2023

[Release] v0.14.0 Release Candidate Notes #15948

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Codegen][ROCm] Mismatched Dtype of Workgroup/Workitem #15777

[Codegen][ROCm] Mismatched Dtype of Workgroup/Workitem #15777

junrushao commented Sep 18, 2023

[Codegen][ROCm] Mismatched Dtype of Workgroup/Workitem #15777

[Codegen][ROCm] Mismatched Dtype of Workgroup/Workitem #15777

Conversation

junrushao commented Sep 18, 2023