Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Issue]: enabling amdgpu-unsafe-fp-atomics for gfx90a #47

Closed
pxl-th opened this issue Mar 6, 2024 · 5 comments
Closed

[Issue]: enabling amdgpu-unsafe-fp-atomics for gfx90a #47

pxl-th opened this issue Mar 6, 2024 · 5 comments

Comments

@pxl-th
Copy link

pxl-th commented Mar 6, 2024

Problem Description

Hi! I'm one of the developers of AMDGPU.jl library that provides support for AMD GPU programming in Julia.
To perform compilation of Julia GPU kernels we omit HIP and use LLVM directly, hence -munsafe-fp-atomics is not available.

To enable HW atomics, we instead add amdgpu-unsafe-fp-atomics=true function attribute to our LLVM IR during compilation.
This works fine with gfx1100 devices, replacing CAS loop with HW fadd.

However, for gfx90a devices this does nothing.
I was wondering if I'm missing something else that needs to be done?

Here's an example Julia kernel, which does atomic fadd on the first array item:

@kernel function ker!(x)
    @inbounds @atomic x[1] += 1f0
end

Here's its optimized LLVM IR with atomicrmw fadd float which is the same for gfx1100 and gfx90a (notice amdgpu-unsafe-fp-atomics attribute):

click
; ModuleID = 'start'
source_filename = "start"
target datalayout = "e-p:64:64-p1:64:64-p2:32:32-p3:32:32-p4:64:64-p5:32:32-p6:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024-v2048:2048-n32:64-S32-A5-G1-ni:10:11:12:13"
target triple = "amdgcn-amd-amdhsa"

; Function Attrs: nounwind readnone speculatable willreturn
declare i32 @llvm.amdgcn.workgroup.id.x() #0

; Function Attrs: nounwind readnone speculatable willreturn
declare i32 @llvm.amdgcn.workitem.id.x() #0

; Function Attrs: cold noreturn nounwind
declare void @llvm.amdgcn.endpgm() #1

;  @ none within `gpu_ker!`
define amdgpu_kernel void @_Z8gpu_ker_16CompilerMetadataI11DynamicSize12DynamicCheckv16CartesianIndicesILi1E5TupleI5OneToI5Int64EEE7NDRangeILi1ES0_S0_S2_ILi1ES3_IS4_IS5_EEES2_ILi1ES3_IS4_IS5_EEEEE14ROCDeviceArrayI7Float32Li1ELi1EE({ i64, i64, i64, i64, i64, i64, i32, i32, i64, i64, i64, i64 } %state, { [1 x [1 x [1 x i64]]], [2 x [1 x [1 x [1 x i64]]]] } %0, { [1 x i64], i8 addrspace(1)*, i64 } %1) local_unnamed_addr #2 !dbg !41 {
conversion:
  %.fca.0.0.0.0.extract = extractvalue { [1 x [1 x [1 x i64]]], [2 x [1 x [1 x [1 x i64]]]] } %0, 0, 0, 0, 0
  %.fca.1.1.0.0.0.extract = extractvalue { [1 x [1 x [1 x i64]]], [2 x [1 x [1 x [1 x i64]]]] } %0, 1, 1, 0, 0, 0
  %.fca.1.extract = extractvalue { [1 x i64], i8 addrspace(1)*, i64 } %1, 1
;  @ none within `gpu_ker!` @ /home/pxl-th/.julia/packages/KernelAbstractions/Zcyra/src/macros.jl:94
; ┌ @ /home/pxl-th/.julia/dev/AMDGPU/src/ROCKernels.jl:144 within `#__validindex`
; │┌ @ /home/pxl-th/.julia/dev/AMDGPU/src/device/gcn/indexing.jl:172 within `blockIdx`
; ││┌ @ /home/pxl-th/.julia/dev/AMDGPU/src/device/gcn/indexing.jl:95 within `blockIdx_x`
; │││┌ @ /home/pxl-th/.julia/dev/AMDGPU/src/device/gcn/indexing.jl:93 within `workgroupIdx_x`
; ││││┌ @ /home/pxl-th/.julia/dev/AMDGPU/src/device/gcn/indexing.jl:3 within `_index`
; │││││┌ @ /home/pxl-th/.julia/dev/AMDGPU/src/device/gcn/indexing.jl:3 within `macro expansion` @ /home/pxl-th/.julia/packages/LLVM/Q3CgR/src/interop/base.jl:38
        %2 = call i32 @llvm.amdgcn.workgroup.id.x(), !dbg !45, !range !66
; │└└└└└
; │┌ @ /home/pxl-th/.julia/dev/AMDGPU/src/device/gcn/indexing.jl:164 within `threadIdx`
; ││┌ @ /home/pxl-th/.julia/dev/AMDGPU/src/device/gcn/indexing.jl:89 within `threadIdx_x`
; │││┌ @ /home/pxl-th/.julia/dev/AMDGPU/src/device/gcn/indexing.jl:87 within `workitemIdx_x`
; ││││┌ @ /home/pxl-th/.julia/dev/AMDGPU/src/device/gcn/indexing.jl:3 within `_index`
; │││││┌ @ /home/pxl-th/.julia/dev/AMDGPU/src/device/gcn/indexing.jl:3 within `macro expansion` @ /home/pxl-th/.julia/packages/LLVM/Q3CgR/src/interop/base.jl:38
        %3 = call i32 @llvm.amdgcn.workitem.id.x(), !dbg !67, !range !76
; ││││└└
; ││││┌ @ int.jl:1068 within `+` @ int.jl:87
       %4 = add nuw nsw i32 %3, 1, !dbg !77
; │└└└└
; │┌ @ /home/pxl-th/.julia/packages/KernelAbstractions/Zcyra/src/nditeration.jl:84 within `expand`
; ││┌ @ abstractarray.jl:1291 within `getindex`
; │││┌ @ indices.jl:350 within `to_indices` @ indices.jl:354
; ││││┌ @ indices.jl:359 within `_to_indices1`
; │││││┌ @ indices.jl:277 within `to_index` @ indices.jl:292
; ││││││┌ @ number.jl:7 within `convert`
; │││││││┌ @ boot.jl:784 within `Int64`
; ││││││││┌ @ boot.jl:708 within `toInt64`
           %5 = zext i32 %4 to i64, !dbg !81
; ││└└└└└└└
; ││ @ /home/pxl-th/.julia/packages/KernelAbstractions/Zcyra/src/nditeration.jl:84 within `expand` @ /home/pxl-th/.julia/packages/KernelAbstractions/Zcyra/src/nditeration.jl:74
; ││┌ @ ntuple.jl:48 within `ntuple`
; │││┌ @ /home/pxl-th/.julia/packages/KernelAbstractions/Zcyra/src/nditeration.jl:78 within `#1`
; ││││┌ @ int.jl:86 within `-`
       %6 = zext i32 %2 to i64, !dbg !104
; ││││└
; ││││┌ @ int.jl:88 within `*`
       %7 = mul i64 %.fca.1.1.0.0.0.extract, %6, !dbg !112
; ││││└
; ││││┌ @ int.jl:87 within `+`
       %8 = add i64 %7, %5, !dbg !114
; │└└└└
; │ @ /home/pxl-th/.julia/dev/AMDGPU/src/ROCKernels.jl:145 within `#__validindex`
; │┌ @ multidimensional.jl:471 within `in`
; ││┌ @ tuple.jl:318 within `map`
; │││┌ @ range.jl:1439 within `in`
; ││││┌ @ int.jl:514 within `<=`
       %9 = icmp slt i64 %8, 1, !dbg !115
       %10 = icmp sgt i64 %8, %.fca.0.0.0.0.extract, !dbg !115
; └└└└└
  %11 = or i1 %9, %10, !dbg !62
  br i1 %11, label %L128, label %L104, !dbg !62

L104:                                             ; preds = %conversion
  %.fca.0.0.extract = extractvalue { [1 x i64], i8 addrspace(1)*, i64 } %1, 0, 0
;  @ none within `gpu_ker!` @ /home/pxl-th/.julia/packages/KernelAbstractions/Zcyra/src/macros.jl:95
; ┌ @ /home/pxl-th/.julia/dev/atomic.jl:6 within `macro expansion`
; │┌ @ /home/pxl-th/.julia/packages/Atomix/F9VIX/src/references.jl:95 within `getindex`
; ││┌ @ abstractarray.jl:702 within `checkbounds` @ abstractarray.jl:687
; │││┌ @ abstractarray.jl:763 within `checkindex`
; ││││┌ @ int.jl:513 within `<`
       %.not = icmp slt i64 %.fca.0.0.extract, 1, !dbg !127
; │││└└
; │││ @ abstractarray.jl:702 within `checkbounds`
     br i1 %.not, label %L115, label %L119, !dbg !133

L115:                                             ; preds = %L104
; │││┌ @ /home/pxl-th/.julia/dev/AMDGPU/src/device/quirks.jl:8 within `#throw_boundserror`
; ││││┌ @ /home/pxl-th/.julia/dev/AMDGPU/src/device/runtime.jl:113 within `signal_exception`
; │││││┌ @ /home/pxl-th/.julia/dev/AMDGPU/src/device/runtime.jl:11 within `exception_flag`
; ││││││┌ @ none within `kernel_state`
; │││││││┌ @ none within `macro expansion` @ /home/pxl-th/.julia/packages/LLVM/Q3CgR/src/interop/base.jl:38
          %state.i.fca.0.extract.i = extractvalue { i64, i64, i64, i64, i64, i64, i32, i32, i64, i64, i64, i64 } %state, 0, !dbg !141
; │││││└└└
; │││││┌ @ pointer.jl:146 within `unsafe_store!` @ pointer.jl:146
        %memcpy_refined_dst.i = inttoptr i64 %state.i.fca.0.extract.i to i32*, !dbg !156
        store i32 1, i32* %memcpy_refined_dst.i, align 1, !dbg !156
; │││││└
; │││││ @ /home/pxl-th/.julia/dev/AMDGPU/src/device/runtime.jl:115 within `signal_exception`
; │││││┌ @ /home/pxl-th/.julia/dev/AMDGPU/src/device/gcn/execution_control.jl:52 within `endpgm`
        call void @llvm.amdgcn.endpgm(), !dbg !160
; │││││└
; │││││ @ /home/pxl-th/.julia/dev/AMDGPU/src/device/runtime.jl:116 within `signal_exception`
       unreachable, !dbg !164

L119:                                             ; preds = %L104
; │└└└└
; │┌ @ /home/pxl-th/.julia/packages/Atomix/F9VIX/src/core.jl:33 within `modify!` @ /home/pxl-th/.julia/packages/UnsafeAtomicsLLVM/6HZfV/src/internal.jl:20
; ││┌ @ /home/pxl-th/.julia/packages/UnsafeAtomicsLLVM/6HZfV/src/atomics.jl:359 within `atomic_pointermodify`
; │││┌ @ /home/pxl-th/.julia/packages/UnsafeAtomicsLLVM/6HZfV/src/atomics.jl:255 within `llvm_atomic_op`
; ││││┌ @ /home/pxl-th/.julia/packages/UnsafeAtomicsLLVM/6HZfV/src/atomics.jl:255 within `macro expansion` @ /home/pxl-th/.julia/packages/LLVM/Q3CgR/src/interop/base.jl:38
       %12 = bitcast i8 addrspace(1)* %.fca.1.extract to float addrspace(1)*, !dbg !165
       %13 = atomicrmw fadd float addrspace(1)* %12, float 1.000000e+00 seq_cst, align 4, !dbg !165
; ││└└└
; ││ @ /home/pxl-th/.julia/packages/Atomix/F9VIX/src/core.jl:33 within `modify!`
    br label %L128, !dbg !176

L128:                                             ; preds = %L119, %conversion
; └└
;  @ none within `gpu_ker!` @ /home/pxl-th/.julia/packages/KernelAbstractions/Zcyra/src/macros.jl:97
  ret void, !dbg !179
}

attributes #0 = { nounwind readnone speculatable willreturn "amdgpu-unsafe-fp-atomics"="true"}
attributes #1 = { cold noreturn nounwind }
attributes #2 = { "amdgpu-unsafe-fp-atomics"="true"}

And here's the assembly output for gfx1100, notice global_atomic_add_f32 present:

click
	.text
	.amdgcn_target "amdgcn-amd-amdhsa--gfx1100"
	.globl	_Z8gpu_ker_16CompilerMetadataI11DynamicSize12DynamicCheckv16CartesianIndicesILi1E5TupleI5OneToI5Int64EEE7NDRangeILi1ES0_S0_S2_ILi1ES3_IS4_IS5_EEES2_ILi1ES3_IS4_IS5_EEEEE14ROCDeviceArrayI7Float32Li1ELi1EE ; -- Begin function _Z8gpu_ker_16CompilerMetadataI11DynamicSize12DynamicCheckv16CartesianIndicesILi1E5TupleI5OneToI5Int64EEE7NDRangeILi1ES0_S0_S2_ILi1ES3_IS4_IS5_EEES2_ILi1ES3_IS4_IS5_EEEEE14ROCDeviceArrayI7Float32Li1ELi1EE
	.p2align	8
	.type	_Z8gpu_ker_16CompilerMetadataI11DynamicSize12DynamicCheckv16CartesianIndicesILi1E5TupleI5OneToI5Int64EEE7NDRangeILi1ES0_S0_S2_ILi1ES3_IS4_IS5_EEES2_ILi1ES3_IS4_IS5_EEEEE14ROCDeviceArrayI7Float32Li1ELi1EE,@function
_Z8gpu_ker_16CompilerMetadataI11DynamicSize12DynamicCheckv16CartesianIndicesILi1E5TupleI5OneToI5Int64EEE7NDRangeILi1ES0_S0_S2_ILi1ES3_IS4_IS5_EEES2_ILi1ES3_IS4_IS5_EEEEE14ROCDeviceArrayI7Float32Li1ELi1EE: ; @_Z8gpu_ker_16CompilerMetadataI11DynamicSize12DynamicCheckv16CartesianIndicesILi1E5TupleI5OneToI5Int64EEE7NDRangeILi1ES0_S0_S2_ILi1ES3_IS4_IS5_EEES2_ILi1ES3_IS4_IS5_EEEEE14ROCDeviceArrayI7Float32Li1ELi1EE
.Lfunc_begin0:
	.file	1 "." "none"
	.loc	1 0 0                           ; none:0:0
	.cfi_sections .debug_frame
	.cfi_startproc
; %bb.0:                                ; %conversion
	s_clause 0x1
	s_load_b64 s[2:3], s[0:1], 0x68
	s_load_b64 s[4:5], s[0:1], 0x58
.Ltmp0:
	.file	2 "." "boot.jl"
	.loc	2 708 0 prologue_end            ; boot.jl:708:0
	v_dual_mov_b32 v1, 0 :: v_dual_add_nc_u32 v0, 1, v0
.Ltmp1:
	.file	3 "." "int.jl"
	.loc	3 87 0                          ; int.jl:87:0
	s_waitcnt lgkmcnt(0)
	s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	v_mad_u64_u32 v[2:3], null, s2, s15, v[0:1]
	v_mov_b32_e32 v0, v3
	s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
	v_mad_u64_u32 v[3:4], null, s3, s15, v[0:1]
.Ltmp2:
	.loc	3 514 0                         ; int.jl:514:0
	v_cmp_lt_i64_e32 vcc_lo, 0, v[2:3]
	v_cmp_ge_i64_e64 s2, s[4:5], v[2:3]
	s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(SALU_CYCLE_1)
	s_and_b32 s2, vcc_lo, s2
	s_and_saveexec_b32 s3, s2
	s_cbranch_execz .LBB0_4
.Ltmp3:
; %bb.1:                                ; %L104
	.loc	3 0 0 is_stmt 0                 ; int.jl:0:0
	s_load_b64 s[2:3], s[0:1], 0x70
.Ltmp4:
	.loc	3 513 0 is_stmt 1               ; int.jl:513:0
	s_waitcnt lgkmcnt(0)
	v_cmp_gt_i64_e64 s2, s[2:3], 0
	s_delay_alu instid0(VALU_DEP_1)
	s_and_b32 vcc_lo, exec_lo, s2
	s_mov_b32 s2, -1
	s_cbranch_vccz .LBB0_3
.Ltmp5:
; %bb.2:                                ; %L119
	.loc	3 0 0 is_stmt 0                 ; int.jl:0:0
	s_load_b64 s[2:3], s[0:1], 0x78
	v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, 1.0
.Ltmp6:
	.file	4 "." "/home/pxl-th/.julia/packages/LLVM/Q3CgR/src/interop/base.jl"
	.loc	4 38 0 is_stmt 1                ; /home/pxl-th/.julia/packages/LLVM/Q3CgR/src/interop/base.jl:38:0
	s_waitcnt vmcnt(0) lgkmcnt(0)
	s_waitcnt_vscnt null, 0x0
	global_atomic_add_f32 v0, v1, s[2:3]
	s_waitcnt_vscnt null, 0x0
	buffer_gl0_inv
	buffer_gl1_inv
	s_mov_b32 s2, 0
.Ltmp7:
.LBB0_3:                                ; %Flow
	.loc	4 0 0 is_stmt 0                 ; /home/pxl-th/.julia/packages/LLVM/Q3CgR/src/interop/base.jl:0:0
	s_delay_alu instid0(SALU_CYCLE_1)
	s_and_not1_b32 vcc_lo, exec_lo, s2
	s_cbranch_vccz .LBB0_5
.LBB0_4:                                ; %UnifiedReturnBlock
	s_endpgm
.LBB0_5:                                ; %L115
	s_load_b64 s[0:1], s[0:1], 0x0
	v_dual_mov_b32 v2, 0 :: v_dual_mov_b32 v3, 1
	s_waitcnt lgkmcnt(0)
	v_dual_mov_b32 v0, s0 :: v_dual_mov_b32 v1, s1
.Ltmp8:
	.file	5 "." "pointer.jl"
	.loc	5 146 0 is_stmt 1               ; pointer.jl:146:0
	s_clause 0x3
	flat_store_b8 v[0:1], v2 offset:3
	flat_store_b8 v[0:1], v2 offset:2
	flat_store_b8 v[0:1], v2 offset:1
	flat_store_b8 v[0:1], v3
.Ltmp9:
	.file	6 "." "/home/pxl-th/.julia/dev/AMDGPU/src/device/gcn/execution_control.jl"
	.loc	6 52 0                          ; /home/pxl-th/.julia/dev/AMDGPU/src/device/gcn/execution_control.jl:52:0
	s_endpgm
	; divergent unreachable
	s_sendmsg sendmsg(MSG_DEALLOC_VGPRS)
	s_endpgm

And here's assembly for gfx90a, notice regular global_atomic_cmpswap:

click
       .text
        .amdgcn_target "amdgcn-amd-amdhsa--gfx90a:sramecc+"
        .globl  _Z8gpu_ker_16CompilerMetadataI11DynamicSize12DynamicCheckv16CartesianIndicesILi1E5TupleI5OneToI5Int64EEE7NDRangeILi1ES0_S0_S2_ILi1ES3_IS4_IS5_EEES2_ILi1ES3_IS4_IS5_EEEEE14ROCDeviceArrayI7Float32Li1ELi1EE ; -- Begin function _Z8gpu_ker_16CompilerMetadataI11DynamicSize12DynamicCheckv16CartesianIndicesILi1E5TupleI5OneToI5Int64EEE7NDRangeILi1ES0_S0_S2_ILi1ES3_IS4_IS5_EEES2_ILi1ES3_IS4_IS5_EEEEE14ROCDeviceArrayI7Float32Li1ELi1EE
        .p2align        8
        .type   _Z8gpu_ker_16CompilerMetadataI11DynamicSize12DynamicCheckv16CartesianIndicesILi1E5TupleI5OneToI5Int64EEE7NDRangeILi1ES0_S0_S2_ILi1ES3_IS4_IS5_EEES2_ILi1ES3_IS4_IS5_EEEEE14ROCDeviceArrayI7Float32Li1ELi1EE,@function
_Z8gpu_ker_16CompilerMetadataI11DynamicSize12DynamicCheckv16CartesianIndicesILi1E5TupleI5OneToI5Int64EEE7NDRangeILi1ES0_S0_S2_ILi1ES3_IS4_IS5_EEES2_ILi1ES3_IS4_IS5_EEEEE14ROCDeviceArrayI7Float32Li1ELi1EE: ; @_Z8gpu_ker_16CompilerMetadataI11DynamicSize12DynamicCheckv16CartesianIndicesILi1E5TupleI5OneToI5Int64EEE7NDRangeILi1ES0_S0_S2_ILi1ES3_IS4_IS5_EEES2_ILi1ES3_IS4_IS5_EEEEE14ROCDeviceArrayI7Float32Li1ELi1EE
.Lfunc_begin0:
        .file   1 "." "none"
        .loc    1 0 0                           ; none:0:0
        .cfi_sections .debug_frame
        .cfi_startproc
; %bb.0:                                ; %conversion
        s_load_dwordx2 s[0:1], s[4:5], 0x68
        s_load_dwordx2 s[2:3], s[4:5], 0x58
.Ltmp0:
        .file   2 "." "int.jl"
        .loc    2 87 0 prologue_end             ; int.jl:87:0
        v_add_u32_e32 v0, 1, v0
.Ltmp1:
        .file   3 "." "boot.jl"
        .loc    3 708 0                         ; boot.jl:708:0
        v_mov_b32_e32 v1, 0
.Ltmp2:
        .loc    2 87 0                          ; int.jl:87:0
        v_mov_b32_e32 v2, s6
        s_waitcnt lgkmcnt(0)
        s_mul_i32 s7, s1, s6
        v_mad_u64_u32 v[0:1], s[0:1], s0, v2, v[0:1]
        v_add_u32_e32 v1, s7, v1
.Ltmp3:
        .loc    2 514 0                         ; int.jl:514:0
        v_cmp_lt_i64_e32 vcc, 0, v[0:1]
        v_cmp_ge_i64_e64 s[0:1], s[2:3], v[0:1]
        s_and_b64 s[0:1], vcc, s[0:1]
        s_and_saveexec_b64 s[2:3], s[0:1]
        s_cbranch_execz .LBB0_6
.Ltmp4:
; %bb.1:                                ; %L104
        .loc    2 0 0 is_stmt 0                 ; int.jl:0:0
        s_load_dwordx2 s[2:3], s[4:5], 0x70
        s_mov_b64 s[0:1], 0
.Ltmp5:
        .loc    2 513 0 is_stmt 1               ; int.jl:513:0
        s_waitcnt lgkmcnt(0)
        v_cmp_gt_i64_e64 s[6:7], s[2:3], 0
        s_mov_b64 s[2:3], -1
        s_and_b64 vcc, exec, s[6:7]
        s_cbranch_vccz .LBB0_5
.Ltmp6:
; %bb.2:                                ; %L119
        .loc    2 0 0 is_stmt 0                 ; int.jl:0:0
        s_load_dwordx2 s[2:3], s[4:5], 0x78
        v_mov_b32_e32 v2, 0
.Ltmp7:
        .file   4 "." "/users/antonsmi/.julia/packages/LLVM/Q3CgR/src/interop/base.jl"
        .loc    4 38 0 is_stmt 1                ; /users/antonsmi/.julia/packages/LLVM/Q3CgR/src/interop/base.jl:38:0
        s_waitcnt lgkmcnt(0)
        s_load_dword s6, s[2:3], 0x0
        s_waitcnt lgkmcnt(0)
        v_mov_b32_e32 v1, s6
.LBB0_3:                                ; %atomicrmw.start
                                        ; =>This Inner Loop Header: Depth=1
        v_add_f32_e32 v0, 1.0, v1
        buffer_wbl2
        s_waitcnt vmcnt(0) lgkmcnt(0)
        global_atomic_cmpswap v0, v2, v[0:1], s[2:3] glc
        s_waitcnt vmcnt(0)
        buffer_invl2
        buffer_wbinvl1_vol
        v_cmp_eq_u32_e32 vcc, v0, v1
        s_or_b64 s[0:1], vcc, s[0:1]
        v_mov_b32_e32 v1, v0
        s_andn2_b64 exec, exec, s[0:1]
        s_cbranch_execnz .LBB0_3
.Ltmp8:
; %bb.4:                                ; %Flow
        .loc    4 0 0 is_stmt 0                 ; /users/antonsmi/.julia/packages/LLVM/Q3CgR/src/interop/base.jl:0:0
        s_or_b64 exec, exec, s[0:1]
        s_mov_b64 s[2:3], 0
.LBB0_5:                                ; %Flow4
        s_and_b64 vcc, exec, s[2:3]
        s_cbranch_vccnz .LBB0_7
.LBB0_6:                                ; %UnifiedReturnBlock
        s_endpgm
.LBB0_7:                                ; %L115
        s_load_dwordx2 s[0:1], s[4:5], 0x0
        v_mov_b32_e32 v2, 0
        v_mov_b32_e32 v3, 1
        s_waitcnt lgkmcnt(0)
        v_pk_mov_b32 v[0:1], s[0:1], s[0:1] op_sel:[0,1]
.Ltmp9:
        .file   5 "." "pointer.jl"
        .loc    5 146 0 is_stmt 1               ; pointer.jl:146:0
        flat_store_byte v[0:1], v2 offset:3
        flat_store_byte v[0:1], v2 offset:2
        flat_store_byte v[0:1], v2 offset:1
        flat_store_byte v[0:1], v3
.Ltmp10:
        .file   6 "." "/pfs/lustrep2/scratch/project_465000557/antonsmi/julia_depot/dev/AMDGPU/src/device/gcn/execution_control.jl"
        .loc    6 52 0                          ; /pfs/lustrep2/scratch/project_465000557/antonsmi/julia_depot/dev/AMDGPU/src/device/gcn/execution_control.jl:52:0
        s_endpgm
        ; divergent unreachable
        s_endpgm

Any help or advice is appreciated.
Thanks!

Operating System

Ubuntu 22.04.3 LTS (Jammy Jellyfish)

CPU

AMD Ryzen 7 5800X 8-Core Processor

GPU

AMD Instinct MI250X, AMD Radeon RX 7900 XTX

ROCm Version

ROCm 6.0.0, ROCm 5.6.0

@lamb-j
Copy link
Collaborator

lamb-j commented Apr 18, 2024

@kzhuravl

@kzhuravl
Copy link
Collaborator

kzhuravl commented Apr 18, 2024

Hi @arsenm or Shilei (@shiltian), can you help with this?

@arsenm
Copy link

arsenm commented Apr 25, 2024

We're in the middle of overhauling the unsafe atomic handling (i.e. see llvm#85052)

I'm not sure the handling here was ever updated properly for gfx11. I'm planning on fixing all of these cases soon.

For this particular case I think you need to use something stricter than system scope (e.g. add syncscope("agent") to the atomic)

searlmc1 pushed a commit that referenced this issue Jul 4, 2024
…arallel fusion llvm#94391 (llvm#97607)"

This reverts commit edbc0e3.

Reason for rollback. ASAN complains about this PR:

==4320==ERROR: AddressSanitizer: heap-use-after-free on address 0x502000006cd8 at pc 0x55e2978d63cf bp 0x7ffe6431c2b0 sp 0x7ffe6431c2a8
READ of size 8 at 0x502000006cd8 thread T0
    #0 0x55e2978d63ce in map<llvm::MutableArrayRef<mlir::BlockArgument> &, llvm::MutableArrayRef<mlir::BlockArgument>, nullptr> mlir/include/mlir/IR/IRMapping.h:40:11
    #1 0x55e2978d63ce in mlir::createFused(mlir::LoopLikeOpInterface, mlir::LoopLikeOpInterface, mlir::RewriterBase&, std::__u::function<llvm::SmallVector<mlir::Value, 6u> (mlir::OpBuilder&, mlir::Location, llvm::ArrayRef<mlir::BlockArgument>)>, llvm::function_ref<void (mlir::RewriterBase&, mlir::LoopLikeOpInterface, mlir::LoopLikeOpInterface&, mlir::IRMapping)>) mlir/lib/Interfaces/LoopLikeInterface.cpp:156:11
    #2 0x55e2952a614b in mlir::fuseIndependentSiblingForLoops(mlir::scf::ForOp, mlir::scf::ForOp, mlir::RewriterBase&) mlir/lib/Dialect/SCF/Utils/Utils.cpp:1398:43
    #3 0x55e291480c6f in mlir::transform::LoopFuseSiblingOp::apply(mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) mlir/lib/Dialect/SCF/TransformOps/SCFTransformOps.cpp:482:17
    #4 0x55e29149ed5e in mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Model<mlir::transform::LoopFuseSiblingOp>::apply(mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Concept const*, mlir::Operation*, mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.h.inc:477:56
    #5 0x55e297494a60 in apply blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.cpp.inc:61:14
    #6 0x55e297494a60 in mlir::transform::TransformState::applyTransform(mlir::transform::TransformOpInterface) mlir/lib/Dialect/Transform/Interfaces/TransformInterfaces.cpp:953:48
    #7 0x55e294646a8d in applySequenceBlock(mlir::Block&, mlir::transform::FailurePropagationMode, mlir::transform::TransformState&, mlir::transform::TransformResults&) mlir/lib/Dialect/Transform/IR/TransformOps.cpp:1788:15
    #8 0x55e29464f927 in mlir::transform::NamedSequenceOp::apply(mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) mlir/lib/Dialect/Transform/IR/TransformOps.cpp:2155:10
    #9 0x55e2945d28ee in mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Model<mlir::transform::NamedSequenceOp>::apply(mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Concept const*, mlir::Operation*, mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.h.inc:477:56
    #10 0x55e297494a60 in apply blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.cpp.inc:61:14
    #11 0x55e297494a60 in mlir::transform::TransformState::applyTransform(mlir::transform::TransformOpInterface) mlir/lib/Dialect/Transform/Interfaces/TransformInterfaces.cpp:953:48
    #12 0x55e2974a5fe2 in mlir::transform::applyTransforms(mlir::Operation*, mlir::transform::TransformOpInterface, mlir::RaggedArray<llvm::PointerUnion<mlir::Operation*, mlir::Attribute, mlir::Value>> const&, mlir::transform::TransformOptions const&, bool) mlir/lib/Dialect/Transform/Interfaces/TransformInterfaces.cpp:2016:16
    #13 0x55e2945888d7 in mlir::transform::applyTransformNamedSequence(mlir::RaggedArray<llvm::PointerUnion<mlir::Operation*, mlir::Attribute, mlir::Value>>, mlir::transform::TransformOpInterface, mlir::ModuleOp, mlir::transform::TransformOptions const&) mlir/lib/Dialect/Transform/Transforms/TransformInterpreterUtils.cpp:234:10
    #14 0x55e294582446 in (anonymous namespace)::InterpreterPass::runOnOperation() mlir/lib/Dialect/Transform/Transforms/InterpreterPass.cpp:147:16
    #15 0x55e2978e93c6 in operator() mlir/lib/Pass/Pass.cpp:527:17
    #16 0x55e2978e93c6 in void llvm::function_ref<void ()>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_1>(long) llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
    #17 0x55e2978e207a in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
    #18 0x55e2978e207a in executeAction<mlir::PassExecutionAction, mlir::Pass &> mlir/include/mlir/IR/MLIRContext.h:275:7
    #19 0x55e2978e207a in mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) mlir/lib/Pass/Pass.cpp:521:21
    #20 0x55e2978e5fbf in runPipeline mlir/lib/Pass/Pass.cpp:593:16
    #21 0x55e2978e5fbf in mlir::PassManager::runPasses(mlir::Operation*, mlir::AnalysisManager) mlir/lib/Pass/Pass.cpp:904:10
    #22 0x55e2978e5b65 in mlir::PassManager::run(mlir::Operation*) mlir/lib/Pass/Pass.cpp:884:60
    #23 0x55e291ebb460 in performActions(llvm::raw_ostream&, std::__u::shared_ptr<llvm::SourceMgr> const&, mlir::MLIRContext*, mlir::MlirOptMainConfig const&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:408:17
    #24 0x55e291ebabd9 in processBuffer mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:481:9
    #25 0x55e291ebabd9 in operator() mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:548:12
    #26 0x55e291ebabd9 in llvm::LogicalResult llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>::callback_fn<mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&)::$_0>(long, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&) llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
    #27 0x55e297b1cffe in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
    #28 0x55e297b1cffe in mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef)::$_0::operator()(llvm::StringRef) const mlir/lib/Support/ToolUtilities.cpp:86:16
    #29 0x55e297b1c9c5 in interleave<const llvm::StringRef *, (lambda at mlir/lib/Support/ToolUtilities.cpp:79:23), (lambda at llvm/include/llvm/ADT/STLExtras.h:2147:49), void> llvm/include/llvm/ADT/STLExtras.h:2125:3
    #30 0x55e297b1c9c5 in interleave<llvm::SmallVector<llvm::StringRef, 8U>, (lambda at mlir/lib/Support/ToolUtilities.cpp:79:23), llvm::raw_ostream, llvm::StringRef> llvm/include/llvm/ADT/STLExtras.h:2147:3
    #31 0x55e297b1c9c5 in mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef) mlir/lib/Support/ToolUtilities.cpp:89:3
    #32 0x55e291eb0cf0 in mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:551:10
    #33 0x55e291eb115c in mlir::MlirOptMain(int, char**, llvm::StringRef, llvm::StringRef, mlir::DialectRegistry&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:589:14
    #34 0x55e291eb15f8 in mlir::MlirOptMain(int, char**, llvm::StringRef, mlir::DialectRegistry&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:605:10
    #35 0x55e29130d1be in main mlir/tools/mlir-opt/mlir-opt.cpp:311:33
    #36 0x7fbcf3fff3d3 in __libc_start_main (/usr/grte/v5/lib64/libc.so.6+0x613d3) (BuildId: 9a996398ce14a94560b0c642eb4f6e94)
    #37 0x55e2912365a9 in _start /usr/grte/v5/debug-src/src/csu/../sysdeps/x86_64/start.S:120

0x502000006cd8 is located 8 bytes inside of 16-byte region [0x502000006cd0,0x502000006ce0)
freed by thread T0 here:
    #0 0x55e29130b7e2 in operator delete(void*, unsigned long) compiler-rt/lib/asan/asan_new_delete.cpp:155:3
    #1 0x55e2979eb657 in __libcpp_operator_delete<void *, unsigned long>
    #2 0x55e2979eb657 in __do_deallocate_handle_size<>
    #3 0x55e2979eb657 in __libcpp_deallocate
    #4 0x55e2979eb657 in deallocate
    #5 0x55e2979eb657 in deallocate
    #6 0x55e2979eb657 in operator()
    #7 0x55e2979eb657 in ~vector
    #8 0x55e2979eb657 in mlir::Block::~Block() mlir/lib/IR/Block.cpp:24:1
    #9 0x55e2979ebc17 in deleteNode llvm/include/llvm/ADT/ilist.h:42:39
    #10 0x55e2979ebc17 in erase llvm/include/llvm/ADT/ilist.h:205:5
    #11 0x55e2979ebc17 in erase llvm/include/llvm/ADT/ilist.h:209:39
    #12 0x55e2979ebc17 in mlir::Block::erase() mlir/lib/IR/Block.cpp:67:28
    #13 0x55e297aef978 in mlir::RewriterBase::eraseBlock(mlir::Block*) mlir/lib/IR/PatternMatch.cpp:245:10
    #14 0x55e297af0563 in mlir::RewriterBase::inlineBlockBefore(mlir::Block*, mlir::Block*, llvm::ilist_iterator<llvm::ilist_detail::node_options<mlir::Operation, false, false, void, false, void>, false, false>, mlir::ValueRange) mlir/lib/IR/PatternMatch.cpp:331:3
    #15 0x55e297af06d8 in mlir::RewriterBase::mergeBlocks(mlir::Block*, mlir::Block*, mlir::ValueRange) mlir/lib/IR/PatternMatch.cpp:341:3
    #16 0x55e297036608 in mlir::scf::ForOp::replaceWithAdditionalYields(mlir::RewriterBase&, mlir::ValueRange, bool, std::__u::function<llvm::SmallVector<mlir::Value, 6u> (mlir::OpBuilder&, mlir::Location, llvm::ArrayRef<mlir::BlockArgument>)> const&) mlir/lib/Dialect/SCF/IR/SCF.cpp:575:12
    #17 0x55e2970673ca in mlir::detail::LoopLikeOpInterfaceInterfaceTraits::Model<mlir::scf::ForOp>::replaceWithAdditionalYields(mlir::detail::LoopLikeOpInterfaceInterfaceTraits::Concept const*, mlir::Operation*, mlir::RewriterBase&, mlir::ValueRange, bool, std::__u::function<llvm::SmallVector<mlir::Value, 6u> (mlir::OpBuilder&, mlir::Location, llvm::ArrayRef<mlir::BlockArgument>)> const&) blaze-out/k8-opt-asan/bin/mlir/include/mlir/Interfaces/LoopLikeInterface.h.inc:658:56
    #18 0x55e2978d5feb in replaceWithAdditionalYields blaze-out/k8-opt-asan/bin/mlir/include/mlir/Interfaces/LoopLikeInterface.cpp.inc:105:14
    #19 0x55e2978d5feb in mlir::createFused(mlir::LoopLikeOpInterface, mlir::LoopLikeOpInterface, mlir::RewriterBase&, std::__u::function<llvm::SmallVector<mlir::Value, 6u> (mlir::OpBuilder&, mlir::Location, llvm::ArrayRef<mlir::BlockArgument>)>, llvm::function_ref<void (mlir::RewriterBase&, mlir::LoopLikeOpInterface, mlir::LoopLikeOpInterface&, mlir::IRMapping)>) mlir/lib/Interfaces/LoopLikeInterface.cpp:135:14
    #20 0x55e2952a614b in mlir::fuseIndependentSiblingForLoops(mlir::scf::ForOp, mlir::scf::ForOp, mlir::RewriterBase&) mlir/lib/Dialect/SCF/Utils/Utils.cpp:1398:43
    #21 0x55e291480c6f in mlir::transform::LoopFuseSiblingOp::apply(mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) mlir/lib/Dialect/SCF/TransformOps/SCFTransformOps.cpp:482:17
    #22 0x55e29149ed5e in mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Model<mlir::transform::LoopFuseSiblingOp>::apply(mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Concept const*, mlir::Operation*, mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.h.inc:477:56
    #23 0x55e297494a60 in apply blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.cpp.inc:61:14
    #24 0x55e297494a60 in mlir::transform::TransformState::applyTransform(mlir::transform::TransformOpInterface) mlir/lib/Dialect/Transform/Interfaces/TransformInterfaces.cpp:953:48
    #25 0x55e294646a8d in applySequenceBlock(mlir::Block&, mlir::transform::FailurePropagationMode, mlir::transform::TransformState&, mlir::transform::TransformResults&) mlir/lib/Dialect/Transform/IR/TransformOps.cpp:1788:15
    #26 0x55e29464f927 in mlir::transform::NamedSequenceOp::apply(mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) mlir/lib/Dialect/Transform/IR/TransformOps.cpp:2155:10
    #27 0x55e2945d28ee in mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Model<mlir::transform::NamedSequenceOp>::apply(mlir::transform::detail::TransformOpInterfaceInterfaceTraits::Concept const*, mlir::Operation*, mlir::transform::TransformRewriter&, mlir::transform::TransformResults&, mlir::transform::TransformState&) blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.h.inc:477:56
    #28 0x55e297494a60 in apply blaze-out/k8-opt-asan/bin/mlir/include/mlir/Dialect/Transform/Interfaces/TransformInterfaces.cpp.inc:61:14
    #29 0x55e297494a60 in mlir::transform::TransformState::applyTransform(mlir::transform::TransformOpInterface) mlir/lib/Dialect/Transform/Interfaces/TransformInterfaces.cpp:953:48
    #30 0x55e2974a5fe2 in mlir::transform::applyTransforms(mlir::Operation*, mlir::transform::TransformOpInterface, mlir::RaggedArray<llvm::PointerUnion<mlir::Operation*, mlir::Attribute, mlir::Value>> const&, mlir::transform::TransformOptions const&, bool) mlir/lib/Dialect/Transform/Interfaces/TransformInterfaces.cpp:2016:16
    #31 0x55e2945888d7 in mlir::transform::applyTransformNamedSequence(mlir::RaggedArray<llvm::PointerUnion<mlir::Operation*, mlir::Attribute, mlir::Value>>, mlir::transform::TransformOpInterface, mlir::ModuleOp, mlir::transform::TransformOptions const&) mlir/lib/Dialect/Transform/Transforms/TransformInterpreterUtils.cpp:234:10
    #32 0x55e294582446 in (anonymous namespace)::InterpreterPass::runOnOperation() mlir/lib/Dialect/Transform/Transforms/InterpreterPass.cpp:147:16
    #33 0x55e2978e93c6 in operator() mlir/lib/Pass/Pass.cpp:527:17
    #34 0x55e2978e93c6 in void llvm::function_ref<void ()>::callback_fn<mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int)::$_1>(long) llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
    #35 0x55e2978e207a in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
    #36 0x55e2978e207a in executeAction<mlir::PassExecutionAction, mlir::Pass &> mlir/include/mlir/IR/MLIRContext.h:275:7
    #37 0x55e2978e207a in mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) mlir/lib/Pass/Pass.cpp:521:21
    #38 0x55e2978e5fbf in runPipeline mlir/lib/Pass/Pass.cpp:593:16
    #39 0x55e2978e5fbf in mlir::PassManager::runPasses(mlir::Operation*, mlir::AnalysisManager) mlir/lib/Pass/Pass.cpp:904:10
    #40 0x55e2978e5b65 in mlir::PassManager::run(mlir::Operation*) mlir/lib/Pass/Pass.cpp:884:60
    #41 0x55e291ebb460 in performActions(llvm::raw_ostream&, std::__u::shared_ptr<llvm::SourceMgr> const&, mlir::MLIRContext*, mlir::MlirOptMainConfig const&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:408:17
    #42 0x55e291ebabd9 in processBuffer mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:481:9
    #43 0x55e291ebabd9 in operator() mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:548:12
    #44 0x55e291ebabd9 in llvm::LogicalResult llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>::callback_fn<mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&)::$_0>(long, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&) llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
    #45 0x55e297b1cffe in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
    #46 0x55e297b1cffe in mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef)::$_0::operator()(llvm::StringRef) const mlir/lib/Support/ToolUtilities.cpp:86:16
    #47 0x55e297b1c9c5 in interleave<const llvm::StringRef *, (lambda at mlir/lib/Support/ToolUtilities.cpp:79:23), (lambda at llvm/include/llvm/ADT/STLExtras.h:2147:49), void> llvm/include/llvm/ADT/STLExtras.h:2125:3
    #48 0x55e297b1c9c5 in interleave<llvm::SmallVector<llvm::StringRef, 8U>, (lambda at mlir/lib/Support/ToolUtilities.cpp:79:23), llvm::raw_ostream, llvm::StringRef> llvm/include/llvm/ADT/STLExtras.h:2147:3
    #49 0x55e297b1c9c5 in mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef) mlir/lib/Support/ToolUtilities.cpp:89:3
    #50 0x55e291eb0cf0 in mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:551:10
    #51 0x55e291eb115c in mlir::MlirOptMain(int, char**, llvm::StringRef, llvm::StringRef, mlir::DialectRegistry&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:589:14

previously allocated by thread T0 here:
    #0 0x55e29130ab5d in operator new(unsigned long) compiler-rt/lib/asan/asan_new_delete.cpp:86:3
    #1 0x55e2979ed5d4 in __libcpp_operator_new<unsigned long>
    #2 0x55e2979ed5d4 in __libcpp_allocate
    #3 0x55e2979ed5d4 in allocate
    #4 0x55e2979ed5d4 in __allocate_at_least<std::__u::allocator<mlir::BlockArgument> >
    #5 0x55e2979ed5d4 in __split_buffer
    #6 0x55e2979ed5d4 in mlir::BlockArgument* std::__u::vector<mlir::BlockArgument, std::__u::allocator<mlir::BlockArgument>>::__push_back_slow_path<mlir::BlockArgument const&>(mlir::BlockArgument const&)
    #7 0x55e2979ec0f2 in push_back
    #8 0x55e2979ec0f2 in mlir::Block::addArgument(mlir::Type, mlir::Location) mlir/lib/IR/Block.cpp:154:13
    #9 0x55e29796e457 in parseRegionBody mlir/lib/AsmParser/Parser.cpp:2172:34
    #10 0x55e29796e457 in (anonymous namespace)::OperationParser::parseRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) mlir/lib/AsmParser/Parser.cpp:2121:7
    #11 0x55e29796b25e in (anonymous namespace)::CustomOpAsmParser::parseRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) mlir/lib/AsmParser/Parser.cpp:1785:16
    #12 0x55e297035742 in mlir::scf::ForOp::parse(mlir::OpAsmParser&, mlir::OperationState&) mlir/lib/Dialect/SCF/IR/SCF.cpp:521:14
    #13 0x55e291322c18 in llvm::ParseResult llvm::detail::UniqueFunctionBase<llvm::ParseResult, mlir::OpAsmParser&, mlir::OperationState&>::CallImpl<llvm::ParseResult (*)(mlir::OpAsmParser&, mlir::OperationState&)>(void*, mlir::OpAsmParser&, mlir::OperationState&) llvm/include/llvm/ADT/FunctionExtras.h:220:12
    #14 0x55e29795bea3 in operator() llvm/include/llvm/ADT/FunctionExtras.h:384:12
    #15 0x55e29795bea3 in callback_fn<llvm::unique_function<llvm::ParseResult (mlir::OpAsmParser &, mlir::OperationState &)> > llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
    #16 0x55e29795bea3 in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
    #17 0x55e29795bea3 in parseOperation mlir/lib/AsmParser/Parser.cpp:1521:9
    #18 0x55e29795bea3 in parseCustomOperation mlir/lib/AsmParser/Parser.cpp:2017:19
    #19 0x55e29795bea3 in (anonymous namespace)::OperationParser::parseOperation() mlir/lib/AsmParser/Parser.cpp:1174:10
    #20 0x55e297971d20 in parseBlockBody mlir/lib/AsmParser/Parser.cpp:2296:9
    #21 0x55e297971d20 in (anonymous namespace)::OperationParser::parseBlock(mlir::Block*&) mlir/lib/AsmParser/Parser.cpp:2226:12
    #22 0x55e29796e4f5 in parseRegionBody mlir/lib/AsmParser/Parser.cpp:2184:7
    #23 0x55e29796e4f5 in (anonymous namespace)::OperationParser::parseRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) mlir/lib/AsmParser/Parser.cpp:2121:7
    #24 0x55e29796b25e in (anonymous namespace)::CustomOpAsmParser::parseRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) mlir/lib/AsmParser/Parser.cpp:1785:16
    #25 0x55e29796b2cf in (anonymous namespace)::CustomOpAsmParser::parseOptionalRegion(mlir::Region&, llvm::ArrayRef<mlir::OpAsmParser::Argument>, bool) mlir/lib/AsmParser/Parser.cpp:1796:12
    #26 0x55e2978d89ff in mlir::function_interface_impl::parseFunctionOp(mlir::OpAsmParser&, mlir::OperationState&, bool, mlir::StringAttr, llvm::function_ref<mlir::Type (mlir::Builder&, llvm::ArrayRef<mlir::Type>, llvm::ArrayRef<mlir::Type>, mlir::function_interface_impl::VariadicFlag, std::__u::basic_string<char, std::__u::char_traits<char>, std::__u::allocator<char>>&)>, mlir::StringAttr, mlir::StringAttr) mlir/lib/Interfaces/FunctionImplementation.cpp:232:14
    #27 0x55e2969ba41d in mlir::func::FuncOp::parse(mlir::OpAsmParser&, mlir::OperationState&) mlir/lib/Dialect/Func/IR/FuncOps.cpp:203:10
    #28 0x55e291322c18 in llvm::ParseResult llvm::detail::UniqueFunctionBase<llvm::ParseResult, mlir::OpAsmParser&, mlir::OperationState&>::CallImpl<llvm::ParseResult (*)(mlir::OpAsmParser&, mlir::OperationState&)>(void*, mlir::OpAsmParser&, mlir::OperationState&) llvm/include/llvm/ADT/FunctionExtras.h:220:12
    #29 0x55e29795bea3 in operator() llvm/include/llvm/ADT/FunctionExtras.h:384:12
    #30 0x55e29795bea3 in callback_fn<llvm::unique_function<llvm::ParseResult (mlir::OpAsmParser &, mlir::OperationState &)> > llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
    #31 0x55e29795bea3 in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
    #32 0x55e29795bea3 in parseOperation mlir/lib/AsmParser/Parser.cpp:1521:9
    #33 0x55e29795bea3 in parseCustomOperation mlir/lib/AsmParser/Parser.cpp:2017:19
    #34 0x55e29795bea3 in (anonymous namespace)::OperationParser::parseOperation() mlir/lib/AsmParser/Parser.cpp:1174:10
    #35 0x55e297959b78 in parse mlir/lib/AsmParser/Parser.cpp:2725:20
    #36 0x55e297959b78 in mlir::parseAsmSourceFile(llvm::SourceMgr const&, mlir::Block*, mlir::ParserConfig const&, mlir::AsmParserState*, mlir::AsmParserCodeCompleteContext*) mlir/lib/AsmParser/Parser.cpp:2785:41
    #37 0x55e29790d5c2 in mlir::parseSourceFile(std::__u::shared_ptr<llvm::SourceMgr> const&, mlir::Block*, mlir::ParserConfig const&, mlir::LocationAttr*) mlir/lib/Parser/Parser.cpp:46:10
    #38 0x55e291ebbfe2 in parseSourceFile<mlir::ModuleOp, const std::__u::shared_ptr<llvm::SourceMgr> &> mlir/include/mlir/Parser/Parser.h:159:14
    #39 0x55e291ebbfe2 in parseSourceFile<mlir::ModuleOp> mlir/include/mlir/Parser/Parser.h:189:10
    #40 0x55e291ebbfe2 in mlir::parseSourceFileForTool(std::__u::shared_ptr<llvm::SourceMgr> const&, mlir::ParserConfig const&, bool) mlir/include/mlir/Tools/ParseUtilities.h:31:12
    #41 0x55e291ebb263 in performActions(llvm::raw_ostream&, std::__u::shared_ptr<llvm::SourceMgr> const&, mlir::MLIRContext*, mlir::MlirOptMainConfig const&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:383:33
    #42 0x55e291ebabd9 in processBuffer mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:481:9
    #43 0x55e291ebabd9 in operator() mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:548:12
    #44 0x55e291ebabd9 in llvm::LogicalResult llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>::callback_fn<mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&)::$_0>(long, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&) llvm/include/llvm/ADT/STLFunctionalExtras.h:45:12
    #45 0x55e297b1cffe in operator() llvm/include/llvm/ADT/STLFunctionalExtras.h:68:12
    #46 0x55e297b1cffe in mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef)::$_0::operator()(llvm::StringRef) const mlir/lib/Support/ToolUtilities.cpp:86:16
    #47 0x55e297b1c9c5 in interleave<const llvm::StringRef *, (lambda at mlir/lib/Support/ToolUtilities.cpp:79:23), (lambda at llvm/include/llvm/ADT/STLExtras.h:2147:49), void> llvm/include/llvm/ADT/STLExtras.h:2125:3
    #48 0x55e297b1c9c5 in interleave<llvm::SmallVector<llvm::StringRef, 8U>, (lambda at mlir/lib/Support/ToolUtilities.cpp:79:23), llvm::raw_ostream, llvm::StringRef> llvm/include/llvm/ADT/STLExtras.h:2147:3
    #49 0x55e297b1c9c5 in mlir::splitAndProcessBuffer(std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::function_ref<llvm::LogicalResult (std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, llvm::raw_ostream&)>, llvm::raw_ostream&, llvm::StringRef, llvm::StringRef) mlir/lib/Support/ToolUtilities.cpp:89:3
    #50 0x55e291eb0cf0 in mlir::MlirOptMain(llvm::raw_ostream&, std::__u::unique_ptr<llvm::MemoryBuffer, std::__u::default_delete<llvm::MemoryBuffer>>, mlir::DialectRegistry&, mlir::MlirOptMainConfig const&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:551:10
    #51 0x55e291eb115c in mlir::MlirOptMain(int, char**, llvm::StringRef, llvm::StringRef, mlir::DialectRegistry&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:589:14
    #52 0x55e291eb15f8 in mlir::MlirOptMain(int, char**, llvm::StringRef, mlir::DialectRegistry&) mlir/lib/Tools/mlir-opt/MlirOptMain.cpp:605:10
    #53 0x55e29130d1be in main mlir/tools/mlir-opt/mlir-opt.cpp:311:33
    #54 0x7fbcf3fff3d3 in __libc_start_main (/usr/grte/v5/lib64/libc.so.6+0x613d3) (BuildId: 9a996398ce14a94560b0c642eb4f6e94)
    #55 0x55e2912365a9 in _start /usr/grte/v5/debug-src/src/csu/../sysdeps/x86_64/start.S:120

SUMMARY: AddressSanitizer: heap-use-after-free mlir/include/mlir/IR/IRMapping.h:40:11 in map<llvm::MutableArrayRef<mlir::BlockArgument> &, llvm::MutableArrayRef<mlir::BlockArgument>, nullptr>
Shadow bytes around the buggy address:
  0x502000006a00: fa fa 00 fa fa fa 00 00 fa fa 00 fa fa fa 00 fa
  0x502000006a80: fa fa 00 fa fa fa 00 00 fa fa 00 00 fa fa 00 00
  0x502000006b00: fa fa 00 00 fa fa 00 00 fa fa 00 fa fa fa 00 fa
  0x502000006b80: fa fa 00 fa fa fa 00 fa fa fa 00 00 fa fa 00 00
  0x502000006c00: fa fa 00 00 fa fa 00 00 fa fa 00 00 fa fa fd fa
=>0x502000006c80: fa fa fd fa fa fa fd fd fa fa fd[fd]fa fa fd fd
  0x502000006d00: fa fa 00 fa fa fa 00 fa fa fa 00 fa fa fa 00 fa
  0x502000006d80: fa fa 00 fa fa fa 00 fa fa fa 00 fa fa fa 00 fa
  0x502000006e00: fa fa 00 fa fa fa 00 fa fa fa 00 00 fa fa 00 fa
  0x502000006e80: fa fa 00 fa fa fa 00 00 fa fa 00 fa fa fa 00 fa
  0x502000006f00: fa fa 00 fa fa fa 00 fa fa fa 00 fa fa fa 00 fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==4320==ABORTING
@pxl-th
Copy link
Author

pxl-th commented Jul 23, 2024

I'll close it as specifying syncscope fixes the issue. Thanks!

@pxl-th pxl-th closed this as completed Jul 23, 2024
@arsenm
Copy link

arsenm commented Aug 16, 2024

I'll close it as specifying syncscope fixes the issue. Thanks!

Note the attribute was just removed in edded8d. You should now specify some combination of !amdgpu.no.fine.grained.memory , !amdgpu.no.remote.memory, and !amdgpu.ignore.denormal.mode depending on what's appropriate for the situation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants