Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[OpenMP] Fix stack corruption due to argument mismatch #96386

Merged
merged 1 commit into from
Jul 5, 2024

Conversation

sushgokh
Copy link
Contributor

While lowering (#pragma omp target update from), clang's generated .omp_task_entry. is setting up 9 arguments while calling __tgt_target_data_update_nowait_mapper.

At the same time, in __tgt_target_data_update_nowait_mapper, call to targetData() is converted to a sibcall assuming it has the argument count listed in the signature.

AARCH64 asm sequence for this is as follows (removed unrelated insns):

`
.omp_task_entry..108:
sub sp, sp, #32
stp x29, x30, sp, #16 // 16-byte Folded Spill
add x29, sp, #16
str x8, sp, #8. // stack canary
str xzr, [sp]
bl __tgt_target_data_update_nowait_mapper

__tgt_target_data_update_nowait_mapper:
sub sp, sp, #32
stp x29, x30, sp, #16 // 16-byte Folded Spill
add x29, sp, #16
str x8, sp, #8 // stack canary
// Sibcall argument setup
adrp x8, :got:_Z16targetDataUpdateP7ident_tR8DeviceTyiPPvS4_PlS5_S4_S4_R11AsyncInfoTyb
ldr x8, [x8, :got_lo12:_Z16targetDataUpdateP7ident_tR8DeviceTyiPPvS4_PlS5_S4_S4_R11AsyncInfoTyb]
stp x9, x8, x29, #16
adrp x8, .L.str.8
add x8, x8, :lo12:.L.str.8
str x8, x29, #32. <==. This is the insn that erases $fp

ldp x29, x30, sp, #16 // 16-byte Folded Reload
add sp, sp, #32
// Sibcall
b ZL10targetDataI22TaskAsyncInfoWrapperTyEvP7ident_tliPPvS4_PlS5_S4_S4_PFiS2_R8DeviceTyiS4_S4_S5_S5_S4_S4_R11AsyncInfoTybEPKcSD
`

On AArch64, call to __tgt_target_data_update_nowait_mapper in .omp_task_entry. sets up only single space on stack and this results in ovewriting $fp and subsequent stack corruption. This issue can be credited to discrepancy of __tgt_target_data_update_nowait_mapper signature in openmp/libomptarget/include/omptarget.h taking 13 arguments while clang/lib/CodeGen/CGOpenMPRuntime.cpp and
llvm/include/llvm/Frontend/OpenMP/OMPKinds.def taking only 9 arguments.

This patch modifies __tgt_target_data_update_nowait_mapper signature to match .omp_task_entry usage(and other 2 files mentioned above).

@llvmbot
Copy link
Collaborator

llvmbot commented Jun 22, 2024

@llvm/pr-subscribers-llvm-transforms
@llvm/pr-subscribers-flang-openmp
@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-offload

Author: Sushant Gokhale (sushgokh)

Changes

While lowering (#pragma omp target update from), clang's generated .omp_task_entry. is setting up 9 arguments while calling __tgt_target_data_update_nowait_mapper.

At the same time, in __tgt_target_data_update_nowait_mapper, call to targetData<TaskAsyncInfoWrapperTy>() is converted to a sibcall assuming it has the argument count listed in the signature.

AARCH64 asm sequence for this is as follows (removed unrelated insns):

`
.omp_task_entry..108:
sub sp, sp, #32
stp x29, x30, sp, #16 // 16-byte Folded Spill
add x29, sp, #16
str x8, sp, #8. // stack canary
str xzr, [sp]
bl __tgt_target_data_update_nowait_mapper

__tgt_target_data_update_nowait_mapper:
sub sp, sp, #32
stp x29, x30, sp, #16 // 16-byte Folded Spill
add x29, sp, #16
str x8, sp, #8 // stack canary
// Sibcall argument setup
adrp x8, :got:_Z16targetDataUpdateP7ident_tR8DeviceTyiPPvS4_PlS5_S4_S4_R11AsyncInfoTyb
ldr x8, [x8, :got_lo12:_Z16targetDataUpdateP7ident_tR8DeviceTyiPPvS4_PlS5_S4_S4_R11AsyncInfoTyb]
stp x9, x8, x29, #16
adrp x8, .L.str.8
add x8, x8, :lo12:.L.str.8
str x8, x29, #32. <==. This is the insn that erases $fp

ldp x29, x30, sp, #16 // 16-byte Folded Reload
add sp, sp, #32
// Sibcall
b ZL10targetDataI22TaskAsyncInfoWrapperTyEvP7ident_tliPPvS4_PlS5_S4_S4_PFiS2_R8DeviceTyiS4_S4_S5_S5_S4_S4_R11AsyncInfoTybEPKcSD
`

On AArch64, call to __tgt_target_data_update_nowait_mapper in .omp_task_entry. sets up only single space on stack and this results in ovewriting $fp and subsequent stack corruption. This issue can be credited to discrepancy of __tgt_target_data_update_nowait_mapper signature in openmp/libomptarget/include/omptarget.h taking 13 arguments while clang/lib/CodeGen/CGOpenMPRuntime.cpp and
llvm/include/llvm/Frontend/OpenMP/OMPKinds.def taking only 9 arguments.

This patch modifies __tgt_target_data_update_nowait_mapper signature to match .omp_task_entry usage(and other 2 files mentioned above).


Full diff: https://github.com/llvm/llvm-project/pull/96386.diff

2 Files Affected:

  • (modified) offload/include/omptarget.h (+1-2)
  • (modified) offload/src/interface.cpp (+1-2)
diff --git a/offload/include/omptarget.h b/offload/include/omptarget.h
index 323dee41630f2..968589e866334 100644
--- a/offload/include/omptarget.h
+++ b/offload/include/omptarget.h
@@ -388,8 +388,7 @@ void __tgt_target_data_update_mapper(ident_t *Loc, int64_t DeviceId,
 void __tgt_target_data_update_nowait_mapper(
     ident_t *Loc, int64_t DeviceId, int32_t ArgNum, void **ArgsBase,
     void **Args, int64_t *ArgSizes, int64_t *ArgTypes, map_var_info_t *ArgNames,
-    void **ArgMappers, int32_t DepNum, void *DepList, int32_t NoAliasDepNum,
-    void *NoAliasDepList);
+    void **ArgMappers);
 
 // Performs the same actions as data_begin in case ArgNum is non-zero
 // and initiates run of offloaded region on target platform; if ArgNum
diff --git a/offload/src/interface.cpp b/offload/src/interface.cpp
index 763b051cc6d77..27562388c2f11 100644
--- a/offload/src/interface.cpp
+++ b/offload/src/interface.cpp
@@ -207,8 +207,7 @@ EXTERN void __tgt_target_data_update_mapper(ident_t *Loc, int64_t DeviceId,
 EXTERN void __tgt_target_data_update_nowait_mapper(
     ident_t *Loc, int64_t DeviceId, int32_t ArgNum, void **ArgsBase,
     void **Args, int64_t *ArgSizes, int64_t *ArgTypes, map_var_info_t *ArgNames,
-    void **ArgMappers, int32_t DepNum, void *DepList, int32_t NoAliasDepNum,
-    void *NoAliasDepList) {
+    void **ArgMappers) {
   OMPT_IF_BUILT(ReturnAddressSetterRAII RA(__builtin_return_address(0)));
   targetData<TaskAsyncInfoWrapperTy>(
       Loc, DeviceId, ArgNum, ArgsBase, Args, ArgSizes, ArgTypes, ArgNames,

@@ -207,8 +207,7 @@ EXTERN void __tgt_target_data_update_mapper(ident_t *Loc, int64_t DeviceId,
EXTERN void __tgt_target_data_update_nowait_mapper(
ident_t *Loc, int64_t DeviceId, int32_t ArgNum, void **ArgsBase,
void **Args, int64_t *ArgSizes, int64_t *ArgTypes, map_var_info_t *ArgNames,
void **ArgMappers, int32_t DepNum, void *DepList, int32_t NoAliasDepNum,
void *NoAliasDepList) {
void **ArgMappers) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are all unused? Seems weird that we're not using them at all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original design was from libomp task interface, but we never use it.

However, I think we might want to keep them, as I definitely had a patch long time ago that uses them. A better fix would be to modify the front end to emit null value correspondingly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, I think we might want to keep them, as I definitely had a patch long time ago that uses them. A better fix would be to modify the front end to emit null value correspondingly.

All the nowait calls, listed below, suffer from the same discrepancy

__tgt_target_data_begin_nowait_mapper
__tgt_target_data_end_nowait_mapper
__tgt_target_data_update_nowait_mapper

while their counterparts, derived here, only have 9 arguments

So, I am thinking of modifying all the nowait calls while addressing this specific issue.

@shiltian If you still think that modifying the front-end would be better here, could you tell me what changes are exactly required in the OffloadingArgs here ? I hope this is what you are suggesting.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. If it is nowait, push extra null values.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen flang:openmp clang:openmp OpenMP related changes to Clang labels Jun 27, 2024
Copy link
Contributor

@shiltian shiltian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides, don't you need to update clang tests?

Copy link
Contributor

@shiltian shiltian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

While lowering (#pragma omp target update from), clang's generated
.omp_task_entry. is setting up 9 arguments while calling
__tgt_target_data_update_nowait_mapper.

At the same time, in __tgt_target_data_update_nowait_mapper, call to
targetData<TaskAsyncInfoWrapperTy>() is converted to a sibcall assuming
it has the argument count listed in the signature.

AARCH64 asm sequence for this is as follows (removed unrelated insns):

.omp_task_entry..108:
  sub   sp, sp, llvm#32
  stp   x29, x30, sp, llvm#16       // 16-byte Folded Spill
  add   x29, sp, llvm#16
  str   x8, sp, llvm#8. // stack canary
  str   xzr, [sp]
  bl   __tgt_target_data_update_nowait_mapper

__tgt_target_data_update_nowait_mapper:
  sub   sp, sp, llvm#32
  stp   x29, x30, sp, llvm#16       // 16-byte Folded Spill
  add   x29, sp, llvm#16
  str   x8, sp, llvm#8 // stack canary
  // Sibcall argument setup
  adrp  x8, :got:_Z16targetDataUpdateP7ident_tR8DeviceTyiPPvS4_PlS5_S4_S4_R11AsyncInfoTyb
  ldr   x8, [x8, :got_lo12:_Z16targetDataUpdateP7ident_tR8DeviceTyiPPvS4_PlS5_S4_S4_R11AsyncInfoTyb]
  stp   x9, x8, x29, llvm#16
  adrp  x8, .L.str.8
  add   x8, x8, :lo12:.L.str.8
  str   x8, x29, llvm#32. <==. This is the insn that erases $fp

  ldp   x29, x30, sp, llvm#16       // 16-byte Folded Reload
  add   sp, sp, llvm#32
  // Sibcall
  b    ZL10targetDataI22TaskAsyncInfoWrapperTyEvP7ident_tliPPvS4_PlS5_S4_S4_PFiS2_R8DeviceTyiS4_S4_S5_S5_S4_S4_R11AsyncInfoTybEPKcSD

On AArch64, call to __tgt_target_data_update_nowait_mapper in .omp_task_entry.
sets up only single space on stack and this results in ovewriting $fp
and subsequent stack corruption. This issue can be credited to discrepancy of
__tgt_target_data_update_nowait_mapper signature in
openmp/libomptarget/include/omptarget.h taking 13 arguments while
clang/lib/CodeGen/CGOpenMPRuntime.cpp and
llvm/include/llvm/Frontend/OpenMP/OMPKinds.def taking only 9 arguments.

This patch modifies __tgt_target_data_update_nowait_mapper signature
to match .omp_task_entry usage(and other 2 files mentioned above).

Co-authored-by: Kugan Vivekanandarajah <kvivekananda@nvidia.com>
@sushgokh sushgokh merged commit c7ee204 into llvm:main Jul 5, 2024
7 checks passed
@sushgokh sushgokh deleted the GRCO-601 branch July 5, 2024 05:11
kbluck pushed a commit to kbluck/llvm-project that referenced this pull request Jul 6, 2024
While lowering (#pragma omp target update from), clang's generated
.omp_task_entry. is setting up 9 arguments while calling
__tgt_target_data_update_nowait_mapper.

At the same time, in __tgt_target_data_update_nowait_mapper, call to
targetData<TaskAsyncInfoWrapperTy>() is converted to a sibcall assuming
it has the argument count listed in the signature.

AARCH64 asm sequence for this is as follows (removed unrelated insns):

`
.omp_task_entry..108:
  sub   sp, sp, llvm#32
  stp   x29, x30, sp, llvm#16       // 16-byte Folded Spill
  add   x29, sp, llvm#16
  str   x8, sp, llvm#8. // stack canary
  str   xzr, [sp]
  bl   __tgt_target_data_update_nowait_mapper

__tgt_target_data_update_nowait_mapper:
  sub   sp, sp, llvm#32
  stp   x29, x30, sp, llvm#16       // 16-byte Folded Spill
  add   x29, sp, llvm#16
  str   x8, sp, llvm#8 // stack canary
  // Sibcall argument setup
adrp x8,
:got:_Z16targetDataUpdateP7ident_tR8DeviceTyiPPvS4_PlS5_S4_S4_R11AsyncInfoTyb
ldr x8, [x8,
:got_lo12:_Z16targetDataUpdateP7ident_tR8DeviceTyiPPvS4_PlS5_S4_S4_R11AsyncInfoTyb]
  stp   x9, x8, x29, llvm#16
  adrp  x8, .L.str.8
  add   x8, x8, :lo12:.L.str.8
  str   x8, x29, llvm#32. <==. This is the insn that erases $fp

  ldp   x29, x30, sp, llvm#16       // 16-byte Folded Reload
  add   sp, sp, llvm#32
  // Sibcall
b
ZL10targetDataI22TaskAsyncInfoWrapperTyEvP7ident_tliPPvS4_PlS5_S4_S4_PFiS2_R8DeviceTyiS4_S4_S5_S5_S4_S4_R11AsyncInfoTybEPKcSD
`

On AArch64, call to __tgt_target_data_update_nowait_mapper in
.omp_task_entry. sets up only single space on stack and this results in
ovewriting $fp and subsequent stack corruption. This issue can be
credited to discrepancy of __tgt_target_data_update_nowait_mapper
signature in openmp/libomptarget/include/omptarget.h taking 13 arguments
while clang/lib/CodeGen/CGOpenMPRuntime.cpp and
llvm/include/llvm/Frontend/OpenMP/OMPKinds.def taking only 9 arguments.

This patch modifies __tgt_target_data_update_nowait_mapper signature to
match .omp_task_entry usage(and other 2 files mentioned above).

Co-authored-by: Kugan Vivekanandarajah <kvivekananda@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen clang:openmp OpenMP related changes to Clang clang Clang issues not falling into any other category flang:openmp llvm:transforms offload
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants