Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Flang][MLIR][OpenMP] Align map clause generation and fix issue with non-shared allocations for assumed shape/size descriptor types #110

Merged
merged 1 commit into from
Jul 8, 2024

Conversation

agozillon
Copy link

@agozillon agozillon commented Jul 5, 2024

This PR aims to unify the map argument generation behavior across both the implicit capture (captured in a target region) and the explicit capture (process map), currently the varPtr field of the MapInfo for the same variable will be different depending on how it's captured. This PR tries to align that across the generations of MapInfoOp in the OpenMP lowering.

Currently, I have opted to utilise the rawInput (input memref to a HLFIR DeclareInfoOp) as opposed to the addr field which includes more information. The side affect of this is that we have to deal with BoxTypes less often, which will result in simpler maps in these cases. The negative side affect of this is that we don't have access to the bounds information through the resulting value, however, I believe the bounds information we require in our case is still appropriately stored in the map bounds, and this seems to be the case from testing so far.

The other fix is for cases where we end up with a BoxType argument into a function (certain assumed shape and sizes cases do this) that has no fir.ref wrapping it. As we need the Box to be a reference type to actually utilise the operation to access the base address stored inside and create the correct mappings we currently generate an intermediate allocation in these cases, and then store into it, and utilise this as the map argument, as opposed to the original.

However, as we were not sharing the same intermediate allocation across all of the maps for a variable, this resulted in errors in certain cases when detatching/attatching the data e.g. via enter and exit. This PR adjusts this for cases

Currently we only maintain tracking of all intermediate allocations for the current function scope, as opposed to module. Primarily as the only case I am aware of that this is required is in cases where we pass certain types of arguments to functions (so I opted to minimize the overhead of the pass for now). It could likely be extended to module scope if required if we find other cases where it's applicable and causing issues.

@agozillon agozillon changed the title [Flang][MLIR][OpenMP] Align map clause generation and fix issue with … [Flang][MLIR][OpenMP] Align map clause generation and fix issue with non-shared allocations for assumed shape/size descriptor types Jul 5, 2024
@agozillon
Copy link
Author

I plan on making an upstream PR for this as well soon and I'll ask Jean Perrier to have a look over it, primarily for the usage of the rawInput, to see if there's any side affects to look out for in the cases where it provides a result from a hlfir declare op!

However, until then I would be interested in applying this to ATD to see if it improves or decreases our nightly builds pass rates. I would imagine improve by a small amount, but I'd prefer to test it on our nightlies before landing it upstream where it will be more of a pain to fix when it's in!

Copy link
Collaborator

@ronlieb ronlieb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from a build/test perspective.

@agozillon
Copy link
Author

@ronlieb Thank you very much! I'll land it on Monday then provided no one has any further comments then, so as not to have the nightly results bother or confuse anyone over the weekend if it does do anything adverse I wasn't expecting...

Copy link

@ergawy ergawy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few questions and small comments. Thanks for heads-up about do-concurrent @agozillon, I will take care of the changes.

Also, I would have preferred this PR to be split to 3 PRs: one for the raw address changes, one for the member mapping of derived types, and one for local alloca caching. Let me know if you disagree with this (no need to split things now since you have Ron's approval, just for the future 🙏).

Comment on lines 2041 to 2043
if (llvm::find(mapSyms, &sym) == mapSyms.end()) {
if (const auto *details =
sym.template detailsIf<semantics::HostAssocDetails>())
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use early returns for less nesting.

Suggested change
if (llvm::find(mapSyms, &sym) == mapSyms.end()) {
if (const auto *details =
sym.template detailsIf<semantics::HostAssocDetails>())
if (llvm::find(mapSyms, &sym) != mapSyms.end())
return;
if (const auto *details =
sym.template detailsIf<semantics::HostAssocDetails>())

!HOST-LABEL: func.func @_QMassumed_array_routinesPassumed_size_array(
!HOST-SAME: %[[ARG0:.*]]: !fir.ref<!fir.array<?xi32>> {fir.bindc_name = "arr_read_write"}) {
!HOST: %[[INTERMEDIATE_ALLOCA:.*]] = fir.alloca !fir.box<!fir.array<?xi32>>
!HOST: %[[ARG0_SHAPE:.*]] = fir.shape %{{.*}} : (index) -> !fir.shape<1>
!HOST: %[[ARG0_DECL:.*]]:2 = hlfir.declare %[[ARG0]](%[[ARG0_SHAPE]]) dummy_scope %{{[0-9]+}} {fortran_attrs = #fir.var_attrs<intent_inout>, uniq_name = "_QMassumed_array_routinesFassumed_size_arrayEarr_read_write"} : (!fir.ref<!fir.array<?xi32>>, !fir.shape<1>, !fir.dscope) -> (!fir.box<!fir.array<?xi32>>, !fir.ref<!fir.array<?xi32>>)
!HOST: %[[ALLOCA:.*]] = fir.alloca i32 {bindc_name = "i", uniq_name = "_QMassumed_array_routinesFassumed_size_arrayEi"}
!HOST: %[[DIMS0:.*]]:3 = fir.box_dims %[[ARG0_DECL]]#0, %c0{{.*}} : (!fir.box<!fir.array<?xi32>>, index) -> (index, index, index)
!HOST: %[[C4_1:.*]] = arith.subi %c4, %c1{{.*}} : index
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be extra safe.

Suggested change
!HOST: %[[C4_1:.*]] = arith.subi %c4, %c1{{.*}} : index
!HOST: %[[C4_1:.*]] = arith.subi %c4{{.*}}, %c1{{.*}} : index

// Structure component symbols don't have bindings, and can only be
// explicitly mapped individually. If a member is captured implicitly
// we map the entirety of the derived type when we find its symbol.
if (sym.owner().IsDerivedType())
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change tested? Is it the change in lines 79 & 80 in array-bounds.f90?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change already exists, the previous code was doing the same thing, it's just more explicit in the above case. However, it's likely possible to omit this check + early return as I would imagine it's already being caught by the addition on line 2038. A similar early return is done later, in the body generation when we're handling symbol re-bindings to target arguments. But in either case without these in place, it would be a compiler ICE as opposed to an IR change in cases where derived type members are implicitly captured, as there symbol is the parent and they themselves do not have symbols currently (from my understanding at least).

Copy link

@skatrak skatrak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too familiar with the code being changed, but changes make sense to me. I just have minimal comment-related nits. LGTM, but please wait for approval by @ergawy before merging. Thank you!

baseOp = converter.getSymbolAddress(details->symbol());
converter.copySymbolBinding(details->symbol(), sym);
}
// If we come across a symbol without a symbol address, we
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ultra-nit: Comment line breaks here too early.

/// Tracks any intermediate function/subroutine local allocations we
/// generate for the descriptors of box type dummy arguments, so that
/// we can retrieve it for subsequent reuses within the functions
/// scope
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// scope
/// scope.

builder.restoreInsertionPoint(insPt);
builder.create<fir::StoreOp>(loc, descriptor, alloca);
descriptor = alloca;
// if we have already created a local allocation for this BoxType,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// if we have already created a local allocation for this BoxType,
// If we have already created a local allocation for this BoxType,

…non-shared allocations for assumed shape/size descriptor types

This PR aims unify the map argument generation across both the implicit capture (captured in a target region) and the explicit capture (process map), currently the varPtr field of the MapInfo for the same variable will be different depending on how it's captured. This PR tries to align that across the generations of MapInfoOp in the OpenMP lowering.

Currently, I have opted to utilise the rawInput (input memref to a HLFIR DeclareInfoOp) as opposed to the addr field which includes more information. The side affect of this is that we have to deal with BoxTypes less often, which will result in simpler maps in these cases. The negative side affect of this is that we don't have access to the bounds information through the resulting value, however, I believe the bounds information we require in our case is still appropriately stored in the map bounds, and this seems to be the case from testing so far.

The other fix is for cases where we end up with a BoxType argument into a function (certain assumed shape and sizes cases do this) that has no fir.ref wrapping it. As we need the Box to be a reference type to actually utilise the operation to access the base address stored inside and create the correct mappings we currently generate an intermediate allocation in these cases, and then store into it, and utilise this as the map argument, as opposed to the original.

However, as we were not sharing the same intermediate allocation across all of the maps for a variable, this resulted in errors in certain cases when detatching/attatching the data e.g. via enter and exit.  This PR adjusts this for cases

Currently we only maintain tracking of all intermediate allocations for the current function scope, as opposed to module. Primarily as the only case I am aware of that this is required is in cases where we pass certain types of arguments to functions (so I opted to minimize the overhead of the pass for now). It could likely be extended to module scope if required if we find other cases where it's applicable and causing issues.
@agozillon
Copy link
Author

Updated the PR to hopefully cover all comments!

@agozillon
Copy link
Author

Thank you everyone, going to run a final set of tests just to check I didn't break anything with the NIT adjustments then I'll merge the PR.

@agozillon agozillon merged commit 7ac059c into ROCm:amd-trunk-dev Jul 8, 2024
2 of 4 checks passed
randyh62 pushed a commit to randyh62/llvm-project that referenced this pull request Aug 2, 2024
…ine/2023.07.17

Promote develop branch (as of ecda567) to mainline
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants