Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #[inline(always)] on closures with target feature 1.1 #111836

Merged
merged 2 commits into from
Jul 23, 2023

Conversation

calebzulawski
Copy link
Member

@calebzulawski calebzulawski commented May 22, 2023

Fixes #108655. I think this is the most obvious solution that isn't overly complicated. The comment includes more justification, but I think this is likely better than demoting the #[inline(always)] to #[inline], since existing code is unaffected.

@rustbot
Copy link
Collaborator

rustbot commented May 22, 2023

r? @davidtwco

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels May 22, 2023
@calebzulawski
Copy link
Member Author

r? @workingjubilee

@rustbot rustbot assigned workingjubilee and unassigned davidtwco May 22, 2023
@wesleywiser
Copy link
Member

@calebzulawski we can re-roll reviewer if you'd like 🙂

@workingjubilee
Copy link
Member

I've gotten rid of my other hundreds of notifications, I can work on this finally.

Copy link
Member

@workingjubilee workingjubilee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know I basically suggested it but now that I have a good look at the code and am a few months of thinking about target features wiser, I'm concerned about the behavior of this approach for this particular, admittedly somewhat contrived example:

#![feature(target_feature_11)]

use core::arch::x86_64::*;

#[target_feature(enable = "avx")]
pub unsafe fn escape(a: f64, b: f64, c: f64, d: f64) -> impl Fn() -> __m256d {
    #[inline(always)]
    move || _mm256_set_pd(a, b, c, d)
}

#[target_feature(enable = "avx")]
pub unsafe fn way_out() -> fn(__m256d) -> i32 {
    #[inline(always)]
    move |a| _mm256_movemask_pd(a)
}

pub fn unsafe_haven(a: f64, b: f64, c: f64, d: f64) -> i32 {
    // Problem: Even though this code declared
    // that it met escape()'s and way_out()'s unsafe preconditions,
    // THIS function doesn't have the target features!
    let escapee = unsafe { escape(a, b, c, d) };
    let escaping_avx_type = escapee();
    let opening = unsafe { way_out() };
    opening(escaping_avx_type)
}

Comment on lines +13 to +14
#[inline(always)]
move || {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about this forced me to check if you can annotate closures with target_feature(enable). (You cannot, fortunately.)

// would result in this closure being compiled without the inherited target features, but this
// is probably a poor usage of `#[inline(always)]` and easily avoided by not using the attribute.
if tcx.features().target_feature_11
&& tcx.is_closure(did.to_def_id())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...apparently is_closure will return true if this is a generator, also. I frankly have no idea how that should work, but dropping the features should remain safe in that case, at least...

Comment on lines +505 to +507
// its parent function, which effectively inherits the features anyway. Boxing this closure
// would result in this closure being compiled without the inherited target features, but this
// is probably a poor usage of `#[inline(always)]` and easily avoided by not using the attribute.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Boxing seems like a waste, yes, but now that I am thinking about it, this seems like it could result in confusing behavior in the "escaping closure" case, when that would result, instead of the IIFE? Does that even make sense?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inlining with closures is unfortunately always confusing. Box<dyn FnOnce()>, for example, implements FnOnce itself:

impl<Args: Tuple, F: FnOnce<Args> + ?Sized, A: Allocator> FnOnce<Args> for Box<F, A> {
type Output = <F as FnOnce<Args>>::Output;
extern "rust-call" fn call_once(self, args: Args) -> Self::Output {
<F as FnOnce<Args>>::call_once(*self, args)
}
}

This call_once doesn't have any inline attribute at all! Therefore, the boxed closure's call_once inlines into this call_once, and then it's up in the air after that.

@calebzulawski
Copy link
Member Author

unsafe_haven in this example is unsound, since you can call it without ever checking for avx, but I'll assume you have e.g. a is_x86_feature_detected!("avx") in there making it sound for the sake of argument.

I don't think there is actually anything wrong here. The #[inline(always)] apply to the closures (which are of course functions) and don't propagate to any of the contents, even if the function is trivial. So even with both closures inlined, the avx intrinsics are still not #[inline(always)], and without target features enabled in unsafe_haven, you simply run into #53069. You've checked that it's safe to call the intrinsics, they just can't inline into the larger function since it's missing the target features.

@workingjubilee
Copy link
Member

Yes, I'm handwaving feature detection for this example. Technically it's not unsound until someone actually calls it. :^)

The "ultimate" question seems to be if this is truly preferable over demoting the inline hint from inline(always) to inline? In this case we're erasing target features, which sounds like it is more likely to compromise soundness, miscompile, or have unexpected effects for the programmer, than simply downgrading the inlining, which allows the codegen backend to examine the situation and choose to reject inlining on a case-by-case basis, but preserves the feature annotations that the backend would need to use in order to make decisions about whether the inlining would be correct.

Currently, the target_feature semantics are not hints, they're directives, and that's part of why enable is unsafe. However, the inline semantics are hints, and can be disregarded. However-however, the closure case is something we are, honestly, ad hoc making up as we go along.

Relevant issues and commits:

@calebzulawski
Copy link
Member Author

I'm not sure if inline(always) is really a hint or not. As far as I can tell, LLVM's alwaysinline errors if the function fails to inline (I guess this only applies to direct function calls).

Regardless, I think this behavior is probably best for now because I'm sure there is existing code with inline(always) closures inside target_feature functions and I'd rather avoid breaking that existing code. On top of that, I think preferring the inline over the target features is at least justifiable, if not the best option (otherwise, it's impossible to use inline(always), and I'm sure someone will come across that at some point and be frustrated)

@workingjubilee
Copy link
Member

I'm not sure if inline(always) is really a hint or not. As far as I can tell, LLVM's alwaysinline errors if the function fails to inline (I guess this only applies to direct function calls).

It is because we say it is, as I understand it. LLVM is allowed to choose to not error on that case and simply silently ignore it, and as I understand it has in the past, and as you observed, it only applies to direct calls.

I guess the matter of indirection is most of what's really becoming pertinent, now that I think about it:

If we do this, then "featureful inlining" stops before the closure, but if we don't, it continues into the closure, but the closure itself may not get inlined. So if there is some reason that the closure's exterior gets "outlined" anyways, like the Box<dyn FnOnce(A) -> R> case, we can't rely on the features being present on the caller in order to justify erasing the closure. This seems particularly punishing in the case of ABI mismatches where if the closure preserved the features then inline-into-closure would be acceptable. As we see from llvm/llvm-project@7c3cf4c the actual ABI determination requires accounting for both caller and callee, and thus so does the inlining strategy, thus I fear we can't reason along the lines of "well, the caller of this closure will have all the features anyways so inlining will save us".

I might be wrong, obviously. However, one thing I am confident about is that we should not have to guess: This needs, at minimum, codegen tests in order to validate the LLVMIR is what we expect for both the direct call and indirect call cases, and we're going to need enough nesting that we can see all the consequences. This will help clarify what LLVM actually does, illuminate which approaches might actually lead to performance regressions, and catch whether LLVM decides to change its mind.

@workingjubilee
Copy link
Member

I recommend making a tests/codegen/target-feature directory.

There is no way this is going to be the last of these.

@calebzulawski
Copy link
Member Author

To take a step back for a moment, extending #[target_feature] to the closure is implicit, but adding #[inline(always)] is explicit. When the user places the inline attribute on the closure inside a target feature function, it really should be emitted because the user asked for it. The user didn't specifically request the target features be added to the closure, so I think the inlining should win out.

That said, like the added comment indicates, using inline(always) results in inheriting the target features as well, in all but the pathological case. I think it's not worth worrying ourselves about the closure escaping when it's almost certainly a mistake/bug/unnecessary using #[inline] at all on a boxed closure. We could even get really fancy and emit a lint.

I'm basically saying it's not worth overthinking it. I'm confident this change won't do anything unsound, it might not have completely optimal codegen in unusual edge cases, but I think it's easy to work around. At worst, this behavior could be adjusted in a follow up PR, since it's just codegen and not language semantics :)

@workingjubilee
Copy link
Member

workingjubilee commented Jul 17, 2023

I agree (re: "At worst, this behavior could be adjusted in a follow up PR, since it's just codegen"), I just still want to see codegen tests so that if LLVM changes their inlining rules again for target features we can catch it. :^)

@workingjubilee
Copy link
Member

@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 17, 2023
@calebzulawski
Copy link
Member Author

All tests good 🙂

@workingjubilee
Copy link
Member

Let's give this a whirl. @bors r+ rollup=never

@bors
Copy link
Contributor

bors commented Jul 23, 2023

📌 Commit cdb9de7 has been approved by workingjubilee

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jul 23, 2023
@bors
Copy link
Contributor

bors commented Jul 23, 2023

⌛ Testing commit cdb9de7 with merge 1c44af9...

@bors
Copy link
Contributor

bors commented Jul 23, 2023

☀️ Test successful - checks-actions
Approved by: workingjubilee
Pushing 1c44af9 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Jul 23, 2023
@bors bors merged commit 1c44af9 into rust-lang:master Jul 23, 2023
@rustbot rustbot added this to the 1.73.0 milestone Jul 23, 2023
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (1c44af9): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-3.4% [-3.4%, -3.4%] 1
All ❌✅ (primary) - - 0

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 651.172s -> 651.296s (0.02%)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

target_feature_11 rejects code that was previously accepted
7 participants