Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[llvm][rtsan] Add transform pass for sanitize_realtime_unsafe #109543

Merged
merged 10 commits into from
Oct 3, 2024

Conversation

davidtrevelyan
Copy link
Contributor

Pass update for sanitize_realtime_unsafe

Follows #106754

Motivation

Calls to system library functions such as malloc are easy for RealtimeSanitizer to intercept. If such a call is made in a [[clang::nonblocking]] function (a real-time context), RealtimeSanitizer will error. Real-time programmers also write their own blocking (real-time unsafe) functions that may or may not call intercepted functions. We wish to introduce a mechanism whereby RealtimeSanitizer can error on calls to these, too, if called within a real-time context.

At the same time as introducing [[clang::nonblocking]], the [[clang::blocking]] attribute was also introduced. With the function effects warnings (as errors) activated, blocking functions cannot be called from non-blocking functions, and this is enforced at compile time. The purpose of this series of PRs is to introduce similar functionality into RealtimeSanitizer, so that it can make the equivalent check at run time.

Implementation

We recently merged the sanitize_realtime_unsafe LLVM function attribute into main. Our next steps are to:

  1. when sanitize_realtime_unsafe is detected, update RealtimeSanitizer's LLVM pass to insert a call to __rtsan_notify_blocking_call, and
  2. switch on the feature by updating Clang's CodeGen to actually add the sanitize_realtime_unsafe attribute to the IR for functions attributed with [[clang::blocking]].

Once the feature is switched on, RealtimeSanitizer will error if any calls to functions attributed with [[clang::blocking]] are made from [[clang::nonblocking]] functions.

Integration Roadmap

The above functionality is currently split into three patches.

  • Introduction of the sanitize_realtime_unsafe LLVM attribute (previous PR), and
  • An update to RealtimeSanitizer's LLVM pass to use it (this PR)
  • An update to Clang's CodeGen to add the sanitize_realtime_unsafe attribute to functions attributed with [[clang::blocking]] (next PR)

@llvmbot
Copy link
Member

llvmbot commented Sep 21, 2024

@llvm/pr-subscribers-llvm-transforms

@llvm/pr-subscribers-compiler-rt-sanitizer

Author: None (davidtrevelyan)

Changes

Pass update for sanitize_realtime_unsafe

Follows #106754

Motivation

Calls to system library functions such as malloc are easy for RealtimeSanitizer to intercept. If such a call is made in a [[clang::nonblocking]] function (a real-time context), RealtimeSanitizer will error. Real-time programmers also write their own blocking (real-time unsafe) functions that may or may not call intercepted functions. We wish to introduce a mechanism whereby RealtimeSanitizer can error on calls to these, too, if called within a real-time context.

At the same time as introducing [[clang::nonblocking]], the [[clang::blocking]] attribute was also introduced. With the function effects warnings (as errors) activated, blocking functions cannot be called from non-blocking functions, and this is enforced at compile time. The purpose of this series of PRs is to introduce similar functionality into RealtimeSanitizer, so that it can make the equivalent check at run time.

Implementation

We recently merged the sanitize_realtime_unsafe LLVM function attribute into main. Our next steps are to:

  1. when sanitize_realtime_unsafe is detected, update RealtimeSanitizer's LLVM pass to insert a call to __rtsan_notify_blocking_call, and
  2. switch on the feature by updating Clang's CodeGen to actually add the sanitize_realtime_unsafe attribute to the IR for functions attributed with [[clang::blocking]].

Once the feature is switched on, RealtimeSanitizer will error if any calls to functions attributed with [[clang::blocking]] are made from [[clang::nonblocking]] functions.

Integration Roadmap

The above functionality is currently split into three patches.

  • Introduction of the sanitize_realtime_unsafe LLVM attribute (previous PR), and
  • An update to RealtimeSanitizer's LLVM pass to use it (this PR)
  • An update to Clang's CodeGen to add the sanitize_realtime_unsafe attribute to functions attributed with [[clang::blocking]] (next PR)

Full diff: https://github.com/llvm/llvm-project/pull/109543.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Instrumentation/RealtimeSanitizer.cpp (+26-3)
  • (added) llvm/test/Instrumentation/RealtimeSanitizer/rtsan_unsafe.ll (+19)
diff --git a/llvm/lib/Transforms/Instrumentation/RealtimeSanitizer.cpp b/llvm/lib/Transforms/Instrumentation/RealtimeSanitizer.cpp
index 7854cf4f2c625f..f27cf83cd3cf3d 100644
--- a/llvm/lib/Transforms/Instrumentation/RealtimeSanitizer.cpp
+++ b/llvm/lib/Transforms/Instrumentation/RealtimeSanitizer.cpp
@@ -17,6 +17,7 @@
 #include "llvm/IR/IRBuilder.h"
 #include "llvm/IR/Module.h"
 
+#include "llvm/Demangle/Demangle.h"
 #include "llvm/Transforms/Instrumentation/RealtimeSanitizer.h"
 
 using namespace llvm;
@@ -45,6 +46,26 @@ static void insertCallAtAllFunctionExitPoints(Function &Fn,
         insertCallBeforeInstruction(Fn, I, InsertFnName);
 }
 
+static PreservedAnalyses rtsanPreservedCFGAnalyses() {
+  PreservedAnalyses PA;
+  PA.preserveSet<CFGAnalyses>();
+  return PA;
+}
+
+static void insertNotifyBlockingCallAtFunctionEntryPoint(Function &F) {
+  IRBuilder<> Builder(&F.front().front());
+  Value *NameArg = Builder.CreateGlobalString(demangle(F.getName()));
+
+  FunctionType *FuncType =
+      FunctionType::get(Type::getVoidTy(F.getContext()),
+                        {PointerType::getUnqual(F.getContext())}, false);
+
+  FunctionCallee Func = F.getParent()->getOrInsertFunction(
+      "__rtsan_notify_blocking_call", FuncType);
+
+  Builder.CreateCall(Func, {NameArg});
+}
+
 RealtimeSanitizerPass::RealtimeSanitizerPass(
     const RealtimeSanitizerOptions &Options) {}
 
@@ -53,10 +74,12 @@ PreservedAnalyses RealtimeSanitizerPass::run(Function &F,
   if (F.hasFnAttribute(Attribute::SanitizeRealtime)) {
     insertCallAtFunctionEntryPoint(F, "__rtsan_realtime_enter");
     insertCallAtAllFunctionExitPoints(F, "__rtsan_realtime_exit");
+    return rtsanPreservedCFGAnalyses();
+  }
 
-    PreservedAnalyses PA;
-    PA.preserveSet<CFGAnalyses>();
-    return PA;
+  if (F.hasFnAttribute(Attribute::SanitizeRealtimeUnsafe)) {
+    insertNotifyBlockingCallAtFunctionEntryPoint(F);
+    return rtsanPreservedCFGAnalyses();
   }
 
   return PreservedAnalyses::all();
diff --git a/llvm/test/Instrumentation/RealtimeSanitizer/rtsan_unsafe.ll b/llvm/test/Instrumentation/RealtimeSanitizer/rtsan_unsafe.ll
new file mode 100644
index 00000000000000..fe9f13778e35af
--- /dev/null
+++ b/llvm/test/Instrumentation/RealtimeSanitizer/rtsan_unsafe.ll
@@ -0,0 +1,19 @@
+; RUN: opt < %s -passes=rtsan -S | FileCheck %s
+
+define void @_Z17blocking_functionv() #0 {
+  ret void
+}
+
+define noundef i32 @main() #2 {
+  call void @_Z17blocking_functionv() #4
+  ret i32 0
+}
+
+attributes #0 = { mustprogress noinline sanitize_realtime_unsafe optnone ssp uwtable(sync) }
+
+; RealtimeSanitizer pass should create the demangled function name as a global string, and
+; pass it as input to an inserted __rtsan_expect_not_realtime call at the function entrypoint
+; CHECK: [[GLOBAL_STR:@[a-zA-Z0-9\.]+]]
+; CHECK-SAME: c"blocking_function()\00"
+; CHECK-LABEL: @_Z17blocking_functionv()
+; CHECK-NEXT: call{{.*}}@__rtsan_notify_blocking_call(ptr{{.*}}[[GLOBAL_STR]])

@davidtrevelyan
Copy link
Contributor Author

@cjappl for review

@dtcxzyw dtcxzyw requested a review from cjappl September 22, 2024 07:03
@cjappl cjappl changed the title Rtsan/blocking 2 llvm pass [llvm][rtsan] Add transform pass for sanitize_realtime_unsafe Sep 23, 2024
Copy link
Contributor

@cjappl cjappl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Let's wait to see if there is any other feedback, otherwise I will merge next week

using namespace llvm;

static std::vector<Type *> getArgTypes(ArrayRef<Value *> FunctionArgs) {
std::vector<Type *> Types;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer SmallVector instead of std::vector, even if you don't want to use inline storage.

Copy link
Contributor Author

@davidtrevelyan davidtrevelyan Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Updated in 49ed53a

@@ -0,0 +1,19 @@
; RUN: opt < %s -passes=rtsan -S | FileCheck %s
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use update_test_checks.py, unless there's some reason this is not possible here.

Copy link
Contributor Author

@davidtrevelyan davidtrevelyan Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, great tool. Thanks for showing me this script - I've updated this test in 0775459

@davidtrevelyan davidtrevelyan force-pushed the rtsan/blocking-2-llvm-pass branch from f19d63d to 0775459 Compare September 25, 2024 11:58

static PreservedAnalyses runSanitizeRealtimeUnsafe(Function &Fn) {
IRBuilder<> Builder(&Fn.front().front());
Value *Name = Builder.CreateGlobalString(demangle(Fn.getName()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly out of interest, why do we de-mangle here and not in the reporting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a really great question. In all honesty I didn't originally think about the possibility of de-mangling during the report stage. After thinking about it further, I have an idea on which one will probably be less work overall, but I'd need advice on whether the distribution of work is appropriate.

Here's how I see the work involved:

  • for demangling in the pass:
    • no run-time work: demangles once at compile time, but
    • could increase compile-time work (marginally?)
  • for demangling in the reporting:
    • adds to run-time work: demangles every time an error is registered (which could be many times if halt_on_error is off)
    • won't add to compile-time work

My very naive intuition is that demangling in the pass is preferable, but very happy to move it to the reporting if I have my priorities a bit upside down. Thanks in advance for any suggestions here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are missing the downside of binary size, because demangled name takes up more space, for demangling in the pass.

I am not sure if demangling will cause any noticable overhead over unwinding and symbolizing the stack frame.

Taking a step back, why do we pass this name and not use the PC and symbolize that?

Copy link
Contributor Author

@davidtrevelyan davidtrevelyan Sep 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this question and prompting us to look into this in more detail. I had indeed not considered the issue of binary size, so went away and thought about it a bit harder with @cjappl. There's lots to think about here, especially in terms of how we expect the sanitizer to be used.

We anticipate that the sanitizer users will only be marking a handful of functions they are worried about with the [[clang::blocking]] attribute, which is what ultimately results in sanitize_realtime_unsafe being added in IR. We think this because any functions that do the normal bad real-time stuff like allocation will be picked up by the libc interceptors anyway. Adding the [[clang::blocking]] attribute is more for user functions that could block entirely within user space, like a spin mutex lock or similar busy-wait scenario. Conservatively estimating that a user has 10 concerning [[blocking]] functions, each with demangled symbol length 25 characters - I think that would be (at least ideally) of the order 250 bytes of extra binary size to store. The rtsan runtime library is (when stripped) currently about 750 kB already, so we think this is about a 0.03% size increment - and only for sanitized builds.

We tried the PC approach in a branch and confirmed it works, but we were sad to lose the function name for printing diagnostics in release builds, and also to have to do the extra run-time work to unwind the stack. Being able to run the sanitizer with release builds is one of our design goals - we'd like to achieve as-near-normal performance as possible, so that users can meaningfully QA their real-time systems with RealtimeSanitizer enabled.

Given:

  • the extra work at run time necessary to unwind the stack,
  • the simpler run-time implementation when accepting a function name string,
  • the loss of being able to print the blocking function name in release builds, and
  • the almost-negligible expected binary increment,

... our moderate preference is currently to keep the approach how we have it outlined in this PR. However, this is of course caveated with the appreciation that we're very new here and inexperienced with the codebase, so I'm very keen to take as much advice as possible on this. Many thanks for your help so far - keen to hear your thoughts!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We tried the PC approach in a branch and confirmed it works, but we were sad to lose the function name for printing diagnostics in release builds

Confirming that this also meant that the whole stack trace symbolization was gone, but that was okay.

do the extra run-time work to unwind the stack.

I don't understand. To symbolize a PC, we don't need to unwind the stack. From the runtime CLs I saw go buy, it seems like you unwind the stack on error regardless though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirming that this also meant that the whole stack trace symbolization was gone, but that was okay.

Yes - thanks - that was indeed my intended meaning.

I don't understand. To symbolize a PC, we don't need to unwind the stack. From the runtime CLs I saw go buy, it seems like you unwind the stack on error regardless though

That's correct, my mistake. We can discount the runtime cost of unwinding the stack and just concentrate on the functionality/binary size trade-off. After some more thought, we also realised that the symbolised PC approach would make suppression of these errors difficult to implement, because we would no longer have a function name to match against a suppression list. For this reason, we're leaning more strongly towards the existing implmentation and taking the small hit on binary size. I noticed that you approved this PR last night, so I guess you're happy enough with it as it is - please let me know if you have any further concerns. Thanks again for the review 👍

llvm/lib/Transforms/Instrumentation/RealtimeSanitizer.cpp Outdated Show resolved Hide resolved
@davidtrevelyan davidtrevelyan force-pushed the rtsan/blocking-2-llvm-pass branch from 0775459 to b4db726 Compare September 25, 2024 21:36
@davidtrevelyan
Copy link
Contributor Author

Polite ping @vitalybuka @MaskRay :)

@cjappl cjappl merged commit 4547d60 into llvm:main Oct 3, 2024
6 of 8 checks passed
@cjappl cjappl deleted the rtsan/blocking-2-llvm-pass branch October 3, 2024 13:32
xgupta pushed a commit to xgupta/llvm-project that referenced this pull request Oct 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants