Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[analyzer] Improve diagnostics from ArrayBoundCheckerV2 #70056

Merged
merged 5 commits into from
Nov 7, 2023

Conversation

NagyDonat
Copy link
Contributor

Previously alpha.security.ArrayBoundV2 produced very spartan bug reports; this commit ensures that the relevant (and known) details are reported to the user.

The logic for detecting bugs is not changed, after this commit the checker will report the same set of issues, but with better messages.

To test the details of the message generation this commit adds a new test file 'out-of-bounds-diagnostics.c'. Three of the testcases are added with FIXME notes because they reveal shortcomings of the existing modeling and bounds checking code. I will try to fix them in separate follow-up commits.

Previously alpha.security.ArrayBoundV2 produced very spartan bug
reports; this commit ensures that the relevant (and known) details are
reported to the user.

The logic for detecting bugs is not changed, after this commit the
checker will report the same set of issues, but with better messages.

To test the details of the message generation this commit adds a new
test file 'out-of-bounds-diagnostics.c'. Three of the testcases are
added with FIXME notes because they reveal shortcomings of the existing
modeling and bounds checking code. I will try to fix them in separate
follow-up commits.
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:static analyzer clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html labels Oct 24, 2023
@llvmbot
Copy link
Collaborator

llvmbot commented Oct 24, 2023

@llvm/pr-subscribers-clang-static-analyzer-1

@llvm/pr-subscribers-clang

Author: None (DonatNagyE)

Changes

Previously alpha.security.ArrayBoundV2 produced very spartan bug reports; this commit ensures that the relevant (and known) details are reported to the user.

The logic for detecting bugs is not changed, after this commit the checker will report the same set of issues, but with better messages.

To test the details of the message generation this commit adds a new test file 'out-of-bounds-diagnostics.c'. Three of the testcases are added with FIXME notes because they reveal shortcomings of the existing modeling and bounds checking code. I will try to fix them in separate follow-up commits.


Patch is 30.06 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/70056.diff

7 Files Affected:

  • (modified) clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp (+144-37)
  • (added) clang/test/Analysis/out-of-bounds-diagnostics.c (+149)
  • (modified) clang/test/Analysis/out-of-bounds-new.cpp (+13-13)
  • (modified) clang/test/Analysis/out-of-bounds.c (+14-14)
  • (modified) clang/test/Analysis/taint-diagnostic-visitor.c (+2-2)
  • (modified) clang/test/Analysis/taint-generic.c (+10-10)
  • (modified) clang/test/Analysis/taint-generic.cpp (+7-7)
diff --git a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
index e80a06e38587520..7d97f36e4cffe44 100644
--- a/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/ArrayBoundCheckerV2.cpp
@@ -22,23 +22,25 @@
 #include "clang/StaticAnalyzer/Core/PathSensitive/DynamicExtent.h"
 #include "clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h"
 #include "llvm/ADT/SmallString.h"
+#include "llvm/Support/FormatVariadic.h"
 #include "llvm/Support/raw_ostream.h"
 #include <optional>
 
 using namespace clang;
 using namespace ento;
 using namespace taint;
+using llvm::formatv;
 
 namespace {
+enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint };
+
 class ArrayBoundCheckerV2 :
     public Checker<check::Location> {
   BugType BT{this, "Out-of-bound access"};
   BugType TaintBT{this, "Out-of-bound access", categories::TaintedData};
 
-  enum OOB_Kind { OOB_Precedes, OOB_Exceeds, OOB_Taint };
-
   void reportOOB(CheckerContext &C, ProgramStateRef ErrorState, OOB_Kind Kind,
-                 SVal TaintedSVal = UnknownVal()) const;
+                 NonLoc Offset, std::string RegName, std::string Msg) const;
 
   static bool isFromCtypeMacro(const Stmt *S, ASTContext &AC);
 
@@ -53,7 +55,7 @@ class ArrayBoundCheckerV2 :
 /// Arr and the distance of Location from the beginning of Arr (expressed in a
 /// NonLoc that specifies the number of CharUnits). Returns nullopt when these
 /// cannot be determined.
-std::optional<std::pair<const SubRegion *, NonLoc>>
+static std::optional<std::pair<const SubRegion *, NonLoc>>
 computeOffset(ProgramStateRef State, SValBuilder &SVB, SVal Location) {
   QualType T = SVB.getArrayIndexType();
   auto EvalBinOp = [&SVB, State, T](BinaryOperatorKind Op, NonLoc L, NonLoc R) {
@@ -174,9 +176,119 @@ compareValueToThreshold(ProgramStateRef State, NonLoc Value, NonLoc Threshold,
   return {nullptr, nullptr};
 }
 
+static std::string getRegionName(const SubRegion *Region) {
+  std::string RegName = Region->getDescriptiveName();
+  if (!RegName.empty())
+    return RegName;
+
+  // Field regions only have descriptive names when their parent has a
+  // descriptive name; so we provide a fallback representation for them:
+  if (const auto *FR = Region->getAs<FieldRegion>()) {
+    StringRef Name = FR->getDecl()->getName();
+    if (!Name.empty())
+      return formatv("the field '{0}'", Name);
+    return "the unnamed field";
+  }
+
+  if (isa<AllocaRegion>(Region))
+    return "the memory returned by 'alloca'";
+
+  if (isa<SymbolicRegion>(Region) &&
+      isa<HeapSpaceRegion>(Region->getMemorySpace()))
+    return "the heap area";
+
+  if (isa<StringRegion>(Region))
+    return "the string literal";
+
+  return "the region";
+}
+
+static std::optional<int64_t> getConcreteValue(NonLoc SV) {
+  if (auto ConcreteVal = SV.getAs<nonloc::ConcreteInt>()) {
+    const llvm::APSInt &IntVal = ConcreteVal->getValue();
+    return IntVal.tryExtValue();
+  }
+  return std::nullopt;
+}
+
+static const char *ShortMsgTemplates[] = {
+    "Out of bound access to memory preceding {0}",
+    "Out of bound access to memory after the end of {0}",
+    "Potential out of bound access to {0} with tainted offset"};
+
+static std::string getShortMsg(OOB_Kind Kind, std::string RegName) {
+  return formatv(ShortMsgTemplates[Kind], RegName);
+}
+
+static std::string getPrecedesMsg(std::string RegName, NonLoc Offset) {
+  SmallString<128> Buf;
+  llvm::raw_svector_ostream Out(Buf);
+  Out << "Access of " << RegName << " at negative byte offset";
+  if (auto ConcreteIdx = Offset.getAs<nonloc::ConcreteInt>())
+    Out << ' ' << ConcreteIdx->getValue();
+  return std::string(Buf);
+}
+static std::string getExceedsMsg(ASTContext &ACtx, std::string RegName,
+                                 NonLoc Offset, NonLoc Extent, SVal Location) {
+  const auto *EReg = Location.getAsRegion()->getAs<ElementRegion>();
+  assert(EReg && "this checker only handles element access");
+  QualType ElemType = EReg->getElementType();
+
+  std::optional<int64_t> OffsetN = getConcreteValue(Offset);
+  std::optional<int64_t> ExtentN = getConcreteValue(Extent);
+
+  bool UseByteOffsets = true;
+  if (int64_t ElemSize = ACtx.getTypeSizeInChars(ElemType).getQuantity()) {
+    const bool OffsetHasRemainder = OffsetN && *OffsetN % ElemSize;
+    const bool ExtentHasRemainder = ExtentN && *ExtentN % ElemSize;
+    if (!OffsetHasRemainder && !ExtentHasRemainder) {
+      UseByteOffsets = false;
+      if (OffsetN)
+        *OffsetN /= ElemSize;
+      if (ExtentN)
+        *ExtentN /= ElemSize;
+    }
+  }
+
+  SmallString<256> Buf;
+  llvm::raw_svector_ostream Out(Buf);
+  Out << "Access of ";
+  if (!ExtentN && !UseByteOffsets)
+    Out << "'" << ElemType.getAsString() << "' element in ";
+  Out << RegName << " at ";
+  if (OffsetN) {
+    Out << (UseByteOffsets ? "byte offset " : "index ") << *OffsetN;
+  } else {
+    Out << "an overflowing " << (UseByteOffsets ? "byte offset" : "index");
+  }
+  if (ExtentN) {
+    Out << ", while it holds only ";
+    if (*ExtentN != 1)
+      Out << *ExtentN;
+    else
+      Out << "a single";
+    if (UseByteOffsets)
+      Out << " byte";
+    else
+      Out << " '" << ElemType.getAsString() << "' element";
+
+    if (*ExtentN > 1)
+      Out << "s";
+  }
+
+  return std::string(Buf);
+}
+static std::string getTaintMsg(std::string RegName) {
+  SmallString<128> Buf;
+  llvm::raw_svector_ostream Out(Buf);
+  Out << "Access of " << RegName
+      << " with a tainted offset that may be too large";
+  return std::string(Buf);
+}
+
 void ArrayBoundCheckerV2::checkLocation(SVal location, bool isLoad,
                                         const Stmt* LoadS,
-                                        CheckerContext &checkerContext) const {
+                                        CheckerContext &C) const {
 
   // NOTE: Instead of using ProgramState::assumeInBound(), we are prototyping
   // some new logic here that reasons directly about memory region extents.
@@ -193,11 +305,11 @@ void ArrayBoundCheckerV2::checkLocation(SVal location, bool isLoad,
   // and incomplete analysis of these leads to false positives. As even
   // accurate reports would be confusing for the users, just disable reports
   // from these macros:
-  if (isFromCtypeMacro(LoadS, checkerContext.getASTContext()))
+  if (isFromCtypeMacro(LoadS, C.getASTContext()))
     return;
 
-  ProgramStateRef state = checkerContext.getState();
-  SValBuilder &svalBuilder = checkerContext.getSValBuilder();
+  ProgramStateRef state = C.getState();
+  SValBuilder &svalBuilder = C.getSValBuilder();
 
   const std::optional<std::pair<const SubRegion *, NonLoc>> &RawOffset =
       computeOffset(state, svalBuilder, location);
@@ -223,7 +335,10 @@ void ArrayBoundCheckerV2::checkLocation(SVal location, bool isLoad,
 
     if (state_precedesLowerBound && !state_withinLowerBound) {
       // We know that the index definitely precedes the lower bound.
-      reportOOB(checkerContext, state_precedesLowerBound, OOB_Precedes);
+      std::string RegName = getRegionName(Reg);
+      std::string Msg = getPrecedesMsg(RegName, ByteOffset);
+      reportOOB(C, state_precedesLowerBound, OOB_Precedes, ByteOffset, RegName,
+                Msg);
       return;
     }
 
@@ -240,13 +355,19 @@ void ArrayBoundCheckerV2::checkLocation(SVal location, bool isLoad,
     if (state_exceedsUpperBound) {
       if (!state_withinUpperBound) {
         // We know that the index definitely exceeds the upper bound.
-        reportOOB(checkerContext, state_exceedsUpperBound, OOB_Exceeds);
+        std::string RegName = getRegionName(Reg);
+        std::string Msg = getExceedsMsg(C.getASTContext(), RegName, ByteOffset,
+                                        *KnownSize, location);
+        reportOOB(C, state_exceedsUpperBound, OOB_Exceeds, ByteOffset, RegName,
+                  Msg);
         return;
       }
       if (isTainted(state, ByteOffset)) {
         // Both cases are possible, but the index is tainted, so report.
-        reportOOB(checkerContext, state_exceedsUpperBound, OOB_Taint,
-                  ByteOffset);
+        std::string RegName = getRegionName(Reg);
+        std::string Msg = getTaintMsg(RegName);
+        reportOOB(C, state_exceedsUpperBound, OOB_Taint, ByteOffset, RegName,
+                  Msg);
         return;
       }
     }
@@ -255,42 +376,28 @@ void ArrayBoundCheckerV2::checkLocation(SVal location, bool isLoad,
       state = state_withinUpperBound;
   }
 
-  checkerContext.addTransition(state);
+  C.addTransition(state);
 }
 
 void ArrayBoundCheckerV2::reportOOB(CheckerContext &C,
                                     ProgramStateRef ErrorState, OOB_Kind Kind,
-                                    SVal TaintedSVal) const {
+                                    NonLoc Offset, std::string RegName,
+                                    std::string Msg) const {
 
   ExplodedNode *ErrorNode = C.generateErrorNode(ErrorState);
   if (!ErrorNode)
     return;
 
-  // FIXME: These diagnostics are preliminary, and they'll be replaced soon by
-  // a follow-up commit.
+  std::string ShortMsg = getShortMsg(Kind, RegName);
 
-  SmallString<128> Buf;
-  llvm::raw_svector_ostream Out(Buf);
-  Out << "Out of bound memory access ";
-
-  switch (Kind) {
-  case OOB_Precedes:
-    Out << "(accessed memory precedes memory block)";
-    break;
-  case OOB_Exceeds:
-    Out << "(access exceeds upper limit of memory block)";
-    break;
-  case OOB_Taint:
-    Out << "(index is tainted)";
-    break;
-  }
   auto BR = std::make_unique<PathSensitiveBugReport>(
-      Kind == OOB_Taint ? TaintBT : BT, Out.str(), ErrorNode);
-  // Track back the propagation of taintedness, or do nothing if TaintedSVal is
-  // the default UnknownVal().
-  for (SymbolRef Sym : getTaintedSymbols(ErrorState, TaintedSVal)) {
-    BR->markInteresting(Sym);
-  }
+      Kind == OOB_Taint ? TaintBT : BT, ShortMsg, Msg, ErrorNode);
+
+  // Track back the propagation of taintedness.
+  if (Kind == OOB_Taint)
+    for (SymbolRef Sym : getTaintedSymbols(ErrorState, Offset))
+      BR->markInteresting(Sym);
+
   C.emitReport(std::move(BR));
 }
 
diff --git a/clang/test/Analysis/out-of-bounds-diagnostics.c b/clang/test/Analysis/out-of-bounds-diagnostics.c
new file mode 100644
index 000000000000000..4f5d0d0bc91c54a
--- /dev/null
+++ b/clang/test/Analysis/out-of-bounds-diagnostics.c
@@ -0,0 +1,149 @@
+// RUN: %clang_analyze_cc1 -Wno-array-bounds -analyzer-output=text        \
+// RUN:     -analyzer-checker=core,alpha.security.ArrayBoundV2,unix.Malloc,alpha.security.taint -verify %s
+
+int array[10];
+
+void arrayUnderflow(void) {
+  array[-3] = 5;
+  // expected-warning@-1 {{Out of bound access to memory preceding 'array'}}
+  // expected-note@-2 {{Access of 'array' at negative byte offset -12}}
+}
+
+int scanf(const char *restrict fmt, ...);
+
+void taintedIndex(void) {
+  int index;
+  scanf("%d", &index);
+  // expected-note@-1 {{Taint originated here}}
+  // expected-note@-2 {{Taint propagated to the 2nd argument}}
+  array[index] = 5;
+  // expected-warning@-1 {{Potential out of bound access to 'array' with tainted offset}}
+  // expected-note@-2 {{Access of 'array' with a tainted offset that may be too large}}
+}
+
+void arrayOverflow(void) {
+  array[12] = 5;
+  // expected-warning@-1 {{Out of bound access to memory after the end of 'array'}}
+  // expected-note@-2 {{Access of 'array' at index 12, while it holds only 10 'int' elements}}
+}
+
+int scalar;
+int scalarOverflow(void) {
+  return (&scalar)[1];
+  // expected-warning@-1 {{Out of bound access to memory after the end of 'scalar'}}
+  // expected-note@-2 {{Access of 'scalar' at index 1, while it holds only a single 'int' element}}
+}
+
+int oneElementArray[1];
+int oneElementArrayOverflow(void) {
+  return oneElementArray[1];
+  // expected-warning@-1 {{Out of bound access to memory after the end of 'oneElementArray'}}
+  // expected-note@-2 {{Access of 'oneElementArray' at index 1, while it holds only a single 'int' element}}
+}
+
+short convertedArray(void) {
+  return ((short*)array)[47];
+  // expected-warning@-1 {{Out of bound access to memory after the end of 'array'}}
+  // expected-note@-2 {{Access of 'array' at index 47, while it holds only 20 'short' elements}}
+}
+
+struct vec {
+  int len;
+  double elems[64];
+} v;
+
+double arrayInStruct(void) {
+  return v.elems[64];
+  // expected-warning@-1 {{Out of bound access to memory after the end of 'v.elems'}}
+  // expected-note@-2 {{Access of 'v.elems' at index 64, while it holds only 64 'double' elements}}
+}
+
+double arrayInStructPtr(struct vec *pv) {
+  return pv->elems[64];
+  // expected-warning@-1 {{Out of bound access to memory after the end of the field 'elems'}}
+  // expected-note@-2 {{Access of the field 'elems' at index 64, while it holds only 64 'double' elements}}
+}
+
+struct two_bytes {
+  char lo, hi;
+};
+
+struct two_bytes convertedArray2(void) {
+  // We report this with byte offsets because the offset is not divisible by the element size.
+  struct two_bytes a = {0, 0};
+  char *p = (char*)&a;
+  return *((struct two_bytes*)(p + 7));
+  // expected-warning@-1 {{Out of bound access to memory after the end of 'a'}}
+  // expected-note@-2 {{Access of 'a' at byte offset 7, while it holds only 2 bytes}}
+}
+
+int intFromString(void) {
+  // We report this with byte offsets because the extent is not divisible by the element size.
+  return ((const int*)"this is a string of 33 characters")[20];
+  // expected-warning@-1 {{Out of bound access to memory after the end of the string literal}}
+  // expected-note@-2 {{Access of the string literal at byte offset 80, while it holds only 34 bytes}}
+}
+
+int intFromStringDivisible(void) {
+  // However, this is reported with indices/elements, because the extent happens to be a multiple of 4.
+  return ((const int*)"abc")[20];
+  // expected-warning@-1 {{Out of bound access to memory after the end of the string literal}}
+  // expected-note@-2 {{Access of the string literal at index 20, while it holds only a single 'int' element}}
+}
+
+typedef __typeof(sizeof(int)) size_t;
+void *malloc(size_t size);
+
+int *mallocRegion(void) {
+  int *mem = (int*)malloc(2*sizeof(int));
+  mem[3] = -2;
+  // expected-warning@-1 {{Out of bound access to memory after the end of the heap area}}
+  // expected-note@-2 {{Access of the heap area at index 3, while it holds only 2 'int' elements}}
+  return mem;
+}
+
+void *alloca(size_t size);
+
+int allocaRegion(void) {
+  int *mem = (int*)alloca(2*sizeof(int));
+  mem[3] = -2;
+  // expected-warning@-1 {{Out of bound access to memory after the end of the heap area}}
+  // expected-note@-2 {{Access of the heap area at index 3, while it holds only 2 'int' elements}}
+  // FIXME: this should be
+  //   {{Out of bound access to memory after the end of the memory returned by 'alloca'}}
+  //   {{Access of the memory returned by 'alloca' at index 3, while it holds only 2 'int' elements}}
+  // but apparently something models 'alloca' as if it was allocating on the heap
+  return *mem;
+}
+
+int *unknownExtent(int arg) {
+  if (arg >= 2)
+    return 0;
+  int *mem = (int*)malloc(arg);
+  mem[8] = -2;
+  // FIXME: this should produce
+  //   {{Out of bound access to memory after the end of the heap area}}
+  //   {{Access of 'int' element in the heap area at index 8}}
+  return mem;
+}
+
+void unknownIndex(int arg) {
+  // expected-note@+2 {{Assuming 'arg' is >= 12}}
+  // expected-note@+1 {{Taking true branch}}
+  if (arg >= 12)
+    array[arg] = -2;
+  // expected-warning@-1 {{Out of bound access to memory after the end of 'array'}}
+  // expected-note@-2 {{Access of 'array' at an overflowing index, while it holds only 10 'int' elements}}
+}
+
+int *nothingIsCertain(int x, int y) {
+  if (x >= 2)
+    return 0;
+  int *mem = (int*)malloc(x);
+  if (y >= 8)
+    mem[y] = -2;
+  // FIXME: this should produce
+  //   {{Out of bound access to memory after the end of the heap area}}
+  //   {{Access of 'int' element in the heap area at an overflowing index}}
+  return mem;
+}
diff --git a/clang/test/Analysis/out-of-bounds-new.cpp b/clang/test/Analysis/out-of-bounds-new.cpp
index b7ceea72a270db4..af4ec47d8358aab 100644
--- a/clang/test/Analysis/out-of-bounds-new.cpp
+++ b/clang/test/Analysis/out-of-bounds-new.cpp
@@ -5,7 +5,7 @@
 // - constant integer size for buffer
 void test1(int x) {
   int *buf = new int[100];
-  buf[100] = 1; // expected-warning{{Out of bound memory access}}
+  buf[100] = 1; // expected-warning{{Out of bound access to memory}}
 }
 
 void test1_ok(int x) {
@@ -20,7 +20,7 @@ void test1_ok(int x) {
 void test1_ptr(int x) {
   int *buf = new int[100];
   int *p = buf;
-  p[101] = 1; // expected-warning{{Out of bound memory access}}
+  p[101] = 1; // expected-warning{{Out of bound access to memory}}
 }
 
 void test1_ptr_ok(int x) {
@@ -37,7 +37,7 @@ void test1_ptr_arith(int x) {
   int *buf = new int[100];
   int *p = buf;
   p = p + 100;
-  p[0] = 1; // expected-warning{{Out of bound memory access}}
+  p[0] = 1; // expected-warning{{Out of bound access to memory}}
 }
 
 void test1_ptr_arith_ok(int x) {
@@ -51,7 +51,7 @@ void test1_ptr_arith_bad(int x) {
   int *buf = new int[100];
   int *p = buf;
   p = p + 99;
-  p[1] = 1; // expected-warning{{Out of bound memory access}}
+  p[1] = 1; // expected-warning{{Out of bound access to memory}}
 }
 
 void test1_ptr_arith_ok2(int x) {
@@ -66,7 +66,7 @@ void test1_ptr_arith_ok2(int x) {
 // - constant integer size for buffer
 void test2(int x) {
   int *buf = new int[100];
-  buf[-1] = 1; // expected-warning{{Out of bound memory access}}
+  buf[-1] = 1; // expected-warning{{Out of bound access to memory}}
 }
 
 // Tests doing an out-of-bounds access before the start of an array using:
@@ -76,7 +76,7 @@ void test2(int x) {
 void test2_ptr(int x) {
   int *buf = new int[100];
   int *p = buf;
-  p[-1] = 1; // expected-warning{{Out of bound memory access}}
+  p[-1] = 1; // expected-warning{{Out of bound access to memory}}
 }
 
 // Tests doing an out-of-bounds access before the start of an array using:
@@ -87,35 +87,35 @@ void test2_ptr_arith(int x) {
   int *buf = new int[100];
   int *p = buf;
   --p;
-  p[0] = 1; // expected-warning {{Out of bound memory access (accessed memory precedes memory block)}}
+  p[0] = 1; // expected-warning {{Out of bound access to memory preceding}}
 }
 
 // Tests under-indexing
 // of a multi-dimensional array
 void test2_multi(int x) {
   auto buf = new int[100][100];
-  buf[0][-1] = 1; // expected-warning{{Out of bound memory access}}
+  buf[0][-1] = 1; // expected-warning{{Out of bound access to memory}}
 }
 
 // Tests under-indexing
 // of a multi-dimensional array
 void test2_multi_b(int x) {
   auto buf = new int[100][100];
-  buf[-1][0] = 1; // expected-warning{{Out of bound memory access}}
+  buf[-1][0] = 1; // expected-warning{{Out of bound access to memory}}
 }
 
 // Tests over-indexing
 // of a multi-dimensional array
 void test2_multi_c(int x) {
   auto buf = new int[100][100];
-  buf[100][0] = 1; // expected-warning{{Out of bound memory access}}
+  buf[100][0] = 1; // expected-warning{{Out of bound access to memory}}
 }
 
 // Tests over-indexing
 // of a multi-dimensional array
 void test2_multi_2(int x) {
   auto buf = new int[100][100];
-  buf[99][100] = 1; // expected-warning{{Out of bound memory access}}
+  buf[99][100] = 1; // expected-warning{{Out of bound access to memory}}
 }
 
 // Tests normal access of
@@ -131,7 +131,7 @@ void test_diff_types(int x) {
   int *buf = new int[10]; //10*sizeof(int) Bytes allocated
   char *cptr = (char *)buf;
   cptr[sizeof(int) * 9] = 1;  // no-warning
-  cptr[sizeof(int) * 10] = 1; // expected-warning{{Out of bound memory access}}
+  cptr[sizeof(int) * 10] = 1; // expected-warning{{Out of bound access to memory}}
 }
 
 // Tests over-indexing
@@ -139,7 +139,7 @@ void test_diff_types(int x) {
 void test_non_array(int x) {
   int *ip = new int;
   ip[0] = 1; // no-warning
-  ip[1] = 2; // expected-warning{{Out of bound memory access}}
+  ip[1] = 2; // expected-warning{{Out of bound access to memory}}
 }
 
 //Tests over-indexing
diff --git a/clang/test/Analysis/out-of-bounds.c b/clang/test/Analysis/out-of-bounds.c
index cf8e60f66ac063d..ed457e869600630 100644
--- a/clang/test/Analysis/out-of-bounds.c
+++ b/clang/test/Analysis/out-of-bounds.c
@@ -7,7 +7...
[truncated]

@github-actions
Copy link

github-actions bot commented Oct 24, 2023

✅ With the latest revision this PR passed the C/C++ code formatter.

@NagyDonat
Copy link
Contributor Author

I renamed some variables because (1) they weren't CamelCased because they were "inherited" from the old versions of this checker and (2) their names were slightly too long and they introduced extra linebreaks in the newly added code.

@NagyDonat
Copy link
Contributor Author

Note that this PR introduces a PathSensitiveBugReport where the Description and ShortDescription fields (inherited from BugReport) are different (because I'm using the four-argument constructor). These fields are intended for different purposes, but practically every checker stores the same string in both of them.

To clarify this situation, I opened a discussion on discourse.

Copy link
Contributor

@steakhal steakhal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I only have minor remarks.
Consider renaming the PR Improve reports -> Improve messages, or diagnostics, to highlight that the "messages" aspect is improved, not e.g. improving the FP/TP rate or something like that.

@NagyDonat NagyDonat changed the title [analyzer] Improve reports from ArrayBoundCheckerV2 [analyzer] Improve diagnostics from ArrayBoundCheckerV2 Oct 31, 2023
@NagyDonat
Copy link
Contributor Author

The title change suggestion was a very good point, thanks!

Copy link
Contributor

@steakhal steakhal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only had a couple std::moves missing, an FYI comment, and one question about the diagnostics in the tests.

Even in the current state, I think it's a good baby step in improving the status quo.
Thank you for pushing this forward!


One remark about the review workflow, I'd prefer if the conversation starter would resolve the conversations. Let me explain why:

Given the amount of reviewers for CSA, I'd suggest explicitly supporting reviewer experience.
Speaking of that, personally I find the following workflow to suite myself the best:

  • All comments needs a reaction, either by explict commenting on it or by putting there a thums-up emoji or something. This helps the author follow which comments were addressed downstream and is pending for submission. An explicit comment is expected for challenging a comment. I prefer emojies, becase they don't send an email to everyone subscribe - unlike for individual comments; and usually I also batch individual commits into a "self-review" as patch author when I reply for the same reason.
  • All comments needs to be reacted before requestion the next round of review from the person who added those comments.
  • The comment should be only marked resolved by the person who raised the concern. This ensures that the comment was adequately solved, thus the issue is no longer relevant.
  • To merge a PR, the PR should have no open conversations. Post-approval comments are also fine and leaves room for further discussions, but the goal is no longer to directly challenge the legitimacy of the patch.

Note that I'm not raising these because I feel we have to adjust, but rather because I find the reviews experience on GitHub really poor in general.
Comments disappear, comments don't show up for the patch author, only if they look at the right pane like "Conversations" and "Files changed" - yes, not all comments are present on each -.- And I've been bitten by it as a patch author many many times, and expectations around "reacting on comments" helped to mitigate this pain to some degree.

Note that this is only my personal preference, and if I'm not mistaken, there is no legitimized consensus around the project. At least, last time I checked the following relevant thread around GitHub review workflows:
https://discourse.llvm.org/t/rfc-github-pr-resolve-conversation-button/73178/

std::string RegName = getRegionName(Reg);
std::string Msg = getExceedsMsg(C.getASTContext(), RegName, ByteOffset,
*KnownSize, Location);
reportOOB(C, ExceedsUpperBound, OOB_Exceeds, ByteOffset, RegName, Msg);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
reportOOB(C, ExceedsUpperBound, OOB_Exceeds, ByteOffset, RegName, Msg);
reportOOB(C, ExceedsUpperBound, OOB_Exceeds, ByteOffset, std::move(RegName), std::move(Msg));

ByteOffset);
std::string RegName = getRegionName(Reg);
std::string Msg = getTaintMsg(RegName);
reportOOB(C, ExceedsUpperBound, OOB_Taint, ByteOffset, RegName, Msg);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
reportOOB(C, ExceedsUpperBound, OOB_Taint, ByteOffset, RegName, Msg);
reportOOB(C, ExceedsUpperBound, OOB_Taint, ByteOffset, std::move(RegName), std::move(Msg));

@NagyDonat
Copy link
Contributor Author

Thanks for the suggestions; I'll probably upload the next revision tomorrow.

One remark about the review workflow, I'd prefer if the conversation starter would resolve the conversations. Let me explain why:

Given the amount of reviewers for CSA, I'd suggest explicitly supporting reviewer experience. Speaking of that, personally I find the following workflow to suite myself the best:

  • All comments needs a reaction, either by explict commenting on it or by putting there a thums-up emoji or something. This helps the author follow which comments were addressed downstream and is pending for submission. An explicit comment is expected for challenging a comment. I prefer emojies, becase they don't send an email to everyone subscribe - unlike for individual comments; and usually I also batch individual commits into a "self-review" as patch author when I reply for the same reason.

  • All comments needs to be reacted before requestion the next round of review from the person who added those comments.

  • The comment should be only marked resolved by the person who raised the concern. This ensures that the comment was adequately solved, thus the issue is no longer relevant.

  • To merge a PR, the PR should have no open conversations. Post-approval comments are also fine and leaves room for further discussions, but the goal is no longer to directly challenge the legitimacy of the patch.

Sounds reasonable, I'll try to follow this process, especially when interacting with you. In the previous round I resolved the conversations because I applied them (mostly verbatim) in the new revision that I uploaded; but I can instead apply a thumbsup emoji if you'd prefer that.

I thought for a moment that there are some corner cases where the more
specific "may be too large" message is inaccurate, but I realized that
there are no such cases, so we can use the specific message (which is
slightly more helpful for the user).

This reverts commit 979a387.
@NagyDonat
Copy link
Contributor Author

I decided to keep the reportOOB() calls without additional changes. I feel that adding std::move() has limited benefits (performance impact is negligible and the next line is a return so the reader can already see that the variable won't be used again) and it would make those already long lines even more convoluted. I also considered refactoring the data flow, but I didn't find a more elegant arrangement than the current one.

@NagyDonat NagyDonat merged commit 16ef496 into llvm:main Nov 7, 2023
2 of 3 checks passed
@NagyDonat NagyDonat deleted the arrayboundv2_msg_improvements branch November 7, 2023 12:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html clang:static analyzer clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants