-
Notifications
You must be signed in to change notification settings - Fork 12.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[analyzer] MallocChecker: Recognize std::atomics in smart pointer suppression. #90918
Conversation
…pression. Fixes llvm#90498. Same as 5337efc for atomic builtins, but for `std::atomic` this time. This is useful because even though the actual builtin atomic is still there, it may be buried beyond the inlining depth limit. Also add one popular custom smart pointer class name to the name-based heuristics, which isn't necessary to fix the bug but arguably a good idea regardless.
@llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-static-analyzer-1 Author: Artem Dergachev (haoNoQ) ChangesFixes #90498. Same as 5337efc for atomic builtins, but for Also add one popular custom smart pointer class name to the name-based heuristics, which isn't necessary to fix the bug but arguably a good idea regardless. Full diff: https://github.com/llvm/llvm-project/pull/90918.diff 3 Files Affected:
diff --git a/clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp
index 11651fd491f743..3d89d1e6343490 100644
--- a/clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp
@@ -3447,7 +3447,7 @@ static bool isReferenceCountingPointerDestructor(const CXXDestructorDecl *DD) {
if (N.contains_insensitive("ptr") || N.contains_insensitive("pointer")) {
if (N.contains_insensitive("ref") || N.contains_insensitive("cnt") ||
N.contains_insensitive("intrusive") ||
- N.contains_insensitive("shared")) {
+ N.contains_insensitive("shared") || N.ends_with_insensitive("rc")) {
return true;
}
}
@@ -3479,13 +3479,24 @@ PathDiagnosticPieceRef MallocBugVisitor::VisitNode(const ExplodedNode *N,
// original reference count is positive, we should not report use-after-frees
// on objects deleted in such destructors. This can probably be improved
// through better shared pointer modeling.
- if (ReleaseDestructorLC) {
+ if (ReleaseDestructorLC && (ReleaseDestructorLC == CurrentLC ||
+ ReleaseDestructorLC->isParentOf(CurrentLC))) {
if (const auto *AE = dyn_cast<AtomicExpr>(S)) {
+ // Check for manual use of atomic builtins.
AtomicExpr::AtomicOp Op = AE->getOp();
if (Op == AtomicExpr::AO__c11_atomic_fetch_add ||
Op == AtomicExpr::AO__c11_atomic_fetch_sub) {
- if (ReleaseDestructorLC == CurrentLC ||
- ReleaseDestructorLC->isParentOf(CurrentLC)) {
+ BR.markInvalid(getTag(), S);
+ }
+ } else if (const auto *CE = dyn_cast<CallExpr>(S)) {
+ // Check for `std::atomic` and such. This covers both regular method calls
+ // and operator calls.
+ if (const auto *MD =
+ dyn_cast_or_null<CXXMethodDecl>(CE->getDirectCallee())) {
+ const CXXRecordDecl *RD = MD->getParent();
+ // A bit wobbly with ".contains()" because it may be like
+ // "__atomic_base" or something.
+ if (StringRef(RD->getNameAsString()).contains("atomic")) {
BR.markInvalid(getTag(), S);
}
}
diff --git a/clang/test/Analysis/Inputs/system-header-simulator-cxx.h b/clang/test/Analysis/Inputs/system-header-simulator-cxx.h
index 1c2be322f83c20..29326ec1f9280d 100644
--- a/clang/test/Analysis/Inputs/system-header-simulator-cxx.h
+++ b/clang/test/Analysis/Inputs/system-header-simulator-cxx.h
@@ -1260,6 +1260,13 @@ template<
iterator end() const { return iterator(val + 1); }
};
+template <typename T>
+class atomic {
+public:
+ T operator++();
+ T operator--();
+};
+
namespace execution {
class sequenced_policy {};
}
diff --git a/clang/test/Analysis/NewDelete-atomics.cpp b/clang/test/Analysis/NewDelete-atomics.cpp
index 54fce17ea7bd22..1425acab7489ba 100644
--- a/clang/test/Analysis/NewDelete-atomics.cpp
+++ b/clang/test/Analysis/NewDelete-atomics.cpp
@@ -20,7 +20,7 @@ typedef enum memory_order {
memory_order_seq_cst = __ATOMIC_SEQ_CST
} memory_order;
-class Obj {
+class RawObj {
int RefCnt;
public:
@@ -37,11 +37,27 @@ class Obj {
void foo();
};
+class StdAtomicObj {
+ std::atomic<int> RefCnt;
+
+public:
+ int incRef() {
+ return ++RefCnt;
+ }
+
+ int decRef() {
+ return --RefCnt;
+ }
+
+ void foo();
+};
+
+template <typename T>
class IntrusivePtr {
- Obj *Ptr;
+ T *Ptr;
public:
- IntrusivePtr(Obj *Ptr) : Ptr(Ptr) {
+ IntrusivePtr(T *Ptr) : Ptr(Ptr) {
Ptr->incRef();
}
@@ -55,22 +71,106 @@ class IntrusivePtr {
delete Ptr;
}
- Obj *getPtr() const { return Ptr; } // no-warning
+ T *getPtr() const { return Ptr; } // no-warning
+};
+
+// Also IntrusivePtr but let's dodge name-based heuristics.
+template <typename T>
+class DifferentlyNamed {
+ T *Ptr;
+
+public:
+ DifferentlyNamed(T *Ptr) : Ptr(Ptr) {
+ Ptr->incRef();
+ }
+
+ DifferentlyNamed(const DifferentlyNamed &Other) : Ptr(Other.Ptr) {
+ Ptr->incRef();
+ }
+
+ ~DifferentlyNamed() {
+ // We should not take the path on which the object is deleted.
+ if (Ptr->decRef() == 1)
+ delete Ptr;
+ }
+
+ T *getPtr() const { return Ptr; } // no-warning
};
void testDestroyLocalRefPtr() {
- IntrusivePtr p1(new Obj());
+ IntrusivePtr<RawObj> p1(new RawObj());
+ {
+ IntrusivePtr<RawObj> p2(p1);
+ }
+
+ // p1 still maintains ownership. The object is not deleted.
+ p1.getPtr()->foo(); // no-warning
+}
+
+void testDestroySymbolicRefPtr(const IntrusivePtr<RawObj> &p1) {
+ {
+ IntrusivePtr<RawObj> p2(p1);
+ }
+
+ // p1 still maintains ownership. The object is not deleted.
+ p1.getPtr()->foo(); // no-warning
+}
+
+void testDestroyLocalRefPtrWithAtomics() {
+ IntrusivePtr<StdAtomicObj> p1(new StdAtomicObj());
+ {
+ IntrusivePtr<StdAtomicObj> p2(p1);
+ }
+
+ // p1 still maintains ownership. The object is not deleted.
+ p1.getPtr()->foo(); // no-warning
+}
+
+
+void testDestroyLocalRefPtrWithAtomics(const IntrusivePtr<StdAtomicObj> &p1) {
{
- IntrusivePtr p2(p1);
+ IntrusivePtr<StdAtomicObj> p2(p1);
}
// p1 still maintains ownership. The object is not deleted.
p1.getPtr()->foo(); // no-warning
}
-void testDestroySymbolicRefPtr(const IntrusivePtr &p1) {
+void testDestroyLocalRefPtrDifferentlyNamed() {
+ DifferentlyNamed<RawObj> p1(new RawObj());
+ {
+ DifferentlyNamed<RawObj> p2(p1);
+ }
+
+ // p1 still maintains ownership. The object is not deleted.
+ p1.getPtr()->foo(); // no-warning
+}
+
+void testDestroySymbolicRefPtrDifferentlyNamed(
+ const DifferentlyNamed<RawObj> &p1) {
+ {
+ DifferentlyNamed<RawObj> p2(p1);
+ }
+
+ // p1 still maintains ownership. The object is not deleted.
+ p1.getPtr()->foo(); // no-warning
+}
+
+void testDestroyLocalRefPtrWithAtomicsDifferentlyNamed() {
+ DifferentlyNamed<StdAtomicObj> p1(new StdAtomicObj());
+ {
+ DifferentlyNamed<StdAtomicObj> p2(p1);
+ }
+
+ // p1 still maintains ownership. The object is not deleted.
+ p1.getPtr()->foo(); // no-warning
+}
+
+
+void testDestroyLocalRefPtrWithAtomicsDifferentlyNamed(
+ const DifferentlyNamed<StdAtomicObj> &p1) {
{
- IntrusivePtr p2(p1);
+ DifferentlyNamed<StdAtomicObj> p2(p1);
}
// p1 still maintains ownership. The object is not deleted.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hah!
I've just debugged an identical case last week, wrt. llvm intrusive pointers, which also used std::atomics inside for the refcount and lead to a Leak FP.
When I dag into the case, I realized that the root cause is that we can't track the value of the std::atomic
from zero, inc, dec and then fail to prove that it must be equal to zero and take the "delete" path.
This is not strictly related to leak reports, probably that's the most affected checker - but one can craft any other case with other checkers too. This is why I wanted a solution not special-casing the leak checker, but something more generic.
I considered modeling the atomic member functions, until the first place they escape, after which we can no longer ever reason about them. This made me to look into suppressions.
My first idea was to check what "interesting symbols" we have after a BR traversal and check if any is a conjured one conjured for a callexpr statement calling a std::atomic-related function - and suppress the report in that case.
I realized that this wouldn't be enough, as control dependencies such as conditions, wouldn't be considered for leaks, thus not suppress the motivating FP example.
My next idea was to also check the TrackedConditions
of the bug report, and look for expressions that refer to a subexpression for which we associate a conjured symbol, etc. and apply the same logic as before.
It didn't work because it would suppress all paths crossing that condition, no matter if the branch was taken or not.
Now, the final idea was to only consider conditions, which dominate the basic block of the bug report itself. This wouldn't solve the leak FP per-say, but would work for any other bug report.
I believe, that such a heuristic with dominators could work for the rest of the checkers - where the bug report is attached to some given statement - and not delayed until some other future statement like in the leak checker.
All in all, I think the suppression you propose here makes sense.
I've just left here what my thought process was when I dig into a similar case. It might be useful, who knows.
Medium-to-long-term I think an attribute-based approach might make sense there:
Or both. I've definitely seen a few codebases, that are security-critical to a large-ish chunk of humanity, that have like 20 competing ad-hoc reference-counting schemes. Mostly in plain C where MallocChecker would otherwise be very useful, if it wasn't ruined by false positives of a similar nature from all these reference counting schemes. These schemes probably don't make sense to hardcode in the compiler because there's no inheritance so you'd need to hardcode every struct that follows one of those schemes, not just every scheme in and of itself.
Yeah I think it's not sufficient to model the atomics. It's a good idea to model them anyway, but even when atomics aren't used (eg. the custom smart pointer never needed to be thread-safe), the fundamental problem is that we still don't know the initial value of the reference count. In particular we can't refute the possibility that the original value was like Modeling atomics would make it better when the smart pointer was first created during analysis, so we actually know that the initial value is 0. Then by carefully tracking it we might arrive to the correct conclusion. But when the initial value is symbolic we're fundamentally powerless without the domain-specific knowledge that the value could not have been It's possible that this domain-specific knowledge could be transferred to us with the help of well-placed assertions across the smart pointer class. But an attribute could achieve the same in a more direct manner.
Yes, I think there has to be something good in this area, even though I haven't got any good specific solutions in my head. @Szelethus did a lot of initial experimentation in this area, which resulted in improved condition tracking, but we haven't used it for false positive suppression yet. Once we research this deeper and make it more principled, maybe we should really start doing that. It may also be a good idea to not look at the bug report in isolation, but consider the entire space of execution paths on which it was found. If the space isn't "vast" enough to convince us that we aren't stepping onto an #61669, suppress the warning. |
I was actually thinking about something similar to that sort of conundrum, and I did have some ideas that don't involve whole program analysis:
|
@sharkautarch Yeah I think the situation where we observe the allocation site during analysis may be significantly different in presence of smart pointers. It may be a good idea to include such check in the heuristic, to reclaim some of the false negatives currently silenced by it. If we've seen the initial reference count and we've modeled it perfectly so far, these false positives can't happen. But this implies that we model our atomics perfectly. And given that we don't even know what a "reference count" is (maybe it's just some unrelated integer?), it's quite hard to get that right. Also we don't really do "whole program analysis" and we probably can't do that without major changes to our technique; it simply doesn't scale to that scale. Our existing cross-TU mode is more about occasionally breaking translation unit boundaries when we really want something specific from another TU (like the effects of a function call we just encountered), but we still focus on small portions of the program at a time. We never really derive whole-program conclusions from these small independent invocations of our analysis. That'd require very different technology. It's not necessarily difficult per se – we could develop such technology and occasionally make our main analysis interoperate with it. But we haven't done any of that yet and our cross-TU mode isn't that kind of cross-TU mode. |
Fixes #90498.
Same as 5337efc for atomic builtins, but for
std::atomic
this time. This is useful because even though the actual builtin atomic is still there, it may be buried beyond the inlining depth limit.Also add one popular custom smart pointer class name to the name-based heuristics, which isn't necessary to fix the bug but arguably a good idea regardless.