Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[clang][static analyzer] false positive cplusplus.NewDelete ("use after free") w/ reference counting #90498

Closed
sharkautarch opened this issue Apr 29, 2024 · 8 comments · Fixed by #90918
Labels
clang:static analyzer false-positive Warning fires when it should not

Comments

@sharkautarch
Copy link

sharkautarch commented Apr 29, 2024

version of clang: 17.0.6 (archlinux pkg)

(see also: this closed issue thread w/ more details about the false positive: ValveSoftware/gamescope#1275)
After Joshua-Ashton did a refactor of a reference counting class, I started seeing a new use-after-free warning from scan-build:

[419/450] Compiling C++ object src/gamescope.p/commit.cpp.o
../../../src/commit.cpp:116:12: warning: Use of memory after it is freed [cplusplus.NewDelete]
  116 |     close( m_nCommitFence );
      |            ^~~~~~~~~~~~~~
1 warning generated.

as detailed under the last comment I made in the issue thread here: ValveSoftware/gamescope#1275 (comment)
I found that, in this situation, the static analyzer is:

  1. making a conclusion on use-after-free with incomplete information (only looking at local translation unit it seems)
  2. just assuming that if it can't definitively prove, with its incomplete information, that if (<condition>) { delete this } won't be taken, then it'll evaluate both the branch-taken path and the branch-not-taken path.

what should happen instead:
Whenever the static analyzer sees free or delete inside something like an if, wherein it currently doesn't know if the branching condition will always evaluate to true or false, it should check beyond the local translation unit to try to figure out if said condition would always turn out to be true or always turn out to be false.

@github-actions github-actions bot added the clang Clang issues not falling into any other category label Apr 29, 2024
@EugeneZelenko EugeneZelenko added clang:static analyzer false-positive Warning fires when it should not and removed clang Clang issues not falling into any other category labels Apr 29, 2024
@llvmbot
Copy link
Member

llvmbot commented Apr 29, 2024

@llvm/issue-subscribers-clang-static-analyzer

Author: None (sharkautarch)

version of clang: `17.0.6` (archlinux pkg)

(see also: this closed issue thread w/ more details about the false positive: ValveSoftware/gamescope#1275)
After Joshua-Ashton did a refactor of a reference counting class, I started seeing a new use-after-free warning from scan-build:

[419/450] Compiling C++ object src/gamescope.p/commit.cpp.o
../../../src/commit.cpp:116:12: warning: Use of memory after it is freed [cplusplus.NewDelete]
  116 |     close( m_nCommitFence );
      |            ^~~~~~~~~~~~~~
1 warning generated.

as detailed under the last comment I made in the issue thread here: ValveSoftware/gamescope#1275 (comment)
I found that, in this situation, the static analyzer is:

  1. making a conclusion on use-after-free with incomplete information (only looking at local translation unit it seems)
  2. just assuming that if it can't definitively prove, with its incomplete information, that if (&lt;condition&gt;) { delete this } won't be taken, then it'll evaluate both the branch-taken path and the branch-not-taken path.

what should happen instead:
Whenever the static analyzer sees free or delete inside something like an if, wherein it currently doesn't know if the branching condition will always evaluate to true or false, it should check beyond the local translation unit to try to figure out if said condition would always turn out to be true or always turn out to be false.

@steakhal
Copy link
Contributor

steakhal commented Apr 30, 2024

Hi, it's great to see projects using the Static Analyzer, and even reporting FPs. Thank you.
Could you upload the preprocessed code for the issue?
You can just pass the -E flag to the compilation command you have for that file and upload the result.
After that, I could (probably) reproduce the issue myself and look into this.

as detailed under the last comment I made in the issue thread here: ValveSoftware/gamescope#1275 (comment) I found that, in this situation, the static analyzer is:

  1. making a conclusion on use-after-free with incomplete information (only looking at local translation unit it seems)

  2. just assuming that if it can't definitively prove, with its incomplete information, that if (<condition>) { delete this } won't be taken, then it'll evaluate both the branch-taken path and the branch-not-taken path.

I haven't read the linked discussion, but your statements are true: we only reason about a single TU at a time; and assuming no dead code, we consider that both branches are possible to take. It's difficult for me to assess this particular case, and there could be caveats of course. So let's see a detailed investigation.

what should happen instead: Whenever the static analyzer sees free or delete inside something like an if, wherein it currently doesn't know if the branching condition will always evaluate to true or false, it should check beyond the local translation unit to try to figure out if said condition would always turn out to be true or always turn out to be false.

Some have already tried cross-translation unit (CTU) analysis, but my experience is that it's hard, and the tradeoffs are not great. The way we do path-sensitive analysis in the CSA, it's not really suitable for scalable CTU analysis.
So, I'm afraid, you need to live with this. However, usually preconditions and invariants can be expressed via assert calls, and that would suppress the execution paths for CSA as well (if the code is analyzed in a setting where the assert macro is expanded to a branch and a call to a noreturn function on the branch where the condition is not satisfied. This allows CSA to prove the condition, and eliminate the branch later where the FP would be found.
Let me know if it would help, but send me the preprocessed code to actually have a look.

@sharkautarch
Copy link
Author

@steakhal
I ended up making a whole repo with preprocessed output for two different branches: vanilla branch, and a sanity-check branch I had made:
https://github.com/sharkautarch/gamescope-preprocessed-for-static-analyzer-FP
Fair warning that I kinda just copied whole directories, but on the bright side, the build directories have preprocessed .i files for every .o counterpart

@haoNoQ
Copy link
Collaborator

haoNoQ commented May 2, 2024

Folks, I haven't actually looked at it yet, but at a glance one thing worries me: it looks to me as if you aren't actually reading static analyzer warnings as intended. The "warning: Use of memory after it is freed" text you see on the command line isn't the actual output, this is more of a notification that a warning exists. You should open the HTML files produced by scan-build in the -o directory to see a step-by-step explanation of why the static analyzer thinks this happens, under which conditions, which specific branches were taken.

(Please ignore me if you're already doing this. It's just that a surprising amount of users completely miss this. We should make a better message about this.)

These step-by-step reports are the intended way to consume static analyzer warnings; IDE integration also usually looks similar to this. Static analyzer warnings are emitted against an entire execution path, not a single line of code; the final statement may not be true on a different execution path. The step-by-step report may ultimately still be a false positive, but in many cases it'd actually explain to you what the analyzer was thinking and where it made a mistake, and maybe even help you find an elegant way to suppress the warning (while we're getting it fixed on our end).

You definitely shouldn't debug this with debug prints inside clang. The analyzer simulates branch-y code out of order so you just get prints from different universes mixed up. For "hardcore" debugging we usually print graphviz graphs of execution paths explored by the analyzer (like ViewExplodedGraph and such).

@sharkautarch
Copy link
Author

sharkautarch commented May 2, 2024

Folks, I haven't actually looked at it yet, but at a glance one thing worries me: it looks to me as if you aren't actually reading static analyzer warnings as intended. The "warning: Use of memory after it is freed" text you see on the command line isn't the actual output, this is more of a notification that a warning exists. You should open the HTML files produced by scan-build in the -o directory to see a step-by-step explanation of why the static analyzer thinks this happens, under which conditions, which specific branches were taken.

(Please ignore me if you're already doing this. It's just that a surprising amount of users completely miss this. We should make a better message about this.)

These step-by-step reports are the intended way to consume static analyzer warnings; IDE integration also usually looks similar to this. Static analyzer warnings are emitted against an entire execution path, not a single line of code; the final statement may not be true on a different execution path. The step-by-step report may ultimately still be a false positive, but in many cases it'd actually explain to you what the analyzer was thinking and where it made a mistake, and maybe even help you find an elegant way to suppress the warning (while we're getting it fixed on our end).

You definitely shouldn't debug this with debug prints inside clang. The analyzer simulates branch-y code out of order so you just get prints from different universes mixed up. For "hardcore" debugging we usually print graphviz graphs of execution paths explored by the analyzer (like ViewExplodedGraph and such).

Good to know that you’re actually supposed to look at the browser view to get more information.
I’ll admit that I didn’t look at the browser view for this warning but:

  • Unless I missed it, I never saw any documentation indicating that browser view shows more information than what’s printed to the console
  • I’ve actually looked at the browser view for other warnings in the past (on multiple occasions), and I’ve never felt I’d gain any helpful insight from it. so I didn’t see any reason to look at the browser view for this warning…

Also about the graphvis exploded graph stuff, I actually tried to dump a debug graph and looked at it with graphvis, but there were so many nodes that I couldn’t make heads or tails about it. Though maybe I picked the wrong graph option thingy?

haoNoQ added a commit to haoNoQ/llvm-project that referenced this issue May 2, 2024
…pression.

Fixes llvm#90498.

Same as 5337efc for atomic builtins, but for `std::atomic` this time.
This is useful because even though the actual builtin atomic is still there,
it may be buried beyond the inlining depth limit.

Also add one popular custom smart pointer class name to the name-based
heuristics, which isn't necessary to fix the bug but arguably a good idea
regardless.
@haoNoQ
Copy link
Collaborator

haoNoQ commented May 2, 2024

Yes I completely agree that we have a messaging problem here. We should probably turn on full text output by default. It's not readable whatsoever but at least it gives you complete information no matter where you look. (I like how gcc -fanalyzer does a better job at full text output but I think the overall conclusion is still the same, it's very hard to read these paths without properly overlaying them on top of your code.) Or remove text output entirely from scan-build, but that'd probably cause even more confusion. And in any case, yeah, we should be more clear in man scan-build and scan-build --help, they should say that the html output is the actual output. I'll probably try to patch this up in a moment.

I also think you're right that most of our warnings are quite decipherable without the full path. Especially if it's freshly written code. (Also our "dead stores" warnings aren't even path-sensitive and they're our "loudest" warnings usually.) But I'm coming from a more bayesian perspective here: a large portion of LLVM bug reports about false positives were historically explained away into a true positive when the full path is considered.

It's probably also true that those non-trivial-path reports are the most "important" ones. We find weird edgecase paths that haven't been considered by the programmer, that haven't been covered by runtime tests. Simpler bugs can be found with much simpler technology, compile-time or run-time.


In any case, yeah I think you're right, this report looks pretty bad. Here's (part of) the HTML view:

Screenshot 2024-05-02 at 12 51 57 PM

And the most nested stack frame for "memory was released" is:

Screenshot 2024-05-02 at 12 52 24 PM

The highlighted step "(12) Assuming 'uRefPrivate' is 0" indicates that the static analyzer doesn't know that the reference count drops to zero, but it has to consider this possibility because an immediate branch depends on it. So technically the use-after-free would happen under the assumption that on step (12) the last remaining reference to the object is removed and the reference count drops to zero.

Notice how step (12) isn't part of RemoveWaitable(); it's part of the destructor of ~Rc which we've entered on step (4) and exited on step (19). Which is correct because we definitely can't deallocate memory before we get to this destructor, but this doesn't really tell us what happened inside RemoveWaitable() and if we think ownership has actually been consumed. (This is because we hide the bubbles that correspond to parts of the execution path that don't "look" relevant. You can see the "real" full path with -analyzer-config prune-paths=false but this is really not recommended; this may have hundreds or even thousands of steps, especially in C++.)

But it sounds like it's fairly safe to assume that ownership wasn't transfered. Even if it was from some philosophical perspective, there's clearly the m_RemovedWaitables vector that continues to hold one reference. (Or this could be the !pWaitable->HasLiveReferences() branch but in this case we've probably been in trouble for a while already.)

So what really happened is: on step (12) the static analyzer started exploring the possibility that the reference count was 0 from the start.

And this is a really annoying problem that quickly goes back to the fundamentals of the analysis technique that we're using, roughly explained in the bug #61669 for which we don't really have a very good solution; we're probably stuck with it for a while. Basically, whenever we see a branch, we cannot continue the analysis at all until we "fork" the analysis to explore both sides of the branch independently. And then the two parallel universes never merge back.

Which is fine in most cases. We just see the branch and assume that it's there "for a reason". But often bites us in a few specific areas. C++ reference-counting smart pointers are one of such areas. Such false positives are common in their presense.

We've got some very crude heuristics to deal with this smart pointer related false positives. One of them works by noticing that the delete operation happened in a destructor and was surrounded by atomic operations. That's probably why shared_ptr works correctly but your custom smart pointer doesn't: we didn't react to the use of std::atomic in the destructor, we only knew about manual builtin atomic operations. (The actual builtin atomic was still there in the header but it's buried so deep in the call stack that we've run out of "budget" inlining it.)

So, let me improve that heuristic real quick. This appears to take care of your false positive. #90918

@haoNoQ
Copy link
Collaborator

haoNoQ commented May 2, 2024

Also about the graphvis exploded graph stuff, I actually tried to dump a debug graph and looked at it with graphvis, but there were so many nodes that I couldn’t make heads or tails about it. Though maybe I picked the wrong graph option thingy?

Yeah the budget limit is -analyzer-config max-nodes=225000 analysis steps, each corresponds to a node. These are indeed hard to read. In case of false positives you should definitely use -trim-egraph, and there's this whole exploded-graph-rewriter.py thing for more clever tricks, but other than that we just get good at Ctrl+F searching and binary-searching. This is unfortunately essential because it's very hard to "reduce" code to manageable size (be it manually or with tools like creduce) without throwing away the root cause of the unwanted behavior. I have a conference talk about this if you're interested (https://www.youtube.com/watch?v=g0Mqx1niUi0). But, generally, this really isn't what you should be ever forced to do as a user.

@sharkautarch
Copy link
Author

@haoNoQ Thanks for the advice & help! :)

haoNoQ added a commit that referenced this issue May 9, 2024
…pression. (#90918)

Fixes #90498.

Same as 5337efc for atomic builtins, but for `std::atomic` this
time. This is useful because even though the actual builtin atomic is
still there, it may be buried beyond the inlining depth limit.

Also add one popular custom smart pointer class name to the name-based
heuristics, which isn't necessary to fix the bug but arguably a good
idea regardless.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:static analyzer false-positive Warning fires when it should not
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants