-
Notifications
You must be signed in to change notification settings - Fork 12k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[lldb][libc++] Hide all libc++ implementation details from stacktraces #108870
Conversation
@llvm/pr-subscribers-lldb @llvm/pr-subscribers-libcxx Author: Adrian Vogelsgesang (vogelsgesang) ChangesThis commit changes the libc++ frame recognizer to hide implementation details of libc++ more aggressively. The applied heuristic is rather straightforward: We consider every function name starting with This works pretty neatly for However, it is noteworthy, that this does not work 100% in all cases: E.g., for Full diff: https://github.com/llvm/llvm-project/pull/108870.diff 5 Files Affected:
diff --git a/libcxx/docs/UserDocumentation.rst b/libcxx/docs/UserDocumentation.rst
index 3651e52ed77a75..f48d65ee0e753b 100644
--- a/libcxx/docs/UserDocumentation.rst
+++ b/libcxx/docs/UserDocumentation.rst
@@ -343,6 +343,32 @@ Third-party Integrations
Libc++ provides integration with a few third-party tools.
+Debugging libc++ internals in LLDB
+----------------------------------
+
+LLDB hides the implementation details of libc++ by default.
+
+E.g., when setting a breakpoint in a comparator passed to ``std::sort``, the
+backtrace will read as
+
+.. code-block::
+
+ (lldb) thread backtrace
+ * thread #1, name = 'a.out', stop reason = breakpoint 3.1
+ * frame #0: 0x000055555555520e a.out`my_comparator(a=1, b=8) at test-std-sort.cpp:6:3
+ frame #7: 0x0000555555555615 a.out`void std::__1::sort[abi:ne200000]<std::__1::__wrap_iter<int*>, bool (*)(int, int)>(__first=(item = 8), __last=(item = 0), __comp=(a.out`my_less(int, int) at test-std-sort.cpp:5)) at sort.h:1003:3
+ frame #8: 0x000055555555531a a.out`main at test-std-sort.cpp:24:3
+
+Note how the caller of ``my_comparator`` is shown as ``std::sort``. Looking at
+the frame numbers, we can see that frames #1 until #6 were hidden. Those frames
+represent internal implementation details such as ``__sort4`` and similar
+utility functions.
+
+To also show those implementation details, use ``thread backtrace -u``.
+Alternatively, to disable those compact backtraces for good, use
+``frame recognizer list`` and ``frame recognizer delete`` to delete the libc++
+frame recognizer.
+
GDB Pretty printers for libc++
------------------------------
diff --git a/lldb/source/Plugins/LanguageRuntime/CPlusPlus/CPPLanguageRuntime.cpp b/lldb/source/Plugins/LanguageRuntime/CPlusPlus/CPPLanguageRuntime.cpp
index faa05e8f834ea1..d0e84bdeb94f01 100644
--- a/lldb/source/Plugins/LanguageRuntime/CPlusPlus/CPPLanguageRuntime.cpp
+++ b/lldb/source/Plugins/LanguageRuntime/CPlusPlus/CPPLanguageRuntime.cpp
@@ -45,7 +45,7 @@ char CPPLanguageRuntime::ID = 0;
/// A frame recognizer that is installed to hide libc++ implementation
/// details from the backtrace.
class LibCXXFrameRecognizer : public StackFrameRecognizer {
- std::array<RegularExpression, 4> m_hidden_regex;
+ std::array<RegularExpression, 2> m_hidden_regex;
RecognizedStackFrameSP m_hidden_frame;
struct LibCXXHiddenFrame : public RecognizedStackFrame {
@@ -55,28 +55,17 @@ class LibCXXFrameRecognizer : public StackFrameRecognizer {
public:
LibCXXFrameRecognizer()
: m_hidden_regex{
- // internal implementation details of std::function
+ // internal implementation details in the `std::` namespace
// std::__1::__function::__alloc_func<void (*)(), std::__1::allocator<void (*)()>, void ()>::operator()[abi:ne200000]
// std::__1::__function::__func<void (*)(), std::__1::allocator<void (*)()>, void ()>::operator()
// std::__1::__function::__value_func<void ()>::operator()[abi:ne200000]() const
- RegularExpression{""
- R"(^std::__[^:]*::)" // Namespace.
- R"(__function::.*::operator\(\))"},
- // internal implementation details of std::function in ABI v2
// std::__2::__function::__policy_invoker<void (int, int)>::__call_impl[abi:ne200000]<std::__2::__function::__default_alloc_func<int (*)(int, int), int (int, int)>>
- RegularExpression{""
- R"(^std::__[^:]*::)" // Namespace.
- R"(__function::.*::__call_impl)"},
- // internal implementation details of std::invoke
- // std::__1::__invoke[abi:ne200000]<void (*&)()>
- RegularExpression{
- R"(^std::__[^:]*::)" // Namespace.
- R"(__invoke)"},
- // internal implementation details of std::invoke
- // std::__1::__invoke_void_return_wrapper<void, true>::__call[abi:ne200000]<void (*&)()>
- RegularExpression{
- R"(^std::__[^:]*::)" // Namespace.
- R"(__invoke_void_return_wrapper<.*>::__call)"}
+ // std::__1::__invoke[abi:ne200000]<void (*&)()>
+ // std::__1::__invoke_void_return_wrapper<void, true>::__call[abi:ne200000]<void (*&)()>
+ RegularExpression{R"(^std::__[^:]*::__)"},
+ // internal implementation details in the `std::ranges` namespace
+ // std::__1::ranges::__sort::__sort_fn_impl[abi:ne200000]<std::__1::__wrap_iter<int*>, std::__1::__wrap_iter<int*>, bool (*)(int, int), std::__1::identity>
+ RegularExpression{R"(^std::__[^:]*::ranges::__)"},
},
m_hidden_frame(new LibCXXHiddenFrame()) {}
diff --git a/lldb/test/API/lang/cpp/libcxx-internals-recognizer/Makefile b/lldb/test/API/lang/cpp/libcxx-internals-recognizer/Makefile
new file mode 100644
index 00000000000000..bb571299664934
--- /dev/null
+++ b/lldb/test/API/lang/cpp/libcxx-internals-recognizer/Makefile
@@ -0,0 +1,5 @@
+CXX_SOURCES := main.cpp
+USE_LIBCPP := 1
+CXXFLAGS_EXTRAS := -std=c++20
+
+include Makefile.rules
diff --git a/lldb/test/API/lang/cpp/libcxx-internals-recognizer/TestLibcxxInternalsRecognizer.py b/lldb/test/API/lang/cpp/libcxx-internals-recognizer/TestLibcxxInternalsRecognizer.py
new file mode 100644
index 00000000000000..a5b4e4fe995c38
--- /dev/null
+++ b/lldb/test/API/lang/cpp/libcxx-internals-recognizer/TestLibcxxInternalsRecognizer.py
@@ -0,0 +1,56 @@
+import lldb
+from lldbsuite.test.decorators import *
+from lldbsuite.test.lldbtest import *
+from lldbsuite.test import lldbutil
+
+
+class LibCxxInternalsRecognizerTestCase(TestBase):
+ NO_DEBUG_INFO_TESTCASE = True
+
+ @add_test_categories(["libc++"])
+ def test_frame_recognizer(self):
+ """Test that implementation details of libc++ are hidden"""
+ self.build()
+ (target, process, thread, bkpt) = lldbutil.run_to_source_breakpoint(
+ self, "break here", lldb.SBFileSpec("main.cpp")
+ )
+
+ expected_parents = {
+ "sort_less(int, int)": ["::sort", "test_algorithms"],
+ # `std::ranges::sort` is implemented as an object of types `__sort`,
+ # which unfortunately means that there is no `std::ranges::sort`
+ # stack frame, and `main` is the direct parent of `my_less_ranges`.
+ "ranges_sort_less(int, int)": ["test_algorithms"],
+ # `ranges::views::transform` internally uses `std::invoke`, and that
+ # call also shows up in the stack trace
+ "view_transform(int)": ["::invoke", "ranges::transform_view", "test_algorithms"],
+ # Various types of `invoke` calls
+ "consume_number(int)": ["::invoke", "test_invoke"],
+ "invoke_add(int, int)": ["::invoke", "test_invoke"],
+ "Callable::member_function(int) const": ["::invoke", "test_invoke"],
+ "Callable::operator()(int) const": ["::invoke", "test_invoke"],
+ # Containers
+ "MyKey::operator<(MyKey const&) const": ["less", "::emplace", "test_containers"],
+ }
+ stop_set = set()
+ while process.GetState() != lldb.eStateExited:
+ fn = thread.GetFrameAtIndex(0).GetFunctionName()
+ stop_set.add(fn)
+ self.assertIn(fn, expected_parents.keys())
+ frame_id = 1
+ for expected_parent in expected_parents[fn]:
+ # Skip all hidden frames
+ while (
+ frame_id < thread.GetNumFrames()
+ and thread.GetFrameAtIndex(frame_id).IsHidden()
+ ):
+ frame_id = frame_id + 1
+ # Expect the correct parent frame
+ self.assertIn(
+ expected_parent, thread.GetFrameAtIndex(frame_id).GetFunctionName()
+ )
+ frame_id = frame_id + 1
+ process.Continue()
+
+ # Make sure that we actually verified all intended scenarios
+ self.assertEqual(len(stop_set), len(expected_parents))
diff --git a/lldb/test/API/lang/cpp/libcxx-internals-recognizer/main.cpp b/lldb/test/API/lang/cpp/libcxx-internals-recognizer/main.cpp
new file mode 100644
index 00000000000000..870301b0970439
--- /dev/null
+++ b/lldb/test/API/lang/cpp/libcxx-internals-recognizer/main.cpp
@@ -0,0 +1,86 @@
+#include <algorithm>
+#include <functional>
+#include <map>
+#include <ranges>
+#include <vector>
+
+bool sort_less(int a, int b) {
+ __builtin_printf("break here");
+ return a < b;
+}
+
+bool ranges_sort_less(int a, int b) {
+ __builtin_printf("break here");
+ return a < b;
+}
+
+int view_transform(int a) {
+ __builtin_printf("break here");
+ return a * a;
+}
+
+void test_algorithms() {
+ std::vector<int> vec{8, 1, 3, 2};
+
+ // The internal frames for `std::sort` should be hidden
+ std::sort(vec.begin(), vec.end(), sort_less);
+
+ // The internal frames for `ranges::sort` should be hidden
+ std::ranges::sort(vec.begin(), vec.end(), ranges_sort_less);
+
+ // Same for views
+ for (auto x : vec | std::ranges::views::transform(view_transform)) {
+ // no-op
+ }
+}
+
+void consume_number(int i) { __builtin_printf("break here"); }
+
+int invoke_add(int i, int j) {
+ __builtin_printf("break here");
+ return i + j;
+}
+
+struct Callable {
+ Callable(int num) : num_(num) {}
+ void operator()(int i) const { __builtin_printf("break here"); }
+ void member_function(int i) const { __builtin_printf("break here"); }
+ int num_;
+};
+
+void test_invoke() {
+ // Invoke a void-returning function
+ std::invoke(consume_number, -9);
+
+ // Invoke a non-void-returning function
+ std::invoke(invoke_add, 1, 10);
+
+ // Invoke a member function
+ const Callable foo(314159);
+ std::invoke(&Callable::member_function, foo, 1);
+
+ // Invoke a function object
+ std::invoke(Callable(12), 18);
+}
+
+struct MyKey {
+ int x;
+ bool operator==(const MyKey &) const = default;
+ bool operator<(const MyKey &other) const {
+ __builtin_printf("break here");
+ return x < other.x;
+ }
+};
+
+void test_containers() {
+ std::map<MyKey, int> map;
+ map.emplace(MyKey{1}, 2);
+ map.emplace(MyKey{2}, 3);
+}
+
+int main() {
+ test_algorithms();
+ test_invoke();
+ test_containers();
+ return 0;
+}
|
✅ With the latest revision this PR passed the Python code formatter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This feels like a good idea, but I am also a bit scared of hiding too much stuff and making it difficult to figure out what's going wrong from a stack trace. It's not rational, I have no evidence to back this up and perhaps this is only fear of unknown.
Factually, the heuristic of using __
makes sense to me since all implementation-detail functions start with __
, and nothing in the public libc++ API starts with __
(off the top of my head).
Overall, I think I like this patch but like I said I'm a bit scared of it being too aggressive.
Yes, this is also my main concern here. E.g., for Do you have an idea how we could de-risk this change? E.g., would do you have a couple more scenarios in mind for which I should add test cases? Are there similar patterns to the Do we have some libc++ developers which are using lldb top-of-tree for their day-to-day work on libc++ and who would make us aware in case this commit turns out to be too aggressive? |
When you're stopped in a frame and you have libc++ frames above you in the backtrace, you're either stopped in a callback (like I wonder if a heuristic for the callback case could be: mark all but the first libc++ entry point as an implementation detail. Would this be even more aggressive? Also, the implementation of this might be a bit awkward. Not sure. Wdyt? |
f0ffdc6
to
10dfb91
Compare
What I could imagine is to drive this via attributes instead of a heuristic. We could potentially mark implementation details of libc++ as such to control what the debugging experience is like at a much finer grain if we used attributes, but obviously this would also increase complexity in the code and it would be yet another thing that we have to slap on almost every declaration.
I'm not aware of anyone. I just use |
Do we need to worry about whether this recognizer hides frames that might be useful to someone actually working on the libc++ they are debugging? I would imagine in that case, you'd probably want to turn off this recognizer anyway, since you want to see the details. Seems to me like this feature should be tuned to "users of libc++" not "developers of libc++". We really should add a "disable" to the frame recognizers. Once you delete them you can't really get the built-in ones back. People who are working ON libc++ still spend a bunch of their time debugging "uses" of it, and likely would benefit from the implementation hiding, then every so often want to see the details. We should make that easier. |
Do you mean something like #109219? (Test cases still missing; rest should be ready for review)
Agree, I think that should be the goal here
🤔 Might be a good idea... I will have to experiment a bit with this... |
Maybe we could also find some other convention. E.g. hiding the
|
This commit changes the libc++ frame recognizer to hide implementation details of libc++ more aggressively. The applied heuristic is rather straightforward: We consider every function name starting with `__` as an implementation detail. This works pretty neatly for `std::invoke`, `std::function`, `std::sort`, `std::map::emplace` and many others. Also, this should align quite nicely with libc++'s general coding convention of using the `__` for their implementation details, thereby keeping the future maintenance effort low. However, it is noteworthy, that this does not work 100% in all cases: E.g., for `std::ranges::sort`, itself is not really a function call, but an object with an overloaded `operator()`, which means that there is no actual call `std::ranges::sort` in the call stack.
10dfb91
to
d31e3c5
Compare
✅ With the latest revision this PR passed the C/C++ code formatter. |
Merged #109219
I combined your approach with the |
|
||
expected_parents = { | ||
"sort_less(int, int)": ["::sort", "test_algorithms"], | ||
# `std::ranges::sort` is implemented as an object of types `__sort`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There're a large number of such customization point objects (and niebloids, which will be respecified as CPOs soon, see P3136R0) since C++20. Should we invent some convention to recognize them uniformly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure that such a convention would be necessary. With the current heuristic ("never hide a frame which was immediately called from user code"), the end result to the user is fine, despite the debug info containing the __sort::operator()
function name
lldb/source/Plugins/LanguageRuntime/CPlusPlus/CPPLanguageRuntime.cpp
Outdated
Show resolved
Hide resolved
lldb/source/Plugins/LanguageRuntime/CPlusPlus/CPPLanguageRuntime.cpp
Outdated
Show resolved
Hide resolved
lldb/source/Plugins/LanguageRuntime/CPlusPlus/CPPLanguageRuntime.cpp
Outdated
Show resolved
Hide resolved
@Michael137 thanks for the review! I updated the pull request accordingly |
82c4f0a
to
a02a623
Compare
On Sep 23, 2024, at 11:46 AM, Adrian Vogelsgesang ***@***.***> wrote:
@vogelsgesang commented on this pull request.
In libcxx/docs/UserDocumentation.rst <#108870 (comment)>:
> +
+ (lldb) thread backtrace
+ * thread #1, name = 'a.out', stop reason = breakpoint 3.1
+ * frame #0: 0x000055555555520e a.out`my_comparator(a=1, b=8) at test-std-sort.cpp:6:3
+ frame #7: 0x0000555555555615 a.out`void std::__1::sort[abi:ne200000]<std::__1::__wrap_iter<int*>, bool (*)(int, int)>(__first=(item = 8), __last=(item = 0), __comp=(a.out`my_less(int, int) at test-std-sort.cpp:5)) at sort.h:1003:3
+ frame #8: 0x000055555555531a a.out`main at test-std-sort.cpp:24:3
+
+Note how the caller of ``my_comparator`` is shown as ``std::sort``. Looking at
+the frame numbers, we can see that frames #1 until #6 were hidden. Those frames
+represent internal implementation details such as ``__sort4`` and similar
+utility functions.
+
+To also show those implementation details, use ``thread backtrace -u``.
+Alternatively, to disable those compact backtraces, use ``frame recognizer list``
+and ``frame recognizer disable`` on the "libc++ frame recognizer".
+
I assume that this mechanism can be disabled via settings set target.process.thread.step-avoid-regexp ""? Or would that match every frame and break stepping completely?
The `thread.step-avoid-regexp` should be independent of this hiding of frames. For instance, if I have a bunch of teeny accessors in my library, I might just want not to step into them, not because I want to hide them in general, but they just make stepping annoying. That's also an appropriate use of this regexp. Even in the usage for C++ avoidance, I don't think the two are overlapping. After all, std::vector<T>::size isn't an "implementation detail" of std in the same way all these shims are, but I still don't want to step into it... Anyway, this one belongs to the user so it seems the wrong place to hook up this hiding.
It does seem reasonable for both step-in and step-out to avoid "hidden" frames by not stopping in them. IIRC, Adrian's initial patch made step-out step past all the hidden frames. I don't remember seeing anything in that patch that makes "step in to hidden frame" turn into step-out but that also seems a reasonable behavior. In both cases, you still run the risk of an over-eager hider turning "step-out" into "continue", however.
IMO, the step-avoid-regex & step-avoid-library settings should be independent of this hiding feature, and should belong to the user to mess with independent of these features. It also seems pretty reasonable that if you step out to a frame we were going to hide, or into a frame that we were going to hide, we step you back out. And I also think it's reasonable that turning off this stepping behavior is done by turning off the hiding. I can't see the need for a separate setting for this.
Jim
… —
Reply to this email directly, view it on GitHub <#108870 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/ADUPVW25DOTCBKLWGEQYCSTZYBOX5AVCNFSM6AAAAABOJ4R5PSVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDGMRTGA2TOMRZGE>.
You are receiving this because you are on a team that was mentioned.
|
@jimingham I agree with everything you wrote. I think you misunderstood the context based on the way GitHub provided you some misleading context in the email notification |
Yeah, I should stop trying to read the notifications in the GitHub emails and just use them as a link to the website. Sigh... Anyway, the real question was "what does an empty string do in these regex's". The answer is the one you probably suspected, an empty string means "no regular expression to apply", not ".*". |
lldb/source/Plugins/LanguageRuntime/CPlusPlus/CPPLanguageRuntime.cpp
Outdated
Show resolved
Hide resolved
(I will be waiting for one or two more weeks before merging this, in case there are any additional comments) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regexes look sane (and sufficiently restricted) to me, thanks!
llvm#108870) This commit changes the libc++ frame recognizer to hide implementation details of libc++ more aggressively. The applied heuristic is rather straightforward: We consider every function name starting with `__` as an implementation detail. This works pretty neatly for `std::invoke`, `std::function`, `std::sort`, `std::map::emplace` and many others. Also, this should align quite nicely with libc++'s general coding convention of using the `__` for their implementation details, thereby keeping the future maintenance effort low. However, this heuristic by itself does not work in 100% of the cases: E.g., `std::ranges::sort` is not a function, but an object with an overloaded `operator()`, which means that there is no actual call `std::ranges::sort` in the call stack. Instead, there is a `std::ranges::__sort::operator()` call. To make sure that we don't hide this stack frame, we never hide the frame which represents the entry point from user code into libc++ code
llvm#108870) This commit changes the libc++ frame recognizer to hide implementation details of libc++ more aggressively. The applied heuristic is rather straightforward: We consider every function name starting with `__` as an implementation detail. This works pretty neatly for `std::invoke`, `std::function`, `std::sort`, `std::map::emplace` and many others. Also, this should align quite nicely with libc++'s general coding convention of using the `__` for their implementation details, thereby keeping the future maintenance effort low. However, this heuristic by itself does not work in 100% of the cases: E.g., `std::ranges::sort` is not a function, but an object with an overloaded `operator()`, which means that there is no actual call `std::ranges::sort` in the call stack. Instead, there is a `std::ranges::__sort::operator()` call. To make sure that we don't hide this stack frame, we never hide the frame which represents the entry point from user code into libc++ code
llvm#108870) This commit changes the libc++ frame recognizer to hide implementation details of libc++ more aggressively. The applied heuristic is rather straightforward: We consider every function name starting with `__` as an implementation detail. This works pretty neatly for `std::invoke`, `std::function`, `std::sort`, `std::map::emplace` and many others. Also, this should align quite nicely with libc++'s general coding convention of using the `__` for their implementation details, thereby keeping the future maintenance effort low. However, this heuristic by itself does not work in 100% of the cases: E.g., `std::ranges::sort` is not a function, but an object with an overloaded `operator()`, which means that there is no actual call `std::ranges::sort` in the call stack. Instead, there is a `std::ranges::__sort::operator()` call. To make sure that we don't hide this stack frame, we never hide the frame which represents the entry point from user code into libc++ code
Hi @vogelsgesang , the test you added on this PR breaks the LLDB matrix bot, in particular it breaks the clang 15 & 17 tests. Could you add a clang 15 error:
clang 17 error:
|
see #114122 |
This commit changes the libc++ frame recognizer to hide implementation details of libc++ more aggressively. The applied heuristic is rather straightforward: We consider every function name starting with
__
as an implementation detail.This works pretty neatly for
std::invoke
,std::function
,std::sort
,std::map::emplace
and many others. Also, this should align quite nicely with libc++'s general coding convention of using the__
for their implementation details, thereby keeping the future maintenance effort low.However, this heuristic by itself does not work in 100% of the cases: E.g.,
std::ranges::sort
is not a function, but an object with an overloadedoperator()
, which means that there is no actual callstd::ranges::sort
in the call stack. Instead, there is astd::ranges::__sort::operator()
call. To make sure that we don't hide this stack frame, we never hide the frame which represents the entry point from user code into libc++ code