Dima Kogan edited this page May 8, 2024 · 22 revisions

I've been a C++ programmer for just shy of 20 years, the last six of them working on chromium. I have a well-established workflow and a pretty big bag of tricks, and I have always relied heavily on runtime debuggers. I can say without exaggeration that switching to rr has been my biggest single productivity boost in probably 10 years. Bugs that might have taken me days to root-cause using gdb, I can now root-cause in an hour using rr. It has been especially helpful in debugging race conditions that don't reproduce reliably, or won't reproduce under gdb. It's magic.

-- @szager-chromium (Stefan Zager)

https://twitter.com/chandlerc1024/status/879962014860193792

I'm an rr developer, but my real job involves a lot of Gecko debugging. I find rr a great improvement over bare gdb and use it for almost all my Linux debugging tasks. It's an amazing tool!

-- Robert O'Callahan

I'm the author of OpenResty. I recently successfully used rr to quickly track down and fix a very obscure JIT stack overflow bug inside LuaJIT, with the help of LuaJIT's author. The patch is already merged to mainline: https://goo.gl/D5i47I The issue could only be randomly reproduced with a very large Lua script (1.8MB) in stress testing. rr record quickly recorded a single run that hit this issue in stress testing on x86_64. The data breakpoints and reverse execution features in rr replay made debugging this nasty bug even enjoyable. Our advanced gdb tools in Python can also work flawlessly in rr replay. I really wish I had rr when I was tracking ~10 very deep LuaJIT bugs with LuaJIT's author a few years ago. At that time I could only analyze core dumps. Alas. rr is such an amazing tool!

-- Yichun Zhang (@agentzh)

rr has taken the application I work on from borderline-impossible to use inside a debugger to comically easy. I've got all my coworkers hooked on it. If you're using gdb, you should probably be using rr.

-- @kellerb

I was ecstatic when gdb gained reverse debugging abilities, but quickly had to realize it didn't work for me in practice because I was trying to debug something too big for it (Firefox). I've recently used rr to do some debugging on Firefox, and it didn't fail to deliver. While things like reverse-continue were relatively slow, having to wait for those was totally worth it if you look at the pain you'd have had to go through if you hadn't been able to reverse-continue in the first place. My only complaint about rr is that it didn't exist earlier!

-- @glandium

The idea of record-and-replay is not new; where rr is different is that it’s very low overhead and capable of handling complex programs like QEMU and Mozilla. It’s a usable production quality debug tool, not just a research project. I can’t recommend rr highly enough — I think it deserves to become a standard part of the Linux C/C++ developer’s toolkit, as valgrind has done before it.

-- @pm215

Even though we started using rr very recently, it has already cut down what would have been weeks of painful debugging for a couple of really nasty bugs. Simply put, rr with reverse debugging is mindblowing. We are excited to use it more and more in the future!

-- @kavindaw-optumsoft

rr has quickly become the number one tool I reach for when debugging complicated C++ code. rr only runs on Linux and I don't even use Linux as my day-to-day operating system! But rr provides such a great debugging experience, and gives me such a huge productivity boost, that I will reboot into Fedora just to use rr for all but the most trivial bugs. Take note of that, Linux advocates.

-- @fitzgen, Back to the Futu-rr-e: Deterministic Debugging with rr

I've been using rr for some time now, but in the last 3 days I hit a situation where it really saved my bacon. I had a mysterious problem in a data structure delivered to my gcc plugin, and I'm really not all that familiar with the inner workings of gcc. First of all, rr was enormously helpful in getting a debugger onto the right subprocess without messing around with dummy shell scripts to intercept and hack things in. Then rr made it possible for me to go back and forth through the cryptic internal gcc processing, tracing the origins of data embedded in structures one step at a time and keeping my head straight about chronology via heavy use of the when-ticks command. Not only did I track down my bug, but I learned a huge amount about the gcc internals I was looking at. I can't imagine how I could have tracked my bug down without rr.

-- @sfink (Steve Fink)

rr is just the most awesome debugging tool I've ever used. It's been super-useful to diagnose all kinds of strange, nondeterministic, or racy, difficult-to-reproduce bugs in both Servo (where race conditions are unfortunately common and usually really hard to track down) and Gecko. It's simply fantastic.

-- @emilio (Emilio Cobos Álvarez)

rr is a fantastically useful debugging tool. It has made root cause analysis of cargo-fuzz-found panics quicker when fuzzing Rust. However, it is at its best when debugging a large codebase you can't possibly know thoroughly, such as Gecko. For example, it helps in a situation where a problem doesn't occur on the first attempt, because there's an empty cache far away elsewhere in the codebase and the problem being debugged requires the cache to already have the relevant entry. As another example, it helped me greatly when debugging invariant violations arising from nested event loops by allowing a narrowing back-and-forth execution. Each time you continue or reverse-continue over the problem, you can move breakpoints closer to the problem until your breakpoints are close enough that you can see the problem between them. Having objects reside in the same memory addresses throughout the debugging session also helps greatly and makes watchpoints more useful than they would otherwise be.

-- @hsivonen (Henri Sivonen)

rr made it far easier to debug a highly publicized issue in PostgreSQL, described here: http://jepsen.io/analyses/postgresql-12.3. In general I have found that rr makes it considerably easier to debug complex race conditions. It typically isn't necessary for me to go to any trouble to keep the overhead manageable, so I don't find myself straining to work within the limitations of rr as a tool.

-- @petergeoghegan (Peter Geoghegan)

The inclusion of rr in our testing workflow has enabled us to unveil difficult bugs in our database software (MySQL). This is particularly useful when working with interdependent software like our backup tool, where the database's state affects the backup software's behavior. By integrating rr, we can deterministically capture and replay the state of both tools. This has resulted in significantly reduced troubleshooting times for complex issues. An example of a success case can be found in this article: https://www.percona.com/blog/replay-the-execution-of-mysql-with-rr-record-and-replay/ .

-- @altmannmarcelo (Marcelo Altmann)

For a pretty long time, using gdb, I could not find the heap-buffer-overflow problem under the address sanitizer. Using rr allowed me to roll back in time to the moment the memory was allocated and quickly isolate the problem, since all addresses remained unchanged. Thank you very much for this wonderful tool.

-- @antamel (Anton Melnikov)

rr not only helped me solve challenging bugs in an unfamiliar codebase, but it also helped me conquer my fear of core dumps.

-- @airportyh (Toby Ho)

I have over 30 years of programming experience in C and C++. I have been developing the InnoDB storage engine since 2003, first for MySQL, and more recently for MariaDB. In 2012, our quality engineer at Oracle gave a talk on how hard it is to find and fix concurrency bugs. In a vast cloud of possible interleaved executions of concurrent threads or processes, a test happens to navigate a path that leads to something bad. Back then, many rarely-hit bugs remained mysteries, because one could not guess from one bad end state (a core dump and a bunch of files) what happened earlier. Only if someone managed to come up with a reasonably reproducible test case would it be possible to try adding some instrumentation to figure out what might have happened. With rr, we can simply run a random workload and get a full execution trace from the database initialization to the observed bad effect. The traces can span several process restarts, and the bad effect does not even have to be a crash. With the combination of rr and watch or awatch, it is possible to find a root cause of a bug in a matter of seconds or minutes, instead of the usual days or weeks (or never, in the case of some crash recovery bugs). rr also works perfectly with other tools, such as AddressSanitizer or MemorySanitizer. rr is not a silver bullet, but close. You need to be aware of its limitations: ‘fake hangs’ can occur due to extremely unfair scheduling of threads; excessive conditional branches can make rr record several orders of magnitude slower; reverse-continue may miss breakpoints; and some race conditions that involve std::atomic for inter-thread synchronization may be completely invisible to rr while causing an immediate crash outside rr.

-- @dr-m (Marko Mäkelä)

rr is a revolution in the way debugging is done. It shortens investigations that would previously take hours into mere minutes. I use it extensively not just for debugging, but also for familiarizing myself with a new codebase. Being able to walk around the live code, and seeing when/why/how various functions are called is far more insightful than just reading the code. Thank you so much for this incredible tool.

-- Dima Kogan


If you find rr useful, please add your testimonial here!
