remove all panicking code from gdbstub!!

this is something I've been meaning to do for a _long_ time now, and after many failed attempts and false-starts, I've finally done it! if you inspect the asm output of `example_no_std`, you'll find that there is absolutely _zero_ panic handling machinery in the final binary! This has also resulted in a _substantial_ drop in binary size, as those bounds checks and panicking machinery were taking up a lot of space! removing panicking code ended up requiring 3 different approaches: 1. Rewriting array-indexing operations to use simpler indexing, which the compiler is able to optimize better. e.g: see the code used to index into the `target.xml` buffer in `base.rs` 2. Adding a sprinkle of unsafe code to certain array-indexing operations that the compiler is unsable to prove are safe. This was only done in two places: `decode_hex_buf` and `PacketBuf`. 3. Manually re-implementing the standard library's `slice::split_mut` and `slice::splitn_mut` methods to elide a bounds check. - Tracked upstream via rust-lang/rust#86313 Introducing unsafe code isn't something I take lightly, and while I've done my best to audit and validate the unsafe code I've written, I did end up including an optional `paranoid_unsafe` feature which gives end-users the option to opt-out of `gdbstub`'s no-panic guarantee in exchange for some additional peace of mind.
daniel5151 · Jun 19, 2021 · ecbbaf7 · ecbbaf7
1 parent 62e7646
commit ecbbaf7
Show file tree

Hide file tree

Showing 14 changed files with 307 additions and 53 deletions.
diff --git a/Cargo.toml b/Cargo.toml
@@ -31,6 +31,7 @@ goblin = "0.2"
 default = ["std"]
 alloc = ["managed/alloc"]
 std = ["alloc"]
+paranoid_unsafe = []
 
 # INTERNAL: enables the `__dead_code_marker!` macro.
 # used as part of the `scripts/test_dead_code_elim.sh`

diff --git a/README.md b/README.md
@@ -22,17 +22,19 @@ Why use `gdbstub`?
         -   Organizes GDB's countless optional protocol extensions into a coherent, understandable, and type-safe hierarchy of traits.
         -   Automatically handles client/server protocol feature negotiation, without needing to micro-manage the specific [`qSupported` packet](https://sourceware.org/gdb/onlinedocs/gdb/General-Query-Packets.html#qSupported) response.
     -   `gdbstub` makes _extensive_ use of Rust's powerful type system + generics to enforce protocol invariants at compile time, minimizing the number of tricky protocol details end users have to worry about.
-    -   Using a novel technique called [**Inlineable Dyn Extension Traits**](#zero-overhead-protocol-extensions) (IDETs), `gdbstub` enables fine-grained control over active protocol extensions _without_ relying on clunky `cargo` features.
+    -   Using a novel technique called [**Inlineable Dyn Extension Traits**](#zero-overhead-protocol-extensions) (IDETs), `gdbstub` enables fine-grained control over active protocol extensions _without_ relying on clunky `cargo` features or the use of `unsafe` code!
 -   **Easy to Integrate**
     -   `gdbstub`'s API is designed to be as unobtrusive as possible, and shouldn't require any large refactoring effort to integrate into an existing project. It doesn't require taking direct ownership of any key data structures, and aims to be a "drop in" solution when you need to add debugging to a project.
 -   **`#![no_std]` Ready & Size Optimized**
     -   `gdbstub` is a **`no_std` first** library, whereby all protocol features are required to be `no_std` compatible.
     -   `gdbstub` does not require _any_ dynamic memory allocation, and can be configured to use fixed-size, pre-allocated buffers. This enables `gdbstub` to be used on even the most resource constrained, no-[`alloc`](https://doc.rust-lang.org/alloc/) platforms.
-    -   `gdbstub` is transport-layer agnostic, and uses a basic [`Connection`](https://docs.rs/gdbstub/latest/gdbstub/trait.Connection.html) interface to communicate with the GDB server. As long as target has some method of performing in-order, serial, byte-wise I/O (e.g: putchar/getchar over UART), it's possible to run `gdbstub` on it.
-    -   "You don't pay for what you don't use": All code related to parsing/handling protocol extensions is guaranteed to be dead-code-eliminated from an optimized binary if left unimplmeneted! See the [Zero-overhead Protocol Extensions](#zero-overhead-protocol-extensions) section below for more details.
-    -   `gdbstub` tries to keep the binary and RAM footprint of its minimal configuration to a bare minimum, enabling it to be used on even the most resource-constrained microcontrollers.
-        -   When compiled in release mode, using all the tricks outlined in [`min-sized-rust`](https://github.com/johnthagen/min-sized-rust), a baseline `gdbstub` implementation weighs in at **_roughly 10kb of `.text` and negligible `.rodata`!_** \*
-        - \* Exact numbers vary by target platform, compiler version, and `gdbstub` revision. Data was collected using the included `example_no_std` project compiled on x86_64.
+    -   `gdbstub` is entirely **panic free** (when compiled in release mode, without the `paranoid_unsafe` cargo feature).
+        -   Validated by inspecting the asm output of the in-tree `example_no_std`.
+    -   `gdbstub` is transport-layer agnostic, and uses a basic [`Connection`](https://docs.rs/gdbstub/latest/gdbstub/trait.Connection.html) interface to communicate with the GDB server. As long as target has some method of performing in-order, serial, byte-wise I/O (e.g: putchar/getchar over UART), it's possible to run `gdbstub` on it!
+    -   "You don't pay for what you don't use": All code related to parsing/handling protocol extensions is guaranteed to be dead-code-eliminated from an optimized binary if left unimplemented! See the [Zero-overhead Protocol Extensions](#zero-overhead-protocol-extensions) section below for more details.
+    -   `gdbstub`'s minimal configuration has an incredibly low binary size + RAM overhead, enabling it to be used on even the most resource-constrained microcontrollers.
+        -   When compiled in release mode, using all the tricks outlined in [`min-sized-rust`](https://github.com/johnthagen/min-sized-rust), a baseline `gdbstub` implementation can weigh in at **_less than 10kb of `.text` + `.rodata`!_** \*
+        - \*Exact numbers vary by target platform, compiler version, and `gdbstub` revision. Data was collected using the included `example_no_std` project compiled on x86_64.
 
 ### Can I Use `gdbstub` in Production?
 
@@ -102,6 +104,11 @@ When using `gdbstub` in `#![no_std]` contexts, make sure to set `default-feature
     -   Implement `Connection` for [`TcpStream`](https://doc.rust-lang.org/std/net/struct.TcpStream.html) and [`UnixStream`](https://doc.rust-lang.org/std/os/unix/net/struct.UnixStream.html).
     -   Implement [`std::error::Error`](https://doc.rust-lang.org/std/error/trait.Error.html) for `gdbstub::Error`.
     -   Add a `TargetError::Io` variant to simplify `std::io::Error` handling from Target methods.
+-   `paranoid_unsafe`
+    -   Enabling the `paranoid_unsafe` feature will swap out a handful of unsafe `get_unchecked_mut` operations with their safe equivalents, at the expense of introducing panicking code into `gdbstub`.
+        -   `rustc` + LLVM do a pretty incredible job at eliding bounds checks... most of the time. Unfortunately, there are a few places in the code where the compiler is not smart enough to "prove" that a bounds check isn't needed, and a bit of unsafe code is required to remove those bounds checks.
+    -   This feature is **disabled** by default, as the unsafe code has been aggressively audited and tested for correctness. That said, if you're particularly paranoid about the use of unsafe code, enabling this feature may offer some piece of mind.
+    -   Please refer to the [`unsafe` in `gdbstub`](#unsafe-in-gdbstub) section below for more details.
 
 ## Examples
 
@@ -150,8 +157,13 @@ If you happen to stumble across this crate and end up using it to debug some bar
 
 -   When no cargo features are enabled:
     -   A few trivially safe calls to `NonZeroUsize::new_unchecked()` when defining internal constants.
+
+-   When the `paranoid_unsafe` feature is enabled, the following `unsafe` code is _removed_:
+    -   `src/protocol/packet.rs`: Swaps a couple slice-index methods in `PacketBuf` to use `get_unchecked_mut`. The public API of struct ensures that the bounds used to index into the array remain in-bounds.
+    -   `src/protocol/common/hex`: Use an alternate implementation of `decode_hex_buf` which uses unsafe slice indexing.
+
 -   When the `std` feature is enabled:
-    -   An implementation of `UnixStream::peek` which uses `libc::recv`. This manual implementation will be removed once [rust-lang/rust#76923](https://github.com/rust-lang/rust/issues/76923) is stabilized.
+    -   `src/connection/impls/unixstream.rs`: An implementation of `UnixStream::peek` which uses `libc::recv`. This manual implementation will be removed once [rust-lang/rust#76923](https://github.com/rust-lang/rust/issues/76923) is stabilized.
 
 ## Future Plans + Roadmap to `1.0.0`
 

diff --git a/example_no_std/Cargo.toml b/example_no_std/Cargo.toml
@@ -19,3 +19,4 @@ panic = "abort"
 opt-level = 's'  # Optimize for size.
 lto = true
 codegen-units = 1
+debug = true
diff --git a/src/gdbstub_impl/ext/base.rs b/src/gdbstub_impl/ext/base.rs
@@ -130,22 +130,20 @@ impl<T: Target, C: Connection> GdbStubImpl<T, C> {
 
                 match xml {
                     Some(xml) => {
-                        let xml = xml.trim();
-                        if cmd.offset >= xml.len() {
-                            // no more data
-                            res.write_str("l")?;
-                        } else if cmd.offset + cmd.len >= xml.len() {
-                            // last little bit of data
-                            res.write_str("l")?;
-                            res.write_binary(&xml.as_bytes()[cmd.offset..])?
-                        } else {
+                        let xml = xml.trim().as_bytes();
+                        if cmd.offset < xml.len() {
                             // still more data
                             res.write_str("m")?;
-                            res.write_binary(&xml.as_bytes()[cmd.offset..(cmd.offset + cmd.len)])?
+                            res.write_binary(
+                                &xml[cmd.offset..][..cmd.len.min(xml.len() - cmd.offset)],
+                            )?
+                        } else {
+                            // no more data
+                            res.write_str("l")?;
                         }
                     }
                     // If the target hasn't provided their own XML, then the initial response to
-                    // "qSupported" wouldn't have included  "qXfer:features:read", and gdb wouldn't
+                    // "qSupported" wouldn't have included "qXfer:features:read", and gdb wouldn't
                     // send this packet unless it was explicitly marked as supported.
                     None => return Err(Error::PacketUnexpected),
                 }

diff --git a/src/protocol/commands.rs b/src/protocol/commands.rs
@@ -13,6 +13,7 @@ pub(self) mod prelude {
     };
     pub use crate::protocol::common::Bstr;
     pub use crate::protocol::packet::PacketBuf;
+    pub use crate::util::no_panic_iter::SliceExt;
     pub use core::convert::{TryFrom, TryInto};
 }
 

diff --git a/src/protocol/commands/_m.rs b/src/protocol/commands/_m.rs
@@ -29,20 +29,23 @@ impl<'a> ParseCommand<'a> for m<'a> {
         // +------+------------------+------------------------------------------------+
 
         let (buf, body_range) = buf.into_raw_buf();
-        let body = &mut buf[body_range.start..];
+        let body = buf.get_mut(body_range.start..body_range.end)?;
 
-        // should return 3 slices: the addr (hex-encoded), len (hex-encoded), and the
-        // "rest" of the buffer
-        let mut body = body.split_mut(|b| *b == b',' || *b == b'#');
+        let mut body = body.split_mut_no_panic(|b| *b == b',');
 
         let addr = decode_hex_buf(body.next()?).ok()?;
         let addr_len = addr.len();
         let len = decode_hex(body.next()?).ok()?;
 
         drop(body);
 
+        // ensures that `split_at_mut` doesn't panic
+        if buf.len() < body_range.start + addr_len {
+            return None;
+        }
+
         let (addr, buf) = buf.split_at_mut(body_range.start + addr_len);
-        let addr = &addr[b"$m".len()..];
+        let addr = addr.get(b"$m".len()..)?;
 
         Some(m { addr, len, buf })
     }

diff --git a/src/protocol/commands/_m_upcase.rs b/src/protocol/commands/_m_upcase.rs
@@ -11,15 +11,11 @@ impl<'a> ParseCommand<'a> for M<'a> {
     fn from_packet(buf: PacketBuf<'a>) -> Option<Self> {
         let body = buf.into_body();
 
-        let mut body = body.split_mut(|&b| b == b',' || b == b':');
+        let mut body = body.split_mut_no_panic(|&b| b == b',' || b == b':');
         let addr = decode_hex_buf(body.next()?).ok()?;
         let len = decode_hex(body.next()?).ok()?;
-        let val = body.next()?;
+        let val = decode_hex_buf(body.next()?).ok()?;
 
-        Some(M {
-            addr,
-            len,
-            val: decode_hex_buf(val).ok()?,
-        })
+        Some(M { addr, len, val })
     }
 }
diff --git a/src/protocol/commands/_vRun.rs b/src/protocol/commands/_vRun.rs
@@ -10,7 +10,7 @@ impl<'a> ParseCommand<'a> for vRun<'a> {
     fn from_packet(buf: PacketBuf<'a>) -> Option<Self> {
         let body = buf.into_body();
 
-        let mut body = body.splitn_mut(3, |b| *b == b';');
+        let mut body = body.splitn_mut_no_panic(3, |b| *b == b';');
 
         let _first_semi = body.next()?;
         let filename = match decode_hex_buf(body.next()?).ok()? {

diff --git a/src/protocol/commands/breakpoint.rs b/src/protocol/commands/breakpoint.rs
@@ -1,7 +1,5 @@
-pub use crate::common::*;
-pub use crate::protocol::common::hex::{decode_hex, decode_hex_buf};
-pub use crate::protocol::packet::PacketBuf;
-pub use core::convert::{TryFrom, TryInto};
+use crate::protocol::common::hex::{decode_hex, decode_hex_buf};
+use crate::util::no_panic_iter::SliceExt;
 
 // Breakpoint packets are split up like this:
 //
@@ -26,7 +24,7 @@ pub struct BasicBreakpoint<'a> {
 
 impl<'a> BasicBreakpoint<'a> {
     pub fn from_slice(body: &'a mut [u8]) -> Option<BasicBreakpoint<'a>> {
-        let mut body = body.splitn_mut(4, |b| matches!(*b, b',' | b';'));
+        let mut body = body.splitn_mut_no_panic(4, |b| matches!(*b, b',' | b';'));
         let type_ = decode_hex(body.next()?).ok()?;
         let addr = decode_hex_buf(body.next()?).ok()?;
         let kind = decode_hex_buf(body.next()?).ok()?;
@@ -44,7 +42,7 @@ pub struct BytecodeBreakpoint<'a> {
 
 impl<'a> BytecodeBreakpoint<'a> {
     pub fn from_slice(body: &'a mut [u8]) -> Option<BytecodeBreakpoint<'a>> {
-        let mut body = body.splitn_mut(2, |b| *b == b';');
+        let mut body = body.splitn_mut_no_panic(2, |b| *b == b';');
 
         let base = BasicBreakpoint::from_slice(body.next()?)?;
 

diff --git a/src/protocol/common/hex.rs b/src/protocol/common/hex.rs
@@ -8,7 +8,7 @@ pub enum DecodeHexError {
     InvalidOutput,
 }
 
-/// Decode a GDB dex string into the specified integer.
+/// Decode a GDB hex string into the specified integer.
 ///
 /// GDB hex strings may include "xx", which represent "missing" data. This
 /// method simply treats "xx" as 0x00.
@@ -35,7 +35,7 @@ where
     Ok(result)
 }
 
-/// Wrapper around a raw hex string. Enabled "late" calls to `decode` from
+/// Wrapper around a raw hex string. Enables "late" calls to `decode` from
 /// outside the `crate::protocol` module.
 #[derive(Debug, Clone, Copy)]
 pub struct HexString<'a>(pub &'a [u8]);
@@ -64,7 +64,7 @@ fn ascii2byte(c: u8) -> Option<u8> {
     }
 }
 
-/// Check if the byte `c` is a valid GDB hex digit `[0-9][a-f][A-F][xX]`
+/// Check if the byte `c` is a valid GDB hex digit `[0-9a-fA-FxX]`
 #[allow(clippy::match_like_matches_macro)]
 pub fn is_hex(c: u8) -> bool {
     match c {
@@ -81,7 +81,74 @@ pub fn is_hex(c: u8) -> bool {
 /// GDB hex strings may include "xx", which represent "missing" data. This
 /// method simply treats "xx" as 0x00.
 // TODO: maybe don't blindly translate "xx" as 0x00?
-// TODO: rewrite this method to elide bound checks
+#[cfg(not(feature = "paranoid_unsafe"))]
+pub fn decode_hex_buf(base_buf: &mut [u8]) -> Result<&mut [u8], DecodeHexBufError> {
+    use DecodeHexBufError::*;
+
+    if base_buf.is_empty() {
+        return Ok(&mut []);
+    }
+
+    let odd_adust = base_buf.len() % 2;
+    if odd_adust != 0 {
+        base_buf[0] = ascii2byte(base_buf[0]).ok_or(NotAscii)?;
+    }
+
+    let buf = &mut base_buf[odd_adust..];
+
+    let decoded_len = buf.len() / 2;
+    for i in 0..decoded_len {
+        // SAFETY: rustc isn't smart enough to automatically elide these bound checks.
+        //
+        // If buf.len() == 0 or 1: trivially safe, since the for block is never taken
+        // If buf.len() >= 2: the range of values for `i` is 0..(buf.len() / 2 - 1)
+        let (hi, lo, b) = unsafe {
+            (
+                //    (buf.len() / 2 - 1) * 2
+                // == (buf.len() - 2)
+                // since buf.len() is >2, this is in-bounds
+                *buf.get_unchecked(i * 2),
+                //    (buf.len() / 2 - 1) * 2 + 1
+                // == (buf.len() - 1)
+                // since buf.len() is >2, this is in-bounds
+                *buf.get_unchecked(i * 2 + 1),
+                // since buf.len() is >2, (buf.len() / 2 - 1) is always in-bounds
+                buf.get_unchecked_mut(i),
+            )
+        };
+
+        let hi = ascii2byte(hi).ok_or(NotAscii)?;
+        let lo = ascii2byte(lo).ok_or(NotAscii)?;
+        *b = hi << 4 | lo;
+    }
+
+    // SAFETY: rustc isn't smart enough to automatically elide this bound check.
+    //
+    // Consider the different values (decoded_len + odd_adust) can take:
+    //
+    //  buf.len() | (decoded_len + odd_adust)
+    // -----------|---------------------------
+    //      0     | (0 + 0) == 0
+    //      1     | (0 + 1) == 1
+    //      2     | (1 + 0) == 1
+    //      3     | (1 + 1) == 2
+    //      4     | (2 + 0) == 2
+    //      5     | (2 + 1) == 3
+    //
+    // Note that the computed index is always in-bounds.
+    //
+    // If I were still in undergrad, I could probably have whipped up a proper
+    // mathematical proof by induction or whatnot, but hopefully this "proof by
+    // example" ought to suffice.
+    unsafe { Ok(base_buf.get_unchecked_mut(..decoded_len + odd_adust)) }
+}
+
+/// Decode a GDB hex string into a byte slice _in place_.
+///
+/// GDB hex strings may include "xx", which represent "missing" data. This
+/// method simply treats "xx" as 0x00.
+// TODO: maybe don't blindly translate "xx" as 0x00?
+#[cfg(feature = "paranoid_unsafe")]
 pub fn decode_hex_buf(base_buf: &mut [u8]) -> Result<&mut [u8], DecodeHexBufError> {
     use DecodeHexBufError::*;