Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CStr::bytes iterator #104353

Merged
merged 2 commits into from
Mar 14, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
91 changes: 91 additions & 0 deletions library/core/src/ffi/c_str.rs
Original file line number Diff line number Diff line change
Expand Up @@ -5,8 +5,11 @@ use crate::error::Error;
use crate::ffi::c_char;
use crate::fmt;
use crate::intrinsics;
use crate::iter::FusedIterator;
use crate::marker::PhantomData;
use crate::ops;
use crate::ptr::addr_of;
use crate::ptr::NonNull;
use crate::slice;
use crate::slice::memchr;
use crate::str;
Expand Down Expand Up @@ -504,6 +507,13 @@ impl CStr {
self.inner.as_ptr()
}

/// We could eventually expose this publicly, if we wanted.
#[inline]
#[must_use]
const fn as_non_null_ptr(&self) -> NonNull<c_char> {
NonNull::from(&self.inner).as_non_null_ptr()
}

Comment on lines +510 to +516
Copy link
Contributor Author

@clarfonthey clarfonthey Mar 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decided to put this because it seems like something we'd probably add in the future, but I don't really feel like doing an ACP for this right now. It felt reasonable enough to factor into its own method, at least.

I thought about why as_ptr uses addr_of! instead of a safe version (since, well, we require a reference to &self to make this work, and thus any potential UB arguments don't make sense) and concluded that we don't need to here, but if I'm wrong about that, please let me know!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you're referring to -- CStr::as_ptr() just calls the slice as_ptr(), which is just a couple pointer casts. I only found addr_of! used in Rc/Arc::as_ptr, and those are digging into a further heap allocation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry, I totally mixed up the implementations for as_ptr and to_bytes_with_nul. The latter uses addr_of to convert the slice into *const [u8] instead of the existing method, but that's going to be unsafe anyway, so, it's fair.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, I don't see any reason it's needed there either, just a little bit shorter than it was with double pointer casts before addr_of!. No matter.

/// Returns the length of `self`. Like C's `strlen`, this does not include the nul terminator.
///
/// > **Note**: This method is currently implemented as a constant-time
Expand Down Expand Up @@ -617,6 +627,26 @@ impl CStr {
unsafe { &*(addr_of!(self.inner) as *const [u8]) }
}

/// Iterates over the bytes in this C string.
///
/// The returned iterator will **not** contain the trailing nul terminator
/// that this C string has.
///
/// # Examples
///
/// ```
/// #![feature(cstr_bytes)]
/// use std::ffi::CStr;
///
/// let cstr = CStr::from_bytes_with_nul(b"foo\0").expect("CStr::from_bytes_with_nul failed");
/// assert!(cstr.bytes().eq(*b"foo"));
/// ```
#[inline]
#[unstable(feature = "cstr_bytes", issue = "112115")]
pub fn bytes(&self) -> Bytes<'_> {
Bytes::new(self)
}

/// Yields a <code>&[str]</code> slice if the `CStr` contains valid UTF-8.
///
/// If the contents of the `CStr` are valid UTF-8 data, this
Expand Down Expand Up @@ -735,3 +765,64 @@ const unsafe fn const_strlen(ptr: *const c_char) -> usize {
intrinsics::const_eval_select((ptr,), strlen_ct, strlen_rt)
}
}

/// An iterator over the bytes of a [`CStr`], without the nul terminator.
///
/// This struct is created by the [`bytes`] method on [`CStr`].
/// See its documentation for more.
///
/// [`bytes`]: CStr::bytes
#[must_use = "iterators are lazy and do nothing unless consumed"]
#[unstable(feature = "cstr_bytes", issue = "112115")]
#[derive(Clone, Debug)]
pub struct Bytes<'a> {
// since we know the string is nul-terminated, we only need one pointer
ptr: NonNull<u8>,
phantom: PhantomData<&'a u8>,
}
impl<'a> Bytes<'a> {
#[inline]
fn new(s: &'a CStr) -> Self {
Self { ptr: s.as_non_null_ptr().cast(), phantom: PhantomData }
}

#[inline]
fn is_empty(&self) -> bool {
// SAFETY: We uphold that the pointer is always valid to dereference
// by starting with a valid C string and then never incrementing beyond
// the nul terminator.
unsafe { self.ptr.read() == 0 }
}
}

#[unstable(feature = "cstr_bytes", issue = "112115")]
impl Iterator for Bytes<'_> {
type Item = u8;

#[inline]
fn next(&mut self) -> Option<u8> {
// SAFETY: We only choose a pointer from a valid C string, which must
// be non-null and contain at least one value. Since we always stop at
// the nul terminator, which is guaranteed to exist, we can assume that
// the pointer is non-null and valid. This lets us safely dereference
// it and assume that adding 1 will create a new, non-null, valid
// pointer.
unsafe {
let ret = self.ptr.read();
if ret == 0 {
None
} else {
self.ptr = self.ptr.offset(1);
Some(ret)
}
}
}

#[inline]
fn size_hint(&self) -> (usize, Option<usize>) {
if self.is_empty() { (0, Some(0)) } else { (1, None) }
}
}

#[unstable(feature = "cstr_bytes", issue = "112115")]
impl FusedIterator for Bytes<'_> {}
Loading