-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrite docs for std::ptr
#49767
Rewrite docs for std::ptr
#49767
Changes from 7 commits
8b8091d
ee259e4
b564c4a
6eceb94
422b616
d7ce9a2
d7209d5
e350ba4
827251e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -959,59 +959,131 @@ extern "rust-intrinsic" { | |
/// value is not necessarily valid to be used to actually access memory. | ||
pub fn arith_offset<T>(dst: *const T, offset: isize) -> *const T; | ||
|
||
/// Copies `count * size_of<T>` bytes from `src` to `dst`. The source | ||
/// and destination may *not* overlap. | ||
/// Copies `count * size_of::<T>()` bytes from `src` to `dst`. The source | ||
/// and destination must *not* overlap. | ||
/// | ||
/// `copy_nonoverlapping` is semantically equivalent to C's `memcpy`. | ||
/// For regions of memory which might overlap, use [`copy`] instead. | ||
/// | ||
/// `copy_nonoverlapping` is semantically equivalent to C's [`memcpy`]. | ||
/// | ||
/// [`copy`]: ./fn.copy.html | ||
/// [`memcpy`]: https://www.gnu.org/software/libc/manual/html_node/Copying-Strings-and-Arrays.html#index-memcpy | ||
/// | ||
/// # Safety | ||
/// | ||
/// Beyond requiring that the program must be allowed to access both regions | ||
/// of memory, it is Undefined Behavior for source and destination to | ||
/// overlap. Care must also be taken with the ownership of `src` and | ||
/// `dst`. This method semantically moves the values of `src` into `dst`. | ||
/// However it does not drop the contents of `dst`, or prevent the contents | ||
/// of `src` from being dropped or used. | ||
/// `copy_nonoverlapping` is unsafe because it dereferences a raw pointer. | ||
/// The caller must ensure that `src` points to a valid sequence of type | ||
/// `T`. | ||
/// | ||
/// # Undefined Behavior | ||
/// | ||
/// Behavior is undefined if any of the following conditions are violated: | ||
/// | ||
/// * The region of memory which begins at `src` and has a length of | ||
/// `count * size_of::<T>()` bytes must be *both* valid and initialized. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It is unclear to me what "valid" means (a single live allocation..?). Also I don't understand why src must be initialized. Where did you get this from? This would make it impossible to implement realloc in a pure rust allocator (which must copy uninitialized memory, since it doesn't know what part of the buffer is initialized). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. "valid" was an attempt to communicate "allocated memory whose access doesn't violate LLVM's pointer aliasing rules" without overwhelming the reader. We could take this stuff out and define valid in the top-level docs (see next comment). My only insight was that, according to the reference, undefined behavior includes "Reads of undef (uninitialized) memory". I guess that as long as one doesn't assume that the value of |
||
/// | ||
/// * The region of memory which begins at `dst` and has a length of | ||
/// `count * size_of::<T>()` bytes must be valid (but may or may not be | ||
/// initialized). | ||
/// | ||
/// * `src` must be properly aligned. | ||
/// | ||
/// * `dst` must be properly aligned. | ||
/// | ||
/// * The two regions of memory must *not* overlap. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This should be at the top, since all the other conditions are just "things that are true of any pointer accesses" (which suggests to me that they can potentially be omitted or centralized in the top-level module documentation and just linked?) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think it would be a good idea to document the requirements for stores and loads at the top-level. My concern was that people coming to the docs from a search engine may not see the top-level documentation. There was also talk in #36450 of specifying alignment requirements for every function in I'd like some clarification on what the precise requirements for loads and stores are regarding initializing memory, and what the precise definition of "valid" memory is. |
||
/// | ||
/// Additionally, if `T` is not [`Copy`](../marker/trait.Copy.html), only | ||
/// the region at `src` *or* the region at `dst` can be used or dropped | ||
/// after calling `copy_nonoverlapping`. `copy_nonoverlapping` creates | ||
/// bitwise copies of `T`, regardless of whether `T: Copy`, which can result | ||
/// in undefined behavior if both copies are used. | ||
/// | ||
/// # Examples | ||
/// | ||
/// A safe swap function: | ||
/// Manually implement [`Vec::append`]: | ||
/// | ||
/// ``` | ||
/// use std::mem; | ||
/// use std::ptr; | ||
/// | ||
/// # #[allow(dead_code)] | ||
/// fn swap<T>(x: &mut T, y: &mut T) { | ||
/// /// Moves all the elements of `src` into `dst`, leaving `src` empty. | ||
/// fn append<T>(dst: &mut Vec<T>, src: &mut Vec<T>) { | ||
/// let src_len = src.len(); | ||
/// let dst_len = dst.len(); | ||
/// | ||
/// // Ensure that `dst` has enough capacity to hold all of `src`. | ||
/// dst.reserve(src_len); | ||
/// | ||
/// unsafe { | ||
/// // Give ourselves some scratch space to work with | ||
/// let mut t: T = mem::uninitialized(); | ||
/// // The call to offset is always safe because `Vec` will never | ||
/// // allocate more than `isize::MAX` bytes. | ||
/// let dst = dst.as_mut_ptr().offset(dst_len as isize); | ||
/// let src = src.as_ptr(); | ||
/// | ||
/// // The two regions cannot overlap becuase mutable references do | ||
/// // not alias, and two different vectors cannot own the same | ||
/// // memory. | ||
/// ptr::copy_nonoverlapping(src, dst, src_len); | ||
/// } | ||
/// | ||
/// // Perform the swap, `&mut` pointers never alias | ||
/// ptr::copy_nonoverlapping(x, &mut t, 1); | ||
/// ptr::copy_nonoverlapping(y, x, 1); | ||
/// ptr::copy_nonoverlapping(&t, y, 1); | ||
/// unsafe { | ||
/// // Truncate `src` without dropping its contents. | ||
/// src.set_len(0); | ||
/// | ||
/// // y and t now point to the same thing, but we need to completely forget `t` | ||
/// // because it's no longer relevant. | ||
/// mem::forget(t); | ||
/// // Notify `dst` that it now holds the contents of `src`. | ||
/// dst.set_len(dst_len + src_len); | ||
/// } | ||
/// } | ||
/// | ||
/// let mut a = vec!['r']; | ||
/// let mut b = vec!['u', 's', 't']; | ||
/// | ||
/// append(&mut a, &mut b); | ||
/// | ||
/// assert_eq!(a, &['r', 'u', 's', 't']); | ||
/// assert!(b.is_empty()); | ||
/// ``` | ||
/// | ||
/// [`Vec::append`]: ../../std/vec/struct.Vec.html#method.append | ||
#[stable(feature = "rust1", since = "1.0.0")] | ||
pub fn copy_nonoverlapping<T>(src: *const T, dst: *mut T, count: usize); | ||
|
||
/// Copies `count * size_of<T>` bytes from `src` to `dst`. The source | ||
/// Copies `count * size_of::<T>()` bytes from `src` to `dst`. The source | ||
/// and destination may overlap. | ||
/// | ||
/// `copy` is semantically equivalent to C's `memmove`. | ||
/// If the source and destination will *never* overlap, | ||
/// [`copy_nonoverlapping`] can be used instead. | ||
/// | ||
/// `copy` is semantically equivalent to C's [`memmove`]. | ||
/// | ||
/// [`copy_nonoverlapping`]: ./fn.copy_nonoverlapping.html | ||
/// [`memmove`]: https://www.gnu.org/software/libc/manual/html_node/Copying-Strings-and-Arrays.html#index-memmove | ||
/// | ||
/// # Safety | ||
/// | ||
/// Care must be taken with the ownership of `src` and `dst`. | ||
/// This method semantically moves the values of `src` into `dst`. | ||
/// However it does not drop the contents of `dst`, or prevent the contents of `src` | ||
/// from being dropped or used. | ||
/// `copy` is unsafe because it dereferences a raw pointer. The caller must | ||
/// ensure that `src` points to a valid sequence of type `T`. | ||
/// | ||
/// # Undefined Behavior | ||
/// | ||
/// Behavior is undefined if any of the following conditions are violated: | ||
/// | ||
/// * The region of memory which begins at `src` and has a length of | ||
/// `count * size_of::<T>()` bytes must be *both* valid and initialized. | ||
/// | ||
/// * The region of memory which begins at `dst` and has a length of | ||
/// `count * size_of::<T>()` bytes must be valid (but may or may not be | ||
/// initialized). | ||
/// | ||
/// * `src` must be properly aligned. | ||
/// | ||
/// * `dst` must be properly aligned. | ||
/// | ||
/// Additionally, if `T` is not [`Copy`], only the region at `src` *or* the | ||
/// region at `dst` can be used or dropped after calling `copy`. `copy` | ||
/// creates bitwise copies of `T`, regardless of whether `T: Copy`, which | ||
/// can result in undefined behavior if both copies are used. | ||
/// | ||
/// [`Copy`]: ../marker/trait.Copy.html | ||
/// | ||
/// # Examples | ||
/// | ||
|
@@ -1028,15 +1100,39 @@ extern "rust-intrinsic" { | |
/// dst | ||
/// } | ||
/// ``` | ||
/// | ||
#[stable(feature = "rust1", since = "1.0.0")] | ||
pub fn copy<T>(src: *const T, dst: *mut T, count: usize); | ||
|
||
/// Invokes memset on the specified pointer, setting `count * size_of::<T>()` | ||
/// bytes of memory starting at `dst` to `val`. | ||
/// Sets `count * size_of::<T>()` bytes of memory starting at `dst` to | ||
/// `val`. | ||
/// | ||
/// `write_bytes` is semantically equivalent to C's [`memset`]. | ||
/// | ||
/// [`memset`]: https://www.gnu.org/software/libc/manual/html_node/Copying-Strings-and-Arrays.html#index-memset | ||
/// | ||
/// # Safety | ||
/// | ||
/// `write_bytes` is unsafe because it dereferences a raw pointer. The | ||
/// caller must ensure that the poiinter points to a valid value of type `T`. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. typo: "poiinter" |
||
/// | ||
/// # Undefined Behavior | ||
/// | ||
/// Behavior is undefined if any of the following conditions are violated: | ||
/// | ||
/// * The region of memory which begins at `dst` and has a length of | ||
/// `count` bytes must be valid. | ||
/// | ||
/// * `dst` must be properly aligned. | ||
/// | ||
/// Additionally, the caller must ensure that writing `count` bytes to the | ||
/// given region of memory results in a valid value of `T`. Creating an | ||
/// invalid value of `T` can result in undefined behavior. An example is | ||
/// provided below. | ||
/// | ||
/// # Examples | ||
/// | ||
/// Basic usage: | ||
/// | ||
/// ``` | ||
/// use std::ptr; | ||
/// | ||
|
@@ -1047,6 +1143,23 @@ extern "rust-intrinsic" { | |
/// } | ||
/// assert_eq!(vec, [b'a', b'a', 0, 0]); | ||
/// ``` | ||
/// | ||
/// Creating an invalid value: | ||
/// | ||
/// ```no_run | ||
/// use std::{mem, ptr}; | ||
/// | ||
/// let mut v = Box::new(0i32); | ||
/// | ||
/// unsafe { | ||
/// // Leaks the previously held value by overwriting the `Box<T>` with | ||
/// // a null pointer. | ||
/// ptr::write_bytes(&mut v, 0, mem::size_of::<Box<i32>>()); | ||
/// } | ||
/// | ||
/// // At this point, using or dropping `v` results in undefined behavior. | ||
/// // v = Box::new(0i32); // ERROR | ||
/// ``` | ||
#[stable(feature = "rust1", since = "1.0.0")] | ||
pub fn write_bytes<T>(dst: *mut T, val: u8, count: usize); | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the role of having both
Safety
andUndefined Behavior
sections. It seems to me that the convention is to have a single section namedSafety
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(granted, I often wish the
# Safety
sections tended to more closely resemble your# Undefined Behavior
sections, but that is just my opinion)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@steveklabnik mentioned that new docs should make use of
Undefined Behavior
(I think upper case is correct). All these functions are trivially unsafe because they dereference raw pointers, and theSafety
section as I have it doesn't add any new information. Perhaps it could be moved to the module docs and not repeated everywhere?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I would favor a change like you just mentioned, but will also wait to see what Steve says. 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally, I felt like these two cases were pretty distinct, but now, re-thinking about it, I'm not so sure, honestly. In some senses, UB is a subset of safety, and it feels like it's worth calling out separately, but maybe not. dunno.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused; I thought UB was the consequence of not following Safety. Is there a kind of unsafety that doesn't lead to UB?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some background on my thought process:
While writing this, I took the point of view that
Safety
referred to the language feature (i.e. the subset of rust you must opt into with theunsafe
keyword). Some functions (e.g.Vec::set_len
) are not obviously unsafe from looking at their signature. TheSafety
section is probably a good place to discuss this.In discussing why a function is
unsafe
, one must communicate what invariants must be maintained to avoid UB. Most of the standard library uses theSafety
heading to communicate this. I am now of the opinion that there's never really a reason to have both of these top-level headings (as I do now) since the information is so closely related.However, for functions whose unsafety is obvious because they take a raw pointer and have names like
read
,write
, andcopy
, it seemed clearer to name the section discussing its invariantsUndefined Behavior
.RawVec::from_raw_parts
also uses this convention.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@scottmcm "safety" is "memory safety", there are kinds of UB that are not memory safety related.