add CStr::with_ptr and deprecate CStr::as_ptr #1642
Is every use of Are there any uses of |
No. If the entire expression is part of a function argument, then the current rustc makes sure that the drop call happens after the function returns. This is necessary for taking the reference to an owned value. It gets problematic with pointers, because they are just values. I'll do some optimization experiments when I'm back on a PC |
No. If the call occurs in an expression, the
The Drawbacks section shows how to emulate fn get_ptr(s: &CString) -> *const c_char {
let mut ptr = ptr::null();
s.with_ptr(|p| { ptr = p; });
ptr
} |
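For readers who want to try this out, here is a minimal sketch of the proposed method as an extension trait. `WithPtr`, `c_len` and this `get_ptr` are hypothetical names used only for illustration; nothing like this exists in std. The sketch uses the variant that returns the closure's result, which makes the escape hatch even shorter than the mutable-capture version above.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical extension trait standing in for the proposed `CStr::with_ptr`.
trait WithPtr {
    fn with_ptr<R, F: FnOnce(*const c_char) -> R>(&self, f: F) -> R;
}

impl WithPtr for CStr {
    fn with_ptr<R, F: FnOnce(*const c_char) -> R>(&self, f: F) -> R {
        // `self` stays borrowed for the whole call, so the pointer is valid inside `f`.
        f(self.as_ptr())
    }
}

/// Intended use: the pointer is only read inside the closure.
fn c_len(s: &CStr) -> usize {
    s.with_ptr(|p| unsafe { CStr::from_ptr(p).to_bytes().len() })
}

/// The escape hatch: nothing stops the closure from handing the pointer back out.
fn get_ptr(s: &CStr) -> *const c_char {
    s.with_ptr(|p| p)
}

fn main() {
    let s = CString::new("hello").unwrap();
    println!("length seen through with_ptr: {}", c_len(&s));
    println!("escaped pointer equals as_ptr: {}", get_ptr(&s) == s.as_ptr());
}
```

So the closure form makes the misuse more visible, but it does not make it impossible.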
From my watchtower, using the pointer returned by |
@nagisa I would say changing drop order in expressions is a breaking change. |
Not if llvm already does that after optimizations... Which we should definitely figure out |
My example prints the same in release configuration. |
If you have two Another alternative is "to mark as_ptr with unsafe", of course with introducing a new name and deprecating the old one. |
I guess we could add a function that works on sequences as well. Is working with slices or Vecs of CStrings a common use case?
The community has been very reluctant to use |
Well... the true way to do this is to add a (future miri-based) const fn that does the Maybe we could create a method that freezes the
That's a test, not a guarantee that UB doesn't occur. Especially since a use after free might not even get noticed in such a small example (the memory might not be used again due to other allocations having requirements placing them elsewhere). Have a look at this example: https://is.gd/AEiXxM Activating release mode and creating llvm IR shows the following fun IR (first block of the main function):
I forgot what combo caused it, but I've had
at some point or another. |
Has anybody thought of improving the docs for WARNING: make sure that the let p = let c_to_print = CString::new("Hello, world!").unwrap().as_ptr(); dereferencing Use this instead: let hello_world = CString::new("Hello, world!").unwrap()
let p = let c_to_print = hello_world.as_ptr(); I would have done this myself, but I am not sure about my English :( |
I'd be very interested to see how you get an undef there. null seems to be correct as the Docs are a part of the solution, but since all the examples show correct use of |
Ah, apparently this occurs, if the function doesn't use the argument: https://is.gd/dKLVoH |
One also wonders whether |
both |
That's a very good point. So those functions are less prone to misuse. |
From experience in our user help channels, this feels like by far the most common use after free bug that users stumble into. Sometimes they find crashes and come ask because of it, sometimes the issue is found latently. It would be swell if a lint could catch this well, and very much worth it to introduce lints that help users to write sound Rust code in practice. |
This gets really nasty if you have to work with multiple |
@starkat99 I've added a simple macro to my example for that use case. |
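For the multiple-strings case, one pattern that works with today's `as_ptr` is to keep the owning `CString`s and the pointer table in the same struct, so the pointers cannot outlive the strings. This is only a sketch: `CStringArray` and `total_len` are made-up names, and `total_len` stands in for a C function taking a NULL-terminated `char **`.

```rust
use std::ffi::{CStr, CString, NulError};
use std::os::raw::c_char;
use std::ptr;

/// Owns the strings and the pointer table together.
struct CStringArray {
    _owned: Vec<CString>,
    ptrs: Vec<*const c_char>,
}

impl CStringArray {
    fn new(items: &[&str]) -> Result<Self, NulError> {
        let owned = items
            .iter()
            .map(|s| CString::new(*s))
            .collect::<Result<Vec<_>, _>>()?;
        // Each CString's heap buffer stays put when the Vec<CString> is moved.
        let mut ptrs: Vec<*const c_char> = owned.iter().map(|s| s.as_ptr()).collect();
        ptrs.push(ptr::null()); // NULL terminator for the table itself
        Ok(CStringArray { _owned: owned, ptrs })
    }

    fn as_ptr(&self) -> *const *const c_char {
        self.ptrs.as_ptr()
    }

    fn len(&self) -> usize {
        self.ptrs.len() - 1
    }
}

/// Stand-in for a C function: sums the lengths of all strings in the table.
unsafe fn total_len(argv: *const *const c_char) -> usize {
    let mut total = 0;
    let mut i = 0;
    unsafe {
        loop {
            let p = *argv.add(i);
            if p.is_null() {
                break;
            }
            total += CStr::from_ptr(p).to_bytes().len();
            i += 1;
        }
    }
    total
}

fn main() {
    let args = CStringArray::new(&["ls", "-la"]).unwrap();
    let bytes = unsafe { total_len(args.as_ptr()) };
    println!("{} strings, {} bytes", args.len(), bytes);
}
```

The table is only valid while `args` is alive, so the same rule applies one level up: bind it to a variable before calling `as_ptr`.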
IIUC, there was a time in Rust's history (long before I started using it) where What about instead using a newtype over struct CCharRef<'a> {
ptr: *const c_char,
_phantom: PhantomData<&'a [c_char]>
}
impl<'a> Deref for CCharRef<'a> {
type Target = *const c_char;
fn deref(&self) -> &*const c_char {
&self.ptr
}
}
impl CStr {
// ...
#[allow(deprecated)]
fn as_ref(&self) -> CCharRef {
CCharRef {
ptr: self.as_ptr(),
_phantom: PhantomData
}
}
} Then, the following would be an error: let s = CString::new(...).unwrap().as_ref();
// ^~~~~~~~~~~~~~~~~~~~~~~~~~
// error: borrowed value does not live long enough (Note that this would still allow |
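Since an inherent `impl CStr` can't be written outside std, here is a version of the same idea that compiles on its own, using an extension trait. `AsCCharRef`, `as_c_ref` and `first_byte` are hypothetical names chosen for this sketch (the method is not called `as_ref` here to avoid clashing with `AsRef`).

```rust
use std::ffi::{CStr, CString};
use std::marker::PhantomData;
use std::ops::Deref;
use std::os::raw::c_char;

/// Raw pointer tagged with the lifetime of the string it came from.
struct CCharRef<'a> {
    ptr: *const c_char,
    _phantom: PhantomData<&'a [c_char]>,
}

impl<'a> Deref for CCharRef<'a> {
    type Target = *const c_char;
    fn deref(&self) -> &*const c_char {
        &self.ptr
    }
}

/// Hypothetical extension trait; not part of std.
trait AsCCharRef {
    fn as_c_ref(&self) -> CCharRef<'_>;
}

impl AsCCharRef for CStr {
    fn as_c_ref(&self) -> CCharRef<'_> {
        CCharRef { ptr: self.as_ptr(), _phantom: PhantomData }
    }
}

/// Stand-in for an FFI call that reads through the pointer.
fn first_byte(p: *const c_char) -> u8 {
    unsafe { *p as u8 }
}

fn main() {
    let s = CString::new("hello").unwrap();
    let r = s.as_c_ref();
    println!("first byte: {}", first_byte(*r));

    // The borrow checker rejects the temporary version once `bad` is used:
    // let bad = CString::new("hello").unwrap().as_c_ref();
    // first_byte(*bad); // error: temporary value dropped while borrowed
}
```

Note that copying the raw pointer out with `*r` still discards the lifetime, so this narrows the footgun rather than removing it.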
Document `CStr::as_ptr` dangers. r? @steveklabnik Hi! I've tried to document `CString::new("hello").unwrap().as_ptr()` footgun. Related [RFC] and the original [discussion]. [RFC]: rust-lang/rfcs#1642 [discussion]: https://users.rust-lang.org/t/you-should-stop-telling-people-that-safe-rust-is-always-safe/6094
There are docs now (thanks @matklad!), but I still think this is a good idea. Any other opinions? It's been quiet for a while. FCP time? |
I think the addition of The |
If accepted I would like to see |
-1 to removing I would very much like to see a definitive answer to this unresolved question:
Coming from C++, I would expect this to be valid, as temporaries there aren't destroyed until the end of the full expression (after |
The libs team discussed this briefly at the triage meeting yesterday. We felt that there definitely is a real problem here in terms of misusage of This may be a case where the motivation may want to be fleshed out a little more before the detailed design is tackled. For example what exact cases are we targeted at solving and are we willing to compromise cases unrelated to this? (things like that) |
Here is another example of people writing unsound code, even though there is a giant warning in the documentation. |
Here is a more tricky example of this issue (for which An insane idea: What if |
Yeah, slices of strings have been brought up already as a clear case where Your insane idea doesn't work unless CString also always leaks its |
The point is not to make this memory safe, but to make this fail at runtime with an easy to google reason. We obviously can't add runtime range checks, but overwriting the buffer with some garbage will almost certainly lead to a loud failure, which is much better than silent UB. And perhaps we can "leak" CStrings in debug mode? We can allocate them from some kind of free list, such that deallocated CStrings are not immediately overwritten by random garbage, and instead are overwritten by our deterministic garbage :) |
Ah, looks like I can't simply #[cfg(debug_assertions)]
impl Drop for CString {
fn drop(&mut self) {
let pattern = b"X_X DEAD MEMORY ";
let bytes = &mut self.inner[..self.inner.len() - 1];
for (d, s) in bytes.iter_mut().zip(pattern.iter().cycle()) {
*d = *s;
}
}
} because the standard library is always built with Adding a |
which might be convenient but would make it easier to "leak" the pointer (as easy as
`let ptr = s.with_ptr(|p| p);`).

- Does `f(CString::new(...).unwrap().as_ptr())` actually invoke undefined behavior, if `f` doesn't store the pointer? The author's reading of the Rust reference implies that the `CString` temporary is kept alive for the entire expression, so it's fine. However, some commenters in the RFC thread have opined that the behavior of this code is unspecified at best.
Sure. Temporaries stay alive at least up to the nearest arena (see https://github.com/nikomatsakis/rust-memory-model/issues/17), and function calls do not create these.
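The drop order can be observed without touching freed memory by using a type that logs its own drop. This is a small sketch with made-up names (`Noisy`, `callee`, `LOG`), not code from the RFC; the pointer is never dereferenced, so nothing here is UB.

```rust
use std::cell::RefCell;

thread_local! {
    // Records the order of events on the current thread.
    static LOG: RefCell<Vec<&'static str>> = RefCell::new(Vec::new());
}

fn log(event: &'static str) {
    LOG.with(|l| l.borrow_mut().push(event));
}

/// Stand-in for `CString`: hands out a raw pointer and logs when it is dropped.
struct Noisy(u8);

impl Noisy {
    fn as_ptr(&self) -> *const u8 {
        &self.0
    }
}

impl Drop for Noisy {
    fn drop(&mut self) {
        log("drop");
    }
}

/// Stand-in for an FFI function; it never dereferences the pointer.
fn callee(_p: *const u8) {
    log("call");
}

/// Temporary created inside the argument expression.
fn temporary_in_argument() -> Vec<&'static str> {
    LOG.with(|l| l.borrow_mut().clear());
    callee(Noisy(1).as_ptr()); // temporary lives until the end of this statement
    LOG.with(|l| l.borrow().clone())
}

/// Pointer stored in a `let` first.
fn temporary_in_let() -> Vec<&'static str> {
    LOG.with(|l| l.borrow_mut().clear());
    let p = Noisy(1).as_ptr(); // temporary is dropped at the end of this statement
    callee(p); // `p` is already dangling here
    LOG.with(|l| l.borrow().clone())
}

fn main() {
    println!("argument: {:?}", temporary_in_argument());
    println!("let:      {:?}", temporary_in_let());
}
```

In the argument form the call is logged before the drop; in the `let` form the drop comes first, which is exactly the case the RFC is worried about.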
Zero first byte of CString on drop

Hi! This is one more attempt to ameliorate the `CString::new("...").unwrap().as_ptr()` problem (related RFC: rust-lang/rfcs#1642). One of the biggest problems with this code is that it may actually work in practice, so the idea of this PR is to proactively break such invalid code. Writing a NUL byte at the start of the CString should do the trick, and I think it is an affordable cost: zeroing a single byte in `Drop` should be cheap compared to the actual memory deallocation that follows. I would actually prefer to do something like

```Rust
impl Drop for CString {
    fn drop(&mut self) {
        let pattern = b"CTHULHU FHTAGN ";
        // Leave the trailing NUL terminator in place.
        let last = self.inner.len() - 1;
        let bytes = &mut self.inner[..last];
        for (d, s) in bytes.iter_mut().zip(pattern.iter().cycle()) {
            *d = *s;
        }
    }
}
```

because a Cthulhu error should be much easier to google, but unfortunately this would be too expensive in release builds, and we can't implement things conditionally on `cfg(debug_assertions)` in the stdlib. Not sure if the whole idea or my implementation (I've used ~~`transmute`~~ `mem::uninitialized` to work around the move-out-of-Drop thing) makes sense :)
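Outside std, the poisoning idea can be tried on a user-defined wrapper. The sketch below is an illustration under assumptions, not the PR's implementation: `PoisonOnDrop` and `poison` are hypothetical names, and the marker string is borrowed from the earlier comment in this thread. The overwrite logic lives in a plain function so it can be checked on a live buffer.

```rust
const PATTERN: &[u8] = b"X_X DEAD MEMORY ";

/// Overwrite everything except the final byte (the NUL terminator)
/// with a repeating, easy-to-search marker.
fn poison(buf: &mut [u8]) {
    if buf.is_empty() {
        return;
    }
    let last = buf.len() - 1;
    for (d, s) in buf[..last].iter_mut().zip(PATTERN.iter().cycle()) {
        *d = *s;
    }
}

/// Hypothetical stand-in for `CString`: owns a NUL-terminated buffer.
struct PoisonOnDrop {
    inner: Box<[u8]>,
}

impl Drop for PoisonOnDrop {
    fn drop(&mut self) {
        // Only pay the cost in builds with debug assertions enabled.
        if cfg!(debug_assertions) {
            poison(&mut self.inner);
        }
    }
}

fn main() {
    let mut buf = *b"hello\0";
    poison(&mut buf);
    println!("{:?}", String::from_utf8_lossy(&buf));

    // Dropping the wrapper poisons its buffer just before deallocation.
    let s = PoisonOnDrop { inner: b"hello\0".to_vec().into_boxed_slice() };
    drop(s);
}
```

One caveat worth keeping in mind: writes to a buffer that is about to be freed are candidates for dead-store elimination, so an optimizing build may remove them unless something like volatile writes is used.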
I'm going to close this as there seems to be significant resistance to adding friction to the |
Wouldn't implementing use std::ffi::{CString, NulError};
use std::os::raw::c_char;
fn ffi_function(_hello: *const c_char) {}
// minimal wrapper for CString so we can add traits
struct MyCString(CString);
impl MyCString {
fn new<T: Into<Vec<u8>>>(t: T) -> Result<MyCString, NulError> {
CString::new(t).map(|c| MyCString(c))
}
}
impl AsRef<c_char> for MyCString {
fn as_ref(&self) -> &c_char {
unsafe{ &*self.0.as_ptr() }
}
}
fn main() {
{ // as_ptr() undefined behavior case
let p = CString::new("some_string").unwrap().as_ptr();
ffi_function(p);
}
{ // as_ptr() proper use case
let s = CString::new("some_string").unwrap();
ffi_function(s.as_ptr());
}
{ // as_ref() undefined behavior case, but borrow checker catches it
let r = MyCString::new("some_string").unwrap().as_ref();
ffi_function(r);
}
{ // as_ref() proper use case, reference coerces to pointer
let s = MyCString::new("some_string").unwrap();
ffi_function(s.as_ref());
}
} |
Ideally, |
I don't quite understand why this issue is closed. Doesn't this invalidate Rust's guarantee that safe-rust cannot have undefined behavior, unless there is an error in an |
...upon reflection, it seems that the unsafety is in the passing of a pointer by-value to a function. I realize this would be a breaking change, but perhaps the only way to truly uphold that guarantee would be to require any function that takes a raw pointer as a parameter to be marked |
It's only unsafe to dereference raw pointers, not pass them around. |
@sfackler Right, but there can be a safe public function That said, when I wrote that comment I was under the impression (based on this old comment) that there existed an |
Yes, that's correct. You can do things with raw pointers that don't involve dereferencing them, though. This function consumes a raw pointer and is totally safe to call with any value: fn is_pointer_even(x: *mut u8) -> bool {
x as usize % 2 == 0
}
fn main() {
is_pointer_even(1 as *mut u8);
} |
Ah. You're right, of course, but that doesn't seem....particularly useful. In any case, there aren't any non- |
That's correct. |
Has there been any development on this issue? The current situation works but it doesn't feel like an elegant solution, as it essentially still allows for dangling pointers. |
@miried the biggest change is that now rustc emits warnings for incorrect usages of Without re-familiarizing myself with discussion, it'd be cool if |
Spawned from rust-lang/rust#34111 (cc @oli-obk).
Rendered