Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add CStr::with_ptr and deprecate CStr::as_ptr #1642

Closed
wants to merge 4 commits into from

Conversation

durka
Copy link
Contributor

@durka durka commented Jun 6, 2016

@BurntSushi BurntSushi added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Jun 6, 2016
@BurntSushi
Copy link
Member

Is every use of CString::new(...).unwrap().as_ptr() always incorrect?

Are there any uses of as_ptr that are impossible or difficult to express using with_ptr? What if I want to return a *const c_char from a function?

@oli-obk
Copy link
Contributor

oli-obk commented Jun 6, 2016

Is every use of CString::new(...).unwrap().as_ptr() always incorrect?

No. If the entire expression is part of a function argument, then the current rustc makes sure that the drop call happens after the function returns. This is necessary for taking the reference to an owned value. It gets problematic with pointers, because they are just values. I'll do some optimization experiments when I'm back on a PC

@durka
Copy link
Contributor Author

durka commented Jun 6, 2016

Is every use of CString::new(...).unwrap().as_ptr() always incorrect?

No. If the call occurs in an expression, the CString is kept alive until the end of an expression. For example, if it is immediately passed to a function which does not store the pointer, then it is sound. Here's an example showing the drop timing.

Are there any uses of as_ptr that are impossible or difficult to express using with_ptr? What if I want to return a *const c_char from a function?

The Drawbacks section shows how to emulate as_ptr using with_ptr. In particular:

fn get_ptr(s: &CString) -> *const c_char {
    let mut ptr = ptr::null();
    s.with_ptr(|p| { ptr = p; });
    ptr
}

@nagisa
Copy link
Member

nagisa commented Jun 6, 2016

Is every use of CString::new(...).unwrap().as_ptr() always incorrect?

From my watchtower, using the pointer returned by as_ptr here is always UB, regardless of how compiler behaves currently.

@durka
Copy link
Contributor Author

durka commented Jun 6, 2016

@nagisa I would say changing drop order in expressions is a breaking change.

@oli-obk
Copy link
Contributor

oli-obk commented Jun 6, 2016

Not if llvm already does that after optimizations... Which we should definitely figure out

@durka
Copy link
Contributor Author

durka commented Jun 6, 2016

My example prints the same in release configuration.

@matklad
Copy link
Member

matklad commented Jun 6, 2016

Are there any uses of as_ptr that are impossible or difficult to express using with_ptr?

If you have two CStrings, then you'll need two levels of indent, which may be ugly. And if you have Vec<CString>, then you'll need some kind of a recursive function, which might be significantly trickier than a simple loop.

Another alternative is "to mark as_ptr with unsafe", of course with introducing a new name and deprecating the old one.

@durka
Copy link
Contributor Author

durka commented Jun 6, 2016

I guess we could add a function that works on sequences as well. Is working with slices or Vecs of CStrings a common use case?

Another alternative is "to mark as_ptr with unsafe", of course with introducing a new name and deprecating the old one.

The community has been very reluctant to use unsafe to mark areas of potential-but-not-guaranteed misuse. Since you can't cause UB using just as_ptr (without unsafe code somewhere else), I think this is a non-starter. But I'll add it to the alternatives section.

@durka
Copy link
Contributor Author

durka commented Jun 7, 2016

Another alternative possibility (maybe, haven't thought this all the way through) is to deprecate CStr::as_ptr and add CStr::into_ptr which consumes self. Then you can't call it through a deref from CString. This seems like it adds some safety because as long as you've got an explicit CStr then there is non-temporary backing storage somewhere. (not possible because CStr is unsized)

@oli-obk
Copy link
Contributor

oli-obk commented Jun 7, 2016

Well... the true way to do this is to add a (future miri-based) const fn that does the str -> CStr conversion at compile-time and yields a &'static CStr. So we only need to think about the case where the contents of the CString depend on some runtime calculations.

Maybe we could create a method that freezes the CString object, and thus forcing it to be in scope (doesn't help against returning the pointer, but neither does this rfc)

My example prints the same in release configuration.

That's a test, not a guarantee that UB doesn't occur. Especially since a use after free might not even get noticed in such a small example (the memory might not be used again due to other allocations having requirements placing them elsewhere).

Have a look at this example: https://is.gd/AEiXxM

Activating release mode and creating llvm IR shows the following fun IR (first block of the main function):

invoke void @f({}* null)

I forgot what combo caused it, but I've had

invoke void @f({}* undef)

at some point or another.

@matklad
Copy link
Member

matklad commented Jun 7, 2016

Has anybody thought of improving the docs for as_ptr? It might be a good idea to add something like this.


WARNING: make sure that the CString is alive while you are using the returned pointer. In particular, for

let p = let c_to_print = CString::new("Hello, world!").unwrap().as_ptr();

dereferencing p inside an unsafe block or passing it to a C function will almost surely trigger undefined behavior.

Use this instead:

let hello_world = CString::new("Hello, world!").unwrap()
let p = let c_to_print = hello_world.as_ptr();

I would have done this myself, but I am not sure about my English :(

@durka
Copy link
Contributor Author

durka commented Jun 7, 2016

@oli-obk

I'd be very interested to see how you get an undef there. null seems to be correct as the CString::new in that playpen calls ptr::null. In any case I consider it a bug, and only somewhat relevant to the RFC, if LLVM is changing drop orders between debug and release mode. Certainly if undef ever appears that seems like it can't be correct.

@matklad

Docs are a part of the solution, but since all the examples show correct use of as_ptr (keep the CString alive, don't call as_ptr on a temporary) but people do it anyway, I don't think it's the full answer.

@oli-obk
Copy link
Contributor

oli-obk commented Jun 8, 2016

I'd be very interested to see how you get an undef there

Ah, apparently this occurs, if the function doesn't use the argument: https://is.gd/dKLVoH

@durka
Copy link
Contributor Author

durka commented Jun 10, 2016

One also wonders whether str::as_ptr and <[T]>::as_ptr should get the same treatment. That might be more churn than we want, and I bet those methods are less commonly used.

@oli-obk
Copy link
Contributor

oli-obk commented Jun 10, 2016

both str::as_ptr and <[T]>::as_ptr require you to know the length of the object, but there's no way to obtain both the length and the pointer in the same expression without storing the object in a binding.

@durka
Copy link
Contributor Author

durka commented Jun 10, 2016

That's a very good point. So those functions are less prone to misuse.

@bluss
Copy link
Member

bluss commented Jun 10, 2016

From experience in our user help channels, this feels like by far the most common use after free bug that users stumble into. Sometimes they find crashes and come ask because of it, sometimes the issue is found latently. It would be swell if a lint could catch this well, and very much worth it to introduce lints that help users to write sound Rust code in practice.

@starkat99
Copy link

This gets really nasty if you have to work with multiple CStrings at once, either nested or in a vector. I think a strong lint and extra documentation warnings would be good enough to address the issue.

@durka
Copy link
Contributor Author

durka commented Jun 15, 2016

@starkat99 I've added a simple macro to my example for that use case.

@kylewlacy
Copy link

IIUC, there was a time in Rust's history (long before I started using it) where with-style methods were common :)

What about instead using a newtype over *const c_char? Something like:

struct CCharRef<'a> {
    ptr: *const c_char,
    _phantom: PhantomData<&'a [c_char]>
}

impl<'a> Deref for CCharRef<'a> {
    type Target = *const c_char;

    fn deref(&self) -> &*const c_char {
        &self.ptr
    }
}

impl CStr {
    // ...

    #[allow(deprecated)]
    fn as_ref(&self) -> CCharRef {
        CCharRef {
            ptr: self.as_ptr(),
            _phantom: PhantomData
        }
    }
}

Then, the following would be an error:

let s = CString::new(...).unwrap().as_ref();
//      ^~~~~~~~~~~~~~~~~~~~~~~~~~
//      error: borrowed value does not live long enough

(Note that this would still allow let s = *CString::new(...).unwrap().as_ref(), and the CString will still be dropped immediately)

GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Jun 21, 2016
Document `CStr::as_ptr` dangers.

r? @steveklabnik

Hi! I've tried to document `CString::new("hello").unwrap().as_ptr()` footgun. Related [RFC] and the original [discussion].

[RFC]: rust-lang/rfcs#1642
[discussion]: https://users.rust-lang.org/t/you-should-stop-telling-people-that-safe-rust-is-always-safe/6094
@durka
Copy link
Contributor Author

durka commented Jul 7, 2016

There are docs now (thanks @matklad!), but I still think this is a good idea. Any other opinions? It's been quiet for a while. FCP time?

@alexcrichton
Copy link
Member

I think the addition of with_ptr is fine to have, but I don't think that as_ptr should be deprecated. Long ago we used to only have with_ptr and then we added as_ptr in a general trend towards "let's cut down on that rightward drift".

The as_ptr itself is not incorrect 100% of the time and it is much more ergonomic much of the time as well, so I'm not personally convinced that deprecation is warranted for an API like that. There's a downside for leaving it stable, it can go wrong, but there's a downside for deprecating it, the replacement is less ergonomic.

@iliekturtles
Copy link

If accepted I would like to see with_ptr return a value. My main use case is FFI where checking the return value of the foreign function is necessary.

@rkjnsn
Copy link
Contributor

rkjnsn commented Jul 14, 2016

-1 to removing as_ptr, as I find it more ergonomic in many cases. I personally don't think I'd choose to use with_ptr, but have no objection to adding it if other's would find it useful.

I would very much like to see a definitive answer to this unresolved question:

Does f(CString::new(...).unwrap().as_ptr()) actually invoke undefined behavior, if f doesn't store the pointer? The author's reading of the Rust reference implies that the CString temporary is kept alive for the entire expression, so it's fine. However, some commenters in the RFC thread have opined that the behavior of this code is unspecified at best.

Coming from C++, I would expect this to be valid, as temporaries there aren't destroyed until the end of the full expression (after f returns). If this is not the case in Rust, it should be very clearly documented as such.

@alexcrichton
Copy link
Member

The libs team discussed this briefly at the triage meeting yesterday. We felt that there definitely is a real problem here in terms of misusage of as_ptr, but we didn't quite feel that this RFC is the best solution to the problem.

This may be a case where the motivation may want to be fleshed out a little more before the detailed design is tackled. For example what exact cases are we targeted at solving and are we willing to compromise cases unrelated to this? (things like that)

@aturon aturon self-assigned this Jul 25, 2016
@durka
Copy link
Contributor Author

durka commented Aug 26, 2016

Here is another example of people writing unsound code, even though there is a giant warning in the documentation.

@matklad
Copy link
Member

matklad commented Sep 1, 2016

Here is a more tricky example of this issue (for which with_ptr would be uneconomic): https://users.rust-lang.org/t/solved-how-to-export-vec-string-to-c-with-ffi/7121/3?u=matklad

An insane idea: What if dropin CString would overwrite the buffer with b"DEAD MEMORY X_X"?

@durka
Copy link
Contributor Author

durka commented Sep 1, 2016

Yeah, slices of strings have been brought up already as a clear case where
this API would be less ergonomic than the current one. Maybe we could pair
it with another API that works on slices or iterators or something.

Your insane idea doesn't work unless CString also always leaks its
allocation.

On Thu, Sep 1, 2016 at 10:56 AM, Aleksey Kladov notifications@github.com
wrote:

Here is a more tricky example of this issue (for which with_ptr would be
uneconomic): https://users.rust-lang.org/t/solved-how-to-export-vec-
string-to-c-with-ffi/7121/3?u=matklad

An insane idea: What if dropin CString would overwrite the buffer with b"DEAD
MEMORY X_X"?


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
#1642 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AAC3n6Jk9LTFazWhCSlBh40ylpvPEAqLks5qlueigaJpZM4IvNe4
.

@matklad
Copy link
Member

matklad commented Sep 1, 2016

Your insane idea doesn't work unless CString also always leaks its allocation.

The point is not to make this memory safe, but to make this fail at runtime with an easy to google reason. We obviously can't add runtime range checks, but overwriting the buffer with some garbadge will almost certainly lead to the the loud failure, which is much better than silent UB.

And perhaps we can "leak" CStrings in debug mode? We can allocate them from some kind of free list, such that deallocated CStrings are not immediately overwritting by random garbage, and instead are overwritten by our deterministic garbage :)

@matklad
Copy link
Member

matklad commented Sep 4, 2016

Ah, looks like I can't simply

#[cfg(debug_assertions)]
impl Drop for CString {
    fn drop(&mut self) {
        let pattern = b"X_X DEAD MEMORY ";
        let bytes = &mut self.inner[..self.inner.len() - 1];
        for (d, s) in bytes.iter_mut().zip(pattern.iter().cycle()) {
            *d = *s;
        }
    }
}

because the standard library is always build with --release :(

Adding a Drop impl to the release build (even if just returns if debug flag is not set (hm, and is there runtime debug flag at all?)) is a non-starter, isn't it?

which might be convenient but would make it easier to "leak" the pointer (as easy as
`let ptr = s.with_ptr(|p| p);`).

- Does `f(CString::new(...).unwrap().as_ptr())` actually invoke undefined behavior, if `f` doesn't store the pointer? The author's reading of the Rust reference implies that the `CString` temporary is kept alive for the entire expression, so it's fine. However, some commenters in the RFC thread have opined that the behavior of this code is unspecified at best.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Temporaries stay alive at least up to the nearest arena (see https://github.com/nikomatsakis/rust-memory-model/issues/17), and function calls do not create these.

bors added a commit to rust-lang/rust that referenced this pull request Sep 13, 2016
Zero first byte of CString on drop

Hi! This is one more attempt to ameliorate `CString::new("...").unwrap().as_ptr()` problem (related RFC: rust-lang/rfcs#1642).

One of the biggest problems with this code is that it may actually work in practice, so the idea of this PR is to proactively break such invalid code.

Looks like writing a `null` byte at the start of the CString should do the trick, and I think is an affordable cost: zeroing a single byte in `Drop` should be cheap enough compared to actual memory deallocation which would follow.

I would actually prefer to do something like

```Rust
impl Drop for CString {
    fn drop(&mut self) {
        let pattern = b"CTHULHU FHTAGN ";
        let bytes = self.inner[..self.inner.len() - 1];
        for (d, s) in bytes.iter_mut().zip(pattern.iter().cycle()) {
            *d = *s;
        }
    }
}
```

because Cthulhu error should be much easier to google, but unfortunately this would be too expensive in release builds, and we can't implement things `cfg(debug_assertions)` conditionally in stdlib.

Not sure if the whole idea or my implementation (I've used ~~`transmute`~~ `mem::unitialized` to workaround move out of Drop thing) makes sense :)
@durka
Copy link
Contributor Author

durka commented Oct 10, 2016

I'm going to close this as there seems to be significant resistance to adding friction to the CStr::as_ptr interface, and @matklad's PR adds some sort of runtime defense against the bug that as_ptr invites.

@durka durka closed this Oct 10, 2016
@goertzenator
Copy link

Wouldn't implementing AsRef<c_char> for CString and using it instead of as_ptr() solve this problem? &c_char coerces to *const c_char but still gets borrow checked.

use std::ffi::{CString, NulError};
use std::os::raw::c_char;

fn ffi_function(_hello: *const c_char) {}


// minimal wrapper for CString so we can add traits
struct MyCString(CString);
impl MyCString {
    fn new<T: Into<Vec<u8>>>(t: T) -> Result<MyCString, NulError> {
        CString::new(t).map(|c| MyCString(c))
    }
}

impl AsRef<c_char> for MyCString {
    fn as_ref(&self) -> &c_char {
        unsafe{ &*self.0.as_ptr() }
    }
}

fn main() {
    { // as_ptr() undefined behavior case
        let p =   CString::new("some_string").unwrap().as_ptr();
        ffi_function(p);
    }
    { // as_ptr() proper use case
        let s =   CString::new("some_string").unwrap();
        ffi_function(s.as_ptr());
    }
    { // as_ref() undefined behavior case, but borrow checker catches it
        let r = MyCString::new("some_string").unwrap().as_ref();
        ffi_function(r);
    }
    { // as_ptr() proper use case, reference coerces to pointer
        let s = MyCString::new("some_string").unwrap();
        ffi_function(s.as_ref());
    }
}

@eddyb
Copy link
Member

eddyb commented Feb 1, 2017

Ideally, CString and &CStr would be ABI compatible with *const c_char, so we can use them directly.

@oli-obk oli-obk mentioned this pull request Jun 14, 2018
@BatmanAoD
Copy link
Member

I don't quite understand why this issue is closed. Doesn't this invalidate Rust's guarantee that safe-rust cannot have undefined behavior, unless there is an error in an unsafe block somewhere?

@BatmanAoD
Copy link
Member

...upon reflection, it seems that the unsafety is in the passing of a pointer by-value to a function. I realize this would be a breaking change, but perhaps the only way to truly uphold that guarantee would be to require any function that takes a raw pointer as a parameter to be marked unsafe?

@sfackler
Copy link
Member

It's only unsafe to dereference raw pointers, not pass them around.

@BatmanAoD
Copy link
Member

@sfackler Right, but there can be a safe public function f that takes a raw pointer and (unsafely) dereferences it. Since such a function cannot guarantee that the dereference will be safe, the function itself ought to be unsafe, I would think.

That said, when I wrote that comment I was under the impression (based this old comment) that there existed an open function in the standard library that took a raw pointer, but it seems that's not the case.

@sfackler
Copy link
Member

Since such a function cannot guarantee that the dereference will be safe, the function itself ought to be unsafe, I would think.

Yes, that's correct. You can do things with raw pointers that don't involve dereferencing them, though. This function consumes a raw pointer and is totally safe to call with any value:

fn is_pointer_even(x: *mut u8) -> bool {
    x as usize % 2 == 0
}

fn main() {
    is_pointer_even(1 as *mut u8);
}

@BatmanAoD
Copy link
Member

Ah. You're right, of course, but that doesn't seem....particularly useful.

In any case, there aren't any non-unsafe functions in the standard library that consume and dereference a raw pointer, correct?

@sfackler
Copy link
Member

That's correct.

@miried
Copy link

miried commented Oct 19, 2021

Has there been any development on this issue?

The current situation works but it doesn't feel like an elegant solution, as it essentially still allows for dangling pointers.

@matklad
Copy link
Member

matklad commented Oct 19, 2021

@miried the biggest change is that now rustc emits warnings for incorrect usages of CString:

playground

Without re-familiarizing myself with discussion, it'd be cool if &CStr was guaranteed to have the same repr as *const c_char, such that CStr was an ffi-safe type.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.