Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue for CStr::from_bytes_with_nul #31190

Closed
alexcrichton opened this issue Jan 25, 2016 · 9 comments
Closed

Tracking issue for CStr::from_bytes_with_nul #31190

alexcrichton opened this issue Jan 25, 2016 · 9 comments
Labels
B-unstable Blocker: Implemented in the nightly compiler and unstable. final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@alexcrichton
Copy link
Member

Being added in #30614, this'll track the unstable status.

  • Are these the right names for the functions?
  • Do we think they'll be generally useful?
  • etc.
@alexcrichton alexcrichton added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. B-unstable Blocker: Implemented in the nightly compiler and unstable. labels Jan 25, 2016
@kamalmarhubi
Copy link
Contributor

I'm writing a bunch of systems code, and would like this function, along with a couple of related trait impls I forget right now.

One thing that jumps to mind: should the return be Option, or a Result with something akin NulError for CString? There's mention of an error at the start of #30614, and the analog in rust-lang/rfcs#1264 also has a Result.

@CasualX
Copy link

CasualX commented Apr 24, 2016

I've just bumped into this function and I'd like to add a consideration:

In C/C++ it happens that you get a fixed size buffer eg char strbuf[128]; which may be filled up to the brim with characters or ended earlier with a nul terminator. You'd manipulate these buffers with the strn* functions.

These days it is not encouraged to do this, but there's plenty of legacy stuff that uses it. In such a case interior nul bytes or not even having an ending nul is not a bug, but expected behaviour.

In this case it would be desired to have a function that can convert these into &CStr instead of returning an error.

Is there even a use case for when you have a byte slice (char pointer + length) and want it to contain exactly a nul terminated C string? I can only think of unbounded, nul terminated strings (covered by CStr::from_ptr) and fixed length char buffers containing a C string of max length.

Fixed length char buffers with optional nul terminator can't ever be converted to &CStr exactly because of its lacking nul terminator, but it's still desirable to convert them to &str...

@arcnmx
Copy link
Contributor

arcnmx commented Apr 24, 2016

@CasualX what you describe is not really a C string if it is not null terminted, and should not be used as one. Rust's CStr type is essentially "a collection of (non-zero) bytes terminated by a zero byte" and not keeping to that rule results in memory unsafety. A [u8] or [c_char] seems more appropriate for your case, with perhaps a helper function that will construct the smaller slice if it hits an early zero (or maybe a Result<&CStr, &[u8]> or even a Cow<CStr> depending on what you want to do with it?)

Is there even a use case for when you have a byte slice (char pointer + length) and want it to contain exactly a nul terminated C string?

The function generally exists for data coming from a Rust application to be used in an FFI context, its main advantages being:

  1. Accepts a native rust slice, as opposed to CStr::from_ptr(data.as_ptr() as *const libc::c_char)
  2. The unsafe variant carries the additional benefit of avoiding the length check currently associated with from_ptr.

Take the implementation of Deref for CString for example; it uses transmute because it knows the underlying representation of &CStr. Stable code can't rely on that (and in fact the representation may possibly change to a single *mut c_char in the future). It comes up when creating strings for FFI purposes. Situations like creating types similar to CString, or maybe you're avoiding heap allocations and just can't use it, or whatever else.

It's also useful for other situations too though, like your fixed-size character buffer example. To convert it to a &CStr, you need to do the null byte check yourself, so you already know the length of the string at that point; which gives you a nice slice to then create a &CStr from.

@SimonSapin
Copy link
Contributor

@CasualX What you describe sounds like a different abstraction from CStr (which is "nul-terminated with no interior nul"). I’d recommend defining a separate struct to build your own abstraction. CStr like the rest of the standard library is written in Rust, you can see it here:

https://github.com/rust-lang/rust/blob/master/src/libstd/ffi/c_str.rs

@CasualX
Copy link

CasualX commented Apr 25, 2016

@SimonSapin

Ah I interpreted Rust's CStr as 'be compatible with how C treats strings', but its scope is restricted to just 'nul terminated char arrays'.

I came here because in reading some file formats there's quite a lot of 'fixed length char buffers containing a maybe nul terminated C string' for which I was looking how to easily deal with. A simple scan for nul byte should do the trick, thanks.

I think it would be better to mimic the std::str::from_utf8 API (ie, a free function):

fn from_char_buf(buf: &[c_char]) -> &[u8] {
    let len = buf.iter()
        .enumerate()
        .find(|&(_, &byte)| byte == 0)
        .map_or_else(|| buf.len(), |(len, _)| len);
    &buf[..len]
}

But I digress as this is off-topic, to atone for bringing up someting unrelated have some on-topic thoughts:

@arcnmx

Ok I see, this API exposes the CString -> &CStr as a constructor of &CStr where you don't start out with a CString. This implies that the use case of this API specifically is 'you have something that can't be turned into a CString but you wish to create a &CStr from regardless.

You suggest two use cases:

  • Creating your own CString-like abstraction, in that case it should be covered by CString::new where it consumes your custom container.
  • Creating &CStr without heap allocations, essentially a custom container type that uses a fixed-size u8 array for backing which seems reasonable.

(My use case wouldn't be covered anyway, assuming the upfront length calculation really does go away at some point)

The docs specifically say that the upfront length calculation is an implementation detail that is desired to be changed. Using this as an argument is essentially cementing in the API that the length check is done ahead of time.

As for the API the checked version it should most certainly return a Result<&CStr, NulError> to mimic how CString::new behaves.

The names mirror their cousins in CString and matches the convention used for this sort of thing throughout the stdlib.

@arcnmx
Copy link
Contributor

arcnmx commented Apr 25, 2016

I think it would be better to mimic the std::str::from_utf8 API (ie, a free function):

Yup, that probably works best. A &[u8] newtype wrapper that Derefs to a slice on an early nil byte would also be appropriate. The only place &CStr comes in here is if you wanted to then pass that string on to something like open(2), where you'd need a Cow<CStr> or something similar to ensure that it indeed is nil-terminated.

File formats will do this fixed-size character buffer thing, but it's more a storage thing and a way to avoid a variable-length encoding rather than a char* meant to be used as a c string anywhere.

Creating your own CString-like abstraction, in that case it should be covered by CString::new where it consumes your custom container.

Hm? It's not... The point is if you need a similar abstraction to CString but can't actually use it, you don't have access to the same facilities that the standard library does to perform a straightforward borrow of &CStr.

The docs specifically say that the upfront length calculation is an implementation detail that is desired to be changed. Using this as an argument is essentially cementing in the API that the length check is done ahead of time.

Yes, that's why I called it a bonus. It's not an argument in favour of stabilizing, but it is a property of the function that probably won't disappear for maybe a year out from now. These kinds of implementation details are what unstable is for, and if its other merits aren't enough to stabilize it, one could argue not to deprecate or remove it until the length check is no more.

I also forgot to mention the most important argument for the function: it retains lifetime information by borrowing the underlying slice, whereas from_ptr will just return the most permissive lifetime needed to satisfy the borrow checker, potentially resulting in memory unsafety and a use after free, etc. Accidentally allowing .as_ptr() to escape the lifetime of its original value is a pretty common pitfall in FFI code that this constructor function helps to avoid.

As for the API the checked version it should most certainly return a Result<&CStr, NulError> to mimic how CString::new behaves.

Agreed. We can't use the name NulError though as it's already taken (and assumes CString, since it has an into_vec method). We'll need another name for it!

@CasualX
Copy link

CasualX commented Apr 25, 2016

If a name is what you want, I suggest NulErr ;)

@alexcrichton
Copy link
Member Author

🔔 This issue is now entering its cycle-long final comment period for stabilization 🔔

The libs team decided, however, that we should probably return a Result here instead of an Option

@alexcrichton alexcrichton added final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. and removed I-nominated labels Apr 29, 2016
@alexcrichton
Copy link
Member Author

The libs team discussed this issue during triage yesterday and the decision was to stabilize

alexcrichton added a commit to alexcrichton/rust that referenced this issue May 24, 2016
This commit applies the FCP decisions made by the libs team for the 1.10 cycle,
including both new stabilizations and deprecations. Specifically, the list of
APIs is:

Stabilized:

* `os::windows::fs::OpenOptionsExt::access_mode`
* `os::windows::fs::OpenOptionsExt::share_mode`
* `os::windows::fs::OpenOptionsExt::custom_flags`
* `os::windows::fs::OpenOptionsExt::attributes`
* `os::windows::fs::OpenOptionsExt::security_qos_flags`
* `os::unix::fs::OpenOptionsExt::custom_flags`
* `sync::Weak::new`
* `Default for sync::Weak`
* `panic::set_hook`
* `panic::take_hook`
* `panic::PanicInfo`
* `panic::PanicInfo::payload`
* `panic::PanicInfo::location`
* `panic::Location`
* `panic::Location::file`
* `panic::Location::line`
* `ffi::CStr::from_bytes_with_nul`
* `ffi::CStr::from_bytes_with_nul_unchecked`
* `ffi::FromBytesWithNulError`
* `fs::Metadata::modified`
* `fs::Metadata::accessed`
* `fs::Metadata::created`
* `sync::atomic::Atomic{Usize,Isize,Bool,Ptr}::compare_exchange`
* `sync::atomic::Atomic{Usize,Isize,Bool,Ptr}::compare_exchange_weak`
* `collections::{btree,hash}_map::{Occupied,Vacant,}Entry::key`
* `os::unix::net::{UnixStream, UnixListener, UnixDatagram, SocketAddr}`
* `SocketAddr::is_unnamed`
* `SocketAddr::as_pathname`
* `UnixStream::connect`
* `UnixStream::pair`
* `UnixStream::try_clone`
* `UnixStream::local_addr`
* `UnixStream::peer_addr`
* `UnixStream::set_read_timeout`
* `UnixStream::set_write_timeout`
* `UnixStream::read_timeout`
* `UnixStream::write_Timeout`
* `UnixStream::set_nonblocking`
* `UnixStream::take_error`
* `UnixStream::shutdown`
* Read/Write/RawFd impls for `UnixStream`
* `UnixListener::bind`
* `UnixListener::accept`
* `UnixListener::try_clone`
* `UnixListener::local_addr`
* `UnixListener::set_nonblocking`
* `UnixListener::take_error`
* `UnixListener::incoming`
* RawFd impls for `UnixListener`
* `UnixDatagram::bind`
* `UnixDatagram::unbound`
* `UnixDatagram::pair`
* `UnixDatagram::connect`
* `UnixDatagram::try_clone`
* `UnixDatagram::local_addr`
* `UnixDatagram::peer_addr`
* `UnixDatagram::recv_from`
* `UnixDatagram::recv`
* `UnixDatagram::send_to`
* `UnixDatagram::send`
* `UnixDatagram::set_read_timeout`
* `UnixDatagram::set_write_timeout`
* `UnixDatagram::read_timeout`
* `UnixDatagram::write_timeout`
* `UnixDatagram::set_nonblocking`
* `UnixDatagram::take_error`
* `UnixDatagram::shutdown`
* RawFd impls for `UnixDatagram`
* `{BTree,Hash}Map::values_mut`
* `<[_]>::binary_search_by_key`

Deprecated:

* `StaticCondvar` - this, and all other static synchronization primitives
                    below, are usable today through the lazy-static crate on
                    stable Rust today. Additionally, we'd like the non-static
                    versions to be directly usable in a static context one day,
                    so they're unlikely to be the final forms of the APIs in any
                    case.
* `CONDVAR_INIT`
* `StaticMutex`
* `MUTEX_INIT`
* `StaticRwLock`
* `RWLOCK_INIT`
* `iter::Peekable::is_empty`

Closes rust-lang#27717
Closes rust-lang#27720
cc rust-lang#27784 (but encode methods still exist)
Closes rust-lang#30014
Closes rust-lang#30425
Closes rust-lang#30449
Closes rust-lang#31190
Closes rust-lang#31399
Closes rust-lang#31767
Closes rust-lang#32111
Closes rust-lang#32281
Closes rust-lang#32312
Closes rust-lang#32551
Closes rust-lang#33018
bors added a commit that referenced this issue May 26, 2016
std: Stabilize APIs for the 1.10 release

This commit applies the FCP decisions made by the libs team for the 1.10 cycle,
including both new stabilizations and deprecations. Specifically, the list of
APIs is:

Stabilized:

* `os::windows::fs::OpenOptionsExt::access_mode`
* `os::windows::fs::OpenOptionsExt::share_mode`
* `os::windows::fs::OpenOptionsExt::custom_flags`
* `os::windows::fs::OpenOptionsExt::attributes`
* `os::windows::fs::OpenOptionsExt::security_qos_flags`
* `os::unix::fs::OpenOptionsExt::custom_flags`
* `sync::Weak::new`
* `Default for sync::Weak`
* `panic::set_hook`
* `panic::take_hook`
* `panic::PanicInfo`
* `panic::PanicInfo::payload`
* `panic::PanicInfo::location`
* `panic::Location`
* `panic::Location::file`
* `panic::Location::line`
* `ffi::CStr::from_bytes_with_nul`
* `ffi::CStr::from_bytes_with_nul_unchecked`
* `ffi::FromBytesWithNulError`
* `fs::Metadata::modified`
* `fs::Metadata::accessed`
* `fs::Metadata::created`
* `sync::atomic::Atomic{Usize,Isize,Bool,Ptr}::compare_exchange`
* `sync::atomic::Atomic{Usize,Isize,Bool,Ptr}::compare_exchange_weak`
* `collections::{btree,hash}_map::{Occupied,Vacant,}Entry::key`
* `os::unix::net::{UnixStream, UnixListener, UnixDatagram, SocketAddr}`
* `SocketAddr::is_unnamed`
* `SocketAddr::as_pathname`
* `UnixStream::connect`
* `UnixStream::pair`
* `UnixStream::try_clone`
* `UnixStream::local_addr`
* `UnixStream::peer_addr`
* `UnixStream::set_read_timeout`
* `UnixStream::set_write_timeout`
* `UnixStream::read_timeout`
* `UnixStream::write_Timeout`
* `UnixStream::set_nonblocking`
* `UnixStream::take_error`
* `UnixStream::shutdown`
* Read/Write/RawFd impls for `UnixStream`
* `UnixListener::bind`
* `UnixListener::accept`
* `UnixListener::try_clone`
* `UnixListener::local_addr`
* `UnixListener::set_nonblocking`
* `UnixListener::take_error`
* `UnixListener::incoming`
* RawFd impls for `UnixListener`
* `UnixDatagram::bind`
* `UnixDatagram::unbound`
* `UnixDatagram::pair`
* `UnixDatagram::connect`
* `UnixDatagram::try_clone`
* `UnixDatagram::local_addr`
* `UnixDatagram::peer_addr`
* `UnixDatagram::recv_from`
* `UnixDatagram::recv`
* `UnixDatagram::send_to`
* `UnixDatagram::send`
* `UnixDatagram::set_read_timeout`
* `UnixDatagram::set_write_timeout`
* `UnixDatagram::read_timeout`
* `UnixDatagram::write_timeout`
* `UnixDatagram::set_nonblocking`
* `UnixDatagram::take_error`
* `UnixDatagram::shutdown`
* RawFd impls for `UnixDatagram`
* `{BTree,Hash}Map::values_mut`
* `<[_]>::binary_search_by_key`

Deprecated:

* `StaticCondvar` - this, and all other static synchronization primitives
                    below, are usable today through the lazy-static crate on
                    stable Rust today. Additionally, we'd like the non-static
                    versions to be directly usable in a static context one day,
                    so they're unlikely to be the final forms of the APIs in any
                    case.
* `CONDVAR_INIT`
* `StaticMutex`
* `MUTEX_INIT`
* `StaticRwLock`
* `RWLOCK_INIT`
* `iter::Peekable::is_empty`

Closes #27717
Closes #27720
Closes #30014
Closes #30425
Closes #30449
Closes #31190
Closes #31399
Closes #31767
Closes #32111
Closes #32281
Closes #32312
Closes #32551
Closes #33018
alexcrichton added a commit to alexcrichton/rust that referenced this issue May 26, 2016
This commit applies the FCP decisions made by the libs team for the 1.10 cycle,
including both new stabilizations and deprecations. Specifically, the list of
APIs is:

Stabilized:

* `os::windows::fs::OpenOptionsExt::access_mode`
* `os::windows::fs::OpenOptionsExt::share_mode`
* `os::windows::fs::OpenOptionsExt::custom_flags`
* `os::windows::fs::OpenOptionsExt::attributes`
* `os::windows::fs::OpenOptionsExt::security_qos_flags`
* `os::unix::fs::OpenOptionsExt::custom_flags`
* `sync::Weak::new`
* `Default for sync::Weak`
* `panic::set_hook`
* `panic::take_hook`
* `panic::PanicInfo`
* `panic::PanicInfo::payload`
* `panic::PanicInfo::location`
* `panic::Location`
* `panic::Location::file`
* `panic::Location::line`
* `ffi::CStr::from_bytes_with_nul`
* `ffi::CStr::from_bytes_with_nul_unchecked`
* `ffi::FromBytesWithNulError`
* `fs::Metadata::modified`
* `fs::Metadata::accessed`
* `fs::Metadata::created`
* `sync::atomic::Atomic{Usize,Isize,Bool,Ptr}::compare_exchange`
* `sync::atomic::Atomic{Usize,Isize,Bool,Ptr}::compare_exchange_weak`
* `collections::{btree,hash}_map::{Occupied,Vacant,}Entry::key`
* `os::unix::net::{UnixStream, UnixListener, UnixDatagram, SocketAddr}`
* `SocketAddr::is_unnamed`
* `SocketAddr::as_pathname`
* `UnixStream::connect`
* `UnixStream::pair`
* `UnixStream::try_clone`
* `UnixStream::local_addr`
* `UnixStream::peer_addr`
* `UnixStream::set_read_timeout`
* `UnixStream::set_write_timeout`
* `UnixStream::read_timeout`
* `UnixStream::write_Timeout`
* `UnixStream::set_nonblocking`
* `UnixStream::take_error`
* `UnixStream::shutdown`
* Read/Write/RawFd impls for `UnixStream`
* `UnixListener::bind`
* `UnixListener::accept`
* `UnixListener::try_clone`
* `UnixListener::local_addr`
* `UnixListener::set_nonblocking`
* `UnixListener::take_error`
* `UnixListener::incoming`
* RawFd impls for `UnixListener`
* `UnixDatagram::bind`
* `UnixDatagram::unbound`
* `UnixDatagram::pair`
* `UnixDatagram::connect`
* `UnixDatagram::try_clone`
* `UnixDatagram::local_addr`
* `UnixDatagram::peer_addr`
* `UnixDatagram::recv_from`
* `UnixDatagram::recv`
* `UnixDatagram::send_to`
* `UnixDatagram::send`
* `UnixDatagram::set_read_timeout`
* `UnixDatagram::set_write_timeout`
* `UnixDatagram::read_timeout`
* `UnixDatagram::write_timeout`
* `UnixDatagram::set_nonblocking`
* `UnixDatagram::take_error`
* `UnixDatagram::shutdown`
* RawFd impls for `UnixDatagram`
* `{BTree,Hash}Map::values_mut`
* `<[_]>::binary_search_by_key`

Deprecated:

* `StaticCondvar` - this, and all other static synchronization primitives
                    below, are usable today through the lazy-static crate on
                    stable Rust today. Additionally, we'd like the non-static
                    versions to be directly usable in a static context one day,
                    so they're unlikely to be the final forms of the APIs in any
                    case.
* `CONDVAR_INIT`
* `StaticMutex`
* `MUTEX_INIT`
* `StaticRwLock`
* `RWLOCK_INIT`
* `iter::Peekable::is_empty`

Closes rust-lang#27717
Closes rust-lang#27720
cc rust-lang#27784 (but encode methods still exist)
Closes rust-lang#30014
Closes rust-lang#30425
Closes rust-lang#30449
Closes rust-lang#31190
Closes rust-lang#31399
Closes rust-lang#31767
Closes rust-lang#32111
Closes rust-lang#32281
Closes rust-lang#32312
Closes rust-lang#32551
Closes rust-lang#33018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
B-unstable Blocker: Implemented in the nightly compiler and unstable. final-comment-period In the final comment period and will be merged soon unless new substantive objections are raised. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants