Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ffi/CString: Note that with system allocator, C free() is safe #56295

Closed

Conversation

cgwalters
Copy link
Contributor

This text originated with e20a6db
which was in 2015 when Rust was strongly oriented towards jemalloc.
We now support the system allocator, and it's the default.

Returning strings from Rust to C is much more ergonomic if one can
invoke free() directly, so let's make that clear.

@rust-highfive
Copy link
Collaborator

r? @KodrAus

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 27, 2018
@cgwalters
Copy link
Contributor Author

@federicomenaquintero ⬆️

@sfackler
Copy link
Member

I don't think this is true on windows-msvc at least, right?

@cgwalters
Copy link
Contributor Author

I don't think this is true on windows-msvc at least, right?

Hum...something related to the libcrt or so? Sounds possible...any idea for a Windows expert we could ping for useful phrasing here?

If the system allocator is in use, and the platform is not Windows with MSVC, it is safe to have C invoke free() on the returned pointer.

? A bit awkward...

@sfackler
Copy link
Member

The Windows version of the system allocator doesn't use malloc/free, it uses HeapAlloc/HeapFree.

Beyond that, it seems somewhat dangerous to guarantee this forever even on Unixy systems. We may want to use other APIs on certain platforms in the future (e.g. FreeBSD has some of the jemalloc nonstandard APIs).

@hanna-kruppe
Copy link
Contributor

It seems reasonable to me to at least guarantee that this library code calls into the standard global allocator functions, whatever those may be. However, no library could actually use that knowledge, since library crates don't get to choose the allocator they run under. So only code that is directly part of an application could possibly benefit (and for the rest, the current wording in this PR seems dangerously misleading).

It seems more helpful to eventually support creating CStrings with different allocators, then anyone can allocate a C string with the system allocator and free it in C code. Well, assuming it's guaranteed what functions the System allocator calls -- but I think it should be, since improved compatibility with C code and C tools was one reason for switching to it by default.

@cgwalters
Copy link
Contributor Author

We may want to use other APIs on certain platforms in the future (e.g. FreeBSD has some of the jemalloc nonstandard APIs).

You're thinking of the sized free one?

But the API here returns just a pointer - how would we retain the size? I think the sized dealloc would (could) only come into play in the normal pure-Rust case where it's part of Box or so. This API is unusual in that it's designed to pass to C.

However, no library could actually use that knowledge, since library crates don't get to choose the allocator they run under.

I think general library (pure Rust) crates aren't going to be dealing with CString right? But I can try to tweak the wording here.

I still need to search crates.io more but...if there isn't something like this yet already I plan at some point to extract some of my FFI helpers to a crate and they will be opinionated around only working if the system allocator is in use, because IMO the ergonomic benefits are worth it.

Basically when I was first starting on oxidizing our project I read this page and just found that to be too onerous. In particular we today heavily use __attribute__(cleanup) to free char * strings and having to remember to use a special one for Rust would make the C side even more dangerous.

@hanna-kruppe
Copy link
Contributor

I think general library (pure Rust) crates aren't going to be dealing with CString right? But I can try to tweak the wording here.

There are a lot of C libraries, and their Rust bindings are Rust library crates that call into those libraries via FFI. Just because it's a library doesn't mean it's pure Rust!

This text originated with e20a6db
which was in 2015 when Rust was strongly oriented towards jemalloc.
We now support the system allocator, and it's the default.

Returning strings from Rust to C is much more ergonomic if one can
invoke `free()` directly, so let's make that clear.
@cgwalters cgwalters force-pushed the cstring-system-allocator branch from ea88165 to b921b9f Compare November 27, 2018 23:23
@cgwalters
Copy link
Contributor Author

Just because it's a library doesn't mean it's pure Rust!

Fair.

I updated the patch to add some wording based on discussion so far.

However...I have a new argument which is basically: this PR is documenting a true fact today. And it's likely to quickly become more true.

We've always had a 2x2 matrix generated by (system allocator, jemalloc) x (crate safe with jemalloc, not safe with jemalloc). Of those, only the pair (jemalloc, not safe with jemalloc) is unsafe. Now, before jemalloc was dropped by default, crates which violated this property would (probably) explode fairly quickly.

The defaults are powerful - how many projects change it? Probably not many.

But now that the system allocator is the default...I suspect it probably won't take too long (6 months? A year?) before there are FFI crates which rely on this, even if unintentionally. How many FFI crates will think to enable testing with jemalloc? How many of their users will enable it and have their program blow up? I'd assume popular FFI crates like e.g. openssl will support being used with jemalloc but...as of today:

$ cd ~/src/github/sfackler/rust-openssl/
$ git grep jemalloc
$ 

@Centril Centril added T-lang Relevant to the language team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Nov 28, 2018
@sfackler
Copy link
Member

But the API here returns just a pointer - how would we retain the size?

strlen(the_pointer)?

I'd assume popular FFI crates like e.g. openssl will support being used with jemalloc but...as of today:

What actions would you expect rust-openssl needing to take to be used with jemalloc? That crate supports and has always supported any choice of global allocator.

@hanna-kruppe
Copy link
Contributor

@cgwalters If the situation you describe will come true, it suggests not having plugable allocators at all, i.e. ensuring only the system allocator can ever be used. Suggesting patterns that are known to be busted in the documentation will just make the situation even worse (because FFI code will not just accidentially rely on the system allocator, but also intentionally after checking the docs). However, custom allocators were stabilized recently after years-long development to address very real requirements of many projects, so removing them is pretty much right out, and we're left with accepting that (if you are right) lots of code will only ever be tested by its developers under the system allocator, and can only ask: how do we minimize bugs that sneak in this way? These docs changes are at least not helping with that.

@cgwalters
Copy link
Contributor Author

strlen(the_pointer)?

I am not a malloc expert but...I think the idea of sized dealloc is that it's for cases where the program has the size already at hand e.g. Box<str>. The allocator already can figure out the size from the pointer itself; for large buffers running strlen() I'd be fairly sure would be a pessimization.

What actions would you expect rust-openssl needing to take to be used with jemalloc? That crate supports and has always supported any choice of global allocator.

Tests?

That said...the case we have in our project is primarily C invoking Rust functions. Most FFI crates are the opposite; there, the C ➡️ Rust case happens in callbacks and the like. I know there are C libraries that do this, but it's probably...10% of the cases or less? (Excepting things like GTK+ which are heavily callback/vfunc based).

It may be that openssl for example doesn't have any cases like that, and particularly not ones where it provides a malloc()'d buffer that it expects the user to free() directly.

it suggests not having plugable allocators at all,

Well...I am not going to be the person who fights for that and argues for dropping rustc perf by 7-10% or whatever 😄

and we're left with accepting that (if you are right) lots of code will only ever be tested by its developers under the system allocator, and can only ask: how do we minimize bugs that sneak in this way? These docs changes are at least not helping with that.

I'd reiterate that the doc changes are documenting something that's both true today, and would I believe now be hard to change (and not worth changing). So...can we negotiate to a weaker change that still encourages passing back to Rust but notes the system allocator as a special case?

diff --git a/src/libstd/ffi/c_str.rs b/src/libstd/ffi/c_str.rs
index 0152e9fc2e..2ede59cbd5 100644
--- a/src/libstd/ffi/c_str.rs
+++ b/src/libstd/ffi/c_str.rs
@@ -414,12 +414,14 @@ impl CString {
 
     /// Consumes the `CString` and transfers ownership of the string to a C caller.
     ///
-    /// If the system allocator is in use, it is safe to have C invoke the corresponding
-    /// system-level deallocation function (e.g. `free()` on Unix, `HeapFree` on Windows)
-    /// on the returned pointer.
-    ///
-    /// If you are using a custom allocator, then this value should be returned
-    /// to Rust and reconstituted using [`from_raw`] to be properly deallocated.
+    /// The safe best practice is to then return the pointer back to Rust and invoke [`from_raw`]
+    /// to deallocate it.  Particularly when Rust is configured to use a custom allocator,
+    /// you *must* do this.
+    ///
+    /// If the system allocator is in use (and your code can rely on this), it
+    /// is safe to have C invoke the corresponding system-level deallocation
+    /// function (e.g. `free()` on Unix, `HeapFree` on Windows) on the returned
+    /// pointer.
     ///
     /// As a general rule, best practice is for library crates to not make an assumption
     /// about the global allocator.

?

@hanna-kruppe
Copy link
Contributor

I'd reiterate that the doc changes are documenting something that's both true today, and would I believe now be hard to change (and not worth changing). So...can we negotiate to a weaker change that still encourages passing back to Rust but notes the system allocator as a special case?

Oh I'm not really opposed to these doc changes in principle (but I'm running low on resources to bikeshed the details), I'm just saying if we take the argument in #56295 (comment) seriously, it is not an argument for these changes.

@steveklabnik
Copy link
Member

I am also pretty uncomfortable applying this patch, mostly for reasons others have already stated.

@federicomenaquintero
Copy link
Contributor

I appreciate the spirit of this patch, but I think it's wrong. Please bear with me while I describe what librsvg did about this.

Librsvg also went through a phase of dealing with passing strings from Rust to C, while the code is being Rustified. We ended up just using glib-rs's conventions:

  • Passing strings to C functions as arguments - rust_string.to_glib_none().0 via the Stash mechanism.
  • Returning strings from C-callable functions - rust_string.to_glib_full(). This is guaranteed to use g_malloc(); the C caller can use g_free().
  • Taking C strings as function arguments, borrowed - from_glib_none(c_string). Glib-rs made the decision to make this a copy, since it uses String::from_utf8_lossy(). The "borrowed" here means that the original C string is not freed.
  • Taking ownership of C strings (and freeing them) - from_glib_full(c_string). This does the same as the last one, and calls g_free(c_string).

Librsvg saw a very small performance degradation due to the UTF-8 validating copies. We were able to fix that only because most of the strings it gets from the C side come from libxml2, which already does UTF-8 validation itself. So, librsvg has an unsafe fn utf8_cstr(s: *const c_char) -> &str which uses str::from_utf8_unchecked(), and which is as scary as you can imagine. The library is of course careful to use that only for strings which come from libxml2.

Basically, returning strings ergonomically for both C and Rust is a lost battle. Glib-rs makes it easy to transfer those, plus other types as well.

However, *the rust_gimme_a_string() and rust_free_a_string() functions that result end up going away when the rest of the code is Rustified - they are temporary measures, and everything ends up clean.

@retep998
Copy link
Member

The Windows version of the system allocator doesn't use malloc/free, it uses HeapAlloc/HeapFree.

There is no guarantee by the CRT of what underlying allocator is used for malloc and free. It can even change depending on whether you use the debug CRT or the release CRT. In fact, there are situations where memory allocated by the system allocator cannot be freed by calling HeapFree directly (and vice versa), in particular overaligned allocations.

@Dylan-DPC-zz
Copy link

ping from triage. r? @eddyb

@rust-highfive rust-highfive assigned eddyb and unassigned KodrAus Dec 10, 2018
@eddyb
Copy link
Member

eddyb commented Dec 10, 2018

r? @sfackler cc @RalfJung

@rust-highfive rust-highfive assigned sfackler and unassigned eddyb Dec 10, 2018
@cgwalters
Copy link
Contributor Author

[lots of stuff about strings]

I understand why glib-rs chose their current approach but I think the default allocator switch is actually a major change in Rust.

Basically, returning strings ergonomically for both C and Rust is a lost battle.

My perspective here is probably colored a bit by the fact that I don't see my project being 100% Rust in the foreseeable future (nontrivial external C dependencies, etc.). And if I weigh "require global allocator" versus introducing a new string type to the C side...there's not even a question, we're clearly going to do the former.

That said it's not really worth debating this extensively.

@cgwalters cgwalters closed this Dec 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-lang Relevant to the language team, which will review and decide on the PR/issue. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.