Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add {String,Vec}::into_raw_parts #65705

Merged
merged 3 commits into from
Oct 25, 2019
Merged

Conversation

shepmaster
Copy link
Member

@shepmaster shepmaster commented Oct 22, 2019

Aspects to address:

@rust-highfive
Copy link
Collaborator

r? @dtolnay

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Oct 22, 2019
@shepmaster shepmaster added C-enhancement Category: An issue proposing an enhancement or a PR with one. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 22, 2019
@shepmaster
Copy link
Member Author

src/liballoc/string.rs Outdated Show resolved Hide resolved
src/liballoc/string.rs Outdated Show resolved Hide resolved
src/liballoc/vec.rs Outdated Show resolved Hide resolved
@rust-highfive
Copy link
Collaborator

Your PR failed (pretty log, raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
2019-10-22T18:43:01.5589883Z ##[command]git remote add origin https://github.com/rust-lang/rust
2019-10-22T18:43:01.5780780Z ##[command]git config gc.auto 0
2019-10-22T18:43:01.5848013Z ##[command]git config --get-all http.https://github.com/rust-lang/rust.extraheader
2019-10-22T18:43:01.5906218Z ##[command]git config --get-all http.proxy
2019-10-22T18:43:01.6047752Z ##[command]git -c http.extraheader="AUTHORIZATION: basic ***" fetch --force --tags --prune --progress --no-recurse-submodules --depth=2 origin +refs/heads/*:refs/remotes/origin/* +refs/pull/65705/merge:refs/remotes/pull/65705/merge

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@shepmaster
Copy link
Member Author

I believe I've addressed the first round of review comments, thanks!

@RalfJung
Copy link
Member

LGTM! But let's wait for someone from @rust-lang/libs to also take a look.

@SimonSapin
Copy link
Contributor

What motivates this? Why is it needed or useful when as_mut_ptr + len + capacity + mem::forget (or maybe ManuallyDrop) already exist? What’s an example of non-trivial code that could use this? How does it affect rust-lang/rfcs#2756, if at all?

@rust-highfive
Copy link
Collaborator

Your PR failed (pretty log, raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
2019-10-22T20:23:01.2949724Z ##[command]git remote add origin https://github.com/rust-lang/rust
2019-10-22T20:23:01.3174449Z ##[command]git config gc.auto 0
2019-10-22T20:23:01.3241444Z ##[command]git config --get-all http.https://github.com/rust-lang/rust.extraheader
2019-10-22T20:23:01.3298086Z ##[command]git config --get-all http.proxy
2019-10-22T20:23:01.3481292Z ##[command]git -c http.extraheader="AUTHORIZATION: basic ***" fetch --force --tags --prune --progress --no-recurse-submodules --depth=2 origin +refs/heads/*:refs/remotes/origin/* +refs/pull/65705/merge:refs/remotes/pull/65705/merge

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@Lokathor
Copy link
Contributor

It's strikingly easy to get vec transmutation wrong. Just using the methods you listed in the wrong order is UB, for example.

So this lets you correctly dismantle a Vec into the parts that you need to use with from_raw_parts.

@scottmcm
Copy link
Member

Personally I like this -- I always have the same sort of wonders as the things that Ralf was commenting about when I do this, and getting things back in the same order as from_raw_parts is a nice push in the right direction to not accidentally put them back weirdly.

@SimonSapin
Copy link
Contributor

Just using the methods you listed in the wrong order is UB, for example.

Could you provide code examples that are UB with an explanation why?

This is the same question as in rust-lang/rfcs#2756 (comment) about Vec::transmute, which as far as I know hasn’t been addressed:

Though the code is not too large, there are a good number of things to get wrong – this solution was iteratively created by soundness-knowledgeable Rustaceans, with multiple wrong attempts.

Could you include text in the RFC that documents this progression? In particular, why is the naive approach unsound, and why is the final result correct?


I’m not entirely opposed to having this method. But it’s not obvious what problem(s) it solves, so I think it’s important to have those problems documented in details (ideally in the method’s doc-comment), so that that knowledge is not just in the collective mind of people who happen to have participated in a previous discussion.

@Lokathor
Copy link
Contributor

I... defer to @RalfJung to answer that one >_>

I do not remember the details well enough to be confident in saying at all.

@RalfJung
Copy link
Member

RalfJung commented Oct 23, 2019

@SimonSapin See rust-lang/rust-clippy#4484 and #64073 for some context. The only blessed way to "transmute" a Vec is to go via its raw parts. With the current API that is clumsy and, under stricter aliasing rules, easy to get wrong (see #62610 for an incorrect use of mem::forget).

The argument "why is this useful when you can already do this right now by calling 5 other functions in a row" would kill the vast majority of recent libstd additions...

Vec::into_raw_parts is the natural counterpart to Box::into_raw. It closes an expressiveness gap in Vec's API when compared with Box. After all, we also don't ask users to do a raw-ptr-cast followed by forgetting the Box.

@SimonSapin
Copy link
Contributor

and, under stricter aliasing rules, easy to get wrong

It’s still not clear to me how exactly things can go wrong or what part would be UB.

Again I’m not arguing against adding this method, only for proper documentation (not just in GitHub threads) of all those issues.

@RalfJung
Copy link
Member

It’s still not clear to me how exactly things can go wrong or what part would be UB.

See #62553 (comment) and #65705 (comment).

@shepmaster
Copy link
Member Author

shepmaster commented Oct 23, 2019

What’s an example of non-trivial code that could use this

The original case started in Deserialize a Vec<Foobar> as Vec directly when Foobar has exactly one field. There, I effectively did:

struct FoobarContainer<T> {
    values: Vec<T>,
}

#[repr(transparent)]
struct Foobar<T> {
    value: T,
}

let input: Vec<Foobar<T>> = unimplemented!();
let output: Vec<T> = mem::transmute(input);

Only by a whim of chance did I think to ask if that was safe and learned it was not (as Vec is repr(Rust)). The safe way of performing such a transmute is by decomposing the parts, casting the pointer, and recomposing them.

That currently looks like:

let mut v: Vec<Foobar<T>> = unimplemented!();

let values = unsafe {
    let data = v.as_mut_ptr() as *mut T;
    let len = v.len();
    let cap = v.capacity();

    std::mem::forget(v);

    Vec::from_raw_parts(data, len, cap)
};

With this new feature, it can look like

let v: Vec<Foobar<T>> = unimplemented!();

let values = unsafe {
    let (data, len, cap) = v.into_raw_parts();
    Vec::from_raw_parts(data as *mut T, len, cap)
};

That specific example is intended to be expressed in the example for Vec::into_raw_parts, using a simpler transmute.

when as_mut_ptr + len + capacity + mem::forget (or maybe ManuallyDrop) already exist

I agree with @RalfJung's points above

  • calling the right methods in the correct order
  • making sure we correctly use mutability and ManuallyDrop (see the two commits besides the one that adds into_raw_parts — the official docs were showing incorrect usage)

How does it affect rust-lang/rfcs#2756, if at all?

They are certainly motivated by similar concerns (I think the RFC may have even been opened due to the Clippy issue). I believe that there's some amount of benefit to both. Vec::transmute would address my original case here, but into_raw_parts would be nice to use for FFI concerns.

If I understand it correctly, Vec::transmute could be implemented in terms of into_raw_parts:

unsafe fn transmute<U>(self) -> Vec<U> {
    let (ptr, len, cap) = self.into_raw_parts();
    let ptr = mem::transmute::<*mut T, *mut U>(ptr);
    Vec::from_raw_parts(ptr, len, cap)
}

Then the original example would be

let mut v: Vec<Foobar<T>> = unimplemented!();
let values: Vec<T> = unsafe { v.transmute() };

Thanks for trying to ensure high-quality for standard library functions!

@SimonSapin
Copy link
Contributor

Alright. I’d like to accept this PR, with a tracking issue and some docs improvements:

  • Describe in prose (not just through examples) the components of the returned tuple. Two of them are usize but they are not interchangeable. This is a case where I wish the language had structural records so we could name those components, but oh well.
  • Explain that ownership of the allocation is transferred to the caller. Maybe some phrasing can be taken from docs for Box::into_raw.
  • Specify that (until Clarify guarantees for Vec and String allocation #65801 is done), from_raw_parts is the only valid way to deallocate or reallocate.

under stricter aliasing rules, easy to get wrong (see #62610 for an incorrect use of mem::forget).

I feel that something about this should maybe be part of the docs, but I’m not sure how to phrase it or if it really belongs there. @RalfJung what do you think?

@Centril
Copy link
Contributor

Centril commented Oct 25, 2019

  • This is a case where I wish the language had structural records so we could name those components, but oh well.

cc rust-lang/rfcs#2584.

@shepmaster
Copy link
Member Author

shepmaster commented Oct 25, 2019

Is it worth adding an example of using the raw parts of the String? My first thought was a case-conversion thingie:

    let ascii_case_difference = b'a' - b'A';
    for idx in 0..len {
        unsafe {
            *ptr.add(idx) -= ascii_case_difference;
        }
    }

But there's make_ascii_uppercase, so that's a bit silly of an example. Maybe just something that bumps letters by 1?

@shepmaster
Copy link
Member Author

Should these functions be associated functions like Box::into_raw?

@Lokathor
Copy link
Contributor

Box does that to avoid deref conflicts, I don't think it's necessary here.

@SimonSapin
Copy link
Contributor

Right, if Foo happens to have a into_raw method then it can also be called on Box<Foo> through auto-deref and we don’t want to shadow it.

Vec<Foo> however derefs to [Foo] which only has a limited set of methods (unless a trait is in scope that adds them, but this is enough of a stretch that we’re less worried about that), non of which is called into_raw_parts.

@SimonSapin
Copy link
Contributor

Looks good, thanks!

@bors r+

@bors
Copy link
Contributor

bors commented Oct 25, 2019

📌 Commit 6600cf6 has been approved by SimonSapin

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Oct 25, 2019
Centril added a commit to Centril/rust that referenced this pull request Oct 25, 2019
Add {String,Vec}::into_raw_parts

Aspects to address:

- [x] Create a tracking issue
  - rust-lang#65816
bors added a commit that referenced this pull request Oct 25, 2019
Rollup of 7 pull requests

Successful merges:

 - #63810 (Make <*const/mut T>::offset_from `const fn`)
 - #65705 (Add {String,Vec}::into_raw_parts)
 - #65749 (Insurance policy in case `iter.size_hint()` lies.)
 - #65799 (Fill tracking issue number for `array_value_iter`)
 - #65800 (self-profiling: Update measureme to 0.4.0 and remove non-RAII methods from profiler.)
 - #65806 (Add [T]::as_ptr_range() and [T]::as_mut_ptr_range().)
 - #65810 (SGX: Clear additional flag on enclave entry)

Failed merges:

r? @ghost
@shepmaster
Copy link
Member Author

Should these functions be associated functions like Box::into_raw?

I was mostly asking for consistency between these conceptually similar ideas, not because of an intrinsic need.

JohnTitor added a commit to JohnTitor/rust that referenced this pull request Oct 25, 2019
Add {String,Vec}::into_raw_parts

Aspects to address:

- [x] Create a tracking issue
  - rust-lang#65816
@RalfJung
Copy link
Member

I feel that something about this should maybe be part of the docs, but I’m not sure how to phrase it or if it really belongs there. @RalfJung what do you think?

(That was re: avoiding mem::forget.) The usual problem is where to put "negative docs" like this. I guess we could write something in the mem::forget docs?
But also, this is all speculative -- beause we don't know what the aliasing model will be, it's better to be conservative.

bors added a commit that referenced this pull request Oct 25, 2019
Rollup of 6 pull requests

Successful merges:

 - #65705 (Add {String,Vec}::into_raw_parts)
 - #65749 (Insurance policy in case `iter.size_hint()` lies.)
 - #65799 (Fill tracking issue number for `array_value_iter`)
 - #65800 (self-profiling: Update measureme to 0.4.0 and remove non-RAII methods from profiler.)
 - #65806 (Add [T]::as_ptr_range() and [T]::as_mut_ptr_range().)
 - #65810 (SGX: Clear additional flag on enclave entry)

Failed merges:

r? @ghost
@bors bors merged commit 6600cf6 into rust-lang:master Oct 25, 2019
@shepmaster shepmaster deleted the vec-into-raw branch October 26, 2019 01:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants