Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lint against instantly-dangling pointers like String::with_capacity(MAX_PATH).as_mut_ptr() #123613

Closed
FlorentinoJink opened this issue Apr 8, 2024 · 11 comments · Fixed by #128985
Assignees
Labels
A-lints Area: Lints (warnings about flaws in source code) such as unused_mut. C-discussion Category: Discussion or questions that doesn't represent real issues.

Comments

@FlorentinoJink
Copy link

FlorentinoJink commented Apr 8, 2024

Issue Description

The following Rust code:

const MAX_PATH: usize = 260;
fn main() {
    let str1 = String::with_capacity(MAX_PATH).as_mut_ptr();
    let str2 = String::from("TotototototototototototototototototoT").as_mut_ptr();
    unsafe {
        std::ptr::copy_nonoverlapping(str2, str1, 30);
        println!("{:?}", String::from_raw_parts(str1,30,30));
    }
}

uses the String::with_capacity function to create a string with a capacity of 260 characters. It then uses the as_mut_ptr method to get a raw pointer to the string.

However, this approach can lead to undefined behavior because:

The String::with_capacity function only guarantees to allocate enough memory to store the specified number of characters. It does not guarantee that the allocated memory is valid.
The as_mut_ptr method returns a raw pointer to the internal data of the string. This pointer may point to uninitialized memory or memory that has been invalidated by other operations.
Therefore, using the String::with_capacity(MAX_PATH).as_mut_ptr() method to create a raw pointer can lead to the following problems
-Program crashes
-Data corruption
-Security vulnerabilities

Expected Behavior
The String::with_capacity function should not allow the creation of a raw pointer to uninitialized memory.

Similar to String::from("x").as_mut_ptr(), such raw pointers should not be compiled successfully.

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Apr 8, 2024
@workingjubilee
Copy link
Member

@FlorentinoJink The problem is not truly with_capacity or as_mut_ptr but your subsequent usage of copy_nonoverlapping and from_raw_parts on an invalid pointer. The program is perfectly sound until that point. Yes, invalid pointers are fairly useless, but they are often created for various reasons.

More unfortunately, a raw pointer does not have a lifetime. This is a fundamental reality of Rust. Code that requires eschewing lifetimes needs that. And you have chosen to eschew lifetimes, and then ignore the invariants of ptr::copy_nonoverlapping and String::from_raw_parts, which state that they require valid pointers as input. There are many functions on pointers that do not require valid pointers and in fact can be quite useful without them (though most of the uses I can immediately think of have now been superceded by offset_of!).

It is indeed an undesirable API but I do not see how we can realistically require it to not compile anymore, as this code, for instance, is perfectly reasonable, and also has no lifetime bound, for the same reason:

fn main() {
    let str1_len = String::from("does not live to the next line").len();
}

And you seem to claim this code does not compile? Or perhaps that was a "should not"?

fn main() {
    let str1 = String::from("x").as_mut_ptr();
}

Unfortunately, it does.

Ultimately, you have not fully explained how the safe functions in question are at fault, you have only specified how irresponsible use of unsafe code can cause this problem. It is undesirable that the current APIs are shaped as they are, and I would not have chosen them, to be sure. But I do not think we can simply break them, as we promise we do not alter the type signatures of stable interfaces.

@workingjubilee
Copy link
Member

The String::with_capacity function only guarantees to allocate enough memory to store the specified number of characters. It does not guarantee that the allocated memory is valid.

To be clear, it sure as hell does allocate valid memory, the problem is that the string doesn't last past the end of that expression so you created a pointer to some valid memory, and then it becomes invalid just as quickly.

@saethlin saethlin added C-discussion Category: Discussion or questions that doesn't represent real issues. A-lints Area: Lints (warnings about flaws in source code) such as unused_mut. and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Apr 8, 2024
@saethlin
Copy link
Member

saethlin commented Apr 8, 2024

We should have a lint against this pattern of creating an instantly-dangling pointer. I've seen this pattern mistakenly used in unit tests; it's a funny sort of canary for people who write unsafe code but didn't use ASan before publishing their work.

@saethlin saethlin changed the title String::with_capacity(MAX_PATH).as_mut_ptr() can lead to undefined behavior Lint against instantly-dangling pointers like String::with_capacity(MAX_PATH).as_mut_ptr() Apr 8, 2024
@FlorentinoJink
Copy link
Author

Thank you for your patient explanation. I will be more careful when using raw pointers in the future.

@workingjubilee
Copy link
Member

workingjubilee commented Apr 8, 2024

@saethlin Hmm. It would be worthwhile if possible but how would we do this without also linting on e.g. hand-implementations of offset_of! that exist in the wild? Do we only do it for standard library types, do we make the lint passes aware of deallocation...?

I suppose we could have the Rust wrapper of the system allocator contain debug assertions (e.g. this particular example winds up deallocating the same pointer twice for a double-free, the second time after the String::from_raw_parts call, which is recreated with an invalid layout: 30 is not a valid capacity to pass).

@saethlin
Copy link
Member

saethlin commented Apr 8, 2024

What I'm suggesting is just a lint against the specific pattern of calling a constructor function then a function called as_ptr or as_mut_ptr, then a semicolon. What I'm imagining targeting here is just the case where someone creates a String or a Vec but then their IDE suggests that they apply as_ptr or as_mut_ptr as a conversion function, because the IDE is stupid and just playing Type Tetris.

@saethlin
Copy link
Member

saethlin commented Apr 8, 2024

I do not think that the scenario as described in the initial issue description is a reasonable thing to detect and warn about in the frontend. We have Miri and ASan for that.

@jendrikw
Copy link
Contributor

jendrikw commented Apr 8, 2024

We already lint temporary-cstring-as-ptr, so linting similar situations with String and Vec don't seem far-fetched.

@GrigorenkoPV
Copy link
Contributor

What I'm suggesting is just a lint against the specific pattern of calling a constructor function then a function called as_ptr or as_mut_ptr, then a semicolon.

We already lint temporary-cstring-as-ptr, so linting similar situations with String and Vec don't seem far-fetched.

@rustbot claim

One question though, there is no simple way for a lint to check that a value is a temporary? Something like is_rvalue in C++ parlance. Or is there?

The current implementation of temporary-cstring-as-ptr seems to only match the CString::new(...).unwrap() or CString::new(...).expect(...). And trying to extend this to other containers by manually matching on all possible constructors does not sound that appealing. So I was wondering if there is a better way.

@workingjubilee
Copy link
Member

...do we have a concept of borrow-checker lints...? because it seems it would be really easy to tell using MIR "oh, this is a pointer to a dead thing."

@saethlin
Copy link
Member

That doesn't help for String::new().as_ptr(), the borrow checker has no idea that the pointer and String aren't connected, which is the point of that function.

bors added a commit to rust-lang-ci/rust that referenced this issue Aug 23, 2024
…ter, r=<try>

Lint against getting pointers from immediately dropped temporaries

Fixes rust-lang#123613

## Changes:
1. New lint: `dangling_pointers_from_temporaries`. Is a generalization of `temporary_cstring_as_ptr` for more types and more ways to get a temporary.
2. `temporary_cstring_as_ptr` is marked as renamed to `dangling_pointers_from_temporaries`.
3. `clippy::temporary_cstring_as_ptr` is marked as renamed to `dangling_pointers_from_temporaries`.
4. `core::cell::Cell` is now `rustc_diagnostic_item = "Cell"`

## TODO:
- [x] ~Add tests for different types~
- [x] ~Add tests for different ways of getting a temporary~
- [x] ~Regroup tests for this and `temporary_cstring_as_ptr`~
- [x] ~Fix a strange ICE when the lint is `allow`ed.~
- [x] ~Check what happens when you `break` with a bound variable from `loop`/`match`/`if`/`block`.~
- [x] ~Document the lint~
- [x] ~Fix tests rust-lang#128985 (comment)
- [x] ~Fix clippy~
- [x] ~Fix miri rust-lang#128985 (review)
- [ ] Crater run?

## Questions:
- [ ] Should we make something like `is_temporary_rvalue` (but not bogus) available in compiler?
- [x] ~Should `temporary_cstring_as_ptr` be deprecated? Across an edition boundary?~
- [ ] Instead of manually checking for a list of known methods, maybe add some sort of annotation to those methods in library and check for the presence of that annotation?
- [ ] Maybe even introduce some form of primitive lifetimes for pointers and check those in borrow-checker?

## Future improvements:
- [ ] Fix false negatives[^fn]
- [ ] Add suggestions rust-lang#128985 (comment)

## Known limitations:

### False negatives[^fn]:

[^fn]: lint **should** be emitted, but **is not**

- `temporary_unsafe_cell.get()`
- `temporary.field.as_ptr()`
- `temporary[index].as_ptr()`
- Converting `&temporary` to pointer with `as` or stuff like `ptr::from_ref`.
- The `&raw [mut] temporary`

### False positives[^fp]:

[^fp]: lint **should not** be emitted, but **is**

Both this lint and the already existing `temporary_cstring_as_ptr` will fire on code like this:
```rust
foo(CString::new("hello").unwrap().as_ptr())
```
even though using the resulting pointer inside of `foo` is completely fine (at least according to miri),
probably due to argument promotion logic.
bors added a commit to rust-lang-ci/rust that referenced this issue Sep 7, 2024
…ter, r=<try>

Lint against getting pointers from immediately dropped temporaries

Fixes rust-lang#123613

## Changes:
1. New lint: `dangling_pointers_from_temporaries`. Is a generalization of `temporary_cstring_as_ptr` for more types and more ways to get a temporary.
2. `temporary_cstring_as_ptr` is marked as renamed to `dangling_pointers_from_temporaries`.
3. `clippy::temporary_cstring_as_ptr` is marked as renamed to `dangling_pointers_from_temporaries`.
4. Fixed a false positive[^fp] for when the pointer is not actually dangling because of lifetime extension for function/method call arguments.
5. `core::cell::Cell` is now `rustc_diagnostic_item = "Cell"`

## TODO:
- [x] ~Add tests for different types~
- [x] ~Add tests for different ways of getting a temporary~
- [x] ~Regroup tests for this and `temporary_cstring_as_ptr`~
- [x] ~Fix a strange ICE when the lint is `allow`ed.~
- [x] ~Check what happens when you `break` with a bound variable from `loop`/`match`/`if`/`block`.~
- [x] ~Document the lint~
- [x] ~Fix tests rust-lang#128985 (comment)
- [x] ~Fix clippy~
- [x] ~Fix miri rust-lang#128985 (review)
- [x] [Crater run](https://crater.rust-lang.org/ex/pr-128985)
- [x] Put a comprehensive list of known false negatives[^fn] into comments.
- [ ] Instead of diagnostic items, maybe use lang items or some special attributes? rust-lang#128985 (comment)

## Questions:
- [ ] Should we make something like `is_temporary_rvalue` (but not bogus) available in compiler?
- [x] ~Should `temporary_cstring_as_ptr` be deprecated? Across an edition boundary?~
- [ ] Instead of manually checking for a list of known methods, maybe add some sort of annotation to those methods in library and check for the presence of that annotation?
- [ ] Maybe even introduce some form of primitive lifetimes for pointers and check those in borrow-checker?

## Future improvements:
- [ ] Fix false negatives[^fn]
- [ ] Add suggestions rust-lang#128985 (comment)

## Known limitations:

### False negatives[^fn]:

See the comments in `compiler/rustc_lint/src/dangling.rs`

[^fn]: lint **should** be emitted, but **is not**

[^fp]: lint **should not** be emitted, but **is**
@bors bors closed this as completed in a9d1762 Oct 29, 2024
flip1995 pushed a commit to flip1995/rust that referenced this issue Nov 7, 2024
…ter, r=Urgau

Lint against getting pointers from immediately dropped temporaries

Fixes rust-lang#123613

## Changes:
1. New lint: `dangling_pointers_from_temporaries`. Is a generalization of `temporary_cstring_as_ptr` for more types and more ways to get a temporary.
2. `temporary_cstring_as_ptr` is removed and marked as renamed to `dangling_pointers_from_temporaries`.
3. `clippy::temporary_cstring_as_ptr` is marked as renamed to `dangling_pointers_from_temporaries`.
4. Fixed a false positive[^fp] for when the pointer is not actually dangling because of lifetime extension for function/method call arguments.
5. `core::cell::Cell` is now `rustc_diagnostic_item = "Cell"`

## Questions:
- [ ]  Instead of manually checking for a list of known methods and diagnostic items, maybe add some sort of annotation to those methods in library and check for the presence of that annotation? rust-lang#128985 (comment)

## Known limitations:

### False negatives[^fn]:

See the comments in `compiler/rustc_lint/src/dangling.rs`

1. Method calls that are not checked for:
   - `temporary_unsafe_cell.get()`
   - `temporary_sync_unsafe_cell.get()`
2. Ways to get a temporary that are not recognized:
   - `owning_temporary.field`
   - `owning_temporary[index]`
3. No checks for ref-to-ptr conversions:
   - `&raw [mut] temporary`
   - `&temporary as *(const|mut) _`
    - `ptr::from_ref(&temporary)` and friends

[^fn]: lint **should** be emitted, but **is not**

[^fp]: lint **should not** be emitted, but **is**
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-lints Area: Lints (warnings about flaws in source code) such as unused_mut. C-discussion Category: Discussion or questions that doesn't represent real issues.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants