Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initializing a zero-length array leaks the initializer #74836

Closed
smarnach opened this issue Jul 27, 2020 · 20 comments · Fixed by #95953 or rust-lang/reference#1243
Closed

Initializing a zero-length array leaks the initializer #74836

smarnach opened this issue Jul 27, 2020 · 20 comments · Fixed by #95953 or rust-lang/reference#1243
Labels
A-destructors Area: Destructors (`Drop`, …) A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html C-bug Category: This is a bug. P-low Low priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@smarnach
Copy link
Contributor

This bug was reported on StackOverflow today. Filipe Rodrigues noticed that the initializer of a zero-length array is leaked:

#[derive(PartialEq, Debug)]
struct A;

impl Drop for A {
    fn drop(&mut self) {
        println!("Dropping A");
    }
}

fn main() {
    let vec: Vec<A> = vec![];
    let a = A;
    assert_eq!(vec, [a; 0]);
}

The code above does not run drop() for a, while I'd expect that it should. According to a comment by @mcarton, this regression occurred between Rust 1.11.0 and 1.12.0, i.e. when MIR code generation was enabled by default.

@smarnach smarnach added the C-bug Category: This is a bug. label Jul 27, 2020
@jonas-schievink jonas-schievink added A-destructors Area: Destructors (`Drop`, …) A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html I-prioritize Issue: Indicates that prioritization has been requested for this issue. regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 27, 2020
@LeSeulArtichaut
Copy link
Contributor

LeSeulArtichaut commented Jul 27, 2020

Minimized:

struct A;

impl Drop for A {
    fn drop(&mut self) {
        println!("Dropping A");
    }
}

fn main() {
    let a = A;
    let _ = [a; 0];
}

@Dylan-DPC-zz Dylan-DPC-zz added P-low Low priority and removed I-prioritize Issue: Indicates that prioritization has been requested for this issue. labels Jul 27, 2020
@Dylan-DPC-zz
Copy link

Marked as p-low as per the discussion

@Aaron1011
Copy link
Member

This is a really weird corner case. If we modify the example slightly:

struct A;

impl Drop for A {
    fn drop(&mut self) {
        println!("Dropping A");
    }
}

fn foo() -> [A; 0] {
    let a = A;
    [a; 0]
}

fn main() {
    foo();
}

them [a; 0] still moves a (a cannot be used again). However, the storage for a is destroyed once foo returns, so we would have to run the destructor when foo returns, not when [a; 0] is dropped. Effectively, the borrow-checker and the MIR drop logic disagree about who owns a.

@Aaron1011
Copy link
Member

Regardless of what we end up doing, I think we should add a lint for this kind of scenario, as I think whatever behavior we use is going to be surprising.

Aaron1011 added a commit to Aaron1011/rust that referenced this issue Jul 28, 2020
Currently, writing a zero-length array repeat expression (e.g.
`[String::new(); 0]`) will cause the initializer value to be leaked.
See rust-lang#74836 for more details

There are three ways that we could potentially handle this case:

1. Run the destructor immediately after 'constructing' the zero-length
   array.
2. Run the destructor when the initializer value would otherwise be
   dropped (i.e. at the end of the lexical scope)
3. Keep the current behavior and continue to leak to the initializer

Note that the 'obvious' choice (running the destructor when the array is
dropped) does not work, since a zero-length array does not actually
store the value we want to drop.

All of these options seem likely to be surprising and inconsistent
to users. Since any fully monomorphic constant can be used as the repeat
length, this has the potential to cause confusing 'action at a distance'
(e.g. changing a `const` from 0 to 1 results in drop order changing).

Regardless of which option we decide on, we should tell users
to use an empty array (`[]`) instead.

This commit adds a new lint ZERO_REPEAT_WITH_DROP, which fires when a
type that (potentially) has a destructor is used in a zero-length array
repeat expression.

If a type has no drop glue, we skip linting, since dropping it has no
user-visible side effects. If we do not know if a type has drop glue
(e.g. `Option<T>`), we lint anyway, since some choice of generic
arguments could trigger the strange drop behavior.

If the length const expression is not fully monomorphic (e.g. contains a
generic parameter), the compiler requires the type used in the repeat
expression to be `Copy`. Therefore, we don't need to lint in this case,
since a `Copy` type can never have drop glue.
@SimonSapin
Copy link
Contributor

However, the storage for a is destroyed once foo returns, so we would have to run the destructor when foo returns, not when [a; 0] is dropped.

I would not expect a destructor to run when foo returns and a goes out of scope, because the value in a has been moved. If we did not have array literal syntax and used a constructor instead, the value in a would clearly be moved in the call and its ownership transferred to the constructor:

fn foo() -> [A; 0] {
    let a = A;
    <[A; 0]>::new(a)
}

When the array length is zero the repeating array "constructor" happens not to use the repeated value so it should be responsible for dropping it.

@RalfJung
Copy link
Member

When the array length is zero the repeating array "constructor" happens not to use the repeated value so it should be responsible for dropping it.

Agreed. Dropping should happen at the location of [expr; 0].

I am not sure a lint is the right solution here? If we want to change MIR building (this looks like a clear bug to me), I do not see what a lint would achieve.

@RalfJung
Copy link
Member

Hm, why does this even build? Usually array initializers only work for Copy types.

Is there a special case for when the array length is <= 1? I guess then we also need a special case during drop elaboration (or so) for when the array length is exactly 0?

Cc @matthewjasper @nikomatsakis (not sure who else to ping for MIR stuff)

@Lucretiel
Copy link
Contributor

Hm, why does this even build? Usually array initializers only work for Copy types.

Is there a special case for when the array length is <= 1? I guess then we also need a special case during drop elaboration (or so) for when the array length is exactly 0?

Cc @matthewjasper @nikomatsakis (not sure who else to ping for MIR stuff)

Yup, it appears that [value; 1] is allowed to simply move a non-copy into an array: playground. Definitely a bit odd but I can certainly understand why this'd be the case.

@tmiasko
Copy link
Contributor

tmiasko commented Dec 2, 2020

I looked into fixing this by teaching parts of compiler that perform move based reasoning about special status of Rvalue::Repeat where length evaluates to zero. I had to change the move data gather to fix drop elaboration; transfer function used by const qualification so that it detects drop of a temporary in const fn g() -> [String; 0] { [String::new(); 0] }. Also, a few data flow analysis likely require changes as well, in particular MaybeInitializedLocals and MaybeRequiresStorage.

Though, maybe that particular case of repeat should be lowered differently when building MIR so that it can be treated more uniformly later on?

@RalfJung
Copy link
Member

RalfJung commented Dec 2, 2020

@tmiasko thanks for looking into this!

Once we are past drop elaboration, no special cases should be needed any more. But I am not sure how early/late in the pipeline that happens. Your list is somewhat concerning though, this sounds like it'd be very easy to miss a place and introduce a subtle bug.

OTOH, we cannot really use different MIR for this -- with const generics, [expr; N] has to work. Though actually... that can only compile when expr: impl Copy. So maybe we can have a special case. Maybe we can make Rvalue::Repeat require Copy, and use a different, more specialized primitive for the non-Copy case? I wonder how that interacts with [const_expr; N] though.

@tmiasko
Copy link
Contributor

tmiasko commented Dec 2, 2020

To give a little more context to earlier comment, consider a MIR that before the drop elaboration looks as follows:

 _0 = [move _1; 0];
 drop(_1) -> bb2;

Currently, the local _1 is considered to be moved from and the drop elaboration removes the drop(_1). The fix explored in #74836 (comment), was to change move data so that the drop remains. This is not sufficient though, because Operand::Move remains and some analysis assign a special meaning to it. Consequently they require similar adjustments. For example, in const qualification Operand::Moveed locals are assumed not to need a drop. In MaybeRequiresStorage they are assumed not to require a storage if they have not been borrowed.

@Aaron1011
Copy link
Member

I wonder if it would be simpler to modify the THIR -> MIR lowering to treat [foo; 0] as { foo; [] }, and require that the MIR Rvalue::Repeat have a non-zero const value.

@RalfJung
Copy link
Member

RalfJung commented Dec 2, 2020

to treat [foo; 0] as { foo; [] }

Yeah, I wondered the same -- so this sounds like a promising idea.

and require that the MIR Rvalue::Repeat have a non-zero const value.

That cannot work; for Copy types N might be a const generic so during THIR -> MIR lowering we cannot say if it is 0.
But we could require that non-Copy Rvalue::Repeat have a non-zero const value (except const array repeat expressions might break this so they could be another exception).

@tmiasko
Copy link
Contributor

tmiasko commented Dec 2, 2020

I wondered about that too. So far I noticed that it would require additional changes to take into account lifetime extension / promotion, as direct implementation would allow:

fn main() {
    let _: &'static [String; 0] = &[String::new(); 0];
}

@RalfJung
Copy link
Member

RalfJung commented Dec 2, 2020

Ah, interesting. We could probably allow this if we really wanted, or we could use a slightly different desugaring such as (foo, []).1.

@MikailBag
Copy link
Contributor

Shouldn't this be I-unsound as similar to #47949?

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=aa5afcbb47e59595dc6ced846671c47f
is an adaptation of #47949 (comment) for this bug.

bors added a commit to rust-lang-ci/rust that referenced this issue Dec 21, 2020
Acknowledge that `[CONST; N]` is stable

When `const_in_array_repeat_expressions` (RFC 2203) got unstably implemented as part of rust-lang#61749, accidentally, the special case of repeating a *constant* got stabilized immediately. That is why the following code works on stable:

```rust
const EMPTY: Vec<i32> = Vec::new();

pub const fn bar() -> [Vec<i32>; 2] {
    [EMPTY; 2]
}

fn main() {
    let x = bar();
}
```

In contrast, if we had written `[expr; 2]` for some expression that is not *literally* a constant but could be evaluated at compile-time (e.g. `(EMPTY,).0`), this would have failed.

We could take back this stabilization as it was clearly accidental. However, I propose we instead just officially accept this and stabilize a small subset of RFC 2203, while leaving the more complex case of general expressions that could be evaluated at compile-time unstable. Making that case work well is pretty much blocked on inline `const` expressions (to avoid relying too much on [implicit promotion](https://github.com/rust-lang/const-eval/blob/master/promotion.md)), so it could take a bit until it comes to full fruition. `[CONST; N]` is an uncontroversial subset of this feature that has no semantic ambiguities, does not rely on promotion, and basically provides the full expressive power of RFC 2203 but without the convenience (people have to define constants to repeat them, possibly using associated consts if generics are involved).

Well, I said "no semantic ambiguities", that is only almost true... the one point I am not sure about is `[CONST; 0]`. There are two possible behaviors here: either this is equivalent to `let x = CONST; [x; 0]`, or it is a NOP (if we argue that the constant is never actually instantiated). The difference between the two is that if `CONST` has a destructor, it should run in the former case (but currently doesn't, due to rust-lang#74836); but should not run if it is considered a NOP. For regular `[x; 0]` there seems to be consensus on running drop (there isn't really an alternative); any opinions for the `CONST` special case? Should this instantiate the const only to immediately run its destructors? That seems somewhat silly to me. After all, the `let`-expansion does *not* work in general, for `N > 1`.

Cc `@rust-lang/lang` `@rust-lang/wg-const-eval`
Cc rust-lang#49147
@sivizius
Copy link

sivizius commented Jan 22, 2021

Is a [ T; 0 ] of any use anyway? I mean, the size is 0 * sizeof(T) = 0, so effectively a () or struct U;/struct U {}. [ T; 0 ] might be a result of a typo in [ T; 10 ] or something like that, so why not a hard error for that?

@SimonSapin
Copy link
Contributor

The [T; 0] type can be very useful (for example to unsize it to an empty slice of type &[T]), but it’s typically created with a [] expression.

This issue is about [sub_expression; 0] expressions which are indeed much less useful, but introducing new hard errors is a very big hammer and not necessary here. We only need to change such expressions from behaving like { std::mem::forget(sub_expression); [] } to behaving like { std::mem::drop(sub_expression); [] }.

@steffahn
Copy link
Member

steffahn commented Dec 9, 2021

I’m unfamiliar with the compiler-internal mechanisms that are relevant here. From a user-perspective, the situation is – in my opinion – as follows

  • for [CONST; N], it feels unintuitive that the case N == 0 would be special in instantiating the constant a different number than exactly N times. I.e. I don’t think, [CONST; 0] should create an instance of CONST (with the effect of calling the destructor once, too, if it has one).
  • considering the case with constants described above, it does feel wrong that [x; 0] would move and drop x. It if did, wouldn’t it also have to do this with constants to be consistent? I feel like it’s hard to find a convincing consistend approach here in general.
  • afaict, the fact that [EXPR; 0] and [EXPR; 1] work with non-Copy-and-non-const expressions it entirely useless in practice. You could always also write [] or [EXPR] instead!
    • The only use-case of this that I can think of is an elsewhere-defined const-item FOO that’s specified to be either zero or one, but you don’t want to rely on which one of the two values it has, e.g. because the defining crate wouldn’t consider a change a breaking change. Then [EXPR; FOO] could make sense and can’t easily be replaced by [] or [EXPR]; however match FOO { 0 => [], 1 => [EXPR], _ => unreachable!() } would still be a valid fix/replacement in this rare situation. Or match FOO { 0 => {drop(EXPR); []}, 1 => [EXPR], _ => unreachable!() } if that’s the intended behavior of the author.

My personal conclusion would be to entirely remove the special capabilities of [EXPR; 0] and [EXPR; 1] and make sure they only compile successfully whenever [EXPR; 2] would do so, too.

In order to avoid breakage, one could

  • introduce a warning in the current edition for any usage of [EXPR; 0] or [EXPR; 1] (or [EXPR; N] where N is a constant that’s zero or one)
    • whenever EXPR has a non-Copy type and is not the path of a const item
  • make the warning a deny-by-default lint or a hard error in future editions
  • also introduce a stronger (version of the) warning, that points our the concrete dangers of leaks, for the case of [EXPR; 0] (or [EXPR; N] where N is a constant that’s zero)
    • whenever EXPR has a non-Copy type and is not the path of a const item
    • or alternatively, at least whenever EXPR has a type that is not known to be free of drop-glue

Dylan-DPC added a commit to Dylan-DPC/rust that referenced this issue May 24, 2022
Modify MIR building to drop repeat expressions with length zero

Closes rust-lang#74836 .

Previously, when a user wrote `[foo; 0]` we used to simply leak `foo`. The goal is to fix that. This PR changes MIR building to make `[foo; 0]` equivalent to `{ drop(foo); [] }` in all cases. Of course, this is a breaking change (see below). A crater run did not indicate any regressions though, and given that the previous behavior was almost definitely not what any user wanted, it seems unlikely that anyone was relying on this.

Note that const generics are in general unaffected by this. Inserting the extra `drop` is only meaningful/necessary when `foo` is of a non-`Copy` type, and array repeat expressions with const generic repetition count must always be `Copy`.

Besides the obvious change to behavior associated with the additional drop, there are three categories of examples where this also changes observable behavior. In all of these cases, the new behavior is consistent with what you would get by replacing `[foo; 0]` with `{ drop(foo); [] }`. As such, none of these give the user new powers to express more things.

**No longer allowed in const (breaking)**:

```rust
const _: [String; 0] = [String::new(); 0];
```

This compiles on stable today. Because we now introduce the drop of `String`, this no longer compiles as `String` may not be dropped in a const context.

**Reduced dataflow (non-breaking)**:

```rust
let mut x: i32 = 0;
let r = &x;
let a = [r; 0];
x = 5;
let _b = a;
```

Borrowck rejects this code on stable because it believes there is dataflow between `a` and `r`, and so the lifetime of `r` has to extend to the last statement. This change removes the dataflow and the above code is allowed to compile.

**More const promotion (non-breaking)**:

```rust
let _v: &'static [String; 0] = &[String::new(); 0];
```

This does not compile today because `String` having drop glue keeps it from being const promoted (despite that drop glue never being executed). After this change, this is allowed to compile.

### Alternatives

A previous attempt at this tried to reduce breakage by various tricks. This is still a possibility, but given that crater showed no regressions it seems unclear why we would want to introduce this complexity.

Disallowing `[foo; 0]` completely is also an option, but obviously this is more of a breaking change. I do not know how often this is actually used though.

r? `@oli-obk`
@bors bors closed this as completed in 11faf2e May 25, 2022
@pwnorbitals
Copy link

If this is closed, the warning update on this page might be outdated : https://doc.rust-lang.org/reference/expressions/array-expr.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-destructors Area: Destructors (`Drop`, …) A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html C-bug Category: This is a bug. P-low Low priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet