Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deprecating UnwindSafe #3260

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Deprecating UnwindSafe #3260

wants to merge 2 commits into from

Conversation

nico-abram
Copy link

@nico-abram nico-abram commented May 4, 2022

This RFC proposes to deprecate UnwindSafe.

Rendered RFC
Pre-RFC discussion on internals

@BurntSushi
Copy link
Member

BurntSushi commented May 4, 2022

Thanks for writing this RFC.

As it stands, I don't think this RFC has enough detail in it yet. Most of the motivation section, for example, describes things that were explicitly addressed and known about in the RFC that stabilized catch_panic and the RFC that added the UnwindSafe bound to catch_panic. In particular, the background discusses the utility of "unwind safety" beyond just memory safety:

These mitigations all address the second aspect of exception unsafety: observation of broken invariants. With the tactics in place, it ends up being the case that safe Rust code can largely ignore exception safety concerns. That being said, it does not mean that safe Rust code can always ignore exception safety issues. There are a number of methods to subvert the mitigation strategies listed above:

  1. When poisoning data across threads, antidotes are available to access poisoned data. Namely the PoisonError type allows safe access to the poisoned information.
  2. Single-threaded types with interior mutability, such as RefCell, allow for sharing data across stack frames such that a broken invariant could eventually be observed.
  3. Whenever a thread panics, the destructors for its stack variables will be run as the thread unwinds. Destructors may have access to data which was also accessible lower on the stack (such as through RefCell or Rc) which has a broken invariant, and the destructor may then witness this.

But all of these "subversions" fall outside the realm of normal, idiomatic, safe Rust code, and so they all serve as a "heads up" that panic safety might be an issue. Thus, in practice, Rust programmers worry about exception safety far less than in languages with full-blown exceptions.

Despite these methods to subvert the mitigations placed by default in Rust, a key part of exception safety in Rust is that safe code can never lead to memory unsafety, regardless of whether it panics or not. Memory unsafety triggered as part of a panic can always be traced back to an unsafe block.

In order to deprecate and effectively remove the notion of protecting against exception safety in Rust, I would expect, at minimum, that this RFC provide a comprehenseive argument against it in terms of the RFC that introduced it in the first place.

And with respect to the idea that "Unwind safety is not actually related to safety," it's important to note that we were under no illusions about this when we stabilized catch_panic. See also. Now to be fair, there was a substantial argument about whether to label catch_panic as unsafe. We weren't as fully committed to "unsafe is only for UB" back then I think, or at least, I don't think it had cemented into the shared values of all project participants yet, unlike today. And indeed, catch_panic (among other things, like the "leak apocalypse") is likely one of the drivers that led to the crisper definition of unsafe that we have today.

Responding to the other part of the motivation:

It can also be problematic when a type does not implement the marker trait, but it could, notably with trait objects (See discussion in rust-lang/rust#40628). It can also be a pain point for library authors, who are not sure if they should add a bound on them for their generic types to guarantee their types are UnwindSafe, which would make their downstream users sometimes have to use AssertUnwindSafe despite not using catch_unwind just to satisfy the bounds.

As someone who has been bitten by precisely this, I agree that there is pain here. But what I don't understand is why this means we have to deprecate the notion of exception safety itself. I suppose the obvious question is: how does the pain that library authors feel from this compare with the pain of writing code that isn't exception safe and getting bitten by it? To be fair, that seems like a very hard question to answer.

@ehuss ehuss added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label May 4, 2022
@glaebhoerl
Copy link
Contributor

One useful piece of data would be what proportion of catch_unwind calls are/aren't using AssertUnwindSafe. The places where it's not being used are where the mechanism is providing value, by making it statically apparent that there's no potential exception safety issue. The places where it is being used, it's effectively just a HereBeDragons sign -- a purpose which, in its absence, could probably be served just as well by the catch_unwind call itself. (A followup question would be what proportion of AssertUnwindSafe use sites need to be using it, or for that matter, are using it correctly, but that seems harder to determine at the level of data.)

Whatever the benefit is, it can then be weighed against the cost. If the cost is just having to use AssertUnwindSafe in some subset of catch_unwind calls, it sounds small to me, given the presumable infrequency of those calls. In that light, I'm actually slightly confused about the enmity UnwindSafe has gathered: is catch_unwind used more often than I think? Or is the main cost in other areas?

From rust-lang/rust#40628, it seems like probably the latter: apparently people who aren't using catch_unwind themselves often need to take on UnwindSafe-related hassle. Could someone give more details or link an example of the kind of situation this currently arises in? (In the afore-linked thread, one pain point mentioned is that dyn Trait<Foo + RefUnwindSafe> couldn't even be written, but trying it out in the playground just now, that seems to have been fixed.)

Another complaint from that thread is that Send and Sync ought to logically imply UnwindSafe and RefUnwindSafe (respectively), but don't. How much of the pain does this account for -- would fixing it make most of the problems go away?

Anyway, it seems like another option alongside or subsuming some of those in the RFC would be "try to fix the problems with UnwindSafe", and address the specific complaints that people have raised:

  • Make Send imply UnwindSafe and Sync imply RefUnwindSafe
  • Improve the documentation... I'm not "a docs person", but leading with a quick concrete example (in code) of how things can go wrong as motivation, a brief segue laying out the contrast against normal safe code (where exception safety isn't an issue) and unsafe code (a different beast), and then introducing the solution (and how it catches the problem in the original example), seems like it could be a better structure.
  • Perhaps rename it, since having "Safe" in the name seems to be a persistent source of confusion. Maybe this could be done backwards compatibly if we let the standard library use a trait alias, or with a pub use self::UnwindSafe as NewName? (I'm not sure what the mechanics of that are.)

@dpc
Copy link

dpc commented May 12, 2022

It's funny to see this mentioned in TWIR 5 years after I wrote rust-lang/rust#40628 . Personally I don't remember having to fight with unwind safety in these few years, but since nothing changed I still think it's kind of a broken feature.

I agree that this RFC could include more content, but generally I agree with the motion. I went through the history of these few slow-paced threads again, and I don't think anyone is all that into UnwindSafe. Mostly everyone acknowledge the issues, there are some opinions on what might be able to make it better, but that's it.

It's mostly just a lint. I did it many times in my code, that I though "wouldn't it be nice to have a type system enforcing some logical invariant", introduced it, and then decided that it's not worth the the type-system trouble. Sometimes it works and is worth it, sometimes it isn't. I think here the invariant has to be deal with so rarely in an idiomatic code, that it's just not worth the trouble.

There's multiple features that I would rather see Rust language developers spend their precious time on than trying to make this one actually work well, and since a lot of people suggests that we probably want to rename it anyway because it has nothing to do with safety, I think it's easiest to deprecate it completely for now, and if we ever really want to, we can introduce it from scratch, under new name. I guess it wouldn't buy us much if anything to try to fix + rename it, skipping deprecation.

@LegionMammal978
Copy link
Contributor

LegionMammal978 commented Jun 20, 2022

Another complaint from that thread is that Send and Sync ought to logically imply UnwindSafe and RefUnwindSafe (respectively), but don't.

Why should this logical implication hold? As I understand it, Send and Sync mean that an object can be sent or shared across threads without any of its interfaces causing UB. Meanwhile, UnwindSafe and RefUnwindSafe mean that an object can be sent or shared across an unwind boundary without silently exposing logic errors to the outside, which is a strictly stronger condition. Since inconsistent states from unwinding are not allowed to result in UB, not all !UnwindSafe types are !Send, and not all !RefUnwindSafe types are !Sync. For instance, a non-poisoning Mutex would be Sync + !RefUnwindSafe when it holds a Send value, since unwinding can expose logic errors but does not cause UB. This perspective has apparently been expressed before in this comment on rust-lang/rust#54768, to no reply.

@bjorn3
Copy link
Member

bjorn3 commented Jun 20, 2022

If you share an object between threads, you are already crossing unwind boundaries as unwinding stops at thread boundaries. IMO only Sync should imply UnwindSafe, as for Send sharing requires a Mutex, which implements poisoning to cause panics to propagate by default.

@LegionMammal978
Copy link
Contributor

LegionMammal978 commented Jun 21, 2022

If you share an object between threads, you are already crossing unwind boundaries as unwinding stops at thread boundaries. IMO only Sync should imply UnwindSafe, as for Send sharing requires a Mutex, which implements poisoning to cause panics to propagate by default.

The issue is, currently, neither Send nor Sync requires poisoning or panic safety. For example, parking_lot::Mutex, a popular implementation of lock_api::Mutex, is !RefUnwindSafe, since it does not implement any poisoning and always allows the user to silently retrieve the value. Thus, I suspect that any enforced Send: UnwindSafe or Sync: RefUnwindSafe relationship would either hamper Send/Sync or make [Ref]UnwindSafe meaningless.

I do not understand the reasoning for why SyncUnwindSafe should hold. The type &mut i32 is Sync, since &&mut i32 can be safely shared across threads; however, it is !UnwindSafe, since it can trivially leak data past a catch_unwind() boundary. Could you elaborate on your reasoning?

@bjorn3
Copy link
Member

bjorn3 commented Jun 21, 2022

For example, parking_lot::Mutex, a popular implementation of lock_api::Mutex, is !RefUnwindSafe, since it does not implement any poisoning and always allows the user to silently retrieve the value.

I don't really like that it doesn't implement poisoning. It makes it way to easy to accidentally continue execution when a thread crashed. (Deliberately continuing is fine, but only deliberately)

The type &mut i32 is Sync, since &&mut i32 can be safely shared across threads; however, it is !UnwindSafe, since it can trivially leak data past a catch_unwind() boundary. Could you elaborate on your reasoning?

&&mut T is effectively equivalent to &&T, which should implement UnwindSafe for T: RefUnwindSafe (it doesn't currently as there is no impl <T: RefUnwindSafe> RefUnwindSafe for &T right now for some reason). I think &mut T should implement RefUnwindSafe, but not UnwindSafe. Just like it implements Sync and not Send.

@LegionMammal978
Copy link
Contributor

LegionMammal978 commented Jun 21, 2022

&&mut T is effectively equivalent to &&T, which should implement UnwindSafe for T: RefUnwindSafe (it doesn't currently as there is no impl <T: RefUnwindSafe> RefUnwindSafe for &T right now for some reason). I think &mut T should implement RefUnwindSafe, but not UnwindSafe. Just like it implements Sync and not Send.

I think I might not have made myself clear. &mut i32 is Sync, but it is not UnwindSafe. In your earlier comment, you suggested that T: Sync should always imply T: UnwindSafe, which is contrary to this example. I don't think such an implication makes much sense.


As an exercise, I've made a table with an example of (nearly) every combination of Send, Sync, UnwindSafe, and RefUnwindSafe. Recall that std::sync types implement poisoning, but parking_lot types do not.

 Send +  Sync +  USafe +  RUSafe: i32
 Send +  Sync +  USafe + !RUSafe: parking_lot::Mutex<i32>
 Send +  Sync + !USafe +  RUSafe: &mut i32
 Send +  Sync + !USafe + !RUSafe: &parking_lot::Mutex<i32>
 Send + !Sync +  USafe +  RUSafe: sync::RwLock<Cell<i32>>
 Send + !Sync +  USafe + !RUSafe: Cell<i32>
 Send + !Sync + !USafe +  RUSafe: &mut sync::RwLock<Cell<i32>>
 Send + !Sync + !USafe + !RUSafe: &mut Cell<i32>
!Send +  Sync +  USafe +  RUSafe: sync::MutexGuard<'_, i32>
!Send +  Sync +  USafe + !RUSafe: {?}
!Send +  Sync + !USafe +  RUSafe: &mut sync::MutexGuard<'_, i32>
!Send +  Sync + !USafe + !RUSafe: parking_lot::MutexGuard<'_, i32>
!Send + !Sync +  USafe +  RUSafe: Rc<i32>
!Send + !Sync +  USafe + !RUSafe: Cell<Rc<i32>>
!Send + !Sync + !USafe +  RUSafe: &mut Rc<i32>
!Send + !Sync + !USafe + !RUSafe: &Cell<i32>

Perhaps this might help others establish the difference. Personally, I don't see much of a pattern, apart from &mut making a type !USafe but otherwise leaving it alone. Something that seems very odd to me is that sync::Mutex<T> and sync::RwLock<T> are USafe and RUSafe, even when T is !USafe. This means that data can be leaked through a sync::Mutex<&mut T> by writing to the reference, since the poisoning would not carry through to the T itself.

@bjorn3
Copy link
Member

bjorn3 commented Jun 22, 2022

I think I might not have made myself clear. &mut i32 is Sync, but it is not UnwindSafe. In your earlier comment, you suggested that T: Sync should always imply T: UnwindSafe, which is contrary to this example. I don't think such an implication makes much sense.

Right, Sync should imply RefUnwindSafe.

@glaebhoerl
Copy link
Contributor

@LegionMammal978

Why should this logical implication hold? As I understand it, Send and Sync mean that an object can be sent or shared across threads without any of its interfaces causing UB. Meanwhile, UnwindSafe and RefUnwindSafe mean that an object can be sent or shared across an unwind boundary without silently exposing logic errors to the outside, which is a strictly stronger condition. Since inconsistent states from unwinding are not allowed to result in UB, not all !UnwindSafe types are !Send, and not all !RefUnwindSafe types are !Sync. For instance, a non-poisoning Mutex would be Sync + !RefUnwindSafe when it holds a Send value, since unwinding can expose logic errors but does not cause UB. This perspective has apparently been expressed before in rust-lang/rust#54768 (comment) on rust-lang/rust#54768, to no reply.

Thank you for raising this point so clearly! Amusingly, when I went back just now to look for the old discussions about making the Atomic types implement RefUnwindSafe, which I recall as having been edifying, I found someone else making more or less the same point you are now, and that someone was me. I have no recollection of this.

There are two different kinds of justifications for why a type should be (Ref)UnwindSafe, which come from something like opposite directions. One of them is the straightforward and intuitive one: that the type doesn't let you observe broken invariants across a catch_unwind() boundary. And indeed, there are many Send/Sync types which don't also satisfy this criterium.

The less intuitive one comes from how the types are used, with the observation that using a type in a concurrency-safe way is harder than using it in an unwind-safe1 way, and any code that's concurrency-safe is therefore inherently unwind-safe as well. The Atomic types are a particularly vivid demonstration. What kind of code wants to use Atomic types directly? Subtle and complicated lock-free algorithms. Why are those algorithms subtle and complicated? Because any thread might read or modify the Atomic variable at any point, so to maintain consistency, all of its associated invariants have to hold 100% of the time, not temporarily relaxed even for a nanosecond. Now, if your invariants hold literally 100% of the time, then that includes all the times when the code might panic and be observed by a catch_unwind().

But I think the same idea holds true for Sync types more generally. The problem you're trying to avoid with regards to both concurrency-safety and unwind-safety is that you relax an invariant temporarily, and then, before you get to restore it, some other code observes it. With concurrency, the other code is running at the same actual time, or perhaps the scheduler interrupts your code to let a different thread run. With unwinding, a panic interrupts your code and lets the catch_unwind() handler run. It's basically a one-time context switch that can only happen at specific points (not necessarily self-evident ones, however).

So in any case where there's an opportunity for catch_unwind() to observe your broken invariant, there's also an opportunity for a concurrently-running thread to observe it. Consider the example of the non-poisoning Mutex. The point here is precisely that if some code panics in the middle of a critical section, while an invariant is temporarily relaxed, then that code is buggy even if it doesn't use catch_unwind() anywhere! Because another thread could also observe that inconsistent state.

From this perspective, the purpose of UnwindSafe is to offer some protection in situations where catch_unwind() is what makes the difference with respect to whether or not there is a bug. In the absence of catch_unwind(), code that panics in the middle of shuffling things around between Cells is completely fine.2 Doing the same with Atomics is already not fine.

Now, all this relies on the assumption that concurrency is the only reason why anyone would ever use a Sync shared-mutable type like Mutex or Atomics, and that if concurrency weren't involved, they'd use some corresponding non-Sync type like RefCell instead. This is admittedly not a logically airtight assumption, and not one to be staking the soundness of the standard library on. But since UnwindSafe isn't enforcing a soundness-related invariant, and is making its best effort to catch other bugs, then in a spirit of pragmatism, it seems like a sensible assumption to make use of.

(One particular failure mode3 one might be concerned about is that previously, when such types weren't UnwindSafe, the only reason to use them was for concurrent code; but once they are, maybe someone might use them just to satisfy an UnwindSafe constraint, without concurrency actually being involved anywhere. While this scenario seems more amusing than likely, I think there's a way to logically reconcile it nonetheless: we should conceive of types like Mutex and AtomicPtr as holding an AssertUnwindSafe internally. The goal is for there to be some marker of intent, and an otherwise-gratuitous use of a Sync type is such a marker.)


@dpc

I did it many times in my code, that I though "wouldn't it be nice to have a type system enforcing some logical invariant", introduced it, and then decided that it's not worth the the type-system trouble. Sometimes it works and is worth it, sometimes it isn't. I think here the invariant has to be deal with so rarely in an idiomatic code, that it's just not worth the trouble.

There's multiple features that I would rather see Rust language developers spend their precious time on than trying to make this one actually work well

This is very reasonable. I don't think the tradeoffs are super clear right now w.r.t. how much effort it would take to improve UnwindSafe4, how much pain it's causing, or how much value it's providing (prior comments raised these questions, and apart from yours, haven't attracted much response). But if the relevant teams came out tomorrow and said that in their best estimation, it doesn't pencil out, and they don't feel like it's worth their effort, then that would be a solid motivation for deprecation. But if that's in fact the case, then I think it's what an RFC should be about, rather than focus on object-level technical issues.

I think it's easiest to deprecate it completely for now, and if we ever really want to, we can introduce it from scratch, under new name

Removing the UnwindSafe constraint from catch_unwind() would be backwards compatible; adding a new FunwindSafer constraint afterwards would not be. (Features like trait aliases or pub use may or may not be adequate for renaming it in-place in a backwards compatible way. If they're not, then the renaming issue is moot.)

Footnotes

  1. Panic-safe, cancel-safe, kill-safe, abort-safe... this property has many names.

  2. I mean, it'll exit, but it won't go haywire.

  3. This has got to be a named logical fallacy or similar, but I'm drawing a blank.

  4. For the Send/Sync thing in particular, the sticking point was coherence. Since these are all marker traits, however, overlap ought to be fine, and in fact at one point, there was an effort to relax the rules for marker traits. That ran aground on some subtle issue where it turned out to not be fine after all. As best as I can remember, the realization they ended up coming to was that while it's fine to relax the overlap restrictions for marker traits, it's not fine to relax the orphan restrictions (due to issues w.r.t. negative reasoning). I'm not sure which of those is the relevant one for this situation. But it's at least conceivable that the given restriction could be lifted in a second push, which would be beneficial on its own terms, and then the UnwindSafe impls we want could go through without a problem. Only lang/compiler folks could say for sure.

@LegionMammal978
Copy link
Contributor

So in any case where there's an opportunity for catch_unwind() to observe your broken invariant, there's also an opportunity for a concurrently-running thread to observe it. Consider the example of the non-poisoning Mutex. The point here is precisely that if some code panics in the middle of a critical section, while an invariant is temporarily relaxed, then that code is buggy even if it doesn't use catch_unwind() anywhere! Because another thread could also observe that inconsistent state.

This reasoning works for atomics, but I don't think it follows for Mutexes. The whole point of the type is that threads' critical sections are mutually exclusive. So if a thread acquires a lock, relaxes an invariant, then restores the invariant before releasing the lock, nothing else in the program will be able to observe the relaxed invariant. In other words, every thread knows that once it acquires a lock, the value within will always satisfy the invariant.

But panics potentially break this last assumption: if a thread panics, it can release its lock before it can restore the invariant, and the next thread to acquire a lock will observe the intermediate state. Thus, we implement poisoning, so that the next thread knows that a panic must have happened in a critical section and that the value must be handled with caution. The alternative is a control-flow explosion, in which every thread to acquire the lock must assume that some other thread may have just panicked.

One particular failure mode one might be concerned about is that previously, when such types weren't UnwindSafe, the only reason to use them was for concurrent code; but once they are, maybe someone might use them just to satisfy an UnwindSafe constraint, without concurrency actually being involved anywhere.

I have recently observed this while reviewing a library that allows users to write Rust plugins for a certain single-threaded C application. Each plugin has some number of hooks that can be triggered by specific events; the hooks communicate via a captured environment. However, what should happen if one hook panics? There's a few obvious possibilities:

  • Print a message and abort the program. The library author deemed this undesirable under any circumstances.
  • Print a message and unregister the panicked plugin. The application's plugin API did not permit this.
  • Print a message and unregister all hooks. The library author rejected this for (IMO) poorly-explained reasons.
  • Print a message and replace all hooks with error stubs. The library author similarly rejected this.

Thus, the author decided to keep all hooks running as before, including the hook that panicked. However, to avoid the control-flow explosion, they required the hook closures to be UnwindSafe, and recommended that shared state be placed in a Mutex for its poisoning. If there were such thing as a poisoning RefCell, that would probably be used for the state instead.

@glaebhoerl
Copy link
Contributor

This reasoning works for atomics, but I don't think it follows for Mutexes.

I'm not sure if you were just pointing out a technical detail, or if you intended this as a broader counterargument...?

Indeed, while the mutex is locked, its state is protected from other threads, and this critical section can be interrupted by a panic. At that point there are two possibilities. Either it doesn't permit the state to be accessed again (whether by poisoning1, or just keeping it locked), or it does (by unlocking). If it doesn't, the first kind of justification for unwind safety applies -- there's no way to observe a violated invariant. If it does, the second kind does -- now that it's unlocked, another thread can access the state, and this is a bug whether or not catch_unwind() is being used as well.

If there were such thing as a poisoning RefCell, that would probably be used for the state instead.

That's an interesting idea, and I guess it makes sense that this use case would come up.

Footnotes

  1. Poisoning can be circumvented explicitly, but that's the same kind of "I'm aware of the implications of what I'm doing" marker as AssertUnwindSafe would be.

@LegionMammal978
Copy link
Contributor

I'm not sure if you were just pointing out a technical detail, or if you intended this as a broader counterargument...?

My apologies; I hadn't seen that you were specifically writing about "the non-poisoning Mutex" in that section. As I understand it, it comes down to the fact that thread::spawn() requires the closure to be Send, but not UnwindSafe, so any cross-thread Sync interaction introduces the same risk of observing broken invariants, and the bound on catch_unwind() doesn't help us at all.

However, there would still seem to be a valuable distinction between poisoning and non-poisoning types. Poisoning types follow our "correct by default" philosophy, since most users aren't expected to carefully track the state of the program following an unwinding panic, and putting up a PoisonError roadblock correctly informs them of the risk. But non-poisoning types also have their niche, having less overhead and being somewhat simpler to reason about in applications where panics don't occur or can't disrupt important invariants. (Personally, I suspect users would worry much less about non-poisoning types if .lock().unwrap() were any shorter.) Currently, we use UnwindSafe to distinguish these, but it would be relegated to documentation alone in a "Send/Sync implies [Ref]UnwindSafe" world. Would we just have to accept that tradeoff?

@Kixunil
Copy link

Kixunil commented Aug 11, 2022

It seems to me that part of the problem may be missing tools to prove the code is actually unwind safe.
For instance Cell<T> is missing RefUnwindSafe impl. The type prevents getting a reference to the interior so it's impossible to invalidate its invariants.

Further, adding these new types could help:

/// Implements poisoning API on top of `&mut` - no shared references involved
pub struct PoisoningCell<T> {
    value: T,
    is_poisoned: bool,
}

impl<T> PoisoningCell<T> {
    pub fn new(value: T) -> Self {
        PoisoningCell {
            value,
            is_poisoned: false,
        }
    }
    pub fn set(&mut self, value: T) {
        self.value = value;
        // we've just overwrote the value with something valid
        self.is_poisoned = false;
    }
    pub fn replace(&mut self, value: T) -> T { self.ck_poison(); mem::replace(&mut self.value, value) }
    pub fn get(&self) -> T where T: Copy { self.ck_poison(); self.value }
    pub fn get_ref(&self) -> &T where T: RefUnwindSafe { self.ck_poison(); &self.value }
    pub fn get_mut(&mut self) -> Guard<'_, T> { self.ck_poison(); Guard(self) }
    pub fn into_inner(self) -> T { self.ck_poison(); self.value }
    fn ck_poison(&self) { assert!(!self.is_poisoned, "attempted access to poisoned cell"); }
}


// Consumer can only replace the values or access them with possible poisoning, so these are OK
impl<T> RefUnwindSafe for RefCell<PoisoningCell<T>> {}
impl<T> UnwindSafe for &'_ mut PoisoningCell<T> {}

// impl Deref{Mut} like other guards
pub struct Guard<'a, T>(&'a mut PoisoningCell);

/// Doesn't allow mutable references to T to exist but allows overwriting with new value
pub struct DummyCell<T>(T);

impl<T> DummyCell<T> {
    pub fn new(value: T) -> Self { DummyCell(value) }
    pub fn set(&mut self, value: T) { self.0 = value; }
    pub fn replace(&mut self, value: T) -> T { mem::replace(&mut self.0, value) }
    pub fn get(&self) -> T where T: Copy { self.0 }
    pub fn get_ref(&self) -> &T where T: RefUnwindSafe { &self.0 }
    pub fn into_inner(self) -> T { self.0 }
}

impl<T> RefUnwindSafe for DummyCell<T> {}
// Consumer can only replace the values or read them, so this is OK
impl<T> UnwindSafe for &'_ mut DummyCell<T> {}

@glaebhoerl
Copy link
Contributor

(Sorry for the lag, I had a hard time wrapping my head around the different perspective and got diverted while trying.)

As I understand it, it comes down to the fact that thread::spawn() requires the closure to be Send, but not UnwindSafe, so any cross-thread Sync interaction introduces the same risk of observing broken invariants, and the bound on catch_unwind() doesn't help us at all.

I guess you could look at it this way.

If thread::spawn() did require UnwindSafe, that'd result in there being no use for types which were Send/Sync but not [Ref]UnwindSafe, even if they were technically possible; so I think you were just making an observation, rather than wishing this were the case?

But non-poisoning types also have their niche, having less overhead and being somewhat simpler to reason about in applications where panics don't occur or can't disrupt important invariants. (...) Currently, we use UnwindSafe to distinguish these, but it would be relegated to documentation alone in a "Send/Sync implies [Ref]UnwindSafe" world. Would we just have to accept that tradeoff?

By using a non-poisoning mutex, someone declares "I take upon myself the burden of ensuring there are no panics [at problematic points] in my critical sections". I agree it would be nice if there were some more upfront indication that this is what's happening (as with unsafe blocks, or AssertUnwindSafe). But since catch_unwind() doesn't even enter into that picture, the presence or absence of a RefUnwindSafe impl doesn't do a good job of serving as that indication -- it essentially is merely documentation, unless the author, by coincidence, happens to also employ a catch_unwind() involving the same mutex somewhere. So yeah, I think we'd have to "accept the tradeoff"... except that the thing we'd be giving up is something we never really had.

As for what *else* could be done about the situation...

Effect typing for panics might've come in useful here, but we don't have that. Perhaps the MutexGuard for such types could abort on panic... which is even more extreme than poisoning (although does avoid the performance hit, which I believe was part of the motivation). Or they could simply keep the Mutex locked, which again accomplishes basically the same thing as poisoning, with the tradeoff being a worse debugging experience. Perhaps the MutexGuard could have an .unlock_on_panic() flag you'd set if you've verified that any potential panics are benign ones, or there could be a separate flavor of MutexGuard type with this behavior, in either case providing thereby the upfront indication we were wishing for above. Or you could use catch_unwind() and do the unlocking yourself, I suppose.

Or maybe the issue in some cases is that the type inside the Mutex doesn't have any invariants? In which case any panic is fine. (This scenario is plausible, since some kind of synchronization is necessary just to avoid data races (satisfy Sync), even if there are no invariants to preserve.) I'm not sure what a sensible and systematic way to indicate this assumption might be.

@glaebhoerl
Copy link
Contributor

For instance Cell<T> is missing RefUnwindSafe impl. The type prevents getting a reference to the interior so it's impossible to invalidate its invariants.

Technically true as long as you only have a single Cell anywhere. Once you have multiple Cells, there can be invariants between them. (Which is exactly the kind of thing that's perfectly fine for single-threaded code as long as catch_unwind() isn't involved, and why we're motivated to track unwind safety when it is.)

@Kixunil
Copy link

Kixunil commented Sep 12, 2022

@glaebhoerl same applies to Atomic* which already have RefUnwindSafe.

@bjorn3
Copy link
Member

bjorn3 commented Sep 12, 2022

Cell and Atomic* don't have any internal invariants that unwinding could violate. If there are external invariants, then the types that contain them which do have such invariants should not implement UnwindSafe.

@Kixunil
Copy link

Kixunil commented Sep 12, 2022

@bjorn3 indeed. There's a footgun that people may forget about it and also the possibility. I'm not sure if it's possible to improve that but maybe some way to mark the invariants would be helpful.

@glaebhoerl
Copy link
Contributor

@glaebhoerl same applies to Atomic* which already have RefUnwindSafe.

Yes, because the use case for that is concurrency! I wrote an entire novel comment about this earlier!

@WiktorPrzetacznik
Copy link

I feel like the only goal of unwind safety (being an useful lint) hasn't really been achieved, because there's no consistency in implementing UnwindSafe and RefUnwindSafe even in std (!). Scanning through whole std and fixing it seems like a lot of effort for little gains. Also, contributors have to keep in mind that any future code modifications may invalidate type's unwind safety.

In my opinion, there's no place for half-baked features like this one in std.

@nikomatsakis
Copy link
Contributor

As one of the original people who argued strongly in favor of UnwindSafe, I agree it has not been successful in its goals. I think it's a powerful tool that could in principle help detect bugs, but in practice I think it's not really used consistently enough nor well understood enough.

I'm not 100% sure what the takeaware is here on my part -- I still believe strongly that unwind safety is subtle and hard to get right, and I believe in a std that makes efforts to be footgun-proof against these sorts of failures, but many of the attempts to do so seem to not quite hit the mark (I'm still a fan of lock poisoning, but I know it's unpopular; my personal belief is most code is still broken in the face of panics while a lock is held, and the default API should simply have panicked implicitly rather than returning a Result, and then it would be far more popular).

@ssokolow
Copy link

ssokolow commented Oct 28, 2023

(I'm still a fan of lock poisoning, but I know it's unpopular; my personal belief is most code is still broken in the face of panics while a lock is held, and the default API should simply have panicked implicitly rather than returning a Result, and then it would be far more popular).

I disagree, but then I'm one of those people who wraps every unit of work in catch_unwind, was willing to spawn "unnecessary" threads in the days before catch_unwind, sees uncaught panics as something not that far below non-FFI use of unsafe (without miri CI and a giant benchmark suite to justify it) as a code smell and constantly laments that Rustig! and findpanics are both unmaintained while #[no_panic] is a linker hack with too much emphasis on the "hack".

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Oct 29, 2023 via email

@ssokolow
Copy link

ssokolow commented Oct 29, 2023

True. I'm just not sure how radical a change it would have involved to developer ergonomics to make the benefit of poisoning negligible. There is, after all, a reason that Rust doesn't require every API which may allocate to return a Result.

Panicking, poisoning, catch_unwind around units of work, and tools like Rustig! or findpanics are Defense in Depth against broken behaviour slipping into a release due to a developer mistake.

@NobodyXu
Copy link

Maybe it would be reasonable to introduce noexcept?

@bjorn3
Copy link
Member

bjorn3 commented Oct 29, 2023

That would mean you can't do any integrity assertions in such functions, which may well be even worse than requiring catch_unwind at a safe place where you can recover from violations of invariants. And it would have all the same composibility issues as any other effect system we have in rust (const fn, async fn, ...)

@glaebhoerl
Copy link
Contributor

@nikomatsakis Do you happen to have any thoughts w.r.t. my earlier comment where I tried to figure out what actual practical problems people are having with UnwindSafe, and whether they could be addressed?

(Also maybe this footnote on a subsequent comment, w.r.t. what it would take for the Send/Sync->Ref/UnwindSafe blanket impls to go through.)

Turbo87 added a commit to Turbo87/crates.io that referenced this pull request Nov 6, 2023
rust-lang/rust#40628, rust-lang/rust#65717 and rust-lang/rfcs#3260 all show that unwind safety isn't particularly ergonomic to use and implement, and ultimately leads to people slapping `AssertUnwindSafe` everywhere until the compiler stops complaining.

This situation has led to built-in test framework using `catch_unwind(AssertUnwindSafe(...))` (see https://github.com/rust-lang/rust/blob/1.73.0/library/test/src/lib.rs#L649) and libraries like tower-http doing the same (see https://docs.rs/tower-http/0.4.4/src/tower_http/catch_panic.rs.html#198).

As people have mentioned in the threads above, trying to implement this correctly is akin to fighting windmills at the moment. Since the above cases demonstrated that `catch_unwind(AssertUnwindSafe(...))` is currently the easiest way to deal with this situation, this commit does the same and refactors our background job runner code accordingly.
Turbo87 added a commit to rust-lang/crates.io that referenced this pull request Nov 6, 2023
rust-lang/rust#40628, rust-lang/rust#65717 and rust-lang/rfcs#3260 all show that unwind safety isn't particularly ergonomic to use and implement, and ultimately leads to people slapping `AssertUnwindSafe` everywhere until the compiler stops complaining.

This situation has led to built-in test framework using `catch_unwind(AssertUnwindSafe(...))` (see https://github.com/rust-lang/rust/blob/1.73.0/library/test/src/lib.rs#L649) and libraries like tower-http doing the same (see https://docs.rs/tower-http/0.4.4/src/tower_http/catch_panic.rs.html#198).

As people have mentioned in the threads above, trying to implement this correctly is akin to fighting windmills at the moment. Since the above cases demonstrated that `catch_unwind(AssertUnwindSafe(...))` is currently the easiest way to deal with this situation, this commit does the same and refactors our background job runner code accordingly.
workingjubilee added a commit to pgcentralfoundation/pgrx that referenced this pull request Jul 2, 2024
Our invariants are so complicated and need such special handling that
once they are take care of, `UnwindSafe` adds nothing. The particular
example that motivated this: `&mut T` isn't `UnwindSafe` because it's
*easy* to witness a broken invariant, not that you will.

See rust-lang/rfcs#3260 for further rationale.
@kornelski
Copy link
Contributor

Problems I have with UnwindSafe:

  1. Unlike unsafe code, this isn't explicitly marked as !UnwindSafe by the author. The unwind-unsafety state is guessed by the compiler based on general heuristics of using certain suspicious types.

    As a result, when finding that some code is not satisfying the UnwindSafe bound, I can't tell whether that's on purpose, and the code is actually not reliable in presence of panics, or whether it's just an oversight, and the compiler is overreacting. In practice, it always turns out to be the latter, so UnwindSafe is hard to take seriously.

  2. A risk of having an application-specific logic bug due to specifically !UnwindSafe types doesn't feel substantially different from a risk having any other application-specific bug due to other unexpected errors. Rust seems to over-emphasise one specific set of edge-case conditions. There are UnwindSafe types like Atomic that could be easily left in a bad state. There's also out-of-Rust state, like open files or TCP connections that could be left in a broken state too. That's why Rust libraries use a lot of Drop guards, because state cleanup calls can be easily forgotten, even without unwinding.

  3. The notion of "logical unsafety" is too vague. It's not a language requirement, but an entirely speculative "maybe the program will do something safe-but-wrong, but who knows?". When the problems and consequences are so nebulous, it's hard to even talk about this feature. This feature doesn't seem to be actually used by anyone, and there are no prominent examples in the ecosystem to use as a reference, that I'm aware of. This makes it too hard define a policy for using this feature within a project/company — other than YOLO it with AssertUnwindSafe.

  4. I can't come up with a case where I'd want to have code that is not UnwindSafe. If I'm aware that some code can become buggy beucause of a panic, I want to fix it anyway. The existing best practices and solutions required by Rust to make code safe are basically the same as for making code UnwindSafe, so typically fixing one is fixing both. I don't want to fix code just barely enough not to cause UB, but still leave it broken enough to be dangerous, especially knowing that UnwindSafe is not taken seriously and is easily bypassed.

@LegionMammal978
Copy link
Contributor

LegionMammal978 commented Oct 17, 2024

@kornelski How I'd consider it, is that UnwindSafe is somewhat of a misnomer if one takes it to mean "values of this type are safe to unwind": obviously every value ought to be sound on unwind, at the very least. Really, what it means is, "either values of this type allow the modification of data outside of the usual ownership tree, or the type and everything depending on it is designed so that any such modification is atomic w.r.t. panics". The idea is, if you use only naturally-UnwindSafe types across a catch_unwind() boundary, then all side effects of your code will be encapsulated within values that it owns. Then, if it unwinds at any point, then all those owned values will all be destroyed, and you'll be left with a clean slate on the outside of the catch_unwind(). The archetypical example being that a server might want to isolate different requests from each others' panics: when a request panics, all of that request's owned resources are cleaned up, and the server can continue serving other requests with no harm done. To address your particular points:

  1. That's why I prefer my reframing of the trait in terms of the ownership tree. It's not a general heuristic of suspiciousness so much as the meaning of an UnwindSafe type, even if its documentation meanders around the point. This also gives a straightforward way to evaluate whether a type should be UnwindSafe: if I use this value to modify anything, then will the modification be wiped clean (or at least be safely "committed" in some way) once I drop the value? For typical data types (String, Vec, Option, ...) this is the case, but it is usually not the case for types granting mutable access to other data (&mut T, &Cell<T>, ...).

  2. UnwindSafe really depends on the existence of Drop guards for safely cleaning up owned objects, rather than trying to be a substitute for Drop guards. For Atomic types, the idea (as I understand it) is that any multi-step operation you'd try to do that a panic could break would generally already be broken in the presence of multiple threads. Except maybe for custom mutexes, but that's covered by UnsafeCell being !RefUnwindSafe. For out-of-Rust data like files and sockets, the idea is that any breakage caused by unexpected panics would also be caused by "unplugging the computer in the middle of the operation", so a carefully-designed upper protocol should already be tolerant of that (with idempotence, transactions, etc.).

  3. Yeah, I don't deny that the documentation and usability of UnwindSafe aren't very great in practice, and that a blanket AssertUnwindSafe is far simpler. If I had to define a policy, I might say, "Consider what the scope and aim of this catch_unwind() block are. Then, consider any side effects of the code that aren't yet cleaned up once it's unwound. Address those side effects, then apply UnwindSafe to the relevant data." In particular, for any catch_unwind() block where the panic is immediately re-raised (e.g., on an FFI boundary), a blanket AssertUnwindSafe is 100% correct, since that part of the program is being taken down by an uncaught panic in any case.

  4. It's more about generic types like &mut T that might allow such buginess in some use cases: any possible &mut WhateverType accessed with a TOCTOU gap (e.g., creating a HashMap entry with a dummy value, then subsequently filling it with some calculated data) might cause such a bug. If you know that your particular type is airtight when accessed across an unwind boundary, you can slap an UnwindSafe/RefUnwindSafe on it with no problem. Again, it's not about the sorts of errors that a drop guard is supposed to fix, but about the errors from partial operations across a catch_unwind() boundary.

I'm not really opposed to this PR, since UnwindSafe as implemented is very impractical, but I think there's some merit to the underlying concepts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.