Panic feature flag #80

leudz · 2020-04-28T18:23:30Z

Shipyard offers alternative try_* version of all fallible functions.
However it's not always easy to spot a function that could panic while reading code or during code review as @colelawrence pointed out.
It's especially bad in some cases like get when a function with this name usually doesn't panic.

To solve this problem all functions that could panic will be feature gated.
It will be a default feature so all users that don't disable them won't see any change.

The text was updated successfully, but these errors were encountered:

almetica · 2020-04-29T05:48:18Z

I personally consider the usage of unwrap/panic inside a library a grave anti-pattern. It hides error states that should be explicitly checked or accounted for by the user without him knowing using the API (since the unwrap/panic is onyl visible inside the library code or in the best case documented in the API documentation).

The current get()/try_get() code for example also breaks some recommendations of the official rust language style guidlines:

When a contract violation is the only kind of error a function may encounter -- i.e., there are no obstructions to its success other than "bad" inputs -- using Result or Option instead is especially warranted. Clients can then use unwrap to assert that they have passed valid input, or re-use the error checking done by the API for their own purposes. LINK
An API should not provide both Result-producing and panicking versions of an operation. LINK
In general, the get family of methods is used to access contained data without any risk of thread failure; they return Option as appropriate. LINK

Both variants of get() are also fallible, one it explicitly fallible (try_get() via Result) whereas the other one in implicitly fallible (get() via panic).

try_*() methods are normaly defined to either allow blocking methods to not block and return a Result early (for example the try_*() methods in tokio::sync::mpsc when you don't want to wait for blocking operation on the channel and check for blocking channel conditions early) or to allow operations that are fallible in APIs that are normaly infallible (for example some Iterator operations like for_each() / try_for_each() or find() / try_find()).

leudz · 2020-04-29T17:34:18Z

Since you're linking guidelines from 6 years ago, my view on the topic aligns with nikomatsakis' comment in the RFC (emphasis mine):

All other things being equal, using Result return is generally
preferable to fail!:

What is recoverable and what is not is more often than not a
question of context.
Examples: RefCell failure can be used to detect cycles.
Receiving from a dead task is ok if you are the supervisor
of that task. Bounds check violations may be expected.

If you do want to recover, it is generally more DRY and more
efficient to have callee do the check.

However, usability intrudes, and in reality sometimes it is better to
use fail! than Result. Ultimately this boils down to the fact that
if the vast majority of your callers would consider the error fatal,
it's annoying for them to call ensure().

Lastly, sfackler proposed removing the "try" variants on RefCell
and replacing them with methods to query the current state. This is
clever, but I don't think it's a good general replacement over
Result-ful returns:

On the pro side, it helps the caller to make
choices: "ah, it's read-locked, I'll go for a read-only borrow rather
than a mut borrow". In the case of ref-cell, the utility is limited,
as there aren't that many things you can do, but in more complex cases
that could be helpful.

On the con side, it is not extensible, at least until we have some way
to declare an enum to which one may add more variants in the future.

It also has all the downsides that were listed above regarding why
returning Result is preferable. Any code that wishes to detect
failure before the case winds up duplicating the checks under which
borrow_mut may succeed -- this means that (a) the checks run
twice, to some (probably minor, but not always) performance
detriment and (b) there is the chance that the user encodes the
rules wrong, and hence winds up with a failure even though they were
trying to check. Both of these are the kinds of things that
motivated the idea of "inform the user about pre-condition
violations rather than failing outright".

I think the ecosystem in which shipyard will mostly be used also has its importance. Rapid prototyping that panics is a valid use case.

The try_* alternative makes it trivial to then switch to a more robust solution in production.
This issue makes it even better, once the feature is disabled you'll get a lot of errors and just have to fix them instead of having bad legacy prototype code ruining your production code.

I personally consider the usage of unwrap/panic inside a library a grave anti-pattern. It hides error states that should be explicitly checked or accounted for by the user without him knowing using the API

That's not the way the std or the Rust ecosystem in general works though, even the RFC you linked suggests:

A contract violation is always a bug, and for bugs we follow the Erlang philosophy of "let it crash": we assume that software will have bugs, and we design coarse-grained thread boundaries to report, and perhaps recover, from these bugs.

In shipyard, almost all errors are "contract violation"s, except when it's related to a storage borrow. Even in case of storage borrow errors, I'd argue it's still a contract violation most of the time since you control the whole application and you should make sure not to make an invalid borrow even across threads. That's what workloads automatically do after all.

For get I'd love to use what the RFC proposes:

// Look up element without failing
fn get(&self, key: K) -> Option<&V>
fn get_mut(&mut self, key: K) -> Option<&mut V>

// Convenience for .get(key).map(|elt| elt.clone())
fn get_clone(&self, key: K) -> Option<V>

// Lookup element, failing if it is not found:
impl Index<K, V> for Container { ... }
impl IndexMut<K, V> for Container { ... }

I want Index and IndexMut functionality but I can't use them.

the get family of methods is used to access contained data without any risk of thread failure; they return Option as appropriate

I don't understand.

The new api-guidelines are more moderated about what they advise to do, it's not an easy topic after all.

almetica · 2020-04-29T18:50:05Z

I think the ecosystem in which shipyard will mostly be used also has its importance. Rapid prototyping that panics is a valid use case.

People can just call .unwrap() then. That way it's clear in the application code that the code can panic. If it's hidden inside the library you use, it's harder to spot such problems.

The try_* alternative makes it trivial to then switch to a more robust solution in production.
This issue makes it even better, once the feature is disabled you'll get a lot of errors and just have to fix them instead of having bad legacy prototype code ruining your production code.

I would say that using .unwrap() and later handling the error is equally trivial and refactoring them is also easy.

In shipyard, almost all errors are "contract violation"s, except when it's related to a storage borrow. Even in case of storage borrow errors, I'd argue it's still a contract violation most of the time since you control the whole application and you should make sure not to make an invalid borrow even across threads. That's what workloads automatically do after all.

Even if it's a contract violation, it could we very well be that it's cause by faulty business logic and you don't want to bring down your whole server system because of this. I think my biggest problem with this is the implicity of the faulty operation (and me not expecting get() to panic). It's not clearly visible when using the API, so novice programmers and stupid future me that's patching a problem in two months will most likely missuse the API and produce unstable code.

For get I'd love to use what the RFC proposes:

// Look up element without failing
fn get(&self, key: K) -> Option<&V>
fn get_mut(&mut self, key: K) -> Option<&mut V>

// Convenience for .get(key).map(|elt| elt.clone())
fn get_clone(&self, key: K) -> Option<V>

// Lookup element, failing if it is not found:
impl Index<K, V> for Container { ... }
impl IndexMut<K, V> for Container { ... }

This could indeed work.

the get family of methods is used to access contained data without any risk of thread failure; they return Option as appropriate

I don't understand.

I think I meant that get() on containers can't panic.

Sorry if I worry too much, but I have a deep background in writing server software and implicit behavior that could bring down a system is a big pain point of mine. I have to agree that my use case of shipyard is very specific and that my deep background in GO doesn't help either...

There the usage of panic in libraries is a major crime punishable with writing two weeks in C or even PHP...

I feel like I'm the devils advocate yet again.

leudz · 2020-04-29T20:45:35Z

This could indeed work.

It can't sadly, Index is defined as:

pub trait Index<Idx: ?Sized> {
    type Output: ?Sized;

    fn index(&self, index: Idx) -> &Self::Output;
}

But when a tuple is returned the reference is in the inside, it can't be modeled with Index.
Plus I couldn't unify them the way I unified get and get_mut to allow mixed views.

Not using unwrap is more about not having to even think about if the function is fallible or not. In most cases it'll work without panicking.

While an ECS is flexible it's still mainly meant for games and the api will facilitate this usage more than others. If panics are such a big deal for you, this issue should be a plus for you, no?
You'll have to use try_get instead of get but no risk of panic.

almetica · 2020-04-30T12:21:50Z

@leudz I refactored my codebase to try_get one hour of you uploading the changes to master.

After having slept the night over it, I just wished that a potential unsafe function should be marked so in the function signature. So the default is the safe function, but if somebody needs to have an unchecked version that could panic (that could be potentially faster?), call the _unchecked version of that function. I think I saw such function signatures in sqlx.

leudz · 2020-04-30T16:26:36Z

I had the same conversation with @colelawrence on zulip a couple of days ago.
I think if you want to make something in Rust that doesn't panic there's no other way than go through the docs.
Many functions in the std that could ignore errors or return a Result panic because it's ok to do so in case of bad input.
Probably the best way to recover from a panic is catch_unwind but it'll likely not be better than just have the server restart the process if it crashes.

I think there are instances where the panic version will be faster but I can't remember which function.
The definitively faster version will be *_unchecked but this one won't panic, it'll trigger UB if something goes wrong.

leudz added enhancement New feature or request good first issue Good for newcomers ergonomics There should be a better way to do this labels Apr 28, 2020

colelawrence mentioned this issue Apr 30, 2020

Add "unwrap_get" default feature to gate get fn #82

Closed

leudz mentioned this issue Apr 30, 2020

Going back to get returning a Result #83

Closed

leudz closed this as completed in 934896e May 9, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Panic feature flag #80

Panic feature flag #80

leudz commented Apr 28, 2020

almetica commented Apr 29, 2020

leudz commented Apr 29, 2020

almetica commented Apr 29, 2020

leudz commented Apr 29, 2020

almetica commented Apr 30, 2020

leudz commented Apr 30, 2020

Panic feature flag #80

Panic feature flag #80

Comments

leudz commented Apr 28, 2020

almetica commented Apr 29, 2020

leudz commented Apr 29, 2020

almetica commented Apr 29, 2020

leudz commented Apr 29, 2020

almetica commented Apr 30, 2020

leudz commented Apr 30, 2020