Consider improving error handling and `StatsError` type #221

FreezyLemon · 2024-04-27T10:15:29Z

enum StatsError is supposed to be an

Enumeration of possible errors thrown within the statrs library.

This indicates that it should not try to be a generic error type for any kind of statistics calculations, but instead only concern itself with the errors produced in statrs.

With that basic assumption, there are currently some inconsistencies and outdated practices though (API guidelines for reference):

Add test that StatsError implements both Sync and Send (this is more of a formality, but is trivial to implement and good future-proofing)
Error::description is deprecated and should not be implemented
There are a few unused variants (ArgNotNegative, ArgIntervalExclMin, etc.), these are leftovers from older versions and should be removed because they're no longer needed by statrs.
There's at least one case of a function returning a Result<T> that cannot return an error: Empirical::new. There's no real reason for this infallible API to return a Result<T, E>. There might be others.

I realize that most of these are breaking changes, but seeing that the crate is pre-1.0, I don't think there's a big problem doing this.

Other things that could be improved:
StatsError is big: 40 bytes on a Linux x64 target. This is because there are variants which contain 2x &str (2 x 16 = 32 bytes plus discriminant and padding). Is it really necessary to have strings in the error type? The implementation could be replaced mostly 1:1 with some sort of ArgName enum, but there might be an even better solution that does not need this argument name wrapping at all.

All new functions seem to just return StatsError::BadParams whenever the params are outside of the defined/allowed range. Is there a good reason for these to be so vague when compared to the more specific errors returned by other functions? After all, the more specific errors already exist, why not return more exact error information? There might even be value in providing multiple error types, to have errors that are more specific to the exact API being used.

The text was updated successfully, but these errors were encountered:

YeungOnion · 2024-04-28T02:56:06Z

I do see good reason for all of these and I'd be open to making changes for all but the infallible new not returning a result.

All other public structs defined in the distribution module have a new method implemented that returns Result so it does provide consistency. Perhaps if Empirical were in a different module or it's new had a different name?

FreezyLemon · 2024-04-28T16:50:13Z

All other public structs defined in the distribution module have a new method implemented that returns Result so it does provide consistency. Perhaps if Empirical were in a different module or it's new had a different name?

I see your point about consistency. I would personally value the expressiveness ("this call cannot fail") over the consistency ("all constructors return a Result and might need error handling"), but it doesn't matter much tbh.

Hmm I'm not sure about renaming the new function, it's a widespread naming convention in the Rust ecosystem and what most users would expect. Maybe a similar name like new_<something> so people can quickly find it in their IDEs.

YeungOnion · 2024-04-28T22:02:47Z

Perhaps an impl Default over having Empirical::new?

Regardless, the overall discussion you bring up on the error type is valid. I'd merge an effort that does any of

scaling it down
making it adhere closer to API guidelines
returning StatsError::BadParams less where possible.

YeungOnion · 2024-06-20T18:52:57Z

During these changes, we should also work on removing support for degenerate distributions #102 and returning errors.

YeungOnion · 2024-07-19T20:04:58Z

Tried a different approach, even wrote out an example to test using it, but I didn't have a great motivating case, so I just made something up, but I wanted to see what it was like to actually handle those such errors.

YeungOnion · 2024-08-14T19:12:20Z

Maybe we should figure out what the errors should be able to convey first. Since @FreezyLemon posted on #242 I've realized I've not done enough of this bigger picture stuff for decisions. I think it's fair to narrow the discussion to errors from distribution construction since that's at a module scope and most of where errors are handled.

The value in #258 is to make the errors flat, and delegates identifying what we call "bad parameters" to the caller. This is quite reasonable, as the conditions for errors are statically specified (up to decisions about what constitutes degeneracy), but there is no program state in which parameter conditions change. However it means that users of the library may repeat their work among projects where there's some input-validation cycle if they want useful error messages.

The option I have in #247 is pretty dense, and it means all distributions' new have a different type for the Err variant. Would need quite a few imports just to do something that creates multiple distributions from different families. But it has the value of granularity.

A few lists to help organize my thoughts:

Assumptions

Users who are writing analyses will likely unwrap or pair ? with anyhow.
Some end users will benefit from specificity of erroneous input.
Library authors using statrs wish for choice between dynamic and static dispatch.
Library authors will not prefer to duplicate effort of input checking.
Developers do not prefer being exhaustive with writing source code for error handling outside of tests.

Possible targeted behaviors adjacent to error API

distribution family generic code
- e.g. the application is fitting a distribution function to data where the end user chooses a distribution family
- dynamic dispatch for creating distribution type is the only option we can support without changing beyond error API
correcting while widening and narrowing of parameters
- e.g. arguing again via fitting, multiparameter distributions could have jacobians that lead to invalid values for one (or multiple) parameters going out of bounds because types are not statically constraining onto intervals.
matrix updating
- Mutable/Movable parameters for multivariate normal #180 mentions possible memory performance benefit, a further benefit could be updates that have nice approximations as updates, for $M' = M + \epsilon A$ under appropriate constraints on $A$, perhaps symmetry, $A^T = A$ or single rank $A = x x$
dimension checking/mismatch for dynamically sized nalgebra matrices
- we should at least panic on things like this for now. Evaluating pdf for Dyn sizes may fail the unwrapping here

I'm very open to modifying these lists.

YeungOnion · 2024-08-14T19:27:40Z

I realize yet another option is to have the distribution error have too many variants for any one distribution, but use shared language location, scale, shape, etc. in the variants. Suppose this is called ParameterError. Then in each distribution's module, impl From<super::ParameterError> for the distribution's specific error or use some generics to impl Error with source converting to the right error.

This allows a shallow error, small step from unwrap for those writing analyses. And allows for deeper error which can rely more on handling programmatically. But I don't think this is a common idiom, the more I think about this, the more certain there's something easier I'm missing here and I don't think it's needed to replicate std::io's pattern of error handling.

FreezyLemon · 2024-08-15T13:17:12Z

the more I think about this, the more certain there's something easier I'm missing here and I don't think it's needed to replicate std::io's pattern of error handling.

Most usages of StatsError are currently for the return value of SomeDistribution::new(...). Since this function is implemented per-distribution and not to satisfy some trait, the simplest solution here would be creating a concrete error type for each and every distribution in case the construction fails. For example:

Let's turn

// beta.rs

#[derive(Copy, Clone, PartialEq, Debug)]
pub struct Beta {
    shape_a: f64,
    shape_b: f64,
}

impl Beta {
    pub fn new(shape_a: f64, shape_b: f64) -> Result<Beta> {
        if shape_a.is_nan()
            || shape_b.is_nan()
            || shape_a.is_infinite() && shape_b.is_infinite()
            || shape_a <= 0.0
            || shape_b <= 0.0
        {
            return Err(StatsError::BadParams);
        };
        Ok(Beta { shape_a, shape_b })
    }
}

into

#[derive(Copy, Clone, PartialEq, Debug)]
pub struct Beta {
    shape_a: f64,
    shape_b: f64,
}

#[derive(Copy, Clone, PartialEq, Eq, Debug)]
pub enum BetaError {
    ShapeANan,
    ShapeAInfinite,
    ShapeALessThanZero,
    ShapeBNan,
    ShapeBInfinite,
    ShapeBLessThanZero,
}

impl std::fmt::Display for BetaError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            BetaError::ShapeANan => write!(f, "Shape A is NaN"),
            // ...
        }
    }
}

impl std::error::Error for BetaError {}

impl Beta {
    pub fn new(shape_a: f64, shape_b: f64) -> Result<Beta, BetaError> {
        if shape_a.is_nan() {
            Err(BetaError::ShapeANan)
        } else if shape_a.is_infinite() {
            Err(BetaError::ShapeAInfinite)
        } else if shape_a <= 0.0 {
            Err(BetaError::ShapeALessThanZero)
        } else if shape_b.is_nan() {
            Err(BetaError::ShapeBNan)
        } else if shape_b.is_infinite() {
            Err(BetaError::ShapeBInfinite)
        } else if shape_b <= 0.0 {
            Err(BetaError::ShapeBLessThanZero)
        } else {
            Ok(Beta { shape_a, shape_b })
        }
    }
}

Maybe this example is too verbose (i.e. maybe BetaError::ShapeAInvalid would be enough), but it gets the point across.

This will return very specific errors, and users that don't care about it can still .unwrap() or use pattern matching to handle whichever errors. This will not impact the distributions' traits (because new is not a part of any trait) and should allow for dynamic dispatch.

If new methods of creating a distribution are added later, the related error type could be extended or a separate error type, specific to the method of construction, could be added.

It will need a fair bit of boiler plate, especially now that it needs to be implemented all at once, but is otherwise very simple to implement. A crate like thiserror could cut down on boilerplate, but is honestly not needed. The creation errors are simple and thiserror is/has a proc-macro which means it has a negative impact on compilation times.

I might be missing something here, but this is the obvious KISS solution IMHO

YeungOnion · 2024-08-15T22:09:00Z

Ah, it's true that the generic part can only come up once the type is correctly initialized.

I'm good with only specifying which parameter is invalid, so we can just have one variant per parameter, the one message denoting that one of its (potentially multiple) constraints was violated.

impl std::fmt::Display for BetaError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            BetaError::ShapeAInvalid => write!(f, "Shape A must be positive finite"),
            BetaError::ShapeBInvalid => write!(f, "Shape B must be positive finite"),
        }
    }
}

It will be a little boilerplatey, but it need not be flexible, sounds like a text editor problem.

FreezyLemon · 2024-08-16T07:45:46Z

Alright, I'll prepare another draft pr to compare against the rest if you don't mind

YeungOnion · 2024-08-16T15:28:12Z

Thanks for this, I'll keep an eye out.

FreezyLemon mentioned this issue Apr 29, 2024

Add test asserting that StatsError is Sync & Send #226

Merged

YeungOnion mentioned this issue May 31, 2024

Refactor overlap with rand_distr to extend/mirror their implementation #236

Open

YeungOnion linked a pull request Jun 23, 2024 that will close this issue

hierarchy of errors #247

Closed

FreezyLemon mentioned this issue Aug 7, 2024

Remove StatsError type & replace Result<T, StatsError> with Option<T> #258

Closed

YeungOnion linked a pull request Aug 14, 2024 that will close this issue

Remove StatsError type & replace Result<T, StatsError> with Option<T> #258

Closed

FreezyLemon mentioned this issue Aug 16, 2024

Implement concrete Error types for each distribution's new function #265

Merged

5 tasks

FreezyLemon mentioned this issue Sep 10, 2024

Replace all instances of StatsError with custom error types or Option<T> #284

Merged

YeungOnion closed this as completed in #284 Sep 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consider improving error handling and `StatsError` type #221

Consider improving error handling and `StatsError` type #221

FreezyLemon commented Apr 27, 2024

YeungOnion commented Apr 28, 2024

FreezyLemon commented Apr 28, 2024 •

edited

Loading

YeungOnion commented Apr 28, 2024

YeungOnion commented Jun 20, 2024 •

edited

Loading

YeungOnion commented Jul 19, 2024

YeungOnion commented Aug 14, 2024

YeungOnion commented Aug 14, 2024 •

edited

Loading

FreezyLemon commented Aug 15, 2024 •

edited

Loading

YeungOnion commented Aug 15, 2024

FreezyLemon commented Aug 16, 2024

YeungOnion commented Aug 16, 2024

Consider improving error handling and StatsError type #221

Consider improving error handling and StatsError type #221

Comments

FreezyLemon commented Apr 27, 2024

YeungOnion commented Apr 28, 2024

FreezyLemon commented Apr 28, 2024 • edited Loading

YeungOnion commented Apr 28, 2024

YeungOnion commented Jun 20, 2024 • edited Loading

YeungOnion commented Jul 19, 2024

YeungOnion commented Aug 14, 2024

Assumptions

Possible targeted behaviors adjacent to error API

YeungOnion commented Aug 14, 2024 • edited Loading

FreezyLemon commented Aug 15, 2024 • edited Loading

YeungOnion commented Aug 15, 2024

FreezyLemon commented Aug 16, 2024

YeungOnion commented Aug 16, 2024

Consider improving error handling and `StatsError` type #221

Consider improving error handling and `StatsError` type #221

FreezyLemon commented Apr 28, 2024 •

edited

Loading

YeungOnion commented Jun 20, 2024 •

edited

Loading

YeungOnion commented Aug 14, 2024 •

edited

Loading

FreezyLemon commented Aug 15, 2024 •

edited

Loading