Checks on delay_density #89

pratikunterwegs · 2023-10-25T10:07:06Z

This issue is in reference to the function that is passed to delay_density. I would ideally like to resolve this issue by adding a fix to #88.

In #88, I have implemented a check on delay_density being a function with only argument. It could be good to constrain users in this way to avoid potential errors.

This is not really in line with other functionals as pointed out in the review comment below. I'm happy to remove this check, but it would be good to hear opinions on either side.

My initial preference would be to allow both to be in line with what other common functions are doing (apply(), map(), etc.). Could you open an issue for this please so others can weigh in and we can take some time to consider both alternatives?

Originally posted by @Bisaloo in #88 (comment)

The text was updated successfully, but these errors were encountered:

Bisaloo · 2023-10-25T11:03:42Z

From a very practical standpoint, this issue is about whether we want to also allow:

cfr_static(
  ebola1976,
  delay_density = mydensityfct
)

or only:

cfr_static(
  ebola1976,
  delay_density = function(x) mydensityfct(x)
)

pratikunterwegs · 2023-10-25T11:33:37Z

Yes, thanks for the clearer explanation.

joshwlambert · 2023-10-26T09:34:51Z

I agree the check for a single argument (https://github.com/epiverse-trace/cfr/blob/dev/handle-dists/R/known_outcomes.R#L53-L58) is useful, but I think the first implementation Hugo listed is preferable given it will be more familiar to R users with knowledge of functional programming.

pratikunterwegs · 2023-10-26T09:36:50Z

Okay, thanks both - @adamkucharski is on leave for a few days but tagging here for future reference: I'll take the check away and users can follow the examples and/or know what they're doing.

TimTaylor · 2023-10-26T10:27:18Z

IIANM mistaken the way delay_density() is eventually used is in known_outcomes() where it is only given 1 input. Without the check, a mistake on the users part could potentially give confusing error messages. For this reason is it not best to leave the check in place? There may be scope to restructure a little to avoid duplicate checks.

pratikunterwegs · 2023-10-26T10:38:09Z

Thanks @TimTaylor - a related issue is that known_outcomes() is quasi-internal, but we do show how to use it in a vignette IIRC. Perhaps we can leave the check out for now, and revisit it once we get some user feedback.

pratikunterwegs · 2023-10-26T11:55:36Z

As follow up @TimTaylor - what restructure did you have in mind? If it's small, can be made here, or else in another PR

TimTaylor · 2023-10-26T12:06:23Z

This is more of a general thing.

Where you end up with external functions used internally you can sometimes avoid duplicating checks by having an external function just be a thin checking wrapper around an unchecked function. Then when you need to use the function internally you call the unchecked version. I often use the same name for the internal function but with a . prefix (e.g. myfun() and .myfun().

Of course this works best when the checks on one function are a subset of another (else you start needing to bubble up errors which quickly gets tricky).

pratikunterwegs · 2023-10-26T12:32:20Z

Thanks - so not related to the delay_density issue then? Will merge #88 for now

TimTaylor · 2023-10-26T12:45:54Z

Thanks - so not related to the delay_density issue then?

Well related in that checks on delay_density() are being duplicated.

Will merge #88 for now

It's hard to keep track of issues and PRs but my main concern was 08f870a. I think you want this check in place no? Again I may be missing something but if, for instance, someone passed delay_density = dgamma to the current PR would the resulting error message be helpful?

TimTaylor · 2023-10-26T12:51:35Z

I may be missing something but if, for instance, someone passed delay_density = dgamma to the current PR would the resulting error message be helpful?

Answering my own question. It would be ok because the underlying function called is delay_density() so the error message is still quite helpful even though it is dgamma erroring ...

Error in delay_density(seq(from = 0, to = nrow(data) - 1L)) : 
  argument "shape" is missing, with no default

TimTaylor · 2023-10-26T13:00:57Z

@pratikunterwegs - although I do still think this is more useful ...

R> cfr_static(ebola1976,  delay_density = dgamma)

Error in cfr_static(ebola1976, delay_density = dgamma) : 
  `delay_density` must be a distribution density function with 1 argument
    evaluating density at a vector of values and returning a numeric vector.
    E.g. function(x) stats::dgamma(x = x, shape = 5, scale = 1)

pratikunterwegs · 2023-10-26T13:04:04Z

Okay thanks - let me have a think about how we can balance the issue of informative erroring with @Bisaloo's point about not having to wrap functions with extra arguments

Bisaloo · 2023-10-31T13:51:24Z

The immediate issue is that the nargs argument in test_function() checks the total number of arguments where we would like to have the number of required arguments / arguments without default value. This is illustrated in this reprex:

library(checkmate)

f1 <- function(x) {
  cat("Hello ", x)
}

f1("world")
#> Hello  world

checkmate::check_function(f1, nargs = 1)
#> [1] TRUE

f2 <- function(x, y = NULL) {
  cat("Hello ", x)
}

# f2 behaves the same as f1 because y has a default value
f2("world")
#> Hello  world

checkmate::check_function(f2, nargs = 1)
#> [1] "Must have exactly 1 formal arguments, but has 2"

^{Created on 2023-10-31 with reprex v2.0.2}

This also means that my initial understanding of this issue was wrong and the real reason why #88 (comment) was failing was not because the function was passed as a symbol.

The immediate fix would be to write a test_function() variant and/or open a feature request in checkmate with a new required_nargs argument.

PR #94 addresses a distinct issue: the type and length of the return value. But it will not change anything to the error message mentioned in #89 (comment).

f <- function(delay_density) {
  
  stopifnot(
    "`delay_density` must be a function evaluating distribution density at a
    vector of values and returning a numeric vector of the same length.
    E.g. function(x) stats::dgamma(x = x, shape = 5, scale = 1)" =
      (checkmate::test_function(delay_density) &&
         checkmate::test_numeric(delay_density(seq(10)),
                                 lower = 0,
                                 any.missing = FALSE, finite = TRUE, len = 10L
         )) || is.null(delay_density)
  )

}

f(dgamma)
#> Error in delay_density(seq(10)): argument "shape" is missing, with no default

^{Created on 2023-10-31 with reprex v2.0.2}

pratikunterwegs · 2023-10-31T14:39:38Z

Thanks - I had misunderstood what each of you had been asking to be fixed.

From @Bisaloo I understood that the change mentioned in #88 (comment) should work. This was IMO fixed in 08f870a, but because that function already has default arguments for shape and scale.

From @TimTaylor I understood that the error message resulting from removing a check on n_args in 08f870a might not be informative, but I didn't see a specific preference there.

I can implement a small test_function() variant for now as I'd prefer to push {cfr} soon, and optionally make a feature request to {checkmate} with this variant as an example.

pratikunterwegs · 2023-10-31T14:53:54Z

An alternative is a suggestion from @sbfnk on Slack to pass the function with distribution parameters contained in ..., which would be added to all functions cfr_*(). See e.g.

f = function(fn, n = seq(10), ...) {
  args = list(...)
  args = c(
    x = list(n),
    args
  )
  do.call(fn, args)
}

f(dgamma, shape = 5, scale = 1)
#>  [1] 0.01532831 0.09022352 0.16803136 0.19536681 0.17546737 0.13385262
#>  [7] 0.09122619 0.05725229 0.03373716 0.01891664

^{Created on 2023-10-31 with reprex v2.0.2}

Bisaloo · 2023-10-31T17:06:27Z

I can implement a small test_function() variant for now as I'd prefer to push {cfr} soon, and optionally make a feature request to {checkmate} with this variant as an example.

Yes, this seems like the best way forward. Note that this is not a blocker for release IMO as it's an enhancement that won't result in breaking changes. Happy to let others chip in if they feel otherwise though.

pratikunterwegs · 2023-10-31T17:15:50Z

Thanks - this has been added and used, see d0ef400 part of #94. Happy to hear alternatives as well.

TimTaylor · 2023-11-01T09:37:01Z

Using the ... will probably work well and would likely be simpler if people wanted to map over different parameters (e.g. shape and scale). You may (later) want to force them to be named and check they are valid within the function.

Is there a benefit to the do.call() approach versus fn(n, ...) (maybe - I'm trying to think at the moment)?

Think you would benefit from handling all of this within the underlying known_outcomes() (or whatever it's currently called) function. Then you could tryCatch() (or the comparable {rlang} call) in the wrapper function to bubble up the error messages. This is probably for later but would dry the input checking out a bit.

pratikunterwegs · 2023-11-01T10:05:33Z

Thanks @TimTaylor - I've gone with the test_function variant approach for now. It resolves the initial issues you and Hugo raised, and is not a user-facing change.

Using the ... will probably work well and would likely be simpler if people wanted to map over different parameters (e.g. shape and scale).

I don't understand this suggestion. Users can already pass a density function with different parameters. Could you show me an example of what you mean?

You may (later) want to force them to be named and check they are valid within the function.

No. This something I would prefer to leave to a dedicted distribution class instead. We do have an entire vignette on how to use this functionality so hopefully mant examples to follow.

Is there a benefit to the do.call() approach versus fn(n, ...) (maybe - I'm trying to think at the moment)?

That would work too, and looks cleaner, but we've decided to go with checking the delay_density arg instead.

Think you would benefit from handling all of this within the underlying known_outcomes() (or whatever it's currently called) function. Then you could tryCatch() (or the comparable {rlang} call) in the wrapper function to bubble up the error messages. This is probably for later but would dry the input checking out a bit.

Yes, that's fair, I'm happy to do that.

TimTaylor · 2023-11-01T10:15:55Z

I don't understand this suggestion. Users can already pass a density function with different parameters. Could you show me an example of what you mean?

.mapply(known_outcomes, dots = list(shape = 1:5, scale = 1:5), MoreArgs = list(data, dgamma))

I'm relatively indifferent to the approach you take but thought this was a pro for the ... approach as one less step I guess.

pratikunterwegs · 2023-11-01T10:22:04Z

I don't understand this suggestion. Users can already pass a density function with different parameters. Could you show me an example of what you mean?
.mapply(known_outcomes, dots = list(shape = 1:5, scale = 1:5), MoreArgs = list(data, dgamma))

Thanks. I do not want to support mapping over parameters. I think iteration over parameters should be a considered step where I do want users to slow down.

pratikunterwegs · 2023-11-01T10:31:47Z

I'm relatively indifferent to the approach you take but thought this was a pro for the ... approach as one less step I guess.

For clarity, I think the use case we have seen in the vignettes is about mapping cfr_*() over different datasets (e.g. countries), rather than over different delay distribution parameters. So I do not think that vectorising over the distribution parameters is likely to be a use case.

pratikunterwegs · 2023-11-01T10:34:10Z

Think you would benefit from handling all of this within the underlying known_outcomes() (or whatever it's currently called) function. Then you could tryCatch() (or the comparable {rlang} call) in the wrapper function to bubble up the error messages. This is probably for later but would dry the input checking out a bit.

I can take the check away from other cfr_*() functions as the error does bubble up as is.

library(cfr)
data = ebola1976
ddens = \(x) dgamma(x, shape = 2.40, scale = 3.33)

cfr_static(data, delay_density = ddens)
#>   severity_mean severity_low severity_high
#> 1         0.959        0.842             1

ddens = \(x, some) dgamma(x, shape = 2.40, scale = 3.33)

cfr_static(data, delay_density = ddens)
#> Error in known_outcomes(data = data, delay_density = delay_density): `delay_density` must be a function with a single required argument,
#>     and evaluating distribution density at a vector of values and returning a
#>     numeric vector of the same length.
#>     E.g. function(x) stats::dgamma(x = x, shape = 5, scale = 1)

^{Created on 2023-11-01 with reprex v2.0.2}

TimTaylor · 2023-11-01T10:42:16Z

For clarity, I think the use case we have seen in the vignettes is about mapping cfr_*() over different datasets (e.g. countries), rather than over different delay distribution parameters. So I do not think that vectorising over the distribution parameters is likely to be a use case.

OK. But returning to:

Thanks. I do not want to support mapping over parameters. I think iteration over parameters should be a considered step where I do want users to slow down.

This is not about you performing the iteration for users but how your interface works with or against it. Users can still map over parameters just with a little more friction. I'm merely trying to highlight pros and cons of either approach. E.g.

funs <- mapply(
    function(shape, scale) \(x) dgamma(x, shape = shape, scale = scale),
    shape = 1:5,
    scale = 1:5
)

lapply(funs, known_outcomes, data = data)

Like I say, overall pretty I'm indifferent to the closure versus ellipsis approach.

pratikunterwegs · 2023-11-01T10:53:35Z

Okay thanks - I do want to work against such iteration. If users want to estimate CFRs based on different parameter sets, that is something I want them to pause and reconsider so this friction is inadvertent but IMO helpful.

pratikunterwegs self-assigned this Oct 25, 2023

pratikunterwegs mentioned this issue Oct 25, 2023

Remove {epiparameter} integration #88

Merged

pratikunterwegs added a commit that referenced this issue Oct 26, 2023

Remove n_args check on delay_density, fixes #89

08f870a

pratikunterwegs closed this as completed in 7dd9c1f Oct 26, 2023

pratikunterwegs reopened this Oct 26, 2023

pratikunterwegs added a commit that referenced this issue Oct 27, 2023

Add more checks on delay_density, fixes #89

b6aefb3

pratikunterwegs mentioned this issue Oct 27, 2023

Checks on delay_density #94

Merged

pratikunterwegs added a commit that referenced this issue Oct 30, 2023

Add more checks on delay_density, fixes #89

67d7832

pratikunterwegs added a commit that referenced this issue Nov 1, 2023

Add more checks on delay_density, fixes #89

e41f9de

pratikunterwegs added a commit that referenced this issue Nov 2, 2023

Add more checks on delay_density, fixes #89

837380f

pratikunterwegs closed this as completed in #94 Nov 6, 2023

pratikunterwegs added a commit that referenced this issue Nov 6, 2023

Add more checks on delay_density, fixes #89

514dbdf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Checks on delay_density #89

Checks on delay_density #89

pratikunterwegs commented Oct 25, 2023 •

edited

Loading

Bisaloo commented Oct 25, 2023 •

edited

Loading

pratikunterwegs commented Oct 25, 2023

joshwlambert commented Oct 26, 2023

pratikunterwegs commented Oct 26, 2023

TimTaylor commented Oct 26, 2023 •

edited

Loading

pratikunterwegs commented Oct 26, 2023

pratikunterwegs commented Oct 26, 2023

TimTaylor commented Oct 26, 2023

pratikunterwegs commented Oct 26, 2023

TimTaylor commented Oct 26, 2023 •

edited

Loading

TimTaylor commented Oct 26, 2023

TimTaylor commented Oct 26, 2023

pratikunterwegs commented Oct 26, 2023

Bisaloo commented Oct 31, 2023

pratikunterwegs commented Oct 31, 2023

pratikunterwegs commented Oct 31, 2023

Bisaloo commented Oct 31, 2023

pratikunterwegs commented Oct 31, 2023

TimTaylor commented Nov 1, 2023

pratikunterwegs commented Nov 1, 2023

TimTaylor commented Nov 1, 2023

pratikunterwegs commented Nov 1, 2023 •

edited

Loading

pratikunterwegs commented Nov 1, 2023

pratikunterwegs commented Nov 1, 2023

TimTaylor commented Nov 1, 2023

pratikunterwegs commented Nov 1, 2023

Checks on delay_density #89

Checks on delay_density #89

Comments

pratikunterwegs commented Oct 25, 2023 • edited Loading

Bisaloo commented Oct 25, 2023 • edited Loading

pratikunterwegs commented Oct 25, 2023

joshwlambert commented Oct 26, 2023

pratikunterwegs commented Oct 26, 2023

TimTaylor commented Oct 26, 2023 • edited Loading

pratikunterwegs commented Oct 26, 2023

pratikunterwegs commented Oct 26, 2023

TimTaylor commented Oct 26, 2023

pratikunterwegs commented Oct 26, 2023

TimTaylor commented Oct 26, 2023 • edited Loading

TimTaylor commented Oct 26, 2023

TimTaylor commented Oct 26, 2023

pratikunterwegs commented Oct 26, 2023

Bisaloo commented Oct 31, 2023

pratikunterwegs commented Oct 31, 2023

pratikunterwegs commented Oct 31, 2023

Bisaloo commented Oct 31, 2023

pratikunterwegs commented Oct 31, 2023

TimTaylor commented Nov 1, 2023

pratikunterwegs commented Nov 1, 2023

TimTaylor commented Nov 1, 2023

pratikunterwegs commented Nov 1, 2023 • edited Loading

pratikunterwegs commented Nov 1, 2023

pratikunterwegs commented Nov 1, 2023

TimTaylor commented Nov 1, 2023

pratikunterwegs commented Nov 1, 2023

pratikunterwegs commented Oct 25, 2023 •

edited

Loading

Bisaloo commented Oct 25, 2023 •

edited

Loading

TimTaylor commented Oct 26, 2023 •

edited

Loading

TimTaylor commented Oct 26, 2023 •

edited

Loading

pratikunterwegs commented Nov 1, 2023 •

edited

Loading