flexsurvreg handles uncentered and/or high-variance predictors particularly badly #67

dwalsh-css · 2019-10-19T21:23:43Z

I have what I'm sure is a very common use case: a predictor that's a timestamp converted to numeric, in seconds. This creates very large values, which often have very large numeric variance and/or have a mean that's very far from zero. While survival seems to handle this without any issue, flexsurvreg fails to initialize unless I recenter; and even if I recenter but have a scale that's very large, it fails to converge.

MWE:

library(survival)
library(flexsurv)
library(tidyverse)
library(magrittr)
#> 
#> Attaching package: 'magrittr'
#> The following object is masked from 'package:purrr':
#> 
#>     set_names
#> The following object is masked from 'package:tidyr':
#> 
#>     extract

set.seed(2019)

train_data <- tribble(
  ~wait_time,        ~called_yet, ~time_queued,
  131.282999992371, 0,           1570733365.28,
  358.296000003815, 1,           1570733421.187,
  1352.13999986649,  1,           1570733540.923,
  1761.61400008202,  0,           1570733941.343,
  1208.25300002098,  0,           1570734327.11,
  522.296999931335, 1,           1570734376.953,
  241.75,           0,           1570734659.44,
  143.156999826431, 0,           1570734809.673,
  1202.79999995232,  1,           1570734942.907,
  614.640000104904, 1,           1570735526.567
)

survival_model <- survreg(Surv(wait_time, called_yet) ~ time_queued,
                          data = train_data,
                          dist = "weibull")

flexsurv_model <- flexsurvreg(Surv(wait_time, called_yet) ~ time_queued,
                              data = train_data,
                              dist = "weibull")
#> Error in flexsurvreg(Surv(wait_time, called_yet) ~ time_queued, data = train_data, : Initial value for parameter 2 out of range

train_data %<>% mutate_at("time_queued", scale, scale = 1.5e-9)

flexsurv_model <- flexsurvreg(Surv(wait_time, called_yet) ~ time_queued,
                              data = train_data,
                              dist = "weibull")
#> Warning in flexsurvreg(Surv(wait_time, called_yet) ~ time_queued, data
#> = train_data, : Optimisation has probably not converged to the maximum
#> likelihood - Hessian is not positive definite.

^{Created on 2019-10-19 by the reprex package (v0.3.0)}

The text was updated successfully, but these errors were encountered:

chjackson · 2019-10-22T11:48:33Z

Interesting examples! In the first one, the MLE of the Weibull scale parameter (using scale in the dweibull sense not the survreg sense) in this example is exp(4e+05), which is outside the range of numbers that can be represented by R.

survreg works here because everything is done on the log transformed scale: the initial values supplied by users, the optimisation, and the coefficients presented.

Whereas a principle of flexsurv is that the parameters should be interpretable on a natural scale, consistent with those from the d,p,q,r functions in R. It does optimisation on the transformed scale, but the initial values supplied by users, and the results, are all on the natural scale. I suppose it sacrifices a bit of robustness at the edges for more user-accessibility.

For Weibull models, flexsurv actually wraps a survreg fit to get the initial values, then runs its own optimisation just to double check it's converged. So you would expect it to work here. But it fails because it transforms the estimates from survreg to the natural scale, before transforming them back to do the optimisation. I didn't foresee the problem with doing that! Not sure when I'll get the chance to look at a fix though.

In the second case, the MLE of one of the parameters is very small, -4e-13. survreg and flexsurvreg use different optimisation methods, which disagree over whether the optimisation has converged at this value. I don't really know enough about the optimisation methods to judge whether their default settings are appropriate for such small values. The author of survreg coded his own optimisation function, whereas I'm just using off the shelf ones, so I wouldn't be surprised if survreg was correct here.

dwalsh-css · 2019-10-22T11:53:00Z

Interesting, thanks for the explanation!

shishir-909 · 2024-11-15T03:40:50Z

I noticed that this discrepancy with the standard errors between survreg and flexsurvreg occurs when the MLE is too big as well. See my question (and my comment) here on cross-validated here .

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

flexsurvreg handles uncentered and/or high-variance predictors particularly badly #67

flexsurvreg handles uncentered and/or high-variance predictors particularly badly #67

dwalsh-css commented Oct 19, 2019

chjackson commented Oct 22, 2019 •

edited

Loading

dwalsh-css commented Oct 22, 2019

shishir-909 commented Nov 15, 2024

flexsurvreg handles uncentered and/or high-variance predictors particularly badly #67

flexsurvreg handles uncentered and/or high-variance predictors particularly badly #67

Comments

dwalsh-css commented Oct 19, 2019

chjackson commented Oct 22, 2019 • edited Loading

dwalsh-css commented Oct 22, 2019

shishir-909 commented Nov 15, 2024

chjackson commented Oct 22, 2019 •

edited

Loading