Splines not supported by standsurv without specifying newdata #167

Open
markdanese opened this issue Jul 21, 2023 · 2 comments

Comments

@markdanese

I really appreciate this package. It makes things much easier, particularly with regard to generating causal contrasts and getting reasonable variance estimates.

I ran into an issue trying to get standsurv() to work when using a natural spline from the splines package. In this case it was age as a predictor in a model of time to death (in lung cancer). When age was used as a simple continuous variable, standsurv() worked fine without needing to specify the data set. When I changed to a natural spline (ns(age, 2)) to handle some non-linearity in the increase in risk with age, I got an error saying that it could not find the variable age. Helpfully, the error message suggested I should specify "newdata".

I noted that the model object includes the transformed age (i.e., in this case with 2 spline terms), so the error makes sense -- age is not there. And when I specified newdata = the original dataset, it worked without an error.

I am guessing that the predict function (which I think is part of summary()) isn't built for this use case. I tried to see how to work around this and suggest a code change, but I couldn't find anything helpful.

The simple workaround is to explicitly specify the original dataset, so it is not a critical issue. However, I wanted to put this out there in case anyone runs into this.
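For anyone who lands here, a minimal sketch of that workaround. The data set (survival::lung), the Weibull distribution, the sex contrast, and the time points are assumptions chosen for illustration, not the model from my actual analysis.

```r
library(survival)  # provides the lung data set (assumed example data)
library(flexsurv)
library(splines)

# Keep only the columns the model needs (all complete in lung)
dat <- lung[, c("time", "status", "age", "sex")]

# Model with a natural spline for age, as described above
fit <- flexsurvreg(Surv(time, status) ~ ns(age, 2) + sex,
                   data = dat, dist = "weibull")

# Workaround: pass the original data explicitly, so that the untransformed
# age column is available when predictions are computed
ss <- standsurv(fit,
                newdata = dat,
                at = list(list(sex = 1), list(sex = 2)),
                t = c(180, 365, 730))
ss
```

Leaving out `newdata = dat` in the standsurv() call is what triggers the "cannot find age" error in my case.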

@chjackson
Owner

Thanks for the report. The default newdata that the flexsurvreg predict method uses is the "model frame" that is created in this line of flexsurvreg.R. When run with an ns() formula, this line seems to put the basis variables into the model frame, rather than the original covariate values that we want. I haven't used ns() and the like, so I can't see a quick fix. I will leave this open.
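The model frame behaviour can be reproduced with base R alone. This is a small sketch using a made-up data frame (the `dat`, `y`, and `age` names are hypothetical) and a plain stats::model.frame call rather than the flexsurvreg internals:

```r
library(splines)

# Hypothetical toy data, for illustration only
dat <- data.frame(y   = c(1.2, 0.7, 3.1, 2.4, 1.9, 2.8),
                  age = c(45, 52, 61, 68, 74, 80))

mf <- model.frame(y ~ ns(age, 2), data = dat)

names(mf)                  # "y" "ns(age, 2)"
"age" %in% names(mf)       # FALSE: the original covariate is not stored
ncol(mf[["ns(age, 2)"]])   # 2: one matrix column holding both basis terms
```

Anything that later looks up `age` in this model frame will not find it, which matches the reported error.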

@chjackson
Owner

This is proving tricky to handle. The function stats::get_all_vars seems like it would be useful here, as it is designed to extract the original variables supplied to a formula, whereas stats::model.frame extracts the transformed versions. However, get_all_vars fails in cases where the formula contains a data frame look-up, e.g. compare

library(survival) # provides the ovarian data set
get_all_vars(ovarian$futime ~ 1, data=NULL) # fails
model.frame(ovarian$futime ~ 1, data=NULL) # works
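For contrast, here is a sketch of the case where get_all_vars does what we want: a formula whose variables are plain column names, with the data supplied explicitly. The toy data frame is hypothetical, and this is only an illustration of the base R function, not a proposed patch.

```r
library(splines)

# Hypothetical toy data, for illustration only
dat <- data.frame(y   = c(1.2, 0.7, 3.1, 2.4, 1.9, 2.8),
                  age = c(45, 52, 61, 68, 74, 80))

orig <- get_all_vars(y ~ ns(age, 2), data = dat)

names(orig)                    # "y" "age"
identical(orig$age, dat$age)   # TRUE: the untransformed covariate is recovered
```

The difficulty is that this only covers formulas of this simple form, while the data frame look-up style above also has to keep working.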
