Skip to content

Adding new R packages to DHARMA

Florian Hartig edited this page Dec 6, 2022 · 31 revisions

I receive a lot of request for supporting further R packages with DHARMa, so here a few general technical comments regarding this topic:

First of all, if you have an unsupported model, there are two options to create DHARMa residuals plots

  1. Making your package compatible with DHARMa
  2. Simulate new data from a fitted model by hand, and use the createDHARMa function to read those simulations into DHARMa

If feasible, Option 1 obviously preferable, so I'll start with this, followed by the fallback option 2.

Integrating a package with DHARMa

DHARMa interacts with other regression packages through a number of S3 wrapper functions which are available and described in the file compatibility.R.

For each wrapper, there is a default function. These default functions essentially assume that the following functions / options are implemented and work as for glm / lme4.

  • predict(object) - including type = response for GLMs, and ideally using the re.form syntax of lme4 to condition on REs
  • simulate(object) - ideally using the re.form syntax of lme4 to condition on REs
  • coef(object)
  • residuals(object) - implementing raw and Pearson residuals
  • LogLik(object)
  • object$family available, with the family object being implemented in the standard structure
  • model.frame(object), with response being in the 1st column

If your model class implements these points identical to lm/glm/lme4, chances are good that DHARMA will work out of the box, even if a warning is produced.

If that is not the case, you (or I) have to create a new S3 wrapper (e.g. getSimulations.YourClass) to extract simulations from the fitted model using your model syntax. Of course, this assumes that the respective functions / options are implemented by your model class in the first place. Regarding the latter, my strong preference is that:

  • I maintain the DHARMa wrappers, and DHARMa wrappers NEVER perform model-specific calculations, they only deal with with the interface (i.e. variable / function names)
  • If a certain functionality is not included in the original model class, or if the functionality is faulty, I will not fix it on the DHARMa side by writing new code, this needs to be fixed by the developers of the respective package.

The sense of this separation of work is that functions related to a specific package should stay and be maintained with this package, not with DHARMa.

Usually I write the wrapper function myself, it's enough that you provide the S3 functions needed for DHARMa with your model, but of course you can also create a (tested) pull request with all the necessary wrapper functions, which would allow me to integrate your package seamlessly into DHARMa.

Simulating new data by hand and reading this into DHARMa

If your model is not supported by DHARMa, you can still simulate data by hand and read this into DHARMa, allowing you to perform most of the residual checks implemented. This is easiest if your model already has a simulate function implemented, which allows creating new data based on the assumptions and parameters of a fitted statistical model, as in:

fit <- glm(...)
newData <- simulate(fit)

A simulate function is a very useful extension of any advanced regression package, as it allows to perform a number of essential tasks for technical validation, statistical validation, and non-parametric methods such as, among others,

  • Technical validation / unit tests
  • Power analysis
  • Analysis of bias or coverage
  • Calculations of CIs via parametric bootstrap
  • Simulated residuals implemented in DHARMa

If a simulate function is not available, the best option would be to Ask the maintainers of the respective regression package to create a simulate function, or create one yourself and make a pull request.

Note for developers:

  • I recommend thats signature and output format should conform to the existing implementations in glm and lme4, which is simulate(fittedModel, nsim, ...)
  • ... should ideally allow changing the conditioning of the data creation on all hierarchical levels of the model, which includes being able to condition on the REs, as well as possibly on things like zero-inflation etc.

If a simulate function is not available, you could still take out the parameters of the fitted model, and simulate new data according to these parameters. Here an example for a poisson glm:

testData = createData(sampleSize = 50, randomEffectVariance = 0)
fittedModel <- glm(observedResponse ~ Environment1, data = testData, family = "poisson")

# in DHARMA, using the simulate.glm function of glm 
sims = simulateResiduals(fittedModel)
plot(sims, quantreg = FALSE)

# Doing the same with a handcode simulate function. 
# of course this code will only work with a 1-par glm model
simulateMyfit <- function(n=10, fittedModel){
  int = coef(fittedModel)[1]
  slo = coef(fittedModel)[2]
  pred = exp(int + slo * testData$Environment1)
  predSim = replicate(n, rpois(length(pred), pred))
  return(predSim)
}

sims = simulateMyfit(250, fittedModel)

dharmaRes <- createDHARMa(simulatedResponse = sims, observedResponse = testData$observedResponse, fittedPredictedResponse = predict(fittedModel, type = "response"), integer = T)
plot(dharmaRes)

Another example here

Once you have an option to simulate, you can always read in the simulations via createDHARMa, which allows using most options of the DHARMa package.