v 0.3.0

biodiverse · Mar 29, 2022 · 14e5291
1 parent 1afd4c6

Showing 62 changed files with 3,273 additions and 423 deletions.
diff --git a/.gitignore b/.gitignore
@@ -10,5 +10,8 @@ _pkgdown.yml
 src/*.o
 src/*.so
 vignettes/modelFitting_cache
+vignettes/modelFitting_files
 vignettes/randomEffects_cache
 vignettes/randomEffects_files
+vignettes/factorModels_files
+vignettes/factorModels_cache
diff --git a/NEWS.md b/NEWS.md
@@ -4,7 +4,7 @@ spOccupancy Version 0.3.0 contains numerous substantial updates that provide new
 
 + Additional functionality for fitting spatial and non-spatial multi-species occupancy models with residual species correlations (i.e., joint species distribution models with imperfect detection). See documentation for `lfMsPGOcc()` and `sfMsPGOcc()`. We also included the functions `lfJSDM()` and `sfJSDM()` which are more typical joint species distribution models that fail to explicitly account for imperfect detection.
 + All single-species and multi-species models allow for unstructured random intercepts in both the occurrence and detection portions of the occupancy model. Prior to this version, random intercepts were not supported in the occurrence portion of spatially-explicit models. 
-+ All `predict()` functions now include the argument `type`, which allows for prediction of detection probability (`type = 'detection'`) at a set of covariate values as well as predictions of occurrence (`type = 'occupancy'`). 
++ `predict()` functions for single-species and multi-species models now include the argument `type`, which allows for prediction of detection probability (`type = 'detection'`) at a set of covariate values as well as predictions of occurrence (`type = 'occupancy'`). 
 + All models are substantially faster than version 0.2.1. We improved performance by implementing a change in how we sample the latent Polya-Gamma variables in the detection component of the model. This results in substantial increases in speed for models where the number of replicates varies across sites. We additionally updated how non-spatial random effects were sampled, which also contributes to improved computational performance.
 + All model fitting functions now include the object `like.samples` in the resulting model object, which contains model likelihood values needed for calculation of WAIC. This leads to much shorter run times for `waicOcc()` compared to previous versions.
 + All `fitted.*()` functions now return both the fitted values and the estimated detection probability samples from a fitted `spOccupancy` model. 
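The `like.samples` object described above stores likelihood values for each posterior draw, which is all WAIC needs. The sketch below makes two assumptions: `like.samples` is an `n.samples x J` matrix of likelihood (not log-likelihood) values, and WAIC follows the standard variance-based formula. `waic_from_like()` is a hypothetical helper, not the implementation inside `waicOcc()`.

```r
# Hypothetical helper: WAIC from a matrix of pointwise likelihood values.
# Assumed layout: rows are posterior samples, columns are sites.
waic_from_like <- function(like.samples) {
  # Log pointwise predictive density: sum over sites of log(mean likelihood)
  elpd <- sum(log(colMeans(like.samples)))
  # Effective number of parameters: sum over sites of the posterior
  # variance of the log-likelihood
  pD <- sum(apply(log(like.samples), 2, var))
  c(elpd = elpd, pD = pD, WAIC = -2 * (elpd - pD))
}

# Toy example: 4 posterior samples, 2 sites (made-up values)
like <- matrix(c(0.5, 0.5, 0.5, 0.5,
                 0.2, 0.4, 0.2, 0.4), nrow = 4)
res <- waic_from_like(like)
res
```

Storing the likelihood values during sampling means this calculation reduces to a few column summaries. The model does not have to be re-evaluated afterwards, and that is why precomputed likelihoods shorten `waicOcc()` run times.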

diff --git a/R/PGOcc.R b/R/PGOcc.R
@@ -396,10 +396,10 @@ PGOcc <- function(occ.formula, det.formula, data, inits, priors,
         }
     }   else {
         if (verbose) {	    
-          message("No prior specified for sigma.sq.psi.ig.\nSetting prior shape to 2 and prior scale to 1\n")
+          message("No prior specified for sigma.sq.psi.ig.\nSetting prior shape to 0.1 and prior scale to 0.1\n")
         }
-        sigma.sq.psi.a <- rep(2, p.occ.re)
-        sigma.sq.psi.b <- rep(1, p.occ.re)
+        sigma.sq.psi.a <- rep(0.1, p.occ.re)
+        sigma.sq.psi.b <- rep(0.1, p.occ.re)
       }
     } else {
       sigma.sq.psi.a <- 0
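This hunk changes the default inverse-Gamma prior on the occurrence random-effect variances `sigma.sq.psi` from shape 2 / scale 1 to shape 0.1 / scale 0.1. To see how much vaguer the new default is, the sketch below compares quantiles of the two priors. It relies on the fact that if X ~ Gamma(shape a, rate b), then 1/X ~ IG(shape a, scale b). `qinvgamma()` is a helper defined here for illustration and does not come from spOccupancy.

```r
# Quantile function of an inverse-Gamma(shape, scale) distribution.
# If X ~ Gamma(shape, rate = scale), then 1 / X ~ IG(shape, scale),
# and P(1 / X <= q) = P(X >= 1 / q).
qinvgamma <- function(p, shape, scale) {
  1 / qgamma(1 - p, shape = shape, rate = scale)
}

probs <- c(0.05, 0.5, 0.95)
old.prior <- qinvgamma(probs, shape = 2, scale = 1)      # previous default
new.prior <- qinvgamma(probs, shape = 0.1, scale = 0.1)  # new default
signif(rbind(old = old.prior, new = new.prior), 3)
```

The upper tail of IG(0.1, 0.1) reaches far larger variances than IG(2, 1), so the new default constrains the random-effect variances much less.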

diff --git a/R/lfJSDM.R b/R/lfJSDM.R
@@ -572,7 +572,9 @@ lfJSDM <- function(formula, data, inits, priors,
     colnames(out$beta.star.samples) <- beta.star.names
     out$re.level.names <- re.level.names
   }
+  loadings.names <- paste(rep(sp.names, times = n.factors), rep(1:n.factors, each = N), sep = '-')
   out$lambda.samples <- mcmc(do.call(rbind, lapply(out.tmp, function(a) t(a$lambda.samples))))
+  colnames(out$lambda.samples) <- loadings.names
 
   # Return things back in the original order. 
   out$z.samples <- do.call(abind, lapply(out.tmp, function(a) array(a$z.samples, 
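The new `loadings.names` line labels each column of `lambda.samples` as `<species>-<factor>`. Because `rep(sp.names, times = n.factors)` runs through every species before the factor index advances, the names follow the column-major order of an N x n.factors loadings matrix. The base-R sketch below uses hypothetical species names and a random stand-in for `out$lambda.samples`, not real model output. It shows the names the new line produces and how they can be used to reshape posterior means. A real fit stores a `coda::mcmc` object, which is a matrix underneath.

```r
set.seed(1)
# Hypothetical inputs for illustration
sp.names <- c("sp1", "sp2", "sp3")
N <- length(sp.names)
n.factors <- 2
n.samples <- 500

# Same construction as the line added in the diff
loadings.names <- paste(rep(sp.names, times = n.factors),
                        rep(1:n.factors, each = N), sep = '-')
loadings.names
# [1] "sp1-1" "sp2-1" "sp3-1" "sp1-2" "sp2-2" "sp3-2"

# Stand-in for out$lambda.samples: random numbers, one row per MCMC sample
lambda.samples <- matrix(rnorm(n.samples * N * n.factors), nrow = n.samples,
                         dimnames = list(NULL, loadings.names))

# matrix() fills column-wise, so the posterior means land in a
# species (rows) x factor (columns) layout that matches the names
lambda.mean <- matrix(colMeans(lambda.samples), nrow = N,
                      dimnames = list(sp.names, paste0("factor.", 1:n.factors)))

# All samples of one species' loadings, selected by name
sp2.samples <- lambda.samples[, grep("^sp2-", colnames(lambda.samples))]
```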

diff --git a/R/lfMsPGOcc.R b/R/lfMsPGOcc.R
@@ -1043,7 +1043,9 @@ lfMsPGOcc <- function(occ.formula, det.formula, data, inits, priors,
     colnames(out$beta.star.samples) <- beta.star.names
     out$re.level.names <- re.level.names
   }
+  loadings.names <- paste(rep(sp.names, times = n.factors), rep(1:n.factors, each = N), sep = '-')
   out$lambda.samples <- mcmc(do.call(rbind, lapply(out.tmp, function(a) t(a$lambda.samples))))
+  colnames(out$lambda.samples) <- loadings.names
 
   # Return things back in the original order. 
   out$z.samples <- do.call(abind, lapply(out.tmp, function(a) array(a$z.samples, 

diff --git a/R/sfJSDM.R b/R/sfJSDM.R
@@ -270,8 +270,6 @@ sfJSDM <- function(formula, data, inits, priors,
   }
   # phi -----------------------------
   coords.D <- iDist(coords)
-  lower.unif <- 3 / max(coords.D)
-  upper.unif <- 3 / sort(unique(c(coords.D)))[2]
   # Get distance matrix which is used if priors are not specified
   if ("phi.unif" %in% names(priors)) {
     if (!is.list(priors$phi.unif) | length(priors$phi.unif) != 2) {
@@ -811,7 +809,9 @@ sfJSDM <- function(formula, data, inits, priors,
       colnames(out$beta.star.samples) <- beta.star.names
       out$re.level.names <- re.level.names
     }
+    loadings.names <- paste(rep(sp.names, times = n.factors), rep(1:n.factors, each = N), sep = '-')
     out$lambda.samples <- mcmc(do.call(rbind, lapply(out.tmp, function(a) t(a$lambda.samples))))
+    colnames(out$lambda.samples) <- loadings.names
     out$theta.samples <- mcmc(do.call(rbind, lapply(out.tmp, function(a) t(a$theta.samples))))
     if (cov.model != 'matern') {
       theta.names <- paste(rep(c('phi'), each = q), 1:q, sep = '-')
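The deleted `lower.unif` and `upper.unif` lines set data-driven bounds for the spatial decay parameter `phi`. The lower bound was 3 divided by the largest inter-site distance, and the upper bound was 3 divided by the smallest non-zero distance. Under an exponential correlation function, exp(-phi * d) falls to about 0.05 at d = 3 / phi. These bounds therefore correspond to effective spatial ranges between the maximum and the minimum distance. The sketch below reproduces the removed expressions on a made-up 3 x 3 grid of coordinates, substituting base R's `dist()` for `iDist()`.

```r
# Made-up coordinates: a 3 x 3 unit grid
coords <- as.matrix(expand.grid(x = 0:2, y = 0:2))

# Full pairwise Euclidean distance matrix (diagonal is 0), standing in for iDist(coords)
coords.D <- as.matrix(dist(coords))

# Same expressions as the removed lines
lower.unif <- 3 / max(coords.D)                 # effective range = largest distance
upper.unif <- 3 / sort(unique(c(coords.D)))[2]  # [2] skips the 0 on the diagonal

c(lower = lower.unif, upper = upper.unif)

# Exponential correlation at the largest distance when phi = lower.unif: exp(-3)
exp(-lower.unif * max(coords.D))
```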

diff --git a/R/sfMsPGOcc.R b/R/sfMsPGOcc.R
@@ -496,8 +496,6 @@ sfMsPGOcc <- function(occ.formula, det.formula, data, inits, priors,
 
   # phi -----------------------------
   coords.D <- iDist(coords)
-  lower.unif <- 3 / max(coords.D)
-  upper.unif <- 3 / sort(unique(c(coords.D)))[2]
   # Get distance matrix which is used if priors are not specified
   if ("phi.unif" %in% names(priors)) {
     if (!is.list(priors$phi.unif) | length(priors$phi.unif) != 2) {
@@ -1298,7 +1296,9 @@ sfMsPGOcc <- function(occ.formula, det.formula, data, inits, priors,
       colnames(out$beta.star.samples) <- beta.star.names
       out$re.level.names <- re.level.names
     }
+    loadings.names <- paste(rep(sp.names, times = n.factors), rep(1:n.factors, each = N), sep = '-')
     out$lambda.samples <- mcmc(do.call(rbind, lapply(out.tmp, function(a) t(a$lambda.samples))))
+    colnames(out$lambda.samples) <- loadings.names
     out$theta.samples <- mcmc(do.call(rbind, lapply(out.tmp, function(a) t(a$theta.samples))))
     if (cov.model != 'matern') {
       theta.names <- paste(rep(c('phi'), each = q), 1:q, sep = '-')

diff --git a/R/simMsOcc.R b/R/simMsOcc.R
@@ -286,7 +286,7 @@ simMsOcc <- function(J.x, J.y, n.rep, N, beta, alpha, psi.RE = list(),
   psi <- matrix(NA, nrow = N, ncol = J)
   z <- matrix(NA, nrow = N, ncol = J)
   for (i in 1:N) {
-    if (sp) {
+    if (sp | factor.model) {
       if (length(psi.RE) > 0) {
         psi[i, ] <- logit.inv(X %*% as.matrix(beta[i, ]) + w.star[i, ] + beta.star.sites[i, ])
       } else {
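The `simMsOcc()` fix sends factor models through the same branch as spatial models. As a result, the species-by-site latent effects `w.star` enter the occurrence probability in both cases. The base-R sketch below shows that linear predictor for a latent factor model without random effects. It rests on several assumptions. `w.star` is taken to be loadings `lambda` (N x q) times site-level factors `w` (q x J). The loadings are lower triangular with ones on the diagonal, a common identifiability constraint. `plogis()` stands in for `logit.inv()`. All other names and values are illustrative.

```r
set.seed(42)
N <- 4   # species
J <- 50  # sites
q <- 2   # latent factors
X <- cbind(1, rnorm(J))                  # J x 2 occurrence design matrix
beta <- matrix(rnorm(N * 2), nrow = N)   # N x 2 species coefficients

# Loadings: lower triangular with ones on the diagonal (assumed constraint)
lambda <- matrix(rnorm(N * q), nrow = N)
lambda[upper.tri(lambda)] <- 0
diag(lambda) <- 1

w <- matrix(rnorm(q * J), nrow = q)      # q x J site-level latent factors
w.star <- lambda %*% w                   # N x J species-specific latent effects

psi <- matrix(NA, nrow = N, ncol = J)
z <- matrix(NA, nrow = N, ncol = J)
for (i in 1:N) {
  # Same structure as the `sp | factor.model` branch, without random effects
  psi[i, ] <- plogis(X %*% as.matrix(beta[i, ]) + w.star[i, ])
  z[i, ] <- rbinom(J, 1, psi[i, ])
}
```

Before the fix, a non-spatial factor model would have fallen through to the branch without `w.star`. The simulated species would then share no residual correlation.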

diff --git a/R/spIntPGOcc.R b/R/spIntPGOcc.R
@@ -405,10 +405,10 @@ spIntPGOcc <- function(occ.formula, det.formula, data, inits, priors,
       sigma.sq.b <- priors$sigma.sq.ig[2]
     } else {
       if (verbose) {
-        message("No prior specified for sigma.sq.ig.\nSetting the shape and scale parameters to 2.\n")
+        message("No prior specified for sigma.sq.ig.\nSetting the shape parameter to 2 and scale parameter to 1.\n")
       }
       sigma.sq.a <- 2
-      sigma.sq.b <- 2
+      sigma.sq.b <- 1
     }
     # nu -----------------------------
     if (cov.model == 'matern') {
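For the spatial variance `sigma.sq`, this hunk keeps the inverse-Gamma shape at 2 but lowers the default scale from 2 to 1. The message now reports the two values separately. For IG(shape a, scale b), the mean is b / (a - 1) when a > 1 and the mode is b / (a + 1), so the change halves both. With shape 2 the variance remains infinite, because the variance is finite only for a > 2. `ig_summary()` below is a small illustrative helper.

```r
# Mean and mode of an inverse-Gamma(shape, scale) distribution
ig_summary <- function(shape, scale) {
  c(mean = if (shape > 1) scale / (shape - 1) else Inf,
    mode = scale / (shape + 1))
}

rbind(old = ig_summary(2, 2),   # previous default for sigma.sq
      new = ig_summary(2, 1))   # new default
```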

diff --git a/R/spMsPGOcc.R b/R/spMsPGOcc.R
@@ -491,8 +491,6 @@ spMsPGOcc <- function(occ.formula, det.formula, data, inits, priors,
 
   # phi -----------------------------
   coords.D <- iDist(coords)
-  lower.unif <- 3 / max(coords.D)
-  upper.unif <- 3 / sort(unique(c(coords.D)))[2]
   # Get distance matrix which is used if priors are not specified
   if ("phi.unif" %in% names(priors)) {
     if (!is.list(priors$phi.unif) | length(priors$phi.unif) != 2) {
@@ -545,10 +543,10 @@ spMsPGOcc <- function(occ.formula, det.formula, data, inits, priors,
     }
   } else {
     if (verbose) {
-      message("No prior specified for sigma.sq.ig.\nSetting the shape and scale parameters to 2.\n")
+      message("No prior specified for sigma.sq.ig.\nSetting the shape parameter to 2 and scale parameter to 1.\n")
     }
     sigma.sq.a <- rep(2, N)
-    sigma.sq.b <- rep(2, N)
+    sigma.sq.b <- rep(1, N)
   }
 
   # nu -----------------------------

diff --git a/R/spPGOcc.R b/R/spPGOcc.R
@@ -400,8 +400,6 @@ spPGOcc <- function(occ.formula, det.formula, data, inits, priors,
   # phi -----------------------------
   # Get distance matrix which is used if priors are not specified
   coords.D <- iDist(coords)
-  lower.unif <- 3 / max(coords.D)
-  upper.unif <- 3 / sort(unique(c(coords.D)))[2]
   if ("phi.unif" %in% names(priors)) {
     if (!is.vector(priors$phi.unif) | !is.atomic(priors$phi.unif) | length(priors$phi.unif) != 2) {
       stop("error: phi.unif must be a vector of length 2 with elements corresponding to phi's lower and upper bounds")
@@ -424,10 +422,10 @@ spPGOcc <- function(occ.formula, det.formula, data, inits, priors,
     sigma.sq.b <- priors$sigma.sq.ig[2]
   } else {
     if (verbose) {
-      message("No prior specified for sigma.sq.ig.\nSetting the shape and scale parameters to 2.\n")
+      message("No prior specified for sigma.sq.ig.\nSetting the shape parameter to 2 and scale parameter to 1.\n")
     }
     sigma.sq.a <- 2
-    sigma.sq.b <- 2
+    sigma.sq.b <- 1
   }
   # nu -----------------------------
   if (cov.model == 'matern') {
@@ -478,10 +476,10 @@ spPGOcc <- function(occ.formula, det.formula, data, inits, priors,
       }
   }   else {
       if (verbose) {	    
-        message("No prior specified for sigma.sq.psi.ig.\nSetting prior shape to 2 and prior scale to 1\n")
+        message("No prior specified for sigma.sq.psi.ig.\nSetting prior shape to 0.1 and prior scale to 0.1\n")
       }
-      sigma.sq.psi.a <- rep(2, p.occ.re)
-      sigma.sq.psi.b <- rep(1, p.occ.re)
+      sigma.sq.psi.a <- rep(0.1, p.occ.re)
+      sigma.sq.psi.b <- rep(0.1, p.occ.re)
     }
   } else {
     sigma.sq.psi.a <- 0

diff --git a/README.Rmd b/README.Rmd
@@ -23,7 +23,7 @@ cat(
 )
 ```
 
-spOccupancy fits single-species, multi-species, and integrated spatial occupancy models using Markov Chain Monte Carlo (MCMC). Models are fit using P&oacute;ly-Gamma data augmentation. Spatial models are fit using either Gaussian processes or Nearest Neighbor Gaussian Processes (NNGP) for large spatial datasets. The package provides functionality for data integration of multiple single-species occupancy data sets using a joint likelihood framework. Below we provide a very brief introduction to some of the package's functionality, and illustrate just one of the model fitting funcitons. For more information, see the resources referenced at the bottom of this page. 
+spOccupancy fits single-species, multi-species, and integrated spatial occupancy models using Markov Chain Monte Carlo (MCMC). Models are fit using P&oacute;lya-Gamma data augmentation. Spatial models are fit using either Gaussian processes or Nearest Neighbor Gaussian Processes (NNGP) for large spatial datasets. The package provides functionality for data integration of multiple single-species occupancy data sets using a joint likelihood framework. For multi-species models, spOccupancy provides functions to account for residual species correlations in a joint species distribution model framework while accounting for imperfect detection. Below we provide a very brief introduction to some of the package's functionality, and illustrate just one of the model fitting functions. For more information, see the resources referenced at the bottom of this page. 
 
 ## Installation
 
@@ -35,19 +35,23 @@ install.packages("spOccupancy")
 
 ## Functionality
 
-|`spOccupancy` Function  | Description                                                   |
-|---------------- |----------------------------------------------------------------------|
-|`PGOcc`          | Single-species occupancy model                                       |
-|`spPGOcc`        | Single-species spatial occupancy model                               |
-|`intPGOcc`       | Single-species occupancy model with multiple data sources            |
-|`spIntPGOcc`     | Single-species spatial occupancy model with multiple data sources    |
-|`msPGOcc`        | Multi-species occupancy model                                        |
-|`spMsPGOcc`      | Multi-species spatial occupancy model                                |
-|`ppcOcc`         | Posterior predictive check using Bayesian p-values                   | 
-|`waicOcc`        | Compute Widely Applicable Information Criterion (WAIC)               |
-|`simOcc`         | Simulate single-species occupancy data                               |
-|`simMsOcc`       | Simulate multi-species occupancy data                                |
-|`simIntOcc`      | Simulate single-species occupancy data from multiple data sources
+|`spOccupancy` Function  | Description                                                     |
+|---------------- |------------------------------------------------------------------------|
+|`PGOcc()`          | Single-species occupancy model                                       |
+|`spPGOcc()`        | Single-species spatial occupancy model                               |
+|`intPGOcc()`       | Single-species occupancy model with multiple data sources            |
+|`spIntPGOcc()`     | Single-species spatial occupancy model with multiple data sources    |
+|`msPGOcc()`        | Multi-species occupancy model                                        |
+|`spMsPGOcc()`      | Multi-species spatial occupancy model                                |
+|`lfJSDM()`         | Joint species distribution model without imperfect detection         |
+|`sfJSDM()`         | Spatial joint species distribution model without imperfect detection |
+|`lfMsPGOcc()`      | Multi-species occupancy model with species correlations              |
+|`sfMsPGOcc()`      | Multi-species spatial occupancy model with species correlations      |
+|`ppcOcc()`         | Posterior predictive check using Bayesian p-values                   | 
+|`waicOcc()`        | Compute Widely Applicable Information Criterion (WAIC)               |
+|`simOcc()`         | Simulate single-species occupancy data                               |
+|`simMsOcc()`       | Simulate multi-species occupancy data                                |
+|`simIntOcc()`      | Simulate single-species occupancy data from multiple data sources    |
 
 ## Example usage
 
@@ -128,9 +132,11 @@ out.pred <- predict(out, X.0, coords.0, verbose = FALSE)
 
 ## Learn more
 
-The `vignette("modelFitting")` provides a more detailed description and tutorial of all functions in `spOccupancy`. For full statistical details on the MCMC samplers used in `spOccupancy`, see `vignette("mcmcSamplers")`. In addition, see [our recent paper](https://arxiv.org/abs/2111.12163) that describes the package in more detail (Doser et al. 2021). 
+The `vignette("modelFitting")` provides a more detailed description and tutorial of the core functions in `spOccupancy`. For full statistical details on the MCMC samplers for core functions in `spOccupancy`, see `vignette("mcmcSamplers")`. In addition, see [our recent paper](https://arxiv.org/abs/2111.12163) that describes the package in more detail (Doser et al. 2021). For a detailed description and tutorial of joint species distribution models in `spOccupancy` that account for residual species correlations, see `vignette("factorModels")`, as well as `vignette("mcmcFactorModels")` for full statistical details.
 
 ## References
 
-Doser, J. W., Finley, A. O., Kéry, M., and Zipkin, E. F. (2021a). spOccupancy: An R package for single-species, multi-species, and integrated spatial occupancy models. arXiv preprint arxiv:2111.12163.
+Doser, J. W., Finley, A. O., Kéry, M., and Zipkin, E. F. (2021). spOccupancy: An R package for single-species, multi-species, and integrated spatial occupancy models. arXiv preprint arxiv:2111.12163.
+
+Doser, J. W., Finley, A. O., and Banerjee, S. (2022). Joint species distribution models with imperfect detection for high-dimensional spatial data. In prep.