Colin J. Carlson (Georgetown University); February 2020
How do I install?
# install.packages('devtools')
devtools::install_github('cjcarlson/embarcadero')
Before you do that, you need the "velox" package, which is currently not on CRAN (that should be fixed soon). You can install it in one of two ways:
remotes::install_github("hunzikp/velox@master")
Or,
devtools::install_version("velox", version = "0.2.0")
If you have further trouble installing velox on a Mac, the cause may be your C compiler, the current version of Rcpp (which has a known issue), and/or GDAL. If you open an issue I can try to help you figure it out. Here are some posts that were helpful when we recently had some of these issues.
What's BART?
Bayesian additive regression trees (BARTs) are an exciting alternative to other popular classification tree methods being used in ecology, like random forests or boosted regression trees. Whereas boosted regression trees fit an ensemble of trees each explaining smaller fractions of variance, BART starts by fitting a sum-of-trees model and then uses Bayesian "backfitting" with an MCMC algorithm to create a posterior draw.
Why BART?
BART does well with model-free variable selection and handles irrelevant predictors well; the Bayesian aspect also comes with perks, like posterior distributions on predictions (without having to bootstrap, like you do with BRTs) and automated credible intervals on partial dependence plots. So far, it also seems to avoid a problem BRTs have for niche modeling: with no automated selection of tree depth or tree complexity, BRTs tend to overfit, especially with randomly-generated pseudoabsences and ensembling over subsets. In previous "bake-offs," BART has commonly outperformed other methods, including comparable classification tree methods like random forests and boosted regression trees.
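To make the "posterior distributions on predictions" point concrete, here is a minimal sketch that calls 'dbarts' directly rather than going through embarcadero. The simulated data, the predictor names (temp, precip, noise) and the tuning values are all made up for illustration. For a binary response, dbarts fits a probit model, so the posterior draws in yhat.train are on the latent scale and pnorm() converts them to probabilities.

```r
library(dbarts)

set.seed(42)

# Hypothetical toy data: presence/absence driven by 'temp' only;
# 'precip' and 'noise' are irrelevant predictors.
n <- 200
x <- data.frame(temp = runif(n), precip = runif(n), noise = runif(n))
y <- rbinom(n, 1, pnorm(3 * (x$temp - 0.5)))

# Fit a probit BART model; tuning values here are arbitrary, not recommendations.
fit <- bart(x.train = x, y.train = y,
            ntree = 50, ndpost = 500, nskip = 200,
            verbose = FALSE)

# Each row of yhat.train is one posterior draw on the probit (latent) scale.
prob_draws <- pnorm(fit$yhat.train)   # ndpost x n matrix of probabilities

# Posterior mean and a 95% credible interval for each training point,
# taken straight from the draws (no bootstrapping needed).
post_mean <- colMeans(prob_draws)
ci <- apply(prob_draws, 2, quantile, probs = c(0.025, 0.975))

head(data.frame(mean = post_mean, lower = ci[1, ], upper = ci[2, ]))
```

The same idea carries over to maps: summarizing the draws cell by cell is what gives you an interval for every pixel.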
Are BARTs new to ecology?
No, there is nothing new under the sun. Yen et al. used BARTs to examine habitat selection of birds back in 2011. But so far, no, no one is using them for species distribution modeling (as far as I can tell).
What does embarcadero do?
This package is basically a wrapper around 'dbarts' with a few added tools:
- basic model summary statistics and diagnostics
- spatial prediction with raster data
- credible interval draws from the posterior distribution
- visualization of how posterior draws learn over time
- variable importance measures and plots
- stepwise variable elimination
- automatic Nice Plots for partials, including multiple ways to visualize posterior draws
- spatial projection of partials ("spartials")
- compatibility with random intercept BART models (riBART)
- plots for random intercepts
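As a rough idea of what a variable importance measure can look like for BART, here is a sketch computed directly from a 'dbarts' fit. This is not necessarily how embarcadero computes it; the toy data and predictor names are hypothetical. It uses the varcount matrix that dbarts returns (how many times each predictor was used in a splitting rule, per posterior draw) and reports each predictor's share of all splits. A small number of trees is used on purpose, since the BART literature suggests that forcing predictors to compete for few trees makes the informative ones stand out.

```r
library(dbarts)

set.seed(1)

# Hypothetical toy data: only 'temp' carries signal.
n <- 200
x <- data.frame(temp = runif(n), precip = runif(n), noise = runif(n))
y <- rbinom(n, 1, pnorm(3 * (x$temp - 0.5)))

# Few trees, so predictors have to compete for splits.
fit <- bart(x.train = x, y.train = y,
            ntree = 10, ndpost = 500, nskip = 200,
            verbose = FALSE)

# varcount: one row per posterior draw, one column per predictor.
# Share of all splitting rules (pooled over draws) that use each predictor.
importance <- colSums(fit$varcount) / sum(fit$varcount)
names(importance) <- colnames(x)

sort(importance, decreasing = TRUE)
```

Predictors whose share stays low as the number of trees shrinks are natural candidates for stepwise elimination.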
In future versions I hope to include compatibility with:
- explicitly-spatial adaptations of BART (spatial priors)
- smoothed BART models (softBART) and sparse BART models with Dirichlet priors (DART)
How do I learn to use embarcadero?
The paper at Methods in Ecology and Evolution includes a bunch of code, and there's a new vignette included in the package that runs through all of the basic functionality using a virtual species. I've migrated the older, more advanced vignette - which is over 30 pages and takes a good 3 hours to run on an older machine - to another repo (cjcarlson/pier39) and am in the process of updating it to be much clearer.
How do I cite embarcadero?
The embarcadero package is published at Methods in Ecology and Evolution!
Can I help?
Please! Reach out to colin.carlson@georgetown.edu if you want to help with development or are otherwise interested in being one of the first users.
Why is it called embarcadero?
Because of Embarcadero BART station, and because I'm homesick for Humphry Slocombe.