This is the Biodiversity Information Review and Decision Support package
for R!
NB: BIRDS is an acronym. This package is not limited to bird data
(i.e. Aves).
This set of tools has been developed to systematize biodiversity data review, in order to evaluate whether a set of species observations is fit for use and to help decide how to use it in further analysis.
This R package was awarded Third Prize in the 2019 GBIF Ebbe Nielsen Challenge, for which it was developed.
The tools provided are intended for reviewing and understanding
biodiversity data quality in terms of completeness and the data
generation process (i.e. the observers' sampling behaviour). The BIRDS
package provides a systematic approach to evaluating biodiversity data,
to enhance reproducibility and facilitate the review of data. The BIRDS
package intends to provide the data user with knowledge about
sampling effort (amount of effort expended during an event) and data
completeness (data gaps) to help judge whether the data are
representative, valid and fit for the purpose of their intended use, and
hence to support decisions about the use and further analysis of
biodiversity data.
The BIRDS package is most useful for heterogeneous data sets with
variation in the sampling process, i.e. where data have been collected
and reported in variable ways, not conforming to the same sampling
protocol and therefore varying in sampling effort, leading to variation
in data completeness (i.e. how well the reported observations describe
the "true" state). Primary biodiversity data (PBD) combining data from
different data sets, such as GBIF-mediated data, commonly vary in the
ways the data have been generated: they contain opportunistically
collected presence-only data (no sampling protocol, no or inconsistent
information about absences, high sampling variability between
observers) as well as data sets that have been collected using
different sampling protocols. The set of tools provided by the BIRDS
package is aimed at illuminating and understanding the process that
generated the data (i.e. observing, recording and reporting species
into databases). It does this through a systematic approach, providing
summaries that inform about sampling effort and data completeness (or
data gaps).
The BIRDS package is not concerned with data accuracy, which can
be evaluated and improved using other existing packages (as outlined in
the technical details vignette) before processing the data using BIRDS.
The concepts, methods and examples are described after a short description of how to install this package in R.
This package is now published on CRAN, so the easiest way to
install it is install.packages('BIRDS'). Alternatively, you can install
the development version directly from GitHub using the package remotes.
Install remotes if you have not already installed it
(install.packages('remotes')):
remotes::install_github('GreenswayAB/BIRDS')
library(BIRDS)
To systematize the review process for PBD and enhance its
reproducibility, the BIRDS package takes a systematic approach: with
this package the data are systematically organised and reviewed. This
approach actually starts before using BIRDS, as we suggest
steps and tools for optionally cleaning the data before processing by
BIRDS. Hence, before using biodiversity data for the intended analysis,
start by optionally cleaning the data, then use BIRDS to organize,
summarize and review the data.
Then, use your review to evaluate sampling effort and data gaps, and to inform decisions about whether the data are fit-for-purpose and how to further analyse the data.
A central concept used by the BIRDS package is the "visit", which
defines the sampling unit as a sampling event by a unique observer (or
group of observers), at a unique unit of space and time (commonly a
day). Visits can help us to summarize the amount of effort expended in
the field. During a visit, the observer commonly samples (i.e. observes
and records) species by similar methods. The sampling effort can vary
among visits, with the amount of effort expended being greater when
spending more time and reporting more of the observed species. The same
number of observations (records of species) at a unique unit of time
and space could be made by either few observers reporting many species
(greater effort by each observer) or many observers reporting few
species (small effort by each observer). Using visits as sampling units
allows separation of sampling effort into the effort that can be
expressed through the number of visits by different observers and the
effort per visit (e.g. species list length, or when available the time
spent during a visit). Hence, the quality (completeness) of the data can
be judged by using information for each visit and information from a
collection of visits.
You can examine this in the technical details vignette.
Because a visit is defined by a unique observer (or group of observers) at a unique unit of space and time, visits can be identified by a unique combination of variables: observer id, location and time. Often the location is a named unit of space that has been visited during the same sampling event, for example a botanist visiting and reporting species for a meadow, or a bird watcher visiting and reporting species for a lake.
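As a rough illustration of this idea, the following base R sketch assigns a visit identifier to each unique combination of observer, location and day. The toy data and all column names (observer, location, date, species, visit_id) are assumptions made for this example; this is not the BIRDS implementation and it does not use the BIRDS API.

```r
# Hypothetical toy observations; column names are illustrative only.
obs <- data.frame(
  observer = c("anna", "anna", "anna", "ben", "ben", "anna"),
  location = c("meadow", "meadow", "meadow", "lake", "lake", "lake"),
  date     = as.Date(c("2020-05-01", "2020-05-01", "2020-05-01",
                       "2020-05-01", "2020-05-02", "2020-05-02")),
  species  = c("Bellis perennis", "Trifolium repens", "Bellis perennis",
               "Anas platyrhynchos", "Anas platyrhynchos", "Fulica atra"),
  stringsAsFactors = FALSE
)

# A visit is taken here as a unique combination of observer, location and day.
visit_key    <- paste(obs$observer, obs$location, obs$date, sep = "|")
obs$visit_id <- as.integer(factor(visit_key))

# Number of distinct visits in the toy data.
n_visits <- length(unique(obs$visit_id))
```

In this toy data the three meadow records by the same observer on the same day collapse into a single visit, while the same observer at the lake on another day counts as a separate visit.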
Sometimes locations can be more accurate positions for individuals of species that have been observed and reported during the same field visit. The botanist may have visited the meadow but reported species from a number of different sampling points in that meadow. Or the bird watcher reported species for different parts of the lake. In that case there is no common spatial identifier for the visit.
If there is no common spatial identifier to define the visit extent, and the observer id is not enough to constrain observations spatially (e.g. a group of observers from an organisation, where observer id = organisation name), then visits can be created by overlaying the observation data with a spatial grid. A visit is then defined as all the observations falling into the same grid cell. Keep in mind that the grid should have a cell size that corresponds to (or at least is not smaller than) the average spatial extent known (or assumed) to be typical for field visits for the reference species group (see below). This process can be repeated with a set of grids with different offsets to explore the sensitivity of the results to the size of the grid cells.
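The grid overlay described above can be sketched in base R as follows. This is a simplified illustration assuming projected coordinates in metres and a square grid; the helper assign_cell(), its arguments and the toy coordinates are hypothetical and are not part of BIRDS, which works with proper spatial objects.

```r
# Hypothetical observations with projected coordinates (metres).
obs <- data.frame(
  x = c(120, 480, 510, 990, 1510),
  y = c(80, 420, 495, 30, 1490),
  species = c("sp_a", "sp_b", "sp_c", "sp_a", "sp_b"),
  stringsAsFactors = FALSE
)

# Illustrative helper: label each point with the square grid cell it falls in.
# 'offset' shifts the grid origin, so the same data can be re-gridded.
assign_cell <- function(x, y, cell_size, offset = c(0, 0)) {
  col <- floor((x - offset[1]) / cell_size)
  row <- floor((y - offset[2]) / cell_size)
  paste(col, row, sep = "_")
}

# Same cell size, two different grid offsets.
obs$cell_a <- assign_cell(obs$x, obs$y, cell_size = 500)
obs$cell_b <- assign_cell(obs$x, obs$y, cell_size = 500, offset = c(250, 250))

# Observations sharing a cell label would be grouped into the same visit.
n_cells_a <- length(unique(obs$cell_a))
n_cells_b <- length(unique(obs$cell_b))
```

In this toy example the second and third observations fall into different cells under the first grid but into the same cell under the shifted grid, which is the kind of sensitivity that repeating the overlay with several grids is meant to reveal.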
You can examine this in the technical details vignette.
Because visits result from the sampling process they can only be defined for a reference species group, i.e. a group of species observed and recorded by similar methods. The rationale for a reference species group is based on the assumption that species groups share similar bias: we assume that, despite varying field skills and accuracy, observers reporting observations for species of a reference species group share similar observer behavior and methods and, hence, generate data with similar sampling bias (Phillips et al. 2009). From this we can assume that the larger the number of visits (or observations) reporting species from the reference group at a specific unit of space and time, the more likely it is that the lack of visits for (or observations of) a particular species reflects the absence of (or failure to detect) a focal species rather than a lack of visits and reports made.
It is important to keep in mind that, to keep the sampling bias consistent, the reference species group should only include species that are assumed to be sampled with the same methodology (Ponder et al. 2001). For example, a reference group should not include all species in the Order Lepidoptera because butterflies sensu stricto (superfamily Papilionoidea) are sampled in very different ways than most other species of Lepidoptera (mainly moths).
The species list length (SLL) per visit (i.e. the number of species
observed and recorded per visit) is a well-known proxy for the time
spent in the field and the willingness to report all species seen of a
reference taxonomic group (Szabo et al. 2010). The BIRDS package
therefore uses SLL as a proxy for sampling effort.
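To make the definition concrete, here is a minimal base R sketch that computes SLL as the number of distinct species recorded in each visit. The toy data and the column names visit_id and species are assumptions for illustration; this is not how BIRDS stores or computes the value internally.

```r
# Hypothetical observations already assigned to visits.
obs <- data.frame(
  visit_id = c(1, 1, 1, 1, 2, 2, 3),
  species  = c("sp_a", "sp_b", "sp_a", "sp_c", "sp_a", "sp_a", "sp_b"),
  stringsAsFactors = FALSE
)

# SLL = number of distinct species per visit (repeated records count once).
sll <- tapply(obs$species, obs$visit_id, function(s) length(unique(s)))

# A simple summary of effort per visit across all visits.
median_sll <- median(sll)
```

Note that visit 2 contains two records but only one species, so its SLL is 1: the number of observations and the list length are different quantities.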
With the tools in the BIRDS package, PBD can be reviewed based on the
information contained in the visits. Use BIRDS to organize, summarize
and review the data as described above. The BIRDS package
organizes the data into a spatially gridded visit-based format, from
which summaries are retrieved for a number of variables describing the
visits across both spatial and temporal dimensions. Those variables are
the number of visits, number of species, number of observations, average
species list length per visit, and number of units of space and time
with visits. The variables can be used to collectively describe the
sampling effort and data completeness (data gaps), and can be examined
spatially (e.g. viewed on maps) and temporally (e.g. plotted as time
series).
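The kind of summary listed above can be sketched in base R by grouping observations by grid cell and year. Everything here is illustrative: the toy data, the column names and the helper summarise_group() are assumptions for this example and do not reproduce the structure of the objects that BIRDS returns.

```r
# Hypothetical observations with a grid cell, a year and a visit identifier.
obs <- data.frame(
  cell     = c("c1", "c1", "c1", "c1", "c2", "c2", "c1"),
  year     = c(2019, 2019, 2019, 2019, 2019, 2019, 2020),
  visit_id = c("v1", "v1", "v2", "v2", "v3", "v3", "v4"),
  species  = c("sp_a", "sp_b", "sp_a", "sp_c", "sp_a", "sp_a", "sp_b"),
  stringsAsFactors = FALSE
)

# Illustrative helper: summary variables for one cell-by-year group.
summarise_group <- function(d) {
  sll <- tapply(d$species, d$visit_id, function(s) length(unique(s)))
  data.frame(
    cell      = d$cell[1],
    year      = d$year[1],
    n_obs     = nrow(d),
    n_visits  = length(unique(d$visit_id)),
    n_species = length(unique(d$species)),
    mean_sll  = mean(sll)
  )
}

# Split by cell and year, dropping combinations without any observation.
groups      <- split(obs, list(obs$cell, obs$year), drop = TRUE)
summary_tab <- do.call(rbind, lapply(groups, summarise_group))
```

Cell-by-year combinations that do not appear in summary_tab (here cell c2 in 2020) are the data gaps: units of space and time without any visit.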
Using the detailed information on sampling effort and data completeness
provided by the BIRDS package summaries allows better inference on
what the reported species observations mean. As much of the PBD is
presence-only data, the provided information helps us judge to what
degree a lack of observations may be due to (1) the species not being
observed (absent, or not detected) or (2) a lack of reports (lack of
visits, or lack of reports for observed species), i.e. little sampling
effort. We can be more confident about the first when there is
good sampling effort and data completeness, while the evidence is
shaky, i.e. there is a high probability of having missed species, when
there is little sampling effort and data completeness. In this way the
user can judge whether the data are fit for the intended use. Using
this information about how the data have been collected, the user can
also decide how to analyse the data.
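The reasoning above can be illustrated with a small base R sketch that labels grid cells according to whether a focal species was reported and how many visits the reference species group received. The toy counts and the threshold min_visits are hypothetical choices for this example, not values recommended or used by BIRDS; in practice the user has to decide what counts as sufficient effort.

```r
# Hypothetical per-cell counts for a reference species group.
cells <- data.frame(
  cell          = c("c1", "c2", "c3", "c4"),
  n_visits      = c(25, 2, 0, 12),  # visits reporting any reference-group species
  focal_reports = c(0, 0, 0, 3),    # visits reporting the focal species
  stringsAsFactors = FALSE
)

# Hypothetical threshold for "good" sampling effort, to be chosen by the user.
min_visits <- 10

# Separate "not observed despite effort" from "too little effort to tell".
cells$interpretation <- ifelse(
  cells$focal_reports > 0,
  "reported",
  ifelse(cells$n_visits >= min_visits,
         "not observed despite good effort",
         "too little effort to judge")
)
```

Only cells labelled "not observed despite good effort" give some support for treating the lack of records as absence or non-detection; the remaining cells without records are better treated as data gaps.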
Phillips et al. 2009. Sample selection bias and presence-only
distribution models: implications for background and pseudo-absence
data. Ecol Appl 19:181-197.
Ponder et al. 2001. Evaluation of Museum Collection Data for Use in
Biodiversity Assessment. Cons Biol 15:648-657.
Szabo et al. 2010. Regional avian species declines estimated from
volunteer-collected long-term data using List Length Analysis. Ecol Appl
20:2157-2169.
You can find an overview of the main BIRDS components and functions,
organised as an overview workflow, here, and a workflow highlighting
the decisions to be taken when using BIRDS here.
The Intro to BIRDS vignette provides a useful walk-through of the package tools using an example data set.
A short introductory video can be found here.
We continuously update and improve the BIRDS package. Check the changelog.
Check here for a list of future features to be added, and don't hesitate to send your suggestions by e-mail.
The development of the BIRDS package is part of a project 'Using opportunistic citizen science data for evaluations of environmental change' financed by the Swedish Research Council Formas.