-
Notifications
You must be signed in to change notification settings - Fork 1
/
Meier_etal_Appendix_S2.Rmd
1074 lines (797 loc) · 49.9 KB
/
Meier_etal_Appendix_S2.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
---
title: "Ecosphere - Spatial and temporal sampling strategy connecting NEON Terrestrial Observation System protocols"
subtitle: "Appendix S2: Example linking NEON TOS Coarse Downed Wood volume and Small Mammal abundance data at plot and site scales"
author: "Courtney L. Meier, Katherine M. Thibault, and David T. Barnett"
date: "`r format(Sys.Date(), format = '%B %d, %Y')`"
output:
pdf_document:
toc: yes
latex_engine: xelatex
html_document:
toc: true
toc_depth: 3
theme: united
highlight: tango
---
```{r setup, include=FALSE, message=FALSE, warning=FALSE}
knitr::opts_chunk$set(echo = TRUE)
## Load required CRAN packages
# devtools: Load packages from Github not available on CRAN
# knitr: Data table display tools for pdf output
# neonOS: Functions for basic quality checks of NEON Observation System data; e.g., duplicate checks
# neonUtilities: Tools to retrieve data from the NEON Portal
cranPackages <- c("tidyverse",
"devtools",
"data.table",
"knitr",
"neonOS",
"neonUtilities")
# Load packages if installed, install and load if not installed
sapply(
cranPackages,
FUN = function(x) {
if (x %in% installed.packages()) {
library(x, character.only = TRUE)
} else {
install.packages(x, dependencies = TRUE)
library(x, character.only = TRUE)
}
}
)
## Load required packages from Github
# geoNEON: Required to programmatically access TOS Spatial Data
gitPackages <- c("geoNEON")
# Load packages if installed, install from Git and load if not installed
sapply(
gitPackages,
FUN = function(x) {
if (x %in% installed.packages()) {
library(x, character.only = TRUE)
} else if (x == "geoNEON") {
devtools::install_github('NEONScience/NEON-geolocation/geoNEON', dependencies = TRUE)
library(x, character.only = TRUE)
}
}
)
```
## Introduction
The NEON Terrestrial Observation System (TOS) data products are frequently spatially integrated at multiple scales, from the site level to the level of relatively small 'sampling cells' that occur within plots (see Supporting Information File 1, Table 3). In addition, the frequency with which TOS data products are generated at each site varies widely, from multiple bouts per year to once every 5 years. This example is a companion piece to our manuscript, _"Spatial and temporal sampling strategy connecting NEON Terrestrial Observation System protocols,"_ and the purpose is to illustrate how a NEON data user can draw scientific insight by working with two TOS data products collected at different spatial scales and at different temporal frequencies.
Here, we focus on the 'Coarse downed wood log survey' data product (DP1.10010.001; generated every 5-years) and the 'Small mammal box trapping' data product (DP1.10072.001; generated multiple times per year). Using R software (https://www.r-project.org/) to download data from the NEON API via the `neonUtilities` package (https://cran.r-project.org/web/packages/neonUtilities/), we show how to link data at both the plot and site spatial scales via the `plotID`. We also develop one method for comparing TOS data across products generated with very different temporal frequencies. For this example, we chose to investigate correlations between 'Coarse downed wood log survey' data and 'Small mammal box trapping' data based on the simple hypothesis that coarse downed wood (CDW) provides habitat for small mammals (MAM) in forested systems, and increasing CDW volume may therefore be correlated with increasing MAM abundance, both within and among sites. Briefly, we first identify colocated mammal grids and Distributed base plots that produce CDW data within a time-frame of interest. Next we generate plot-specific CDW volume estimates, and plot-specific MAM abundance estimates from each bout of mammal sampling within a 3-year window centered on the year that CDW was sampled. We then take the plot-specific maximum MAM abundance from all bouts within the 3-year window and compare with plot-level CDW volume data.
For the code chunks throughout this document, we preface each with a brief summary of the objectives. In addition, within each chunk we employ comments prefaced with the '`###`' symbols to document the purpose of individual lines of code. The `#-->` symbol denotes an explanation of example code output. A single '`#`' denotes code that has been commented out (e.g., lengthy API calls that need only be run once).
## Load data from the NEON Data Portal
- When retrieving data from the NEON API using the `neonUtilities` functions (see code chunks below), use of the `token` argument allows authentication with a user-specific API token. Use of a token is not required, but tokens provide faster download speeds and help NEON to measure data use for reporting to the NSF.
- All TOS data products, including both the 'Coarse downed wood log survey' and 'Small mammal box trapping' data products, may be affected by site management practices and/or natural disturbances. NEON publishes site management data via the 'Site management and event reporting' product (DP1.10111.001; https://data.neonscience.org/data-products/DP1.10111.001). We do not explore the 'Site management and event reporting' product further in this example. However, disturbance and management events like hurricanes, wildfires, and controlled burns can clearly affect both CDW volume and MAM abundance.
### Load Coarse Downed Wood Tally Data
Note: The word `'log'` is used frequently in the Coarse Downed Wood dataset, and it refers to a downed piece of wood not a logarithm.
At sites with sufficient logs >= 2 cm diameter, the 'Coarse downed wood log survey' data product, hereafter referred to as 'CDW Tally', is generated from both Distributed and Tower plots on a 5 year interval. However, sampling of Distributed and Tower plots is staggered through time such that each plot type is sampled every 2-3 years within a site on an alternating basis. Distributed plots are allocated across each site in proportion to the area of dominant National Land Cover Database (NLCD) Vegetation Classes, and these plots allow creating statistically robust site-level estimates of both CDW volume (derived from tallies) and small mammal density. The Small Mammal (MAM) data product is generated annually from most TOS sites, and MAM data are collected from grids that are typically colocated with Distributed base plots but not Tower plots. Thus, the first tasks for this example are:
- To identify which NEON sites have produced CDW Tally data.
- To identify several forested sites with recent Distributed plot data for the example, and
- To identify which Distributed plotIDs are colocated with MAM sampling.
The NEON 'Coarse downed wood log survey' data product citation is:
- NEON (National Ecological Observatory Network). Coarse downed wood log survey, RELEASE-2022 (DP1.10010.001). https://doi.org/10.48443/54dd-9407. Dataset accessed from https://data.neonscience.org on `r format(Sys.Date(), format = "%B %d, %Y")`
In the code chunk below, we complete the following:
- Use the `neonUtilities::loadByProduct()` function to programmatically retrieve CDW data using the NEON Data Portal API (commented out), save these data locally (commented out), then read the saved data into R. This approach allows potentially lengthy API calls to be made only once.
- Summarize data to identify 5-6 sites with recent CDW Tally data from Distributed plots that can be paired with MAM data. For demonstration purposes, the goal is to choose a small number of forested sites that span a range of habitats.
- Within each selected site, identify plotIDs in the CDW dataset that are colocated with MAM sampling grids. We use the `geoNEON::getLocByName()` function and the common structure of the `namedLocation` variable across TOS products to identify which CDW sampling locations are also MAM sampling locations.
``` {r cdwLoadData, results=FALSE, message=FALSE}
### Retrieve CDW Tally data for last 6 years from NEON Data Portal for all sites and dates
# cdwDP <- neonUtilities::loadByProduct(
# dpID = "DP1.10010.001",
# site = "all",
# startdate = "2016-01",
# enddate = "2021-12",
# package = "basic",
# release = "RELEASE-2022",
# tabl = "all",
# check.size = FALSE,
# token = Sys.getenv('NEON_PAT')
# )
### Extract the field tally data from the download list, and save for quicker read-in
# cdw <- cdwDP$cdw_fieldtally
# saveRDS(cdw, file = "cdw_fieldTally_2016-2021.RDS")
### Read in data saved via neonUtilities code chunk above, then filter to
### Distributed plots and identify plotIDs within scheduled sampling events that
### have generated data. Records with samplingImpractical != "OK" have no data.
cdw <- readRDS(file = "cdw_fieldTally_2016-2021.RDS")
summaryDist <- cdw %>%
dplyr::filter(plotType == "distributed") %>%
dplyr::group_by(domainID, siteID, eventID, samplingImpractical) %>%
dplyr::summarise(plotCount = length(unique(plotID)))
### From 'summaryDist' output, select 6 sites with closed-canopy forest; sites
### are selected based on authors' knowledge of the sites.
theSites <- c("HARV","JERC","STEI","UKFS","RMNP","ABBY")
### Filter CDW data to selected sites, filter to most recent sampling eventID,
### then remove unneeded columns.
tempCDW <- cdw %>%
dplyr::filter(plotType == "distributed",
siteID %in% theSites) %>%
dplyr::group_by(domainID, siteID) %>%
dplyr::filter(yearBoutBegan == max(yearBoutBegan)) %>%
dplyr::arrange(domainID, siteID, plotID) %>%
dplyr::ungroup() %>%
dplyr::select(-uid,
-coordinateUncertainty,
-elevationUncertainty,
-publicationDate)
### Identify plotIDs in CDW dataset that are colocated with MAM sampling grids.
### First, create MAM 'namedLocation' value for each CDW namedLocation value
### (e.g., 'HARV_001.mammalGrid.mam'); the goal is to check whether each CDW
### namedLocation has a colocated MAM namedLocation. The structure of the namedLocation
### value for any NEON data product can be determined by downloading data and checking
### the namedLocation field.
tempCDW <- tempCDW %>%
dplyr::mutate(
mamNamedLoc = stringr::str_replace(
string = namedLocation,
pattern = "basePlot.cdw",
replacement = "mammalGrid.mam"
),
.after = namedLocation
) %>%
as.data.frame()
### Second, retrieve namedLocation data for each mamNamedLoc value that was constructed
### above; if a CDW namedLocation is also a MAM namedLocation, the
### geoNEON::getLocByName() function will return a value from the NEON API; if a CDW
### namedLocation is not also a MAM namedLocation, the function does not return a value.
### Note: The 'data' argument must be a data.frame() and CANNOT be a tibble (the latter
### is commonly returned by dplyr).
# mamPlots <- geoNEON::getLocByName(
# data = tempCDW,
# locCol = "mamNamedLoc",
# locOnly = TRUE,
# token = Sys.getenv('NEON_PAT')
# )
### Save data locally for quicker subsequent read-in
# saveRDS(mamPlots, file = "cdw_mam_collocation.RDS")
### Read in previously saved MAM location data
mamPlots <- readRDS(file = "cdw_mam_collocation.RDS")
### Third, join tempCDW data with mamPlots and use 'subtype' column to identify
### colocated sampling.
tempCDW <- tempCDW %>%
dplyr::left_join(mamPlots %>%
select(plotID, subtype),
by = "plotID") %>%
dplyr::select(-mamNamedLoc)
```
As an aside, rather than programmatically retrieving TOS Spatial Data via the API with the `geoNEON` functions, as is done in the code chunk above, these spatial data may also be manually downloaded:
- First, navigate to https://data.neonscience.org/documents
- Click on 'Spatial Data' and click on the 'All_NEON_TOS_Plots_VX' link to retrieve a zip that contains a .csv with plot data.
- If this approached is used, one can then read in the included .csv file.
- For example: `tosPlots <- read.csv(file = "All_NEON_TOS_Plot_Centroids_V8.csv", header = TRUE)`
### Load Small Mammal Data
Small Mammal sampling grids (n=3 to n=8 per site) are colocated with Distributed base plots that support multiple TOS protocols (including Coarse Downed Wood), and grids are sampled one to several times per year (ideally, 4-6 times per year). Similar to Distributed base plots, mammal grids are allocated using a stratified random design informed by NLCD Vegetation Class.
The NEON 'Small mammal box trapping' data product citation is:
- NEON (National Ecological Observatory Network). Small mammal box trapping, RELEASE-2022 (DP1.10072.001). https://doi.org/10.48443/h3dk-3a71. Dataset accessed from https://data.neonscience.org on `r format(Sys.Date(), format = "%B %d, %Y")`
The code chunk below accomplishes the following:
- Use the `neonUtilities::loadByProduct()` function to programmatically retrieve MAM data using the NEON Data Portal API (commented out), save these data locally (commented out), then read the saved data into R.
- Retrieve the MAM master taxon table using the NEON API.
``` {r mamLoadData, results=FALSE, message=FALSE, warning=FALSE}
### Retrieve small mammal box trapping data for same sites and date range as CDW
# mamDP <- neonUtilities::loadByProduct(
# dpID = "DP1.10072.001",
# site = theSites, #--> same sites identified for CDW
# startdate = "2016-01",
# enddate = "2021-12",
# package = "basic",
# release = "RELEASE-2022",
# tabl = "all",
# check.size = FALSE,
# token = Sys.getenv('NEON_PAT')
# )
### Save to RDS format for quicker read-in
# saveRDS(mamDP, file = "mam_allTables_2016-2021.RDS")
### Read in stored MAM data
mamDP <- readRDS(file = "mam_allTables_2016-2021.RDS")
### Convert list tables to dataframes in the Global environment
list2env(mamDP, envir=.GlobalEnv)
### Read in master SMALL_MAMMAL taxon table via the NEON API
### https://www.neonscience.org/resources/learning-hub/tutorials/neon-api-usage
### using the verbose option to get the taxonProtocolCategory field
mam.req <- httr::GET(
paste0(
"https://data.neonscience.org/api/v0/taxonomy/?",
"taxonTypeCode=SMALL_MAMMAL&verbose=true&offset=0&limit=1000"
)
)
mam.list <- jsonlite::fromJSON(httr::content(mam.req, as = "text"))
```
## Data QC Checks
### Coarse Downed Wood Tally QC Checks
For CDW tally data, the code chunk below performs the following checks:
- Check for duplicates. Removing duplicates is important because log counts are used to estimate CDW volume. As of this writing, we do not use the standardized `neonOS::removeDups()` function to identify CDW duplicates because the primary key in the data needs an update for the function to identify duplicates as intended.
- Remove any tallied logs with a `logDistance` greater than the protocol-specified transect length, and
- Verify that all tallied logs meet 'limiting distance' tally criteria based on reported `equivalentLogDiameter`, `logDistance`, and `volumeFactor` data (see Affleck 2010 for details).
- The `equivalentLogDiameter` is used so that round and elliptical logs (typically highly decayed) can be assessed with the same criteria. Logs with `logDistance` greater than the limiting distance are removed from the dataset.
- Because technicians sometimes use look-up tables to determine limiting distance from a subset of `equivalentLogDiameter` values (e.g., when digital ingest devices fail), and look-up tables require choosing the nearest `equivalentLogDiameter` rather than using the actual measured value, it is expected that some logs will have `logDistance` greater than the calculated limiting distance and will be removed.
``` {r cdwDataQC, results=FALSE, message=FALSE}
### Check for individualID duplicates based on the 'individualID'. First, construct
### an individualID for logs < 10 cm diameter that are not tagged; smaller logs are
### not tagged and are given a unique temporary logID beginning with 'L' since they
### are less likely to persist over the 5-year CDW Tally measurement interval. Then,
### construct a primary key from plotID, date, lidsAzimuth, and individualID.
tempCDW <- tempCDW %>%
dplyr::mutate(
individualID = case_when(
is.na(individualID) &
targetTaxaPresent == "Y" ~ paste("NEON.CDW", domainID, plotID, logID, sep = "."),
TRUE ~ individualID
),
key = paste(plotID, date, lidsAzimuth, individualID, sep = "_")
)
### Identify duplicates ('dups') based on the 'key' value
cdwDupsKey <- tempCDW %>%
dplyr::filter(duplicated(tempCDW$key)) %>%
dplyr::select(key)
cdwDups <- tempCDW %>%
dplyr::filter(key %in% cdwDupsKey$key)
#--> Dups results: Only first one listed appears to be a real duplicate based on
#--> examination of other data fields; ABBY records are mis-indentified by the
#--> function as 'dups' because transect was reflected (i.e., not real duplicates).
### Remove second instance of cdwDups$key==
### "STEI_002_2016-05-26_320_NEON.CDW.D05.00375"
cdwRemoveDup <- cdwDups %>%
dplyr::distinct(key)
tempCDW <- tempCDW %>%
dplyr::filter(!key %in% cdwRemoveDup$key[1]) %>%
dplyr::bind_rows(cdwDups[1, ]) %>%
dplyr::arrange(domainID, siteID, plotID, lidsAzimuth, individualID)
### Volume Factor check: Ensure recorded value matches protocol and flag if no match.
### First, read in CDW volumeFactor and transectLength look-up table derived from
### protocol (NEON.DOC.001711). Note that RMNP has different F-values and transect lengths
### for Distributed and Tower plots due to distant parcels of land used for each plot type
### and different forest types in each parcel. The RMNP values in this table are specific
### to Distributed plots since we wish to compare data with Small Mammal data.
cdwParam <- read.csv(file = "cdw_siteParamLookup.csv", header = TRUE)
### Join look-up table values with CDW data and flag records with incorrect volume factor
### (i.e., f-value). Flagged records are identified with a '1'.
tempCDW <- tempCDW %>%
dplyr::left_join(cdwParam %>% dplyr::select(siteID, fValue, transectLength),
by = "siteID") %>%
dplyr::mutate(fValQF = case_when(
is.na(volumeFactor) ~ 1,
volumeFactor == fValue ~ 0,
TRUE ~ 1
))
### Check records with volumeFactor problems and evaluate
fCheck <- tempCDW %>%
dplyr::filter(fValQF == 1) %>%
dplyr::select(domainID, siteID, volumeFactor, fValue, fValQF)
#--> volumeFactor problems identified for HARV and STEI; volumeFactor is supposed to
#--> be assigned at the site level and should not vary from plot-to-plot. The variation
#--> in the data likely reflects a data entry error. We encourage data users to let NEON
#--> staff know about issues they discover like this one:
#--> https://www.neonscience.org/about/contact-us
### Correct the volumeFactor data for HARV, STEI sites based on fCheck results above
tempCDW <- tempCDW %>%
dplyr::mutate(volumeFactor = as.numeric(volumeFactor)) %>%
dplyr::mutate(volumeFactor = case_when(
siteID == "HARV" & (volumeFactor != 5 | is.na(volumeFactor)) ~ 5,
siteID == "STEI" & (volumeFactor != 5 | is.na(volumeFactor)) ~ 5,
TRUE ~ volumeFactor
))
### Log Distance tally check: Flag 'logDistance' values that exceed the per-site
### maximum transect length defined in the protocol.
tempCDW <- tempCDW %>%
dplyr::mutate(
logDistQF = case_when(
is.na(logDistance) ~ 0,
logDistance > transectLength ~ 1,
TRUE ~ 0
))
### Check whether the logDistance is <= limitingDistance, as expected, based on
### equivalentLogDiameter, and flag with '1' if not. The formula to calculate the
### limiting distance is from Affleck 2010, Appendix A:
### limitingDist = ((pi^2)*(roundDiameter^2))/(8*transectNum*fValue), where
### roundDiameter = log round diameter (cm); equal to equivalentLogDiameter in NEON data
### transectNum = number of transects established from survey point (3 for NEON plots),
### fValue = the F value assigned at the site level; volumeFactor field in NEON data
tempCDW <- tempCDW %>%
dplyr::mutate(
limitingDist = round(((pi^2)*(equivalentLogDiameter^2))/(8*3*volumeFactor), digits=2)
) %>%
dplyr::mutate(
limDistQF = case_when(
is.na(limitingDist) ~ 0,
logDistance > limitingDist ~ 1,
TRUE ~ 0
))
### Filter out flagged records corresponding to logs that should not have been tallied
filterCDW <- tempCDW %>%
dplyr::filter(limDistQF == 0,
logDistQF == 0)
```
### Small Mammal Box Trapping Data QC: Check for duplicates
In the code chunk below, we check for duplicates in the `mam_perplotnight` and `mam_pertrapnight` tables using the `neonOS::removeDups()` function:
- `mam_perplotnight`: In this table, duplicate records are defined as those with the same plot and date combination (as captured in an auto-generated nightuid).
- `mam_pertrapnight`: In this table, duplicate records are defined as those with the same nightuid, trap coordinate, and tagID or individualCode; note that the standard `neonOS::removeDups()` function used to remove duplicates cannot account for multiple captures of untagged individuals in a single trap.
``` {r mamDupeChecks, results=FALSE, message=FALSE, warning=FALSE}
### First, check mam_perplotnight table by nightuid using standard removeDups function
mam_plotNight_nodups <- neonOS::removeDups(data = mam_perplotnight,
variables = variables_10072,
table = "mam_perplotnight")
#--> Running this check generated the message: Primary key fields contain NA values
#--> and/or empty strings. Results may be unreliable; check input data carefully.
### To troubleshoot, first query the table to find the problematic records.
### Next, check the mam_pertrapnight table to see if there are corresponding records
### that might have a nightuid
problems <- which(is.na(mam_perplotnight$nightuid))
temp <- mam_pertrapnight %>%
dplyr::filter(
plotID %in% mam_perplotnight$plotID[problems] &
collectDate %in% mam_perplotnight$collectDate[problems]
)
#--> Since the remaining fields were populated and there are captures for each,
#--> these records appear to represent a valid night of trapping. So, we will assign
#--> custom temporary nightuids in both tables here. As an aside, we encourage users
#--> to let NEON staff know about issues they discover like this one:
#--> https://www.neonscience.org/about/contact-us
### Create custom nightuids for identified duplicates missing primary key value,
### then re-run the neonOS::removeDups() function
mam_perplotnight_adj <- mam_perplotnight %>%
dplyr::mutate(nightuid = paste(plotID, '_', collectDate, sep = ''))
mam_pertrapnight_adj <- mam_pertrapnight %>%
dplyr::mutate(nightuid = paste(plotID, '_', collectDate, sep = ''))
mam_plotNight_nodups <- neonOS::removeDups(data = mam_perplotnight_adj,
variables = variables_10072,
table = "mam_perplotnight")
#--> No duplicate key values found
### We now check the mam_pertrapnight table by nightuid and trapcoordinate using the
### standard neonOS::removeDups() function.
### Note: RELEASE 2022 contains ~142K records for theSites downloaded, so this
### operation can take some time. In rare cases, multiple animals can be captured in
### a single trap on the same night and not all receive unique tagIDs or individualCodes.
### This means that the duplicate function is not effective for these records. Here, we
### remove these records prior to running the standard check, and we then add these
### records back to the data frame afterwards.
mam_trapNight_multipleCaps <- mam_pertrapnight_adj %>%
dplyr::filter(trapStatus == "4 - more than 1 capture in one trap" &
is.na(tagID) & is.na(individualCode))
mam_trapNight_remainingRecords <- mam_pertrapnight_adj %>%
dplyr::filter(!(uid %in% mam_trapNight_multipleCaps$uid))
mam_trapNight_nodups <- neonOS::removeDups(data = mam_trapNight_remainingRecords,
variables = variables_10072,
table = "mam_pertrapnight")
#--> Output: Two unresolveable duplicates flagged with duplicateRecordQF == 2, which
#--> indicates records that are identical based on primary keys but contain different
#--> values in other fields; see ?neonOS::removeDups documentation for additional
#--> information on duplicate record flagging.
### Follow-up trouble-shooting to deal with unresolveable duplicates: Check for
### unexpected NAs in primary keys
which(is.na(mam_pertrapnight_adj$nightuid))
#--> No NAs found
which(is.na(mam_pertrapnight_adj$trapCoordinate))
#--> No NAs found. The two which() outputs above mean that trapID and individualCode
#--> are the source of the NAs, which is expected based on the protocol.
### Identify duplicates resulting from > 1 animal captured in a single trap on the
### same night.
dupCheck <- mam_trapNight_nodups %>%
dplyr::filter(duplicateRecordQF == 2 &
trapStatus != "4 - more than 1 capture in one trap")
#--> No records in the set; unresolved duplicates will not impact the analyses
### Add multiple capture records back to dataset
mam_trapNight_nodups <- mam_trapNight_nodups %>%
dplyr::mutate(collectDate = as.Date(collectDate)) %>%
dplyr::bind_rows(mam_trapNight_multipleCaps)
```
## Analysis of Spatially Integrated Data Products at Plot and Site Scales
In this section, we generate Coarse Downed Wood volume estimates and MAM abundance estimates at both plot and site scales to demonstrate the spatial integration of NEON TOS data products. Because CDW volume is generated every 5-years and does not change rapidly, barring fires or other site management activities, we compare with MAM data collected the same year as the CDW data +/- 1 year. For example, for HARV CDW volume data collected in 2019, we compare to MAM abundance data from 2018-2020. While we have selected a 3-year MAM window centered on the CDW sampling year, other researchers may select a different approach based on the questions being addressed.
### Create Coarse Downed Wood Volume Datasets
Using the LIDS method (Affleck 2008, 2010), CDW volume is calculated at the plot scale as:
`CDW volume density = F * n` (m^3^/hectare);
where `F` is the site-specific volume factor (an approximate inverse to plot size), and `n` is the number of qualifying logs tallied across all three transects originating from the centroid of the plot. Importantly, at the plot scale, CDW volume density is colocated with MAM abundance data at a maximum of n=6 plots per site (for `theSites` included in this analysis).
At the site scale, mean CDW volume (m^3^/hectare) is simply the mean of the plot scale values; the mean site-scale parameter is robust because of the spatially-balanced, stratified-random nature of the TOS Spatial Design. When comparing CDW volume and MAM abundance at the site scale, CDW data from all Distributed plots are used for the calculation.
In the code chunk below, we carry out the following steps:
- Remove records with `samplingImpractical` not equal to "OK" and keep records with `samplingImpractical` equals "NA". The latter is important because the `samplingImpractical` field was introduce in 2020 and older data are not populated with a value.
- Use the `targetTaxaPresent` field to identify only those records associated with a tallied log. For transects within a plot that have no logs, field staff create records with `targetTaxaPresent == N`.
- Calculate CDW volume density for those plotIDs that have logs.
- Because plotIDs that have no logs tallied for any of the the three transects contribute meaningful zeroes to the dataset, we separately identify these plots and manually add a CDW volume density of '0' to the dataset.
- Use plot-level CDW volume density data to calculate mean site-level CDW volume density and standard deviations.
``` {r cdwVolumeEstimates, message=FALSE}
### Calculate CDW volume for each plotID in the cleaned dataset. First, focus
### on plots with targetTaxaPresent == "Y" --> i.e., at least one qualifying log was
### tallied within the plot. To do this, remove records for transects with no logs,
### and remove records with samplingImpractical != "OK", then count the number of
### logs per plot and calculate CDW volume.
plotCdwVol <- filterCDW %>%
dplyr::filter(is.na(samplingImpractical) |
samplingImpractical == "OK") %>%
dplyr::filter(targetTaxaPresent == "Y") %>%
dplyr::group_by(domainID, siteID, volumeFactor, eventID, plotID) %>%
dplyr::summarise(logCount = n()) %>%
dplyr::mutate(cdwVol = volumeFactor * logCount)
### Second, identify plots where all three transects do not have logs and bind to
### add these important zeroes back to the log count output from above
noLogs <- filterCDW %>%
dplyr::filter(targetTaxaPresent == "N") %>%
dplyr::group_by(domainID, siteID, volumeFactor, eventID, plotID) %>%
dplyr::summarise(transectCount = n()) %>%
dplyr::filter(transectCount == 3) %>%
dplyr::mutate(logCount = 0,
cdwVol = 0) %>%
dplyr::select(-transectCount)
plotCdwVol <- plotCdwVol %>%
dplyr::bind_rows(noLogs) %>%
dplyr::arrange(domainID, siteID, eventID, plotID)
### Add back mammalGrid collocation data that was dropped during grouping and
### summarizing operations immediately above
temp <- tempCDW %>%
dplyr::select(plotID, subtype) %>%
dplyr::distinct()
plotCdwVol <- plotCdwVol %>%
dplyr::left_join(temp, by = "plotID")
### Display table of plot-level CDW volume estimates
knitr::kable(
x = plotCdwVol %>%
dplyr::ungroup() %>%
dplyr::rename(
"Log Count" = logCount,
"CDW Vol (m3/ha)" = cdwVol,
"Plot subtype" = subtype
) %>%
dplyr::select(-eventID) %>%
dplyr::slice_head(n = 5),
row.names = FALSE,
caption = "Example subset of CDW volume density estimates by plotID for selected sites.
The 'Plot subtype' column indicates whether a given CDW plot is colocated with a Small
Mammal sampling grid."
)
### Finally, use plot-level CDW volume density calculated immediately above to
### calculate mean site-level CDW and standard deviations.
siteCdwVol <- plotCdwVol %>%
dplyr::group_by(domainID, siteID, eventID) %>%
dplyr::summarise(
meanCdwVol = round(mean(cdwVol, na.rm = TRUE), digits = 1),
sdCdwVol = round(sd(cdwVol, na.rm = TRUE), digits = 1),
nCdwVol = n()
)
### Display table of site-level CDW volume estimates
knitr::kable(
x = siteCdwVol %>%
dplyr::ungroup() %>%
dplyr::rename(
"Mean CDW Vol (m3/ha)" = meanCdwVol,
"StdDev CDW Vol" = sdCdwVol,
"Plot Number" = nCdwVol
),
row.names = FALSE,
caption = "CDW volume density estimates by siteID for selected sites and eventIDs."
)
```
### Create Small Mammal Density datasets
In this section, we employ the minimum number known alive (MNKA) approach to estimate total MAM abundance - e.g., as defined in Slade & Blair (2000). This approach assumes that a marked individual is present at all sampling points between its first and last capture dates, even if it was not actually captured in those interim trapping sessions. To make the dataset meet this assumption, we add those implicit records to the dataset to make them explicit.
Note: At the time of writing it is necessary to create Small Mammal `eventIDs` for legacy data collected prior to 2021. For the Small Mammal product, `eventIDs` correspond to discrete sampling bouts that represent a group of consecutive nights of trapping scheduled around a particular new moon. For the NEON 'RELEASE-2023' data, the omission of `eventID` from legacy data will be fixed and the code below that creates `eventIDs` will be moot.
In the code chunk below, we carry out the following steps:
- Use `mam_plotNight` data to create an `eventID` to group records according to unique sampling events. We assign the same `eventID` to each record with an `endDate` no more than 10 days later than the previously created record.
- Rectify the `trapStatus` field in the `mam_trapNight` table for records of captured animals that do not have the correct value.
- Filter the `mam_trapNight` data to retain records with the desired `trapStatus`, and records that have `taxonIDs` in the list of target taxa downloaded with the NEON Data Portal API. The latter step removes taxa from the dataset for which the MAM trapping protocol was not designed (e.g., shrews, weasels, etc.).
- Use `mam_trapNight` data and a `for()` loop to add in missing trap nights so that the data meet the assumptions described in Slade & Blair (2000) for calculating MNKA. The `for()` loop is commented out and the resulting dataset is read back in due to the inefficient computation.
- Use the `nightuid` to join the prepared `mam_trapNight` dataset with `mam_plotNight` to bring in the previously calculated `eventID`. The `eventID` is required for downstream computational steps.
``` {r mamFinalizeDatasets, results='asis', message=FALSE}
### Create eventID grouping variable using mam_plotNight data and add to mam_trapNight
### Assign the same eventID to each record with an endDate no more than 10 days later
### than the previously created record. Below, we use the 'data.table::shift()'
### function to return the previous row value for the specified column (i.e., endDate)
mam_plotNight_nodups <- mam_plotNight_nodups %>%
dplyr::mutate(year = as.integer(format(collectDate, "%Y")),
.before = eventID) %>%
dplyr::group_by(siteID, year) %>%
dplyr::mutate(
priorEndDate = data.table::shift(endDate, fill = endDate[1]),
diffDays = difftime(endDate, priorEndDate, units = "days"),
boutIncrement = ifelse(diffDays >= 10, 1, 0),
boutNum = cumsum(boutIncrement),
.after = endDate
) %>%
dplyr::group_by(siteID, year, boutNum) %>%
dplyr::mutate(
weekBoutBegan = lubridate::isoweek(min(endDate)),
eventID = case_when(
is.na(eventID) ~ paste(siteID, year, weekBoutBegan, sep = "."),
TRUE ~ eventID
),
.after = boutNum
) %>%
dplyr::ungroup() %>%
dplyr::select(-priorEndDate, -diffDays, -boutIncrement, -boutNum)
### Add nightuid and eventID from mam_plotNight_nodups to mam_trapNight_nodups
mam_trapNight_nodups <- mam_trapNight_nodups %>%
dplyr::left_join(mam_plotNight_nodups %>%
dplyr::select(nightuid, eventID),
by = "nightuid")
### Create and finalize Small Mammal capture dataset for selected sites. First,
### check to ensure all captures have the correct 'trapStatus' - i.e., check if tagIDs
### exist but trap status does not include "capture".
problemRecords <- mam_trapNight_nodups %>%
dplyr::filter(!is.na(tagID),
!grepl("capture", trapStatus))
### If nrow(problemRecords) > 0, update the corresponding trapStatus fields to
### "5 - capture" or, in the rare case where there are multiple captures in one trap,
### "4 - more than 1 capture in one trap" - this rare case does not occur in the
### current dataset, so it is not addressed here.
mam_trapNight_nodups <- mam_trapNight_nodups %>%
dplyr::mutate(trapStatus = case_when(
uid %in% problemRecords$uid ~ "5 - capture",
TRUE ~ trapStatus
))
### Next, create the list of target taxa from taxon list downloaded from the API in
### a previous code chunk.
targetTaxa <- mam.list$data %>%
dplyr::filter(taxonProtocolCategory == "target") %>%
dplyr::select(taxonID)
### Next, subset the trapping data to only the capture records, i.e., the records that
### describe a captured small mammal, including only those taxa for which the trapping
### protocol is designed. Retain only those columns needed for the MNKA calculation.
captures <- mam_trapNight_nodups %>%
dplyr::filter(grepl("capture", trapStatus) &
taxonID %in% targetTaxa$taxonID) %>%
dplyr::select(uid, nightuid, plotID, collectDate, tagID)
### Next, per Slade & Blair (2000), generate a column of all the unique tagIDs
### included in the dataset, and create an empty data frame to populate with 'for()'
### loop output.
uTags <- captures %>%
dplyr::select(tagID) %>%
dplyr::filter(!is.na(tagID)) %>%
dplyr::distinct()
capsNew <- dplyr::slice(captures, 0)
### Use the 'captures' data derived from mam_trapNight, and for each tagged
### individual, add a record for each night of trapping done in the plots
### in which it was captured between the first and last dates of capture.
# for (i in uTags$tagID) {
# temp <- captures %>% filter(tagID == i)
# firstCap <- as.Date(min(temp$collectDate), "YYYY-MM-DD", tz = "UTC")
# lastCap <- as.Date(max(temp$collectDate), "YYYY-MM-DD", tz = "UTC")
# possibleDates <- seq(as.Date(firstCap), as.Date(lastCap), by="days")
# plots <- unique(temp$plotID)
# potentialNights <- mam_plotNight_nodups %>%
# dplyr::filter(as.character(collectDate) %in% as.character(possibleDates) &
# plotID %in% plots) %>%
# dplyr::select(nightuid,plotID, collectDate) %>%
# dplyr::mutate(tagID = i)
# temp2 <- dplyr::left_join(potentialNights, temp)
# capsNew <- dplyr::bind_rows(capsNew, temp2)
# }
### Save a copy to avoid time-consuming 'for()' loop processing
# saveRDS(capsNew, file = "mam_final_dataset.RDS")
### Read in processed capsNew dataset
capsNew <- readRDS(file = "mam_final_dataset.RDS")
### Add untagged individuals back to the dataset
capsNew <- captures %>%
dplyr::filter(is.na(tagID)) %>%
dplyr::bind_rows(capsNew)
### Add eventID to enable summarizing by trapping bouts (includes up to 3 nights
### of trapping for some plots in each bout, 1 night for other plots).
capsNew <- capsNew %>%
dplyr::left_join(mam_plotNight_nodups %>%
dplyr::select(eventID, nightuid),
by = "nightuid") %>%
dplyr::relocate(eventID, .after = collectDate) %>%
dplyr::arrange(eventID, plotID, collectDate)
```
In the code chunk below, we define a function to calculate MNKA for each plot within each bout:
- The function requires the `capsNew` data set generated in the code chunk above as an input.
- In addition, in the event that the `capsNew` data contain records from more plots than are needed for the MNKA estimation, the function requires a list of plots of interest (`plotsOI`) to filter the input dataset to only those plots for which MNKA outputs are desired.
``` {r mamDensityFunctions, results='asis', message=FALSE}
### Define function to calculate MNKA for each plot within bout; 'plotsOI' is a required
### input vector of plotIDs for which MNKA per plot per bout is desired.
mnka_per_plot_per_bout <- function(capture_data, plotsOI) {
caps <- capture_data %>%
dplyr::filter(plotID %in% plotsOI)
ids_by_plot_bout <- capture_data %>%
dplyr::group_by(eventID, plotID) %>%
dplyr::distinct(tagID)
mnka_by_plot_bout <- ids_by_plot_bout %>%
dplyr::group_by(eventID, plotID) %>%
dplyr::count() %>%
dplyr::mutate(
siteID = stringr::str_extract(eventID, "^[A-Z]{4}"),
year = as.numeric(stringr::str_extract(eventID, "20[0-9]{2}")),
.before = eventID
) %>%
dplyr::ungroup() %>%
as.data.frame()
return(mnka_by_plot_bout)
}
#--> Output is MNKA (count/hectare), based on the 1 ha size of the NEON trapping grid
```
In the code chunk below, we carry out additional MAM calculations:
- We use the MNKA small mammal density function defined immediately above to calculate MNKA at the plot scale.
- We identify MAM data within 3-year windows by site based on the year that CDW volume density was sampled at each site.
- Within the 3-year window, we retain the maximum MAM abundance observed for each plot to compare to plot-level CDW volume density data.
- Because CDW volume does not change rapidly, and MAM abundance may change substantially both within and among years, we generate the maximum small mammal abundance for a given plot for a three-year window centered on the year that logs were tallied for CDW.
- Researchers may choose any number of different methods to integrate TOS data products with different temporal sampling frequencies (below, we also calculate mean, median, and minimum per-plot MAM abundance).
- Additional methods beyond those shown here will all give a different statistical result.
- Plot-level MNKA MAM density data are used to calculate mean site-level MAM MNKA abundances. We also calculate several other MAM MNKA-based abundance metrics to illustrate the many choices available when working with these data.
``` {r mamMnkaEstimates, message=FALSE}
### Calculate MNKA per plot per bout for all data in processed capsNew dataset
plotBoutMNKA <- mnka_per_plot_per_bout(capture_data = capsNew,
plotsOI = unique(capsNew$plotID))
### Create table of MAM data within 3-year windows by site based on year of CDW volume
### sampling.
mamSiteYears <- plotCdwVol %>%
dplyr::ungroup() %>%
dplyr::distinct(domainID, siteID, eventID) %>%
dplyr::mutate(
cdwYear = as.numeric(stringr::str_extract(eventID, "20[0-9]{2}")),
minMamYear = cdwYear - 1,
maxMamYear = cdwYear + 1
)
### Calculate plot-specific MNKA across bouts: Join plotBoutMNKA and mamSiteYears,
### filter to records within 3-year window centered on CDW measurement year, and
### determine maximum small mammal abundance for each plot across all sampling
### events, as well as other per plot metrics.
plotMNKA <- plotBoutMNKA %>%
dplyr::left_join(mamSiteYears %>%
dplyr::select(siteID,
minMamYear,
maxMamYear),
by = "siteID") %>%
dplyr::filter(year >= minMamYear,
year <= maxMamYear) %>%
dplyr::group_by(siteID, plotID) %>%
dplyr::summarise(
plotMeanAbundance = round(mean(n), digits = 1),
plotMedAbundance = median(n),
plotMinAbundance = min(n),
plotMaxAbundance = max(n),
plotBoutCount = n()
) %>%
dplyr::ungroup()
### Display table of plot-level MNKA parameters
knitr::kable(
x = plotMNKA %>%
dplyr::ungroup() %>%
dplyr::rename(
"Mean Abundance (count/ha)" = plotMeanAbundance,
"Median Abundance (count/ha)" = plotMedAbundance,
"Min Abundance (count/ha)" = plotMinAbundance,
"Max Abundance (count/ha)" = plotMaxAbundance,
"# Bouts Sampled" = plotBoutCount
) %>%
dplyr::slice_head(n = 5),
row.names = FALSE,
caption = "Example subset of Small Mammal abundance metrics (count/ha) by plotID for
selected sites for a 3-year window centered on the year that CDW volume was measured."
)
### Calculate MNKA metrics per site based on plot-level data:
### siteMeanAbundance = mean abundance across plots within a site
### siteMedAbundance = median abundance across plots within a site
### siteMinAbundance = average minimum abundance across plots within a site
### siteMaxAbundance = average maximum abundance across plots within a site
siteMNKA <- plotMNKA %>%
dplyr::group_by(siteID) %>%
dplyr::summarise(
siteMeanAbundance = round(mean(plotMeanAbundance), digits = 1),
siteMedAbundance = round(median(plotMedAbundance), digits = 1),
siteMinAbundance = round(mean(plotMinAbundance), digits = 1),
siteMaxAbundance = round(mean(plotMaxAbundance), digits = 1)
) %>%
dplyr::ungroup()
### Display table of site-level MNKA parameters
knitr::kable(
x = siteMNKA %>%
dplyr::ungroup() %>%
dplyr::rename(
"Mean Abundance (count/ha)" = siteMeanAbundance,
"Median Abundance (count/ha)" = siteMedAbundance,
"Average Min Abundance (count/ha)" = siteMinAbundance,
"Average Max Abundance (count/ha)" = siteMaxAbundance
),
row.names = FALSE,
caption = "Small Mammal abundance metrics (count/ha) by siteID for selected sites
for a 3-year window centered on the year that CDW volume was measured."
)
```
### Create unified CDW and MAM datasets
To create plot-level and site-level data sets suitable for plotting and correlation analyses, in the code chunk below we join the CDW volume and MAM abundance data at both plot and site scales.
``` {r unifyDataSets, message=FALSE}
### Join plot-level CDW and MAM datasets
plotDF <- plotCdwVol %>%
dplyr::left_join(plotMNKA %>%
dplyr::select(-plotBoutCount),
by = c("siteID", "plotID")) %>%
dplyr::filter(!is.na(plotMaxAbundance))
### Join site-level CDW and MAM datasets
siteDF <- siteCdwVol %>%
dplyr::left_join(siteMNKA,
by = "siteID")
```
### Create graphs of CDW and MAM data at plot and site spatial scales
We first examine the potential relationship between CDW volume and MAM abundance using data from all spatially colocated plots from all of the six forested sites we selected (HARV, JERC, STEI, UKFS, RMNP, and ABBY).
``` {r generateGraph-allPlots, message=FALSE}
### Create graph of CDW volume vs. MAM maximum abundance across all plots and sites
plotsGraph <- ggplot2::ggplot(data = plotDF,
mapping = aes(x = cdwVol,
y = plotMaxAbundance)) +
ggplot2::geom_point(alpha = 0.5,
size = 3) +
ggplot2::geom_smooth() +
ggplot2::labs(
x = "CDW Volume (m3/ha)",
y = "Max MAM Abundance (count/ha)",
title = "Fig S1. Correlation between CDW Volume and Max MAM
Abundance at the plot scale across six forested NEON sites."
)
plotsGraph
```
Plot-level results across all sites:
- Across a selection of close-canopy forested NEON sites, measurements of CDW volume (m^3^/ha) and MAM abundance (count/ha) reveal no clear correlation at the colocated plot-scale.
- The Pearson's correlation coefficient is: `r round(cor(plotDF$cdwVol, plotDF$plotMaxAbundance), digits = 2)`.
In the next code chunk below, we check for correlations between CDW volume and Small Mammal abundance within sites using colocated plot-scale data.
``` {r generateGraph-allPlotsBySite, message=FALSE}
### Create graph of CDW volume vs. MAM maximum abundance across plots within site
sitePlotsGraph <- ggplot2::ggplot(data = plotDF,
mapping = aes(x = cdwVol,
y = plotMaxAbundance)) +
ggplot2::geom_point(alpha = 0.5,
size = 3) +
ggplot2::geom_smooth(method = lm) +
ggplot2::facet_wrap(facets = ~ siteID,
ncol = 3,
scales = "free") +
ggplot2::labs(
x = "CDW Volume (m3/ha)",
y = "Max MAM Abundance (count/ha)",
title = "Fig S2. Correlation between CDW Volume and Max MAM
Abundance by plot within site."
)
sitePlotsGraph
```
Plot-level results within site:
- Within a selection of close-canopy forested NEON sites, CDW volume (m^3^/ha) and MAM abundance (count/ha) show positive, negative, and weak correlations.
- Positive correlation: RMNP, r~Pearson~ = `r round(cor(plotDF$cdwVol[plotDF$siteID=="RMNP"], plotDF$plotMaxAbundance[plotDF$siteID=="RMNP"]), digits = 2)`).
- Negative correlation: ABBY, r~Pearson~ = `r round(cor(plotDF$cdwVol[plotDF$siteID=="ABBY"], plotDF$plotMaxAbundance[plotDF$siteID=="ABBY"]), digits = 2)`).
- Weak correlations (HARV, JERC, STEI, UKFS).
In the final code chunk below, we check for correlations between mean CDW volume and average maximum MAM abundance at the site scale.
``` {r generateGraph-allSites, message=FALSE}
### Create graph of CDW volume vs. MAM maximum abundance across sites
sitesGraph <- ggplot2::ggplot(data = siteDF,
mapping = aes(x = meanCdwVol,
y = siteMaxAbundance,)) +
ggplot2::geom_point(alpha = 0.5,
size = 3) +
ggplot2::geom_smooth(method = lm) +
ggplot2::labs(x = "CDW Volume (m3/ha)",