---
title: "Bulk Baconizing"
author:
  - affiliation: University of Wisconsin - Madison
    name: Simon Goring
  - affiliation: Mount Royal University
    name: Andria Dawson
  - affiliation: University of Wisconsin - Madison
    name: Allison Stegner
  - affiliation: Georgia Institute of Technology
    name: Yue Wang
date: "November 13, 2018"
bibliography: styles/baconizing.bib
output:
  html_document:
    code_folding: show
    fig_caption: yes
    keep_md: yes
    self_contained: yes
    theme: readable
    toc: yes
    toc_float: yes
    dev: svg
    highlight: tango
keywords: chronology, geochronology, paleoecology, age-models, Bacon, 210Pb, 14C, radiocarbon, biostratigraphy, Bayesian
csl: styles/elsevier-harvard.csl
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Introduction
Over the course of working with the Neotoma Paleoecology Database [@williams2017neotoma] and the age modeling software Bacon [@blaauw2011flexible] it has often become necessary to run a large number of Bacon models in an iterative process (for example, the [Stepps-Baconizing GitHub repository](https://github.com/PalEON-Project/stepps-baconizing)). The process of building and re-building a large number of age models is often complicated and can result in a large amount of code. This code might involve fine-tuning results for single sites, using information from multiple runs, the generation of multiple data files, and a set of R code that can span hundreds of lines. In addition, the complexities of the underlying data (for example, issues with existing chronologies and data structures within Neotoma) may mean that users might unintentionally introduce errors into their workflows. Data repositories serve a unique position within the research community [@goring2018nexus], and one service they can provide is the distillation and distribution of best-practices documents such as this resource for generating a large number of Bacon age models from Neotoma data.

This repository makes use of experience obtained in working through several projects [@kujawa2016theeffect;@dawson2016quantifying], and uses data from several projects as the basis for establishing certain prior values. We make an effort in this document to clearly explain, in a fairly straightforward manner, the steps necessary to undertake such a project.
### The Workflow
The general workflow is encoded in the Rmd document that can be rendered to an HTML file. Rmd documents *knit* together descriptive text and R commands ([read more about RMarkdown documents](https://bookdown.org/yihui/rmarkdown/)). The Rmd document is intended to provide context and guide the user in some best practices for the generation of Bacon outputs in batch. The Rmd is also intended to operate as a stand-alone tool to generate a number of age records, and output the PDF ghost plots, age files and posterior draws for the cores of interest. The R code that is embedded in the `Rmd` file does the following:
![](images/workflow.svg)
The key steps of the workflow:
1. Executing the Rmd file: This will knit together the text and execute all the R code to generate an HTML file with embedded results, along with the output files from the Bacon runs.
2. Reading in the `settings.yaml` file. See the text below for information about the settings that can be used for these runs.
3. Loading data from Neotoma using the [`neotoma` package](http://github.com/ropensci/neotoma).
4. Setting default parameters for the Bacon runs (accumulation rates, memory, etc.)
5. Updating parameters based on past runs (if you have files with alternate settings).
6. Building Bacon age files (the csv files used by Bacon to record the chronological controls) based on established chronological controls from Neotoma.
7. Running Bacon.
8. Revising parameters based on model outputs and re-running step 7.
The final step in this process requires modifying the settings file after the first run of this workflow, and adjusting the parameters in the parameters file generated by the run, to ensure that the Bacon runs for each core reflect the best possible age models. When the Rmd is re-run with the `settings.yaml` and parameters file adjusted (see the Rmd for details), the batch processing is faster, since the script is set to run only cores that have not had successful runs. Since "good" chronologies are excluded from being re-run, it is possible to efficiently modify the settings for one or a few cores while leaving the rest unchanged.
This code is provided as an RMarkdown document to help guide the user. It can support extensive customization if a user chooses to make changes anywhere in the code, and individuals are welcome to make or suggest changes as they wish.
### Executing the code
Before you run the code, ensure that all packages are up to date and that you have all the packages that are required for the proper execution of this code. Some errors that may arise may simply be the result of missing packages or out-of-date packages. There is [a script for MacOS and Linux computers posted on github](https://gist.github.com/SimonGoring/fe3b0bf6b4cb9b8f0a2aa6cdce10bb29) that will install all packages required. Otherwise a user can check their local packages and install and update manually.
#### RStudio
Many people working with R may choose to use [RStudio](http://rstudio.com). If you are using RStudio you can customize elements of this document (for example, removing this header information) and then use the *knit* button to compile the code into an HTML document, while generating the necessary output files (see the [README file](https://github.com/neotomadb/bulk-baconizing) in the GitHub repository).
#### R Console
Users who use the R Console without any other editor can move to the appropriate working directory (using `setwd()`) and then run the entire document using the following command:
```R
library(rmarkdown)
render('bulk_baconizing.Rmd')
```
This can be shortened to: `rmarkdown::render('bulk_baconizing.Rmd')`.
#### Terminal
Similarly, if you wish to execute the code from a console or terminal you can navigate to the working directory and execute the following:
```bash
Rscript -e "rmarkdown::render('bulk_baconizing.Rmd')"
```
Note that when executing from the terminal, the inner and outer quotation marks must be different (*e.g.*, double on the outside, single on the inside, or *vice versa*). Whether you customize the code, or choose to run it as-is (for just the US state of Utah), these three methods of executing the code are your main options.
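For example, the same command works with the quoting reversed, single quotes outside and double quotes inside:

```bash
Rscript -e 'rmarkdown::render("bulk_baconizing.Rmd")'
```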
### Providing Feedback
If, in any place instructions are not clear or more details are required please feel free to contact us by either [raising a GitHub Issue for this repository](https://github.com/neotomadb/bulk-baconizing), or by mailing me directly at [goring@wisc.edu](mailto:goring@wisc.edu).
## Setting up the System
This document is intended to be used as a template for users, and not as a solution in and of itself. The process for generating chronologies is iterative; as such, the use of this Rmd script is intended to be iterative as well, whereby you select sites, run Bacon, revise parameters, and run the script again. Each iteration will involve modifying the parameters file (created in the `data/params` folder during your first run), and also the `settings.yaml` file. Please be sure to check carefully as you do this. Errors may result in long wait times, or runs that provide no new useful information.
### Defining Your Settings
Because there are a number of elements to this project, it was decided to place some of the main elements into a single, easy to use file. The project stores these settings in the parent directory in a file called `settings.yaml`. By default these settings are:
```{r settings_file, echo=FALSE, comment=NA}
cat(paste0(readLines('settings.yaml'), '\n'))
```
A `yaml` file is a special kind of markup that uses a `key`: `value` pairing, with `#` to indicate a comment. We assign parameters to the various keys here so that we can define the behaviour of the model runs. In the following section we explain each parameter and its behaviour:
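As an illustration, a fragment of such a file might look like the following (the keys are those described below; the values here are examples only, not the repository defaults):

```yaml
# Illustrative settings.yaml fragment (values are examples only):
clean_run: false      # keep files and parameters from earlier runs
version: 2            # bump this when your parameter choices change
core_path: 'Cores/'   # directory where Bacon core files are written
parallel: 3           # run at most three Bacon processes at once
```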
#### Settings Defined
* `clean_run`: (default `true`) Should we clear all the files and parameters and start from scratch? Can take either `true` or `false`.
* `version`: (default `1`) The use of a version key allows you to keep track of your Bacon parameters if, over the course of a project, you make changes in your choices about how you want to run your analysis. If the version number changes a new parameter file will be generated. You can assign any value to this parameter.
* `reset_parameters`: (default `true`) Bulk generation of chronologies is an iterative process. In general default parameters will be set, and then, over the course of running Bacon, parameters for individual cores might be changed. In the initial stages it might be worth modifying all default values in the code below and then setting `reset_parameters` to `true` so that the full table is re-written. Otherwise, set the value to `false` so that the table is not overwritten each time the scripts are run.
* `core_path`: (default `Cores/`) Bacon needs to output files to a particular location, this tells Bacon where core files will be stored. You can use any directory in your working folder but that directory must exist.
* `date`: (default `today`) In cases where files might be overwritten, the default is to overwrite but store the old file with a date-stamp. The default is to use the current date, but it might be that you want to associate a date stamp with a particular version. In that case you can change the setting from `today` (which will use the current date) to a date of format `YYYY-MM-DD`.
* `core_top_err`: (default `2`) An uncertainty to assign to the core top age. While in principle we often know this date with absolute certainty, Bacon requires an uncertainty for all age samples.
* `modern`: (default `true`) In Goring et al (2012) and elsewhere it is shown that modern sediment often has a higher accumulation rate than earlier sediment. One solution is to initiate an instantaneous hiatus at an identified "settlement" horizon, and then set the accumulation rate of the upper sediment to a higher `acc.mean` rate. If `modern` is set to `true`, the program will place a hiatus at the modern horizon. If `false` it will not.
* `thickness`: (default `false`) There is a file included in the repository called `data/paleon_thick.csv`; however, it may be the case that in previous runs you have adjusted thickness values. For example, in `version` 2 of your runs you could point to the `params` file generated in version 1, `data/params/bacon_params_v1.csv`. If you modify thicknesses you must also modify the content of the chunk `prior_thickness` below, so that you point to the correct thickness and dataset ID columns.
* `settlement`: (default `false`) In Kujawa et al (2017) and in Dawson et al (2018) an expert-assessment exercise was undertaken to evaluate the presence and location of biostratigraphic markers for Euro-American settlement in the Upper Midwestern United States. This file is included for reference. If the parameter is set to a file location, such as `data/expert_assessment.csv` (a file with columns indicating `datasetid` and `depth`), then any indicator of Settlement will be revised; otherwise (e.g., `settlement: false`) the chronological controls for the records will remain as-is.
* `coretop`: This points to a file that contains adjustments to recorded core top ages for records based on additional information from external sources. Requires a `datasetid` column and an `age` and `depth` column. In cases where a reported core top should be removed, place an `NA` in the file for `age`.
* `settlement_adjust`: This points to a file that contains adjustments to recorded settlement ages for records based on additional information from external sources. Requires a `datasetid` column and an `age` and `depth` column. In cases where a reported settlement age should be removed, place an `NA` in the file for `age`.
* `accumulation`: (default `false`) Similar to thickness, if modifications have been made to accumulation rates, a file can be defined here.
* `parallel`: (default `true`) Should the code try to run Bacon in parallel? This is either the value `false` or the maximum number of cores to attempt (e.g., `3`). This means it is possible to run your script on fewer cores than your system has available to allow you to continue work on other projects. The code used in this script uses the `mclapply()` function, which means that the parallel operations, which use [parallel forking in R](https://garthtarr.com/parallel-computation-in-r/), are not available for Windows systems.
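As a sketch of how such a file can be read into R (the repository performs its own loading elsewhere in its sourced scripts), the `yaml` package turns the `key: value` pairs into a named list like the `settings` object used throughout this document:

```r
# A minimal sketch, assuming the `yaml` package is installed:
# read the key: value pairs of settings.yaml into a named R list.
library(yaml)
settings <- yaml::read_yaml('settings.yaml')
settings$version     # the version key defined above
settings$clean_run   # TRUE or FALSE
```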
#### Modifying the Settings
Throughout your workflow you will likely change some of these settings. For example, if you are changing the search parameters for your dataset search, you may want to keep some prior information but increment the `version`, or set `clean_run` to `true`. However, as you begin fine-tuning your results, you will want to set `clean_run` back to `false`, and `reset_parameters` to `false` as well. This will prevent unnecessary errors, and reduce the amount of time required for the runs to complete, since the script will accept completed runs and not attempt to re-run them.
```{r setup_runs, echo=TRUE, results='hide', message=FALSE, warning=FALSE}
source('R/setup_runs.R', echo=FALSE, verbose=FALSE)
```
There are a number of libraries used in this project. It is likely that you may be missing one or several. If this is the case you might be interested in [this bash script](https://gist.github.com/SimonGoring/fe3b0bf6b4cb9b8f0a2aa6cdce10bb29) for Mac or Linux. You may also be interested in the [autoinst package for R](https://github.com/jimhester/autoinst). Otherwise, take some time to look at `R/setup_runs.R` and ensure you install all packages listed.
## Getting Core Data from Neotoma
Here we download the sedimentary data. The key component here is the `get_dataset()` function. The newest version of `neotoma` (version >= 1.7.3) allows a user to pass in a vector of dataset IDs, so you can either get datasets using a set of parameters in the `get_dataset()` function (see [help for the `get_dataset()` function](https://www.rdocumentation.org/packages/neotoma/versions/1.7.0/topics/get_dataset)), or you can pass in a vector of dataset IDs (for example, if you've already picked a set of sites to be examined). In this example I am using state names as my search parameter:
```{r datasets_and_downloads, echo=TRUE, results='hide', message=FALSE, warnings=FALSE}
# Define using state names here. Could also use altitudes, locations,
# dataset types. . .
dataset_list <- get_dataset(datasettype='pollen',
gpid = c('Michigan', 'Wisconsin', 'Minnesota',
'Indiana', 'Ohio', 'North Dakota', 'South Dakota'),
ageyoung = 0)
# The load_downloads() function will download the datasets if either:
# there is not currently a downloaded object saved with the same dataset version
# or the setup parameter is TRUE, which calls for the whole data file to be re-written.
downloads <- suppressWarnings(load_downloads(dataset_list,
version = settings$version,
setup = settings$clean_run))
```
It might be the case that you want to do further validation on samples based on parameters in the `downloads` object (*e.g.*, only sites with a certain taxon). The downloads are required for the generation of the core depths file used by Bacon. If you want to further adjust the records, do so with the dataset object, not the `downloads`. For example, you can remove elements of a list by setting them to `NULL`, so if you know you want to remove the dataset in the 5th position, you could assign `dataset_list[[5]] <- NULL`. You would then need to re-run `load_downloads()` and ensure that `setup` was `TRUE` in `load_downloads()`:
```r
# This code is not run:
dataset_list <- get_dataset(gpid = 'Michigan')
# Only accept sites with a site description:
for (i in length(dataset_list):1) {
if (is.na(dataset_list[[i]]$site$description)) {
dataset_list[[i]] <- NULL
}
}
# We would go from approximately 810 datasets to 502.
downloads <- load_downloads(dataset_list,
version = settings$version,
setup = TRUE)
```
Using the `neotoma` R package's `plot_leaflet()` function we can generate an interactive map of the downloaded sites that can be used to explore the data further.
```{r}
plot_leaflet(dataset_list)
```
Click on any point to obtain more information about the site. Note that this rendering is only interactive when the document is knit to an HTML file.
## Default Bacon Parameters
The Baconizing tool records parameters in a parameter `data.frame` called `params`. This `data.frame` is generated and then saved to file, to ensure that errors in the Bacon run do not affect any information stored in the parameters. The `data.frame` includes a number of columns.
One unique element in this workflow is the use of two accumulation rates, one identified as `mod` and one identified as `old`. This comes from work by Goring et al. [-@goring2012deposition] and Dawson et al. [-@dawson2016quantifying], who identify a break in accumulation rates at or around the European settlement horizon. This treatment of modern sediments is the default; however, the `modern` flag in the `settings.yaml` file can be set to `false` if you do not wish to use these modern/historical settings.
| Column | Default | Purpose |
| -------- | -------- | ---------------------------------------------------- |
| **`handle`** | none | The unique text identifier for the dataset collection unit. |
| **datasetid** | none | The unique dataset identifier. |
| **chronid** | none | The chronology used for the model. |
| **allchronids** | none | All chronologies available for the model. |
| **acc.mean.mod** | 3.02 | Accumulation rate for *modern* sediment (above an identified Settlement Horizon). If the `settings.yaml` flag `modern` is set to `false` this will be the same as `acc.mean.old`. |
| **acc.mean.old** | 15 | Accumulation rate for sediment if no horizon is identified, or the `modern` flag is set to `false`. |
| **acc.shape.mod** | 0.53 | Shape parameter for modern sediment (as above). |
| **acc.shape.old** | 0.9 | Shape parameter for older sediment (as above). |
| **mem.strength** | 2 | Memory (accumulation autocorrelation parameter) |
| **mem.mean** | 0.5 | Mean value for memory. |
| **`hiatus`** | NA | Location of a hiatus (depth in *cm*), either set at settlement, or a known hiatus. |
| **`hiatus_age`** | NA | Estimated age of the hiatus. |
| **`thick`** | 5 | Section thickness (cm) to be used by Bacon. |
| **`age.type`** | NA | From the dataset, what is the original age type for the core chronology. |
| **`run`** | NA | Has the model been run by Bacon? |
| **`suitable`** | NA | Is the model suitable for a Bacon run? |
| **`ndates`** | NA | How many dates are used in the chronology? |
| **`success`** | NA | Did Bacon run and return a result successfully? |
| **`notes`** | `.` | Any notes associated with the run. |
| **`user_notes`** | `.` | Notes by the user for subsequent runs. |
Any of these parameters can be changed by altering the values in the code chunk below (in the `.Rmd` file) before running the document. If you have already run the document using a set of parameters and decide to change defaults, you can either change the `version` in `settings.yaml` or choose to set `reset_parameters` to `true`. Once you have run the full script and you have produced Bacon files you can manually edit individual sites by opening the versioned `data/params/bacon_params_v*.csv` file.
### Modifying the Parameters
After your first run of the script you can open the file at `r paste0('data/params/bacon_params_v', settings$version, '.csv')` to change the settings if the Bacon model appears poorly suited to the default settings. In this case you would change the appropriate setting (e.g., `thick`), and also change `run` to `NA` and `success` to `NA`. This will flag the record so that Bacon is re-run for that site the next time the document is executed.
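Rather than editing the csv by hand, the same change can be scripted. The sketch below is illustrative only: the column names come from the table above, `'EXAMPLE'` is a placeholder handle, and version 1 is assumed.

```r
# Illustrative sketch: set a new thickness for one record and clear its
# run/success flags so that Bacon revisits it on the next pass.
library(readr)

p_file <- 'data/params/bacon_params_v1.csv'  # version 1 assumed
params <- readr::read_csv(p_file)

idx <- params$handle == 'EXAMPLE'            # placeholder handle
params$thick[idx]   <- 10
params$run[idx]     <- NA
params$success[idx] <- NA

readr::write_csv(params, p_file)
```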
```{r create_params, echo=TRUE, warning=FALSE}
existing_params <- file.exists(paste0('data/params/bacon_params_v', settings$version, '.csv'))
if ((!existing_params) | settings$reset_parameters == TRUE) {
# If we can't find a parameters file, or the user asked to reset
# the bulk-baconizing parameters:
message("Writing a new parameters file.")
if (existing_params) {
# There is an existing parameters file. We're just going to
# move it over and add the current date to it.
origin <- paste0('data/params/bacon_params_v',
settings$version, '.csv')
destination <- paste0('data/params/bacon_params_v',
settings$version, '_',
lubridate::round_date(lubridate::now("UTC"),
unit="day"),
'.csv')
file.copy(origin, destination)
}
# Get dataset handles and IDs from the datasets:
ds_handles <- sapply(dataset_list,
function(x) { x$dataset.meta$collection.handle })
ds_ids <- as.integer(sapply(dataset_list,
function(x) { x$dataset.meta$dataset.id }))
# Build the big `params` data.frame
params <- data.frame(handle = ds_handles,
datasetid = ds_ids,
chronid = NA,
allchronids = NA,
core_top = NA,
acc.mean.mod = 3.02,
acc.mean.old = 15.,
acc.shape.mod = 0.53,
acc.shape.old = 0.9,
mem.strength = 2.,
mem.mean = 0.5,
hiatus = as.numeric(NA),
hiatus_age = as.numeric(NA),
thick = 5.,
age.type = as.character(NA),
run = FALSE,
suitable = NA,
ndates = as.integer(NA),
success = NA,
reliableold = NA,
reliableyoung = NA,
notes = ".",
user_notes = ".",
stringsAsFactors = FALSE) %>%
arrange(handle)
default <- params
readr::write_csv(x = params,
path = paste0('data/params/bacon_params_v', settings$version, '.csv'))
} else {
p_file <- paste0('data/params/bacon_params_v', settings$version, '.csv')
params <- suppressMessages(readr::read_csv(p_file,
col_types = paste0(c('c','i', 'i',
'c', rep('n',10),
'c', 'l','l','i',
'l','c','c','c','c'),
collapse=''))) %>%
arrange(handle)
params$success[is.na(params$success)] <- 0
params$run[is.na(params$run)] <- 0
}
```
### Prior thicknesses
In the case that there has been a prior run or some prior assessment of thicknesses the user should set the `thickness` value of `settings.yaml` to point to the correct file location. Otherwise the value should be set to `false`.
```{r prior_thickness, echo = TRUE, results = 'asis'}
# We only modify thicknesses if we're doing new parameters,
# if we're loading the old parameters, then we expect that they've
# already been set.
assertthat::assert_that(!is.null(settings$thickness),
msg = "The thickness setting must be a file name or `FALSE`")
assertthat::assert_that(settings$thickness == FALSE |
"character" %in% class(settings$thickness),
msg = "The setting for thickness in `settings.yaml` is not set correctly.")
modify_thick <- file.exists(as.character(settings$thickness)) &
settings$reset_parameters == TRUE
if (modify_thick) {
params <- add_thickness(file = settings$thickness,
id_col = 1,
thick_col = 4,
parameters = params)
readr::write_csv(x = params,
path = paste0('data/params/bacon_params_v', settings$version, '.csv'))
}
```
### Prior accumulations
In the case that there has been a prior run or some prior assessment of accumulation rates the user should set the `accumulation` value of `settings.yaml` to point to the correct file location. Otherwise the value should be set to `false`.
```{r prior_accumulation, echo = TRUE, results = 'asis'}
assertthat::assert_that(!is.null(settings$accumulation),
msg = "The accumulation setting must be a file name or `FALSE`")
assertthat::assert_that(settings$accumulation == FALSE |
"character" %in% class(settings$accumulation),
msg = "The setting for accumulation in `settings.yaml` is not set correctly.")
modify_acc <- file.exists(as.character(settings$accumulation)) &
settings$reset_parameters == TRUE
if (modify_acc) {
params <- add_accumulation(file = settings$accumulation,
id_col = 1,
accum_col = 4,
parameters = params)
readr::write_csv(x = params,
path = paste0('data/params/bacon_params_v', settings$version, '.csv'))
}
```
## Generate Core Age and Depth Files
Here we begin to generate the `csv` files for the cores to allow Bacon to run. This requires calling Neotoma's API and then resolving the return so that it is turned into a `csv`. There are several decisions that go into the construction of the CSV file. These decisions are documented both using the `notes` column of the parameter file, and also commented in the file `R/build_agefiles.R`. The main decisions are:
1. Given the age model precedence for chronologies in Neotoma (*Calendar years BP > Calibrated radiocarbon years BP > Varve years BP > Calendar years BP (Ma) > Radiocarbon years BP*), always choose the chronological controls used for the "best" age type. **This is the default and no message is raised in the `notes` field**.
2. Given that each age type has a default model, choose the chronologies used in the default model for the best age type. **This is the default and no message is raised in the `notes` field**.
3. In cases where a chronology exists for the best age type, but there is no assigned default model, choose the chronology that is most recent, based either on creation date or chronology id. **Raises a message in `notes`**
4. In cases where a chronology exists for the best age type, but there are multiple assigned default models, choose the chronology that is most recent, based either on creation date or chronology id. **Raises a message in `notes`**
```{r write_agefiles, echo = TRUE, results='hide', warnings=FALSE, message=FALSE}
ageorder <- get_table('agetypes')
for (i in 1:nrow(params)) {
good_run <- params$run[i] == 1 &
params$success[i] == 1
if(!good_run) {
tester <- try(build_agefiles(param = params[i, ],
ageorder = ageorder,
datasets = dataset_list,
downloads = downloads,
settings = settings))
if (!"try-error" %in% class(tester)) {
params[i,] <- tester
} else {
params$notes[i] <- add_msg(params$notes[i],
"Age file build error.")
}
readr::write_csv(x = params,
path = paste0("data/params/bacon_params_v", settings$version, ".csv"))
} else {
cat("Skipping", params$handle[i], "\n")
}
}
```
## Possible warnings raised
The script is set up so that in places where there is an objective choice, a clear notation of that choice is made and added to the parameter file's `notes` column as a semicolon-separated list. In addition to notes regarding the choice of age file, notes are also raised if:
* There are significantly high accumulation rates (more than 100 yr/cm)
* There are adjusted indicators for Euro-american settlement (defined in a file reported in `settings.yaml`.)
* If there is a core top with missing age information (age is assigned `-60`, to be adjusted by the user).
* If there are no uncertainty constraints on an age that is not associated with geochronological information (uncertainties are assigned to `0`).
* If the chronological control is not associated with a geochronological object and comes from an age estimate such as `Guess`, or `Deglaciation` (age is dropped)
* If a geochronological object does not have a proper lab ID associated with it.
* If the geochronological object has a NULL age.
* If a 210Pb age has an assigned age greater than 500 (may have been assigned years AD/BC)
* If there was no uncertainty associated with a 210Pb age.
* If there was no uncertainty associated with a geochronological age other than 210Pb.
* If there is only one chronological constraint.
* If core files already existed in the `Cores` directory associated with the record.
* If for one reason or another something prevented the core file from being written.
## Running Bacon
The final step, once all the files have been written, is to run Bacon. Using the `rbacon` package we can simplify everything:
```{r baconruns, eval=TRUE, results="hide", message=FALSE, warning=FALSE, fig.show="hide"}
params <- run_batch(params, settings = settings)
readr::write_csv(x = params,
path = paste0("data/params/bacon_params_v", settings$version, ".csv"))
```
The ultimate product here is a Cores directory with a folder for each dataset that generates a suitable chronology for Bacon, a parameters file (in `data/params/`) that records the parameters used in the Bacon model, along with any notes about elements that may have been changed or were of note in the Bacon runs.
A file has been added to the standard Bacon output. In each directory the file **`handle`**`_XXX_posteriorout.csv` represents draws from the posterior provided by Bacon. You will see NAs in this result (but not in the Bacon `ages` file).
## Run Summary
### Success / Failure
```{r svg_box, echo=FALSE, result="as-is"}
success <- round(sum(params$success == 1, na.rm=TRUE) /
length(dataset_list) * 400, 0)
fail <- round((1 - sum(params$success == 1, na.rm=TRUE) /
length(dataset_list)) * 400, 0)
htmltools::HTML(paste0('<svg width="400" height="110">
<rect y="0" x="0" width="',success,'" height="100" style="fill:rgb(102,102,255);stroke-width:1;stroke:rgb(0,0,0)" />
<rect y="0" x="',success,'" width="',fail,'" height="100" style="fill:rgb(255,153,153);stroke-width:1;stroke:rgb(0,0,0)" />
</svg>'))
```
**Figure**. Success (blue; *n* = **`r success`**) and failure (pink; *n* = **`r fail`**) rate of the Bacon runs.
Here we summarize some elements of the run that was conducted with this document:
* Datasets obtained from Neotoma: **`r length(dataset_list)`**
* Datasets with suitable chronologies: **`r sum(params$suitable == 1, na.rm=TRUE)`**
* Datasets with Bacon runs: **`r sum(params$run == 1, na.rm=TRUE)`**
* Datasets with successful Bacon runs: **`r sum(params$success == 1, na.rm=TRUE)`**
### All Dataset Notes
This interactive table allows you to page through your records for each site to examine the notes associated with each record. You can use the filters at the bottom of the table to search for individual records, or certain fields. The dataset IDs link to [Neotoma Explorer](http://apps.neotomadb.org/Explorer) and can be used to investigate sites further.
```{r echo=FALSE}
params %>%
filter((!notes == ".")) %>%
select(handle, datasetid, suitable, run, success, notes) %>%
mutate(notes = stringr::str_replace_all(notes, ";", "\\n")) %>%
mutate(datasetid = paste0("<a href=http://apps.neotomadb.org/explorer/?datasetid=", datasetid, " target='_blank'>",datasetid,"</a>")) %>%
DT::datatable(filter = "bottom", escape = FALSE) %>%
DT::formatStyle(columns = colnames(.), fontSize = '50%')
```
### Successful Runs
```{r echo=FALSE}
params %>%
filter(success == 1) %>%
select(handle, datasetid, reliableyoung, reliableold, notes) %>%
mutate(notes = stringr::str_replace_all(notes, ";", "\\n")) %>%
mutate(datasetid = paste0('<a href=http://apps.neotomadb.org/explorer/?datasetid=', datasetid, ' target="_blank">',datasetid,'</a>')) %>%
DT::datatable(filter = 'bottom', escape = FALSE) %>%
DT::formatStyle(columns = colnames(.), fontSize = '50%')
```
## References