Merge pull request #85 from stineb/main
Added data use vignette
Showing 1 changed file with 136 additions and 0 deletions.
---
title: "Data use"
author: "Benjamin Stocker"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Data use}
  %\VignetteEngine{knitr::rmarkdown}
  %\usepackage[utf8]{inputenc}
---

```{r include=FALSE}
# Run data-dependent chunks only on machines where the full data are available.
# This flag can be used as a chunk option, e.g. eval = evaluate.
evaluate <- Sys.info()[["nodename"]] %in% c("balder", "dash", "pop-os")
```

```{r message=FALSE, warning=FALSE}
library(dplyr)
library(ggplot2)
library(lubridate)
library(readr)
library(FluxDataKit)
```

*Note that this routine will only run with the appropriate files in the correct place. Given the file sizes involved, no demo data can be integrated into the package.*

## Time series

For time series modelling and analysing long-term trends, we want complete sequences of good-quality data. Data gaps are not an option as they would break sequences, so we have to allow for the reduced quality of gap-filled data. But we have to set limits. FluxDataKit contains information about the start and end dates of complete sequences of good-quality, gap-filled time series for each site, for the variables gross primary production (GPP), latent heat flux, and energy-balance-corrected latent heat flux. This information is provided by the object `fdk_site_fullyearsequence`.

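To get an overview of what this object provides, it can be inspected directly. A minimal sketch, not evaluated here (the exact set of columns may vary between FluxDataKit versions; the columns used further below are `drop_gpp`, `nyears_gpp`, `year_start_gpp`, and `year_end_gpp`):

```{r eval=FALSE}
# per-site start/end years and lengths of good-quality full-year sequences
dplyr::glimpse(FluxDataKit::fdk_site_fullyearsequence)
```
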
This vignette demonstrates how to use this information to obtain good-quality daily GPP time series that satisfy the following criteria:

- At least 25% of the half-hourly data used for creating daily aggregates is measured or good-quality gap-filled data (QC flag > 0.25).
- The maximum gap length of the data after applying the above QC flag is 90 days.
- Retain only data for full years (starting 1 Jan and ending 31 Dec).
- Exclude croplands and wetlands.
- Each site should provide a full sequence (after applying the above criteria) with at least three full years of data.

Information concerning points 1-3 is contained in the data frame `fdk_site_fullyearsequence`, which is provided as part of the FluxDataKit package.

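Point 4 relies on the IGBP land-use classification recorded in `fdk_site_info`. As a quick orientation, the classes present can be tabulated before filtering; a sketch, not evaluated here:

```{r eval=FALSE}
# number of sites per IGBP land-use class; "CRO" (croplands) and "WET"
# (wetlands) are excluded in the site selection below
FluxDataKit::fdk_site_info |>
  count(igbp_land_use)
```
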
### Determine sites

Apply the criteria listed above.

```{r}
sites <- FluxDataKit::fdk_site_info |>
  filter(!(sitename %in% c("MX-Tes", "US-KS3"))) |>   # failed sites
  filter(!(igbp_land_use %in% c("CRO", "WET"))) |>
  left_join(
    fdk_site_fullyearsequence,
    by = "sitename"
  ) |>
  filter(!drop_gpp) |>   # where no full year sequence was found
  filter(nyears_gpp >= 3)
```

This yields 142 sites with a total of 1208 years' data, corresponding to 399,310 data points of daily data.

```{r}
# number of sites
nrow(sites)

# number of site-years
sum(sites$nyears_gpp)

# number of data points (days)
sum(sites$nyears_gpp) * 365
```

### Load data

Read files with daily data.

```{r message=FALSE, warning=FALSE, eval=FALSE}
# adjust the path for your own local setup
path <- "~/data/FluxDataKit/FLUXDATAKIT_FLUXNET/"

# read the daily (DD) FLUXNET-formatted file of a single site
read_onesite <- function(site, path){
  filename <- list.files(
    path = path,
    pattern = paste0("FLX_", site, "_FLUXDATAKIT_FULLSET_DD"),
    full.names = TRUE
  )
  out <- read_csv(filename) |>
    mutate(sitename = site)
  return(out)
}

# read all daily data for the selected sites and combine into one data frame
ddf <- purrr::map_dfr(
  sites$sitename,
  ~read_onesite(., path)
)
```

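If reading fails, it is usually because the daily files are missing or located elsewhere. A minimal check before looping over sites (a sketch, assuming the same local `path` as above):

```{r eval=FALSE}
# list the daily (DD) FLUXNET-formatted files found under `path`
list.files(path, pattern = "FLUXDATAKIT_FULLSET_DD") |>
  head()
```
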
## Select data sequences

Cut to full-year sequences of good-quality data.

```{r, eval=FALSE}
# keep only the good-quality full-year sequence identified for each site
ddf <- ddf |>
  left_join(
    sites |>
      select(
        sitename,
        year_start = year_start_gpp,
        year_end = year_end_gpp
      ),
    by = join_by(sitename)
  ) |>
  mutate(year = year(TIMESTAMP)) |>
  filter(year >= year_start & year <= year_end) |>
  select(-year_start, -year_end, -year)
```

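As a quick sanity check (a sketch, not part of the original workflow), one can verify that only complete calendar years remain, i.e. every remaining site-year should contain 365 or 366 days:

```{r eval=FALSE}
# number of days per remaining site-year; should be 365 or 366 throughout
ddf |>
  mutate(year = year(TIMESTAMP)) |>
  count(sitename, year) |>
  summarise(min_days = min(n), max_days = max(n))
```
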
## Plot

Visualise data availability.

```{r, fig.height=25}
sites |>
  select(
    sitename,
    year_start = year_start_gpp,
    year_end = year_end_gpp
  ) |>
  ggplot(aes(y = sitename,
             xmin = year_start,
             xmax = year_end)) +
  geom_linerange() +
  theme(legend.title = element_blank(),
        legend.position = "top") +
  labs(title = "Good data sequences",
       y = "Site")
```