---
title: "Long-Term Annual Trends in T & DO (May to Sept)"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
library(tidyverse)
library(lubridate)  # year() is called unqualified in later chunks

```

There is one more thing I would like to try with this dataset: looking at long-term trends in the seasonal changes in T and DO. That is, I would like to calculate the rates of warming and of oxygen decline within each year, and then determine whether those annual rates show a long-term trend.

For this analysis we will use data for May-Sep (since lakes are cooling by Oct).

I think we should focus on data starting in 1999, as I don't think the earlier data have sufficient resolution (enough months in a given year).

As an example, the first set of relevant data in the 41-station file is from 1999 for station 2-JKS044.60. In that year, there are monthly measurements from May 18 through Sep 23. You will want to calculate a slope to depict the rate of change in T and DO min, mean, max, and range. In this case, there are 5 observations for Tmin (5.7 to 8.6), yielding a slope in °C/day. You would then calculate the same slopes for the next available year (2001). I suggest that we only use years with a minimum of 4 observations (i.e., measurements in 4 of the 5 months). For the surface-only stations, we only need slopes for Tmax and DOmax.

Once we have the slopes for each year, we can fit an mblm regression to the annual slopes from that station.
    
This analysis effectively substitutes the station-year for the station-month as the unit of analysis in deriving mblm trends at the station level.
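
As a minimal sketch of the within-year slope described above, the chunk below fits `lm()` to five observations for a single station-year. Only the first and last rows (5.7 on May 18 and 8.6 on Sep 23, 1999) come from the example in the text; the three middle rows and every `toy_` name are made up for illustration.

```{r}
# Hypothetical station-year: the first and last rows follow the example in the text;
# the three middle rows are invented for illustration.
toy_year <- data.frame(
  Date = as.Date(c("1999-05-18", "1999-06-17", "1999-07-20", "1999-08-19", "1999-09-23")),
  TMin = c(5.7, 6.3, 7.2, 7.9, 8.6)
)

# Day of year is the predictor, so the slope is in degrees C per day
toy_year$DOY <- as.numeric(format(toy_year$Date, "%j"))

# Apply the minimum-observation rule (at least 4 valid values) before fitting
stopifnot(sum(!is.na(toy_year$TMin)) >= 4)

toy_fit <- lm(TMin ~ DOY, data = toy_year)
toy_slope <- unname(coef(toy_fit)["DOY"])
toy_slope  # seasonal warming rate in C/day
```

Each station-year contributes one such slope, and those slopes become the response in the station-level trend fit.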

```{r `read in data`}
trendSites.df <- openxlsx::read.xlsx("data_original.xlsx", sheet = 2)
allData.df <- openxlsx::read.xlsx("data_original.xlsx", sheet = 3)

# Convert the `Date` column from an Excel serial date to a legible date-time.
# Otherwise the date shows as a numeric value, e.g. 44230.
allData.df$Date <- allData.df$Date * 86400      # 86400 = seconds in a day
allData.df$Date <- as.POSIXct(allData.df$Date, origin = "1899-12-30", tz = "UTC")
    
```
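
To make the date conversion concrete, here is a small sketch applied to the serial value mentioned in the comment above (44230). It assumes the workbook uses the Windows 1900 date system, which is what the `1899-12-30` origin implies; the `toy_` names are hypothetical.

```{r}
# Excel (1900 date system) stores dates as days since 1899-12-30
toy_serial <- 44230

# Same conversion as above: days -> seconds -> POSIXct in UTC
toy_datetime <- as.POSIXct(toy_serial * 86400, origin = "1899-12-30", tz = "UTC")

# Equivalent date-only conversion, without the seconds step
toy_date <- as.Date(toy_serial, origin = "1899-12-30")

format(toy_datetime, "%Y-%m-%d")
format(toy_date, "%Y-%m-%d")
```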

```{r 'remove anomalies'}
# a surface temperature (Tmax) of 2.8 C in August
# 13774 	6ACNR000.00 	8/28/1990 	0.3 	NA 	NA 	NA 	NA

# Look up the row by its X1 identifier instead of hard-coding a row number
anomaly_row <- which(allData.df$X1 == 13774)
allData.df[anomaly_row, 5:8] <- NA

```

```{r `subset data`}
# filter for rows corresponding to the 41 stations
station_IDS <- trendSites.df$FDT_STA_ID

surfaceOnly_IDs <- c("2-JKS053.48",
                    "2-XDD000.40",
                    "4AROA192.55",
                    "4AROA196.05",
                    "5ASRN000.66",
                    "6ACNR000.00",
                    "6APNR008.15") # only surface water measurements ever taken

# May to Sep only; reservoirs are cooling by Oct
subset.df <- allData.df %>%
  mutate(MonthNum = lubridate::month(Date)) %>%  
  filter(FDT_STA_ID %in% station_IDS) %>%
  filter(MonthNum %in% 5:9) %>%
  filter(year(Date) >= 1999) %>%
  mutate(stationYear = paste0(FDT_STA_ID,"-", as.character(year(Date))))

# Identify station years with < 4 profiles
yrs_insufObs <- subset.df %>%
  group_by(FDT_STA_ID, year(Date)) %>%
  summarize(n()) %>%
  filter(`n()` < 4) %>%
  mutate(stationYear = paste0(FDT_STA_ID,"-", as.character(`year(Date)`)))

# exclude years with fewer than 4 obs
subset.df <- subset.df %>%
  filter(!stationYear %in% yrs_insufObs$stationYear)

```

Filtering must also be done variable by variable to eliminate station-years where Nobs < 4: some station-years that passed the initial filter above nevertheless have fewer than 4 non-missing observations for specific variables.
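
A small sketch of why the second filter matters, using invented data: both station-years below have 4 profiles, but one has a missing `TMin`, so only the other keeps 4 valid observations. The station IDs and `toy_` names are hypothetical.

```{r}
library(dplyr)

# Two hypothetical station-years with 4 profiles each; one TMin value is missing
toy_obs <- data.frame(
  stationYear = rep(c("STA-A-2001", "STA-B-2001"), each = 4),
  TMin = c(5.1, 5.9, NA, 7.4, 6.0, 6.8, 7.5, 8.1)
)

# Count profiles and non-missing TMin values per station-year
toy_valid <- toy_obs %>%
  group_by(stationYear) %>%
  summarize(nProfiles = n(),
            validObs = sum(!is.na(TMin)),
            .groups = "drop")

# Keep only station-years with at least 4 valid TMin observations
toy_keep <- toy_valid %>%
  filter(validObs >= 4) %>%
  pull(stationYear)

toy_keep
```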

```{r `slopes list`}
slopesList <- list()

```

## Min, Mean, Range slopes

**Exclude surface-only stations from these analyses** 

### Temp

```{r}
# TMin
## --------------------------------------------
# Derive valid Nobs for each station-year
TMin_yrSta_Nobs <- subset.df %>%
  group_by(stationYear) %>%
  summarize(validObs = sum(!is.na(TMin)), .groups = "drop")

# Identify station-years w/ fewer than 4 Nobs
TMin_insufNobs <- TMin_yrSta_Nobs %>%
  filter(validObs < 4) %>%
  distinct(stationYear)

TMin_data <- subset.df %>%
  select(FDT_STA_ID, TMin, DOY, Date, stationYear) %>%
  filter(!stationYear %in% TMin_insufNobs$stationYear) %>%
  filter(!FDT_STA_ID %in% surfaceOnly_IDs)

# df to hold model slopes
TMin_lmslopes <- data.frame(StationYear = unique(TMin_data$stationYear),
                          model_slope = NA,
                          pval = NA,
                          stdErr = NA)


for (sy in unique(TMin_data$stationYear)) {
  model_data <- TMin_data %>%
    filter(stationYear == sy)
  
  model <- lm(TMin ~ DOY, data = model_data)
  modsum <- summary(model)
  
  TMin_lmslopes[TMin_lmslopes$StationYear == sy, 2] <- modsum$coefficients[2, 1]
  TMin_lmslopes[TMin_lmslopes$StationYear == sy, 3] <- modsum$coefficients[2, 4]
  TMin_lmslopes[TMin_lmslopes$StationYear == sy, 4] <- modsum$coefficients[2, 2]
  
}

TMin_lmslopes <- TMin_lmslopes %>%
  mutate(Year = as.numeric(sub(".*-", "", StationYear))) %>%
  mutate(StationID = sub("-[0-9]{4}$", "", StationYear))

slopesList[["TMin"]] <- TMin_lmslopes

```
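
The steps in the TMin chunk above (drop station-years with too few valid observations, then fit `lm()` per station-year) could also be wrapped in one helper and reused for each variable. The sketch below is only an illustration on invented data: `toy_fit_slopes()` and the other `toy_` names are hypothetical and are not used elsewhere in this analysis.

```{r}
library(dplyr)

# Hypothetical helper: one lm slope per station-year for a named response column
toy_fit_slopes <- function(data, response, min_obs = 4) {
  data %>%
    filter(!is.na(.data[[response]])) %>%   # keep valid observations only
    group_by(stationYear) %>%
    filter(n() >= min_obs) %>%              # enforce the minimum-observation rule
    group_modify(~ {
      coefs <- summary(lm(reformulate("DOY", response), data = .x))$coefficients
      data.frame(model_slope = coefs["DOY", "Estimate"],
                 pval = coefs["DOY", "Pr(>|t|)"],
                 stdErr = coefs["DOY", "Std. Error"])
    }) %>%
    ungroup()
}

# Invented data: one station-year with 5 profiles, one with only 3
toy_profiles <- data.frame(
  stationYear = c(rep("STA-A-2001", 5), rep("STA-A-2002", 3)),
  DOY = c(140, 170, 200, 230, 260, 140, 200, 260),
  TMin = c(5.0, 5.8, 6.3, 7.1, 7.6, 5.2, 6.4, 7.9)
)

toy_slopes <- toy_fit_slopes(toy_profiles, "TMin")
toy_slopes
```

With this shape, the same call with `"TMean"`, `"DOMin"`, and so on would return the matching slope table for each variable.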

```{r}
# TMean
## --------------------------------------------
# Derive valid Nobs for each station-year
TMean_yrSta_Nobs <- subset.df %>%
  group_by(stationYear) %>%
  summarize(validObs = sum(!is.na(TMean)), .groups = "drop")

# Identify station-years w/ fewer than 4 Nobs
TMean_insufNobs <- TMean_yrSta_Nobs %>%
  filter(validObs < 4) %>%
  distinct(stationYear)

TMean_data <- subset.df %>%
  select(FDT_STA_ID, TMean, DOY, Date, stationYear) %>%
  filter(!stationYear %in% TMean_insufNobs$stationYear) %>%
  filter(!FDT_STA_ID %in% surfaceOnly_IDs)

# df to hold model slopes
TMean_lmslopes <- data.frame(StationYear = unique(TMean_data$stationYear),
                          model_slope = NA,
                          pval = NA,
                          stdErr = NA)


for (sy in unique(TMean_data$stationYear)) {
  model_data <- TMean_data %>%
    filter(stationYear == sy)
  
  model <- lm(TMean ~ DOY, data = model_data)
  modsum <- summary(model)
  
  TMean_lmslopes[TMean_lmslopes$StationYear == sy, 2] <- modsum$coefficients[2, 1]
  TMean_lmslopes[TMean_lmslopes$StationYear == sy, 3] <- modsum$coefficients[2, 4]
  TMean_lmslopes[TMean_lmslopes$StationYear == sy, 4] <- modsum$coefficients[2, 2]
  
}

TMean_lmslopes <- TMean_lmslopes %>%
  mutate(Year = as.numeric(sub(".*-", "", StationYear))) %>%
  mutate(StationID = sub("-[0-9]{4}$", "", StationYear))

slopesList[["TMean"]] <- TMean_lmslopes

```

```{r}
# TRange
## --------------------------------------------
# Derive valid Nobs for each station-year
TRange_yrSta_Nobs <- subset.df %>%
  group_by(stationYear) %>%
  summarize(validObs = sum(!is.na(TRange)), .groups = "drop")

# Identify station-years w/ fewer than 4 Nobs
TRange_insufNobs <- TRange_yrSta_Nobs %>%
  filter(validObs < 4) %>%
  distinct(stationYear)

TRange_data <- subset.df %>%
  select(FDT_STA_ID, TRange, DOY, Date, stationYear) %>%
  filter(!stationYear %in% TRange_insufNobs$stationYear) %>%
  filter(!FDT_STA_ID %in% surfaceOnly_IDs)

# df to hold model slopes
TRange_lmslopes <- data.frame(StationYear = unique(TRange_data$stationYear),
                          model_slope = NA,
                          pval = NA,
                          stdErr = NA)


for (sy in unique(TRange_data$stationYear)) {
  model_data <- TRange_data %>%
    filter(stationYear == sy)
  
  model <- lm(TRange ~ DOY, data = model_data)
  modsum <- summary(model)
  
  TRange_lmslopes[TRange_lmslopes$StationYear == sy, 2] <- modsum$coefficients[2, 1]
  TRange_lmslopes[TRange_lmslopes$StationYear == sy, 3] <- modsum$coefficients[2, 4]
  TRange_lmslopes[TRange_lmslopes$StationYear == sy, 4] <- modsum$coefficients[2, 2]
  
}

TRange_lmslopes <- TRange_lmslopes %>%
  mutate(Year = as.numeric(sub(".*-", "", StationYear))) %>%
  mutate(StationID = sub("-[0-9]{4}$", "", StationYear))

slopesList[["TRange"]] <- TRange_lmslopes

```

### Dissolved Oxygen

```{r}
# DOMin
## --------------------------------------------
# Derive valid Nobs for each station-year
DOMin_yrSta_Nobs <- subset.df %>%
  group_by(stationYear) %>%
  summarize(validObs = sum(!is.na(DOMin)), .groups = "drop")

# Identify station-years w/ fewer than 4 Nobs
DOMin_insufNobs <- DOMin_yrSta_Nobs %>%
  filter(validObs < 4) %>%
  distinct(stationYear)

DOMin_data <- subset.df %>%
  select(FDT_STA_ID, DOMin, DOY, Date, stationYear) %>%
  filter(!stationYear %in% DOMin_insufNobs$stationYear) %>%
  filter(!FDT_STA_ID %in% surfaceOnly_IDs)

# df to hold model slopes
DOMin_lmslopes <- data.frame(StationYear = unique(DOMin_data$stationYear),
                          model_slope = NA,
                          pval = NA,
                          stdErr = NA)


for (sy in unique(DOMin_data$stationYear)) {
  model_data <- DOMin_data %>%
    filter(stationYear == sy)
  
  model <- lm(DOMin ~ DOY, data = model_data)
  modsum <- summary(model)
  
  DOMin_lmslopes[DOMin_lmslopes$StationYear == sy, 2] <- modsum$coefficients[2, 1]
  DOMin_lmslopes[DOMin_lmslopes$StationYear == sy, 3] <- modsum$coefficients[2, 4]
  DOMin_lmslopes[DOMin_lmslopes$StationYear == sy, 4] <- modsum$coefficients[2, 2]
  
}

DOMin_lmslopes <- DOMin_lmslopes %>%
  mutate(Year = as.numeric(sub(".*-", "", StationYear))) %>%
  mutate(StationID = sub("-[0-9]{4}$", "", StationYear))

slopesList[["DOMin"]] <- DOMin_lmslopes

```


```{r}
# DOMean
## --------------------------------------------
# Derive valid Nobs for each station-year
DOMean_yrSta_Nobs <- subset.df %>%
  group_by(stationYear) %>%
  summarize(validObs = sum(!is.na(DOMean)), .groups = "drop")

# Identify station-years w/ fewer than 4 Nobs
DOMean_insufNobs <- DOMean_yrSta_Nobs %>%
  filter(validObs < 4) %>%
  distinct(stationYear)

DOMean_data <- subset.df %>%
  select(FDT_STA_ID, DOMean, DOY, Date, stationYear) %>%
  filter(!stationYear %in% DOMean_insufNobs$stationYear) %>%
  filter(!FDT_STA_ID %in% surfaceOnly_IDs)

# df to hold model slopes
DOMean_lmslopes <- data.frame(StationYear = unique(DOMean_data$stationYear),
                          model_slope = NA,
                          pval = NA,
                          stdErr = NA)


for (sy in unique(DOMean_data$stationYear)) {
  model_data <- DOMean_data %>%
    filter(stationYear == sy)
  
  model <- lm(DOMean ~ DOY, data = model_data)
  modsum <- summary(model)
  
  DOMean_lmslopes[DOMean_lmslopes$StationYear == sy, 2] <- modsum$coefficients[2, 1]
  DOMean_lmslopes[DOMean_lmslopes$StationYear == sy, 3] <- modsum$coefficients[2, 4]
  DOMean_lmslopes[DOMean_lmslopes$StationYear == sy, 4] <- modsum$coefficients[2, 2]
  
}

DOMean_lmslopes <- DOMean_lmslopes %>%
  mutate(Year = as.numeric(sub(".*-", "", StationYear))) %>%
  mutate(StationID = sub("-[0-9]{4}$", "", StationYear))

slopesList[["DOMean"]] <- DOMean_lmslopes

```

```{r}
# DORange
## --------------------------------------------
# Derive valid Nobs for each station-year
DORange_yrSta_Nobs <- subset.df %>%
  group_by(stationYear) %>%
  summarize(validObs = sum(!is.na(DORange)), .groups = "drop")

# Identify station-years w/ fewer than 4 Nobs
DORange_insufNobs <- DORange_yrSta_Nobs %>%
  filter(validObs < 4) %>%
  distinct(stationYear)

DORange_data <- subset.df %>%
  select(FDT_STA_ID, DORange, DOY, Date, stationYear) %>%
  filter(!stationYear %in% DORange_insufNobs$stationYear) %>%
  filter(!FDT_STA_ID %in% surfaceOnly_IDs)

# df to hold model slopes
DORange_lmslopes <- data.frame(StationYear = unique(DORange_data$stationYear),
                          model_slope = NA,
                          pval = NA,
                          stdErr = NA)


for (sy in unique(DORange_data$stationYear)) {
  model_data <- DORange_data %>%
    filter(stationYear == sy)
  
  model <- lm(DORange ~ DOY, data = model_data)
  modsum <- summary(model)
  
  DORange_lmslopes[DORange_lmslopes$StationYear == sy, 2] <- modsum$coefficients[2, 1]
  DORange_lmslopes[DORange_lmslopes$StationYear == sy, 3] <- modsum$coefficients[2, 4]
  DORange_lmslopes[DORange_lmslopes$StationYear == sy, 4] <- modsum$coefficients[2, 2]
  
}

DORange_lmslopes <- DORange_lmslopes %>%
  mutate(Year = as.numeric(sub(".*-", "", StationYear))) %>%
  mutate(StationID = sub("-[0-9]{4}$", "", StationYear))

slopesList[["DORange"]] <- DORange_lmslopes

```

## TMax and DOMax slopes

Include surface only stations

```{r}
# TMax
## --------------------------------------------
# Derive valid Nobs for each station-year
TMax_yrSta_Nobs <- subset.df %>%
  group_by(stationYear) %>%
  summarize(validObs = sum(!is.na(TMax)), .groups = "drop")

# Identify station-years w/ fewer than 4 Nobs
TMax_insufNobs <- TMax_yrSta_Nobs %>%
  filter(validObs < 4) %>%
  distinct(stationYear)

TMax_data <- subset.df %>%
  select(FDT_STA_ID, TMax, DOY, Date, stationYear) %>%
  filter(!stationYear %in% TMax_insufNobs$stationYear)

# df to hold model slopes
TMax_lmslopes <- data.frame(StationYear = unique(TMax_data$stationYear),
                          model_slope = NA,
                          pval = NA,
                          stdErr = NA)


for (sy in unique(TMax_data$stationYear)) {
  model_data <- TMax_data %>%
    filter(stationYear == sy)
  
  model <- lm(TMax ~ DOY, data = model_data)
  modsum <- summary(model)
  
  TMax_lmslopes[TMax_lmslopes$StationYear == sy, 2] <- modsum$coefficients[2, 1]
  TMax_lmslopes[TMax_lmslopes$StationYear == sy, 3] <- modsum$coefficients[2, 4]
  TMax_lmslopes[TMax_lmslopes$StationYear == sy, 4] <- modsum$coefficients[2, 2]
  
}

TMax_lmslopes <- TMax_lmslopes %>%
  mutate(Year = as.numeric(sub(".*-", "", StationYear))) %>%
  mutate(StationID = sub("-[0-9]{4}$", "", StationYear))

slopesList[["TMax"]] <- TMax_lmslopes

```

```{r}
# DOMax
## --------------------------------------------
# Derive valid Nobs for each station-year
DOMax_yrSta_Nobs <- subset.df %>%
  group_by(stationYear) %>%
  summarize(validObs = sum(!is.na(DOMax)), .groups = "drop")

# Identify station-years w/ fewer than 4 Nobs
DOMax_insufNobs <- DOMax_yrSta_Nobs %>%
  filter(validObs < 4) %>%
  distinct(stationYear)

DOMax_data <- subset.df %>%
  select(FDT_STA_ID, DOMax, DOY, Date, stationYear) %>%
  filter(!stationYear %in% DOMax_insufNobs$stationYear)

# df to hold model slopes
DOMax_lmslopes <- data.frame(StationYear = unique(DOMax_data$stationYear),
                          model_slope = NA,
                          pval = NA,
                          stdErr = NA)


for (sy in unique(DOMax_data$stationYear)) {
  model_data <- DOMax_data %>%
    filter(stationYear == sy)
  
  model <- lm(DOMax ~ DOY, data = model_data)
  modsum <- summary(model)
  
  DOMax_lmslopes[DOMax_lmslopes$StationYear == sy, 2] <- modsum$coefficients[2, 1]
  DOMax_lmslopes[DOMax_lmslopes$StationYear == sy, 3] <- modsum$coefficients[2, 4]
  DOMax_lmslopes[DOMax_lmslopes$StationYear == sy, 4] <- modsum$coefficients[2, 2]
  
}

DOMax_lmslopes <- DOMax_lmslopes %>%
  mutate(Year = as.numeric(sub(".*-", "", StationYear))) %>%
  mutate(StationID = sub("-[0-9]{4}$", "", StationYear))

slopesList[["DOMax"]] <- DOMax_lmslopes

```

## All Slopes -> Single DF

```{r}
# Add a field to each df in slopesList to identify the variable used in the analysis
for (i in seq_along(slopesList)) {
  
  varName <- names(slopesList[i])
  slopesList[[i]]$variable <- varName
}


slopes.df <- bind_rows(slopesList)
slopes.df <- slopes.df %>%
  select(-StationYear, -pval, -stdErr)

wider_df <- slopes.df %>%
  pivot_wider(names_from = variable, values_from = model_slope)

write_csv(wider_df, "output_data/annualLongTerm/yearStation_slopeSummary.csv")
```
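
For reference, the long-to-wide reshaping above can be sketched on a tiny invented list. This version uses the `.id` argument of `bind_rows()` to create the `variable` column from the list names, which is an alternative to the explicit loop; the values and `toy_` names are hypothetical.

```{r}
library(dplyr)
library(tidyr)

# Hypothetical per-variable slope tables for one station
toy_list <- list(
  TMin = data.frame(StationID = "STA-A", Year = c(2001, 2002),
                    model_slope = c(0.021, 0.024)),
  TMax = data.frame(StationID = "STA-A", Year = c(2001, 2002),
                    model_slope = c(0.050, 0.047))
)

# .id copies each list name into a `variable` column
toy_long <- bind_rows(toy_list, .id = "variable")

# One row per station-year, one column per variable
toy_wide <- toy_long %>%
  pivot_wider(names_from = variable, values_from = model_slope)

toy_wide
```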


## MBLM, seasonal rate of change ~ year

```{r}
library(mblm)

```
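
To show what a median-based slope does with the annual rates, here is a base-R sketch of a repeated-median slope. My understanding is that this is what `mblm()` estimates with its default `repeated = TRUE`, but treat that as an assumption and check the package documentation. The data and `toy_` names are invented, and the sketch does not reproduce the MAD or p-value that `mblm` reports.

```{r}
# Hypothetical annual slopes for one station: a steady increase of 0.001 per year,
# with one invented outlier in the final year
toy_annual <- data.frame(Year = 2001:2005,
                         model_slope = 0.02 + 0.001 * (0:4))
toy_annual$model_slope[5] <- 0.08

# Repeated-median slope: for each point take the median slope to every other point,
# then take the median of those per-point medians
toy_repeated_median_slope <- function(x, y) {
  per_point <- vapply(seq_along(x), function(i) {
    others <- x != x[i]
    median((y[others] - y[i]) / (x[others] - x[i]))
  }, numeric(1))
  median(per_point)
}

toy_trend <- toy_repeated_median_slope(toy_annual$Year, toy_annual$model_slope)

# Ordinary least squares on the same data, for comparison
toy_ols <- unname(coef(lm(model_slope ~ Year, data = toy_annual))["Year"])

c(repeated_median = toy_trend, ols = toy_ols)
```

The median-based estimate stays at the underlying 0.001 per year, while the least-squares slope is pulled upward by the single outlier, which is the motivation for using a robust fit on a short series of annual rates.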

### TMin

```{r `fit regressions TMin`}
#TMin

## --------------------------
# create list to store regression results
TMin_modSums <- list()

for (station_id in unique(TMin_lmslopes$StationID)) {
  model_data <- TMin_lmslopes %>%
    filter(StationID == station_id)
    

      model <- mblm(model_slope ~ Year, data = model_data)
      mod.sum <- summary.mblm(model)
      
      # store results
      TMin_modSums[[station_id]] <- list(
        slope = mod.sum$coefficients[2,1],
        MAD = mod.sum$coefficients["Year", "MAD"] ,
        pvalue = mod.sum$coefficients["Year", 4],
        intercept = mod.sum$coefficients["(Intercept)", 1]
        )
  }

```


```{r `add reg stats to TMin summary df`}
# create station-level data frame to hold the mblm results
TMin_mblm <- data.frame(StationID = unique(TMin_lmslopes$StationID),
          model_slope = NA,
          model_MAD = NA,
          model_pval = NA,
          model_intcpt = NA,
          variable = "TMin")

for (i in 1:nrow(TMin_mblm)) {
  station <- TMin_mblm$StationID[i]

  TMin_mblm$model_slope[i] <- TMin_modSums[[station]]$slope
  TMin_mblm$model_MAD[i] <- TMin_modSums[[station]]$MAD
  TMin_mblm$model_pval[i] <- TMin_modSums[[station]]$pvalue
  TMin_mblm$model_intcpt[i] <- TMin_modSums[[station]]$intercept
  
}


#write_csv(TMin_mblm, "output_data/annualLongTerm/TMin_mblmAnnual.csv")
```


### TMean

```{r `fit regressions TMean`}
#TMean

## --------------------------
# create list to store regression results
TMean_modSums <- list()

for (station_id in unique(TMean_lmslopes$StationID)) {
  model_data <- TMean_lmslopes %>%
    filter(StationID == station_id)
    

      model <- mblm(model_slope ~ Year, data = model_data)
      mod.sum <- summary.mblm(model)
      
      # store results
      TMean_modSums[[station_id]] <- list(
        slope = mod.sum$coefficients[2,1],
        MAD = mod.sum$coefficients["Year", "MAD"] ,
        pvalue = mod.sum$coefficients["Year", 4],
        intercept = mod.sum$coefficients["(Intercept)", 1]
        )
  }

```


```{r `add reg stats to TMean summary df`}
# create station-level data frame to hold the mblm results
TMean_mblm <- data.frame(StationID = unique(TMean_lmslopes$StationID),
          model_slope = NA,
          model_MAD = NA,
          model_pval = NA,
          model_intcpt = NA,
          variable = "TMean")

for (i in 1:nrow(TMean_mblm)) {
  station <- TMean_mblm$StationID[i]

  TMean_mblm$model_slope[i] <- TMean_modSums[[station]]$slope
  TMean_mblm$model_MAD[i] <- TMean_modSums[[station]]$MAD
  TMean_mblm$model_pval[i] <- TMean_modSums[[station]]$pvalue
  TMean_mblm$model_intcpt[i] <- TMean_modSums[[station]]$intercept
  
}

#write_csv(TMean_mblm, "output_data/annualLongTerm/TMean_mblmAnnual.csv")
```

### TRange

```{r `fit regressions TRange`}
#TRange

## --------------------------
# create list to store regression results
TRange_modSums <- list()

for (station_id in unique(TRange_lmslopes$StationID)) {
  model_data <- TRange_lmslopes %>%
    filter(StationID == station_id)
    

      model <- mblm(model_slope ~ Year, data = model_data)
      mod.sum <- summary.mblm(model)
      
      # store results
      TRange_modSums[[station_id]] <- list(
        slope = mod.sum$coefficients[2,1],
        MAD = mod.sum$coefficients["Year", "MAD"] ,
        pvalue = mod.sum$coefficients["Year", 4],
        intercept = mod.sum$coefficients["(Intercept)", 1]
        )
  }

```


```{r `add reg stats to TRange summary df`}
# create station-level data frame to hold the mblm results
TRange_mblm <- data.frame(StationID = unique(TRange_lmslopes$StationID),
          model_slope = NA,
          model_MAD = NA,
          model_pval = NA,
          model_intcpt = NA,
          variable = "TRange")

for (i in 1:nrow(TRange_mblm)) {
  station <- TRange_mblm$StationID[i]

  TRange_mblm$model_slope[i] <- TRange_modSums[[station]]$slope
  TRange_mblm$model_MAD[i] <- TRange_modSums[[station]]$MAD
  TRange_mblm$model_pval[i] <- TRange_modSums[[station]]$pvalue
  TRange_mblm$model_intcpt[i] <- TRange_modSums[[station]]$intercept
  
}

#write_csv(TRange_mblm, "output_data/annualLongTerm/TRange_mblmAnnual.csv")
```


### TMax

```{r `fit regressions TMax`}
#TMax

## --------------------------
# create list to store regression results
TMax_modSums <- list()

for (station_id in unique(TMax_lmslopes$StationID)) {
  model_data <- TMax_lmslopes %>%
    filter(StationID == station_id)
    

      model <- mblm(model_slope ~ Year, data = model_data)
      mod.sum <- summary.mblm(model)
      
      # store results
      TMax_modSums[[station_id]] <- list(
        slope = mod.sum$coefficients[2,1],
        MAD = mod.sum$coefficients["Year", "MAD"] ,
        pvalue = mod.sum$coefficients["Year", 4],
        intercept = mod.sum$coefficients["(Intercept)", 1]
        )
  }

```


```{r `add reg stats to TMax summary df`}
# create station-level data frame to hold the mblm results
TMax_mblm <- data.frame(StationID = unique(TMax_lmslopes$StationID),
          model_slope = NA,
          model_MAD = NA,
          model_pval = NA,
          model_intcpt = NA,
          variable = "TMax")

for (i in 1:nrow(TMax_mblm)) {
  station <- TMax_mblm$StationID[i]

  TMax_mblm$model_slope[i] <- TMax_modSums[[station]]$slope
  TMax_mblm$model_MAD[i] <- TMax_modSums[[station]]$MAD
  TMax_mblm$model_pval[i] <- TMax_modSums[[station]]$pvalue
  TMax_mblm$model_intcpt[i] <- TMax_modSums[[station]]$intercept
  
}

#write_csv(TMax_mblm, "output_data/annualLongTerm/TMax_mblmAnnual.csv")
```

### DOMin

```{r `fit regressions DOMin`}
#DOMin

## --------------------------
# create list to store regression results
DOMin_modSums <- list()

for (station_id in unique(DOMin_lmslopes$StationID)) {
  model_data <- DOMin_lmslopes %>%
    filter(StationID == station_id)
    

      model <- mblm(model_slope ~ Year, data = model_data)
      mod.sum <- summary.mblm(model)
      
      # store results
      DOMin_modSums[[station_id]] <- list(
        slope = mod.sum$coefficients[2,1],
        MAD = mod.sum$coefficients["Year", "MAD"] ,
        pvalue = mod.sum$coefficients["Year", 4],
        intercept = mod.sum$coefficients["(Intercept)", 1]
        )
  }

```


```{r `add reg stats to DOMin summary df`}
# create station-level data frame to hold the mblm results
DOMin_mblm <- data.frame(StationID = unique(DOMin_lmslopes$StationID),
          model_slope = NA,
          model_MAD = NA,
          model_pval = NA,
          model_intcpt = NA,
          variable = "DOMin")

for (i in 1:nrow(DOMin_mblm)) {
  station <- DOMin_mblm$StationID[i]

  DOMin_mblm$model_slope[i] <- DOMin_modSums[[station]]$slope
  DOMin_mblm$model_MAD[i] <- DOMin_modSums[[station]]$MAD
  DOMin_mblm$model_pval[i] <- DOMin_modSums[[station]]$pvalue
  DOMin_mblm$model_intcpt[i] <- DOMin_modSums[[station]]$intercept
  
}

#write_csv(DOMin_mblm, "output_data/annualLongTerm/DOMin_mblmAnnual.csv")
```


### DOMean

```{r `fit regressions DOMean`}
#DOMean

## --------------------------
# create list to store regression results
DOMean_modSums <- list()

for (station_id in unique(DOMean_lmslopes$StationID)) {
  model_data <- DOMean_lmslopes %>%
    filter(StationID == station_id)
    

      model <- mblm(model_slope ~ Year, data = model_data)
      mod.sum <- summary.mblm(model)
      
      # store results
      DOMean_modSums[[station_id]] <- list(
        slope = mod.sum$coefficients[2,1],
        MAD = mod.sum$coefficients["Year", "MAD"] ,
        pvalue = mod.sum$coefficients["Year", 4],
        intercept = mod.sum$coefficients["(Intercept)", 1]
        )
  }

```


```{r `add reg stats to DOMean summary df`}
# create station-level data frame to hold the mblm results
DOMean_mblm <- data.frame(StationID = unique(DOMean_lmslopes$StationID),
          model_slope = NA,
          model_MAD = NA,
          model_pval = NA,
          model_intcpt = NA,
          variable = "DOMean")

for (i in 1:nrow(DOMean_mblm)) {
  station <- DOMean_mblm$StationID[i]

  DOMean_mblm$model_slope[i] <- DOMean_modSums[[station]]$slope
  DOMean_mblm$model_MAD[i] <- DOMean_modSums[[station]]$MAD
  DOMean_mblm$model_pval[i] <- DOMean_modSums[[station]]$pvalue
  DOMean_mblm$model_intcpt[i] <- DOMean_modSums[[station]]$intercept
  
}

#write_csv(DOMean_mblm, "output_data/annualLongTerm/DOMean_mblmAnnual.csv")
```


### DORange

```{r `fit regressions DORange`}
#DORange

## --------------------------
# create list to store regression results
DORange_modSums <- list()

for (station_id in unique(DORange_lmslopes$StationID)) {
  model_data <- DORange_lmslopes %>%
    filter(StationID == station_id)
    

      model <- mblm(model_slope ~ Year, data = model_data)
      mod.sum <- summary.mblm(model)
      
      # store results
      DORange_modSums[[station_id]] <- list(
        slope = mod.sum$coefficients[2,1],
        MAD = mod.sum$coefficients["Year", "MAD"] ,
        pvalue = mod.sum$coefficients["Year", 4],
        intercept = mod.sum$coefficients["(Intercept)", 1]
        )
  }

```


```{r `add reg stats to DORange summary df`}
# create station-level data frame to hold the mblm results
DORange_mblm <- data.frame(StationID = unique(DORange_lmslopes$StationID),
          model_slope = NA,
          model_MAD = NA,
          model_pval = NA,
          model_intcpt = NA,
          variable = "DORange")

for (i in 1:nrow(DORange_mblm)) {
  station <- DORange_mblm$StationID[i]

  DORange_mblm$model_slope[i] <- DORange_modSums[[station]]$slope
  DORange_mblm$model_MAD[i] <- DORange_modSums[[station]]$MAD
  DORange_mblm$model_pval[i] <- DORange_modSums[[station]]$pvalue
  DORange_mblm$model_intcpt[i] <- DORange_modSums[[station]]$intercept
  
}

#write_csv(DORange_mblm, "output_data/annualLongTerm/DORange_mblmAnnual.csv")
```

### DOMax

```{r `fit regressions DOMax`}
#DOMax

## --------------------------
# create list to store regression results
DOMax_modSums <- list()

for (station_id in unique(DOMax_lmslopes$StationID)) {
  model_data <- DOMax_lmslopes %>%
    filter(StationID == station_id)
    

      model <- mblm(model_slope ~ Year, data = model_data)
      mod.sum <- summary.mblm(model)
      
      # store results
      DOMax_modSums[[station_id]] <- list(
        slope = mod.sum$coefficients[2,1],
        MAD = mod.sum$coefficients["Year", "MAD"] ,
        pvalue = mod.sum$coefficients["Year", 4],
        intercept = mod.sum$coefficients["(Intercept)", 1]
        )
  }

```


```{r `add reg stats to DOMax summary df`}
# create station-level data frame to hold the mblm results
DOMax_mblm <- data.frame(StationID = unique(DOMax_lmslopes$StationID),
          model_slope = NA,
          model_MAD = NA,
          model_pval = NA,
          model_intcpt = NA,
          variable = "DOMax")

for (i in 1:nrow(DOMax_mblm)) {
  station <- DOMax_mblm$StationID[i]

  DOMax_mblm$model_slope[i] <- DOMax_modSums[[station]]$slope
  DOMax_mblm$model_MAD[i] <- DOMax_modSums[[station]]$MAD
  DOMax_mblm$model_pval[i] <- DOMax_modSums[[station]]$pvalue
  DOMax_mblm$model_intcpt[i] <- DOMax_modSums[[station]]$intercept
  
}

#write_csv(DOMax_mblm, "output_data/annualLongTerm/DOMax_mblmAnnual.csv")
```


```{r}
## Combine all mblm model results into single df

varlist <- ls()

myvars <- varlist[which(grepl("_mblm$", varlist))]

combined_data <- do.call(rbind, lapply(myvars, function(x) {
  get(x)
}))

# write_csv(combined_data, "output_data/annualLongTerm/allVariablesCombined.csv")

```

## Data Viz: mblm slopes for all stations, by variable

```{r}
# geom_line has no 'mblm' formula option; geom_abline is the workaround.
# but abline has no axis values associated, it is just an infinite line based on slope and intercept.
# effectively plotting the mblm lines thus requires a geom_ layer to define the scales. 

# also, field to color lines by slope significance must be added
mblm_data <- combined_data %>%
  mutate(significant = if_else(model_pval <= .05 & model_slope > 0, "significant positive", 
                               if_else(model_pval <= .05 & model_slope < 0, "significant negative", "not significant"))
         ) %>%
  mutate(var_class = ifelse(startsWith(variable, "T"), "Temperature", "Dissolved Oxygen")) %>%
  mutate(var_type = case_when(
    endsWith(variable, "Max") ~ "Max",
    endsWith(variable, "Min") ~ "Min",
    endsWith(variable, "Range") ~ "Range",
    endsWith(variable, "Mean") ~ "Mean",
    TRUE ~ ""
  ))

# set significant as factor to ensure sig lines sit on top of insignificant lines
mblm_data$significant <- factor(mblm_data$significant, levels = c("significant positive", "significant negative", "not significant"))

# point plotting variable -- new var name so that it can be modified to remove outliers as needed without affecting the original data frame
annualTrends.df <- slopes.df

## The following was iterative, based on plotting and replotting with facet_grid and visually assessing the plots.
## Note that scales = "free" in facet_grid does not free every panel: the y scale is shared within each row and the x scale within each column.
# Remove outlier from TMax to tighten the y-axis range.
xoutlier <- annualTrends.df %>%
  filter(variable == "TMax") %>%
  summarize(min = min(model_slope))

outlier_index <- which(annualTrends.df$model_slope == xoutlier$min & annualTrends.df$variable == "TMax")

annualTrends.df$model_slope[outlier_index] <- NA

# Remove max and min from TRange
xoutlier <- annualTrends.df %>%
  filter(variable == "TRange") %>%
  summarize(min = min(model_slope),
            max = max(model_slope))

outlier_indices <- which((annualTrends.df$model_slope == xoutlier$min | annualTrends.df$model_slope == xoutlier$max) & annualTrends.df$variable == "TRange")

annualTrends.df$model_slope[outlier_indices] <- NA

```
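
The workaround described in the comments above can be sketched on its own: an invisible `geom_point()` layer sets the axis ranges, and `geom_abline()` then draws each fitted line from its intercept and slope. All values and `toy_` names below are invented for illustration.

```{r}
library(ggplot2)

# Hypothetical annual slopes; these points only define the x and y ranges
toy_points <- data.frame(Year = c(1999, 2010, 2020),
                         model_slope = c(0.01, 0.03, 0.05))

# Hypothetical station-level fits: one intercept and slope per line
toy_lines <- data.frame(model_intcpt = c(-3.96, -1.97),
                        model_slope = c(0.002, 0.001),
                        significant = c("significant positive", "not significant"))

toy_plot <- ggplot() +
  # alpha = 0 hides the points while still training the scales
  geom_point(data = toy_points, aes(x = Year, y = model_slope), alpha = 0) +
  geom_abline(data = toy_lines,
              aes(intercept = model_intcpt, slope = model_slope, color = significant))

toy_plot
```

Because the lines extend across the whole panel, the range of the invisible points decides which part of each line is visible, which is why outliers are set to `NA` in the point data before plotting.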


```{r}

plot <- ggplot() +
  geom_point(data = annualTrends.df, aes(x = Year, y = model_slope), alpha = 0) +
  geom_abline(data = mblm_data, aes(intercept = model_intcpt, slope = model_slope, color = significant), 
              linewidth = .42, alpha = .6) +
   labs(title = NULL,
       x = "Year",
       y = NULL) +  
  facet_grid(var_class ~ var_type, scales = "free",
             labeller = labeller(var_class = as_labeller(c("Dissolved Oxygen" = "Dissolved Oxygen (mg O2/L/day)", "Temperature" = "Temperature (C/day)")))) +
  scale_color_manual(values = c("not significant" = "grey72", 
                                "significant positive" = "#C9A818", 
                                "significant negative" = "#124E78")) +
  theme_minimal() + 
  theme(legend.position = "top",
        panel.border = element_rect(colour = "black", fill = NA, linewidth = .5),
        plot.title = element_text(size = 16),
        axis.title = element_text(size = 14), 
        axis.text = element_text(size = 9.5), 
        axis.text.x = element_text(angle = 45, hjust = .8),
        legend.text = element_text(size = 8),  
        legend.title = element_blank(),
        strip.text = element_text(size = 11),
        panel.grid = element_line(color = "grey90", linewidth = .1),
        aspect.ratio = 1.8)

plot(plot)
ggsave("annualLongterm_plot2.jpg", plot)
```

    # Custom labels after creating plot using `grid` package
    ## Create grob object

    # define my label positions and text
    y_temp <- 0.2
    y_do <- 0.7
    lab_temp <- "°C/day"
    lab_do <- "mg O2/L/d"
    rot <- 90
    grid::grid.text(lab_temp, x = 0, y = y_temp, rot = rot, gp = grid::gpar(fontsize = 20, col = "grey"))


    # Draw the modified plot (`pgrob` must already exist; it is not created above)
    x <- grid::grid.draw(pgrob, recording = TRUE)