From f1df1a67aa65fa6fcf81518b9b6a496295658f2b Mon Sep 17 00:00:00 2001
From: kellijohnson-NOAA <kelli.johnson@noaa.gov>
Date: Tue, 3 Dec 2024 07:54:00 -0800
Subject: [PATCH] Adds demo with long-form inputs for length

---
 vignettes/fims-demo-length.Rmd | 503 +++++++++++++++++++++++++++++++++
 1 file changed, 503 insertions(+)
 create mode 100644 vignettes/fims-demo-length.Rmd

diff --git a/vignettes/fims-demo-length.Rmd b/vignettes/fims-demo-length.Rmd
new file mode 100644
index 00000000..14b93f4e
--- /dev/null
+++ b/vignettes/fims-demo-length.Rmd
@@ -0,0 +1,503 @@
+---
+title: "Introducing the Fisheries Integrated Modeling System (FIMS)"
+output: github_document
+vignette: >
+  %\VignetteIndexEntry{Introducing the Fisheries Integrated Modeling System (FIMS)}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+
+```{r setup, include=FALSE}
+knitr::opts_chunk$set(echo = TRUE)
+# library(dplyr)
+```
+
+## Fisheries Integrated Modeling System
+The NOAA Fisheries Integrated Modeling System (FIMS) is a new modeling framework for fisheries modeling. FIMS is a software system designed and architected to support next-generation fisheries stock assessment, ecosystem, and socioeconomic modeling. It's important to note that FIMS itself is not a model, but rather a framework for creating models. The framework is made up of many modules that come together to create a "the best model" that suites the needs of the end-user. What follows is a demo of creating a catch-at-age assessment model using FIMS. 
+
+## Creating Models in FIMS
+To begin, we import the FIMS and TMB libraries. Calling `library(FIMS)` automatically loads the Rcpp functions and modules into the R environment. The function call, `clear()`, ensures C++ memory from any previous fims model run is cleared out. 
+```{r fims1, warning=FALSE, message=FALSE}
+# automatically loads fims Rcpp module
+library(FIMS)
+library(TMB)
+
+# clear memory
+clear()
+
+```
+## Setting up Data
+Data and variable values are taken from the [Li et. al.](https://spo.nmfs.noaa.gov/content/fishery-bulletin/comparison-4-primary-age-structured-stock-assessment-models-used-united) Model Comparison project ([github site](https://github.com/Bai-Li-NOAA/Age_Structured_Stock_Assessment_Model_Comparison)). See [R/data_mile1.R](https://github.com/NOAA-FIMS/FIMS/blob/main/R/data_mile1.R) and [tests/testthat/test-fims-estimation.R](https://github.com/NOAA-FIMS/FIMS/blob/integration-test-estimation-R-clear-FIMS/tests/testthat/test-fims-estimation.R) for details on how data and variable values are read into FIMS from the Model Comparison project.
+
+First let's set up the dimensions of the model based on the Model Comparison project:
+```{r fims-dims}
+include_age_comps <- TRUE
+include_length_comps <- FALSE
+nyears <- 30 # the number of years which we have data for.
+nseasons <- 1 # the number of seasons in each year. FIMS currently defaults to 1
+ages <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12) # age vector.
+nages <- 12 # the number of age groups.
+if(include_length_comps){
+  comp_lengths <- seq(0,1100,50) # length vector.
+  nlengths <- 23 # the number of length bins.
+}
+```
+
+### Preparing Data using FIMSFrame
+
+ We will be reading data into the model using the FIMSFrame S4 R class set up in [R/fimsframe.R](https://github.com/NOAA-FIMS/FIMS/blob/main/R/fimsframe.R)
+
+```{r fimsframe}
+# use FIMS data frame
+data(package = "FIMS")
+if(include_length_comps){
+  fims_frame <- FIMSFrame(data_mile2)
+}else{
+  fims_frame <- FIMSFrame(data_mile1)
+}
+```
+
+The `fims_frame` object contains a `@data` slot that holds a long data frame with catch data for the fishery and index data for the survey:
+
+```{r ageframe}
+str(fims_frame)
+fims_frame@data |>
+  dplyr::filter(type == "landings") |>
+  utils::head()
+fims_frame@data |>
+  dplyr::filter(type == "index") |>
+  utils::head()
+```
+
+
+Using this data frame, we will start setting up the FIMS data objects. This example from the Model Comparison project sets up a single fishery fleet with age composition and catch data and a single survey with age composition data and an index. Data are read into FIMS as long vectors, regardless of their original dimension, hence the motivation behind the long data frames created with the fimsframe S4 classes. 
+
+```{r data}
+# fishery data
+fishery_catch <- FIMS::m_landings(fims_frame)
+
+# survey data
+survey_index <- FIMS::m_index(fims_frame, "survey1")
+
+
+if(include_age_comps){
+# fishery agecomp
+fishery_agecomp <- FIMS::m_agecomp(fims_frame, "fleet1")
+# survey agecomp
+survey_agecomp <- FIMS::m_agecomp(fims_frame, "survey1")
+}
+
+if(include_length_comps){
+fishery_lengthcomp <- FIMS::m_lengthcomp(fims_frame, "fleet1")
+fishery_length_at_age <- FIMS::m_length_at_age(fims_frame, "fleet1")
+
+survey_lengthcomp <- FIMS::m_lengthcomp(fims_frame, "survey1")
+survey_length_at_age <- FIMS::m_length_at_age(fims_frame, "survey1")
+}
+```
+
+## Creating Modules in FIMS
+Now that we've prepared the data, let's pass it into FIMS. Each module in the FIMS-R interface is made of S4 objects. These S4 objects serve as a interface between R and the underlining C++ code that defines FIMS. Modules are instantiated using the `methods::new()` function. We can use `methods::show()` to view all the fields (i.e. variables) and methods (i.e. functions) available in a given module.
+
+### The Fleet Module
+
+#### Fleet Data
+
+Each fleet is required to have data in order to evaluate the objective function. Currently FIMS only has a fleet module that is used to set up both fleets and surveys. FIMS contains an Index module and AgeComp module to pass data objects into the fleet module. Each of these data modules require a dimension be added to indicate the dimensions of the raw data (e.g. nyears x nages matrix). Any years with missing data should be specified with a value set to -999. Given this information, FIMS is able to correctly apply dimension folding for model output.
+
+Using the `methods::show()` function, we can see that the Index module has a vector field named *index_data* and the AgeComp module has a vector field names *age_comp_data*. 
+
+```{r fleet-show}
+show(Index)
+if(include_age_comps){show(AgeComp)}
+```
+
+We'll create both index and age composition modules for the fleet using the `methods::new()` function and pass in the data defined above from the Model Comparison project.
+```{r fleet-set-data}
+# fleet index data
+fishing_fleet_index <- methods::new(Index, nyears)
+fishing_fleet_index$index_data <- fishery_catch # unit: mt
+
+
+if(include_age_comps){
+# fleet age composition data
+fishing_fleet_age_comp <- methods::new(AgeComp, nyears, nages)
+# Effective sampling size is 200
+fishing_fleet_age_comp$age_comp_data <- fishery_agecomp * 200 # unit: number at age; proportion at age also works
+}
+
+if(include_length_comps){
+fishing_fleet_length_comp <- methods::new(LengthComp, nyears, nlengths)
+# Effective sampling size is 200
+fishing_fleet_length_comp$length_comp_data <- fishery_lengthcomp * 200 # unit:
+}
+```
+
+#### Fleet Selectivity
+Now that we've passed in data for the fishing fleet, we need to set up its selectivity module. We will set this to be selectivity function using the LogisticSelectivity module. The`methods::show()` function indicates this module has two parameter fields: *inflection_point* and *slope*, and an `evaluate()` and `get_id()` function. 
+
+Each variable of [Parameter class](https://github.com/NOAA-FIMS/FIMS/blob/main/inst/include/interface/rcpp/rcpp_objects/rcpp_interface_base.hpp) has three additional fields: *value*, *is_random_effect*, and *estimated*. Currently, FIMS is not set up to run random effects. The default value for this field and the *estimate* field is currently set to `FALSE`. We can use the *value* field to input variables defined in the Model Comparison project.
+ 
+```{r fleet_selectivity}
+methods::show(LogisticSelectivity)
+fishing_fleet_selectivity <- methods::new(LogisticSelectivity)
+fishing_fleet_selectivity$inflection_point[1]$value <- 2.0
+fishing_fleet_selectivity$inflection_point[1]$is_random_effect <- FALSE
+fishing_fleet_selectivity$inflection_point[1]$estimated <- TRUE
+
+# Set to not be time-varying but else statement shows how to make the
+# parameter be time-varying, with a different value for every year
+time_varying <- FALSE
+if (!time_varying) {
+  fishing_fleet_selectivity$slope[1]$value <- 1.0
+  fishing_fleet_selectivity$slope[1]$is_random_effect <- FALSE
+  fishing_fleet_selectivity$slope[1]$estimated <- TRUE
+} else {
+  fishing_fleet_selectivity$slope$resize(nyears)
+  for (i in 1:nyears) {
+    fishing_fleet_selectivity$slope[i]$value <- 1.0
+    fishing_fleet_selectivity$slope[i]$is_random_effect <- FALSE
+    fishing_fleet_selectivity$slope[i]$estimated <- TRUE
+  }
+}
+```
+
+#### Creating the Fleet Object
+Now that we've created everything that a fleet needs, lets create the actual fleet object. First let's run `methods::show(Fleet)` to see all the fields and methods available from R. 
+
+```{r show-Fleet}
+show(Fleet)
+```
+We can see that there are five boolean flags: estimate_F, estimate_q, and is_survey, random_F, and random_q. There are two vectors, log_Fmort and log_obs_error, and a double, log_q. There are two integer fields for the number of ages and years. Additionally, there are five Methods: SetAgeCompLikelihood, SetIndexLikelihood, SetObservedAgeCompData, SetObservedIndexData, and setSelectivity. The last three of these will be used to link up the AgeComp, Index, and Selectivity modules defined above with the fleet module defined below. 
+
+
+```{r fleet}
+# Create fleet module
+fishing_fleet <- methods::new(Fleet)
+# Set nyears and nages
+fishing_fleet$nages <- nages
+fishing_fleet$nyears <- nyears
+# Set values for log_Fmort
+fishing_fleet$log_Fmort <- new(ParameterVector, log(c(
+  0.009459165, 0.02728886, 0.04506364,
+  0.06101782, 0.04860075, 0.08742055,
+  0.0884472, 0.1866079, 0.109009, 0.1327043,
+  0.1506155, 0.161243, 0.1166402, 0.1693461,
+  0.1801919, 0.1612405, 0.3145732, 0.2572476,
+  0.2548873, 0.2514621, 0.3491014, 0.2541077,
+  0.4184781, 0.3457212, 0.3436855, 0.3141712,
+  0.3080268, 0.4317453, 0.3280309, 0.4996754
+)), nyears)
+# Turn on estimation for F
+fishing_fleet$log_Fmort$set_all_estimable(TRUE)
+# Set value for log_q
+fishing_fleet$log_q[1]$value <- log(1.0)
+fishing_fleet$estimate_q <- FALSE
+fishing_fleet$random_q <- FALSE
+# Set Index, AgeComp, and Selectivity using the IDs from the modules defined above
+fishing_fleet$SetObservedIndexData(fishing_fleet_index$get_id())
+
+if(include_age_comps){
+  fishing_fleet$SetObservedAgeCompData(fishing_fleet_age_comp$get_id())
+}
+
+if(include_length_comps){
+  fishing_fleet$nlengths <- nlengths
+  fishing_fleet$SetObservedLengthCompData(fishing_fleet_length_comp$get_id())
+}
+
+fishing_fleet$SetSelectivity(fishing_fleet_selectivity$get_id())
+```
+
+We will use the Lognormal distribution for fitting the fishing fleet CPUE data. We indicate a log link function so that the expected value is the log of the expected CPUE. 
+```{r fleet-index-distribution}
+fishing_fleet_index_distribution <- initialize_data_distribution(
+  module = fishing_fleet,
+  family = lognormal(link = "log"),
+  sd = list(
+    value = rep(sqrt(log(0.01^2 + 1)), nyears),
+    estimated = FALSE
+  ),
+  data_type = "index"
+)
+```
+
+Now we will set up the distribution for the fishery age composition data using the Multinomial distribution
+```{r fleet-age-comp-distribution}
+
+if(include_age_comps){
+  fishing_fleet_agecomp_distribution <- initialize_data_distribution(
+  module = fishing_fleet,
+  family = multinomial(link = "logit"),
+  data_type = "agecomp"
+)
+
+}
+
+if(include_length_comps){
+  fishing_fleet_lengthcomp_distribution <- initialize_data_distribution(
+  module = fishing_fleet,
+  family = multinomial(link = "logit"),
+  data_type = "lengthcomp"
+)
+    #Set length-at-age conversion matrix
+  fishing_fleet$age_length_conversion_matrix <- new(ParameterVector,fishery_length_at_age,nages*nlengths)
+  # Turn off estimation for length-at-age
+  fishing_fleet$age_length_conversion_matrix$set_all_estimable(FALSE)
+  fishing_fleet$age_length_conversion_matrix$set_all_random(FALSE)
+}
+
+```
+### The Survey Module
+We will now repeat the steps from Fleet to set up the Survey. A survey object is essentially the same as a fleet object with a catchability (q) variable.
+
+#### Survey Data
+
+```{r survey-set-data}
+# fleet index data
+survey_fleet_index <- methods::new(Index, nyears)
+survey_fleet_index$index_data <- survey_index # unit: mt; it's possible to use other units as long as the index is assumed to be proportional to biomass
+
+if(include_age_comps){
+# survey age composition data
+survey_fleet_age_comp <- methods::new(AgeComp, nyears, nages)
+# Effective sampling size is 200
+survey_fleet_age_comp$age_comp_data <- survey_agecomp * 200 # unit: number at age; proportion at age also works
+}
+
+if(include_length_comps){
+  survey_fleet_length_comp <- methods::new(LengthComp, nyears, nlengths)
+# Effective sampling size is 200
+survey_fleet_length_comp$length_comp_data <- survey_lengthcomp * 200 # unit:
+}
+
+```
+
+#### Survey Selectivity
+
+```{r survey-selectivity}
+survey_fleet_selectivity <- new(LogisticSelectivity)
+survey_fleet_selectivity$inflection_point[1]$value <- 1.5
+survey_fleet_selectivity$inflection_point[1]$is_random_effect <- FALSE
+survey_fleet_selectivity$inflection_point[1]$estimated <- TRUE
+survey_fleet_selectivity$slope[1]$value <- 2.0
+survey_fleet_selectivity$slope[1]$is_random_effect <- FALSE
+survey_fleet_selectivity$slope[1]$estimated <- TRUE
+```
+
+
+#### Creating the Survey Object
+
+```{r survey}
+survey_fleet <- methods::new(Fleet)
+survey_fleet$is_survey <- TRUE
+survey_fleet$nages <- nages
+survey_fleet$nyears <- nyears
+# survey_fleet$estimate_F <- FALSE
+# survey_fleet$random_F <- FALSE
+survey_fleet$log_q[1]$value <- log(3.315143e-07)
+survey_fleet$estimate_q <- TRUE
+survey_fleet$random_q <- FALSE
+survey_fleet$SetObservedIndexData(survey_fleet_index$get_id())
+survey_fleet$SetSelectivity(survey_fleet_selectivity$get_id())
+
+if(include_age_comps){
+survey_fleet$SetObservedAgeCompData(survey_fleet_age_comp$get_id())
+}
+
+if(include_length_comps){
+survey_fleet$nlengths <- nlengths
+survey_fleet$SetObservedLengthCompData(survey_fleet_length_comp$get_id())
+}
+
+```
+
+```{r survey-distribution}
+survey_fleet_index_distribution <- initialize_data_distribution(
+  module = survey_fleet,
+  family = lognormal(link = "log"),
+  sd = list(
+    value = rep(sqrt(log(0.2^2 + 1)), nyears),
+    estimated = FALSE
+  ),
+  data_type = "index"
+)
+
+
+if(include_age_comps){
+# Age composition data
+survey_fleet_agecomp_distribution <- initialize_data_distribution(
+  module = survey_fleet,
+  family = multinomial(link = "logit"),
+  data_type = "agecomp"
+)
+}
+
+if(include_length_comps){
+survey_fleet_lengthcomp_distribution <- initialize_data_distribution(
+  module = survey_fleet,
+  family = multinomial(link = "logit"),
+  data_type = "lengthcomp"
+)
+#Set length-at-age conversion matrix
+survey_fleet$age_length_conversion_matrix <- new(ParameterVector,survey_length_at_age,nages*nlengths)
+# Turn off estimation for length-at-age
+survey_fleet$age_length_conversion_matrix$set_all_estimable(FALSE)
+survey_fleet$age_length_conversion_matrix$set_all_random(FALSE)
+}
+```
+
+### Creating a Population
+The final step is to set up the population module. Before doing so, we first need to set up each component of the population (e.g. recruitment, growth, etc.). 
+
+#### Recruitment
+We'll use a Beverton Holt recruitment module. We first instantiate a module using the `methods::new()` function. We can use `methods::show()` to view all the fields (i.e. variables) and methods (i.e. functions) available in `BevertonHoltRecruitment` module.
+
+```{r recruitment}
+# Recruitment
+recruitment <- methods::new(BevertonHoltRecruitment)
+methods::show(BevertonHoltRecruitment)
+```
+Set up the following parameters: *log_rzero* and *logit_steep*.
+
+```{r set-up-recruitment}
+recruitment$log_rzero[1]$value <- log(1e+06) # unit: log(number)
+recruitment$log_rzero[1]$is_random_effect <- FALSE
+recruitment$log_rzero[1]$estimated <- TRUE
+recruitment$logit_steep[1]$value <- -log(1.0 - 0.75) + log(0.75 - 0.2)
+recruitment$logit_steep[1]$is_random_effect <- FALSE
+recruitment$logit_steep[1]$estimated <- FALSE
+recruitment$log_devs <- new(ParameterVector, c(
+  0.08904850, 0.43787763, -0.13299042, -0.43251973,
+  0.64861200, 0.50640852, -0.06958319, 0.30246260,
+  -0.08257384, 0.20740372, 0.15289604, -0.21709207,
+  -0.13320626, 0.11225374, -0.10650836, 0.26877132,
+  0.24094126, -0.54480751, -0.23680557, -0.58483386,
+  0.30122785, 0.21930545, -0.22281699, -0.51358369,
+  0.15740234, -0.53988240, -0.19556523, 0.20094360,
+  0.37248740, -0.07163145
+), nyears)
+```
+
+In order to estimate *log_devs*, a new distribution needs to be created. This will be left commented out as we will proceed with not estimating log_devs.
+
+```{r recruitment-distribution}
+recruitment_distribution <- initialize_process_distribution(
+  module = recruitment,
+  par = "log_devs",
+  family = gaussian(),
+  sd = list(value = 0.4, estimated = FALSE),
+  is_random_effect = FALSE
+)
+```
+
+#### Growth
+Now, we'll define the growth module for our population using an empirical weight at age model.
+```{r growth}
+# Growth
+ewaa_growth <- methods::new(EWAAgrowth)
+ewaa_growth$ages <- ages
+ewaa_growth$weights <- c(
+  0.0005306555, 0.0011963283, 0.0020582654,
+  0.0030349873, 0.0040552124, 0.0050646975,
+  0.0060262262, 0.0069169206, 0.0077248909,
+  0.0084461128, 0.0090818532, 0.0096366950
+) # unit: mt
+```   
+#### Maturity
+Each population will also need a maturity model. Here we define a logistic maturity model. 
+```{r maturity}
+# Maturity
+maturity <- new(LogisticMaturity)
+maturity$inflection_point[1]$value <- 2.25
+maturity$inflection_point[1]$is_random_effect <- FALSE
+maturity$inflection_point[1]$estimated <- FALSE
+maturity$slope[1]$value <- 3
+maturity$slope[1]$is_random_effect <- FALSE
+maturity$slope[1]$estimated <- FALSE
+```
+
+Now that our life history sub-models are defined, lets define the actual population.
+
+```{r population}
+# Population
+population <- new(Population)
+population$log_M <- new(ParameterVector, rep(log(0.2), nyears * nages), nyears * nages)
+population$log_M$set_all_estimable(FALSE)
+population$log_init_naa <- new(ParameterVector, log(c(
+  993947.5, 811707.8, 661434.4,
+  537804.8, 436664.0, 354303.4,
+  287397.0, 233100.2, 189054.0,
+  153328.4, 124353.2, 533681.3
+)), nages) # unit: in number
+population$log_init_naa$set_all_estimable(TRUE)
+population$nages <- nages
+population$ages <- ages
+population$nfleets <- 2 # 1 fleet and 1 survey
+population$nseasons <- nseasons
+population$nyears <- nyears
+```
+
+Now we need to link up the recruitment, growth, and maturity modules we set above with this new population module. We do this by calling `get_id()` from each respective module and passing that unique ID into each respective `Set` function from population. 
+
+```{r set-pop-modules}
+population$SetMaturity(maturity$get_id())
+population$SetGrowth(ewaa_growth$get_id())
+population$SetRecruitment(recruitment$get_id())
+```
+
+
+## Putting It All Together 
+
+### Fitting the Model
+```{r model}
+sucess <- CreateTMBModel()
+parameters <- list(p = get_fixed())
+input <- list(data = fims_frame, parameters = parameters, version = "FIMS demo")
+# Run the model without optimization to help ensure a viable model
+test_fit <- fit_fims(input, optimize = FALSE)
+# Run the  model with optimization
+fit <- fit_fims(input)
+
+```
+
+## Plotting Results
+```{r plots}
+library(ggplot2)
+index_results <- data.frame(
+  observed = survey_fleet_index$index_data,
+  expected = fit@report[["exp_index"]][[2]]
+)
+print(index_results)
+
+ggplot(index_results, aes(x = 1:nyears, y = observed)) +
+  geom_point() +
+  xlab("Year") +
+  ylab("Index (mt)") +
+  geom_line(aes(x = 1:nyears, y = expected), color = "blue") +
+  theme_bw()
+
+catch_results <- data.frame(
+  observed = fishing_fleet_index$index_data,
+  expected = fit@report[["exp_index"]][[1]]
+)
+print(catch_results)
+
+ggplot(catch_results, aes(x = 1:nyears, y = observed)) +
+  geom_point() +
+  xlab("Year") +
+  ylab("Index (mt)") +
+  geom_line(aes(x = 1:nyears, y = expected), color = "blue") +
+  theme_bw()
+```
+
+```{r output}
+
+finalize(fit@obj, fit@opt)
+json_output <- get_output()
+writeLines(json_output, "output.json")
+recruitment_log <- get_log_module("information")
+cat(recruitment_log)
+
+```