8data.tex

\hypertarget{DataFile}{}
\section[Data File]{\protect\hyperlink{DataFile}{Data File}}

\hypertarget{OverviewDataFile}{}
\subsection[Overview of Data File]{\protect\hyperlink{OverviewDataFile}{Overview of Data File}}
	\begin{enumerate}
		\item Dimensions (years, ages, number of fleets, number of surveys, etc.)
		\item Fleet and survey names, timing, etc.
		\item Catch data (biomass or numbers)
		\item Discard totals or rate data
		\item Mean body weight or mean body length data
		\item Length composition set-up
		\item Length composition data
		\item Age composition set-up
		\item Age imprecision definitions
		\item Age composition data
		\item Mean length-at-age or mean bodyweight-at-age data
		\item Generalized size composition (e.g., weight frequency) data
		\item Environmental data
		\item Tag-recapture data
		\item Stock composition (e.g., morphs identified by otolith microchemistry) data
		\item Selectivity observations (new placeholder, not yet implemented)
	\end{enumerate}
	
\hypertarget{UnitsOfMeasure}{}
\subsection[Units Of Measure]{\protect\hyperlink{UnitsOfMeasure}{Units of Measure}}
The normal units of measure are as follows:
\begin{itemize}
	\item Catch biomass - metric tons
	\item Body weight - kg
	\item Body length - usually in cm, weight-at-length parameters must correspond to the units of body length and body weight	
	\item Survey abundance - any units if catchability (Q) is freely scaled; metric tons or thousands of fish if Q has a quantitative interpretation	
	\item Output biomass - metric tons
	\item Numbers - thousands of fish, because catch is in metric tons and body weight is in kg	
	\item Spawning biomass (\gls{ssb})- metric tons of mature females if eggs/kg = 1 for all weights; otherwise has units that are based on the user-specified fecundity	
\end{itemize}

\hypertarget{RecrTiming}{}
\subsection[Time Units]{\protect\hyperlink{RecrTiming}{Time Units}}
	\begin{itemize}
		\item Spawning: 
		\begin{itemize}
			\item Happens once per year at a specified date (in real months, 1.0 - 12.99). To create multiple spawning events per year, change the definition of a year (e.g., call a 6-month period a ``year'' when there are two spawning events about 6 months apart). However, revising the definition of year affects assignment of fish age, so it should not be used if age data are included.
		\end{itemize}
		
		\item Recruitment: 
		\begin{itemize}
			\item Occurs at specified recruitment events that occur at user-specified dates (in real months, 1.0 - 12.99).
			\item There can be one to many recruitment events across a year; each producing a platoon as a portion of the total recruitment.
			\item A settlement platoon enters the model at age 0 if settlement is between the time of spawning and the end of the year; it enters at age 1 if settlement is after the first of the year; these ages at settlement can be overridden in the settlement setup.
		\end{itemize}
		
		\item Timing
		\begin{itemize}
			\item All fish advance to the next older integer age on January 1, no matter when they were born during the year. Consult with your ageing lab to assure consistent interpretation.
		\end{itemize}		

		\item Parameters:
		\begin{itemize}
			\item Time-varying parameters are allowed to change annually, not seasonally.
			\item Rates like growth and mortality are per year.
		\end{itemize}
	\end{itemize}

\hypertarget{Seasons}{}
\subsubsection[Seasons]{\protect\hyperlink{Seasons}{Seasons}}
Seasonal quantities in the model are calculated and treated in the following methods:
	 \begin{itemize}
	 	\item Seasons are the time step during which constant rates apply.
	 	\item Catch and discard amounts are per season and $F$ is calculated per season.
	 	\item The year can have just one annual season, or be subdivided into seasons of unequal length.
	 	\item Season duration is input in real months (1.0 - 12.99) and is converted into fractions of an annum. Annual rate values are multiplied by the per annum season duration.
	 	\item If the sum of the input season durations do not equal 12.0, then the input durations are divided by the total duration input to rescale season duration to equal 1.0.  
	 	\item Allows for a special situation in which the year could be only 0.25 in duration (e.g., seasons as years) so that spawning and time-varying parameters can occur more frequently. Other durations are also possible (e.g., where a model year is really only 6 months). Note that real month inputs should always have the range 1.0 - 12.99, even in cases where a model year is not a true year. See the \hyperlink{continuous-seasonal-recruitment-sec}{continuous seasonal recruitment section} for more information on how to set up models where the model year duration is different from a true year.
	 \end{itemize}

\hypertarget{SubSeas}{}
\subsubsection[Subseasons and Timing of Events]{\protect\hyperlink{SubSeas}{Subseasons and Timing of Events}}
The treatment of subseasons in SS3 provide more precision in the timing of events compared to earlier model versions. In early versions, v.3.24 and before, there was effectively only two subseasons per season because the \gls{alk} for each observation used the mid-season mean length-at-age and spawning occurred at the beginning of a specified season.  

Time steps can be broken into subseason and the \gls{alk} can be calculated multiple times over the course of a year:
\vspace*{-\baselineskip}
\begin{center}
	\begin{tabular}{|p{2.37cm}|p{2.37cm}|p{2.37cm}|p{2.37cm}|p{2.37cm}|p{2.37cm}|}
		\hline
		\gls{alk} & \gls{alk}* & \gls{alk}* & \gls{alk} & \gls{alk}* & \gls{alk} \Tstrut\Bstrut\\
		\hline
		Subseason 1 & Subseason 2 & Subseason 3 & Subseason 4 & Subseason 5 & Subseason 6 \Tstrut\Bstrut\\		
		\hline
		\multicolumn{6}{l}{\gls{alk}* only re-calculated when there is a survey that subseason} \Tstrut\Bstrut\\			
	\end{tabular}
\end{center}

\begin{itemize}
	\item Even number (min = 2) of subseasons per season (regardless of season duration):
		\begin{itemize}
			\item Two subseasons will mimic v.3.24
			\item Specifying more subseasons will give finer temporal resolution, but will slow the model down, the effect of which is mitigated by only calculating growth as needed.
		\end{itemize}
	\item Survey timing is now cruise-specific and specified in units of months (e.g., April 15 = 4.5; possible inputs are 1.0 to 12.99).
		\begin{itemize}
			\item \texttt{ss\_trans.exe} will convert year, season in v.3.24 format to year, real month in v.3.30 format.
		\end{itemize}
	\item Survey integer season and spawn integer season assigned at run time based on real month and season duration(s).
	\item Growth and the \gls{alk} is calculated at the beginning and mid-season or when there is a defined subseason with data observation.
	\item Fishery body weight uses mid-subseason growth.
	\item Survey body weight and size composition is calculated using the nearest subseason.
	\item Reproductive output now has specified spawn timing (in months fraction) and interpolates growth to that timing.
	\item Survey numbers calculated at cruise survey timing using $e^{-z}$.
	\item Continuous $Z$ for entire season. Same as applied in version v.3.24.
\end{itemize}

\hypertarget{DataTerminology}{}
\subsection[Terminology]{\protect\hyperlink{DataTerminology}{Terminology}}
The term COND appears in the ``Typical Value'' column of this documentation (it does not actually appear in the model files), it indicates that the following section is omitted except under certain conditions, or that the factors included in the following section depend upon certain conditions. In most cases, the description in the definition column is the same as the label output to the ss\_new files.

\hypertarget{ModelDimensions}{}
\subsection[Model Dimensions]{\protect\hypertarget{ModelDimensions}{Model Dimensions}}
\begin{center}
	\begin{longtable}{p{3cm} p{12cm}}
		\hline
		\textbf{Value} & \textbf{Description} \Tstrut\Bstrut\\
		\hline
		\#V3.30.XX.XX & \multirow{1}{1cm}[-0.1cm]{\parbox{12cm}{Model version number. This is written by SS3 in the new files and a good idea to keep updated in the input files.}} \Tstrut\\
		& \Bstrut\\

		\hline
		\#C data using new survey & \multirow{1}{1cm}[-0.1cm]{\parbox{12cm}{Data file comment. Must start with \#C to be retained then written to top of various output files. These comments can occur anywhere in the data file, but must have \#C in columns 1-2.}} \Tstrut\\
		&  \Bstrut\\

		\hline
		1971 & Start year \Tstrut\Bstrut\\

		\hline
		2001 & \raisebox{0.1\ht\strutbox}{\hypertarget{EndYear}{End year}} \Tstrut\Bstrut\\

		\hline
		1 & Number of seasons per year \Tstrut\Bstrut\\

		\hline
		12 & \multirow{1}{1cm}[-0.1cm]{\parbox{12cm}{Vector with the number of months in each season. These do not need to be integers. Note: If the sum of this vector is close to 12.0, then it is rescaled to sum to 1.0 so that season duration is a fraction of a year. If the sum is not equal to 12.0, then the entered values are summed and rescaled to 1. So, with one season per year and 3 months per season, the calculated season duration will be 0.25, which allows a quarterly model to be run as if quarters are years. All rates in SS3 are calculated by season (growth, mortality, etc.) using annual rates and season duration.}} \Tstrut\\
		& \\
		& \\
		& \\
		& \\
		& \\
		& \\
		& \Bstrut\\
		
		\hline
		2 & \multirow{1}{1cm}[-0.1cm]{\parbox{12cm}{The number of subseasons. Entry must be even and the minimum value is 2. This is for the purpose of finer temporal granularity in calculating growth and the associated \gls{alk}.}} \Tstrut\\
		& \\
		& \Bstrut\\
		
		\hline
		\raisebox{0.1\ht\strutbox}{\hypertarget{RecrTiminig}{1.5}} & \multirow{1}{1cm}[-0.1cm]{\parbox{12cm}{Spawning month; spawning biomass is calculated at this time of year (1.5 means January 15) and used as basis for the total recruitment of all settlement events resulting from this spawning.}} \Tstrut\\
		& \\
		& \Bstrut\\

		\hline
		2 \Tstrut & Number of sexes: \\
		 & 1 = current one sex, ignore fraction female input in the control file;\\
		 & 2 = current two sex, use fraction female in the control file; and \\
		 & -1 = one sex and multiply the spawning biomass by the fraction female in the control file. \Bstrut\\

		\hline
		20 \Tstrut & Number of ages. The value here will be the plus-group age. SS3 starts at age 0. \\

		\hline
		1 & Number of areas \Tstrut\Bstrut\\

		\hline
		2 \Tstrut & Total number of fishing and survey fleets (which now can be in any order).\\
		\hline
	\end{longtable}
	\vspace*{-1.7\baselineskip}
\end{center}

\hypertarget{FleetDefinitions}{}
\subsection[Fleet Definitions]{\protect\hyperlink{FleetDefinitions}{Fleet Definitions}}
\hypertarget{GenericFleets}{}
The catch data input has been modified to improve the user flexibility to add/subtract fishing and survey fleets to a model set-up. The fleet setup input is transposed so each fleet is now a row. Previous versions (v.3.24 and earlier) required that fishing fleets be listed first followed by survey only fleets. In SS3 all fleets have the same status within the model structure and each has a specified fleet type (except for models that use tag recapture data, this will be corrected in future versions). Available types are; catch fleet, bycatch only fleet, or survey.   

\begin{center}
	\begin{tabular}{p{2cm} p{2cm} p{2cm} p{2cm} p{2cm} p{4cm}}
		\multicolumn{6}{l}{Inputs that define the fishing and survey fleets:} \\
		\hline
		2 & \multicolumn{5}{l}{Number of fleets which includes survey in any order} \Tstrut\Bstrut\\

		\hline
		Fleet Type & Timing & Area & Catch Units & Catch Mult. & Fleet Name \Tstrut\Bstrut\\

		\hline
		1 & -1 & 1 & 1 & 0 & FISHERY1 \Tstrut\\
		3 &  1 & 1 & 2 & 0 & SURVEY1 \Bstrut\\
		\hline	
	\end{tabular}
\end{center}

\myparagraph{Fleet Type}
Define the fleet type (e.g., fishery fleet, survey fleet):
\begin{itemize}
	\item 1 = fleet with input catches;
 	\item 2 = bycatch fleet (all catch discarded) and invoke extra input for treatment in equilibrium and forecast;
  	\item 3 = survey: assumes no catch removals even if associated catches are specified below. If you would like to remove survey catch set fleet type to option = 1 with specific month timing for removals (defined below in the ``Timing'' section); and 
 	\item 4 = predator (M2) fleet that adds additional mortality without a fleet $F$ (added in v.3.30.18). Ideal for modeling large mortality events such as fish kills or red tide. Requires additional long parameter lines for a second mortality component (M2) in the control file after the natural mortality/growth parameter lines (entered immediately after the fraction female parameter line).
\end{itemize}

\hypertarget{ObsTiming}{}
\myparagraph{Timing}
Timing for data observations:
\begin{itemize}
	\item Fishery options:
		\begin{itemize}
		    \item -1: catch is treated as if it occurred over the whole season. SS3 may change the \gls{cpue} data to occur in the middle of the season if it is specified otherwise (i.e., the \gls{cpue} observations may have a different month in the \texttt{data\_echo.ss\_new} file). A user can override this assumption for specific data observations (e.g., length or age) by specifying a month. This option works well for fisheries where fishing is spread throughout the year.
			\item 1: The fleet timing is not used and only the month value associated with each observation is relevant. This option works well for pulse fisheries that occurs over a small subset of months.
		\end{itemize}
	\item Survey option, 1: The fleet timing is not used and only the month value associated with each observation is relevant (e.g., month specification in the indices of abundance or the month for composition data). This input should always be used for surveys.
\end{itemize}	  
	  
\myparagraph{Area}
An integer value indicating the area in which a fleet operates.

\myparagraph{Catch Units}
Ignored for survey fleets, their units are read later:
\begin{itemize}
	\item 1 = biomass (in metric tons); and
	\item 2 = numbers (thousands of fish).
\end{itemize}   
See \hyperlink{UnitsOfMeasure}{Units of Measure} for more information.

\hypertarget{CatchMult}{}
\myparagraph{Catch Multiplier}
Invokes use of a catch multiplier, which is then entered as a parameter in the mortality-growth parameter section. The estimated value or fixed value of the catch multiplier is used to adjust the observed catch:
\begin{itemize}
  	\item 0 = No catch multiplier used; and
  	\item 1 = Apply a catch multiplier which is defined as an estimable parameter in the control file after the cohort growth deviation in the biology parameter section. The model's estimated retained catch will be multiplied by this factor before being compared to the observed retained catch.
\end{itemize} 
	  
A catch multiplier can be useful when trying to explore historical unrecorded catches or ongoing illegal and unregulated catches. The catch multiplier is a full parameter line in the control file and has the ability to be time-varying.   

\hypertarget{BycatchFleets}{}
\subsection[Bycatch Fleets]{\protect\hyperlink{BycatchFleets}{Bycatch Fleets}}
The option to include bycatch fleets was introduced in v.3.30.10. This is an optional input and if no bycatch is to be included in to the catches this section can be ignored.

A fishing fleet is designated as a bycatch fleet by indicating that its fleet type is 2. A bycatch fleet creates a fishing mortality, same as a fleet of type 1, but a bycatch fleet has all catch discarded, so the input value for retained catch is ignored. However, an input value for retained catch is still needed to indicate that the bycatch fleet was active in that year and season. A catch multiplier cannot be used with bycatch fleets because catch multiplier works on retained catch. SS3 will expect that the retention function for this fleet will be set in the selectivity section to type 3, indicating that all selected catch is discarded dead. It is necessary to specify a selectivity pattern for the bycatch fleet and, due to generally lack of data, to externally derive values for the parameters of this selectivity.

All catch from a bycatch fleet is discarded, so one option to use a discard fleet is to enter annual values for the amount (not proportion) that is discarded in each time step. However, it is uncommon to have such data for all years. An alternative approach that has been used principally in the U.S. Gulf of Mexico is to input a time series of effort data for this fleet in the survey section (e.g.,  effort is a ``survey'' of $F$, for example, the shrimp trawl fleet in the Gulf of Mexico catches and discards small finfish and an effort time series is available for this fleet) and to input in the discard data section an observation for the average discard over time using the super year approach. Another use of bycatch fleet is to use it to estimate effect of an external source of mortality, such as a red tide event. In this usage there may be no data on the magnitude of the discards and SS3 will then rely solely on the contrast in other data to attempt to estimate the magnitude of the red tide kill that occurred. The benefit of doing this as a bycatch fleet, and not a block on natural mortality, is that the selectivity of the effect can be specified. 

Bycatch fleets are not expected to be under the same type of fishery management controls as the retained catch fleets included in the model. This means that when SS3 enters into the reference point equilibrium calculations, it would be incorrect to have SS3 re-scale the magnitude of the $F$ for the bycatch fleet as it searches for the $F$ that produces, for example, F35\%. Related issues apply to the forecast. Consequently, a separate set of controls is provided for bycatch fleets (defined below). Input is required for each fleet designated as fleet type = 2.

\noindent If a fleet above was set as a bycatch fleet (fleet type = 2), the following line is required: 
\begin{center}
	\vspace*{-\baselineskip}
	\begin{tabular}{p{2.25cm} p{2.5cm} p{2.25cm} p{2.5cm} p{2.5cm} p{2cm}}

		\multicolumn{6}{l}{Bycatch fleet input controls:} \\
		\hline
		a: 			  & b:  			     & c:             & d:                & e:        & f: \Tstrut\\
		Fleet Index   & Include in \gls{msy} & $F\text{mult}$ & $F$ or First Year & Last Year & Not used \Bstrut\\					
		\hline
		2 & 2 & 3 & 1982 & 2010 & 0 \Tstrut\Bstrut\\
		\hline		
	\end{tabular}
\end{center}

The above example set-up defines one fleet (fleet number 2) as a bycatch fleet with the dead catch from this fleet to not be included in the search for \gls{msy} (b: Include in \gls{msy} = 2). The level of $F$ from the bycatch fleet in reference point and forecast is set to the mean (c: $F\text{mult}$ = 3) of the estimated $F$ for the range of years from 1982-2010.  


\myparagraph{Fleet Index}
Fleet number for which to include bycatch catch. Fleet number is assigned within the model based on the order of listed fleets in the Fleet Definition section. If there are multiple bycatch fleets, then a line for each fleet is required in the bycatch section.


\myparagraph{Include in \gls{msy}}
The options are:	  
\begin{itemize}
  	\item 1 = dead fish in \gls{msy}, \gls{abc}, and other benchmark and forecast output; and
  	\item 2 = omit from \gls{msy} and \gls{abc} (but still include the mortality).
\end{itemize}

\myparagraph{$F$ Multiplier ($F\text{mult}$)}
The options are:  
\begin{itemize}
  	\item 1 = $F$ multiplier scales with other fleets;
  	\item 2 = bycatch $F$ constant at input value in column d; and
  	\item 3 = bycatch $F$ from range of years input in columns d and e.
\end{itemize}

\myparagraph{$F$ or First Year}  
The specified $F$ or first year for the bycatch fleet.

\myparagraph{$F$ or Last Year}
The specified $F$ or last year for the bycatch fleet.

\myparagraph{Not Used}  
This column is not yet used and is reserved for future features.

\myparagraph{Bycatch Fleet Usage Instructions and Warnings}
When implementing a bycatch fleet, changes to both the data and control file are needed.  

The needed changes to the data file are:

\begin{enumerate}
	\item Fleet type - set to value of 2.
	\item Set bycatch fleet controls per information above.
	\item Catch input - you must enter a positive value for catch in each year/season that you want a bycatch calculated. The entered value of catch will be ignored by SS3, it is just a placeholder to invoke creating an $F$.
	\begin{enumerate}
		\item Initial equilibrium - you may want to enter the bycatch amount as retained catch for the initial equilibrium year because there is no option to enter initial equilibrium discard in the discard section.
	\end{enumerate}	
	\item Discard input - It is recommended to enter the amount of discard to assist SS3 in estimating the $F$ for the bycatch fleet.
	\item Survey input - It is useful, but not absolutely necessary, to enter the effort time series by the bycatch fleet to assist SS3 in estimating the annual changes in $F$ for the bycatch fleet.
\end{enumerate}

The needed changes to the control file are:

\begin{enumerate}
	\item The $F$ method must be set to 2 in order for SS3 to estimate $F$ with having information on retained catch.
	\item Selectivity - 
	\begin{enumerate}
		\item A selectivity pattern must be specified and fixed (or estimated if composition data is provided).
		\item The discard column of selectivity input must be set to a value of 3 to cause all catch to be discarded.
	\end{enumerate}	
\end{enumerate}

In v.3.30.14 it was identified that there can be an interaction between the use of bycatch fleets and the search for the $F_{0.1}$ reference point which may results in the search failing.  Changes to the search feature were implemented to make the search more robust, however, issue may still be encountered. In these instances it is recommended to not select the $F_{0.1}$ reference point calculation in the forecast file.

\hypertarget{PredatorFleets}{}
\subsection[Predator Fleets]{\protect\hyperlink{PredatorFleets}{Predator Fleets}}

Introduced in v.3.30.18, a predator fleet provides the capability to define an entity as a predator that adds additional mortality ($M2$, i.e., the predation mortality) to the base natural mortality. This new capability means that previous use of bycatch fleets to mimic predators (or fish kills, e.g., due to red tide) will no longer be necessary. The problem with using a bycatch fleet as a predator was that it still created an $F$ that was included in the reporting of total $F$ even if the bycatch was not included in the \gls{msy} search.

For each fleet that is designated as a predator, a new parameter line is created in the \gls{mg} parameter section in the control file. This parameter will have the label M2\_pred1, where the ``1'' is the index for the predator (not the index of the fleet being used as a predator). More than one predator can be included. If the model has > 1 season, it is normal to expect $M2$ to vary seasonally. Therefore, only if the number of seasons is greater than 1, follow each $M2$ parameter with number of season parameters to provide the seasonal multipliers. These are simple multipliers times $M2$, so at least one of these needs to have a non-estimated value. The set of multipliers can be used to set $M2$ to only operate in one season if desired. If there is more than one predator fleet, each will have its own seasonal multipliers. If there is only 1 season in the model, then no multiplier lines are included.

Three types of data relevant to $M2$ can be input:

\begin{itemize}
	\item Total kill (as discard in the data file): $M2$ is a component of $Z$, so $M2/Z$ can be used to calculate the amount of the total kill that is attributable to $M2$. This is completely analogous to calculating catch for the fishing fleets. The total kill (e.g., consumption) is output to the discard array. If data on the total kill by the $M2$ predator is available, it can be input as observed ``discard'' for this fleet and thus included in the total log likelihood to estimate the magnitude of the $M2$ parameter.
	
	\item \hyperlink{PredEffort}{Predator effort} (as a survey index in the data file): $M2$ is a rate analogous to $F$, so the survey of $F$ approach (survey units = 2) can be used to input predator abundance as an indicator of the ``effort'' that produced the $M2$. Like all surveys, this survey of $M2$ will also need a Q specification. Note that in the future we can explore improved options for this Q.
	
	\item Predated age-length composition (as length or age composition data in the data file): $M2$ ``eats'' the modeled fish, so gut contents or other sources may have size and/or age composition data which may be input to estimate selectivity of the $M2$ source. 
\end{itemize}

With the input of data on the time series of total kill or predator effort, it should be possible to estimate annual deviations around the base $M2$ for years with data. If the $M2$ time series is instead driven by environmental data, then also including data on kill or effort can provide a means to view consistency between the environmental time series and the additional data sets. Output of $M2$ is found in a \texttt{Report.sso} section labeled predator ($M2$). In the example below, the $M2$ seasonal multiplier was defined to have random deviations by year. This allowed multipliers plus $M2$ itself to closely match the input consumption amounts (288 mt of consumption per season, the fit can be examined by looking at the discard output report).

\hyperlink{Catch}{}
\subsection[Catch]{\protect\hyperlink{Catch}{Catch}}
\hypertarget{CatchFormat}{}
After reading the fleet-specific indicators, a list of catch values by fleet and season are read in by the model. The format for the catches is year and season that the catch is attributed to, fleet, a catch value, and a year-specific catch standard error. Only positive catches need to be entered, so there is no need for records corresponding to all years and fleets. To include an equilibrium catch value for a fleet and season, the year should be noted as -999. For each non-zero equilibrium catch value included, a short parameter line is required in the \hyperlink{InitF}{initial $F$ section} of the control file.

\hypertarget{ListBased}{}
There is no longer a need to specify the number of records to be read; instead the list is terminated by entering a record with the value of -9999 in the year field. The updated list based approach extends throughout the data file (e.g., catch, length- and age-composition data), the control file (e.g., lambdas), and the forecast file (e.g., total catch by fleet, total catch by area, allocation groups, forecasted catch).

In addition, it is possible to collapse the number of seasons. So, if a season value is greater than the number of seasons for a particular model, that catch is added to the catch for the final season. This is one way to easily collapse a seasonal model into an annual model. The alternative option is to the use of season = 0. This will cause SS3 to distribute the input value of catch equally among the number of seasons. SS3 assumes that catch occurs continuously over seasons and hence is not specified as month in the catch data section. However, all other data types will need to be specified by month.

The format for a 2 season model with 2 fisheries looks like the table below. Example is sorted by fleet, but the sort order does not matter. In \texttt{data.ss\_new}, the sort order is fleet, year, season.

\begin{center}
	\begin{tabular}{p{3cm} p{3cm} p{2cm} p{3cm} p{3cm}}
		\multicolumn{5}{l}{Catches by year, season for every fleet:} \\
		\hline
		Year & Season & Fleet & Catch & Catch \gls{se} \Tstrut\Bstrut\\
		\hline
		-999 & 1 & 1 & 56  & 0.05 \Tstrut\\
		-999 & 2 & 1 & 62  & 0.05 \\
		1975 & 1 & 1 & 876 & 0.05 \\
		1975 & 2 & 1 & 343 & 0.05 \\
		 ... & ... & ... & ... & ... \\
		 ... & ... & ... & ... & ... \\
		-999 & 1 & 2 & 55  & 0.05 \\
		-999 & 2 & 2 & 22  & 0.05 \\
		1975 & 1 & 2 & 555 & 0.05 \\
		1975 & 2 & 2 & 873 & 0.05 \\
		 ... & ... & ... & ... & ... \\
		 ... & ... & ... & ... & ... \\
		-9999 & 0 & 0 & 0 & 0 \Bstrut\\
		\hline
	\end{tabular}
\end{center}

\begin{itemize}
	\item Catch can be in terms of biomass or numbers for each fleet, but cannot be mixed within a fleet.
	\item Catch is retained catch (aka landings). If there is discard also, then it is handled in the discard section below. This is the recommended setup which results in a model estimated retention curve based upon the discard data (specifically discard composition data). However, there may be instances where the data do not support estimation of retention curves. In these instances catches can be specified as all dead (retained + discard estimates).
	\item If there are challenges to estimating discards within the model, catches can be input as total dead without the use of discard data and retention curves.
	\item If there is reason to believe that the retained catch values underestimate the true catch, then it is possible in the retention parameter set up to create the ability for the model to estimate the degree of unrecorded catch. However, this is better handled with the new catch multiplier option.
\end{itemize}

\hypertarget{SurveysIndices}{}
\subsection[Surveys and Indices]{\protect\hyperlink{SurveysIndices}{Surveys and Indices}}
Indices are data that are compared to aggregate quantities in the model. Typically, the index is a measure of selected fish abundance, but this data section also allows for the index to be related to a fishing fleet's $F$, or to another quantity estimated by the model. The first section of the ``Indices'' setup contains the fleet number, units, error distribution, and whether additional output (\gls{sd} Report) will be written to the Report file for each fleet that has index data.

\begin{center}
	\begin{tabular}{p{3cm} p{3cm} p{4cm} p{4cm}}
		\multicolumn{4}{l}{\gls{cpue} and Survey Abundance Observations:} \\
		\hline
		Fleet/ &       & Error        & \Tstrut\\
		Survey & Units & Distribution & \gls{sd} Report \Bstrut\\
		\hline
		1 & 1 & 0 & 0 \Tstrut\\
		2 & 1 & 0 & 0 \\
		... & ... & ... & ... \Bstrut\\
		\hline
	\end{tabular}		
\end{center}


\hypertarget{IndexUnits}{}
\myparagraph{Units}
The options for units for input data are:	
\begin{itemize}
	\item 0  = numbers;
	\item 1  = biomass; 
	\item 2  = $F$; and
		\begin{itemize}
			\item Note the $F$ option can only be used for a fishing fleet and not for a survey, even if the survey selectivity is mirrored to a fishing fleet. The values of these effort data are interpreted as proportional to the level of the fishery $F$ values. No adjustment is made for differentiating between continuous $F$ values versus exploitation rate values coming from Pope's approximation. A normal error structure is recommended so that the input effort data are compared directly to the model's calculated $F$, rather than to $ln(F)$. The resultant proportionality constant has units of 1/Q where Q is the catchability coefficient. For more information see the section on \hypertarget{PredEffort}{Predator effort}.	
		\end{itemize}
	\item \hypertarget{SpecialSurvey}{} $>=$ 30 = Special survey types. These options bypass the calculation of survey selectivity so the no selectivity parameter are required and age/length selectivity pattern should be set as 0. A catchability parameter line in the control file will be required for each special survey. Special survey types 31, 32, and 36 relate to recruitment deviations. Before v.3.30.22, the expected value for observations before recdev\_start or after recdev\_end were null. With v.3.30.22, expected values are now based on recruitment deviations for all years and suggestions are included in \texttt{warnings.sso} if observations occur outside the range of active recruitment deviations. The expected values for these types are:
		\begin{itemize}
			\item 30 = spawning biomass/output (e.g., for an egg and larvae survey);
			\item 31 = exp(recruitment deviation), useful for environmental index affecting recruitment;
			\item 32 = spawning biomass * exp(recruitment deviation), for a pre-recruit survey occurring before density-dependence;
			\item 33 = recruitment, age-0 recruits;
			\item 34 = depletion (spawning biomass/virgin spawning biomass);
			\begin{itemize}
				\item Special survey option 34 automatically adjusts phases of parameters. To use the depletion survey approach, the user will need to make the following revisions to the SS3 data file: 1) add a new survey fleet, 2) define the survey type as option 34, 3) add two depletion survey data points, and initial unfished set equal to 1 for an unfished modeled year and one for a later year with the depletion estimates, 4) set the input \gls{cv} value for each survey data point to a low value (e.g., 0.0001) to force the model to fit these data, and in the control file 5) add the survey to the control file in the Q set-up and selectivity sections with float set to 0 with parameter value set to 0. 
				\item There are options for additional control over this in the control file catchability setup section under the \hyperlink{link_info}{link information} bullet where:
				\begin{itemize}
				    \item 0 = add 1 to phases of all parameters. Only $R_{0}$ active in new phase 1. Mimics the default option of previous model versions;
				    \item 1 = only $R_{0}$ active in phase 1. Then finish with no other parameters becoming active; useful for data-limited draws of other fixed parameters. Essentially, this option allows SS3 to mimic \gls{dbsra}; and
				    \item 2 = no phase adjustments, can be used when profiling on fixed $R_{0}$.
				\end{itemize}
				\item Warning: the depletion survey approach has not been tested on multiple area models. This approach may present challenges depending upon the dynamics within each area.
			\end{itemize}
			\item 35 = survey of a deviation vector ($e(survey(y)) = f(parm\_dev(k,y))$), can be used for an environmental time series that serves as an index for a parameter deviation vector. The selected deviation vector is specified in Q section of the control file. The index of the deviation vector to which the index is related is specified in the 2nd column of the Q setup table (see \hyperlink{Qsetup}{Catchability});
			\item 36 = recruitment deviation
		\end{itemize}
\end{itemize}

\myparagraph{Error Distribution}
The options for error distribution form are:
\begin{itemize}
	\item -1 = normal error;
	\item  0 = log-normal error; and 
	\item > 0 = Student's t-distribution in natural log space with \gls{df} equal to this value. For \gls{df} > 30, results will be nearly identical to that for log-normal distribution. A \gls{df} value of about 4 gives a fat-tail to the distribution. The \gls{se} values entered in the data file must be the \gls{se} in $ln_{e}$ space.
\end{itemize}

Abundance indices typically assumed to have a log-normal error structure with units of \gls{se} of $ln_{e}$(index). If the variance of the observations is available only as a \gls{cv} (\gls{se} of the observation divided by the mean value of the observation in natural space), then the value of standard error in natural log space can be calculated as $\sqrt{(ln_e(1+(CV)^2))}$.

For the normal error structure, the entered values for \gls{se} are interpreted directly as a \gls{se} in arithmetic space and not as a \gls{cv}. Thus switching from a log-normal to a normal error structure forces the user to provide different values for the \gls{se} input in the data file.

If the data exist as a set of normalized Z-scores, you can assert a log-normal error structure after entering the data as $exp(Z-score)$ because it will be logged by SS3. Preferably, the Z-scores would be entered directly and the normal error structure would be used.

\myparagraph{Enable gls{sd} Report}
Indices with \gls{sd} Report enabled will have the expected values for their historical values appear in the \texttt{ss.std} and \texttt{ss.cor} files. The default value is for this option is 0.

\begin{itemize}
	\item 0 = \gls{sd} Report not enabled for this index; and
	\item 1 = \gls{sd} Report enabled for this index.
\end{itemize}


\myparagraph{Data Format}
\begin{center}
	\begin{tabular}{p{3cm} p{2cm} p{3cm} p{3cm} p{2.5cm}}
		\hline
		Year & Month & Fleet/Survey & Observation & \gls{se} \Tstrut\Bstrut\\
		\hline
		1991 & 7   & 3   & 80000 & 0.056 \Tstrut\\
		1995 & 7.2 & 3   & 65000 & 0.056 \\
		...  & ... & ... & ...   & ... \\
		2000 & 7.1 & 3   & 42000 & 0.056 \\
		-9999 & 0  & 0   & 0     & 0 \Bstrut\\ 
		\hline
	\end{tabular}
\end{center}

\begin{itemize}
	\item For fishing fleets, \gls{cpue} is defined in terms of retained catch (biomass or numbers).
	\item For fishery independent surveys, retention/discard is not defined so \gls{cpue} is implicitly in terms of total \gls{cpue}.
	\item If a survey has its selectivity mirrored to that of a fishery, only the selectivity is mirrored so the expected \gls{cpue} for this mirrored survey does not use the retention curve (if any) for the fishing fleet.
	\item If the fishery or survey has time-varying selectivity, then this changing selectivity will be taken into account when calculating expected values for the \gls{cpue} or survey index.
	\item Year values that are before start year or after end year are excluded from model, so the easiest way to include provisional data in a data file is to put a negative sign on its year value.
	\item Duplicate survey observations for the same year are not allowed.
	\item Observations that are to be included in the model but not included in the negative log likelihood need to have a negative sign on their fleet ID. Previously the code for not using observations was to enter the observation itself as a negative value. However, that old approach prevented use of a Z-score environmental index as a ``survey''. This approach is best for single or select years from an index rather than an approach to remove a whole index. Removing an index from the model should be done through the use of lambdas at the bottom of the control file which will eliminate the index from model fitting. 
	\item Observations can be entered in any order, except if the super-year feature is used.
	\item Super-periods are turned on and then turned back off again by putting a negative sign on the season. Previously, super-periods were started and stopped by entering -9999 and the -9998 in the \gls{se} field. See the \hyperlink{SuperPeriod}{Data Super-Period} section of this manual for more information.
	\item If the statistical analysis used to create the \gls{cpue} index of a fishery has been conducted in such a way that its inherent size/age selectivity differs from the size/age selectivity estimated from the fishery's size and age composition, then you may want to enter the \gls{cpue} as if it was a separate survey and with a selectivity that differs from the fishery's estimated selectivity. The need for this split arises because the fishery size and age composition should be derived through a catch-weighted approach (to appropriately represent the removals by the fishery) and the \gls{cpue} should be derived through an area-weighted approach to better serve as a survey of stock abundance.
\end{itemize}

\hypertarget{Discard}{}
\subsection[Discard]{\protect\hyperlink{Discard}{Discard}}
If discard is not a feature of the model specification, then just a single input is needed:

\begin{center}
	\begin{tabular}{p{2cm} p{13cm}}
		\hline
		0 & Number of fleets with discard observations \Tstrut\Bstrut\\
		\hline
	\end{tabular}
\end{center}
	
	
If discard is being used, the input syntax is:
\begin{center}
	\begin{tabular}{p{2cm} p{3cm} p{3cm} p{3cm} p{3cm}}
		\hline
		1 & \multicolumn{4}{l}{Number of fleets with discard observations} \Tstrut\Bstrut\\
		\hline
		Fleet & Units & \multicolumn{3}{l}{Error Distribution} \Tstrut\Bstrut\\
		\hline
		1 & 2 & \multicolumn{3}{l}{-1} \Tstrut\Bstrut\\
		\hline
		Year & Month & Fleet & Observation & \gls{se} \Tstrut\Bstrut\\
		\hline
		1980  & 7 & 1 & 0.05 & 0.25 \Tstrut\\
		1991  & 7 & 1 & 0.10 & 0.25 \\
		-9999 & 0 & 0 &    0 & 0 \Bstrut\\
		\hline
	\end{tabular}
\end{center}

Note that although the user must specify a month for the observed discard data, the unit for discard data is in terms of a season rather than a specific month. So, if using a seasonal model, the input month values must correspond to some time during the correct season. The actual value will not matter because the discard amount is calculated for the entirety of the season. However, discard length or age observations will be treated by entered observation month.
	
\myparagraph{Discard Units}
The options are:
\begin{itemize}
	\item 1 = values are amount of discard in either biomass or numbers according to the selection made for retained catch;
	\item 2 = values are fraction (in biomass or numbers) of total catch discarded, biomass/number selection matches that of retained catch; and
	\item 3 = values are in numbers (thousands) of fish discarded, even if retained catch has units of biomass.
\end{itemize}

\myparagraph{Discard Error Distribution}	
The four options for discard error are:
\begin{itemize}
	\item > 0 = \gls{df} for Student's t-distribution used to scale mean body weight deviations. Value of error in data file is interpreted as \gls{cv} of the observation;
	\item 0 = normal distribution, value of error in data file is interpreted as \gls{cv} of the observation;
	\item -1 = normal distribution, value of error in data file is interpreted as \gls{se} of the observation;
	\item -2 = log-normal distribution, value of error in data file is interpreted as \gls{se} of the observation in natural log space; and 
	\item -3 = truncated normal distribution (new with v.3.30, needs further testing), value of error in data file is interpreted as \gls{se} of the observation. This is a good option for low observed discard rates.
\end{itemize}

\myparagraph{Discard Notes}
\begin{itemize}
	\item Year values that are before start year or after end year are excluded from model, so the easiest way to include provisional data in a data file is to put a negative sign on its year value.
	\item Negative value for fleet causes it to be included in the calculation of expected values, but excluded from the log likelihood.
	\item Zero (0.0) is a legitimate discard observation, unless log-normal error structure is used.
	\item Duplicate discard observations from a fleet for the same year are not allowed.
	\item Observations can be entered in any order, except if the super-period feature is used. 
	\item Note that in the control file you will enter information for retention such that 1-retention is the amount discarded. All discard is assumed dead, unless you enter information for discard mortality. Retention and discard mortality can be either size-based or age-based (new with v.3.30).
\end{itemize}
	
\myparagraph{Cautionary Note}
The use of \gls{cv} as the measure of variance can cause a small discard value to appear to be overly precise, even with the minimum \gls{se} of the discard observation set to 0.001. In the control file, there is an option to add an extra amount of variance. This amount is added to the \gls{se}, not to the \gls{cv}, to help correct this problem of underestimated variance.

\hypertarget{MeanBodyWL}{}
\subsection[Mean Body Weight or Length]{\protect\hyperlink{MeanBodyWL}{Mean Body Weight or Length}}
This is the overall mean body weight or length across all selected sizes and ages. This may be useful in situations where individual fish are not measured but mean weight is obtained by counting the number of fish in a specified sample (e.g., a 25 kg basket).    

\begin{center}
	\begin{tabular}{p{1.75cm} p{1.75cm} p{1.75cm} p{1.75cm} p{1.75cm} p{2cm} p{1cm}}
		\multicolumn{7}{l}{Mean Body Weight Data Section:} \\
		\hline
		1  & \multicolumn{6}{l}{Use mean body size data (0/1)} \Tstrut\Bstrut\\
		\hline
		\multicolumn{7}{l}{COND > 0:}\Tstrut\\
		30 & \multicolumn{6}{l}{Degrees of freedom for Student's t-distribution used to evaluate mean body} \\
		  & \multicolumn{6}{l}{weight deviation.} \Bstrut\\
		\hline
		Year & Month & Fleet & Partition & Type & Observation & \gls{cv} \Tstrut\Bstrut\\
		\hline
		1990  & 7 & 1 & 0 & 1 & 4.0 & 0.95 \Tstrut\\
		1990  & 7 & 1 & 0 & 1 & 1.0 & 0.95 \\
		-9999 & 0 & 0 & 0 & 0 & 0   & 0 \Bstrut\\
		\hline
	\end{tabular}
\end{center}

\myparagraph{Partition}
Mean weight data and composition data require specification of what group the sample originated from (e.g., discard, retained, discard + retained).
Note: if retention is not defined in the selectivity section, observations with Partition = 2 will be changed to Partition = 0.
\begin{itemize}
	\item 0 = combined catch in units of weight (whole, e.g., discard + retained);
	\item 1 = discarded catch in units of weight; and
	\item 2 = retained catch in units of weight.
\end{itemize}

\myparagraph{Type}	
Specify the type of data:
\begin{itemize}
	\item 1 = mean length; and
	\item 2 = mean body weight.
\end{itemize}

\myparagraph{Observation - Units}
Units must correspond to the units of body weight, normally in kg, (or mean length in cm). The expected value of mean body weight (or mean length) is calculated in a way that incorporates effect of selectivity and retention.

\myparagraph{Error}
Error is entered as the \gls{cv} of the observed mean body weight (or mean length)

\hypertarget{PopLBins}{}
\subsection[Population Length Bins]{\protect\hyperlink{PopLBins}{Population Length Bins}}
The first part of the length composition section sets up the bin structure for the population. These bins define the granularity of the \gls{alk} and the coarseness of the length selectivity. Fine bins create smoother distributions, but a larger and slower running model.
First read a single value to select one of three population length bin methods, then any conditional input for options 2 and 3:

\begin{center}
	\begin{tabular}{p{2cm} p{5cm} p{8cm}}
		\hline
		1 & \multicolumn{2}{l}{Use data bins to be read later. No additional input here.} \Tstrut\Bstrut\\
		\hline
		2 & \multicolumn{2}{l}{generate from bin width min max, read next:} \Tstrut\\
		\multirow{4}{2cm}[-0.1cm]{} & 2 & Bin width \\
								    & 10 & Lower size of first bin \\
									& 82 & Lower size of largest bin \\
		\multicolumn{3}{l}{The number of bins is then calculated from: (max Lread - min Lread)/(bin width) + 1}\Bstrut\\
		\hline
		3 & \multicolumn{2}{l}{Read 1 value for number of bins, and then read vector of bin boundaries} \Tstrut\\
		\multirow{2}{2cm}[-0.1cm]{} & 37 & Number of population length bins to be read \\ 
									& 10 12 14 ... 82 & Vector containing lower edge of each population size bin \Bstrut\\

		\hline									  
	\end{tabular}
\end{center}

\myparagraph{Notes}
There are some items for users to consider when setting up population length bins:
\begin{itemize}
	\item For option 2, bin width should be a factor of min size and max size. For options 2 and 3, the data length bins must not be wider than the population length bins and the boundaries of the bins do not have to align. The transition matrix between population and data length bins is output to \texttt{echoinput.sso}.
	
	\item The mean size at settlement (virtual recruitment age) is set equal to the min size of the first population length bin.
	
	\item When using more, finer population length bins, the model will create smoother length selectivity curves and smoother length distributions in the \gls{alk}, but run more slowly (more calculations to do).
	
	\item The mean weight-at-length, maturity-at-length and size-selectivity are based on the mid-length of the population bins. So these quantities will be rougher approximations if broad bins are defined.
	
	\item Provide a wide enough range of population size bins so that the mean body weight-at-age will be calculated correctly for the youngest and oldest fish. If the growth curve extends beyond the largest size bin, then these fish will be assigned a length equal to the mid-bin size for the purpose of calculating their body weight.
	
	\item While exploring the performance of models with finer bin structure, a potentially pathological situation has been identified. When the bin structure is coarse (note that some applications have used 10 cm bin widths for the largest fish), it is possible for a selectivity slope parameter or a retention parameter to become so steep that all the action occurs within the range of a single size bin. In this case, the model will see zero gradient of the log likelihood with respect to that parameter and convergence will be hampered.
	
	\item A value read near the end of the \texttt{starter.ss} file defines the degree of tail compression used for the \gls{alk}, called \gls{alk} tolerance. If this is set to 0.0, then no compression is used and all cells of the \gls{alk} are processed, even though they may contain trivial (e.g., 1 e-13) fraction of the fish at a given age. With tail compression of, say 0.0001, the model, at the beginning of each phase, will calculate the min and max length bin to process for each age of each morph \gls{alk} and compress accordingly. Depending on how many extra bins are outside this range, you may see speed increases near 10-20\%. Large values of \gls{alk} tolerance, say 0.1, will create a sharp end to each distribution and likely will impede convergence. It is recommended to start with a value of 0 and if model speed is an issue, explore values greater than 0 and evaluate the trade-off between model estimates and run time. The user is encouraged to explore this feature.
\end{itemize}

\hypertarget{length-comp-structure}{}
\subsection[Length Composition Data Structure]{\protect\hyperlink{length-comp-structure}{Length Composition Data Structure}}
\begin{tabular}{p{2cm} p{13cm}}
		\multicolumn{2}{l}{Enter a code to indicate whether length composition data will be used:} \Tstrut\Bstrut\\
		\hline	
		1 & Use length composition data (0/1/2) \Tstrut\Bstrut\\
		\hline									  
\end{tabular}

If the value 0 is entered, then skip all length related inputs below and skip to the age data setup section. If value 1 is entered, all data weighting options for composition data apply equally to all partitions within a fleet. If the value 2 is entered, then the data weighting options are applied by the partition specified. Note that the partitions must be entered in numerical order within each fleet.

If the value for fleet is negative, then the vector of inputs is copied to all partitions (0 = combined, 1 = discard, and 2 = retained) for that fleet and all higher numbered fleets. This as a good practice so that the user controls the values used for all fleets.

\begin{tabular}{p{2cm} p{2cm} p{2cm} p{2cm} p{2cm} p{1.5cm} p{1.5cm}}
	\multicolumn{7}{l}{Example table of length composition settings when ``Use length composition data'' = 1} \\
	\multicolumn{7}{l}{(where here the first fleet has multinomial error structure with no associated parameter,} \\ 
	\multicolumn{7}{l}{and the second fleet uses Dirichlet-multinomial structure):} \\
	\hline
	Min.      & Constant & Combine  &           & Comp. &        & Min. \Tstrut\\
	Tail      & added    & males \& & Compress. & Error & Param. & Sample \\
	Compress. & to prop. & females  & Bins      & Dist. & Select & Size \Bstrut\\
	\hline
	0 & 0.0001 & 0 & 0 & 0 & 0 & 0.1 \Tstrut\\
	0 & 0.0001 & 0 & 0 & 1 & 1 & 0.1 \Bstrut\\
	\hline
\end{tabular}

\begin{tabular}{p{1cm} p{1.5cm} p{1.75cm} p{1.5cm} p{1.5cm} p{1.75cm} p{1.25cm} p{1.25cm} p{1.5cm}}
	\multicolumn{9}{l}{Example table of length composition settings when ``Use length composition data'' = 2}\\
	\multicolumn{9}{l}{(where here the -1 in the fleet column applies the first parameter to all partitions} \\
	\multicolumn{9}{l}{for fleet 1 while fleet 2 has separate parameters for discards and retained fish):} \\
	\hline
	        &           & Min.      & Constant & Combine  &           & Comp. &        & Min. \Tstrut\\
	        &           & Tail      & added    & males \& & Compress. & Error & Param. & Sample \\
	Fleet   & Partition & Compress. & to prop. & females  & Bins      & Dist. & Select & Size \Bstrut\\
	\hline
	-1 & 0 & 0 & 0.0001 & 0 & 0 & 1 & 1 & 0.1 \Tstrut\\
	2 & 1 & 0 & 0.0001 & 0 & 0 & 1 & 2 & 0.1 \\
	2 & 2 & 0 & 0.0001 & 0 & 0 & 1 & 3 & 0.1 \\
	... &  &  &        &   &   &   &   &  \\
	-9999 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \Bstrut\\
	\hline
\end{tabular}

%\pagebreak

\myparagraph{Minimum Tail Compression}
Compress tails of composition until observed proportion is greater than this value; negative value causes no compression; Advise using no compression if data are very sparse, and especially if the set-up is using age composition within length bins because of the sparseness of these data. A single fish being observed with tail compression on will cause the entire vector to be collapsed to that bin.

\myparagraph{Added Constant to Proportions}
Constant added to observed and expected proportions at length and age to make log likelihood calculations more robust. Tail compression occurs before adding this constant. Proportions are re-normalized to sum to 1.0 after constant is added.

The constant should be greater than 0. Commonly used values range from 0.00001 to 0.01. Larger values will cause differences among bins with smaller values to be less influential, leading to greater relative influence of the bins with the largest proportions of the compositions.

\myparagraph{Combine Males \& Females}
Combine males into females at or below this bin number. This is useful if the sex determination of very small fish is doubtful so allows the small fish to be treated as combined sex. If Combine Males \& Females > 0, then add males into females for bins 1 through this number, zero out the males, set male data to start at the first bin above this bin. Note that Combine Males \& Females > 0 is entered as a bin index, not as the size associated with that bin. Comparable option is available for age composition data.

\myparagraph{Compress Bins}
This option allows for the compression of length or age bins beyond a specific length or age by each data source. As an example, a value of 5 in the compress bins column would condense the final five length bins for the specified data source.

\myparagraph{Composition Error Distribution}	
The options are:
\begin{itemize}
	\item 0 = Multinomial Error;
	\item 1 = Dirichlet-multinomial Error (linear); and
	\begin{itemize}
		\item The Dirichlet-multinomial Error distribution requires the addition of a parameter lines for the natural log of the effective sample size multiplier ($\theta$) at the end of the selectivity parameter section in the control file. See the \hyperlink{Dirichletparameter}{Dirichlet parameter} in the control file for information regarding setup.
		\item The Parameter Select option needs be used to specify which data sources should be weighted together or separate. 
	\end{itemize}
	\item 2 = Dirichlet-multinomial Error (saturation).
	\begin{itemize}
		\item This parameterization of the Dirichlet-multinomial Error has not been tested, so this option should be used with caution. The Dirichlet-multinomial Error data weighting approach will calculate the effective sample size based on equation 12 from \citet{thorson-model-based-2017} where the estimated parameter will now be in terms of $\beta$. The application of this method should follow the same steps detailed above for option 1. 
	\end{itemize}
	% \item 3 = Multivariate Tweedie. (add when MV Tweedie is implemented)
\end{itemize}

%\pagebreak

\myparagraph{Parameter Select}	
Value that indicates the groups of composition data for estimation of the Dirichlet 
% or Multivariate Tweedie (add when MV Tweedie is implemented)
parameter for weighting composition data.

\begin{itemize}
	\item 0 = Default; and
	\item 1-N = Only used for the Dirichlet option. Set to a sequence of numbers from 1 to N where N is the total number of combinations of fleet and age/length. That is, if you have 3 fleets with length data, but only 2 also have age data, you would have values 1 to 3 in the length comp setup and 4 to 5 in the age comp setup. You can also have a data weight that is shared across fleets by repeating values in Parameter Select. Note that there can be no skipped numbers in the sequence from 1 to N, otherwise the model will exit on error when reading in the input files.
\end{itemize}	

\myparagraph{Minimum Sample Size}
The minimum value (floor) for all sample sizes. This value must be at least 0.001. Conditional age-at-length data may have observations with sample sizes less than 1. Version 3.24 had an implicit minimum sample size value of 1.

\myparagraph{Additional information on Dirichlet Parameter Number and Effective Sample Sizes}
If the Dirichlet-multinomial error distribution is selected, indicate here which of a list of Dirichlet-multinomial parameters will be used for this fleet. So each fleet could use a unique Dirichlet-multinomial parameter, or all could share the same, or any combination of unique and shared. The requested number of Dirichlet-multinomial parameters are specified as parameter lines in the control file immediately after the selectivity parameter section. Please note that age-compositions Dirichlet-multinomial parameters are continued after length-compositions, so a model with one fleet and both data types would presumably require two new Dirichlet-multinomial parameters.  	 	
	
The Dirichlet estimates the effective sample size as $N_{eff}=\frac{1}{1+\theta}+\frac{N\theta}{1+\theta}$ where $\theta$ is the estimated parameter and $N$ is the input sample size. Stock Synthesis estimates the natural log of the Dirichlet-multinomial parameter such that $\hat{\theta}_{\text{fishery}} = e^{-0.6072} = 0.54$ where assuming $N=100$ for the fishery would result in an effective sample size equal to 35.7.
	
This formula for effective sample size implies that, as the Stock Synthesis parameter $ln(DM\text{\_theta})$ goes to large values (i.e., 20), then the adjusted sample size will converge to the input sample size. In this case, small changes in the value of the $ln(DM\text{\_theta})$ parameter has no action, and the derivative of the negative log likelihood is zero with respect to the parameter, which means the Hessian will be singular and cannot be inverted. To avoid this non-invertible Hessian when the $ln(DM\text{\_theta})$ parameter becomes large, turn it off while fixing it at the high value. This is equivalent to turning off down-weighting of fleets where evidence suggests that the input sample sizes are reasonable.
	
For additional information about the Dirichlet-multinomial please see \citet{thorson-model-based-2017} and the detailed \hyperlink{DataWeight}{Data Weighting} section.

\hypertarget{CompTiming}{}
\subsection[Length Composition Data]{\protect\hyperlink{CompTiming}{Length Composition Data}}
Composition data can be entered as proportions, numbers, or values of observations by length bin based on data expansions.  

The data bins do not need to cover all observed lengths. The selection of data bin structure should be based on the observed distribution of lengths and the assumed growth curve. If growth asymptotes at larger lengths, having additional length bins across these sizes may not contribute information to the model and may slow model run time. Additionally, the lower length bin selection should be selected such that, depending on the size selection, to allow for information on smaller fish and possible patterns in recruitment. While set separately users should ensure that the length and age bins align. It is recommended to explore multiple configurations of length and age bins to determine the impact of this choice on model estimation.

Specify the length composition data as:
\begin{center}
	\begin{tabular}{p{4cm} p{10cm}}
		\hline
		28 & Number of length bins for data \\
		\hline
		26 28 30 ... 80 & Vector of length bins associated with the length data \\
		\hline
	\end{tabular}
\end{center}
Note: the vector of length bins above will aggregate data from outside
the range of values as follows:
\begin{center}
    \begin{tabular}{lccccccccc}
		\hline
  		             & bin 1 & bin 2 & bin 3 & ... & bin 27 & bin 28 \\ 
		\hline
 		bin vector   & 26 & 28 & 30 & ... & 78 & 80 \\ 
    	bin contains & 0--27.99 & 28--29.99 & 30--30.99 & ... & 78--79.99 & 80+ \\
		\hline
    \end{tabular}
\end{center}

Example of a single length composition observation:
\vspace*{-1cm} % used this because the spacing was off in the pdf
\begin{center}
	\begin{tabular}{p{1.5cm} p{1.5cm} p{1.5cm} p{1.5cm} p{1.5cm} p{1.5cm} p{5cm}}
		\multicolumn{7}{l}{} \\
		\hline
		Year & Month & Fleet & Sex & Partition & Nsamp & data vector \Tstrut\Bstrut\\
		\hline
		1986 & 1 & 1 & 3 & 0 & 20 & <female then male data> \Tstrut\\
		... & ... & ... & ... & ... & ... & ... \\
		-9999 & 0 & 0 & 0 & 0 & 0 & <0 repeated for each element of the data vector above> \Bstrut\\
		\hline	
	\end{tabular}
\end{center}

\myparagraph{Sex}
If model has only one sex defined in the set-up, all observations must have sex set equal to 0 or 1 and the data vector by year will equal the number of the user defined data bins. This also applies to the age data. 

In a 2 sex model, the data vector always has female data followed by male data, even if only one of the two sexes has data that will be used. The below description applies to a 2 sex model:
\begin{itemize}
	\item Sex = 0 means combined male and female (must already be combined and information placed in the female portion of the data vector) (male entries must exist for correct data reading, then will be ignored).
	\item Sex = 1 means female only (male entries must exist for correct data reading, then will be ignored).
	\item Sex = 2 means male only (female entries must exist and will be ignored after being read).
	\item Sex = 3 means data from both sexes will be used and they are scaled so that they together sum to 1.0; i.e., sex ratio is preserved.
\end{itemize}

\myparagraph{Partition}
Partition indicates samples from either combined, discards, or retained catch. 
Note: if retention is not defined in the selectivity section, observations with Partition = 2 will be changed to Partition = 0.
\begin{itemize}
	\item 0 = combined (whole, e.g., discard + retained);
	\item 1 = discard; and
	\item 2 = retained.
\end{itemize}

\myparagraph{Excluding Data}	
\begin{itemize}
	\item If the value of year is negative, then that observation is not transferred into the working array. This feature is the easiest way to include observations in a data file but not to use them in a particular model scenario.
	\item If the value of fleet in the length or age composition observed data line is negative, then the observation is processed and its expected value and log likelihood are calculated, but this log likelihood is not included in the total log likelihood. This feature allows the user to see the fit to a provisional observation without having that observation affect the model.
\end{itemize}

\myparagraph{Note}
When processing data to be input into SS3, all observed fish of sizes smaller than the first bin should be added to the first bin and all observed fish larger than the last bin should be condensed into the last bin.	

The number of length composition data lines no longer needs to be specified in order to read the length (or age) composition data. Starting in v.3.30, the model will continue to read length composition data until a pre-specified exit line is read. The exit line is specified by entering -9999 at the end of the data matrix. The -9999 indicates to the model the end of length composition lines to be read.

Each observation can be stored as one row for ease of data management in a spreadsheet and for sorting of the observations. However, the 6 header values, the female vector and the male vector could each be on a separate line because \gls{admb} reads values consecutively from the input file and will move to the next line as necessary to read additional values.

The composition observations can be in any order and replicate observations by a year for a fleet are allowed (unlike survey and discard data). However, if the super-period approach is used, then each super-periods' observations must be contiguous in the data file.

\hypertarget{AgeCompOption}{}
\subsection[Age Composition Option]{\protect\hyperlink{AgeCompOption}{Age Composition Option}}
The age composition section begins by reading the number of age bins. If the value 0 is entered for the number of age bins, then skips reading the bin structure and all reading of other age composition data inputs.
\begin{center}
	\vspace*{-\baselineskip}
	\begin{tabular}{p{3cm} p{13cm}}
		\hline
		17 \Tstrut & Number of age bins; can be equal to 0 if age data are not used; do not include a vector of age bins if the number of age bins is set equal to 0. \Bstrut\\
		\hline
	\end{tabular}
\end{center}

\hypertarget{AgeCompBins}{}
\subsubsection[Age Composition Bins]{\protect\hyperlink{AgeCompBins}{Age Composition Bins}}
If a positive number of age bins is read, then reads the bin definition next.
\begin{center}
	\vspace*{-\baselineskip}
	\begin{tabular}{p{3cm} p{13cm}}
		\hline
		1 2 3 ... 20 25 & Vector of ages \Tstrut\Bstrut\\
		\hline		
	\end{tabular}
\end{center}
The bins are in terms of observed age (here age) and entered as the lower edge of each bin. Each ageing imprecision definition is used to create a matrix that translates true age structure into age structure. The first and last age' bins work as accumulators. So in the example any age 0 fish that are caught would be assigned to the age = 1 bin.

\hypertarget{AgeError}{}
\subsubsection[Ageing Error]{\protect\hyperlink{AgeError}{Ageing Error}}
Here, the capability to create a distribution of age (e.g., age with possible bias and imprecision) from true age is created. One or many ageing error definitions can be created. For each, the model will expect an input vector of mean age and a vector of standard deviations associated with the mean age. 

\begin{center}
	\begin{longtable}{p{2cm} p{2cm} p{2cm} p{1cm} p{4.5cm} p{2.5cm}}
		\hline
		\multicolumn{1}{l}{2} & \multicolumn{5}{l}{Number of ageing error matrices to generate} \Tstrut\Bstrut\\
		\hline \\
		\multicolumn{6}{l}{Example with no bias and very little uncertainty at age:} \Tstrut\Bstrut\\
		\hline
		Age-0 & Age-1 & Age-2 &...& Max Age & \Tstrut\Bstrut\\
		\hline
		-1 & -1 & -1 &...& -1 & \#Mean Age \Tstrut\\
		0.001 & 0.001 & 0.001 &...& 0.001 & \#SD \Bstrut\\
		\hline \\
		\multicolumn{6}{l}{Example with no bias and some uncertainty at age:} \Tstrut\Bstrut\\
		\hline
		0.5 & 1.5 & 2.5 &...& Max Age + 0.5 & \#Mean Age \Tstrut\\
		0.5 & 0.65 & 0.67 &...& 4.3 & \#SD Age \Bstrut\\
		\hline \\
		\multicolumn{6}{l}{Example with bias and uncertainty at age:} \Tstrut\Bstrut\\
		\hline
		0.5 & 1.4 & 2.3 &...& Max Age + Age Bias & \#Mean Age \Tstrut\\
		0.5 & 0.65 & 0.67 &...& 4.3 & \#SD Age \Bstrut\\
		\hline
	\end{longtable}
\end{center}
\vspace*{-1.2cm}

In principle, one could have year or laboratory specific matrices for ageing error. For each matrix, enter a vector with mean age for each true age; if there is no ageing bias, then set age equal to true age + 0.5.  Alternatively, -1 value for mean age means to set it equal to true age plus 0.5. The addition of + 0.5 is needed so that fish will get assigned to the intended integer age. The length of the input vector is equal to the population maximum age plus one (0-max age), with the first entry being for age 0 fish and the last for fish of population maximum age even if the maximum age bin for the data is lower than the population maximum age. The following line is a vector with the standard deviation of age for each true age with a normal distribution assumption.

The model is able to create one ageing error matrix from parameters, rather than from an input vector. The range of conditions in which this new feature will perform well has not been evaluated, so it should be considered as a preliminary implementation and subject to modification. To invoke this option, for the selected ageing error vector, set the standard deviation of ageing error to a negative value for age 0. This will cause creation of an ageing error matrix from parameters and any age or size-at-age data that specify use of this age error pattern will use this matrix. Then in the control file, add a full parameter line below the cohort growth deviation parameter (or the movement parameter lines if used) in the mortality growth parameter section. These parameters are described in the control file section of this manual.

Code for ageing error calculation can be found in \href{https://github.com/nmfs-ost/ss3-source-code/blob/main/SS_miscfxn.tpl}{\texttt{SS\_miscfxn.tpl}}, search for function ``get\_age\_age'' or ``SS\_Label\_Function 45''.

\hypertarget{AgeCompSpec}{}
\subsubsection[Age Composition Specification]{\protect\hyperlink{AgeCompSpec}{Age Composition Specification}}
If age data are included in the model, the following set-up is required, similar to the length data section. See \hyperlink{length-comp-structure}{Length Composition Data Structure} for details on each of these inputs.

\begin{tabular}{p{2cm} p{2cm} p{2cm} p{1.5cm} p{1.5cm} p{2cm} p{2cm}}
	\multicolumn{7}{l}{Specify bin compression and error structure for age composition data for each fleet:} \\
	\hline
	Min.      & Constant & Combine  &           & Comp. &        & Min. \Tstrut\\
	Tail      & added    & males \& & Compress. & Error & Param. & Sample \\
	Compress. & to prop. & females  & Bins      & Dist. & Select & Size \Bstrut\\
	\hline
	0 & 0.0001 & 1 & 0 & 0 & 0 & 1 \Tstrut\\
	0 & 0.0001 & 1 & 0 & 0 & 0 & 1 \Bstrut\\
	\hline
\end{tabular}

\begin{tabular}{p{1cm} p{14cm}}
	 & \\
	\multicolumn{2}{l}{Specify method by which length bin range for age obs will be interpreted:} \\
	\hline
	1 & Bin method for age data \Tstrut\\
	  & 1 = value refers to population bin index \\
	  & 2 = value refers to data bin index \\
	  & 3 = value is actual length (which must correspond to population length bin \\
	  & boundary) \Bstrut\\
	 \hline
\end{tabular}

\begin{tabular}{p{1cm} p{1cm} p{1cm} p{1cm} p{1.5cm} p{1cm} p{1cm} p{1cm} p{1cm} p{2.5cm}}
	\multicolumn{10}{l}{} \\
	\multicolumn{10}{l}{An example age composition observation:} \\
	\hline
	Year & Month & Fleet & Sex & Partition & Age Err & Lbin lo & Lbin hi & Nsamp & Data Vector \Tstrut\\
	\hline
	1987 & 1 & 1 & 3 & 0 & 2 & -1 & -1 & 79 & <enter data values> \Tstrut\\
	-9999 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \Bstrut\\
	\hline
\end{tabular}

Syntax for Sex, Partition, and data vector are same as for length. The data vector has female values then male values, just as for the length composition data.

% \pagebreak

\myparagraph{Age Error}
Age error (Age Err) identifies which ageing error matrix to use to generate expected value for this observation.

\myparagraph{Lbin Low and Lbin High}	
Lbin lo and Lbin hi are the range of length bins that this age composition observation refers to. Normally these are entered with a value of -1 and -1 to select the full size range. Whether these are entered as population bin number, length data bin number, or actual length is controlled by the value of the length bin range method above.

\begin{itemize}
	\item Entering value of 0 or -1 for Lbin lo converts Lbin lo to 1;
	\item Entering value of 0 or -1 for Lbin hi converts Lbin hi to Maxbin;
	\item It is strongly advised to use the -1 codes to select the full size range. If you use explicit values, then the model could unintentionally exclude information from some size range if the population bin structure is changed.
	\item In reporting to the \texttt{comp\_report.sso}, the reported Lbin\_lo and Lbin\_hi values are always converted to actual length.
\end{itemize}

\myparagraph{Excluding Data}
As with the length composition data, a negative year value causes the observation to not be read into the working matrix, a negative value for fleet causes the observation to be included in expected values calculation, but not in contribution to total log likelihood, a negative value for month causes start-stop of super-period.

\hypertarget{CondAatL}{}
\subsection[Conditional Age-at-Length]{\protect\hyperlink{CondAatL}{Conditional Age-at-Length}}
Use of conditional age-at-length will greatly increase the total number of age composition observations and associated model run time, but there can be several advantages to inputting ages in this fashion. First, it avoids double use of fish for both age and size information because the age information is considered conditional on the length information. Second, it contains more detailed information about the relationship between size and age so provides stronger ability to estimate growth parameters, especially the variance of size-at-age. Lastly, where age data are collected in a length-stratified program, the conditional age-at-length approach can directly match the protocols of the sampling program.

However, simulation research has shown that the use of conditional age-at-length data can result in biased growth estimates in the presence of unaccounted for age-based movement when length-based selectivity is assumed \citep{lee-effects-2017}, when other age-based processes (e.g., mortality) are not accounted for \citep{lee-use-2019}, or based on the age sampling protocol \citep{piner-evaluation-2016}. Understanding how data are collected (e.g., random, length-conditioned samples) and the biology of the stock is important when using conditional age-at-length data for a fleet.     

In a two sex model, it is best to enter these conditional age-at-length data as single sex observations (sex = 1 for females and = 2 for males), rather than as joint sex observations (sex = 3). Inputting joint sex observations comes with a more rigid assumption about sex ratios within each length bin. Using separate vectors for each sex allows 100\% of the expected composition to be fit to 100\% observations within each sex, whereas with the sex = 3 option, you would have a bad fit if the sex ratio were out of balance with the model expectation, even if the observed proportion at age within each sex exactly matched the model expectation for that age. Additionally, inputting the conditional age-at-length data as single sex observations isolates the age composition data from any sex selectivity as well.

Conditional age-at-length data are entered within the age composition data section and can be mixed with marginal age observations for other fleets of other years within a fleet. To treat age data as conditional on length, Lbin\_lo and Lbin\_hi are used to select a subset of the total size range. This is different from setting Lbin\_lo and Lbin\_hi both to -1 to select the entire size range, which treats the data entered on this line within the age composition data section as marginal age composition data.  

\vspace*{-\baselineskip}
\begin{tabular}{p{1cm} p{1cm} p{1cm} p{1cm} p{1.5cm} p{1cm} p{1cm} p{1cm} p{1cm} p{2.5cm}}
	\multicolumn{10}{l}{} \\
	\multicolumn{10}{l}{An example conditional age-at-length composition observations:} \\
	\hline
	Year & Month & Fleet & Sex & Partition & Age Err & Lbin lo & Lbin hi & Nsamp & Data Vector \Tstrut\\
	\hline
	1987 & 1 & 1 & 1 & 0 & 2 & 10 & 10 & 18 & <data values> \Tstrut\\
	1987 & 1 & 1 & 1 & 0 & 2 & 12 & 12 & 24 & <data values> \Tstrut\\
	1987 & 1 & 1 & 1 & 0 & 2 & 14 & 14 & 16 & <data values> \Tstrut\\
	1987 & 1 & 1 & 1 & 0 & 2 & 16 & 16 & 30 & <data values> \Tstrut\\
	-9999 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \Bstrut\\
	\hline
\end{tabular}

In this example observation, the age data is treated as on being conditional on the 2 cm length bins of 10--11.99, 12--13.99, 14--15.99, and 16--17.99 cm. If there are no observations of ages for a specific sex within a length bin for a specific year, that entry may be omitted.

\hypertarget{MeanLorBWatA}{}
\subsection[Mean Length or Body Weight-at-Age]{\protect\hyperlink{MeanLorBWatA}{Mean Length or Body Weight-at-Age}}
The model also accepts input of mean length-at-age or mean body weight-at-age. This is done in terms of observed age, not true age, to take into account the effects of ageing imprecision on expected mean size-at-age. If the value of the Age Error column is positive, then the observation is interpreted as mean length-at-age. If the value of the Age Error column is negative, then the observation is interpreted as mean body weight-at-age and the abs(Age Error) is used as Age Error.

\begin{center}
	\begin{tabular}{p{0.75cm} p{1cm} p{0.75cm} p{1cm} p{0.75cm} p{1cm} p{1cm} p{3.2cm} p{3.2cm}}
		\hline
		1 & \multicolumn{8}{l}{Use mean size-at-age observation (0 = none, 1 = read data matrix)} \Tstrut\\
		\multicolumn{9}{l}{An example observation:} \Bstrut\\
		\hline
		   &       &       &     &       & Age  &        & Data Vector     & Sample Size \Tstrut\\
		Yr & Month & Fleet & Sex & Part. & Err. & Ignore & (Female - Male) & (Female - Male) \Bstrut\\
		\hline
		1989  & 7 & 1 & 3 & 0 & 1 & 999 & <Mean Size values> & <Sample Sizes> \Tstrut\\
		...   &   &   &   &   &   &   &  & \\
		-9999 & 0 & 0 & 0 & 0 & 0 & 0 & 0 0 0 0 0 0 0 & 0 0 0 0 0 0 0 \Bstrut\\
		\hline
	\end{tabular}
\end{center}


\myparagraph{Note}
\begin{itemize}
	\item Negatively valued mean size entries with be ignored in fitting. This feature allows the user to see the fit to a provisional observation without having that observation affect the model.
	\item A number of fish value of 0 will cause mean size value to be ignored in fitting. If the number of fish is zero, a non-zero mean size or body weight-at-age value, such as 0.01 or -999, still needs to be added. This feature allows the user to see the fit to a provisional observation without having that observation affect the model.
	\item Negative value for year causes observation to not be included in the working matrix. This feature is the easiest way to include observations in a data file but not to use them in a particular model scenario.
	\item Each sexes' data vector and N fish vector has length equal to the number of age bins.
	\item The ``Ignore'' column is not used (set aside for future options) but still needs to have default values in that column (any value). 
	\item Where age data are being entered as conditional age-at-length and growth parameters are being estimated, it may be useful to include a mean length-at-age vector with nil emphasis to provide another view on the model's estimates.
	\item An experiment that may be of interest might be to take the body weight-at-age data and enter it to the model as empirical body weight-at-true age in the \texttt{wtatage.ss} file, and to contrast results to entering the same body weight-at-age data here and to attempt to estimate growth parameters, potentially time-varying, that match these body weight data.
	\item If using mean size-at-age data, please see the \hyperlink{SaAlambda}{lambda usage notes} regarding issues for model fitting depending upon other data within the model.  		
\end{itemize}

\hypertarget{env-dat}{}
\subsection[Environmental Data]{\protect\hyperlink{env-dat}{Environmental Data}}
The model accepts input of time series of environmental data. Parameters can be made to be time-varying by making them a function of one of these environmental time series. In v.3.30.16 the option to specify the centering of environmental data by either using the mean of the by mean and the z-score. 

\begin{center}
	\vspace*{-\baselineskip}
	\begin{tabular}{p{1cm} p{2cm} p{2cm} p{1cm}}
		\multicolumn{4}{l}{Parameter values can be a function of an environmental data series:} \\
		\hline
		1 & \multicolumn{3}{l}{Number of environmental variables} \Tstrut\Bstrut\\
		\multicolumn{4}{l}{The environmental data can be centered by subtracting the mean and dividing by \gls{sd}} \\
		\multicolumn{4}{l}{(z-score, -1) or by subtracting the mean of the environmental variable (-2) based on} \\
		\multicolumn{4}{l}{the year column value.} \\
		\hline
		\multicolumn{4}{l}{COND > 0  Example of 2 environmental observations:} \Tstrut\\
		  & Year & Variable & Value \Bstrut\\
		\hline
		  & 1990 & 1 & 0.10 \Tstrut\\
		  & 1991 & 1 & 0.15 \\
		  & -1   & 1 & 1 \\
		  & -2   & 2 & 1 \\
		  & -9999 & 0 & 0 \Bstrut\\
		\hline
	\end{tabular}
\end{center}

The final two lines in the example above indicate in that variable series 1 will be centered by subtracting the mean and dividing by the \gls{sd} (indicated by the -1 value in the year column). The environmental variable series 2 will be centered by subtracting the mean of the time series (indicated by the -2 value in the year column). The input in the ``value'' column for both of the final two lines specifying the centering of the time series is ignored by the model. The control file also will need to be modified to in the long parameter line column ``env-var'' for the selected parameter. This feature was added in v.3.30.16.


\myparagraph{Note}
\begin{itemize}
	\item Any years for which environmental data are not read are assigned a value of 0.0. None of the current link functions contain a link parameter that acts as an offset. Therefore, you should subtract the mean from your data. This lessens the problem with missing observations, but does not eliminate it. A better approach for dealing with missing observations is to use a different approach for the environmental effect on the parameter. Set up the parameter to have random deviations for all years, then enter the zero-centered environmental information as a \hyperlink{SpecialSurvey}{special survey of type 35} and set up the catchability of that survey to be a link to the deviation vector. This is a more complex approach, but it is superior in treatment of missing values and superior in allowing for error in the environmental relationship.
	\item Users can assign environmental conditions for the initial equilibrium year by including environmental data for one year before the start year. However, this works only for recruitment parameters, not biology or selectivity parameters.
	\item Environmental data can be read for up to 100 years after the end year of the model. Then, if the recruitment-environment link has been activated, the future recruitments will be influenced by any future environmental data. This could be used to create a future ``regime shift'' by setting historical values of the relevant environmental variable equal to zero and future values equal to 1, in which case the magnitude of the regime shift would be dictated by the value of the environmental linkage parameter. Note that only future recruitment and growth can be modified by the environmental inputs; there are no options to allow environmentally-linked selectivity in the forecast years.
\end{itemize}

\hypertarget{GenSizeComp}{}
\subsection[Generalized Size Composition Data]{\protect\hyperlink{GenSizeComp}{Generalized Size Composition Data}}
The generalized approach to size composition information was designed initially to provide a means to include weight frequency data. However, the uses are broader, such as allowing for size composition data with different data bins. The user can define as many generalized size composition methods as necessary.

\begin{itemize}
	\item Each method has a specified number of bins.
	\item Each method has ``units'' so the frequencies can be in units of biomass or numbers.
	\item Each method has ``scale'' so the bins can be in terms of weight or length (including ability to convert bin definitions in pounds or inches to kg or cm). 
	\item The composition data is input as females then males, just like all other composition data in SS3. In a two-sex model, the new composition data can be combined sex, single sex, or both sex.
	\item The generalized size composition data can be from the combined discard and retained (i.e., whole), discard only, or retained only.
	\item There are two options for treating fish that in population size bins are smaller than the smallest size frequency bin.
	\begin{itemize}
		\item Option 1: By default, these fish are excluded (unlike length composition data where the small fish are automatically accumulated up into the first bin).
		\item Option 2: If the first size bin is given as a negative value, then accumulation is turned on and the absolute value of the entered value is used as the lower edge of the first size bin.
	\end{itemize}
\end{itemize}

\begin{center}
	\begin{tabular}{p{1.4cm} p{0.7cm} p{12.8 cm}}
		\multicolumn{3}{l}{Example entry:} \\
		\hline
		2 & & Number (N) of size frequency methods to be read. If this value is 0, then omit all entries below. A value of -1 (or any negative value) triggers expanded optional inputs below that allow for Dirichlet 
		% or two parameter Multivariate (MV) Tweedie likelihood (add when MV Tweedie is implemented)
		for fitting these data. \Tstrut\Bstrut\\
		\hline
		\multicolumn{3}{l}{COND < 0 - Number of size frequency} \Tstrut\\
		\multicolumn{2}{l}{2} & Number of size frequency methods to read \Tstrut\\
		\multicolumn{3}{l}{END COND < 0} \Bstrut\\
		\hline
		\multicolumn{2}{r}{25 15} & Number of bins per method \Tstrut\\
		\multicolumn{2}{r}{2 2} & Units per each method (1 = biomass, 2 = numbers) \\
		\multicolumn{2}{r}{3 3} & Scale per each method (1 = kg, 2 = lbs, 3 = cm, 4 = inches) \\
		\multicolumn{2}{r}{1e-9 1e-9} & Min compression to add to each observation (entry for each method) \\
		\multicolumn{2}{r}{2 2} & Number of observations per weight frequency method \Bstrut\\
		\hline
		\multicolumn{3}{l}{COND < 0 - Number of size frequency} \Tstrut\\
		\multicolumn{2}{r}{1 1} & Composition error structure (0 = multinomial, 1 = Dirichlet using Theta*n, 2 = Dirichlet using beta) \Tstrut\\
		% , 3 = MV Tweedie (add when MV Tweedie is implemented)
		\multicolumn{2}{r}{1 1} & Parameter select consecutive index for Dirichlet 
		% or MV Tweedie (add when MV Tweedie is implemented)
		composition error \Bstrut\\
		\multicolumn{3}{l}{END COND < 0} \Tstrut\\
		\hline
	\end{tabular}
\end{center}

\begin{center}
	\begin{tabular}{p{0.4cm} p{0.4cm} p{0.4cm} p{0.4cm} p{0.4cm} p{0.4cm} p{0.5cm} p{0.5cm} p{0.5cm} p{0.5cm} p{0.5cm} p{0.5cm} p{0.5cm} p{0.5cm} p{0.5cm} p{0.5cm} p{0.5cm} p{0.5cm} p{0.5cm} p{0.25cm}}
		\multicolumn{18}{l}{Then enter the lower edge of the bins for each method. The two row vectors shown} \\
		\multicolumn{18}{l}{below contain the bin definitions for methods 1 and 2 respectively:} \\
		\hline
		-26 & 28 & 30 & 32 & 34 & 36 & 38 & 40 & 42 & ... & 60 & 62 & 64 & 68 & 72 & 76 & 80 & 90 \Tstrut\\
		-26 & 28 & 30 & 32 & 34 & 36 & 38 & 40 & 42 &  44 & 46 & 48 & 50 & 52 & \multicolumn{4}{l}{54} \Bstrut\\
		\hline 
	\end{tabular}
\end{center}

Example input is shown below. Note that the format is identical to the length composition data, including sex and partition options, except for the addition of the first column, which indicates the size frequency method.

\begin{center}
	\begin{tabular}{p{1.5cm} p{1cm} p{1cm} p{1cm} p{1cm} p{1cm} p{1.5cm} p{5cm}}
		\hline
		& & & & & & Sample & <composition \Tstrut\\
		Method & Year & Month & Fleet & Sex & Part & Size & females then males> \Bstrut\\
		\hline
		1 & 1975 & 1 & 1 & 3 & 0 & 43 & <data> \Tstrut\\
		1 & 1977 & 1 & 1 & 3 & 0 & 43 & <data> \\
		1 & 1979 & 1 & 1 & 3 & 0 & 43 & <data> \\
		1 & 1980 & 1 & 1 & 3 & 0 & 43 & <data> \Bstrut\\
		\hline
	\end{tabular}
\end{center}

\myparagraph{Note}
\begin{itemize}
	\item There is no tail compression for generalized size frequency data.
	\item Super-period capability is as for the length and age composition data.
	\item By choosing units = 2 and scale = 3 with identical bins and a negative first bin to turn accumulation of small fish on, the size composition method is identical to the length composition method.
	\item Bin boundaries do not need to align with the population length bin boundaries. The model interpolates as necessary.
	\item Size bins cannot be defined as narrower than the population bin width.
	\item The transition matrix can depend upon weight-at-length which differs between sexes and can vary seasonally. Thus, the transition matrix is calculated internally for each sex and each season.
\end{itemize}

\hypertarget{tag-recapture}{}
\subsection[Tag-Recapture Data]{\protect\hyperlink{tag-recapture}{Tag-Recapture Data}}
Each released tag group is characterized by an area, time, sex and age at release. Each recapture event is characterized by a time and fleet (since fleets operate in only one area, it is not necessary to specify the area of recapture). Fleets with tagging data must be fishing fleets (e.g., fleet type 1 or 2). 

Inside the model, the tagged cohort is apportioned across all growth patterns in a given area at a given time (with options to apportion to only one sex or to both). The tag cohort by growth pattern then behaves according to the movement and mortality of the growth pattern. The number of tagged fish is modeled as a negligible fraction of the total population, so a tagging event does not move fish from an untagged group to a tagged group. Instead, tagged fish are seeded into the population with no impact at all on the total population abundance or mortality. 

Predominant age at release for each tag group must be assigned; this requirement keeps SS3 efficient. By assigning a tag group to a single age rather than distributing it across all possible ages according to the size composition of the release group, the tag group can be tracked as a single cohort through the age by time matrix with minimal overhead to the rest of the model. Tags are released at the beginning of a season and recaptures follow the timing of the fleet that made the recapture.

\begin{center}
	\begin{longtable}{p{1.1cm} p{1.1cm} p{1.1cm} p{1.1cm} p{1.1cm} p{1.1cm} p{1.1cm} p{1.1cm} p{3cm}}
		\multicolumn{9}{l}{Example set-up for tagging data:} \\
		\hline
		1 & & \multicolumn{7}{l}{Do tags - 0/1/2. If this value is 0, then omit all entries below.} \\
		&   & \multicolumn{7}{l}{If value is 2, read 1 additional input.} \Tstrut\Bstrut\\
		\hline
		\multicolumn{9}{l}{COND > 0 All subsequent tag-recapture entries must be omitted if ``Do Tags'' = 0}
		 \Tstrut\\

		 & 3 & \multicolumn{7}{l}{Number of tag groups} \Bstrut\\
		 \hline
		 & 7 & \multicolumn{7}{l}{Number of recapture events} \Tstrut\Bstrut\\
		 \hline
		 & 2 & \multicolumn{7}{l}{Mixing latency period: N periods to delay before comparing observed} \Tstrut\\
		 &   &  \multicolumn{7}{l}{to expected recoveries (0 = release period).} \Bstrut\\
		 \hline
		 & 10 & \multicolumn{7}{l}{Max periods (seasons) to track recoveries, after which tags enter} \Tstrut\\
		 &    & \multicolumn{7}{l}{accumulator} \Bstrut\\
		 \hline
		 \multicolumn{9}{l}{COND = 2} \Tstrut\\
		 &  2 &  \multicolumn{7}{l}{Minimum recaptures. The number of recaptures $>=$ maxperiod must be} \\
		 &    &  \multicolumn{7}{l}{$>=$ min tags recaptured specified to include tag group in log likelihood}\Bstrut\\
		 
		 \hline
		 & \multicolumn{8}{l}{Release Data} \Tstrut\\ 
		 & TG & Area & Year & Season & <tfill> & Sex & Age & N Release \Bstrut\\ 
		 \hline
		 & 1 & 1 & 1980 & 1 & 999 & 0 & 24 & 2000 \Tstrut\\
		 & 2 & 1 & 1995 & 1 & 999 & 1 & 24 & 1000 \\
		 & 3 & 1 & 1985 & 1 & 999 & 2 & 24 & 10 \Bstrut\\
		 \hline
		 & \multicolumn{8}{l}{Recapture Data} \Tstrut\\
		 & TG &  & Year &  & Season &  & Fleet & Number \Bstrut\\ 
		%  \hline
		 \pagebreak
		 & 1 & & 1982 & & 1 & & 1 & 7 \Tstrut\\
		 & 1 & & 1982 & & 1 & & 2 & 5 \\
		 & 1 & & 1985 & & 1 & & 2 & 0 \\
		 & 2 & & 1997 & & 1 & & 1 & 6 \\
		 & 2 & & 1997 & & 2 & & 1 & 4 \\
		 & 3 & & 1986 & & 1 & & 1 & 7 \\
		 & 3 & & 1986 & & 2 & & 1 & 5 \Bstrut\\
		 \hline
	\end{longtable}
\end{center}

\myparagraph{Note}
\begin{itemize}
	\item The release data must be entered in tag group order.
	\item <tfill> values are placeholders and are replaced by program generated values for model time.
	\item Analysis of the tag-recapture data has one negative log likelihood component for the distribution of recaptures across areas and another negative log likelihood component for the decay of tag recaptures from a group over time. Note the decay of tag recaptures from a group over time suggests information about mortality is available in the tag-recapture data. More on this is in the \hyperlink{tagrecapture}{control file documentation}.
	\item  Do tags option 2 adds an input compared to do tags option 1, minimum recaptures. Minimum recaptures option allows the user to exclude tag groups that have few recaptures after the mixing period from the likelihood. This may be useful when few tags from a group have been recaptured as an alternative to manually removing the groups with these low numbers of recaptured tags from the tagging data.
	\item Warning for earlier versions of SS3: A shortcoming in the recapture calculations when also using Pope's $F$ approach was identified and corrected in v.3.30.14.
\end{itemize}

\hypertarget{StockComp}{}
\subsection[Stock (Morph) Composition Data]{\protect\hyperlink{StockComp}{Stock (Morph) Composition Data}}
It is sometimes possible to observe the fraction of a sample that is composed of fish from different stocks. These data could come from genetics, otolith microchemistry, tags, or other means. The growth pattern feature allows definition of cohorts of fish that have different biological characteristics and which are independently tracked as they move among areas. SS3 now incorporates the capability to calculate the expected proportion of a sample of fish that come from different growth patterns, ``morphs''. In the inaugural application of this feature, there was a 3 area model with one stock spawning and recruiting in area 1, the other stock in area 3, then seasonally the stocks would move into area 2 where stock composition observations were collected, then they moved back to their natal area later in the year.

\begin{center}
	\begin{tabular}{p{1.1cm} p{1.1cm} p{1.1cm} p{1.1cm} p{1.1cm} p{1.1cm} p{1.1cm} p{3.5cm}}
		\multicolumn{8}{l}{Stock composition by growth pattern (morph) data can be entered in as follows:} \\
		\hline
		1 &  \multicolumn{7}{l}{Do morph composition, if zero, then do not enter any further input below.} \Tstrut\Bstrut\\
		\hline
		\multicolumn{8}{l}{COND = 1} \Tstrut\\ 
		& 3 & \multicolumn{6}{l}{Number of observations} \Bstrut\\
		\hline
		& 2 & \multicolumn{6}{l}{Number of morphs} \Tstrut\Bstrut\\
		\hline
		& 0.0001 & \multicolumn{6}{l}{Minimum Compression} \Tstrut\Bstrut\\
		\hline
		& Year & Month & Fleet & Null & Nsamp & \multicolumn{2}{l}{Data by N Morphs} \Tstrut\Bstrut\\
		\hline
		& 1980 & 1 & 1 & 0 & 36 & 0.4 & 0.6 \Tstrut\\
		& 1981 & 1 & 1 & 0 & 40 & 0.44 & 0.54 \\
		& 1982 & 1 & 1 & 0 & 50 & 0.37 & 0.63 \Bstrut\\
		\hline
	\end{tabular}
\end{center}

\myparagraph{Note}
\begin{itemize}
	\item The number of stocks entered with these data must match the number of growth patterns (morphs) in the control file.
	\item Each data line for unique observations should enter data for morph 1 first followed sequentially for each morph included in the model.
	\item The expected value is combined across sexes. The entered data values will be normalized to sum to one within SS3.
	\item The ``null'' flag is included here in the data input section and is a reserved spot for future features. 
	\item Note that there is a specific value of minimum compression to add to all values of observed and expected.
	\item Warning for earlier versions of SS3: A flaw was identified in the calculation of accumulation by morph. This has been corrected in v.3.30.14. Older versions were incorrectly calculating the catch by morph using the expectation around age-at-length which already was accounting for the accumulation by morph.   
\end{itemize}

\hypertarget{SelexEmperical}{}
\subsection[Selectivity Empirical Data (future feature)]{\protect\hyperlink{SelexEmperical}{Selectivity Empirical Data (future feature)}}
It is sometimes possible to conduct field experiments or other studies to provide direct information about the selectivity of a particular length or age relative to the length or age that has peak selectivity, or to have a prior for selectivity that is more easily stated than a prior on a highly transformed selectivity parameter. This section provides a way to input data that would be compared to the specified derived value for selectivity. This is a placeholder at this time, required to include in the data file and will be fully implemented soon.

\begin{center}
	\begin{tabular}{p{1cm} p{1.5cm} p{1.5cm} p{1.5cm} p{1.5cm} p{1.5cm} p{2.5cm} p{2.5cm} p{2.5cm}}
		\multicolumn{9}{l}{Selectivity data feature is under development for a future option and is not yet implemented.} \\
		\multicolumn{9}{l}{The input line still must be specified in as follows:} \\
		\hline
		0 & \multicolumn{8}{l}{Do data read for selectivity (future option)} \Tstrut\Bstrut\\
        \hline
		%& Year & Month & Fleet & Age/Size & Bin \# & Datum & Datum SE\Tstrut\Bstrut\\
		%\hline
	\end{tabular}
\end{center}

\begin{center}
	\begin{tabular}{p{2cm} p{14cm}} \\
		\multicolumn{2}{l}{End of Data File} \\
		\hline
		999 & \#End of data file marker \Tstrut\Bstrut\\
		\hline
	\end{tabular}
\end{center}

\hypertarget{ExcludingData}{}
\subsection[Excluding Data]{\protect\hyperlink{ExcludingData}{Excluding Data}}
Data that are before the model start year or greater than the retrospective year are not moved into the internal working arrays at all. So if you have any alternative observations that are used in some model runs and not in others, you can simply give them a negative year value rather than having to comment them out. The first output to \texttt{data.ss\_new} has the unaltered and complete input data. Subsequent reports to \texttt{data.ss\_new} produce expected values or bootstraps only for the data that are being used. Additional information on bootstrapping is available in \hyperlink{bootstrap}{Bootstrap Data Files Section}. 

Data that are to be included in the calculations of expected values, but excluded from the calculation of negative log likelihood, are flagged by use of a negative value for fleet number.

\hypertarget{SuperPeriod}{}
\subsection[Data Super-Periods]{\protect\hyperlink{SuperPeriod}{Data Super-Periods}}
The super-period capability allows the user to introduce data that represent a blend across a set of time steps and to cause the model to create an expected value for this observation that uses the same set of time steps. The option is available for all types of data and a similar syntax is used. 

All super-period observations must be contiguous in the data file. All but one of the observations in the sequence will have a negative value for fleet ID so the data associated with these dummy observations will be ignored. The observed values must be combined outside the model and then inserted into the data file for the one observation with a positive fleet number.

Super-periods are started with a negative value for month, and then stopped with a negative value for month, observations within the super-period are designated with a negative fleet field. The standard error or input sample size field is now used for weighting of the expected values. An error message is generated if the super-period does not contain one observation with a positive fleet field.

An expected value for the observation will be computed for each selected time period within the super-period. The expected values are weighted according to the values entered in the \gls{se} (or input sample size) field for all observations except the single observation holding the combined data. The expected value for that year gets a relative weight of 1.0. So in the example below, the relative weights are: 1982, 1.0 (fixed); 1983, 0.85; 1985, 0.4; 1986, 0.4. These weights are summed and rescaled to sum to 1.0, and are output in the \texttt{echoinput.sso} file.

Not all time steps within the extent of a super-period need be included. For example, in a three season model, a super-period could be set up to combine information from season 2 across 3 years, e.g., skip over the season 1 and season 3 for the purposes of calculating the expected value for the super-period. The key is to create a dummy observation (negative fleet value) for all time steps, except 1, that will be included in the super-period and to include one real observation (positive fleet value; which contains the real combined data from all the specified time steps).

\begin{center}
	\vspace*{-\baselineskip}
	\begin{tabular}{p{1cm} p{1cm} p{1cm} p{1cm} p{1cm} p{9cm}}
		\multicolumn{6}{l}{Super-period example:} \\
		\hline
		Year & Month & Fleet & Obs & \gls{se} & Comment \Tstrut\Bstrut\\
		\hline
		1982 \Tstrut & \textbf{-2} & 3 & 34.2 & 0.3 & Start super-period. This observation has positive fleet value, so is expected to contain combined data from all identified periods of the super-period. The \gls{se} entered here is use as the \gls{se} of the combined observation. The expected value for the survey in 1982 will have a relative weight of 1.0 (default) in calculating the combined expected value.\Bstrut\\
		\hline
		1983 \Tstrut & 2 & \textbf{-3} & 55 & 0.3 & In super-period; entered observation is ignored. The expected value for the survey in 1983 will have a relative weight equal to the value in the \gls{se} field (0.3) in calculating the combined expected value. \Bstrut\\
		\hline
		1985 \Tstrut & 2 & \textbf{-3}& 88 & 0.40 & Note that 1984 is not included in the super-period. Relative weight for 1985 is 0.4 \Bstrut\\
		\hline
		1986 & \textbf{-2} & \textbf{-3} & 88 & 0.40 & End super-period \Tstrut\Bstrut\\
		\hline
	\end{tabular}
\end{center}

A time step that is within the time extent of the super-period can still have its own separate observation. In the above example, the survey observation in 1984 could be entered as a separate observation, but it must not be entered inside the contiguous block of super-period observations. For composition data (which allow for replicate observations), a particular time steps' observations could be entered as a member of a super-period and as a separate observation.

The super-period concept can also be used to combine seasons within a year with multiple seasons. This usage could be preferred if fish are growing rapidly within the year so their effective age selectivity is changing within year as they grow; fish are growing within the year so fishery data collected year round have a broader size-at-age modes than a mid-year model approximation can produce; and it could be useful in situations with very high fishing mortality.

\pagebreak