---
title: "Estimating abundance in open populations using capture-recapture models"
author: "Olivier Gimenez"
date: "August 8, 2016"
output:
word_document: default
pdf_document: default
html_document: default
---
## Introduction
Lately, I have found myself repeating the same analyses again and again to estimate population size from capture-recapture models, the Cormack-Jolly-Seber (CJS) model in particular. I do not intend to provide extensive details on this model and its variants here. This is just a basic attempt to put some R code together so I do not spend hours digging through my files for how to do this analysis.
I use [RMark](http://www.phidot.org/software/mark/docs/book/pdf/app_3.pdf) because everything can be done in R, and it's cool for reproducible research. But other pieces of software are fine too. I consider simple CJS models and models with transience. In passing, I also fit models with heterogeneity in the detection process with finite mixtures and an individual random effect. The bootstrap is used to obtain confidence intervals. I also illustrate multi-model inference and use the bootstrap to perform model selection (Buckland et al. 1997).
## Abundance estimates from a Cormack-Jolly-Seber model
First, let's load the RMark package for analysing capture-recapture data by calling MARK from R.
```{r, echo=TRUE, message=FALSE, warning=FALSE}
library(RMark)
```
Each row is an individual, with detections (coded as 1) and non-detections (coded as 0) over the years in columns. Have a look at the file `dataset1.txt` in your favorite text editor. Other formats are fine too.
```{r, message=FALSE, warning=FALSE}
hw.dat = import.chdata("dataset1.txt", header = F, field.names = c("ch"), field.types = NULL)
summary(hw.dat)
attach(hw.dat)
```
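To get a quick feel for the data once imported (a peek that is not part of the original analysis), you can display a few encounter histories:
```{r, message=FALSE, warning=FALSE}
# first three encounter histories: one string of 0s and 1s per individual, one column per year
head(as.character(hw.dat$ch), 3)
```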
Now it is time to build our model. We're gonna use a standard Cormack-Jolly-Seber model. I have carried out goodness-of-fit tests before and found that everything was OK.
```{r, message=FALSE, warning=FALSE}
hw.proc = process.data(hw.dat, model="CJS")
hw.ddl = make.design.data(hw.proc)
```
Then we specify the effects we'd like to consider on survival and detection probabilities.
```{r, message=FALSE, warning=FALSE}
# survival process
Phi.ct = list(formula=~1) # constant
Phi.time = list(formula=~time) # year effect
# detection process
p.ct = list(formula=~1) # constant
p.time = list(formula=~time) # year effect
```
Let's roll and run four models with or without a year effect!
```{r, message=FALSE, warning=FALSE}
# constant survival, constant recapture
Model.1 = mark(hw.proc,hw.ddl,model.parameters=list(Phi=Phi.ct,p=p.ct),output = FALSE,delete=T)
# constant survival, time-dependent recapture
Model.2 = mark(hw.proc,hw.ddl,model.parameters=list(Phi=Phi.ct,p=p.time),output = FALSE,delete=T)
# time-dependent survival, constant recapture
Model.3 = mark(hw.proc,hw.ddl,model.parameters=list(Phi=Phi.time,p=p.ct),output = FALSE,delete=T)
# time-dependent survival, time-dependent recapture
Model.4 = mark(hw.proc,hw.ddl,model.parameters=list(Phi=Phi.time,p=p.time),output = FALSE,delete=T)
```
Let's have a look at the AICc for these models.
```{r, message=FALSE, warning=FALSE}
summary(Model.1)$AICc
summary(Model.2)$AICc
summary(Model.3)$AICc
summary(Model.4)$AICc
```
For convenience, we will say that `model 2` is the model best supported by the data, the one with constant survival probability and time-dependent recapture probability. Multi-model inference would be more appropriate here (a quick sketch follows; see also the last section).
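As a sketch of what that would start from, RMark's `collect.models()` (the same helper used in the last section) gathers the models just fitted into an AICc table with Akaike weights; `aic.table` is simply the name I give it here:
```{r, message=FALSE, warning=FALSE}
# collect the four models fitted above and display the AICc table (with model weights)
aic.table <- collect.models()
aic.table$model.table
```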
Let's have a look at the parameter estimates: survival first, then the recapture probabilities.
```{r, message=FALSE, warning=FALSE}
phitable = get.real(Model.2,"Phi", se= TRUE)
# names(phitable)
phitable[c("estimate","se","lcl","ucl")][1,]
```
```{r, message=FALSE, warning=FALSE}
ptable = get.real(Model.2,"p", se= TRUE)
ptable[c("estimate","se","lcl","ucl")][1:7,]
```
Now it's easy to obtain abundance estimates by dividing the number of individuals detected at each occasion by the corresponding estimate of the recapture probability. Note that we estimate **re**capture probabilities, so abundance cannot be estimated for the first occasion.
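In symbols (just restating the above, with $n_t$ the number of individuals detected at occasion $t$, $\hat{p}_t$ the estimated recapture probability and $K$ the number of occasions):
$$\hat{N}_t = \frac{n_t}{\hat{p}_t}, \qquad t = 2, \ldots, K.$$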
```{r, message=FALSE, warning=FALSE}
# calculate the nb of recaptured individuals per occasion
obs = gregexpr("1", hw.dat$ch)
n_obs = summary(as.factor(unlist(obs)))
estim_abundance = n_obs[-1]/ptable$estimate[1:7]
estim_abundance
```
We use a bootstrap approach to get an idea of the uncertainty surrounding these estimates, in particular to obtain confidence intervals.
We first define the number of bootstrap iterations (kept small here for the sake of illustration; it should be 500, or even 1000 if the computational burden is not too heavy) and the number of capture occasions, and format the dataset in which we'd like to resample (with replacement). This is a non-parametric bootstrap (an alternative is the parametric bootstrap, where data are simulated using the model estimates). We also define a matrix `popsize` in which we will store the results, and set the seed for the simulations (so the results can be replicated).
```{r, message=FALSE, warning=FALSE}
nb_bootstrap = 50
nb_years = 8
target = data.frame(hw.dat,stringsAsFactors=F)
popsize = matrix(NA,nb_bootstrap, nb_years-1)
set.seed(5)
pseudo = target # initialization
```
Finally, we define the model structure and the effects on the parameters (the same for all bootstrap samples).
```{r, message=FALSE, warning=FALSE}
# define model structure
hw.proc = process.data(pseudo, model="CJS")
hw.ddl = make.design.data(hw.proc)
# define parameter structure
phi.ct = list(formula=~1)
p.time = list(formula=~time)
```
Let's run the bootstrap now:
```{r, message=FALSE, warning=FALSE}
for (k in 1:nb_bootstrap){
  # resample in the original dataset with replacement
  pseudo$ch = sample(target$ch, replace=T)
  # fit model with MARK
  res = mark(process.data(pseudo),hw.ddl,model.parameters=list(Phi=phi.ct,p=p.time),delete=TRUE,output=FALSE)
  # get recapture prob estimates
  ptable = get.real(res,"p", se= TRUE)
  # calculate the nb of recaptured individuals per occasion
  allobs = gregexpr("1", pseudo$ch)
  n = summary(as.factor(unlist(allobs)))
  popsize[k,] <- n[-1]/ptable$estimate[1:(nb_years-1)]
}
```
Now we can get confidence intervals:
```{r, message=FALSE, warning=FALSE}
ci_hw = apply(popsize,2,quantile,probs=c(2.5/100,97.5/100),na.rm=T)
ci_hw
```
A plot:
```{r, message=FALSE, warning=FALSE}
plot(1:(nb_years-1),estim_abundance, col="black", type="n", pch=21, xlab="Years", lty=3, ylab="Estimated abundance", main="dataset 1",lwd=3,ylim=c(0,300))
polygon(c(rev(1:(nb_years-1)), 1:(nb_years-1)), c(rev(ci_hw[2,]), ci_hw[1,]), col = 'grey80', border = NA)
lines(1:(nb_years-1), estim_abundance, col="black",lty=3,type='o',lwd=3,pch=21)
```
## What if transience occurs?
We now analyse another dataset `dataset2.txt`.
```{r, echo=TRUE, message=FALSE, warning=FALSE}
library(RMark)
mw.dat = import.chdata("dataset2.txt", header = F, field.names = c("ch"), field.types = NULL)
summary(mw.dat)
attach(mw.dat)
```
The goodness-of-fit tests showed that Test3.SR was significant, suggesting a transient effect due to true transient individuals, an age effect, or a bit of both. To account for this effect, we use a two-age-class structure on survival.
```{r, message=FALSE, warning=FALSE}
mw.proc = process.data(mw.dat, model = "CJS")
mw.ddl = make.design.data(mw.proc)
mw.ddl = add.design.data(mw.proc, mw.ddl,"Phi", type = "age", bins = c(0,1,6), name = "ageclass", right = FALSE)
```
Then we specify the effects on survival and detection probabilities. The age effect is always included on survival; we consider models with and without time-dependent variation in survival and recapture probabilities.
```{r, message=FALSE, warning=FALSE}
# survival process
phi.age = list(formula=~ageclass)
phi.ageptime = list(formula=~ageclass+time)
# detection process
p.ct = list(formula=~1)
p.time = list(formula=~time)
```
Let's roll and run models with or without a year effect!
```{r, message=FALSE, warning=FALSE}
Model.1 = mark(mw.proc, mw.ddl, model.parameters = list(Phi = phi.age, p = p.ct),output = FALSE,delete=T)
Model.2 = mark(mw.proc, mw.ddl,model.parameters = list(Phi = phi.age, p = p.time),output = FALSE,delete=T)
Model.3 = mark(mw.proc, mw.ddl,model.parameters = list(Phi = phi.ageptime, p = p.ct),output = FALSE,delete=T)
Model.4 = mark(mw.proc, mw.ddl,model.parameters = list(Phi = phi.ageptime, p = p.time),output = FALSE,delete=T)
```
Let's have a look at the AICc for these models.
```{r, message=FALSE, warning=FALSE}
summary(Model.1)$AICc
summary(Model.2)$AICc
summary(Model.3)$AICc
summary(Model.4)$AICc
```
For simplicity here, we will say that `model 1` is the model best supported by the data, the one with two age classes on survival (to handle transience) and constant recapture probability. Let's have a look at the parameter estimates: survival first, then the recapture probability.
```{r, message=FALSE, warning=FALSE}
phitable = get.real(Model.1,"Phi", se= TRUE)
# names(phitable)
phitable[c("estimate","se","lcl","ucl")][1:2,]
```
The first row is apparent survival over the first interval after initial capture, which mixes transient and resident individuals; the second row is survival for resident individuals.
```{r, message=FALSE, warning=FALSE}
ptable = get.real(Model.1,"p", se= TRUE)
ptable[c("estimate","se","lcl","ucl")][1,]
```
An estimate of abundance is obtained as in the previous section:
```{r, message=FALSE, warning=FALSE}
# calculate the nb of recaptured individuals per occasion
obs = gregexpr("1", mw.dat$ch)
n_obs = summary(as.factor(unlist(obs)))
estim_abundance = n_obs[-1]/ptable$estimate[1]
estim_abundance
```
We use a bootstrap approach to get an idea of the uncertainty surrounding these estimates, in particular to obtain confidence intervals. The bootstrap approach was proposed by [Madon et al. (2013)](https://dl.dropboxusercontent.com/u/23160641/my-pubs/Madonetal2012MMS.pdf). Roger Pradel discovered a bug in the appendix that he corrected. He also substantially simplified the code. I found a minor problem in Roger's code that I corrected. I know, version control would have been great...
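In symbols, the quantities computed in the bootstrap loop below are (with $\hat{\phi}_1$ the apparent survival of newly encountered individuals, $\hat{\phi}_2$ that of residents, $R_t$ the number of individuals encountered for the first time at occasion $t$, $m_t$ the number of previously marked individuals encountered at occasion $t$, and $\hat{p}$ the recapture probability):
$$\hat{\tau} = 1 - \frac{\hat{\phi}_1}{\hat{\phi}_2}, \qquad \hat{N}^R_t = \frac{m_t + R_t\,(1-\hat{\tau})}{\hat{p}}, \qquad \hat{N}^T_t = \frac{R_t\,\hat{\tau}}{\hat{p}},$$
where $\hat{\tau}$ is the transient rate, and $\hat{N}^R_t$ and $\hat{N}^T_t$ are the abundances of residents and transients.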
We first define a few quantities. See previous section for details.
```{r, message=FALSE, warning=FALSE}
nb_bootstrap = 10
nb_years = 6
target = data.frame(mw.dat,stringsAsFactors=F)
pseudo = target # initialization
popTot = popT = popR = matrix(NA, nb_bootstrap, nb_years-1) # abundance
tau = rep(NA, nb_bootstrap) # transient rate
det.p = rep(NA, nb_bootstrap) # recapture
set.seed(5)
# model structure
mw.proc <- process.data(mw.dat, model = "CJS")
mw.ddl <- make.design.data(mw.proc)
mw.ddl <- add.design.data(mw.proc, mw.ddl,"Phi", type = "age", bins = c(0,1,6), name = "ageclass", right = FALSE)
# parameters
phiage <- list(formula=~ageclass)
p.ct <- list(formula=~1)
```
Let's run the bootstrap now:
```{r, message=FALSE, warning=FALSE}
for (k in 1:nb_bootstrap){
  # draw new sample
  pseudo$ch = sample(target$ch, replace=T)
  # R = nb of individuals encountered for the first time at each occasion
  # m = nb of previously marked individuals encountered at each occasion
  firstobs = regexpr("1", pseudo$ch)
  R = summary(factor(firstobs,levels=1:nb_years))
  allobs = gregexpr("1", pseudo$ch)
  n = summary(as.factor(unlist(allobs)))
  m = n-R
  # fit model with 2 age classes on survival and constant recapture with MARK
  phiage.pct = mark(process.data(pseudo),mw.ddl,model.parameters=list(Phi=phiage,p=p.ct),output = FALSE,delete=T)
  tau[k] = 1 - phiage.pct$results$real[1,1] / phiage.pct$results$real[2,1] # transient rate
  det.p[k] = phiage.pct$results$real[3,1] # recapture probability
  # calculate abundance of residents and transients
  popR[k,] = (m[-1] + R[-1] * (1 - tau[k])) / det.p[k]
  popT[k,] = R[-1] * tau[k] / det.p[k]
}
```
Now we can calculate the abundance of residents and a confidence interval:
```{r, message=FALSE, warning=FALSE}
popRmean = apply(popR,2,mean) # mean resident population size
popRmean
popRci = apply(popR,2,quantile,c(0.025,0.975))
popRci
```
A plot:
```{r, message=FALSE, warning=FALSE}
plot(1:(nb_years-1),popRmean, col="black", type="n", pch=21, xlab="Years", lty=3, ylab="Estimated abundance", main="dataset 2",lwd=3,ylim=c(0,120))
polygon(c(rev(1:(nb_years-1)), 1:(nb_years-1)), c(rev(popRci[2,]), popRci[1,]), col = 'grey80', border = NA)
lines(1:(nb_years-1), popRmean, col="black",lty=3,type='o',lwd=3,pch=21)
```
Lastly, it is also possible to calculate an estimate of the transient rate along with its confidence interval using the bootstrap. Alternatively, [the delta-method could be used](https://github.com/oliviergimenez/delta_method).
```{r, message=FALSE, warning=FALSE}
mean(tau)
quantile(tau,probs=c(2.5,97.5)/100)
```
## Dealing with heterogeneity
Heterogeneity in the detection process may cause bias in abundance estimates (e.g., [Cubaynes et al. 2010](https://dl.dropboxusercontent.com/u/23160641/my-pubs/Cubaynesetal2010.pdf)). I will consider here two ways of dealing with this issue: individual random-effect models and finite-mixture models.
### Individual random effect model
Let's use `dataset1.txt` of the first section.
```{r, echo=TRUE, message=FALSE, warning=FALSE}
library(RMark)
hw.dat = import.chdata("dataset1.txt", header = F, field.names = c("ch"), field.types = NULL)
summary(hw.dat)
attach(hw.dat)
```
We use a Cormack-Jolly-Seber model with a random effect in the detection process [(Gimenez and Choquet 2010)](https://dl.dropboxusercontent.com/u/23160641/my-pubs/Gimenez%26Choquet2010Ecology.pdf). The model structure is specified with the `model="CJSRandom"` option.
```{r, message=FALSE, warning=FALSE}
hw.proc <- process.data(hw.dat, model="CJSRandom")
hw.ddl <- make.design.data(hw.proc)
```
Then we specify the effects on survival and detection probabilities. By default, because we use the random-effect structure in MARK, there is a random effect on both parameters, i.e., these probabilities are drawn from a normal distribution (on the logit scale) with a mean and a standard deviation. We fix the standard deviation of the random effect on survival to 0 to fit a model with constant survival. In contrast, we let MARK estimate both parameters of the random effect for the recapture probability.
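In equation form, a sketch of the structure we are fitting (MARK works on the logit scale, which is why the back-transformation further below involves an inverse logit): for individual $i$,
$$\text{logit}(p_i) = \mu_p + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma_p^2), \qquad \text{logit}(\phi_i) = \mu_\phi \quad (\sigma_\phi \text{ fixed to } 0).$$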
```{r, message=FALSE, warning=FALSE}
# mean survival
phi.ct = list(formula=~1) # constant
# standard deviation of the random effect on survival is fixed to 0
# in other words, no random effect on survival
sigmaphi.fixed=list(formula=~1,fixed=0)
# mean recapture probability
p.dot=list(formula=~1)
# standard deviation of the random effect on recapture
sigmap.dot=list(formula=~1)
```
Let's roll and fit this model.
```{r, message=FALSE, warning=FALSE}
model.re = mark(hw.proc,hw.ddl,model.parameters=list(Phi=phi.ct,p=p.dot,sigmap=sigmap.dot,sigmaphi=sigmaphi.fixed),output = FALSE,delete=T)
```
Let's have a look at the parameter estimates:
```{r, message=FALSE, warning=FALSE}
mle_p = get.real(model.re,"p", se= TRUE)[1,c('estimate','se','lcl','ucl')]
sigma_p = get.real(model.re,"sigmap", se= TRUE)[c('estimate','se','lcl','ucl')]
phi = get.real(model.re,"Phi", se= TRUE)[1,c('estimate','se','lcl','ucl')]
mle_p
sigma_p
phi
```
To test whether the random effect is significant, in other words to test the null hypothesis that the standard deviation of the random effect is null, we need to carry out a likelihood ratio test (LRT). The asymptotic behavior of the LRT statistic is a bit weird in that particular situation (see [Gimenez and Choquet 2010](https://dl.dropboxusercontent.com/u/23160641/my-pubs/Gimenez%26Choquet2010Ecology.pdf) for details).
We first need the deviance of the two models with and without the random effect. To get the deviance of the model without random effect, we could use the results from the first section above, or run a model with the random structure by fixing the standard deviation of the random effect on recapture probability to 0. For the sake of complexity (...), let's use the latter option:
```{r, message=FALSE, warning=FALSE}
phi.ct = list(formula=~1) # constant
sigmaphi.fixed=list(formula=~1,fixed=0)
p.dot=list(formula=~1)
sigmap.fixed=list(formula=~1,fixed=0)
model.without.re = mark(hw.proc,hw.ddl,model.parameters=list(Phi=phi.ct,p=p.dot,sigmap=sigmap.fixed,sigmaphi=sigmaphi.fixed),output = FALSE,delete=T)
```
Then we can form the LRT statistic:
```{r, message=FALSE, warning=FALSE}
dev_model_with_RE = model.re$results$deviance
dev_model_without_RE = model.without.re$results$deviance
LRT = dev_model_without_RE - dev_model_with_RE
```
And calculate the p-value of the test:
```{r, message=FALSE, warning=FALSE}
1-pchisq(LRT,1)
```
The test is significant: we reject the null hypothesis that the standard deviation is 0, so there seems to be heterogeneity in detection, as captured by the random effect.
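As an aside (not part of the original analysis): when testing whether a standard deviation is zero, the parameter lies on the boundary of its space, so the naive 1-df chi-square p-value is conservative. Assuming the usual 50:50 mixture result applies here, a rough adjustment is to halve it:
```{r, message=FALSE, warning=FALSE}
# boundary-adjusted p-value: halve the naive 1-df chi-square tail probability
0.5 * (1 - pchisq(LRT, 1))
```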
From there, one can use the bootstrap as in the first section to estimate abundance along with its confidence interval, using the recapture probability estimate `mle_p`. This value was calculated for us by MARK as the inverse logit of the mean of the random effect on the recapture probability (i.e., back-transformed from the logit scale). Have a look at the results:
```{r, message=FALSE, warning=FALSE}
model.re$results$beta
```
The mean value of the random effect on the recapture is `p:(Intercept)`.
If you apply the standard inverse-logit transformation $1/(1+\exp(-x))$, you obtain `mle_p`.
```{r, message=FALSE, warning=FALSE}
mean_p = model.re$results$beta[3,1] # extract the mean value of the random effect
1/(1+exp(-mean_p)) # calculate by hand
mle_p[1] # produced by MARK
```
### Finite mixture models
Here, we estimate abundance while accounting for heterogeneity in the detection process using a finite-mixture model. To do so, we follow [Cubaynes et al. (2010)](https://dl.dropboxusercontent.com/u/23160641/my-pubs/Cubaynesetal2010.pdf), who extended the approach developed by Shirley Pledger and colleagues for handling heterogeneity to the estimation of abundance. Please have a look at this paper and its appendix for details about the methods and formulas.
For illustration, let's first fit a model with heterogeneity in the recapture probability: constant survival and a two-class finite mixture on recapture. As usual, we first load the `RMark` package and read in the data. Let's use `dataset2.txt` from the second section.
```{r, echo=TRUE, message=FALSE, warning=FALSE}
# load RMark package
library(RMark)
# read in data
mw.dat = import.chdata("dataset2.txt", header = F, field.names = c("ch"), field.types = NULL)
summary(mw.dat)
attach(mw.dat)
```
Then we define the model structure, by using the `model="CJSMixture"` option.
```{r, echo=TRUE, message=FALSE, warning=FALSE}
mw.proc = process.data(mw.dat, model="CJSMixture")
mw.ddl = make.design.data(mw.proc)
```
We also define the effects on the parameters: constant survival, a two-class finite mixture on the recapture probability, and a constant proportion of individuals in each class.
```{r, message=FALSE, warning=FALSE}
# survival
phi.ct = list(formula=~1) # constant
# recapture
p.mix = list(formula=~mixture) # mixture
# mixture proportion
pi.dot=list(formula=~1) # constant
```
Let's fit that model:
```{r, message=FALSE, warning=FALSE}
model.het = mark(mw.proc,mw.ddl,model.parameters=list(Phi=phi.ct,p=p.mix,pi=pi.dot),output = FALSE,delete=T)
```
Now how to decide whether heterogeneity is important? The cool thing is that it's fine to use AIC-type criteria to compare models with/without heterogeneity [(Cubaynes et al. 2012)](https://dl.dropboxusercontent.com/u/23160641/my-pubs/Cubaynesetal2011MEE.pdf). So let's fit the same model with a homogeneous recapture probability and compare the AICc values:
```{r, message=FALSE, warning=FALSE}
mw.proc2 = process.data(mw.dat, model="CJS")
mw.ddl2 = make.design.data(mw.proc2)
model.hom = mark(mw.proc2,mw.ddl2,model.parameters=list(Phi=phi.ct,p=p.ct),output = FALSE,delete=T)
```
Compare the AICc values:
```{r, message=FALSE, warning=FALSE}
summary(model.het)$AICc
summary(model.hom)$AICc
```
Sounds like there is some heterogeneity. Let's have a look at the parameter estimates of the model with heterogeneity:
```{r, message=FALSE, warning=FALSE}
model.het$results$real
```
The proportion of individuals in mixture 1 is:
```{r, message=FALSE, warning=FALSE}
prop = model.het$results$real[1,1]
prop
```
with detection probability:
```{r, message=FALSE, warning=FALSE}
p1 = model.het$results$real[3,1]
p1
```
For the other mixture, the proportion is the complement (i.e., `1 - prop`) and the detection probability is:
```{r, message=FALSE, warning=FALSE}
p2 = model.het$results$real[4,1]
p2
```
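Computed explicitly, the proportion in the second mixture class is simply the complement:
```{r, message=FALSE, warning=FALSE}
# proportion of individuals in mixture 2 (complement of prop)
1 - prop
```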
Lastly, survival is:
```{r, message=FALSE, warning=FALSE}
phi = model.het$results$real[2,1]
phi
```
Now let's use the bootstrap to get confidence intervals (and median) for abundance.
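Before diving into the code, here is a sketch of the quantities computed in the loop below, mirroring the comments in the code (with $u_t$ the number of newly marked individuals at occasion $t$ and $\pi$ the proportion in mixture class 1). Each bootstrap abundance estimate combines the expected numbers of unmarked and already-marked individuals:
$$\hat{N}_t = \hat{U}_t + \hat{M}_t, \qquad \hat{U}_t = \pi\,\frac{u_t}{p_1} + (1-\pi)\,\frac{u_t}{p_2}, \qquad \hat{M}_t = \sum_{j=1}^{t-1} u_j \left[\pi \prod_{l=j}^{t-1} \phi_l + (1-\pi)\prod_{l=j}^{t-1} \phi_l \right],$$
where the two terms in $\hat{M}_t$ coincide here because survival is shared across mixture classes (see the to-do list about checking these calculations and making them generic).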
We will need a function to spot the first detections in the encounter histories:
```{r, message=FALSE, warning=FALSE}
# returns the occasion of first detection, i.e. the position of the first 1 in a 0/1 vector
firstdetection = function(x){
  b = sort(x,index.return=T,decreasing=T) # stable decreasing sort puts the 1s first
  res = b$ix[1] # index of the first 1
  return(res)}
```
We also need to convert the encounter histories into a matrix of 0s and 1s, with one column per occasion:
```{r, message=FALSE, warning=FALSE}
mw.dat.spaces = matrix(as.numeric(unlist(strsplit(mw.dat$ch, ''))),nrow=nrow(mw.dat),byrow=T)
```
Let's run the bootstrap:
```{r, message=FALSE, warning=FALSE}
Nhet = NULL # initialization
nb_bootstrap = 10 # nb of bootstrap iterations (should be 500 or 1000!)
for (i in 1:nb_bootstrap){
  # resample in original dataset
  mask = sample.int(nrow(mw.dat.spaces),replace=T)
  pseudodata = mw.dat.spaces[mask,]
  # nb of ind detected per occasion
  cijdata = apply(pseudodata,2,sum)
  # first detection
  d = apply(pseudodata,1,firstdetection)
  # u (newly marked)
  udata = NULL
  for(kk in 1:ncol(pseudodata)) {udata[kk] = sum(d == kk)}
  # m (already marked)
  mdata = cijdata - udata
  # delete first occasion
  cij = cijdata[-1]
  m = mdata[-1]
  u = udata[-1]
  # expected newly marked (Cubaynes et al. 2010)
  bigU <- matrix(0,nrow=length(p1),ncol=length(u)) # here length(p1) = 1, but if time-dependent, length(p1) > 1
  for(zz in 1:ncol(bigU)) {
    bigU[,zz] = (1-prop) * u[zz] / p2 + prop * u[zz] / p1
  }
  # expected already marked (Cubaynes et al. 2010)
  # M2 = u1 (pi phi1 + (1-pi) phi1)
  # M3 = u1 (pi phi1 phi2 + (1-pi) phi1 phi2) + u2 (pi phi2 + (1-pi) phi2)
  # M4 = u1 (pi phi1 phi2 phi3 + (1-pi) phi1 phi2 phi3) + u2 (pi phi2 phi3 + (1-pi) phi2 phi3) + u3 (pi phi3 + (1-pi) phi3)
  # ...
  surv <- matrix(rep(phi,length(u)),ncol=length(u),byrow=T) # to be modified if phi is time-dependent
  bigM <- matrix(0,nrow=nrow(surv),ncol=ncol(surv))
  for(ii in 1:nrow(bigM)) {
    for(t in 1:ncol(bigM)) {
      temp <- rep(NA,t)
      for(j in 1:t) {
        temp[j] <- u[j] * ((1-prop) * prod(surv[ii,1:j]) + prop * prod(surv[ii,1:j]))
      }
      bigM[ii,t] = sum(temp)
    }
  }
  # compute abundance estimate for current bootstrap sample
  Nhet <- rbind(Nhet,bigU + bigM)
}
```
Get median and confidence interval:
```{r, message=FALSE, warning=FALSE}
res <- apply(Nhet,2,quantile,c(2.5,50,97.5)/100)
res
```
Let's do a nice plot:
```{r, message=FALSE, warning=FALSE}
library(ggplot2)
mp = data.frame(year = 1:ncol(Nhet), Nhat = res[2,])
N = nrow(mp)
predframe <- with(mp,data.frame(year,Nhat,lwr=res[1,],upr=res[3,]))
# use a new name for the plot object so we do not overwrite the detection probability p1
(gplot <- ggplot(mp, aes(year, Nhat)) +
  geom_point() +
  geom_line(data = predframe) +
  geom_ribbon(data = predframe, aes(ymin = lwr, ymax = upr), alpha = 0.3) +
  ylab("Estimated abundance") + xlab("Year"))
```
## Model-averaging abundance estimates
If you remember the first section, the 4 models we fitted to `dataset1` had close AICc values. In this situation, multi-model inference is recommended. Following Buckland et al. (1997), we use the bootstrap to obtain model-averaged abundance estimates.
First, load RMark and read in the data:
```{r, echo=TRUE, message=FALSE, warning=FALSE}
rm(list=ls(all=TRUE))
library(RMark)
hw.dat = import.chdata("dataset1.txt", header = F, field.names = c("ch"), field.types = NULL)
```
Define the effects on parameters:
```{r, message=FALSE, warning=FALSE}
# define parameter structure
Phi.ct = list(formula=~1) # constant survival
Phi.time = list(formula=~time) # year effect on survival
p.ct = list(formula=~1) # constant detection
p.time = list(formula=~time) # year effect on detection
```
Let's define a few quantities we will need later on:
```{r, message=FALSE, warning=FALSE}
nb_bootstrap = 10 # use 500 or 1000 instead
nb_years = 8
target = data.frame(hw.dat,stringsAsFactors=F)
set.seed(5)
pseudo = target # initialization
# storage quantities
outlist <- vector("list", nb_bootstrap)
outdata <- vector("list", nb_bootstrap)
outreal <- vector("list", nb_bootstrap)
outnhat <- vector("list", nb_bootstrap)
```
Let's run the bootstrap:
```{r, message=FALSE, warning=FALSE}
for (k in 1:nb_bootstrap){
  # resample in the original dataset with replacement
  pseudo$ch = sample(target$ch, replace=T)
  # define model structure
  hw.proc = process.data(pseudo, model="CJS")
  hw.ddl = make.design.data(hw.proc)
  # fit all 4 models with MARK
  # constant survival, constant recapture
  Model.1 = mark(hw.proc,hw.ddl,model.parameters=list(Phi=Phi.ct,p=p.ct),output = FALSE,delete=T)
  # constant survival, time-dependent recapture
  Model.2 = mark(hw.proc,hw.ddl,model.parameters=list(Phi=Phi.ct,p=p.time),output = FALSE,delete=T)
  # time-dependent survival, constant recapture
  Model.3 = mark(hw.proc,hw.ddl,model.parameters=list(Phi=Phi.time,p=p.ct),output = FALSE,delete=T)
  # time-dependent survival, time-dependent recapture
  Model.4 = mark(hw.proc,hw.ddl,model.parameters=list(Phi=Phi.time,p=p.time),output = FALSE,delete=T)
  # gather results
  all.models <- collect.models()
  # store bootstrap sample and model table
  outdata[[k]] <- pseudo
  outlist[[k]] <- all.models$model.table
  # get best model
  ind_best_mod = row.names(all.models$model.table)[1]
  name_best_mod = paste('Model.',ind_best_mod,sep='')
  # get recapture estimates from the best model
  # if constant, the same value is duplicated; if time-dependent, take the whole vector
  mle.p = get.real(get(name_best_mod),"p", se= TRUE)$estimate[1:(nb_years-1)]
  outreal[[k]] = mle.p
  # get abundance estimates
  allobs = gregexpr("1", pseudo$ch)
  n = summary(as.factor(unlist(allobs)))
  nhat = n[-1]/mle.p
  outnhat[[k]] = nhat
}
```
Convert the list of abundance estimates into a matrix, and calculate summaries over the bootstrap iterations:
```{r, message=FALSE, warning=FALSE}
res = matrix(unlist(outnhat),nrow=length(outnhat),byrow=T)
apply(res,2,mean) # mean estimate
apply(res,2,median) # median estimate
apply(res,2,quantile,probs=c(2.5,97.5)/100) # confidence interval
```
This is not exactly what Buckland et al. (1997) advocate (see the last paragraph of section 3 in their paper), but it will do for now.
## To do
* check the calculations for `bigU` and `bigM` and make them generic (what if survival is time-dependent?)
* add Jolly-Seber as in [Karamanlidis et al. (2015)](https://dl.dropboxusercontent.com/u/23160641/my-pubs/Karamanlidisetal2015-Arcturos.pdf)
* add robust-design as in papers currently in review (including model selection with the bootstrap).
## References
* Buckland, S.T., K.P. Burnham, and N.H. Augustin (1997). Model selection: an integral part of inference. Biometrics 53: 603–618.
* Cubaynes, S., C. Lavergne, E. Marboutin, and O. Gimenez (2012). Assessing individual heterogeneity using model selection criteria: How many mixture components in capture-recapture models? Methods in Ecology and Evolution 3: 564–573.
* Cubaynes, S., R. Pradel, R. Choquet, C. Duchamp, J.-M. Gaillard, J.-D. Lebreton, E. Marboutin, C. Miquel, A.-M. Reboulet, C. Poillot, P. Taberlet, and O. Gimenez (2010). Importance of accounting for detection heterogeneity when estimating abundance: the case of French wolves. Conservation Biology 24: 621–626.
* Gimenez, O. and R. Choquet (2010). Incorporating individual heterogeneity in studies on marked animals using numerical integration: capture-recapture mixed models. Ecology 91: 951–957.
* Karamanlidis, A.A., M. de Gabriel Hernando, L. Krambokoukis, and O. Gimenez (2015). Evidence of a large carnivore population recovery: counting bears in Greece. Journal for Nature Conservation 27: 10–17.
* Madon, B., C. Garrigue, R. Pradel, and O. Gimenez (2013). Transience in the humpback whale population of New Caledonia and implications for abundance estimation. Marine Mammal Science 29: 669–678.