---
title: "Estimating abundance in open populations using capture-recapture models"
author: "Olivier Gimenez"
date: "August 8, 2016"
output:
word_document: default
pdf_document: default
html_document: default
---
## Introduction
Lately, I have found myself repeating the same analyses again and again to estimate population size from capture-recapture models, the Cormack-Jolly-Seber (CJS) model in particular. I do not intend to provide extensive details on this model and its variants here. This is just a basic attempt to put some R code together so I do not spend hours digging through my files for how to do this analysis.
I use [RMark](http://www.phidot.org/software/mark/docs/book/pdf/app_3.pdf) because everything can be done in R, and it's cool for reproducible research. But other pieces of software are fine too. I consider simple CJS models and models with transience. In passing, I also fit models with heterogeneity in the detection process with finite mixtures and an individual random effect. The bootstrap is used to obtain confidence intervals. I also illustrate multi-model inference and use the bootstrap to perform model selection (Buckland et al. 1997).
## Abundance estimates from a Cormack-Jolly-Seber model
First, let's load the RMark package for analysing capture-recapture data by calling MARK from R.
```{r, echo=TRUE, message=FALSE, warning=FALSE}
library(RMark)
```
Each row is an individual, with detections (coded as 1) and non-detections (coded as 0) over the years in columns. Have a look at the file `dataset1.txt` in your favorite text editor. Other formats are fine too.
```{r, message=FALSE, warning=FALSE}
hw.dat = import.chdata("dataset1.txt", header = F, field.names = c("ch"), field.types = NULL)
summary(hw.dat)
attach(hw.dat)
```
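To get a quick feel for the data once imported (a peek that is not part of the original analysis), you can display a few encounter histories:
```{r, message=FALSE, warning=FALSE}
# first three encounter histories: one string of 0s and 1s per individual, one column per year
head(as.character(hw.dat$ch), 3)
```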
Now it is time to build our model. We're gonna use a standard Cormack-Jolly-Seber model. I have carried out goodness-of-fit tests before and found that everything was OK.
```{r, message=FALSE, warning=FALSE}
hw.proc = process.data(hw.dat, model="CJS")
hw.ddl = make.design.data(hw.proc)
```
Then we specify the effects we'd like to consider on survival and detection probabilities.
```{r, message=FALSE, warning=FALSE}
# survival process
Phi.ct = list(formula=~1) # constant
Phi.time = list(formula=~time) # year effect
# detection process
p.ct = list(formula=~1) # constant
p.time = list(formula=~time) # year effect
```
Let's roll and run four models with or without a year effect!
```{r, message=FALSE, warning=FALSE}
# constant survival, constant recapture
Model.1 = mark(hw.proc,hw.ddl,model.parameters=list(Phi=Phi.ct,p=p.ct),output = FALSE,delete=T)
# constant survival, time-dependent recapture
Model.2 = mark(hw.proc,hw.ddl,model.parameters=list(Phi=Phi.ct,p=p.time),output = FALSE,delete=T)
# time-dependent survival, constant recapture
Model.3 = mark(hw.proc,hw.ddl,model.parameters=list(Phi=Phi.time,p=p.ct),output = FALSE,delete=T)
# time-dependent survival, time-dependent recapture
Model.4 = mark(hw.proc,hw.ddl,model.parameters=list(Phi=Phi.time,p=p.time),output = FALSE,delete=T)
```
Let's have a look at the AICc for these models.
```{r, message=FALSE, warning=FALSE}
summary(Model.1)$AICc
summary(Model.2)$AICc
summary(Model.3)$AICc
summary(Model.4)$AICc
```
For convenience, we will say that `model 2` is the model best supported by the data, the one with constant survival probability and time-dependent recapture probability. Multi-model inference would be more appropriate here (a quick sketch follows; see also the last section).
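As a sketch of what that would start from, RMark's `collect.models()` (the same helper used in the last section) gathers the models just fitted into an AICc table with Akaike weights; `aic.table` is simply the name I give it here:
```{r, message=FALSE, warning=FALSE}
# collect the four models fitted above and display the AICc table (with model weights)
aic.table <- collect.models()
aic.table$model.table
```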
Let's have a look at the parameter estimates: survival first, then the recapture probabilities.
```{r, message=FALSE, warning=FALSE}
phitable = get.real(Model.2,"Phi", se= TRUE)
# names(phitable)
phitable[c("estimate","se","lcl","ucl")][1,]
```
```{r, message=FALSE, warning=FALSE}
ptable = get.real(Model.2,"p", se= TRUE)
ptable[c("estimate","se","lcl","ucl")][1:7,]
```
Now it's easy to obtain abundance estimates by dividing the number of individuals detected at each occasion by the corresponding estimate of the recapture probability. Note that we estimate **re**capture probabilities, so abundance cannot be estimated for the first occasion.
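In symbols (just restating the above, with $n_t$ the number of individuals detected at occasion $t$, $\hat{p}_t$ the estimated recapture probability and $K$ the number of occasions):
$$\hat{N}_t = \frac{n_t}{\hat{p}_t}, \qquad t = 2, \ldots, K.$$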
```{r, message=FALSE, warning=FALSE}
# calculate the nb of recaptured individuals per occasion
obs = gregexpr("1", hw.dat$ch)
n_obs = summary(as.factor(unlist(obs)))
estim_abundance = n_obs[-1]/ptable$estimate[1:7]
estim_abundance
```
We use a bootstrap approach to get an idea of the uncertainty surrounding these estimates, in particular to obtain confidence intervals.
We first define the number of bootstrap iterations (kept small here for the sake of illustration; it should be 500, or even 1000 if the computational burden is not too heavy) and the number of capture occasions, and format the dataset in which we'd like to resample (with replacement). This is a non-parametric bootstrap (an alternative is the parametric bootstrap, where data are simulated using the model estimates). We also define a matrix `popsize` in which we will store the results, and set the seed for the simulations (so the results can be replicated).
```{r, message=FALSE, warning=FALSE}
nb_bootstrap = 50
nb_years = 8
target = data.frame(hw.dat,stringsAsFactors=F)
popsize = matrix(NA,nb_bootstrap, nb_years-1)
set.seed(5)
pseudo = target # initialization
```
Finally, we define the model structure and the effects on the parameters (the same for all bootstrap samples).
```{r, message=FALSE, warning=FALSE}
# define model structure
hw.proc = process.data(pseudo, model="CJS")
hw.ddl = make.design.data(hw.proc)
# define parameter structure
phi.ct = list(formula=~1)
p.time = list(formula=~time)
```
Let's run the bootstrap now:
```{r, message=FALSE, warning=FALSE}
for (k in 1:nb_bootstrap){
  # resample in the original dataset with replacement
  pseudo$ch = sample(target$ch, replace=T)
  # fit model with MARK
  res = mark(process.data(pseudo),hw.ddl,model.parameters=list(Phi=phi.ct,p=p.time),delete=TRUE,output=FALSE)
  # get recapture prob estimates
  ptable = get.real(res,"p", se= TRUE)
  # calculate the nb of recaptured individuals per occasion
  allobs = gregexpr("1", pseudo$ch)
  n = summary(as.factor(unlist(allobs)))
  popsize[k,] <- n[-1]/ptable$estimate[1:(nb_years-1)]
}
```
Now we can get confidence intervals:
```{r, message=FALSE, warning=FALSE}
ci_hw = apply(popsize,2,quantile,probs=c(2.5/100,97.5/100),na.rm=T)
ci_hw
```
A plot:
```{r, message=FALSE, warning=FALSE}
plot(1:(nb_years-1),estim_abundance, col="black", type="n", pch=21, xlab="Years", lty=3, ylab="Estimated abundance", main="dataset 1",lwd=3,ylim=c(0,300))
polygon(c(rev(1:(nb_years-1)), 1:(nb_years-1)), c(rev(ci_hw[2,]), ci_hw[1,]), col = 'grey80', border = NA)
lines(1:(nb_years-1), estim_abundance, col="black",lty=3,type='o',lwd=3,pch=21)
```
## What if transience occurs?
We now analyse another dataset `dataset2.txt`.
```{r, echo=TRUE, message=FALSE, warning=FALSE}
library(RMark)
mw.dat = import.chdata("dataset2.txt", header = F, field.names = c("ch"), field.types = NULL)
summary(mw.dat)
attach(mw.dat)
```
The goodness-of-fit tests showed that Test3.SR was significant, suggesting a transient effect due to true transient individuals, an age effect, or a bit of both. To account for this effect, we use a two-age-class structure on survival.
```{r, message=FALSE, warning=FALSE}
mw.proc = process.data(mw.dat, model = "CJS")
mw.ddl = make.design.data(mw.proc)
mw.ddl = add.design.data(mw.proc, mw.ddl,"Phi", type = "age", bins = c(0,1,6), name = "ageclass", right = FALSE)
```
Then we specify the effects on survival and detection probabilities. The age effect is always included on survival; we consider models with and without time-dependent variation in survival and recapture probabilities.
```{r, message=FALSE, warning=FALSE}
# survival process
phi.age = list(formula=~ageclass)
phi.ageptime = list(formula=~ageclass+time)
# detection process
p.ct = list(formula=~1)
p.time = list(formula=~time)
```
Let's roll and run models with or without a year effect!
```{r, message=FALSE, warning=FALSE}
Model.1 = mark(mw.proc, mw.ddl, model.parameters = list(Phi = phi.age, p = p.ct),output = FALSE,delete=T)
Model.2 = mark(mw.proc, mw.ddl,model.parameters = list(Phi = phi.age, p = p.time),output = FALSE,delete=T)
Model.3 = mark(mw.proc, mw.ddl,model.parameters = list(Phi = phi.ageptime, p = p.ct),output = FALSE,delete=T)
Model.4 = mark(mw.proc, mw.ddl,model.parameters = list(Phi = phi.ageptime, p = p.time),output = FALSE,delete=T)
```
Let's have a look at the AICc for these models.
```{r, message=FALSE, warning=FALSE}
summary(Model.1)$AICc
summary(Model.2)$AICc
summary(Model.3)$AICc
summary(Model.4)$AICc
```
For simplicity here, we will say that `model 1` is the model best supported by the data, the one with two age classes on survival (to handle transience) and constant recapture probability. Let's have a look at the parameter estimates: survival first, then the recapture probability.
```{r, message=FALSE, warning=FALSE}
phitable = get.real(Model.1,"Phi", se= TRUE)
# names(phitable)
phitable[c("estimate","se","lcl","ucl")][1:2,]
```
The first row is apparent survival over the first interval after initial capture, which mixes transient and resident individuals; the second row is survival for resident individuals.
```{r, message=FALSE, warning=FALSE}
ptable = get.real(Model.1,"p", se= TRUE)
ptable[c("estimate","se","lcl","ucl")][1,]
```
An estimate of abundance is obtained as in the previous section:
```{r, message=FALSE, warning=FALSE}
# calculate the nb of recaptured individuals per occasion
obs = gregexpr("1", mw.dat$ch)
n_obs = summary(as.factor(unlist(obs)))
estim_abundance = n_obs[-1]/ptable$estimate[1]
estim_abundance
```
We use a bootstrap approach to get an idea of the uncertainty surrounding these estimates, in particular to obtain confidence intervals. The bootstrap approach was proposed by [Madon et al. (2013)](https://dl.dropboxusercontent.com/u/23160641/my-pubs/Madonetal2012MMS.pdf). Roger Pradel discovered a bug in the appendix that he corrected. He also substantially simplified the code. I found a minor problem in Roger's code that I corrected. I know, version control would have been great...
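In symbols, the quantities computed in the bootstrap loop below are (with $\hat{\phi}_1$ the apparent survival of newly encountered individuals, $\hat{\phi}_2$ that of residents, $R_t$ the number of individuals encountered for the first time at occasion $t$, $m_t$ the number of previously marked individuals encountered at occasion $t$, and $\hat{p}$ the recapture probability):
$$\hat{\tau} = 1 - \frac{\hat{\phi}_1}{\hat{\phi}_2}, \qquad \hat{N}^R_t = \frac{m_t + R_t\,(1-\hat{\tau})}{\hat{p}}, \qquad \hat{N}^T_t = \frac{R_t\,\hat{\tau}}{\hat{p}},$$
where $\hat{\tau}$ is the transient rate, and $\hat{N}^R_t$ and $\hat{N}^T_t$ are the abundances of residents and transients.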
We first define a few quantities. See previous section for details.
```{r, message=FALSE, warning=FALSE}
nb_bootstrap = 10
nb_years = 6
target = data.frame(mw.dat,stringsAsFactors=F)
pseudo = target # initialization
popTot = popT = popR = matrix(NA, nb_bootstrap, nb_years-1) # abundance
tau = rep(NA, nb_bootstrap) # transient rate
det.p = rep(NA, nb_bootstrap) # recapture
set.seed(5)
# model structure
mw.proc <- process.data(mw.dat, model = "CJS")
mw.ddl <- make.design.data(mw.proc)
mw.ddl <- add.design.data(mw.proc, mw.ddl,"Phi", type = "age", bins = c(0,1,6), name = "ageclass", right = FALSE)
# parameters
phiage <- list(formula=~ageclass)
p.ct <- list(formula=~1)
```
Let's run the bootstrap now:
```{r, message=FALSE, warning=FALSE}
for (k in 1:nb_bootstrap){
  # draw new sample
  pseudo$ch = sample(target$ch, replace=T)
  # R = nb of individuals encountered for the first time at each occasion
  # m = nb of previously marked individuals encountered at each occasion
  firstobs = regexpr("1", pseudo$ch)
  R = summary(factor(firstobs,levels=1:nb_years))
  allobs = gregexpr("1", pseudo$ch)
  n = summary(as.factor(unlist(allobs)))
  m = n-R
  # fit model with 2 age classes on survival and constant recapture with MARK
  phiage.pct = mark(process.data(pseudo),mw.ddl,model.parameters=list(Phi=phiage,p=p.ct),output = FALSE,delete=T)
  tau[k] = 1 - phiage.pct$results$real[1,1] / phiage.pct$results$real[2,1] # transient rate
  det.p[k] = phiage.pct$results$real[3,1] # recapture probability
  # calculate abundance of residents and transients
  popR[k,] = (m[-1] + R[-1] * (1 - tau[k])) / det.p[k]
  popT[k,] = R[-1] * tau[k] / det.p[k]
}
```
Now we can calculate the abundance of residents and a confidence interval:
```{r, message=FALSE, warning=FALSE}
popRmean = apply(popR,2,mean) # mean resident population size
popRmean
popRci = apply(popR,2,quantile,c(0.025,0.975))
popRci
```
A plot:
```{r, message=FALSE, warning=FALSE}
plot(1:(nb_years-1),popRmean, col="black", type="n", pch=21, xlab="Years", lty=3, ylab="Estimated abundance", main="dataset 2",lwd=3,ylim=c(0,120))
polygon(c(rev(1:(nb_years-1)), 1:(nb_years-1)), c(rev(popRci[2,]), popRci[1,]), col = 'grey80', border = NA)
lines(1:(nb_years-1), popRmean, col="black",lty=3,type='o',lwd=3,pch=21)
```
Lastly, it is also possible to calculate an estimate of the transient rate along with its confidence interval using the bootstrap. Alternatively, [the delta-method could be used](https://github.com/oliviergimenez/delta_method).
```{r, message=FALSE, warning=FALSE}
mean(tau)
quantile(tau,probs=c(2.5,97.5)/100)
```
## Dealing with heterogeneity
Heterogeneity in the detection process may cause bias in abundance estimates (e.g., [Cubaynes et al. 2010](https://dl.dropboxusercontent.com/u/23160641/my-pubs/Cubaynesetal2010.pdf)). I will consider here two ways of dealing with this issue: individual random-effect models and finite-mixture models.
### Individual random effect model
Let's use `dataset1.txt` of the first section.
```{r, echo=TRUE, message=FALSE, warning=FALSE}
library(RMark)
hw.dat = import.chdata("dataset1.txt", header = F, field.names = c("ch"), field.types = NULL)
summary(hw.dat)
attach(hw.dat)
```
We use a Cormack-Jolly-Seber model with a random effect in the detection process [(Gimenez and Choquet 2010)](https://dl.dropboxusercontent.com/u/23160641/my-pubs/Gimenez%26Choquet2010Ecology.pdf). The model structure is specified with the `model="CJSRandom"` option.
```{r, message=FALSE, warning=FALSE}
hw.proc <- process.data(hw.dat, model="CJSRandom")
hw.ddl <- make.design.data(hw.proc)
```
Then we specify the effects on survival and detection probabilities. By default, because we use the random-effect structure in MARK, there is a random effect on both parameters, i.e., these probabilities are drawn from a normal distribution (on the logit scale) with a mean and a standard deviation. We fix the standard deviation of the random effect on survival to 0 to fit a model with constant survival. In contrast, we let MARK estimate both parameters of the random effect for the recapture probability.
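In equation form, a sketch of the structure we are fitting (MARK works on the logit scale, which is why the back-transformation further below involves an inverse logit): for individual $i$,
$$\text{logit}(p_i) = \mu_p + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma_p^2), \qquad \text{logit}(\phi_i) = \mu_\phi \quad (\sigma_\phi \text{ fixed to } 0).$$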
```{r, message=FALSE, warning=FALSE}
# mean survival
phi.ct = list(formula=~1) # constant
# standard deviation of the random effect on survival is fixed to 0
# in other words, no random effect on survival
sigmaphi.fixed=list(formula=~1,fixed=0)
# mean recapture probability
p.dot=list(formula=~1)
# standard deviation of the random effect on recapture
sigmap.dot=list(formula=~1)
```
Let's roll and fit this model.
```{r, message=FALSE, warning=FALSE}
model.re = mark(hw.proc,hw.ddl,model.parameters=list(Phi=phi.ct,p=p.dot,sigmap=sigmap.dot,sigmaphi=sigmaphi.fixed),output = FALSE,delete=T)
```
Let's have a look at the parameter estimates:
```{r, message=FALSE, warning=FALSE}
mle_p = get.real(model.re,"p", se= TRUE)[1,c('estimate','se','lcl','ucl')]
sigma_p = get.real(model.re,"sigmap", se= TRUE)[c('estimate','se','lcl','ucl')]
phi = get.real(model.re,"Phi", se= TRUE)[1,c('estimate','se','lcl','ucl')]
mle_p
sigma_p
phi
```
To test whether the random effect is significant, in other words to test the null hypothesis that the standard deviation of the random effect is null, we need to carry out a likelihood ratio test (LRT). The asymptotic behavior of the LRT statistic is a bit weird in that particular situation (see [Gimenez and Choquet 2010](https://dl.dropboxusercontent.com/u/23160641/my-pubs/Gimenez%26Choquet2010Ecology.pdf) for details).
We first need the deviance of the two models with and without the random effect. To get the deviance of the model without random effect, we could use the results from the first section above, or run a model with the random structure by fixing the standard deviation of the random effect on recapture probability to 0. For the sake of complexity (...), let's use the latter option:
```{r, message=FALSE, warning=FALSE}
phi.ct = list(formula=~1) # constant
sigmaphi.fixed=list(formula=~1,fixed=0)
p.dot=list(formula=~1)
sigmap.fixed=list(formula=~1,fixed=0)
model.without.re = mark(hw.proc,hw.ddl,model.parameters=list(Phi=phi.ct,p=p.dot,sigmap=sigmap.fixed,sigmaphi=sigmaphi.fixed),output = FALSE,delete=T)
```
Then we can form the LRT statistic:
```{r, message=FALSE, warning=FALSE}
dev_model_with_RE = model.re$results$deviance
dev_model_without_RE = model.without.re$results$deviance
LRT = dev_model_without_RE - dev_model_with_RE
```
And calculate the p-value of the test:
```{r, message=FALSE, warning=FALSE}
1-pchisq(LRT,1)
```
The test is significant: we reject the null hypothesis that the standard deviation is 0, so there seems to be heterogeneity in detection, as captured by the random effect.
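As an aside (not part of the original analysis): when testing whether a standard deviation is zero, the parameter lies on the boundary of its space, so the naive 1-df chi-square p-value is conservative. Assuming the usual 50:50 mixture result applies here, a rough adjustment is to halve it:
```{r, message=FALSE, warning=FALSE}
# boundary-adjusted p-value: halve the naive 1-df chi-square tail probability
0.5 * (1 - pchisq(LRT, 1))
```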
From there, one can use the bootstrap as in the first section to estimate abundance along with its confidence interval, using the recapture probability estimate `mle_p`. This value was calculated for us by MARK as the inverse logit of the mean of the random effect on the recapture probability (i.e., back-transformed from the logit scale). Have a look at the results:
```{r, message=FALSE, warning=FALSE}
model.re$results$beta
```
The mean value of the random effect on the recapture is `p:(Intercept)`.
If you apply the standard inverse-logit transformation $1/(1+\exp(-x))$, you obtain `mle_p`.
```{r, message=FALSE, warning=FALSE}
mean_p = model.re$results$beta[3,1] # extract the mean value of the random effect
1/(1+exp(-mean_p)) # calculate by hand
mle_p[1] # produced by MARK
```
### Finite mixture models
Here, we estimate abundance while accounting for heterogeneity in the detection process using a finite-mixture model. To do so, we follow [Cubaynes et al. (2010)](https://dl.dropboxusercontent.com/u/23160641/my-pubs/Cubaynesetal2010.pdf), who extended the approach developed by Shirley Pledger and colleagues for handling heterogeneity to the estimation of abundance. Please have a look at this paper and its appendix for details about the methods and formulas.
For illustration, let's first fit a model with heterogeneity in the recapture probability: constant survival and a two-class finite mixture on recapture. As usual, we first load the `RMark` package and read in the data. Let's use `dataset2.txt` from the second section.
```{r, echo=TRUE, message=FALSE, warning=FALSE}
# load RMark package
library(RMark)
# read in data
mw.dat = import.chdata("dataset2.txt", header = F, field.names = c("ch"), field.types = NULL)
summary(mw.dat)
attach(mw.dat)
```
Then we define the model structure, by using the `model="CJSMixture"` option.
```{r, echo=TRUE, message=FALSE, warning=FALSE}
mw.proc = process.data(mw.dat, model="CJSMixture")
mw.ddl = make.design.data(mw.proc)
```
We also define the effects on the parameters: constant survival, a two-class finite mixture on the recapture probability, and a constant proportion of individuals in each class.
```{r, message=FALSE, warning=FALSE}
# survival
phi.ct = list(formula=~1) # constant
# recapture
p.mix = list(formula=~mixture) # mixture
# mixture proportion
pi.dot=list(formula=~1) # constant
```
Let's fit that model:
```{r, message=FALSE, warning=FALSE}
model.het = mark(mw.proc,mw.ddl,model.parameters=list(Phi=phi.ct,p=p.mix,pi=pi.dot),output = FALSE,delete=T)
```
Now how to decide whether heterogeneity is important? The cool thing is that it's fine to use AIC-type criteria to compare models with/without heterogeneity [(Cubaynes et al. 2012)](https://dl.dropboxusercontent.com/u/23160641/my-pubs/Cubaynesetal2011MEE.pdf). So let's fit the same model with a homogeneous recapture probability and compare the AICc values:
```{r, message=FALSE, warning=FALSE}
mw.proc2 = process.data(mw.dat, model="CJS")
mw.ddl2 = make.design.data(mw.proc2)
model.hom = mark(mw.proc2,mw.ddl2,model.parameters=list(Phi=phi.ct,p=p.ct),output = FALSE,delete=T)
```
Compare the AICc values:
```{r, message=FALSE, warning=FALSE}
summary(model.het)$AICc
summary(model.hom)$AICc
```
Sounds like there is some heterogeneity. Let's have a look at the parameter estimates of the model with heterogeneity:
```{r, message=FALSE, warning=FALSE}
model.het$results$real
```
The proportion of individuals in mixture 1 is:
```{r, message=FALSE, warning=FALSE}
prop = model.het$results$real[1,1]
prop
```
with detection probability:
```{r, message=FALSE, warning=FALSE}
p1 = model.het$results$real[3,1]
p1
```
For the other mixture, the proportion is the complement (i.e., `1 - prop`) and the detection probability is:
```{r, message=FALSE, warning=FALSE}
p2 = model.het$results$real[4,1]
p2
```
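Computed explicitly, the proportion in the second mixture class is simply the complement:
```{r, message=FALSE, warning=FALSE}
# proportion of individuals in mixture 2 (complement of prop)
1 - prop
```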
Lastly, survival is:
```{r, message=FALSE, warning=FALSE}
phi = model.het$results$real[2,1]
phi
```
Now let's use the bootstrap to get confidence intervals (and median) for abundance.
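Before diving into the code, here is a sketch of the quantities computed in the loop below, mirroring the comments in the code (with $u_t$ the number of newly marked individuals at occasion $t$ and $\pi$ the proportion in mixture class 1). Each bootstrap abundance estimate combines the expected numbers of unmarked and already-marked individuals:
$$\hat{N}_t = \hat{U}_t + \hat{M}_t, \qquad \hat{U}_t = \pi\,\frac{u_t}{p_1} + (1-\pi)\,\frac{u_t}{p_2}, \qquad \hat{M}_t = \sum_{j=1}^{t-1} u_j \left[\pi \prod_{l=j}^{t-1} \phi_l + (1-\pi)\prod_{l=j}^{t-1} \phi_l \right],$$
where the two terms in $\hat{M}_t$ coincide here because survival is shared across mixture classes (see the to-do list about checking these calculations and making them generic).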
We will need a function to spot the first detections in the encounter histories:
```{r, message=FALSE, warning=FALSE}
# returns the occasion of first detection, i.e. the position of the first 1 in a 0/1 vector
firstdetection = function(x){
  b = sort(x,index.return=T,decreasing=T) # stable decreasing sort puts the 1s first
  res = b$ix[1] # index of the first 1
  return(res)}
```
We also need to convert the encounter histories into a matrix of 0s and 1s, with one column per occasion:
```{r, message=FALSE, warning=FALSE}
mw.dat.spaces = matrix(as.numeric(unlist(strsplit(mw.dat$ch, ''))),nrow=nrow(mw.dat),byrow=T)
```
Let's run the bootstrap:
```{r, message=FALSE, warning=FALSE}
Nhet = NULL # initialization
nb_bootstrap = 10 # nb of bootstrap iterations (should be 500 or 1000!)
for (i in 1:nb_bootstrap){
  # resample in original dataset
  mask = sample.int(nrow(mw.dat.spaces),replace=T)
  pseudodata = mw.dat.spaces[mask,]
  # nb of ind detected per occasion
  cijdata = apply(pseudodata,2,sum)
  # first detection
  d = apply(pseudodata,1,firstdetection)
  # u (newly marked)
  udata = NULL
  for(kk in 1:ncol(pseudodata)) {udata[kk] = sum(d == kk)}
  # m (already marked)
  mdata = cijdata - udata
  # delete first occasion
  cij = cijdata[-1]
  m = mdata[-1]
  u = udata[-1]
  # expected newly marked (Cubaynes et al. 2010)
  bigU <- matrix(0,nrow=length(p1),ncol=length(u)) # here length(p1) = 1, but if time-dependent, length(p1) > 1
  for(zz in 1:ncol(bigU)) {
    bigU[,zz] = (1-prop) * u[zz] / p2 + prop * u[zz] / p1
  }
  # expected already marked (Cubaynes et al. 2010)
  # M2 = u1 (pi phi1 + (1-pi) phi1)
  # M3 = u1 (pi phi1 phi2 + (1-pi) phi1 phi2) + u2 (pi phi2 + (1-pi) phi2)
  # M4 = u1 (pi phi1 phi2 phi3 + (1-pi) phi1 phi2 phi3) + u2 (pi phi2 phi3 + (1-pi) phi2 phi3) + u3 (pi phi3 + (1-pi) phi3)
  # ...
  surv <- matrix(rep(phi,length(u)),ncol=length(u),byrow=T) # to be modified if phi is time-dependent
  bigM <- matrix(0,nrow=nrow(surv),ncol=ncol(surv))
  for(ii in 1:nrow(bigM)) {
    for(t in 1:ncol(bigM)) {
      temp <- rep(NA,t)
      for(j in 1:t) {
        temp[j] <- u[j] * ((1-prop) * prod(surv[ii,1:j]) + prop * prod(surv[ii,1:j]))
      }
      bigM[ii,t] = sum(temp)
    }
  }
  # compute abundance estimate for current bootstrap sample
  Nhet <- rbind(Nhet,bigU + bigM)
}
```
Get median and confidence interval:
```{r, message=FALSE, warning=FALSE}
res <- apply(Nhet,2,quantile,c(2.5,50,97.5)/100)
res
```
Let's do a nice plot:
```{r, message=FALSE, warning=FALSE}
library(ggplot2)
mp = data.frame(year = 1:ncol(Nhet), Nhat = res[2,])
N = nrow(mp)
predframe <- with(mp,data.frame(year,Nhat,lwr=res[1,],upr=res[3,]))
# use a new name for the plot object so we do not overwrite the detection probability p1
(gplot <- ggplot(mp, aes(year, Nhat)) +
  geom_point() +
  geom_line(data = predframe) +
  geom_ribbon(data = predframe, aes(ymin = lwr, ymax = upr), alpha = 0.3) +
  ylab("Estimated abundance") + xlab("Year"))
```
## Model-averaging abundance estimates
If you remember the first section, the 4 models we fitted to `dataset1` had close AICc values. In this situation, multi-model inference is recommended. Following Buckland et al. (1997), we use the bootstrap to obtain model-averaged abundance estimates.
First, load RMark and read in the data:
```{r, echo=TRUE, message=FALSE, warning=FALSE}
rm(list=ls(all=TRUE))
library(RMark)
hw.dat = import.chdata("dataset1.txt", header = F, field.names = c("ch"), field.types = NULL)
```
Define the effects on parameters:
```{r, message=FALSE, warning=FALSE}
# define parameter structure
Phi.ct = list(formula=~1) # constant survival
Phi.time = list(formula=~time) # year effect on survival
p.ct = list(formula=~1) # constant detection
p.time = list(formula=~time) # year effect on detection
```
Let's define a few quantities we will need later on:
```{r, message=FALSE, warning=FALSE}
nb_bootstrap = 10 # use 500 or 1000 instead
nb_years = 8
target = data.frame(hw.dat,stringsAsFactors=F)
set.seed(5)
pseudo = target # initialization
# storage quantities
outlist <- vector("list", nb_bootstrap)
outdata <- vector("list", nb_bootstrap)
outreal <- vector("list", nb_bootstrap)
outnhat <- vector("list", nb_bootstrap)
```
Let's run the bootstrap:
```{r, message=FALSE, warning=FALSE}
for (k in 1:nb_bootstrap){
  # resample in the original dataset with replacement
  pseudo$ch = sample(target$ch, replace=T)
  # define model structure
  hw.proc = process.data(pseudo, model="CJS")
  hw.ddl = make.design.data(hw.proc)
  # fit all 4 models with MARK
  # constant survival, constant recapture
  Model.1 = mark(hw.proc,hw.ddl,model.parameters=list(Phi=Phi.ct,p=p.ct),output = FALSE,delete=T)
  # constant survival, time-dependent recapture
  Model.2 = mark(hw.proc,hw.ddl,model.parameters=list(Phi=Phi.ct,p=p.time),output = FALSE,delete=T)
  # time-dependent survival, constant recapture
  Model.3 = mark(hw.proc,hw.ddl,model.parameters=list(Phi=Phi.time,p=p.ct),output = FALSE,delete=T)
  # time-dependent survival, time-dependent recapture
  Model.4 = mark(hw.proc,hw.ddl,model.parameters=list(Phi=Phi.time,p=p.time),output = FALSE,delete=T)
  # gather results
  all.models <- collect.models()
  # store bootstrap sample and model table
  outdata[[k]] <- pseudo
  outlist[[k]] <- all.models$model.table
  # get best model
  ind_best_mod = row.names(all.models$model.table)[1]
  name_best_mod = paste('Model.',ind_best_mod,sep='')
  # get recapture estimates from the best model
  # if constant, the same value is duplicated; if time-dependent, take the whole vector
  mle.p = get.real(get(name_best_mod),"p", se= TRUE)$estimate[1:(nb_years-1)]
  outreal[[k]] = mle.p
  # get abundance estimates
  allobs = gregexpr("1", pseudo$ch)
  n = summary(as.factor(unlist(allobs)))
  nhat = n[-1]/mle.p
  outnhat[[k]] = nhat
}
```
Convert the list of abundance estimates into a matrix, and calculate summaries over the bootstrap iterations:
```{r, message=FALSE, warning=FALSE}
res = matrix(unlist(outnhat),nrow=length(outnhat),byrow=T)
apply(res,2,mean) # mean estimate
apply(res,2,median) # median estimate
apply(res,2,quantile,probs=c(2.5,97.5)/100) # confidence interval
```
This is not exactly what Buckland et al. (1997) advocate (see the last paragraph of section 3 in their paper), but it will do for now.
## To do
* check the calculations for `bigU` and `bigM` and make them generic (what if survival is time-dependent?)
* add Jolly-Seber as in [Karamanlidis et al. (2015)](https://dl.dropboxusercontent.com/u/23160641/my-pubs/Karamanlidisetal2015-Arcturos.pdf)
* add robust-design as in papers currently in review (including model selection with the bootstrap).
## References
* Buckland, S.T., K.P. Burnham, and N.H. Augustin (1997). Model selection: an integral part of inference. Biometrics 53: 603–618.
* Cubaynes, S., C. Lavergne, E. Marboutin, and O. Gimenez (2012). Assessing individual heterogeneity using model selection criteria: How many mixture components in capture-recapture models? Methods in Ecology and Evolution 3: 564–573.
* Cubaynes, S., R. Pradel, R. Choquet, C. Duchamp, J.-M. Gaillard, J.-D. Lebreton, E. Marboutin, C. Miquel, A.-M. Reboulet, C. Poillot, P. Taberlet, and O. Gimenez (2010). Importance of accounting for detection heterogeneity when estimating abundance: the case of French wolves. Conservation Biology 24: 621–626.
* Gimenez, O. and R. Choquet (2010). Incorporating individual heterogeneity in studies on marked animals using numerical integration: capture-recapture mixed models. Ecology 91: 951–957.
* Karamanlidis, A.A., M. de Gabriel Hernando, L. Krambokoukis, and O. Gimenez (2015). Evidence of a large carnivore population recovery: counting bears in Greece. Journal for Nature Conservation 27: 10–17.
* Madon, B., C. Garrigue, R. Pradel, and O. Gimenez (2013). Transience in the humpback whale population of New Caledonia and implications for abundance estimation. Marine Mammal Science 29: 669–678.