-
Notifications
You must be signed in to change notification settings - Fork 0
/
model_3.rnw
148 lines (123 loc) · 5.38 KB
/
model_3.rnw
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
Systematic error is something that is typically difficult to deal with in statistical analysis, but is easy here. First, let's examine the case when the systematic error is proportional to \code{x}. Imagine we have two data sets with qualitatively different slopes:
<<fig.width=6, fig.height=5>>=
## generate some data
slope = 2
intercept = 5
N1 = 20
shift1 = 0.7;
obs_x1 = runif(N1,0,10)
err_y1 = runif(N1, 0.1, 0.2)
obs_y1 = shift1 * rnorm(N1, slope * obs_x1 + intercept, err_y1)
N2 = 30
shift2 = 1.3;
obs_x2 = runif(N2,0,10)
err_y2 = runif(N2, 0.1, 0.2)
obs_y2 = shift2 * rnorm(N2, slope * obs_x2 + intercept, err_y2)
plot(obs_x1, obs_y1, xlim = c(0,10),ylim = c(4,35), xlab = "x", ylab = "y")
points(obs_x2, obs_y2, pch=19)
segments(obs_x1, obs_y1 - 2*err_y1, obs_x1, obs_y1 + 2*err_y1)
segments(obs_x2, obs_y2 - 2*err_y2, obs_x2, obs_y2 + 2*err_y2)
@
This could be caused by, say, the result of a lab instrument that became miscalibrated between the data collections. Some researchers would take a weighted average of some kind to combine the two datasets, but that method is statistically dubious at best. Let's construct and run a model that includes a multiplicative factor for the two datasets:
<<results="hide", cache=TRUE>>=
model_string = "
model {
## priors:
a ~ dunif(0,5)
b ~ dunif(-10,10)
## shifts must be > 0. Choose centered near 1 (0% shift), with ~0.1 = 10% spread
multiplier1 ~ dlnorm(log(1.0), pow(log(1.1),-2))
multiplier2 ~ dlnorm(log(1.0), pow(log(1.1),-2))
## structure:
for (i in 1:length(obs_x1)) {
y1[i] = multiplier1 * (a*obs_x1[i] + b)
obs_y1[i] ~ dnorm(y1[i], pow(err_y1[i],-2))
}
for (i in 1:length(obs_x2)) {
y2[i] = multiplier2 * (a*obs_x2[i] + b)
obs_y2[i] ~ dnorm(y2[i], pow(err_y2[i],-2))
}
}
"
model = jags.model(file = textConnection(model_string),
data = list('obs_x1' = obs_x1,
'obs_x2' = obs_x2,
'obs_y1' = obs_y1,
'obs_y2' = obs_y2,
'err_y1' = err_y1,
'err_y2' = err_y2),
n.chains = 1,
n.adapt = 1000,
inits = list('.RNG.name' = 'base::Mersenne-Twister',
'.RNG.seed' = 1))
output = coda.samples(model = model,
variable.names = c("a", "b", "multiplier1", "multiplier2"),
n.iter=500000,
#n.iter =10000,
thin=1)
@
Note that the number of iterations has been increased to 500,000 from the standard 10,000. This model's added complexity required more iterations for the Markov chains to converge. The posterior distributions are amazingly accurate:
<<dev='png', dev.args=list(type="cairo"), dpi=200, cache=TRUE>>=
plot(output)
@
The slope, intercept, and both systematic error multipliers are all centered exactly on their true values. Now, let's look at an additive systematic error. Imagine we have two datasets, each with a positive systematic error (implying that taking a weighted average would not help very much).
<<fig.width=6, fig.height=5>>=
## generate some data
slope = 2
intercept = 5
N1 = 20
shift1 = 1.5;
obs_x1 = runif(N1,0,10)
err_y1 = runif(N1, 0.1, 0.2)
obs_y1 = shift1 + rnorm(N1, slope * obs_x1 + intercept, err_y1)
N2 = 30
shift2 = 3;
obs_x2 = runif(N2,0,10)
err_y2 = runif(N2, 0.1, 0.2)
obs_y2 = shift2 + rnorm(N2, slope * obs_x2 + intercept, err_y2)
plot(obs_x1, obs_y1, xlim = c(0,10),ylim = c(5,25), xlab = "x", ylab = "y")
points(obs_x2, obs_y2, pch=19)
segments(obs_x1, obs_y1 - 2*err_y1, obs_x1, obs_y1 + 2*err_y1)
segments(obs_x2, obs_y2 - 2*err_y2, obs_x2, obs_y2 + 2*err_y2)
@
The model will be similar to before:
<<results="hide", cache=TRUE>>=
model_string = "
model {
## priors:
a ~ dunif(0,5)
b ~ dunif(-10,10)
shift1 ~ dnorm(0, 1)
shift2 ~ dnorm(0, 1)
## structure:
for (i in 1:length(obs_x1)) {
y1[i] = shift1 + (a*obs_x1[i] + b)
obs_y1[i] ~ dnorm(y1[i], pow(err_y1[i],-2))
}
for (i in 1:length(obs_x2)) {
y2[i] = shift2 + (a*obs_x2[i] + b)
obs_y2[i] ~ dnorm(y2[i], pow(err_y2[i],-2))
}
}
"
model = jags.model(file = textConnection(model_string),
data = list('obs_x1' = obs_x1,
'obs_x2' = obs_x2,
'obs_y1' = obs_y1,
'obs_y2' = obs_y2,
'err_y1' = err_y1,
'err_y2' = err_y2),
n.chains = 1,
n.adapt = 1000,
inits = list('.RNG.name' = 'base::Mersenne-Twister',
'.RNG.seed' = 1))
output = coda.samples(model = model,
variable.names = c("a", "b", "shift1", "shift2"),
n.iter = 500000,
thin=1)
@
Again the number of iterations has been increased to 500,000.
<<dev='png', dev.args=list(type="cairo"), dpi=200, cache=TRUE>>=
plot(output)
@
This time, the results are rather disappointing. The systematic error posteriors are not centered near the true values at all. The intercept posterior has been shifted as well. This is not too unexpected-- we didn't give it much to work with, and really no additive shift would be unreasonable to report here.