The basic model we assumed last time is rarely adequate for real-life data analysis. In general, a dataset carries measurement error (not necessarily constant across measurements) in addition to scatter from other, unmeasured variables. Let's generate some data that reflect this:
<<>>=
N = 25
## true underlying linear relation, with intrinsic scatter in y:
true_x = runif(N,0,10)
true_slope = 1
true_intercept = 0
scatter = 1
true_y = rnorm(N, true_slope * true_x + true_intercept, scatter)
## known measurement uncertainties
x_sigma = rlnorm(N, -2, 0.5)
y_sigma = rlnorm(N, -2, 0.5)
## observed values: true values plus Gaussian measurement noise
obs_x = rnorm(N, true_x, x_sigma)
obs_y = rnorm(N, true_y, y_sigma)
@
<<fig.pos="H", fig.height=5, fig.width=5>>=
plot(obs_x, obs_y)
segments(obs_x, obs_y - 2*y_sigma, obs_x, obs_y + 2*y_sigma)
segments(obs_x - 2*x_sigma, obs_y, obs_x + 2*x_sigma, obs_y)
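## for reference, overlay the noiseless relation used to generate the
## data (dashed); true_intercept and true_slope are defined above
abline(true_intercept, true_slope, lty = 2)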
@
Clearly, the scatter of the points is larger than the individual measurement error bars can account for, implying that one or more unmeasured variables are influencing the y values. The model we'll fit is a linear relationship between y and x, with Gaussian measurement error on both x and y (the unknown true x values are given a uniform prior) and additional Gaussian scatter in y. The JAGS code is as follows:
<<>>=
model_string = "
model {
## priors:
a ~ dunif(0,5)
b ~ dunif(-10,10)
scatter ~ dunif(0,3)
## structure:
for (i in 1:N) {
## the true x:
x[i] ~ dunif(0,10)
## the observed x:
obs_x[i] ~ dnorm(x[i], pow(x_sigma[i],-2))
## y, as it would be if it only depended on the true x:
y[i] = a*x[i] + b
## y, with the effect of the unmeasured confounding variable
y_scatter[i] ~ dnorm(y[i], pow(scatter,-2))
## y, with the confounding variable, with observational error:
obs_y[i] ~ dnorm(y_scatter[i], pow(y_sigma[i],-2))
}
}
"
@
Now we proceed as before: feed the data and model structure into JAGS and look at the output:
<<results="hide">>=
## rjags provides the interface to JAGS (and attaches coda):
library(rjags)
model = jags.model(file = textConnection(model_string),
                   data = list('obs_x' = obs_x,
                               'x_sigma' = x_sigma,
                               'obs_y' = obs_y,
                               'y_sigma' = y_sigma,
                               'N' = N),
                   n.chains = 1,
                   n.adapt = 1000,
                   inits = list('.RNG.name' = 'base::Mersenne-Twister',
                                '.RNG.seed' = 1))
output = coda.samples(model = model,
                      variable.names = c("a", "b", "scatter"),
                      n.iter = 1000,
                      thin = 1)
@
<<fig.pos="H", fig.height=5, fig.width=5>>=
plot(output)
@
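The trace plots and density estimates give a qualitative picture; for a quantitative check, coda's summary() method (coda is attached alongside rjags) reports posterior means, standard deviations, and quantiles for each monitored parameter:
<<>>=
## posterior summaries for a, b, and scatter:
summary(output)
@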
Again, the model has recovered the parameters used to generate the data. What would happen if we ran the same code but increased the measurement uncertainties?
<<>>=
## these values are the same as in the previous model:
#N = 25
#true_x = runif(N,0,10)
#true_slope = 1
#true_intercept = 0
#scatter = 1
#true_y = rnorm(N, true_slope * true_x + true_intercept, scatter)
## known measurement uncertainties (much larger than before)
x_sigma = rlnorm(N, 1, 0.5)
y_sigma = rlnorm(N, 1, 0.5)
obs_x = rnorm(N, true_x, x_sigma)
obs_y = rnorm(N, true_y, y_sigma)
@
<<fig.pos="H", fig.height=4, fig.width=4>>=
plot(obs_x, obs_y)
segments(obs_x, obs_y - 2*y_sigma, obs_x, obs_y + 2*y_sigma)
segments(obs_x - 2*x_sigma, obs_y, obs_x + 2*x_sigma, obs_y)
@
This time the error bars are much larger, and one might expect that very little information can be extracted from these data. Using the same model as before, let's see what the output looks like:
<<echo=F>>=
model_string = "
model {
## priors:
a ~ dunif(0,5)
b ~ dunif(-10,10)
scatter ~ dunif(0,3)
## structure:
for (i in 1:N) {
## the true x:
x[i] ~ dunif(0,10)
## the observed x:
obs_x[i] ~ dnorm(x[i], pow(x_sigma[i],-2))
## y, as it would be if it only depended on the true x:
y[i] = a*x[i] + b
## y, with the effect of the unmeasured confounding variable
y_scatter[i] ~ dnorm(y[i], pow(scatter,-2))
## y, with the confounding variable, with observational error:
obs_y[i] ~ dnorm(y_scatter[i], pow(y_sigma[i],-2))
}
}
"
@
<<echo=F,results="hide">>=
model = jags.model(file = textConnection(model_string),
                   data = list('obs_x' = obs_x,
                               'x_sigma' = x_sigma,
                               'obs_y' = obs_y,
                               'y_sigma' = y_sigma,
                               'N' = N),
                   n.chains = 1,
                   n.adapt = 1000,
                   inits = list('.RNG.name' = 'base::Mersenne-Twister',
                                '.RNG.seed' = 1))
output = coda.samples(model = model,
                      variable.names = c("a", "b", "scatter"),
                      n.iter = 10000,
                      thin = 1)
@
<<fig.pos="H", fig.height=5, fig.width=5, echo=F>>=
plot(output)
@
Here, the number of MCMC iterations has been increased to 10,000; with the 1,000 iterations used before, the trace plots showed poor mixing. The slope and intercept posteriors are less smooth and wider than before, and the scatter posterior peaks far from the true value. As expected, uninformative data yield uninformative posteriors.
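One way to quantify how well a chain is mixing is its effective sample size, again via coda (a minimal sketch; values far below n.iter indicate strong autocorrelation in the chain):
<<>>=
## effective number of independent draws for each monitored parameter:
effectiveSize(output)
@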