The basic model we assumed last time is rarely adequate for real-life data analysis. In general, a dataset carries measurement error (not necessarily constant across measurements) in addition to scatter from other, unmeasured variables. Let's generate some data that reflect this:
<<>>=
N = 25
## true underlying linear relation, with intrinsic scatter in y:
true_x = runif(N,0,10)
true_slope = 1
true_intercept = 0
scatter = 1
true_y = rnorm(N, true_slope * true_x + true_intercept, scatter)
## known measurement uncertainties
x_sigma = rlnorm(N, -2, 0.5)
y_sigma = rlnorm(N, -2, 0.5)
## observed values: true values plus Gaussian measurement noise
obs_x = rnorm(N, true_x, x_sigma)
obs_y = rnorm(N, true_y, y_sigma)
@
<<fig.pos="H", fig.height=5, fig.width=5>>=
plot(obs_x, obs_y)
segments(obs_x, obs_y - 2*y_sigma, obs_x, obs_y + 2*y_sigma)
segments(obs_x - 2*x_sigma, obs_y, obs_x + 2*x_sigma, obs_y)
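## for reference, overlay the noiseless relation used to generate the
## data (dashed); true_intercept and true_slope are defined above
abline(true_intercept, true_slope, lty = 2)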
@
Clearly, the scatter of the points is larger than the individual measurement error bars can account for, implying that one or more unmeasured variables are influencing the y values. The model we'll fit is a linear relationship between y and x, with Gaussian measurement error on both x and y (the unknown true x values are given a uniform prior) and additional Gaussian scatter in y. The JAGS code is as follows:
<<>>=
model_string = "
model {
## priors:
a ~ dunif(0,5)
b ~ dunif(-10,10)
scatter ~ dunif(0,3)
## structure:
for (i in 1:N) {
## the true x:
x[i] ~ dunif(0,10)
## the observed x:
obs_x[i] ~ dnorm(x[i], pow(x_sigma[i],-2))
## y, as it would be if it only depended on the true x:
y[i] = a*x[i] + b
## y, with the effect of the unmeasured confounding variable
y_scatter[i] ~ dnorm(y[i], pow(scatter,-2))
## y, with the confounding variable, with observational error:
obs_y[i] ~ dnorm(y_scatter[i], pow(y_sigma[i],-2))
}
}
"
@
Now we proceed as before: feed the data and model structure into JAGS and look at the output:
<<results="hide">>=
## rjags provides the interface to JAGS (and attaches coda):
library(rjags)
model = jags.model(file = textConnection(model_string),
                   data = list('obs_x' = obs_x,
                               'x_sigma' = x_sigma,
                               'obs_y' = obs_y,
                               'y_sigma' = y_sigma,
                               'N' = N),
                   n.chains = 1,
                   n.adapt = 1000,
                   inits = list('.RNG.name' = 'base::Mersenne-Twister',
                                '.RNG.seed' = 1))
output = coda.samples(model = model,
                      variable.names = c("a", "b", "scatter"),
                      n.iter = 1000,
                      thin = 1)
@
<<fig.pos="H", fig.height=5, fig.width=5>>=
plot(output)
@
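The trace plots and density estimates give a qualitative picture; for a quantitative check, coda's summary() method (coda is attached alongside rjags) reports posterior means, standard deviations, and quantiles for each monitored parameter:
<<>>=
## posterior summaries for a, b, and scatter:
summary(output)
@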
Again, the model has recovered the parameters used to generate the data. What would happen if we ran the same code but increased the measurement uncertainties?
<<>>=
## these values are the same as in the previous model:
#N = 25
#true_x = runif(N,0,10)
#true_slope = 1
#true_intercept = 0
#scatter = 1
#true_y = rnorm(N, true_slope * true_x + true_intercept, scatter)
## known measurement uncertainties (much larger than before)
x_sigma = rlnorm(N, 1, 0.5)
y_sigma = rlnorm(N, 1, 0.5)
obs_x = rnorm(N, true_x, x_sigma)
obs_y = rnorm(N, true_y, y_sigma)
@
<<fig.pos="H", fig.height=4, fig.width=4>>=
plot(obs_x, obs_y)
segments(obs_x, obs_y - 2*y_sigma, obs_x, obs_y + 2*y_sigma)
segments(obs_x - 2*x_sigma, obs_y, obs_x + 2*x_sigma, obs_y)
@
This time the error bars are much larger, and one might expect that very little information can be extracted from these data. Using the same model as before, let's see what the output looks like:
<<echo=F>>=
model_string = "
model {
## priors:
a ~ dunif(0,5)
b ~ dunif(-10,10)
scatter ~ dunif(0,3)
## structure:
for (i in 1:N) {
## the true x:
x[i] ~ dunif(0,10)
## the observed x:
obs_x[i] ~ dnorm(x[i], pow(x_sigma[i],-2))
## y, as it would be if it only depended on the true x:
y[i] = a*x[i] + b
## y, with the effect of the unmeasured confounding variable
y_scatter[i] ~ dnorm(y[i], pow(scatter,-2))
## y, with the confounding variable, with observational error:
obs_y[i] ~ dnorm(y_scatter[i], pow(y_sigma[i],-2))
}
}
"
@
<<echo=F,results="hide">>=
model = jags.model(file = textConnection(model_string),
                   data = list('obs_x' = obs_x,
                               'x_sigma' = x_sigma,
                               'obs_y' = obs_y,
                               'y_sigma' = y_sigma,
                               'N' = N),
                   n.chains = 1,
                   n.adapt = 1000,
                   inits = list('.RNG.name' = 'base::Mersenne-Twister',
                                '.RNG.seed' = 1))
output = coda.samples(model = model,
                      variable.names = c("a", "b", "scatter"),
                      n.iter = 10000,
                      thin = 1)
@
<<fig.pos="H", fig.height=5, fig.width=5, echo=F>>=
plot(output)
@
Here, the number of MCMC iterations has been increased to 10,000; with the 1,000 iterations used before, the trace plots showed poor mixing. The slope and intercept posteriors are less smooth and wider than before, and the scatter posterior peaks far from the true value. As expected, uninformative data yield uninformative posteriors.
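One way to quantify how well a chain is mixing is its effective sample size, again via coda (a minimal sketch; values far below n.iter indicate strong autocorrelation in the chain):
<<>>=
## effective number of independent draws for each monitored parameter:
effectiveSize(output)
@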