-
Notifications
You must be signed in to change notification settings - Fork 18
/
growth-curves.Rmd
526 lines (347 loc) · 28.1 KB
/
growth-curves.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
# (PART\*) Part II: Common Extensions {-}
# Latent Growth Curves
```{r spline_curves, echo=FALSE}
library(mgcv)
set.seed(01)
## simulate data...
f0 <- function(x) 2 * sin(pi * x)
f1 <- function(x,a=2,b=-1) exp(a * x)+b
f2 <- function(x) 0.2 * x^11 * (10 * (1 - x))^6 + 10 *
(10 * x)^3 * (1 - x)^10
n <- 500; nf <- 25
fac <- sample(1:nf,n,replace=TRUE)
x0 <- runif(n); x1 <- runif(n); x2 <- runif(n)
a <- rnorm(nf)*.2 + 2; b <- rnorm(nf)*.5
f <- f0(x0) + f1(x1,a[fac],b[fac]) + f2(x2)
fac <- factor(fac)
y <- f + rnorm(n)*2
b <- gam(y ~ s(x0) + t2(
x1,
fac,
bs = c("tp", "re"),
k = 5,
full = TRUE
) +
s(x2, k = 20), method = "ML")
visibly::plot_gam_by(b, x1, fac) +
scico::scale_color_scico_d(palette = 'acton', alpha=.5) +
theme_trueMinimal() +
theme_void() +
theme(legend.position = 'none')
```
<span class="emph">Latent growth curve (LGC)</span> models are in a sense, just a different form of the very commonly used <span class="emph">mixed model</span> framework. In some ways they are more flexible, mostly in the standard structural equation modeling framework that allows for indirect, and other complex covariate relationships. In other ways, they are less flexible (e.g. requiring balanced data, estimating nonlinear relationships, data with many time points, dealing with time-varying covariates). With appropriate tools there is little one can't do with the normal mixed model approach relative to the SEM approach, and one would likely have easier interpretation. As such I'd recommend sticking with the standard mixed model framework unless you really need to.
That said, growth curve models are a very popular SEM technique, so it makes sense to become familiar with them. To best understand a growth curve model, I still think it's instructive to see it from the mixed model perspective, where things are mostly interpretable from what you know from a SLiM.
## Random effects
Often data is clustered, e.g. students within schools or observations for individuals over time. The standard linear model assumes *independent* observations, and in these situations we definitely do not have that.
One very popular way to deal with these are a class of models called <span class="emph">mixed effects models</span>, or simply mixed models. They are mixed, because there is a mixture of <span class="emph">fixed effects</span> and <span class="emph">random effects</span>. The fixed effects are the regression coefficients one has in standard modeling approaches.
The *random effects* allow each cluster to have its own unique effect in addition to the overall fixed effect. This is simply a random deviation, almost always normally distributed in practice, from the overall intercept and slopes. Mixed models are a balanced approach between ignoring these unique contributions, and over-contextualizing by running separate regressions for every cluster.
### Model formality
The following depicts a mixed model from a multilevel modeling context, which is just using a mixed model with possibly multiply nested grouping structures. The cluster that produce the random effects do not have to be hierarchical however, we just use the depiction here as it may be easier than the matrix approach.
As an example, consider the case of repeated measures within an individual or individuals clustered within a geographic region. To model some target variable $y$, within a group/cluster, each observation $i$ in that cluster $c$, will have a cluster specific intercept:
$$y_{ic} = b_{0c} + b_1*X_{ic} + e_{ic}$$
At this point it looks like a standard regression, and is conceptually. In addition, the cluster specific intercepts have overall means (the fixed effect) plus some cluster specific deviation (random effect). These two can be thought of as additional regression models for the intercepts and slopes.
$$b_{0c} = b_0 + u_c$$
These random effects $u$ are typically assumed normally distributed with some standard deviation, just like our 'error'.
$$u_c \sim N(0, \tau)$$
$$e_{ic} \sim N(0, \sigma)$$
Plugging in the cluster level model to the initial model, we get the following:
$$y_{ic} = [b_0 + u_c] + b1 * X_{ic} + e_{ic}$$
This is where the notion of 'random coefficients' comes from. We also might display the model as follows:
$$y_{ic} = b_0 + b_1 * X_i + [u_c + e_{ic}]$$
Now the focus is on a standard regression model with an additional source of variance.
This is just one depiction, we might allow the slope to vary also, have random effects from multiple clustering sources, cluster level covariates, allow the random intercepts and slopes to correlate, and a host of other interesting approaches. The goal here is to keep things within the standard regression context as much as possible.
## Random Effects in SEM
As we've seen with other models, the latent variables are assumed normally distributed, usually with zero mean, and some estimated variance. Well so are the random effects in mixed models, and it's through this that we can maybe start to get a sense of random effects as latent variables (or vice versa). Indeed, mixed models have ties to many other kinds of models (e.g. spatial, additive), because they too add a 'random' component to the model in some fashion.
## Simulating Random Effects
Through simulation we can demonstrate conceptual understanding of mixed models, and be well on our way toward better understanding LGC models. We'll have balanced data, with scores across four time-points for 500 individuals (subjects). We will only investigate the trend ('growth'), and allow subject-specific intercepts and slopes.
```{r randomEffectsSim}
set.seed(1234)
n = 500
timepoints = 4
time = rep(0:3, times=n)
subject = rep(1:n, each=4)
```
We'll have 'fixed' effects, i.e. our standard regression intercept and slope, set at .5 and .25 respectively. We'll allow their associated subject-specific intercept and slope to have a slight correlation (.2), and as such we'll draw them from a multivariate normal distribution (variance of 1 for both effects).
```{r createSimData}
intercept = .5
slope = .25
randomEffectsCorr = matrix(c(1,.2,.2,1), ncol=2)
randomEffects = MASS::mvrnorm(n, mu=c(0,0), Sigma = randomEffectsCorr, empirical=T) %>%
data.frame()
colnames(randomEffects) = c('Int', 'Slope')
```
Let's take a look at the data thus far.
```{r inspectSimData, echo=FALSE}
library(DT)
data.frame(Subject=subject, time=time, randomEffects[subject,]) %>%
# datatable(options=list(dom='t'), rownames=F, width='50%') %>% formatRound(~Int+Slope, 4) %>%
head(10) %>%
kable(format='html', row.names = F, digits = 4) %>%
kable_styling(full_width = F)
```
Now, to get a target variable, we simply add the random effects for the intercept to the overall intercept, and likewise for the slopes. Note how I'm using subject as a row index. This will spread out the n random effects to n*timepoints total, while being constant within a subject. We'll throw in some noise at the end with standard deviation equal to $\sigma$.
```{r createSimTarget}
sigma = .5
y1 = (intercept + randomEffects$Int[subject]) + # random intercepts
(slope + randomEffects$Slope[subject])*time + # random slopes
rnorm(n*timepoints, mean=0, sd=sigma)
d = data.frame(subject, time, y1)
```
```{r inspectSimData2, echo=FALSE}
# datatable(d, options=list(dom='t'), rownames=F, width='50%') %>% formatRound(~y1, 4)
d %>%
head(10) %>%
kable(format='html', row.names = F, digits = 4) %>%
kable_styling(full_width = F)
```
```{r spaghetti, echo=FALSE}
ggplot(d, aes(x=time, y=y1)) +
geom_path(aes(group=subject), alpha=.05) +
geom_smooth(method = 'lm', color='#ff5500') +
theme_trueMinimal()
```
Let's estimate this as a mixed model first[^reml] using the <span class="pack">lme4</span> package. See if you can match the parameters from our simulated data to the output.
```{r mixedModel, echo=1:3, eval=c(1:2,4)}
library(lme4)
mixedModel = lmer(y1 ~ time + (1 + time|subject), data=d) # 1 represents the intercept
summary(mixedModel)
summary(mixedModel, correlation=F)
```
Our fixed effects are at the values we set for the overall intercept and slope. The estimated random effects variances are at 1, the correlation near .2, and finally, our residual standard deviation is near the .5 value we set.
## Running a Growth Curve Model
As before, we'll use <span class="pack">lavaan</span>, but now the syntax will look a bit strange compared to what we're used to with our prior SEM, because we have to fix the factor loadings to specific values in order to make it work. This also leads to non-standard output relative to other SEM models, as there is nothing to estimate for the many fixed parameters.
Specifically, we'll have a latent variable representing the random intercepts, as well as one representing the random slopes. All loadings for the intercept factor are 1[^likematrix]. The loadings for the effect of time are arbitrary, but should accurately reflect the time spacing, and typically it is good to start at zero, so that the zero has a meaningful interpretation.
```{r growth_graph, echo=FALSE}
# note to self Diagrammer is difficult at best for multifactor loading situations, mostly because there is no control over label placement
tags$div(style="width:50%; margin:auto auto; font-size:50%",
grViz('scripts/growth.gv', width='100%', height='25%')
)
```
<br>
As can probably be guessed, additionally our data needs to be in *wide* format, where each row represents a person and we have separate columns for each time point of the target variable, as opposed to the *long* format we used in the previous mixed model. We can use the <span class="func">spread</span> function from <span class="pack">tidyr</span> to help with that.
```{r wideData, echo=-1}
head(d)
# also change the names, as usually things don't work well if they start with a
# number. yes, it is the 21st century.
dWide = d %>%
spread(time, y1) %>%
rename_at(vars(-subject), function(x) paste0('y', x))
head(dWide)
```
Now we're ready to run the model. Note that <span class="pack">lavaan</span> has a specific function, <span class="func">growth</span>, to use for these models. It doesn't spare us any effort for the model syntax, but does make it unnecessary to set various things with the <span class="func">sem</span> function.
```{r runGrowth}
model = "
# intercept and slope with fixed coefficients
i =~ 1*y0 + 1*y1 + 1*y2 + 1*y3
s =~ 0*y0 + 1*y1 + 2*y2 + 3*y3
"
growthCurveModel = growth(model, data=dWide)
summary(growthCurveModel)
```
Most of the output is blank, which is needless clutter, but we do get the same five parameter values we are interested in though.
Start with the 'intercepts':
```{}
Intercepts:
Estimate Std.Err Z-value P(>|z|)
i 0.487 0.048 10.072 0.000
s 0.267 0.045 5.884 0.000
```
It might be odd to call your fixed effects 'intercepts', but it makes sense if we are thinking of it as a multilevel model as depicted previously, where we actually broke out the random effects as a separate model. The estimates here are pretty much spot on with our mixed model estimates, which are identical to just the standard regression estimates.
```{r fixefs}
fixef(mixedModel)
lm(y1 ~ time, data=d)
```
Now let's look at the variance estimates. The estimation of residual variance for each y in the LGC distinguishes the two approaches, but not necessarily so. We could fix them to be identical here, or conversely allow them to be estimated in the mixed model framework. Just know that's why the results are not identical (to go along with their respective estimation approaches, which are also different by default). Again though, the variances are near one, and the correlation between the intercepts and slopes is around the .2 value.
```{}
Covariances:
Estimate Std.Err Z-value P(>|z|)
i ~~
s 0.226 0.050 4.512 0.000
Variances:
Estimate Std.Err Z-value P(>|z|)
y0 0.287 0.041 6.924 0.000
y1 0.219 0.021 10.501 0.000
y2 0.185 0.027 6.748 0.000
y3 0.357 0.065 5.485 0.000
i 0.977 0.076 12.882 0.000
s 0.969 0.065 14.841 0.000
```
```{r ranefVAr}
VarCorr(mixedModel)
```
The differences provide some insight. LGC by default assumes heterogeneous variance for each time point. Mixed models by default assume the same variance for each time point, but can allow them to be estimated separately in most modeling packages.
As an example, if we fix the variances to be equal, the models are now identical.
```{r compareMixedLGC, results='hold'}
model = "
# intercept and slope with fixed coefficients
i =~ 1*y0 + 1*y1 + 1*y2 + 1*y3
s =~ 0*y0 + 1*y1 + 2*y2 + 3*y3
y0 ~~ resvar*y0
y1 ~~ resvar*y1
y2 ~~ resvar*y2
y3 ~~ resvar*y3
"
growthCurveModel = growth(model, data=dWide)
summary(growthCurveModel)
```
Compare to the <span class="pack">lme4</span> output.
```{r compareVar, echo=F}
print(VarCorr(mixedModel), comp='Var')
```
In addition, the random coefficients estimates from the mixed model perfectly correlate with those of the latent variables. I obtained the latent variable scores using <span class="func">lavPredict</span>.
```{r compareRandomeffects, echo=FALSE}
ranefLatent = data.frame(coef(mixedModel)[[1]], lavPredict(growthCurveModel)) %>%
rename(Int_mix = X.Intercept.,
Slope_mix = time,
Int_lgc = i,
Slope_lgc = s)
ranefLatent %>% round(2) %>% head
ranefLatent %>% cor %>% round(2)
```
Both approaches allow those residuals to covary, though it gets tedious in SEM syntax, while it is a natural extension in the mixed model framework. Here is the syntax for letting each time point covary with the next, at least, what it might be. It's unclear if <span class="pack">lavaan</span> actually will do this, and the syntax here roughly follows the Mplus manual, except that we can't define new variables in <span class="pack">lavaan</span> as there. As such, the hope is that the `a` parameter should equal `resvar*corr` as in the Mplus syntax, but there's not a clear way to fix it to be. It seems consistent here and with larger sample sizes.
```{r lavaan_res_cor_attempt, echo=1}
model = "
# intercept and slope with fixed coefficients
i =~ 1*y0 + 1*y1 + 1*y2 + 1*y3
s =~ 0*y0 + 1*y1 + 2*y2 + 3*y3
# all of the following is needed for what are essentially only two parameters
# to estimate- resvar and correlation (the latter defined explicitly here)
y0 ~~ resvar*y0
y1 ~~ resvar*y1
y2 ~~ resvar*y2
y3 ~~ resvar*y3
# timepoints 1 step apart; technically the covariance is e.g. a*sqrt(y0)*sqrt(y1),
# but since the variances are constrained to be equal, we don't have to be so verbose.
y0 ~~ a*y1
y1 ~~ a*y2
y2 ~~ a*y3
# two steps apart
y0 ~~ b*y2
y1 ~~ b*y3
# three steps apart
y0 ~~ c*y3
# fix parameters according to ar1
b == a^2
c == a^3
"
gcOut = growth(model, data=dWide)
summary(gcOut, standardized=T)
library(nlme)
lmeOut = lme(y1 ~ time, random= ~1 + time|subject, data=d, correlation=corAR1(form=~ time | subject), method='ML')
summary(lmeOut, correlation=F) # compare Residual StdDev to resvar^.5, and phi to 'a/resvar' or std.all 'a' from lavaan
```
```{r brmcomparison, echo=FALSE, eval=F}
test = brm(y1 ~ time + (1+time|subject), d, cores=4)
test = brm(y1 ~ time + (1+time|subject), autocor = cor_ar(~time|subject), data= d, cores=4)
summary(test)
```
## Thinking more generally about regression
In fact, your standard regression is already equipped to handle heterogeneous variances and a specific correlation structure for the residuals. The linear model can be depicted as the following:
$$y \sim N(X\beta, \Sigma)$$
$X\beta$ represents the linear predictor, i.e. the linear combination of your predictors, and a big, N by N covariance matrix $\Sigma$. Thus the target variable $y$ is multivariate normal with mean vector $X\beta$ and covariance $\Sigma$.
SLiMs assume that the covariance matrix is constant diagonal. A single value on the diagonal, $\sigma^2$, and zeros on the off-diagonals. Mixed models, and other approaches as well, can allow the covariance structure to be specified in myriad ways, and it ties them to still other models, which in the end produces a very flexible modeling framework.
## More on LGC
### LGC are non-standard SEM
In no other SEM situation are you likely to fix so many parameters or think about your latent variables in this manner. This can make for difficult interpretations relative to the mixed model (unless you are aware of the parallels).
### Residual correlations
Typical models that would be investigated with LGC have correlated residuals as depicted above.
### Nonlinear time effect
A nonlinear time effect can be estimated if we don't fix all the parameters for the slope factor. As an example, the following would actually estimate the loadings for times in between the first and last point.
```{r nonlinearTime, eval=FALSE}
s =~ 0*y0 + y1 + y2 + 1*y3
```
It may be difficult to assess nonlinear relationships unless one has many time points[^nonlinearfewtimepts], and even then, one might get more with an additive mixed model approach.
### Growth Mixture Models
Adding a latent categorical variable would allow for different trajectories across the latent groups. Most clients that I've seen typically did not have enough data to support it, as one essentially can be estimating a whole growth model for each group. Some might restrict certain parameters for certain groups, but given that the classes are a latent construct to be discovered, there would not be a theoretical justification to do so, and it would only complicate interpretation at best. Researchers rarely if ever predict test data, nor provide evidence that the clusters hold up with alternate data. In addition, it seems that typical interpretation of the classes takes on an ordered structure (e.g. low, medium, and high), which means they just have a coarsely measured continuous latent variable. In other cases, the groups actually reflect intact groups represented by covariates they have not included in the data (or perhaps are an interaction of those). Had they started under the assumption of a continuous latent variable, it might have made things easier to interpret and estimate.
As of this writing, Mplus is perhaps the only SEM software used for these <span class="emph">Growth Mixture Models</span>, and it requires yet another syntax style, and, depending on the model you run, some of the most confusing output you'll ever see in SEM. Alternatives in R include <span class="pack">flexmix</span> (demonstrated in the Mixture Models Module) for standard mixture modeling (including mixed effects models), as well as the R package <span class="pack">OpenMx</span>.
### Other covariates
#### Cluster level
To add a <span class="emph">cluster-level covariate</span>, for a mixed model, it looks something like this (ignoring lowest level subscript):
*standard random intercept*
$$y = b_{0c} + b_1*\mathrm{time} + e $$
$$b_{0c} = b_0 + u_c$$
Plugging in becomes:
$$y = b_0 + b_1*\mathrm{time} + u_c + e $$
*subject level covariate added*
$$b_{0c} = b_0 + c_1*\mathrm{sex} + u_c$$
But if we plug that into our level 1 model, it just becomes:
$$y = b_0 + c_1*\mathrm{sex} + b_1*\mathrm{time} + u_c + e$$
In our previous modeling syntax it would look like this:
```{r clusterLevelVar, eval=F}
mixedModel = lmer(y1 ~ sex + time + (time|subject), data=d)
```
We'd have a fixed effect for sex and interpret it just like in the standard setting. Similarly, if we had a time-varying covariate, say socioeconomic status, it'd look like the following:
```{r timevaryingVar, eval=F}
mixedModel = lmer(y1 ~ time + ses + (time|subject), data=d)
```
Though we could have a random slope for SES if we wanted. You get the picture. Most of the model is still standard regression interpretation.
With LGC, there is a tendency to interpret the model as an SEM, and certainly one can. But adding additional covariates typically causes confusion for those not familiar with mixed models. We literally do have to regress the intercept and slope latent variables on cluster level covariates as follows.
```{r lgcClutersLevelVar}
model.syntax <- '
# intercept and slope with fixed coefficients
i =~ 1*y1 + 1*y2 + 1*y3 + 1*y4
s =~ 0*y1 + 1*y2 + 2*y3 + 3*y4
# regressions
i ~ x1 + x2
s ~ x1 + x2
'
```
Applied researchers commonly have difficulty interpreting the model due to past experience with SEM. While these are latent variables, they aren't *just* latent variables or underlying constructs. It doesn't help that the output can be confusing, because now one has an 'intercept for your intercepts' and an 'intercept for your slopes'. In the multilevel context it makes sense, but there you know 'intercept' is just 'fixed effect'.
#### Time-varying covariates
With <span class="emph">time varying covariates</span>, i.e. those that can have a different value at each time point, the syntax starts to get tedious. Here we add just one such covariate, $c$.
```{r lgcTimeVar, eval=F}
model.syntax <- '
# intercept and slope with fixed coefficients
i =~ 1*y1 + 1*y2 + 1*y3 + 1*y4
s =~ 0*y1 + 1*y2 + 2*y3 + 3*y4
# regressions
i ~ x1 + x2
s ~ x1 + x2
# time-varying covariates
y1 ~ c1
y2 ~ c2
y3 ~ c3
y4 ~ c4
'
fit <- growth(model.syntax, data=Demo.growth)
summary(fit)
```
Now imagine having just a few of those kinds of variables as would be common in most longitudinal settings. In the mixed model framework one would add them in as any covariate in a regression model, and each covariate would be associated with a single fixed effect. In the LGC framework, one has to regress each time point for the target variable on its corresponding predictor time point. It might take a few paragraphs to explain the coefficients for just a handful of covariates. If you fix them to a single value, you would duplicate the mixed model, but the syntax requires even more tedium.
## Some Differences between Mixed Models and Growth Curves
### Random slopes
One difference seen in comparing LGC models vs. mixed models is that in the former, random slopes are always assumed, whereas in the latter, one would typically see if it's worth adding random slopes in the first place, or simply not assume them. There is currently a fad of 'maximal' mixed models in some disciplines, that would require testing every possible random effect. All I can say is good luck with that.
### Time-varying covariates
LGC depictions are *always* showing a time*x interaction with time-varying covariates, i.e. that the effect of the covariate changes depending on the time point. The mixed model, like every other model you run, does not assume any interaction unless you explicitly ask for it. As mentioned, you can fix the path value to be constant across time if you don't want an interaction.
### Unequal variances
One more thing LGC assumes that would not be in standard mixed model settings is unequal variances across time points, i.e. you estimate a separate variance for each time point. And once again, if you even had reason to consider this in the mixed model setting you could, and then you would compare that model to the equal variance model for best performance. Almost every LGC you see simply estimates those additional variances whether it's needed or not.
### Wide vs. long
The SEM framework is inherently multivariate, and your data will need to be in wide format. This isn't too big of a deal until you have many time-varying covariates, then the model syntax is tedious and you end up having the number of parameters to estimate climb rapidly.
### Sample size
As we have noted before, SEM is inherently a large sample technique. The growth curve model does not require as much for standard approaches, but may require a lot more depending on the model one tries to estimate. In [my own simulations](https://m-clark.github.io/docs/mixedModels/growth_vs_mixed_sim.html), I haven't seen too much difference compared to mixed models even for notably small sample sizes, but those were for very simple models.
### Number of time points
A basic growth curve model requires four time points to incorporate the flexibility that would make it worthwhile. Mixed models don't have the restriction (outside of the obvious need of two).
### Balance
Mixed models can run even if some clusters have a single value. SEM requires balanced data and so one will always have to estimate missing values or drop them. Whether this missingness can be ignored in the standard mixed model framework is a matter of some debate in certain circles.
### Numbering the time points
Numbering your time from zero makes sense in both worlds. This leads to the natural interpretation that the intercept is the mean for your first time point. In other cases having a centered value would make sense, or numbering from 0 to a final value of 1, which would mean the slope coefficient represents the change over the whole time span.
## Other stuff
### Parallel Processes
In the [appendix][Parallel Process Example] I provide an example of a parallel process in which we posit two growth curves at the same time, with possible correlations among them. This could be accomplished quite easily with a standard mixed model in the Bayesian framework with a multivariate response, or with a standard mixed model in long format with an observation(s) pertaining to each outcome and appropriate covariance structure to account for it. I'll have to come back to that later.
### Cross-lagged Models
Sometimes you'll come across what are called cross-lagged models. These are path models applied to longitudinal data. Here's an example regarding two variables and three time points.
```{r crosslagged, echo=FALSE, cache=FALSE}
tags$div(style="width:50%; margin:auto auto; font-size:50%",
DiagrammeR::grViz('scripts/crosslag.gv', width='100%', height='25%')
)
```
<br>
To be honest, the only reason I can think of to do something like this is if one isn't familiar with mixed models, growth curves, or bayesian networks. You can imagine what this would look like if we allow for lingering effects from earlier times to later ones (i.e. more lags), have more time points, have residual correlations at specific time points, and several more variables, as would be typical in any plausible modeling setting. The result would be unwieldy at best. Some using these models are interested in the *stability* of a measure over time. However, if this in question, one should probably be using factor analytic techniques to assess the reliability of the measure. In general, you'd always have a better option than a cross-lagged model in my opinion.
## Summary
Growth curve modeling is an alternative way to do what is very commonly accomplished through mixed models, and allow for more complex models than typically seen for standard mixed models. One's default should probably be to use the more common, and probably more flexible (in most situations), mixed modeling tools, where there are packages in R that could handle nonlinear effects, mediation and multivariate outcomes for mixed models. I have other documents regarding mixed models on my [website](https://m-clark.github.io/documents) and code at [GitHub](https://github.com/m-clark/Miscellaneous-R-Code), including a document that does [more comparison to growth curve models](http://m-clark.github.io/mixed-growth-comparison/). However, the latent variable approach may provide what you need, and at the very least gives you a fresh take on the standard mixed model perspective.
## R Packages Used
- <span class="pack">lavaan</span>
- <span class="pack">lme4</span>
- <span class="pack">nlme</span>
[^reml]: One can set REML=F so as to use standard maximum likelihood and make the results directly comparable to <span class="pack">lavaan</span>.
[^nonlinearfewtimepts]: I personally cannot *see* bends with only four time points, at least such that I couldn't just as easily posit a linear trend.
[^likematrix]: Those familiar with the model matrix for regression will recognize 'intercept as 1'.