-
Notifications
You must be signed in to change notification settings - Fork 36
/
lab15.Rmd
121 lines (100 loc) · 4.82 KB
/
lab15.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
---
title: "In-Class Lab 15"
author: "ECON 4223 (Prof. Tyler Ransom, U of Oklahoma)"
date: "April 12, 2022"
output:
pdf_document: default
html_document:
df_print: paged
word_document: default
bibliography: biblio.bib
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, results = 'hide', fig.keep = 'none')
```
The purpose of this in-class lab is to use R to practice with panel data estimation methods. To get credit, upload your .R script to the appropriate place on Canvas.
## For starters
Open up a new R script (named `ICL15_XYZ.R`, where `XYZ` are your initials) and add the usual "preamble" to the top:
```{r message=FALSE, warning=FALSE, paged.print=FALSE}
library(tidyverse)
library(wooldridge)
library(broom)
library(magrittr)
library(clubSandwich)
library(modelsummary)
library(estimatr)
library(plm) # You may need to install this package
```
### Load the data
Our data set will be a panel of wages for 545 men. Load the data from the `wooldridge` package, format `year` to be a factor, and rename the variable `nr` to something more descriptive (`id`):
```{r}
df <- as_tibble(wagepan)
df %<>% mutate(year=as.factor(year))
df %<>% rename(id = nr)
```
### Summary statistics for panel data
It is important to know what your panel looks like. How many units? How many time periods? The easiest way to do this is the `pdim()` function in the `plm` package.
```{r}
pdim(df)
```
It is also helpful to "convert" the data to a cross-section of within-unit averages. Let's do this for some of the key variables of our analysis.
```{r}
df.within <- df %>% select(id,year,educ,married,union,rur) %>%
group_by(id) %>%
summarize(
mean.edu = mean(educ),
var.edu = var(educ),
mean.marr = mean(married),
var.marr = var(married),
mean.union = mean(union),
var.union = var(union),
mean.rural = mean(rur),
var.rural = var(rur)
)
df.within %>% datasummary_skim()
```
1. Is there any within-person variance in the `educ` variable? What about `married`, `union`, and `rural`?
2. What does it mean for the married, union, or rural variables to have a positive within-person variance?
3. Why is it important to know if a variable has positive within-person variance?
## Pooled OLS, Random Effects, and Fixed Effects Models
Now estimate the following model using various options of the `plm()` function:
\begin{align*}
\log(wage_{it}) & = \beta_0 + \beta_1 educ_{i} + \beta_2 black_{i} + \beta_3 hisp_{i} + \beta_4 exper_{it} + \beta_5 exper_{it}^2 + \beta_6 married_{it} + \\
&\phantom{=\,\,}\beta_7 union_{it} + \beta_8 rur_{it} + \sum_t \beta_{9,t}year_{it} + a_i + u_{it}
\end{align*}
### Pooled OLS
The pooled OLS model can be run from the `lm_robust()` function as follows.
```{r}
est.pols <- lm_robust(lwage ~ educ + black + hisp + exper + I(exper^2) + married + union + rur + year,
data = df, clusters=id)
```
4. Interpret the coefficient on $\beta_7$ in the pooled OLS model
### Random effects
RE uses the `plm()` function as follows.
```{r}
est.re <- plm(lwage ~ educ + black + hisp + exper + I(exper^2) + married + union + rur + year,
data = df, index = c("id","year"), model = "random")
```
5. What is the estimate of $\theta$ in the RE model? (Hint: check `est.re$ercomp$theta`) What does this tell you about what you expect the random effects estimates to be relative to the fixed effects estimates?
### Fixed effects
FE also come from the `lm_robust()` function:
```{r}
est.fe <- lm_robust(lwage ~ I(exper^2) + married + union + rur + year,
data = df, fixed_effects = ~id)
```
6. Explain why we cannot estimate coefficients on `educ`, `black`, `hisp`, or `exper` in the fixed effects model. (Note: the reason for not being able to estimate `exper` is more nuanced)
## Clustered standard errors
As discussed in class, the most appropriate standard errors account for within-person serial correlation and are robust to heteroskedasticity. We already used these for pooled OLS and fixed effects, but not for random effects.
```{r}
clust.re <- coef_test(est.re, vcov = "CR1", cluster = "individual")
clust.re.SE <- clust.re$SE
names(clust.re.SE) <- names(est.re$coefficients)
```
We can put these all in one `modelsummary` table:
```{r, results = 'show'}
modelsummary(list("POLS"=est.pols,"RE"=est.re,"FE"=est.fe),
statistic_override=list(sqrt(diag(est.pols$vcov)),clust.re.SE,sqrt(diag(est.fe$vcov))),
output="markdown")
```
7. Interpret the coefficient on `union` across the three models.
8. Comment on the comparability of the pooled OLS and RE estimators when within-person serial correlation has been properly addressed.