Constanza F. Schibber 2017-03-07
- 1 Data: Voting Intentions in the 1988 Chilean Plebiscite
- 2 Fitting Logit Model
- 3 Likelihood Ratio Test
- 4 How well does our model predict the data? Error rate
- 5 Odds Ratio
- 6 Predicted Probablities
- 7 First Differences: Plotting predicted probabilities for men and women, and first difference (Monte Carlo simulation)
- 8 Marginal Change in Probability
- 9 Marginal Change in Probability (for Income)
- 10 Plotting using
curve
instead oflines
- 11 Multinomial Logit
The Chile data frame has 2700 rows and 8 columns. The data are from a national survey conducted in April and May of 1988 by FLACSO/Chile. There are some missing data.
The survey wanted to capture Chileans attitudes and whether they would vote in favor/against the authoritarian regime during the Chilean plebiscite/referendum. In this referendum, all citizens would vote on whether Chile would continue as a military regime or it would transition to open and free elections towards democracy.
library(car)
## Loading required package: carData
#help(Chile) # for more information on the dataset
summary(Chile)
## region population sex age education
## C :600 Min. : 3750 F:1379 Min. :18.00 P :1107
## M :100 1st Qu.: 25000 M:1321 1st Qu.:26.00 PS : 462
## N :322 Median :175000 Median :36.00 S :1120
## S :718 Mean :152222 Mean :38.55 NA's: 11
## SA:960 3rd Qu.:250000 3rd Qu.:49.00
## Max. :250000 Max. :70.00
## NA's :1
## income statusquo vote
## Min. : 2500 Min. :-1.80301 A :187
## 1st Qu.: 7500 1st Qu.:-1.00223 N :889
## Median : 15000 Median :-0.04558 U :588
## Mean : 33876 Mean : 0.00000 Y :868
## 3rd Qu.: 35000 3rd Qu.: 0.96857 NA's:168
## Max. :200000 Max. : 2.04859
## NA's :98 NA's :17
# Recode yes/no
# We recode "undecided" and "abstain" as NA
Chile$yes <- with(Chile, ifelse(vote == "Y", 1, ifelse(vote=="N", 0, NA)))
table(Chile$vote)
##
## A N U Y
## 187 889 588 868
table(Chile$yes)
##
## 0 1
## 889 868
Then omit observations with missing data - Issue is that if you don’t omit missing the predicted values will have a different length than the data and it can cause problems. We will cover in a few weeks how to handle missing values using multiple imputation. Do not omit missing data when you conduct your own research work! Missing data can be (and most times is) a source of bias.
Chile<-na.omit(Chile)
The outcome variable is vote
. Yes, which is code 1, means voting in
favor of the authoritarian regime (Pinochet) and no, coded 0, voting
against the authoritarian regime.
The covariates:
- status quo: Scale of support for the status-quo.
- age in years.
- income: Monthly income, in Pesos.
- education: A factor with levels (note: out of order): P, Primary; PS, Post-secondary; S, Secondary.
Chile.out <- glm(yes ~ statusquo+age+income+education, family=binomial(link=logit), data=Chile)
summary(Chile.out)
##
## Call:
## glm(formula = yes ~ statusquo + age + income + education, family = binomial(link = logit),
## data = Chile)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.3380 -0.2785 -0.1464 0.1961 2.8652
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 7.009e-01 3.568e-01 1.965 0.04946 *
## statusquo 3.186e+00 1.463e-01 21.782 < 2e-16 ***
## age 2.228e-03 7.372e-03 0.302 0.76246
## income -2.419e-06 2.792e-06 -0.866 0.38635
## educationPS -1.028e+00 3.417e-01 -3.009 0.00262 **
## educationS -6.696e-01 2.395e-01 -2.796 0.00518 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2360.29 on 1702 degrees of freedom
## Residual deviance: 715.58 on 1697 degrees of freedom
## AIC: 727.58
##
## Number of Fisher Scoring iterations: 6
In another model below, I add sex (code F for female, M for male).
### Null vs. Specified
pchisq(2360.29-715.58, df=1702-1697,lower=FALSE)
## [1] 0
The value is in the tail of the chi-square distribution with 5 degrees of freedom, so the mean model is statistically far from the specified model. Thus, we can reject the null hypothesis stating that the mean model provides as good a fit for the data as the specified model.
pchisq(deviance(Chile.out),df.residual(Chile.out),lower=FALSE)
## [1] 1
The value is NOT in the tail of the chi-square with 1697 degrees of freedom, so the saturated model is statistically close to the specified model. Thus, we cannot reject the null hypothesis stating that the saturated model provides as good a fit for the data as the specified model.
Chile.pred <- Chile.out$fitted.values
Chile.pred[Chile.pred < 0.5] <- 0
Chile.pred[Chile.pred > 0.5] <- 1
table(Chile$yes, Chile.pred)
## Chile.pred
## 0 1
## 0 809 58
## 1 68 768
The two by two table shows that (809 + 58 = 867) respondents mentioned they would not vote in support of the military regime. The logit model is correctly predicting 93.3% (809) of these votes, and incorrectly predicting 6.7% (58). On the other hand, (68 + 768 = 836) respondents indicated they would vote in support of the military regime. The model is correctly predicts 91.9% (768) of these votes, and incorrectly predicts 8.1% (68) of them.
The error rate for the specified model is,
error.rate <- mean ((Chile.out$fitted.values>0.5 & Chile$yes==0) | (Chile.out$fitted.values<.5 & Chile$yes==1))
error.rate
## [1] 0.07398708
The error rate for a mean model (also called null model) is,
Chile.mean <- glm(yes ~ 1, family=binomial(link=logit), data=Chile)
#summary(Chile.out)
Chile.pred.mean <- Chile.mean$fitted.values
Chile.pred.mean[Chile.pred.mean < 0.5] <- 0
Chile.pred.mean[Chile.pred.mean > 0.5] <- 1
table(Chile.pred.mean,Chile$yes)
##
## Chile.pred.mean 0 1
## 0 867 836
error.rate.mean <- mean ((Chile.mean$fitted.values>0.5 & Chile$yes==0) | (Chile.mean$fitted.values<.5 & Chile$yes==1))
error.rate.mean
## [1] 0.4908984
Which actually is the same as the mean of the outcome variable, so basically there is no need to fit a model with just the intercept.
mean(Chile$yes)
## [1] 0.4908984
Thus, the error rate of the specified model is 0.074. This corresponds to the proportion of observations being predicted incorrectly by the logit model. The error rate for the specified model is much smaller than the error rate for the mean model, which is 0.491. By adding the covariates, the specified model does a better job at predicting respondents’ vote choice.
Odds of an event is the ratio of the probability of an event happening
to the probability of the event not happening:
where
Nicely, the odds ratio for failure is just the inverse of the odds ratio for success (symmetry).
Some people prefer to think in terms of odds rather than probability:
$$Odd = \frac{p}{1-p} = \frac{p(y=1)}{p(y=0)} \qquad\qquad\qquad p = \frac{Odd}{1+Odd}$$
where $0$ is obviously on the support $(0:\infty)$.
This is essentially how logit works since:
So if
But this statement doesn’t really tell us much about the substance of an effect. In most social science research, you won’t see odds ratio interpretations, though sometimes you will which is important to understand how to interpret them. There are more compelling ways of presenting results (figures!).
#This gives you the second coefficient in the regression, status quo
coef(Chile.out)[2]
## statusquo
## 3.186245
# Odds ratio
exp(coef(Chile.out)[2])
## statusquo
## 24.19741
After controlling for other variables, for a one-unit change in “support of the status quo” the odds of voting in favor of the military regime v. not voting in support of the military regime are, on average, exp(coef(Chile.out)$$2$$)=24.19 times larger.
When the probability of success (1=vote in favor) is less than the probability of failure (0=vote against), the odds will be less than 1. When the probability of success (1=vote in favor) is greater than the probability of failure, the odds > 1. If the odds are exactly 1, then the odds of success and failure are even.
The confidence interval for the odds ratio is,
confint(Chile.out)
## Waiting for profiling to be done...
## 2.5 % 97.5 %
## (Intercept) 5.675965e-03 1.405836e+00
## statusquo 2.911594e+00 3.485918e+00
## age -1.226490e-02 1.667208e-02
## income -7.874150e-06 3.052903e-06
## educationPS -1.707832e+00 -3.669337e-01
## educationS -1.142213e+00 -2.019137e-01
Exercise: Calculate the odds ratio for sex and provide an interpretation
The baseline category is: respondent has “primary” education.
summary(Chile$education)
## P PS S
## 671 343 689
summary(Chile.out)
##
## Call:
## glm(formula = yes ~ statusquo + age + income + education, family = binomial(link = logit),
## data = Chile)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.3380 -0.2785 -0.1464 0.1961 2.8652
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 7.009e-01 3.568e-01 1.965 0.04946 *
## statusquo 3.186e+00 1.463e-01 21.782 < 2e-16 ***
## age 2.228e-03 7.372e-03 0.302 0.76246
## income -2.419e-06 2.792e-06 -0.866 0.38635
## educationPS -1.028e+00 3.417e-01 -3.009 0.00262 **
## educationS -6.696e-01 2.395e-01 -2.796 0.00518 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2360.29 on 1702 degrees of freedom
## Residual deviance: 715.58 on 1697 degrees of freedom
## AIC: 727.58
##
## Number of Fisher Scoring iterations: 6
Odds-ratio of a respondent supporting the military regime if she has primary education is,
exp(coef(Chile.out)[1])
## (Intercept)
## 2.015537
Odds-ratio of a respondent supporting the military regime if she has secondary education in comparison to primary education is,
exp(coef(Chile.out)['educationS'])
## educationS
## 0.511917
Odds-ratio of a respondent supporting the military regime if she has post-secondary education in comparison to primary education is,
exp(coef(Chile.out)['educationPS'])
## educationPS
## 0.3577184
Or we can calculate in %:
100*(exp(coef(Chile.out)['educationPS'])-1)
## educationPS
## -64.22816
INTERPRETATION: The odds of voting in favor of the military regime are 64.2% smaller (or less likely) for a respondent with post-secondary education than a respondent with primary education.
We can also calculate odds for post-secondary versus secondary education by doing exp(beta post secondary)/exp(beta secondary). We have to do this calculation because we do not want to compare to the baseline category.
exp(coef(Chile.out)['educationPS'])/exp(coef(Chile.out)['educationS'])
## educationPS
## 0.698782
INTERPRETATION: We can interpret this as the odds of a respondent with post-secondary education voting in favor of the military regime are 69.88% the odds of respondent with secondary education voting in favor of the military regime.
We get a percentage,
100*((exp(coef(Chile.out)['educationPS'])/exp(coef(Chile.out)['educationS']))-1)
## educationPS
## -30.1218
INTERPRETATION: The odds of voting in favor of the military regime are 30.12% smaller for a respondent with post-secondary education than for a respondent with secondary education.
For logit, predicted probability is,
$P(y_i=1) = \pi_i = \frac{1}{1+e^{-(\hat{\alpha} + \hat{\beta}1 x{i1} + \hat{\beta}2 x{i2} + ... + \hat{\beta}k x{ik})}}$
or,
$P(y_i=1) = \pi_i = \frac{e^{(\hat{\alpha} + \hat{\beta}1 x{i1} + \hat{\beta}2 x{i2} + ... + \hat{\beta}k x{ik})}}{1+e^{(\hat{\alpha} + \hat{\beta}1 x{i1} + \hat{\beta}2 x{i2} + ... + \hat{\beta}k x{ik})}}$
where
You can compute predicted probabilities “by hand” using either equation. You should memorize one.
Notice that:
- Predicted probabilities are different for every observation, depending on the covariates.
- You have to plug in values for every single
$x$ variable, even the controls. - Plot the predicted probabilities over the
$x$ of interest to show what’s really going on with the model.
In R
, there are different ways of calculating predicted probabilities.
In the next few weeks, we will learn how to use Monte Carlo simulations
and the observed value approach. Now, we will use a simpler way to
calculate quick predictions.
To make predictions, I set age and income to their means, and sex and education to their modes. I take a low value of status quo to be the 1st quartile of the variable and a high value to be the 3rd quartile of the variable. This is a conservative approach in comparison to taking the minimum and maximum values of the variable for which you are the most likely to see different predicted probabilities.
# Predict for lower support for SQ
data.frame.Chile.LowSQ<-data.frame(statusquo=-1.09700, age=mean(Chile$age), income=mean(Chile$income), education="S")
LowSupportSQ <- predict(Chile.out, newdata=data.frame.Chile.LowSQ, type="response", se.fit=TRUE)
LowSupportSQ
## $fit
## 1
## 0.03022738
##
## $se.fit
## 1
## 0.006266086
##
## $residual.scale
## [1] 1
c(LowSupportSQ$fit-1.96*LowSupportSQ$se.fit, LowSupportSQ$fit+1.96*LowSupportSQ$se.fit)
## 1 1
## 0.01794585 0.04250890
Considering a respondent who has a lower support of the status quo, the
probability that they will vote in favor of keeping the military regime
in power is, on average, 0.03. The prediction interval for this
predicted probability is
# Predict for higher support for SQ
data.frame.Chile.HighSQ<-data.frame(statusquo=1.16602, age=mean(Chile$age), income=mean(Chile$income), education="S")
HighSupportSQ <- predict(Chile.out, newdata=data.frame.Chile.HighSQ, type="response", se.fit=TRUE)
HighSupportSQ
## $fit
## 1
## 0.9768474
##
## $se.fit
## 1
## 0.005504139
##
## $residual.scale
## [1] 1
c(HighSupportSQ$fit-1.96*HighSupportSQ$se.fit, HighSupportSQ$fit+1.96*HighSupportSQ$se.fit)
## 1 1
## 0.9660593 0.9876355
Considering a respondent who supports the status quo, the probability he
votes in favor of keeping the military regime in power is 0.976, on
average. The prediction interval for this predicted probability is
## Low tech - A la Faraway
sq<-seq(from=min(na.omit(Chile$statusquo)), to=max(na.omit(Chile$statusquo)), length.out=100)
# Look at sq
head(sq)
## [1] -1.725940 -1.691198 -1.656455 -1.621713 -1.586971 -1.552228
min(na.omit(Chile$statusquo))
## [1] -1.72594
max(na.omit(Chile$statusquo))
## [1] 1.71355
#
new.data.sq<-data.frame(statusquo=sq, age=mean(Chile$age), income=mean(Chile$income), education="S")
pred.prob<-predict(Chile.out, newdata=new.data.sq, type="response", se.fit=TRUE )
# this object includes pred.prob$fit and pred.prob$se.fit
length(pred.prob$fit)
## [1] 100
plot(NULL,xlab="Support for the Status Quo", ylab="P(Vote in Support of the Military Regime)", ylim=c(0,1), xlim=c(min(na.omit(Chile$statusquo)),max(na.omit(Chile$statusquo))))
lines(sq, pred.prob$fit, lty=1, lwd=2)
lines(sq, pred.prob$fit+1.96*pred.prob$se.fit, lty=2 )
lines(sq, pred.prob$fit-1.96*pred.prob$se.fit, lty=2 )
Remember to read Gelman and Hill, Chapter 7!
Simulation of predictions:
library(arm) #sim #this library has plogis which is the ilogit from the faraway library.
## Loading required package: MASS
## Loading required package: Matrix
## Loading required package: lme4
##
## arm (Version 1.13-1, built: 2022-8-25)
## Working directory is /Users/connie/Generalized-Linear-Models/Labs/02.week
##
## Attaching package: 'arm'
## The following object is masked from 'package:car':
##
## logit
#Use either one but for ilogit you have to call library(faraway). By hand, exp(eta)/(1+exp(eta)) gives the same result.
library(faraway) # for ilogit
##
## Attaching package: 'faraway'
## The following objects are masked from 'package:arm':
##
## fround, logit, pfround
## The following objects are masked from 'package:car':
##
## logit, vif
#The following function produces the coefficient from the model
coef(Chile.out)
## (Intercept) statusquo age income educationPS
## 7.008857e-01 3.186245e+00 2.228265e-03 -2.418958e-06 -1.028009e+00
## educationS
## -6.695928e-01
Code Chunk Below: Set variables to their variable. Here you have to follow the order in which they appear in the summary of the model. You can use coef(Chile.out) as a guideline. This line is different that creating a new.data for the previous example
Important! Use na.omit otherwise mean(Chile$age) will be NA! The same with income
HOW TO USE: BELOW NOTICE THE ORDER OF THE VALUES FOLLOW THE ORDER OF THE COEFFFICIENTS IN THE SUMMARY OF MODEL RESULTS. THEY WILL NOT FOLLOW THE ORDER IN WHICH YOU PUT THEM IN THE REGRESSION glm(y~….). YOU HAVE TO CHANGE THIS WHENEVER YOU ARE USING THIS BECAUSE YOU WILL HAVE DIFFERENT VARIABLES AND VALUES.
# Create a range of values for "status quo"
sq<-seq(from=min(na.omit(Chile$statusquo)), to=max(na.omit(Chile$statusquo)), length.out=1000)
sq.x<-cbind(1, #intercept
sq, #statusquo we want to vary
mean(na.omit(Chile$age)), #age set at its mean
mean(na.omit(Chile$income)), #income set at its mean
0, # we set education at its mode, secondary education, so PS is 0
1)
# Simulate from the model sim(model, n=1000) 1000 draws,
# and extract coefficients from the simulation
sim.chile <- coef(sim(Chile.out, n = 1000))
DETOUR: To understand the simulation, do this:
head(sim.chile)
## (Intercept) statusquo age income educationPS educationS
## [1,] 0.8461052 3.243148 0.004895294 -2.414281e-06 -1.3856031 -0.7037498
## [2,] 0.2620542 3.153198 0.008365041 -1.090742e-06 -0.8026585 -0.2888620
## [3,] 0.4673173 3.154859 0.008564236 8.905675e-07 -1.1801188 -0.8309914
## [4,] 0.6728983 3.027277 -0.001459054 -5.960062e-07 -0.6543748 -0.5403103
## [5,] 0.7714839 3.075492 0.004176502 -5.233798e-06 -0.9582917 -0.6234888
## [6,] 1.0775863 3.234696 -0.006377367 9.766404e-07 -1.6575217 -0.9798216
summary(Chile.out)
##
## Call:
## glm(formula = yes ~ statusquo + age + income + education, family = binomial(link = logit),
## data = Chile)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.3380 -0.2785 -0.1464 0.1961 2.8652
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 7.009e-01 3.568e-01 1.965 0.04946 *
## statusquo 3.186e+00 1.463e-01 21.782 < 2e-16 ***
## age 2.228e-03 7.372e-03 0.302 0.76246
## income -2.419e-06 2.792e-06 -0.866 0.38635
## educationPS -1.028e+00 3.417e-01 -3.009 0.00262 **
## educationS -6.696e-01 2.395e-01 -2.796 0.00518 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2360.29 on 1702 degrees of freedom
## Residual deviance: 715.58 on 1697 degrees of freedom
## AIC: 727.58
##
## Number of Fisher Scoring iterations: 6
The intercept is approx 0.7 and the SE is approx 0.35.
The sim
function is taking 1000 draws from a normal distribution with
mean 0.7 and SD 0.35. Remember that asymptotic normality of the MLE is
one of the properties! So we can do this.
If you do a summary of the simulation, you can see that the mean for each coefficient is close to the estimate in the summary of the model:
summary(sim.chile)
## (Intercept) statusquo age income
## Min. :-0.4400 Min. :2.649 Min. :-0.021481 Min. :-1.084e-05
## 1st Qu.: 0.4557 1st Qu.:3.089 1st Qu.:-0.002613 1st Qu.:-4.057e-06
## Median : 0.6770 Median :3.183 Median : 0.002395 Median :-2.321e-06
## Mean : 0.6930 Mean :3.185 Mean : 0.002356 Mean :-2.367e-06
## 3rd Qu.: 0.9382 3rd Qu.:3.283 3rd Qu.: 0.007272 3rd Qu.:-6.319e-07
## Max. : 1.7834 Max. :3.706 Max. : 0.023119 Max. : 7.576e-06
## educationPS educationS
## Min. :-2.0580 Min. :-1.5047
## 1st Qu.:-1.2523 1st Qu.:-0.8174
## Median :-1.0237 Median :-0.6615
## Mean :-1.0271 Mean :-0.6673
## 3rd Qu.:-0.7867 3rd Qu.:-0.5075
## Max. :-0.1170 Max. : 0.1272
You can make a histogram of the simulated intercepts:
hist(sim.chile[,1], main="intercept", 30)
If you calculate the standard deviation you get the SE of the intercept from the model output,
sd(sim.chile[,1])
## [1] 0.3434194
In essence, simulation allows us to account for uncertainty when calculating predictions.
Let’s get back at creating a figure of predicted probabilities.
### Cont. with figure here!
pred.sq <- ilogit(sq.x%*%t(sim.chile))
#This is matrix multiplication. This is like on OLS X beta^T. If you see:
dim(sq.x) #100x6
## [1] 1000 6
dim(t(sim.chile)) # 6x1000
## [1] 6 1000
#So the PP will be 100x1000 -
#This is 1000 predictions for each value of sq (we had set sq to have 100 values).
dim(pred.sq)
## [1] 1000 1000
# We create the figure.
#Observe that x is sq and y is calculating with each of the 1000 predictions of each x value
plot(NULL, xlab="Support for the Status Quo", ylab="P(Vote in Support of the Military Regime)", ylim=c(0,1), xlim=c(min(na.omit(Chile$statusquo)),max(na.omit(Chile$statusquo))))
lines(sq, apply(pred.sq, 1, quantile, .05), lty = 2)
lines(sq, apply(pred.sq, 1, quantile, .5), lwd = 3)
lines(sq, apply(pred.sq, 1, quantile, .95), lty = 2)
7 First Differences: Plotting predicted probabilities for men and women, and first difference (Monte Carlo simulation)
##########
# Plot for gender
##############
Chile.out.gender = glm(yes ~ statusquo+age+income+education+sex, family=binomial(link=logit), data=Chile)
new.data.M<-data.frame(sex="M", age=mean(Chile$age), income=mean(Chile$income), education="S", statusquo=mean(Chile$statusquo))
new.data.F<-data.frame(sex="F", age=mean(Chile$age), income=mean(Chile$income), education="S",statusquo=mean(Chile$statusquo))
pred.prob<-predict(Chile.out.gender, newdata=new.data.M, type="response", se.fit=TRUE )
# this object includes pred.prob$fit and pred.prob$se.fit
pred.prob.F<-predict(Chile.out, newdata=new.data.F, type="response", se.fit=TRUE )
dif<-pred.prob.F$fit-pred.prob$fit
SE.diff <- sqrt(pred.prob.F$se.fit^2+pred.prob$se.fit^2)
par(mar=c(2,19,1,1))
plot(-20,-10, pch=17, ylim=c(0,4), xlim=c(-0.1,.6), ylab="",yaxt = 'n', xlab="")
points(pred.prob$fit[1], 3, pch=17, cex=1) # point for male
points(pred.prob.F$fit, 2, pch=17, cex=1) # point for female
points(dif, 1, pch=17, cex=1) # point for female
text(pred.prob$fit[1], 3.2, "0.4")
text(pred.prob.F$fit[1], 2.2, "0.53")
text(dif, 1.2, "0.13")
segments(pred.prob$fit-1.96*pred.prob$se.fit, 3, pred.prob$fit+1.96*pred.prob$se.fit, 3, lty=1, col='black', lwd=2) #CI for male
segments(pred.prob.F$fit-1.96*pred.prob$se.fit, 2, pred.prob.F$fit+1.96*pred.prob$se.fit, 2, lty=1, col='black', lwd=2) # CI for female
segments(dif-1.96*SE.diff, 1, dif+1.96*SE.diff, 1, lty=1, col='green', lwd=2) # CI for female
abline(v=0, lty=3)
axis(2, at=3:1,labels=c("P(Vote Yes|Male)", "P(Vote Yes|Female)", "P(Vote Yes|Female)-P(Vote Yes|Female)"), las=1)
We will cover on Monte Carlo simulations and first differences on the next Lab + lecture. Check the lecture slides for the calculation of the standard error for the first differences!
For logit, the marginal change in probability (or just marginal
effect) of
$$\frac{\partial \pi_i}{\partial x_{ik}} = \frac{\partial}{\partial x_{ik}}\bigg(\frac{1}{1+e^{-(\hat{\alpha} + \hat{\beta}1 x{i1} + \hat{\beta}2 x{i2} + ... + \hat{\beta}k x{ik})}}\bigg)$$
$$ =\hat{\beta}_k \frac{e^{-(\hat{\alpha} + \hat{\beta}1 x{i1} + \hat{\beta}2 x{i2} + ... + \hat{\beta}k x{ik})}}{(1+e^{-(\hat{\alpha} + \hat{\beta}1 x{i1} + \hat{\beta}2 x{i2} + ... + \hat{\beta}k x{ik})})^2}$$
where
In R
the Logistic Probability Density Function is given by dlogis
.
Notice that you have to plug in values for every single
I am using the values of statusquo
rather than creating the values
because I also want to calculate the average marginal effect and for
that, we want to use the data. The SE produced via simulation captures
the uncertainty in the regression coefficients and the PDF.
sq<-sort(Chile$statusquo)
x.sq.1<-cbind(1, sq, mean(na.omit(Chile$age)), mean(na.omit(Chile$income)), 0, 1)
Here we are not using dlogis but using the equation we get when taking the derivative w/respect to statusquo:
# Different way of calculating the same as below.
#Here we are not using dlogis but using the equation we get when taking the derivative w/respect to statusquo.
#sim.chile <- coef(sim(Chile.out, n = 10000))
#eta<-x.sq.1%*%t(sim.chile)
#Marginal.change<-beta.sq*(exp(-eta)/(1+exp(-eta))^2)
#mean(Marginal.change)
#The SE produced via simulation captures the uncertainty in the regression coefficients and the PDF
sim.chile <- coef(sim(Chile.out, n = 10000))
# Beta for status quo
beta.sq<-sim.chile[,2]
# Calculate ME
Marginal.change<-beta.sq*dlogis(x.sq.1%*%t(sim.chile))
mean(Marginal.change)
## [1] 0.1924114
INTERPRETATION 1: The increase of the probability of voting in favor of the military regime, for a one unit increase in support for the status quo, is 0.19.(on average and holding other variables constant)
INTERPRETATION 2 (the same, different words): As the respondent identifies one point more strongly with the Status Quo, the respondent’s probability of voting in favor of Pinochet increases by 0.19, on average, after controlling for other covariates.
Look at the figure below. The marginal change varies at different values of status quo. Above we were calculating the average marginal effect. Remember the equation from the lecture! The ME is non-linear. To understand this figure, look at the predicted probabilities for status quo, which we calculated previously, and see how a 1-unit change to the right tracks with the figure below. One unit changes for the middle of status quot correspond to bigger changes in probability, while one unit changes on the higher values for SQ correspond to very small changes in probability. (This is basically the definition of partial derivative and why calculus is so important here.)
plot(NULL, xlab="Support for the Status Quo", ylab="Marginal Change in P(Vote in Support of the Military Regime)", ylim=c(0,1), xlim=c(min(na.omit(Chile$statusquo)),max(na.omit(Chile$statusquo))))
lines(sq, apply(Marginal.change, 1, quantile, .05), lty = 2)
lines(sq, apply(Marginal.change, 1, quantile, .5), lwd = 3)
lines(sq, apply(Marginal.change, 1, quantile, .95), lty = 2)
Note: My opinion is that, for nonlinear models, first differences figures are more easy to interpret and provide my substantive insights for readers than Marginal Effects figures like the one above. However, these figures are very common on research publications.
I am using the values of income
rather than creating the values
because I also want to calculate the average marginal effect and for
that, we want to use the data. The SE produced via simulation captures
the uncertainty in the regression coefficients and the PDF.
Income takes values from 2500 to 200,000 with a median of 15,000.
Note: In future Lab we will see how to do the observed value approach. It should feel uncomfortable to do the following to calculate predictions because, for instance, we know that we would observe higher income to be associated with older people or certain levels of education.
income<-sort(na.omit(Chile$income))
x.income.1<-cbind(1, mean(na.omit(Chile$statusquo)), mean(na.omit(Chile$age)),income, 0, 1)
# Different way of calculating the same as below.
#Here we are not using dlogis but using the equation we get when taking the derivative w/respect to statusquo.
sim.chile <- coef(sim(Chile.out, n = 10000))
beta.income <- sim.chile[,'income']
eta<-x.income.1%*%t(sim.chile)
marginal.change.income<-beta.income*(exp(-eta)/(1+exp(-eta))^2)
mean(marginal.change.income)
## [1] -6.006561e-07
The marginal change is negative, thus, we observe a decrease in probability. It is also very small because a 1 unit increase is equal to 1 peso. We can calculate a marginal change for 10,000 pesos to provide a more substantive interpretation though the change is still extremely very small.
INTERPRETATION 1: The decrease of the probability of voting in favor of the military regime, for a 10,000 pesos increase in income, is 0.006.(on average and holding other variables constant)
INTERPRETATION 2 (the same, different words): As the respondent’s income increases by 10,000 pesos, the respondent’s probability of voting in favor of Pinochet decreases by 0.006, on average, after controlling for other covariates.
This follows curve
use from Gelman and Hill, assigned reading chapter.
Call sq “x” because it required a variable called “x” which is the one
we want to plot in the x axis. Rather than creating objects, we have to
use the expressions within curve. It will take x to be the variable for
the x axis automatically.
# important - call sq "x"
x<-seq(from=min(na.omit(Chile$statusquo)), to=max(na.omit(Chile$statusquo)), length.out=1000)
#Same code as above, but replacing sq for x
x.sq.1<-cbind(1, x, mean(na.omit(Chile$age)), mean(na.omit(Chile$income)), 0, 1)
sim.chile <- coef(sim(Chile.out, n = 1000))
beta.sq<-sim.chile[,2]
Marginal.change<-beta.sq*dlogis(x.sq.1%*%t(sim.chile))
mean(Marginal.change)
## [1] 0.2881838
plot(NULL, xlab="Support for the Status Quo", ylab="Marginal Change in P(Vote in Support of the Military Regime)", ylim=c(0,1), xlim=c(min(na.omit(Chile$statusquo)),max(na.omit(Chile$statusquo))))
curve(apply(beta.sq*dlogis(cbind(1, x, mean(na.omit(Chile$age)), mean(na.omit(Chile$income)), 0, 1)%*%t(sim.chile)), 1, quantile, .5), add=TRUE, lwd=3)
curve(apply(beta.sq*dlogis(cbind(1, x, mean(na.omit(Chile$age)), mean(na.omit(Chile$income)), 0, 1)%*%t(sim.chile)), 1, quantile, .05), add=TRUE, lty=2)
curve(apply(beta.sq*dlogis(cbind(1, x, mean(na.omit(Chile$age)), mean(na.omit(Chile$income)), 0, 1)%*%t(sim.chile)), 1, quantile, .95), add=TRUE, lty=2)
In a few weeks we will cover Multinomial Logit. Here is a short
introduction. In Chile voting is mandatory, so we can recode those
saying they do not know who they will vote/refused to respond (coded
NA
) as undecided
and also, we keep those who abstain
.
You can also run a different model in which we join abstain + undecided to analyze differences.
# reset data
data(Chile)
#Chile$vote.3<-recode(Chile$vote, "'A'='U'; NA='U'")
#summary(Chile$vote.3)
Chile$vote.4<-recode(Chile$vote, "NA='U'")
library(nnet)
#m<-multinom(vote.3~ statusquo+age+income+education+sex, data=Chile)
m<-multinom(vote.4~ statusquo+age+income+education+sex, data=Chile)
## # weights: 32 (21 variable)
## initial value 3578.025746
## iter 10 value 2957.545064
## iter 20 value 2275.704778
## iter 30 value 2234.775173
## final value 2234.754515
## converged
summary(m)
## Call:
## multinom(formula = vote.4 ~ statusquo + age + income + education +
## sex, data = Chile)
##
## Coefficients:
## (Intercept) statusquo age income educationPS educationS
## N -0.1968252 -1.7627492 0.009104577 4.503851e-06 0.1135230 -0.3216046
## U 0.7961120 0.3097937 0.028863839 -2.582798e-06 -0.7067184 -0.5336052
## Y 0.3826581 1.8282613 0.020230832 1.433993e-06 -0.5002619 -0.6538045
## sexM
## N 0.66112572
## U -0.07716410
## Y -0.02686175
##
## Std. Errors:
## (Intercept) statusquo age income educationPS educationS
## N 7.212246e-05 6.298179e-05 0.003019956 2.249483e-06 5.718983e-06 3.617958e-05
## U 6.524647e-05 3.526413e-05 0.002915679 2.309495e-06 7.624751e-07 2.948582e-05
## Y 6.743857e-05 3.888137e-05 0.003052540 2.276962e-06 2.733047e-07 2.761520e-05
## sexM
## N 3.649633e-05
## U 2.371169e-05
## Y 2.923839e-05
##
## Residual Deviance: 4469.509
## AIC: 4511.509
Let’s make a plot and not dwell on the summary output.Rember that
in favor
means voting in favor of the authoritarian regime.
sq<-seq(min(na.omit(Chile$statusquo)), max(na.omit(Chile$statusquo)), length.out=50)
pF<-predict(m, newdata=data.frame(statusquo=sq, age=mean(na.omit(Chile$age)), income=mean(na.omit(Chile$income)), education="S", sex="F" ), type="probs")
pM<-predict(m, newdata=data.frame(statusquo=sq, age=mean(na.omit(Chile$age)), income=mean(na.omit(Chile$income)), education="S", sex="M" ), type="probs")
par(mfrow=c(2,2))
plot("NULL", xlab="Support of the status quo", ylab="P(Abstain)", ylim=c(0,1), xlim=c(min(sq), max(sq)), main="Abstain")
lines(sq, pF[,1], lwd=3, col="gray", lty=1)
lines(sq, pM[,1], lwd=3, lty=3)
plot("NULL", xlab="Support of the status quo", ylab="P(Undecided Vote)", ylim=c(0,1), xlim=c(min(sq), max(sq)), main="Undecided")
lines(sq, pF[,3], lwd=3, col="orange", lty=1)
lines(sq, pM[,3], lwd=3, lty=3)
plot("NULL", xlab="Support of the status quo", ylab="P(Voting No)", ylim=c(0,1), xlim=c(min(sq), max(sq)), main="Against")
lines(sq, pF[,2], lwd=3, col="purple", lty=1)
lines(sq, pM[,2], lwd=3, lty=3)
plot("NULL", xlab="Support of the status quo", ylab="P(Voting Yes)", ylim=c(0,1), xlim=c(min(sq), max(sq)), main="In Favor")
lines(sq, pF[,4], lwd=3, col="red", lty=1)
lines(sq, pM[,4], lwd=3, lty=3)
Discussion: What can we infer from this figure? What are we learning with this modeling strategy compared to the logit model?
We can also fit a model using the mlogit
package and include two
different categories for those undecided
and those who claimed they
would abstain
.
library(mlogit)
## Loading required package: dfidx
##
## Attaching package: 'dfidx'
## The following object is masked from 'package:MASS':
##
## select
## The following object is masked from 'package:stats':
##
## filter
Chile$respondent<-seq(1, nrow(Chile))
data<-mlogit.data(Chile, choice="vote", shape="wide", alt.levels= c("A", "Y", "N", "U"), id.var="respondent")
## Warning in dfidx::dfidx(data = data, dfa$idx, drop.index = dfa$drop.index, :
## the levels shouldn't be provided with a data set in wide format
#m<-mlogit(vote~ |statusquo+age+income+education+sex, data=data)
m<-mlogit(vote~ 1|statusquo+age+income+education+sex, data=Chile, shape="wide")
m2<-mlogit(vote~ 1|statusquo+age+income+education+sex, data=Chile, shape="wide", alt.subset=c("A", "U"))
hmftest(m, m2)
##
## Hausman-McFadden test
##
## data: Chile
## chisq = -0.49351, df = 7, p-value = 1
## alternative hypothesis: IIA is rejected