/***
Title
-----
{phang}{cmd:persuasio4ytz} {hline 2} Conduct causal inference on persuasive effects
for binary outcomes _y_, binary treatments _t_ and binary instruments _z_
Syntax
------
> {cmd:persuasio4ytz} _depvar_ _treatvar_ _instrvar_ [_covariates_] [_if_] [_in_] [, {cmd:level}(#) {cmd:model}(_string_) {cmd:method}(_string_) {cmd:nboot}(#) {cmd:title}(_string_)]
### Options
| _option_ | _Description_ |
|-------------------|-------------------------|
| {cmd:level}(#) | Set confidence level; default is {cmd:level}(95) |
| {cmd:model}(_string_) | Regression model when _covariates_ are present; default is {cmd:model}("no_interaction") |
| {cmd:method}(_string_) | Inference method; default is {cmd:method}("normal") |
| {cmd:nboot}(#) | Perform # bootstrap replications; default is {cmd:nboot}(50) |
| {cmd:title}(_string_) | Title |
Description
-----------
{cmd:persuasio4ytz} conducts causal inference on persuasive effects.
It is assumed that binary outcomes _y_, binary treatments _t_, and binary instruments _z_ are observed.
This command is for the case when the persuasive treatment (_t_) is observed.
It uses estimates of the lower and upper bounds on the average persuasion rate (APR) obtained via
this package's commands {cmd:aprlb} and {cmd:aprub}.
_varlist_ should include _depvar_ _treatvar_ _instrvar_ _covariates_ in order.
Here, _depvar_ is binary outcomes (_y_), _treatvar_ is binary treatments (_t_),
_instrvar_ is binary instruments (_z_), and _covariates_ (_x_) are optional.
There are two cases: (i) _covariates_ are absent and (ii) _covariates_ are present.
- Without _x_, the lower bound ({cmd:theta_L}) on the APR is defined by
{cmd:theta_L} = {Pr({it:y}=1|{it:z}=1) - Pr({it:y}=1|{it:z}=0)}/{1 - Pr({it:y}=1|{it:z}=0)},
and the upper bound ({cmd:theta_U}) on the APR is defined by
{cmd:theta_U} = {E[{it:A}|{it:z}=1] - E[{it:B}|{it:z}=0]}/{1 - E[{it:B}|{it:z}=0]},
where {it:A} = 1({it:y}=1,{it:t}=1)+1-1({it:t}=1) and
{it:B} = 1({it:y}=1,{it:t}=0).
The lower bound is estimated by the following procedure:
1. Pr({it:y}=1|{it:z}=1) and Pr({it:y}=1|{it:z}=0) are estimated by regressing _y_ on _z_.
2. {cmd:theta_L} is computed using the estimates obtained above.
3. The standard error is computed via Stata command __nlcom__.
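As a minimal sketch of steps 1-3 (not necessarily the exact code used inside {cmd:aprlb}), assume that the example dataset GKB_persuasio is in memory and that a linear regression of voteddem_all (_y_) on post (_z_) is used:

    . regress voteddem_all post
    . nlcom _b[post]/(1 - _b[_cons])

In this sketch the intercept estimates Pr({it:y}=1|{it:z}=0) and the coefficient on post estimates Pr({it:y}=1|{it:z}=1) - Pr({it:y}=1|{it:z}=0), so the __nlcom__ expression corresponds to {cmd:theta_L} with a delta-method standard error.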
The upper bound is estimated by the following procedure:
1. E[{it:A}|{it:z}=1] is estimated by regressing {it:A} on _z_.
2. E[{it:B}|{it:z}=0] is estimated by regressing {it:B} on _z_.
3. {cmd:theta_U} is computed using the estimates obtained above.
4. The standard error is computed via Stata command __nlcom__.
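The point estimate in steps 1-3 can be sketched as follows. This is an illustration only: it assumes that the example dataset is in memory with no missing values in voteddem_all (_y_), readsome (_t_), and post (_z_). The names Ademo, Bdemo, EA1, and EB0 are hypothetical, and sample means by _z_ stand in for the regressions on _z_ (the two coincide for a linear regression on a binary regressor):

    . generate Ademo = (voteddem_all==1 & readsome==1) + 1 - (readsome==1)
    . generate Bdemo = (voteddem_all==1 & readsome==0)
    . quietly summarize Ademo if post==1
    . scalar EA1 = r(mean)
    . quietly summarize Bdemo if post==0
    . scalar EB0 = r(mean)
    . display (scalar(EA1) - scalar(EB0))/(1 - scalar(EB0))

The standard error in step 4 is not reproduced in this sketch.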
Then, a confidence interval for the APR is set by
{p 8 8 2} [ _est_lb_ - _cv_ * _se_lb_ , _est_ub_ + _cv_ * _se_ub_ ],
where _est_lb_ and _est_ub_ are the estimates of the lower and upper bounds,
_se_lb_ and _se_ub_ are the corresponding standard errors, and
_cv_ is the critical value obtained via the method of Stoye (2009).
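In this implementation, _cv_ is chosen on a grid so that Phi(_cv_ + _d_) - Phi(-_cv_) is as close as possible to the confidence level, where Phi is the standard normal distribution function and _d_ = (_est_ub_ - _est_lb_)/max(_se_lb_, _se_ub_).
A small numerical check with hypothetical values (bounds 0.1 and 0.4, standard errors 0.04 and 0.05, 95% level; the scalar names d and cv1 are illustrative) shows the limiting case in which the bounds are far apart relative to their standard errors:

    . scalar d = (0.4 - 0.1)/max(0.05, 0.04)
    . scalar cv1 = invnormal(0.95)
    . display normal(scalar(cv1) + scalar(d)) - normal(-scalar(cv1))

The displayed value should be very close to 0.95, so _cv_ is then close to the one-sided critical value. As _d_ shrinks toward zero, _cv_ moves toward the two-sided critical value invnormal(0.975).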
- With _x_, the lower bound ({cmd:theta_L}) on the APR is defined by
{cmd:theta_L} = E[{cmd:theta_L_num}({it:x})]/E[{cmd:theta_L_den}({it:x})],
where
{cmd:theta_L_num}({it:x}) = Pr({it:y}=1|{it:z}=1,{it:x}) - Pr({it:y}=1|{it:z}=0,{it:x})
and
{cmd:theta_L_den}({it:x}) = 1 - Pr({it:y}=1|{it:z}=0,{it:x}).
- With _x_, the upper bound ({cmd:theta_U}) on the APR is defined by
{cmd:theta_U} = E[{cmd:theta_U_num}({it:x})]/E[{cmd:theta_U_den}({it:x})],
where
{cmd:theta_U_num}({it:x}) = E[{it:A}|{it:z}=1,{it:x}] - E[{it:B}|{it:z}=0,{it:x}]
and
{cmd:theta_U_den}({it:x}) = 1 - E[{it:B}|{it:z}=0,{it:x}].
The lower bound is estimated by the following procedure:
If {cmd:model}("no_interaction") is selected (default choice),
1. Pr({it:y}=1|{it:z},{it:x}) is estimated by regressing _y_ on _z_ and _x_.
Alternatively, if {cmd:model}("interaction") is selected,
1a. Pr({it:y}=1|{it:z}=1,{it:x}) is estimated by regressing _y_ on _x_ given _z_ = 1.
1b. Pr({it:y}=1|{it:z}=0,{it:x}) is estimated by regressing _y_ on _x_ given _z_ = 0.
After step 1, both options are followed by:
{p 4 8 2}2. For each _x_ in the estimation sample, {cmd:theta_L_num}({it:x}) and {cmd:theta_L_den}({it:x}) are evaluated.
{p 4 8 2}3. The estimates of {cmd:theta_L_num}({it:x}) and {cmd:theta_L_den}({it:x}) are averaged to estimate {cmd:theta_L}.
The upper bound is estimated by the following procedure:
If {cmd:model}("no_interaction") is selected (default choice),
1. E[{it:A}|{it:z}=1,{it:x}] is estimated by regressing {it:A} on _z_ and _x_.
2. E[{it:B}|{it:z}=0,{it:x}] is estimated by regressing {it:B} on _z_ and _x_.
Alternatively, if {cmd:model}("interaction") is selected,
1. E[{it:A}|{it:z}=1,{it:x}] is estimated by regressing {it:A} on _x_ given _z_ = 1.
2. E[{it:B}|{it:z}=0,{it:x}] is estimated by regressing {it:B} on _x_ given _z_ = 0.
After steps 1 and 2, both options are followed by:
{p 4 8 2}3. For each _x_ in the estimation sample, {cmd:theta_U_num}({it:x}) and {cmd:theta_U_den}({it:x}) are evaluated.
{p 4 8 2}4. The estimates of {cmd:theta_U_num}({it:x}) and {cmd:theta_U_den}({it:x}) are averaged to estimate {cmd:theta_U}.
Then, a bootstrap confidence interval for the APR is set by
{p 8 8 2} [ bs_est_lb(_alpha_) , bs_est_ub(1 - _alpha_) ],
where bs_est_lb(_alpha_) is the _alpha_ quantile of the bootstrap estimates of {cmd:theta_L},
bs_est_ub(1 - _alpha_) is the 1 - _alpha_ quantile of the bootstrap estimates of {cmd:theta_U},
and 1 - _alpha_ is the confidence level.
The resulting coverage probability is 1 - _alpha_ if the identified interval never reduces to a singleton set.
More generally, it is at least 1 - 2*{it:alpha} by the Bonferroni correction.
The bootstrap procedure is implemented via Stata command {cmd:bootstrap}.
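The lower endpoint can be sketched with a direct call to {cmd:bootstrap}, mirroring the call made inside this command. The sketch assumes that the example dataset is in memory and that the confidence level is 95% ({it:alpha} = 0.05), so the percentile interval is requested at level 100*(1 - 2*{it:alpha}) = 90 and its lower limit is the {it:alpha} quantile of the bootstrap estimates:

    . bootstrap coef=e(lb_coef), reps(1000) level(90) notable nowarn: aprlb voteddem_all post
    . matrix list e(ci_percentile)

The first row of __e(ci_percentile)__ is the lower limit. The upper endpoint is obtained in the same way from {cmd:aprub} and __e(ub_coef)__, taking the second row.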
Options
-------
{cmd:model}(_string_) specifies a regression model of _y_ on _z_ and _x_.
This option is only relevant when _x_ is present.
The default option is "no_interaction" between _z_ and _x_.
When "interaction" is selected, full interactions between _z_ and _x_ are allowed.
{cmd:level}(#) sets confidence level; default is {cmd:level}(95).
{cmd:method}(_string_) specifies the method for inference.
The default option is {cmd:method}("normal").
By the nature of identification, one-sided confidence intervals are produced.
{p 4 8 2}1. When _x_ is present, {cmd:method}("bootstrap") must be selected;
otherwise, the confidence interval will be missing.
{p 4 8 2}2. When _x_ is absent, both options yield non-missing confidence intervals.
{cmd:nboot}(#) chooses the number of bootstrap replications.
The default option is {cmd:nboot}(50).
It is only relevant when {cmd:method}("bootstrap") is selected.
{cmd:title}(_string_) specifies a title.
Remarks
-------
It is recommended to use {cmd:nboot}(#) with # at least 1000.
A default choice of 50 is meant to check the code initially
because it may take a long time to run the bootstrap part.
The bootstrap confidence interval is based on percentile bootstrap.
A normality-based bootstrap confidence interval is not recommended
because bootstrap standard errors can be unreasonably large in applications.
Examples
--------
We first load the dataset included in the package.
. use GKB_persuasio, clear
The first example conducts inference on the APR without covariates, using normal approximation.
. persuasio4ytz voteddem_all readsome post, level(80) method("normal")
The second example conducts bootstrap inference on the APR.
. persuasio4ytz voteddem_all readsome post, level(80) method("bootstrap") nboot(1000)
The third example conducts bootstrap inference on the APR with a covariate, MZwave2, interacting with the instrument, post.
. persuasio4ytz voteddem_all readsome post MZwave2, level(80) model("interaction") method("bootstrap") nboot(1000)
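The stored results can be inspected after any of these examples. A minimal sketch, assuming the preceding command completed without error:

    . ereturn list
    . matrix list e(apr_est)
    . matrix list e(apr_ci)

The first matrix holds the estimated bounds [lb, ub] and the second holds the confidence interval endpoints [lb_ci, ub_ci].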
Stored results
--------------
### Matrices
> __e(apr_est)__: (1*2 matrix) bounds on the average persuasion rate in the form of [lb, ub]
> __e(apr_ci)__: (1*2 matrix) confidence interval for the average persuasion rate in the form of [lb_ci, ub_ci]
### Macros
> __e(cilevel)__: confidence level
> __e(inference_method)__: inference method: "normal" or "bootstrap"
Authors
-------
Sung Jae Jun, Penn State University, <sjun@psu.edu>
Sokbae Lee, Columbia University, <sl3841@columbia.edu>
License
-------
GPL-3
References
----------
Sung Jae Jun and Sokbae Lee (2022),
Identifying the Effect of Persuasion,
[arXiv:1812.02276 [econ.EM]](https://arxiv.org/abs/1812.02276)
Version
-------
0.2.1 20 November 2022
***/
capture program drop persuasio4ytz
program persuasio4ytz, eclass sortpreserve byable(recall)
version 16.1
syntax varlist (min=3) [if] [in] [, level(cilevel) model(string) method(string) nboot(numlist >0 integer) title(string)]
marksample touse
gettoken Y varlist_without_Y : varlist
gettoken T varlist_without_YT : varlist_without_Y
gettoken Z X : varlist_without_YT
quietly aprlb `Y' `Z' `X' if `touse', model("`model'")
tempname lb_coef lb_se
scalar `lb_coef' = e(lb_coef)
scalar `lb_se' = e(lb_se)
quietly aprub `Y' `T' `Z' `X' if `touse', model("`model'")
tempname ub_coef ub_se
scalar `ub_coef' = e(ub_coef)
scalar `ub_se' = e(ub_se)
* displaying results
if "`title'" != "" {
display "`title':"
}
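* default inference method, so that e(inference_method) is never stored as empty
if "`method'" == "" {
	local method "normal"
}
* inference based on normal approximation
if "`method'" == "normal" {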
if "`level'" != "" {
local alpha_level = 1 - `level'/100
}
if "`level'" == "" {
local alpha_level = 0.05
}
/* compute the critical value using Stoye (2009) */
tempname cv_cns1 cv_cns2 correction_term mincv maxcv gridsize cv_cns_stoye lb_end ub_end
scalar `cv_cns1' = invnormal(1-`alpha_level') /* one-sided critical value */
scalar `cv_cns2' = invnormal(1-`alpha_level'/2) /* two-sided critical value */
scalar `mincv' = `cv_cns1'-0.01
scalar `maxcv' = `cv_cns2'+0.01
scalar `gridsize' = (`maxcv'-`mincv')/(e(N)-1)
scalar `correction_term' = (`ub_coef'-`lb_coef')/max(`ub_se',`lb_se')
quietly {
tempvar cvtmp difftmp
egen `cvtmp' = fill("0 `=`gridsize''")
replace `cvtmp' = `cvtmp' + `mincv'
gen `difftmp' = abs(normal(`cvtmp' + `correction_term') - normal(-`cvtmp') - (1-`alpha_level'))
sum `difftmp'
replace `cvtmp' = . if `difftmp' > r(min)
sum `cvtmp'
scalar `cv_cns_stoye' = r(mean)
}
scalar `lb_end' = `lb_coef' - `cv_cns_stoye'*`lb_se'
scalar `ub_end' = `ub_coef' + `cv_cns_stoye'*`ub_se'
* Displaying results
display " "
display as text "{hline 65}"
display "{bf:persuasio4ytz:} Causal inference on the Average Persuasion Rate"
display " when outcome, treatment and instrument are observed"
display as text "{hline 65}"
display " "
if "`title'" != "" {
display "Title: `title'"
}
display " - Binary outcome: `e(outcome)'"
display " - Binary treatment: `e(treatment)'"
display " - Binary instrument: `e(instrument)'"
display " "
display as text "{hline 25}{c TT}{hline 40}"
display as text %24s "Parameter" " {c |}" /*
*/ _col(28) "Bound Estimate" /*
*/ _col(48) "`level'% Conf. Interval"
display as text "{hline 25}{c +}{hline 40}"
display as text %24s "Average Persuasion Rate" " {c |}" /*
*/ as result /*
*/ _col(27) %8.0g `lb_coef' " " /*
*/ _col(33) %8.0g `ub_coef' " " /*
*/ _col(47) %8.0g `lb_end' " " /*
*/ _col(53) %8.0g `ub_end' " "
display as text "{hline 25}{c BT}{hline 40}"
display " "
display "Note: `level'% conf. interval is based on normal approximation"
display " using the method of Stoye (2009). "
display " Conf. interval is missing if covariates are present."
display " Use option bootstrap for that case."
display " "
}
* inference based on bootstrap
if "`method'" == "bootstrap" {
* Displaying results
display " "
display as text "{hline 65}"
display "{bf:persuasio4ytz:} Causal inference on the Average Persuasion Rate"
display " when outcome, treatment and instrument are observed"
display " along with covariates"
display as text "{hline 65}"
display " "
if "`title'" != "" {
display "Title: `title'"
}
display " - Binary outcome: `e(outcome)'"
display " - Binary treatment: `e(treatment)'"
display " - Binary instrument: `e(instrument)'"
display " - Covariates (if exist): `e(covariates)'"
display " - Regression model (if specified): `e(model)'"
display " "
if "`level'" != "" {
local alpha_level = 1 - `level'/100
}
if "`level'" == "" {
local alpha_level = 0.05
}
local bs_level = round(10000*(1 - `alpha_level'*2))/100 /* level for bootstrap */
* default number of bootstrap replications
if "`nboot'" == "" {
	local nboot 50
}
* lower bound
bootstrap coef=e(lb_coef), reps(`nboot') level(`bs_level') notable nowarn: aprlb `Y' `Z' `X' if `touse', model("`model'")
tempname bs_ci_percentile lb_end ub_end
matrix `bs_ci_percentile' = e(ci_percentile)
scalar `lb_end' = `bs_ci_percentile'[1,1]
* upper bound
bootstrap coef=e(ub_coef), reps(`nboot') level(`bs_level') notable nowarn: aprub `Y' `T' `Z' `X' if `touse', model("`model'")
matrix `bs_ci_percentile' = e(ci_percentile)
scalar `ub_end' = `bs_ci_percentile'[2,1]
* Displaying results further
display " "
display as text "{hline 25}{c TT}{hline 40}"
display as text %24s "Parameter" " {c |}" /*
*/ _col(28) "Bound Estimate" /*
*/ _col(48) "`level'% Conf. Interval"
display as text "{hline 25}{c +}{hline 40}"
display as text %24s "Average Persuasion Rate" " {c |}" /*
*/ as result /*
*/ _col(27) %8.0g `lb_coef' " " /*
*/ _col(33) %8.0g `ub_coef' " " /*
*/ _col(47) %8.0g `lb_end' " " /*
*/ _col(53) %8.0g `ub_end' " "
display as text "{hline 25}{c BT}{hline 40}"
display " "
display "Note: `level'% conf. interval is based on percentile bootstrap."
display " The conf. level is one-sided for the lower and upper bounds separately."
display " "
}
tempname coef_matrix ci_matrix
matrix `coef_matrix' = (`lb_coef',`ub_coef')
matrix `ci_matrix' = (`lb_end',`ub_end')
ereturn clear
ereturn matrix apr_est = `coef_matrix'
ereturn matrix apr_ci = `ci_matrix'
ereturn local cilevel = (1-`alpha_level')*100
ereturn local inference_method "`method'"
display "Reference: Jun and Lee (2022), arXiv:1812.02276 [econ.EM]"
end