Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update README.md #324

Merged
merged 2 commits into from
Nov 19, 2020
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
46 changes: 42 additions & 4 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ techniques with econometrics to bring automation to complex causal inference pro
* Build on standard Python packages for Machine Learning and Data Analysis

One of the biggest promises of machine learning is to automate decision making in a multitude of domains. At the core of many data-driven personalized decision scenarios is the estimation of heterogeneous treatment effects: what is the causal effect of an intervention on an outcome of interest for a sample with a particular set of features? In a nutshell, this toolkit is designed to measure the causal effect of some treatment variable(s) `T` on an outcome
variable `Y`, controlling for a set of features `X`. The methods implemented are applicable even with observational (non-experimental or historical) datasets.
variable `Y`, controlling for a set of features `X, W` and how does that effect vary as a function of `X`. The methods implemented are applicable even with observational (non-experimental or historical) datasets. For the estimation results to have a causal interpretation, some methods assume no unobserved confounders (i.e. there is no unobserved variable not included in `X, W` that simultaneously has an effect on both `T` and `Y`), while others assume access to an instrument `Z` (i.e. an observed variable `Z` that has an effect on the treatment `T` but no direct effect on the outcome `Y`). Most methods provide confidence intervals and inference results.

For detailed information about the package, consult the documentation at https://econml.azurewebsites.net/.

Expand Down Expand Up @@ -84,7 +84,7 @@ To install from source, see [For Developers](#for-developers) section below.
### Estimation Methods

<details>
<summary>Double Machine Learning (click to expand)</summary>
<summary>Double Machine Learning (aka RLearner) (click to expand)</summary>

* Linear final stage

Expand Down Expand Up @@ -117,7 +117,7 @@ To install from source, see [For Developers](#for-developers) section below.
lb, ub = est.effect_interval(X_test, alpha=0.05) # Confidence intervals via debiased lasso
```

* Nonparametric last stage
* Forest last stage

```Python
from econml.dml import ForestDML
Expand All @@ -129,6 +129,20 @@ To install from source, see [For Developers](#for-developers) section below.
# Confidence intervals via Bootstrap-of-Little-Bags for forests
lb, ub = est.effect_interval(X_test, alpha=0.05)
```

* Generic Machine Learning last stage

```Python
from econml.dml import NonParamDML
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier

est = NonParamDML(model_y=RandomForestRegressor(),
model_t=RandomForestClassifier(),
model_final=RandomForestRegressor(),
discrete_treatment=True)
est.fit(Y, T, X=X, W=W)
treatment_effects = est.effect(X_test)
```

</details>

Expand Down Expand Up @@ -367,16 +381,28 @@ as p-values and z-statistics. When the CATE model is linear and parametric, then
est.effect_inference(X_test).summary_frame(alpha=0.05, value=0, decimals=3)
# Get the population summary for the entire sample X
est.effect_inference(X_test).population_summary(alpha=0.1, value=0, decimals=3, tol=0.001)
# Get the inference summary for the final model
# Get the parameter inference summary for the final model
est.summary()
```

<details><summary>Example Output (click to expand)</summary>

```Python
# Get the effect inference summary, which includes the standard error, z test score, p value, and confidence interval given each sample X[i]
est.effect_inference(X_test).summary_frame(alpha=0.05, value=0, decimals=3)
```
![image](notebooks/images/summary_frame.png)

```Python
# Get the population summary for the entire sample X
est.effect_inference(X_test).population_summary(alpha=0.1, value=0, decimals=3, tol=0.001)
```
![image](notebooks/images/population_summary.png)

```Python
# Get the parameter inference summary for the final model
est.summary()
```
![image](notebooks/images/summary.png)

</details>
Expand Down Expand Up @@ -448,6 +474,10 @@ contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additio

# References

X Nie, S Wager.
**Quasi-Oracle Estimation of Heterogeneous Treatment Effects.**
[*Biometrika*](https://doi.org/10.1093/biomet/asaa076), 2020

V. Syrgkanis, V. Lei, M. Oprescu, M. Hei, K. Battocchi, G. Lewis.
**Machine Learning Estimation of Heterogeneous Treatment Effects with Instruments.**
[*Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS)*](https://arxiv.org/abs/1905.10176), 2019
Expand All @@ -466,10 +496,18 @@ S. Künzel, J. Sekhon, J. Bickel and B. Yu.
**Metalearners for estimating heterogeneous treatment effects using machine learning.**
[*Proceedings of the national academy of sciences, 116(10), 4156-4165*](https://www.pnas.org/content/116/10/4156), 2019.

S. Athey, J. Tibshirani, S. Wager.
**Generalized random forests.**
[*Annals of Statistics, 47, no. 2, 1148--1178*](https://projecteuclid.org/euclid.aos/1547197251), 2019.

V. Chernozhukov, D. Nekipelov, V. Semenova, V. Syrgkanis.
**Plug-in Regularized Estimation of High-Dimensional Parameters in Nonlinear Semiparametric Models.**
[*Arxiv preprint arxiv:1806.04823*](https://arxiv.org/abs/1806.04823), 2018.

S. Wager, S. Athey.
**Estimation and Inference of Heterogeneous Treatment Effects using Random Forests.**
[*Journal of the American Statistical Association, 113:523, 1228-1242*](https://www.tandfonline.com/doi/citedby/10.1080/01621459.2017.1319839), 2018.

Jason Hartford, Greg Lewis, Kevin Leyton-Brown, and Matt Taddy. **Deep IV: A flexible approach for counterfactual prediction.** [*Proceedings of the 34th International Conference on Machine Learning, ICML'17*](http://proceedings.mlr.press/v70/hartford17a/hartford17a.pdf), 2017.

V. Chernozhukov, D. Chetverikov, M. Demirer, E. Duflo, C. Hansen, and a. W. Newey. **Double Machine Learning for Treatment and Causal Parameters.** [*ArXiv preprint arXiv:1608.00060*](https://arxiv.org/abs/1608.00060), 2016.