Update README.md (#324)
* Updated some examples and updated the bibliography to include some papers at the core of the library that we were missing.
vsyrgkanis authored Nov 19, 2020
1 parent af054c4 commit 697c595
Showing 1 changed file, README.md, with 42 additions and 4 deletions.
@@ -16,7 +16,7 @@ techniques with econometrics to bring automation to complex causal inference pro
* Build on standard Python packages for Machine Learning and Data Analysis

One of the biggest promises of machine learning is to automate decision making in a multitude of domains. At the core of many data-driven personalized decision scenarios is the estimation of heterogeneous treatment effects: what is the causal effect of an intervention on an outcome of interest for a sample with a particular set of features? In a nutshell, this toolkit is designed to measure the causal effect of some treatment variable(s) `T` on an outcome
-variable `Y`, controlling for a set of features `X`. The methods implemented are applicable even with observational (non-experimental or historical) datasets.
+variable `Y`, controlling for a set of features `X, W`, and to estimate how that effect varies as a function of `X`. The methods implemented are applicable even with observational (non-experimental or historical) datasets. For the estimation results to have a causal interpretation, some methods assume no unobserved confounders (i.e. there is no unobserved variable not included in `X, W` that simultaneously has an effect on both `T` and `Y`), while others assume access to an instrument `Z` (i.e. an observed variable `Z` that has an effect on the treatment `T` but no direct effect on the outcome `Y`). Most methods provide confidence intervals and inference results.

For detailed information about the package, consult the documentation at https://econml.azurewebsites.net/.
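
To make the workflow above concrete, here is a minimal end-to-end sketch. It assumes the `LinearDML` estimator from `econml.dml` and synthetic `Y, T, X, W` arrays generated on the spot; the estimator choice, constructor arguments, and data are purely illustrative, and the library's other CATE estimators follow the same `fit`/`effect` pattern.

```Python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, GradientBoostingClassifier
from econml.dml import LinearDML

# Illustrative synthetic data: binary treatment T, outcome Y,
# effect modifiers X and additional controls W
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 2))
W = rng.normal(size=(n, 3))
T = rng.binomial(1, 0.5, size=n)
Y = (1 + X[:, 0]) * T + W[:, 0] + rng.normal(size=n)

est = LinearDML(model_y=GradientBoostingRegressor(),
                model_t=GradientBoostingClassifier(),
                discrete_treatment=True)
est.fit(Y, T, X=X, W=W)                       # nuisance models for Y and T, then a linear CATE model in X
treatment_effects = est.effect(X)             # heterogeneous effect estimates, one per row of X
lb, ub = est.effect_interval(X, alpha=0.05)   # 95% confidence intervals
```

Estimators that rely on an instrument follow the same pattern but additionally consume `Z` when fitting.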

@@ -84,7 +84,7 @@ To install from source, see [For Developers](#for-developers) section below.
### Estimation Methods

<details>
-<summary>Double Machine Learning (click to expand)</summary>
+<summary>Double Machine Learning (aka RLearner) (click to expand)</summary>

* Linear final stage

@@ -117,7 +117,7 @@ To install from source, see [For Developers](#for-developers) section below.
lb, ub = est.effect_interval(X_test, alpha=0.05) # Confidence intervals via debiased lasso
```

-* Nonparametric last stage
+* Forest last stage

```Python
from econml.dml import ForestDML
from sklearn.ensemble import GradientBoostingRegressor

# Illustrative construction and fit (the exact arguments here are an assumption, not prescribed by the library)
est = ForestDML(model_y=GradientBoostingRegressor(), model_t=GradientBoostingRegressor())
est.fit(Y, T, X=X, W=W)
treatment_effects = est.effect(X_test)
# Confidence intervals via Bootstrap-of-Little-Bags for forests
lb, ub = est.effect_interval(X_test, alpha=0.05)
```

* Generic Machine Learning last stage

```Python
from econml.dml import NonParamDML
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier

est = NonParamDML(model_y=RandomForestRegressor(),
model_t=RandomForestClassifier(),
model_final=RandomForestRegressor(),
discrete_treatment=True)
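# fit() cross-fits the ML models for Y and T, then trains model_final on the residualized data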
est.fit(Y, T, X=X, W=W)
treatment_effects = est.effect(X_test)
```

</details>

@@ -367,16 +381,28 @@ as p-values and z-statistics. When the CATE model is linear and parametric, then
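# Get the effect inference summary, which includes the standard error, z test score, p value, and confidence interval given each sample X[i]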
est.effect_inference(X_test).summary_frame(alpha=0.05, value=0, decimals=3)
# Get the population summary for the entire sample X
est.effect_inference(X_test).population_summary(alpha=0.1, value=0, decimals=3, tol=0.001)
-# Get the inference summary for the final model
+# Get the parameter inference summary for the final model
est.summary()
```

<details><summary>Example Output (click to expand)</summary>

```Python
# Get the effect inference summary, which includes the standard error, z test score, p value, and confidence interval given each sample X[i]
est.effect_inference(X_test).summary_frame(alpha=0.05, value=0, decimals=3)
```
![image](notebooks/images/summary_frame.png)

```Python
# Get the population summary for the entire sample X
est.effect_inference(X_test).population_summary(alpha=0.1, value=0, decimals=3, tol=0.001)
```
![image](notebooks/images/population_summary.png)

```Python
# Get the parameter inference summary for the final model
est.summary()
```
![image](notebooks/images/summary.png)

</details>
@@ -448,6 +474,10 @@ contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additio

# References

X. Nie, S. Wager.
**Quasi-Oracle Estimation of Heterogeneous Treatment Effects.**
[*Biometrika*](https://doi.org/10.1093/biomet/asaa076), 2020.

V. Syrgkanis, V. Lei, M. Oprescu, M. Hei, K. Battocchi, G. Lewis.
**Machine Learning Estimation of Heterogeneous Treatment Effects with Instruments.**
[*Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS)*](https://arxiv.org/abs/1905.10176), 2019.
@@ -466,10 +496,18 @@ S. Künzel, J. Sekhon, J. Bickel and B. Yu.
**Metalearners for estimating heterogeneous treatment effects using machine learning.**
[*Proceedings of the National Academy of Sciences, 116(10), 4156-4165*](https://www.pnas.org/content/116/10/4156), 2019.

S. Athey, J. Tibshirani, S. Wager.
**Generalized random forests.**
[*Annals of Statistics, 47, no. 2, 1148-1178*](https://projecteuclid.org/euclid.aos/1547197251), 2019.

V. Chernozhukov, D. Nekipelov, V. Semenova, V. Syrgkanis.
**Plug-in Regularized Estimation of High-Dimensional Parameters in Nonlinear Semiparametric Models.**
[*ArXiv preprint arXiv:1806.04823*](https://arxiv.org/abs/1806.04823), 2018.

S. Wager, S. Athey.
**Estimation and Inference of Heterogeneous Treatment Effects using Random Forests.**
[*Journal of the American Statistical Association, 113:523, 1228-1242*](https://www.tandfonline.com/doi/citedby/10.1080/01621459.2017.1319839), 2018.

Jason Hartford, Greg Lewis, Kevin Leyton-Brown, and Matt Taddy. **Deep IV: A flexible approach for counterfactual prediction.** [*Proceedings of the 34th International Conference on Machine Learning, ICML'17*](http://proceedings.mlr.press/v70/hartford17a/hartford17a.pdf), 2017.

V. Chernozhukov, D. Chetverikov, M. Demirer, E. Duflo, C. Hansen, and a. W. Newey. **Double Machine Learning for Treatment and Causal Parameters.** [*ArXiv preprint arXiv:1608.00060*](https://arxiv.org/abs/1608.00060), 2016.
