Models for the paper Policy Learning with Adaptively Collected Data.
Authors: Ruohan Zhan, Zhimei Ren, Susan Athey, Zhengyuan Zhou.
Table of contents: Overview • Development Setup • Quickstart
Note: For any questions, please file an issue.
Overview

Adaptive experimental designs can dramatically improve efficiency in randomized trials, but adaptivity also makes offline policy inference challenging. In the paper Policy Learning with Adaptively Collected Data, we propose a class of estimators that lead to consistent and asymptotically normal policy evaluation, and we use them for offline policy learning. This repo contains reproducible code for the results shown in the paper.
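As a rough illustration of the kind of estimator the paper studies, the sketch below combines AIPW-style scores with per-observation weights to estimate a policy's value. The function names, the weighting scheme, and the normalization are illustrative assumptions made for this README, not the API of ./utils; see the paper and the module itself for the exact generalized AIPW construction.

```python
# Illustrative sketch (not the repo's API): forming AIPW-style scores from
# logged bandit data and averaging them with per-observation weights.
import numpy as np

def aipw_scores(actions, rewards, probs, mu_hat):
    """Doubly robust scores for every arm.

    actions : (T,) int array, arm pulled at each round
    rewards : (T,) float array, observed rewards
    probs   : (T, K) assignment probabilities used by the logging agent
    mu_hat  : (T, K) outcome-model predictions for every arm
    """
    T, K = mu_hat.shape
    scores = mu_hat.copy()
    rows = np.arange(T)
    # Inverse-propensity correction only for the arm that was actually pulled.
    scores[rows, actions] += (rewards - mu_hat[rows, actions]) / probs[rows, actions]
    return scores

def weighted_policy_value(scores, policy_probs, weights):
    """Weighted average of policy-matched scores; `weights` stands in for the
    adaptive weights discussed in the paper (uniform weights recover plain AIPW)."""
    per_round = (scores * policy_probs).sum(axis=1)
    return np.sum(weights * per_round) / np.sum(weights)

# Toy usage with two arms and uniform weights (plain AIPW):
rng = np.random.default_rng(0)
T, K = 5, 2
actions = rng.integers(0, K, size=T)
rewards = rng.normal(size=T)
probs = np.full((T, K), 0.5)
mu_hat = np.zeros((T, K))
policy = np.tile([1.0, 0.0], (T, 1))  # target policy: always pull arm 0
value = weighted_policy_value(aipw_scores(actions, rewards, probs, mu_hat),
                              policy, np.ones(T))
```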
We organize the code into two directories:
- ./utils is a Python module for policy learning with the generalized AIPW estimator, as described in the paper.
- ./scripts contains Python scripts that run the experiments and make the plots shown in the paper, including:
  - collecting contextual bandit data with a floored Thompson sampling agent (a minimal sketch of probability flooring follows this list);
  - doing offline policy learning on the collected data;
  - saving results and making plots.
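For intuition about the "floored" part of the data-collection step above, here is a minimal sketch of imposing a probability floor on Thompson-sampling assignment probabilities so that every arm keeps a minimum chance of being selected. The Beta(1, 1) priors, the floor schedule, and the function names are illustrative assumptions, not the agent implemented in ./scripts.

```python
# Illustrative sketch: imposing a probability floor on an agent's assignment
# probabilities so that every arm keeps a minimum chance of being pulled.
# The floor schedule below (1 / (K * sqrt(t))) is only an example, not the
# schedule used in the scripts.
import numpy as np

def apply_floor(probs, floor):
    """Clip probabilities at `floor`, then renormalize so they sum to one."""
    floored = np.maximum(probs, floor)
    return floored / floored.sum()

def thompson_probs_via_sampling(successes, failures, n_draws=10_000):
    """Monte Carlo estimate of Thompson-sampling assignment probabilities
    for Bernoulli arms with Beta(1, 1) priors."""
    draws = np.random.beta(successes + 1, failures + 1,
                           size=(n_draws, len(successes)))
    best = draws.argmax(axis=1)
    return np.bincount(best, minlength=len(successes)) / n_draws

# Example: three arms, a floor that shrinks with the round index t.
successes, failures, t = np.array([5, 2, 1]), np.array([3, 4, 6]), 10
probs = thompson_probs_via_sampling(successes, failures)
probs = apply_floor(probs, floor=1.0 / (3 * np.sqrt(t)))
```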
Development Setup

R and Python are required. We recommend creating the following conda environment:
```bash
conda create --name policy_learning python=3.8
conda activate policy_learning
source install.sh
```
Then, in R, make sure that the grf, BH, and policytree packages are installed successfully.
Quickstart

- To do policy learning and reproduce the results shown in the paper, follow the instructions in ./scripts/README.md.
- For a quick start on one simulation using synthetic data of sample size 1000, run:
```bash
source activate policy_learning
cd ./scripts/
python script_synthetic.py -s 1 -n test
```

Results will be saved in ./scripts/results/.