A PyTorch implementation of the "robust" synthetic control model proposed by Amjad, Shah and Shen [arxiv].
The robust synthetic control is an extension of the original synthetic control model, proposed by Abadie, Diamond and Hainmueller (for overview see: https://www.aeaweb.org/articles?id=10.1257/jel.20191450). The synthetic control method is a causal inference method that uses a panel of data to estimate the effect of an intervention on one member of that panel. The synthetic control for a "treated" object is constructed as a weighted average of all other objects in the panel, with weights estimated controling for relevant covariates. The robust model is an extension of the original model that doesn't use covariates and instead fits the weights based on the data itself. It tries to find the weights that minimize the difference between the outcomes for the treated object and the synthetic control in the pretreatment period.
The main advantages of the robust model are its ability to handle missing data and its speed. The original model involves a non-convex optimization which can make parameter estimation take too long, especially when many synthetic controls need to be generated. The robust model, on the other hand, utilizes a ridge regression with a scalar regularization parameter, which makes it possible to use singular value decomposition for estimation.
The package currently implements most of the optimizations suggested by Amjad et al for their Algorithm 1 (the non-Bayesian variant), with the notable exception of forward-chaining for hyperparameter selection - the standard train-test split is used instead.
The package is in a very early pre-alpha for now. That is to mean that it works correctly until a commit stops it from doing so until it works again :)
No distributions are provided yet, so installing directly from the repo is the only option:
pip install git+https://github.com/agoryuno/robust_control
The main function is get_control()
. This takes a panel matrix, with objects in the rows and timestep observations in columns, and a row index of the object you want to estimate a control for
and returns the timeseries for the synthetic control, original data passed through a denoising filter and the weights of the synthetic control. See below for full details.
Given the matrix of values orig_mat
and the row index
treated_i
, computes synthetic controls for each combination
of eta
and mu
for the respective numbers of eta_n
and
mu_n
.
Returns a tensor of dimensions orig_mat.size()[1] by 1
that contains the synthetic control calculated with the best
found values of parameters treated_i
.
orig_mat
: np.ndarray
The data matrix
treated_i
: int
The index of the treated object (row)
eta_n
: int (optional)
The number of values of
mu_n
: Union[int, Literal[False]] (optional)
The number of values of
cuda
: bool (optional)
Whether to use CUDA - CUDA support for SVD in PyTorch is limited so this is
best left at default value of False
parts
: int (optional)
The number of partitions to use for training or False to not use partitions
(default is False)
preint
: bool (optional)
Number of pre-intervention periods to estimate the control on, if False - uses
all periods (default is False)
train
: float (optional)
The proportion of the data to use for training (default is 0.8)
Y1_c
: torch.Tensor
A tensor of shape (orig_mat.shape[1], 1) containing the synthetic control data for
the treated object
Y0_o
: torch.Tensor
A tensor of shape (orig_mat.shape[1], 1) containing the original
data
v
: torch.Tensor
A tensor of shape (orig_mat.shape[0]-1, 1) containing the weights of the synthetic control.
Keep in mind that these weights are calculated for "normalized" data, so applying them
to untransformed original data will yield incorrect results. See next for an example
of using the weights to calculate the synthetic control.
This is an example of using the v
matrix of weights returned by the get_control()
function. Assuming that your original data is in orig_mat
and the index of the treated
object is treated_i
:
from robust_control import get_control, bind_data_b, unbind_data
# Calculate the synthetic control
Y1_c, Y0_o, v = get_control(orig_mat, treated_i)
# Bind the data for untreated objects
dat, a, b = bind_data_b(torch.concat ((orig_mat[:treated_i], orig_mat[treated_i+1:])))
# Apply the weights to `dat`
control = dat.T @ v
# Unbind the data to bring it back to the original scale
control = unbind_data(control, a, b)
Unfortunately, PyTorch's implementation of SVD isn't fully parallelizable on CUDA, so running the estimation on the GPU may actually be slower than on the CPU (see this issue for details: pytorch/pytorch#41306). For that reason CUDA support is disabled by default but remains an option for the future.