Merge pull request #66 from dscolby/development
Development
dscolby authored Jul 5, 2024
2 parents 7baf9a8 + 2df0032 commit 0e490c5
Showing 28 changed files with 1,060 additions and 1,761 deletions.
8 changes: 4 additions & 4 deletions Project.toml
@@ -1,20 +1,20 @@
name = "CausalELM"
uuid = "26abab4e-b12e-45db-9809-c199ca6ddca8"
authors = ["Darren Colby <dscolby17@gmail.com> and contributors"]
version = "0.6"
version = "0.7.0"

[deps]
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
Random = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"

[compat]
LinearAlgebra = "1.7"
Random = "1.7"
julia = "1.7"
Aqua = "0.8"
DataFrames = "1.5"
Documenter = "1.2"
LinearAlgebra = "1.7"
Random = "1.7"
Test = "1.7"
julia = "1.7"

[extras]
Aqua = "4c88cf16-eb10-579e-8560-4a9242c79595"
34 changes: 18 additions & 16 deletions README.md
@@ -41,11 +41,11 @@ series analysis, G-computation, and double machine learning; average treatment e
treated (ATT) with G-computation; cumulative treatment effect with interrupted time series
analysis; and the conditional average treatment effect (CATE) via S-learning, T-learning,
X-learning, R-learning, and doubly robust estimation. Underlying all of these estimators are
extreme learning machines, a simple neural network that uses randomized weights instead of
using gradient descent. Once a model has been estimated, CausalELM can summarize the model,
including computing p-values via randomization inference, and conduct sensitivity analysis
to validate the plausibility of modeling assumptions. Furthermore, all of this can be done
in four lines of code.
ensembles of extreme learning machines, a simple neural network that uses randomized weights
and least squares optimization instead of gradient descent. Once a model has been estimated,
CausalELM can summarize the model and conduct sensitivity analysis to validate the
plausibility of modeling assumptions. Furthermore, all of this can be done in four lines of
code.
</p>
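The randomized-weights, least-squares idea described above can be sketched in a few lines of Julia. This is only an illustration of the general technique, not CausalELM's internal implementation; the function names and the relu-style activation below are placeholders.

```julia
using LinearAlgebra

# Minimal extreme learning machine sketch: hidden weights are random and fixed,
# and only the output weights are fit, by least squares instead of gradient descent.
function fit_elm(X, y, num_neurons; σ = x -> max(x, zero(x)))
    W = randn(size(X, 2), num_neurons)   # random, untrained hidden-layer weights
    b = randn(1, num_neurons)            # random biases
    H = σ.(X * W .+ b)                   # hidden-layer activations
    β = pinv(H) * y                      # output weights via least squares
    return (W=W, b=b, β=β, σ=σ)
end

predict_elm(m, X) = (m.σ).(X * m.W .+ m.b) * m.β

# Toy usage
X, y = rand(100, 5), rand(100)
elm = fit_elm(X, y, 20)
ŷ = predict_elm(elm, X)
```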

<h2>Extreme Learning Machines and Causal Inference</h2>
@@ -73,37 +73,39 @@ to adjust the initial estimates. This approach has three advantages. First, it i
efficient with high dimensional data than conventional methods. Metalearners take a similar
approach to estimate the CATE. While all of these models are different, they have one thing
in common: how well they perform depends on the underlying model they fit to the data. To
that end, CausalELMs use extreme learning machines because they are simple yet flexible
enough to be universal function approximators.
that end, CausalELMs use bagged ensembles of extreme learning machines because they are
simple yet flexible enough to be universal function approximators with lower variance than
single extreme learning machines.
</p>
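A rough sketch of the bagging step is shown below, again as an illustration of the technique rather than CausalELM's actual code: each machine uses the same toy extreme learning machine as in the sketch above, is fit on a bootstrap sample, and the predictions are averaged to reduce variance.

```julia
using LinearAlgebra, Statistics

# Illustrative bagged ensemble of extreme learning machines: each machine is fit on a
# bootstrap sample and the ensemble prediction is the average over machines.
function bagged_elm(X, y, Xnew; num_machines = 50, num_neurons = 20)
    n = size(X, 1)
    predictions = map(1:num_machines) do _
        idx = rand(1:n, n)                       # bootstrap sample (with replacement)
        W, b = randn(size(X, 2), num_neurons), randn(1, num_neurons)
        H = max.(X[idx, :] * W .+ b, 0.0)        # random hidden layer on the sample
        β = pinv(H) * y[idx]                     # least-squares output weights
        max.(Xnew * W .+ b, 0.0) * β             # this machine's prediction for Xnew
    end
    return mean(predictions)                     # averaging lowers variance
end

X, y = rand(100, 5), rand(100)
ŷ = bagged_elm(X, y, X)
```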

<h2>CausalELM Features</h2>
<ul>
<li>Estimate a causal effect, get a summary, and validate assumptions in just four lines of code</li>
<li>All models automatically select the best number of neurons and L2 penalty</li>
<li>Bagging improves performance and reduces variance without the need to tune a regularization parameter</li>
<li>Enables using the same structs for regression and classification</li>
<li>Includes 13 activation functions and allows user-defined activation functions</li>
<li>Most inference and validation tests do not assume functional or distributional forms</li>
<li>Implements the latest techniques form statistics, econometrics, and biostatistics</li>
<li>Works out of the box with DataFrames or arrays</li>
<li>Works out of the box with arrays or any data structure that implements the Tables.jl interface</li>
<li>Codebase is high-quality, well tested, and regularly updated</li>
</ul>
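As a sketch of the four-lines-of-code item in the list above, a typical session might look like the following. The data are random toy values, and DoubleMachineLearning stands in for any of the estimators.

```julia
using CausalELM

X, T, Y = rand(1000, 5), [rand() < 0.4 for _ in 1:1000], rand(1000)

dml = DoubleMachineLearning(X, T, Y)   # 1. initialize an estimator
estimate_causal_effect!(dml)           # 2. estimate the causal effect
summarize(dml)                         # 3. summarize the model
validate(dml)                          # 4. probe the plausibility of the modeling assumptions
```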

<h2>What's New?</h2>
<ul>
<li>Now includes doubly robust estimator for CATE estimation</li>
<li>Uses generalized cross validation with successive halving to find the best ridge penalty</li>
<li>Double machine learning, R-learning, and doubly robust estimators support specifying confounders and covariates of interest separately</li>
<li>Counterfactual consistency validation simulates outcomes that violate the assumption rather than the previous binning approach</li>
<li>Standardized and improved docstrings and added doctests</li>
<li>All estimators now implement bagging to improve predictive performance and reduce variance</li>
<li>Counterfactual consistency validation simulates more realistic violations of the counterfactual consistency assumption</li>
<li>Uses a simple heuristic to choose the number of neurons, which reduces training time and still works well in practice</li>
<li>Probability clipping for classifier predictions and residuals is no longer necessary due to the bagging procedure</li>
<li>CausalELM talk has been accepted to JuliaCon 2024!</li>
</ul>

<h2>What's Next?</h2>
<p>
Newer versions of CausalELM will hopefully support using GPUs and provide textual
interpretations of the results of calling validate on a model that has been estimated.
However, these priorities could also change depending on feedback received at JuliaCon.
Newer versions of CausalELM will hopefully support using GPUs and provide interpretations of
the results of calling validate on a model that has been estimated. In addition, some
estimators will also support using instrumental variables. However, these priorities could
also change depending on feedback received at JuliaCon.
</p>

<h2>Disclaimer</h2>
27 changes: 6 additions & 21 deletions docs/src/api.md
@@ -1,7 +1,7 @@
# CausalELM
Most of the methods and structs here are private, not exported, should not be called by the
user, and are documented for the purpose of developing CausalELM or to facilitate
understanding of the implementation.
```@docs
CausalELM.CausalELM
```

## Types
```@docs
@@ -15,9 +15,8 @@ RLearner
DoublyRobustLearner
CausalELM.CausalEstimator
CausalELM.Metalearner
CausalELM.ExtremeLearningMachine
CausalELM.ExtremeLearner
CausalELM.RegularizedExtremeLearner
CausalELM.ELMEnsemble
CausalELM.Nonbinary
CausalELM.Binary
CausalELM.Count
@@ -41,28 +40,15 @@ elish
fourier
```

## Cross Validation
```@docs
CausalELM.generate_folds
CausalELM.generate_temporal_folds
CausalELM.validation_loss
CausalELM.cross_validate
CausalELM.best_size
CausalELM.shuffle_data
```

## Average Causal Effect Estimators
```@docs
CausalELM.g_formula!
CausalELM.causal_loss!
CausalELM.predict_residuals
CausalELM.make_folds
CausalELM.moving_average
```

## Metalearners
```@docs
CausalELM.causal_loss
CausalELM.doubly_robust_formula!
CausalELM.stage1!
CausalELM.stage2!
@@ -94,7 +80,6 @@ CausalELM.e_value
CausalELM.binarize
CausalELM.risk_ratio
CausalELM.positivity
CausalELM.var_type
```

## Validation Metrics
@@ -114,17 +99,17 @@ CausalELM.fit!
CausalELM.predict
CausalELM.predict_counterfactual!
CausalELM.placebo_test
CausalELM.ridge_constant
CausalELM.set_weights_biases
```

## Utility Functions
```@docs
CausalELM.var_type
CausalELM.mean
CausalELM.var
CausalELM.one_hot_encode
CausalELM.clip_if_binary
CausalELM.@model_config
CausalELM.@standard_input_data
CausalELM.@double_learner_input_data
CausalELM.generate_folds
```
10 changes: 5 additions & 5 deletions docs/src/contributing.md
@@ -27,15 +27,15 @@ code follows the guidelines below.

* Most new structs for estimating causal effects should have mostly the same fields. To
reduce the burden of repeatedly defining all these fields, it is advisable to use the
model_config, standard_input_data, and double_learner_input_data macros to
programmatically generate fields for new structs. Doing so will ensure that with little
to no effort the new structs will work with the summarize and validate methods.
model_config and standard_input_data macros to programmatically generate fields for new
structs. Doing so will ensure that with little to no effort the new structs will work
with the summarize and validate methods.

* There are no repeated code blocks. If there are repeated code blocks, then they should be
consolidated into a separate function.

* Methods should generally include types and be type stable. If there is a strong reason
to deviate from this point, there should be a comment in the code explaining why.
* Internal methods can contain types and be parametric, but public methods should be as
general as possible.

* Minimize use of new constants and macros. If they must be included, the reason for their
inclusion should be obvious or included in the docstring.
84 changes: 31 additions & 53 deletions docs/src/guide/doublemachinelearning.md
@@ -4,13 +4,8 @@ estimating causal effects when the dimensionality of the covariates is too high
regression or the treatment or outcomes cannot be easily modeled parametrically. Double
machine learning estimates models of the treatment assignment and outcome and then combines
them in a final model. This is a semiparametric model in the sense that the first stage
models can take on any functional form but the final stage model is linear.

!!! note
If regularized is set to true then the ridge penalty will be estimated using generalized
cross validation where the maximum number of iterations is 2 * folds for the successive
halving procedure. However, if the penalty in one iteration is approximately the same as in
the previous iteration, then the procedure will stop early.
models can take on any functional form but the final stage model is a linear combination of
the residuals from the first stage models.
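The final stage can be sketched as a residual-on-residual regression. This is a simplified illustration with a generic first-stage learner and without the cross-fitting over folds that double machine learning normally uses; none of the names below are CausalELM internals.

```julia
using LinearAlgebra

# Simplified double machine learning: model T and Y from the covariates, then regress
# the outcome residuals on the treatment residuals to get the effect estimate.
function dml_sketch(X, T, Y, fit_predict)
    t_hat = fit_predict(X, T)                        # first stage: treatment model
    y_hat = fit_predict(X, Y)                        # first stage: outcome model
    t_res, y_res = T .- t_hat, Y .- y_hat            # partial out the covariates
    return dot(t_res, y_res) / dot(t_res, t_res)     # final stage: linear in the residuals
end

# Toy usage with an ordinary least squares first stage
ols(X, target) = X * (pinv(X) * target)
X, T, Y = rand(100, 5), [rand() < 0.4 for _ in 1:100], rand(100)
effect = dml_sketch(X, T, Y, ols)
```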

!!! note
For more information see:
@@ -19,70 +14,53 @@ models can take on any functional form but the final stage model is linear.
Whitney Newey, and James Robins. "Double/debiased machine learning for treatment and
structural parameters." (2018): C1-C68.


## Step 1: Initialize a Model
The DoubleMachineLearning constructor takes at least three arguments, an array of
covariates, a treatment vector, and an outcome vector. This estimator supports binary, count,
or continuous treatments and binary, count, continuous, or time to event outcomes. You can
also specify confounders that you do not want to estimate the CATE for by passing a parameter
to the W argument. Otherwise, the model assumes all possible confounders are contained in X.
The DoubleMachineLearning constructor takes at least three arguments: covariates, treatment
statuses, and outcomes, all of which may be either an array or any struct that implements
the Tables.jl interface (e.g. DataFrames). This estimator supports binary, count, or
continuous treatments and binary, count, continuous, or time to event outcomes.

!!! note
Internally, the outcome and treatment models are treated as a regression since extreme
learning machines minimize the MSE. This means that predicted treatments and outcomes
under treatment and control groups could fall outside [0, 1], although this is not likely
in practice. To deal with this, predicted binary variables are automatically clipped to
[0.0000001, 0.9999999]. This also means that count outcomes will be predicted as continuous
variables.
Non-binary categorical outcomes are treated as continuous.

!!! tip
You can also specify the following options: whether the treatment vector is categorical ie
not continuous and containing more than two classes, whether to use L2 regularization, the
activation function, the validation metric to use when searching for the best number of
neurons, the minimum and maximum number of neurons to consider, the number of folds to use
for cross validation, the number of iterations to perform cross validation, and the number
of neurons to use in the ELM used to learn the function from number of neurons to validation
loss. These arguments are specified with the following keyword arguments: t\_cat,
regularized, activation, validation\_metric, min\_neurons, max\_neurons, folds, iterations,
and approximator\_neurons.
You can also specify the number of folds to use for cross-fitting, the number of
extreme learning machines to incorporate in the ensemble, the number of features to
consider for each extreme learning machine, the activation function to use, the number
of observations to bootstrap in each extreme learning machine, and the number of neurons
in each extreme learning machine. These arguments are specified with the folds,
num_machines, num_features, activation, sample_size, and num\_neurons keywords.

```julia
# Create some data with a binary treatment
X, T, Y, W = rand(100, 5), [rand()<0.4 for i in 1:100], rand(100), rand(100, 4)

# We could also use DataFrames
# We could also use DataFrames or any other package implementing the Tables.jl API
# using DataFrames
# X = DataFrame(x1=rand(100), x2=rand(100), x3=rand(100), x4=rand(100), x5=rand(100))
# T, Y = DataFrame(t=[rand()<0.4 for i in 1:100]), DataFrame(y=rand(100))
# W = DataFrame(w1=rand(100), w2=rand(100), w3=rand(100), w4=rand(100))

# W is optional and means there are confounders that you are not interested in estimating
# the CATE for
dml = DoubleMachineLearning(X, T, Y, W=W)
dml = DoubleMachineLearning(X, T, Y)
```
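The keyword arguments from the tip above might be set like this. The values are arbitrary choices for illustration, and elish is just one of the built-in activation functions.

```julia
# Hypothetical hyperparameter choices; the keyword names are those listed in the tip above.
dml = DoubleMachineLearning(X, T, Y;
                            folds=5,           # folds for cross-fitting
                            num_machines=50,   # extreme learning machines in the ensemble
                            num_features=3,    # features considered by each machine
                            sample_size=100,   # observations bootstrapped per machine
                            num_neurons=10,    # neurons in each machine
                            activation=elish)  # one of the built-in activation functions
```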

## Step 2: Estimate the Causal Effect
To estimate the causal effect, we call estimatecausaleffect! on the model above.
To estimate the causal effect, we call estimate_causal_effect! on the model above.
```julia
# we could also estimate the ATT by passing quantity_of_interest="ATT"
estimate_causal_effect!(dml)
```

# Get a Summary
We can get a summary that includes a p-value and standard error estimated via asymptotic
randomization inference by passing our model to the summarize method.

Calling the summarize method returns a dictionary with the estimator's task (regression or
classification), the quantity of interest being estimated (ATE), whether the model uses an
L2 penalty (always true for DML), the activation function used in the model's outcome
predictors, whether the data is temporal (always false for DML), the validation metric used
for cross validation to find the best number of neurons, the number of neurons used in the
ELMs used by the estimator, the number of neurons used in the ELM used to learn a mapping
from number of neurons to validation loss during cross validation, the causal effect,
standard error, and p-value.
We can get a summary of the model by passing it to the summarize method.

!!! note
To calculate the p-value and standard error for the treatment effect, you can set the
inference argument to true. However, p-values and standard errors are calculated via
randomization inference, which will take a long time, although it can be sped up by
launching Julia with a higher number of threads.

```julia
# Can also use the British spelling
# summarise(dml)

summarize(dml)
```
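If the inference keyword works as described in the note above, requesting randomization inference might look like the following; the exact argument form is an assumption.

```julia
# Randomization inference is slow; launching Julia with more threads speeds it up.
summarize(dml, inference=true)
```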

@@ -94,12 +72,12 @@ tests do not provide definitive evidence of a violation of these assumptions. To
counterfactual consistency assumption, we simulate counterfactual outcomes that are
different from the observed outcomes, estimate models with the simulated counterfactual
outcomes, and take the averages. If the outcome is continuous, the noise for the simulated
counterfactuals is drawn from N(0, dev) for each element in devs, otherwise the default is
0.25, 0.5, 0.75, and 1.0 standard deviations from the mean outcome. For discrete variables,
each outcome is replaced with a different value in the range of outcomes with probability ϵ
for each ϵ in devs, otherwise the default is 0.025, 0.05, 0.075, 0.1. If the average
estimate for a given level of violation differs greatly from the effect estimated on the
actual data, then the model is very sensitive to violations of the counterfactual
counterfactuals is drawn from N(0, dev) for each element in devs and each outcome,
multiplied by the original outcome, and added to the original outcome. For discrete
variables, each outcome is replaced with a different value in the range of outcomes with
probability ϵ for each ϵ in devs; if devs is not specified, the default values are 0.025,
0.05, 0.075, and 0.1. If the
average estimate for a given level of violation differs greatly from the effect estimated on
the actual data, then the model is very sensitive to violations of the counterfactual
consistency assumption for that level of violation. Next, this method tests the model's
sensitivity to a violation of the exchangeability assumption by calculating the E-value,
which is the minimum strength of association, on the risk ratio scale, that an unobserved
20 changes: 9 additions & 11 deletions docs/src/guide/estimatorselection.md
@@ -5,15 +5,13 @@ given dataset and causal question.

| Model | Struct | Causal Estimands | Supported Treatment Types | Supported Outcome Types |
|----------------------------------|-----------------------|----------------------------------|---------------------------|------------------------------------------|
| Interrupted Time Series Analysis | InterruptedTimeSeries | ATE, Cumulative Treatment Effect | Binary | Continuous, Count[^2], Time to Event |
| G-computation | GComputation | ATE, ATT, ITT | Binary | Binary[^1],Continuous, Time to Event, Count[^2] |
| Double Machine Learning | DoubleMachineLearning | ATE | Binary[^1], Count[^2], Continuous | Binary[^1], Count[^2], Continuous, Time to Event |
| S-learning | SLearner | CATE | Binary | Binary[^1], Continuous, Time to Event, Count[^2] |
| T-learning | TLearner | CATE | Binary | Binary[^1], Continuous, Count[^2], Time to Event |
| X-learning | XLearner | CATE | Binary[^1] | Binary[^1], Continuous, Count[^2], Time to Event |
| R-learning | RLearner | CATE | Binary[^1], Count[^2], Continuous | Binary[^1], Count[^2], Continuous, Time to Event |
| Doubly Robust Estimation | DoublyRobustLearner | CATE | Binary | Binary[^1], Continuous, Count[^2], Time to Event |
| Interrupted Time Series Analysis | InterruptedTimeSeries | ATE, Cumulative Treatment Effect | Binary | Continuous, Count[^1], Time to Event |
| G-computation | GComputation | ATE, ATT, ITT | Binary | Binary, Continuous, Time to Event, Count[^1] |
| Double Machine Learning | DoubleMachineLearning | ATE | Binary, Count[^1], Continuous | Binary, Count[^1], Continuous, Time to Event |
| S-learning | SLearner | CATE | Binary | Binary, Continuous, Time to Event, Count[^1] |
| T-learning | TLearner | CATE | Binary | Binary, Continuous, Count[^1], Time to Event |
| X-learning | XLearner | CATE | Binary | Binary, Continuous, Count[^1], Time to Event |
| R-learning | RLearner | CATE | Binary, Count[^1], Continuous | Binary, Count[^1], Continuous, Time to Event |
| Doubly Robust Estimation | DoublyRobustLearner | CATE | Binary | Binary, Continuous, Count[^1], Time to Event |

[^1]: Models that use propensity scores or predict binary treatment assignment may, on very rare occasions, return values outside of [0, 1]. In that case, values are clipped to be between 0.0000001 and 0.9999999.

[^2]: Similar to other packages, predictions of count variables is treated as a continuous regression task.
[^1]: Similar to other packages, predictions of count variables are treated as a continuous regression task.