NonParamDMLIV sometimes ZeroDivisionError #837

Open
fverac opened this issue Jan 2, 2024 · 3 comments

@fverac
Collaborator

fverac commented Jan 2, 2024

It seems that NonParamDMLIV can raise a ZeroDivisionError. Reproduction code is below. Note that the error is intermittent: you may have to run the code several times before it occurs.

From a brief look, the T residuals appear to be all zeros when the error occurs.

econml version 0.15.0b1. Haven't tried other versions.

Let me know if I'm missing something!

```python
from econml.iv.dml import NonParamDMLIV
import numpy as np
from sklearn.linear_model import LinearRegression

n = 100
d_x = 3

Y = np.random.normal(size=(n,))
T = np.random.normal(size=(n,))
X = np.random.normal(size=(n, d_x))
Z = np.random.normal(size=(n,))

est = NonParamDMLIV(discrete_instrument=False, discrete_treatment=False, model_final=LinearRegression())

est.fit(Y, T, Z=Z, X=X)
```

@kbattocchi
Collaborator

This is an interesting failure mode. The final model weights the rows by the estimated variance (E[T|Z,X,W]-E[T|X,W])^2, so if the estimates for E[T|Z,X,W] are always identical to those for E[T|X,W], all the weights are zero, which leads to this problem.
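To make the mechanism concrete, here is a minimal NumPy-only sketch. It does not call EconML, and the array names are hypothetical stand-ins for the first-stage predictions. The assumption is that somewhere downstream a weighted average is taken with these weights. `np.average` is used purely as an illustration, since it is documented to raise ZeroDivisionError when the weights sum to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Hypothetical stand-ins for the two first-stage predictions.
# Both nuisance models are assumed to have collapsed to the same constant.
t_pred_with_z = np.zeros(n)      # plays the role of the estimate of E[T|Z,X,W]
t_pred_without_z = np.zeros(n)   # plays the role of the estimate of E[T|X,W]

# Row weights as described above: the squared difference of the two estimates.
weights = (t_pred_with_z - t_pred_without_z) ** 2

# Any per-row quantity; only the weights matter for this illustration.
values = rng.normal(size=n)

error = None
try:
    np.average(values, weights=weights)
except ZeroDivisionError as exc:
    error = exc
    print(f"ZeroDivisionError: {exc}")
```

If even a single row had a nonzero weight, the average would be defined, which fits the observation that the failure only shows up on some runs.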

In this particular case, Lasso is regularizing all weights to 0 so the estimators always (correctly) predict E[T] = 0 regardless of whether we condition on Z,X,W or just on X,W.
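A rough way to see this effect in isolation is sketched below. It uses a plain scikit-learn `Lasso` with a hand-picked penalty rather than whatever cross-validated model the estimator selects internally, so it is an assumption-laden illustration and not a trace of the EconML code path. The helper `smallest_zeroing_alpha` is hypothetical. It computes the standard threshold above which the Lasso solution has all coefficients equal to zero.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, d_x = 100, 3

# Same shape of data as in the reproduction: T is independent of X and Z.
T = rng.normal(size=n)
X = rng.normal(size=(n, d_x))
Z = rng.normal(size=n)
XZ = np.column_stack([X, Z])


def smallest_zeroing_alpha(features, target):
    """Hypothetical helper: penalty at or above which all Lasso coefficients are zero."""
    fc = features - features.mean(axis=0)
    tc = target - target.mean()
    return np.max(np.abs(fc.T @ tc)) / len(target)


# Pick a penalty slightly above the threshold for both feature sets.
alpha = 1.01 * max(smallest_zeroing_alpha(X, T), smallest_zeroing_alpha(XZ, T))

model_without_z = Lasso(alpha=alpha).fit(X, T)   # stand-in for the E[T|X] model
model_with_z = Lasso(alpha=alpha).fit(XZ, T)     # stand-in for the E[T|Z,X] model

# Both models reduce to an intercept-only fit, so their predictions coincide.
diff = model_with_z.predict(XZ) - model_without_z.predict(X)
print("coefficients without Z:", model_without_z.coef_)
print("coefficients with Z:   ", model_with_z.coef_)
print("max |difference|:      ", np.max(np.abs(diff)))
```

With pure-noise data a cross-validated penalty can plausibly land in this regime on some draws and not on others, which would explain why the error is intermittent.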

Hopefully this is less likely to occur with real-world data, but we could at least throw a more meaningful error message if we do run into this scenario. I think it is a genuine error condition, because we depend on the instrument affecting treatment for identification. So I don't think it would be appropriate to ignore it and produce an estimate (say, by using all 1s for the weights if they turn out to all be 0s).
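One possible shape for such a check is sketched below. The function name, signature, and message are hypothetical and are not part of EconML. The sketch only assumes that the two first-stage predictions are available as arrays before the final model is fit.

```python
import numpy as np


def check_instrument_weights(t_pred_with_z, t_pred_without_z, tol=0.0):
    """Hypothetical guard (not EconML API): compute the row weights and fail loudly if all are zero."""
    with_z = np.asarray(t_pred_with_z, dtype=float)
    without_z = np.asarray(t_pred_without_z, dtype=float)
    weights = (with_z - without_z) ** 2
    if np.all(weights <= tol):
        raise ValueError(
            "The estimated treatment given (Z, X, W) is identical to the estimate "
            "given (X, W) for every row, so all final-stage weights are zero. "
            "The instrument does not appear to affect the treatment, and the "
            "effect cannot be identified."
        )
    return weights


# Usage: a degenerate first stage raises; an informative one returns the weights.
try:
    check_instrument_weights(np.zeros(5), np.zeros(5))
except ValueError as exc:
    print(f"ValueError: {exc}")

print(check_instrument_weights([1.0, 2.0], [0.0, 2.0]))
```

Raising instead of substituting uniform weights keeps the behavior in line with the point above: without a relevant instrument there is nothing meaningful to estimate.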

@ronikobrosly

Hi @fverac, I'm looking into implementing this error message. I can reproduce the error with the following random seed:

```python
from econml.iv.dml import NonParamDMLIV
import numpy as np
from sklearn.linear_model import LinearRegression

np.random.seed(784)

n = 100
d_x = 3

Y = np.random.normal(size=(n,))
T = np.random.normal(size=(n,))
X = np.random.normal(size=(n, d_x))
Z = np.random.normal(size=(n,))

est = NonParamDMLIV(discrete_instrument=False, discrete_treatment=False, model_final=LinearRegression())

est.fit(Y, T, Z=Z, X=X)
```

@fverac
Collaborator Author

fverac commented Aug 27, 2024

Great! If the cause of failure aligns with kbattocchi's explanation above, feel free to add a PR with an appropriate error message.

ronikobrosly added a commit to ronikobrosly/EconML that referenced this issue Sep 8, 2024