RuntimeWarning: divide by zero encountered in divide when using evaluate_causal_model #1213
Comments
Hi, does your data have columns with only a constant?
Hi @bloebp, thank you for replying to my post. No, it does not have any column with a constant value. Here is some more information about my data:

<class 'pandas.core.frame.DataFrame'>
 #   Column  Non-Null Count  Dtype
 0   Date    29 non-null     dbdate

And below is the count of unique values per column:

Date    29
OK, interesting. Is there any chance you can provide some artificially generated data that reproduces this issue? I can then take a closer look.
I confirm this issue is still present in the latest release. I managed to resolve the issue locally by setting assume_unique in gcm/divergence.py on line 64 to False. According to numpy docs: "If True, the input arrays are both assumed to be unique, which can speed up the calculation. Default is False."
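For context on the quoted docstring: that sentence appears in the documentation of NumPy set routines such as np.setdiff1d. Whether that exact routine is what sits on line 64 of gcm/divergence.py is an assumption here. The sketch below only illustrates what the flag changes, using made-up arrays.

```python
import numpy as np

# Made-up sample arrays; `x` contains a repeated value (2.0).
x = np.array([1.0, 2.0, 2.0, 3.0, 4.0])
y = np.array([3.0])

# Default (assume_unique=False): the inputs are de-duplicated first, and the
# result is sorted and unique.
deduped = np.setdiff1d(x, y, assume_unique=False)

# assume_unique=True skips that de-duplication. NumPy only promises correct
# results here when both inputs really are unique, so repeated values in `x`
# may survive into the output.
fast = np.setdiff1d(x, y, assume_unique=True)

print(deduped)
print(fast)
```

If repeated rows survive this step, two samples in the same data set can sit at distance zero from each other, which would be consistent with the divide-by-zero warning reported in this issue.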
Thanks for checking on this! I am not sure if there was a particular reason why this was set that way. Can you run the unit tests and, if they pass, do you want to open a PR to change it?
@bloebp I don't have permissions, so feel free to do it.
Describe the bug
My data consists entirely of numeric columns and does not have any null, zero, or infinite values. It also does not have any duplicate values, but I still keep getting this warning:
Evaluating causal mechanisms...: 50%|█████ | 10/20 [00:06<00:06, 1.55it/s]
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/dowhy/gcm/divergence.py:84: RuntimeWarning: divide by zero encountered in divide
result = np.sum((d / n) * np.log(nu / rho)) + np.log(m / (n - 1))
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/dowhy/gcm/divergence.py:84: RuntimeWarning: divide by zero encountered in divide
result = np.sum((d / n) * np.log(nu / rho)) + np.log(m / (n - 1))
Evaluating causal mechanisms...: 100%|██████████| 20/20 [00:17<00:00, 1.16it/s]
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/dowhy/gcm/divergence.py:84: RuntimeWarning: divide by zero encountered in divide
result = np.sum((d / n) * np.log(nu / rho)) + np.log(m / (n - 1))
and also this error:

"name": "RuntimeError",
"message": "Got a non-finite KL divergence! This can happen if both data sets have overlapping elements. Since these are normally removed by this method, double check whether the arrays are numeric.",
Versions/3.10/lib/python3.10/concurrent/futures/_base.py:403, in Future.__get_result(self)
    401 if self._exception:
    402     try:
--> 403         raise self._exception
    404     finally:
    405         # Break a reference cycle with the exception in self._exception
    406         self = None

RuntimeError: Got a non-finite KL divergence! This can happen if both data sets have overlapping elements. Since these are normally removed by this method, double check whether the arrays are numeric.
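One way the expression in the warning can turn non-finite is sketched below. It is not the dowhy implementation. It assumes, for illustration only, that rho is each sample's distance to its nearest other sample in the same set, and nu is its distance to the nearest sample in the second set. The function name knn_kl_sketch and the arrays are hypothetical.

```python
import numpy as np

def knn_kl_sketch(x, y):
    """Hypothetical 1-D, k=1 nearest-neighbour estimate using the formula from the warning."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, m, d = len(x), len(y), 1

    # rho: distance from each x_i to its nearest *other* point in x.
    dist_xx = np.abs(x[:, None] - x[None, :])
    np.fill_diagonal(dist_xx, np.inf)  # ignore the distance of a point to itself
    rho = dist_xx.min(axis=1)

    # nu: distance from each x_i to its nearest point in y.
    nu = np.abs(x[:, None] - y[None, :]).min(axis=1)

    # Silence the RuntimeWarning so the non-finite value can be inspected directly.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.sum((d / n) * np.log(nu / rho)) + np.log(m / (n - 1))

# Made-up data: 0.5 appears twice in x_dup, so rho is 0 for those two entries.
x_dup = np.array([0.1, 0.5, 0.5, 0.9])
y = np.array([0.2, 0.4, 0.7, 1.1])

result_dup = knn_kl_sketch(x_dup, y)
result_unique = knn_kl_sketch(np.unique(x_dup), y)

print(np.isfinite(result_dup), np.isfinite(result_unique))
```

Under these assumptions, a repeated value inside one data set is enough to make nu / rho divide by zero, even when the two sets share no elements.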