Skewed Data Distributions and Homoscedasticity #81
Comments
You don't have to take a log transform to make variables more normal. Non-Gaussianity itself does not necessarily violate homoscedasticity (constant variance).
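To illustrate the point above, here is a minimal simulation sketch, not taken from the lingam package. It assumes a made-up model y = 2x + e with strongly right-skewed (centered exponential) noise whose variance does not depend on x; the coefficient, sample size, and helper names are illustrative choices only. The residuals of an ordinary least-squares fit remain skewed, yet their variance is roughly the same for small and large x.

```python
import random
import statistics


def ols_residuals(xs, ys):
    """Fit y = a + b*x by ordinary least squares and return the residuals."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def skewness(values):
    """Population skewness: mean of the cubed standardized values."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])


rng = random.Random(0)
n = 20000

# Hypothetical data: uniform cause, skewed zero-mean noise with constant variance.
x = [rng.uniform(0.0, 10.0) for _ in range(n)]
y = [2.0 * xi + (rng.expovariate(1.0) - 1.0) for xi in x]

residuals = ols_residuals(x, y)

# Compare the residual variance for small and large values of x.
var_low = statistics.pvariance([r for xi, r in zip(x, residuals) if xi < 5.0])
var_high = statistics.pvariance([r for xi, r in zip(x, residuals) if xi >= 5.0])
residual_skew = skewness(residuals)

print("residual variance, x < 5: ", round(var_low, 3))
print("residual variance, x >= 5:", round(var_high, 3))
print("residual skewness:        ", round(residual_skew, 3))
```

Skewness describes the shape of the error distribution, while homoscedasticity describes whether its spread changes with the regressor, so the two properties can be checked separately.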
Hi, thank you for your reply! What about scaling the data so that all variables lie in [0, 1]? I've run analyses both with and without scaling and found significantly different DAGs.
If you transform your data, the data-generating process changes. That would be why you get different results.
I see. So is the suggestion not to change the data at all (no min-max scaling, no log transforms) before running causal discovery?
Well, my point is that it depends on the class of data-generating process you assume.
Could you elaborate a bit on that? I'm mostly working with survey-type data where respondents answer various questions.
OK. My suggestion is that you can apply log transforms if previous work in your field does so, but it would be better not to apply min-max scaling.
Why is it better not to use min-max scaling?
I don't have a strong reason; I just don't often see min-max scaling used in the context of causal discovery. The point is that if you apply a transformation and then run LiNGAM, for example, you are assuming a linear non-Gaussian model for the transformed data. You need to consider whether that assumption is valid.
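As a rough illustration of why the model assumption has to be rechecked after a transformation, the sketch below simulates data that is linear with constant-variance skewed noise on the raw scale, then fits a straight line on both the raw and the log scale. The model y = 2x + e, the lognormal cause, and the helper names are assumptions made for this example; it does not call LiNGAM itself. On the raw scale the residual variance is similar for small and large x, whereas after taking logs of both variables it differs markedly, so the log-transformed data no longer follows the same simple linear model.

```python
import math
import random
import statistics


def ols_residuals(xs, ys):
    """Fit y = a + b*x by ordinary least squares and return the residuals."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def variance_ratio(xs, residuals, split):
    """Residual variance below the split point divided by the variance above it."""
    low = [r for x, r in zip(xs, residuals) if x < split]
    high = [r for x, r in zip(xs, residuals) if x >= split]
    return statistics.pvariance(low) / statistics.pvariance(high)


rng = random.Random(0)
n = 20000

# Hypothetical raw-scale model: right-skewed cause, linear effect, skewed noise.
x = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]
y = [2.0 * xi + rng.expovariate(1.0) for xi in x]

# Raw scale: split at x = 1, the median of the lognormal cause.
ratio_raw = variance_ratio(x, ols_residuals(x, y), split=1.0)

# Log scale: the same split point becomes log(1) = 0.
log_x = [math.log(v) for v in x]
log_y = [math.log(v) for v in y]
ratio_log = variance_ratio(log_x, ols_residuals(log_x, log_y), split=0.0)

print("variance ratio (low/high), raw scale:", round(ratio_raw, 2))
print("variance ratio (low/high), log scale:", round(ratio_log, 2))
```

The reverse can also hold: data that is linear on the log scale will not be linear on the raw scale. Which scale to use is therefore a modelling decision about the data-generating process rather than a neutral preprocessing step.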
Hi,
I'm wondering what the best approach is for data that is highly right-skewed. Is it best to take a log transform to make it more "normal", or does DirectLiNGAM deal with skewed data? The causal graphs are substantially different if I take the log and then normalise the data, compared to only normalising the data and keeping the skewed distribution. I couldn't find an implementation of Hyvärinen & Smith (2013) for skewed data.
Also, my understanding is that LiNGAM is specifically made for non-Gaussian distributions, but I'm a bit confused about how this affects the adjacency matrix computation using linear regression, since, as I understand it, non-Gaussian distributions violate homoscedasticity.
Any clarity on these two topics would be greatly appreciated!