Calculation of variance explained appears wrong #317
Comments
Thanks for bringing this up. When you say that the variance explained is highly correlated across components, do you mean that the variance explained appears to at least grow with the amount of variance that a component is purported to explain?
I guess you could say that. The values differ by calculation method, but are correlated across methods. I have high confidence in one method, but it only works for PCA, so I think we need to either come up with a more accurate method to implement in
We can calculate voxel-wise variance explained and then average it across voxels to get an estimate of component-wise variance explained. These values are very similar to the PCA-based variance-explained values, and can be calculated for both PCA and ICA. Here's what the code could look like:

```python
tsoc_dm = tsoc - np.mean(tsoc, axis=-1, keepdims=True)
totvar = np.var(tsoc_dm, axis=1)
...
LGR.info('Fitting TE- and S0-dependent models to components')
for i_comp in range(n_components):
    comp_pred_data = np.dot(mmix[:, i_comp:i_comp+1], tsoc_B[:, i_comp:i_comp+1].T).T
    compvar = np.var(comp_pred_data, axis=1)
    comptable.loc[i_comp, 'variance explained'] = np.mean(compvar / totvar)
```

Otherwise, I can sort of see the logic behind using parameter estimates as a proxy for variance explained, since the IVs (the mixing matrix) are all supposed to be z-scores, but I think it would only make sense if the data were also z-scores. I also don't know why we have separate "variance explained" and "normalized variance explained" values.
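A runnable sketch of the voxel-wise proposal above, using synthetic stand-ins for tedana's `tsoc`, `mmix`, and `tsoc_B` arrays (shapes and data here are hypothetical, chosen only to make the snippet self-contained):

```python
import numpy as np

rng = np.random.default_rng(42)
n_voxels, n_vols, n_components = 500, 100, 3

# Hypothetical stand-ins for tedana's arrays:
# mmix  : mixing matrix, (time, components)
# tsoc_B: per-component parameter estimates, (voxels, components)
# tsoc  : optimally combined data, (voxels, time)
mmix = rng.standard_normal((n_vols, n_components))
tsoc_B = rng.standard_normal((n_voxels, n_components))
tsoc = tsoc_B @ mmix.T + rng.standard_normal((n_voxels, n_vols))

# Mean-center over time and compute each voxel's total variance
tsoc_dm = tsoc - tsoc.mean(axis=-1, keepdims=True)
totvar = np.var(tsoc_dm, axis=1)

varex = np.zeros(n_components)
for i_comp in range(n_components):
    # Data predicted by this component alone, (voxels, time)
    comp_pred = np.outer(tsoc_B[:, i_comp], mmix[:, i_comp])
    compvar = np.var(comp_pred, axis=1)
    # Average the voxel-wise ratio across voxels
    varex[i_comp] = np.mean(compvar / totvar)

print(varex)  # one variance-explained estimate per component
```

Because this only needs a mixing matrix and parameter estimates, the same calculation works for both PCA and ICA components.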
Since "normalized variance explained" is only used within the PCA decision tree, and "variance explained" is only used within the ICA decision trees (both v2.5 and v3.2), I propose that we merge the two. As mentioned above, I don't know what the conceptual difference is between the two measures; they both sum to standardized values (100 or 1), and they at least appear interchangeable.
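If the two measures really do differ only in which standardized value they sum to, as suggested above, then they carry the same information up to a factor of 100. A small illustration with hypothetical per-component variances:

```python
import numpy as np

# Hypothetical per-component variance estimates
compvar = np.array([4.0, 3.0, 2.0, 1.0])

varex = 100 * compvar / compvar.sum()   # "variance explained": sums to 100
norm_varex = compvar / compvar.sum()    # "normalized variance explained": sums to 1

print(varex)       # [40. 30. 20. 10.]
print(norm_varex)  # [0.4 0.3 0.2 0.1]
```

Under this reading, merging the two measures loses no information, since either can be recovered from the other by rescaling.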
Okay, so it looks like squared parameter estimates do match up to variance explained, but only if both the DVs and the IVs have unit variance (which I believe means that the parameter estimates are actually beta values). If we z-score the mixing matrix and the optimally combined data within
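The claim above can be checked directly: for a single regressor, when both the regressor and the data are z-scored, the OLS parameter estimate equals the correlation coefficient, so its square equals the fraction of variance explained. A minimal sketch with synthetic data (all variable names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
x = rng.standard_normal(n)
y = 0.6 * x + rng.standard_normal(n)

# z-score both the regressor (IV) and the data (DV)
xz = (x - x.mean()) / x.std()
yz = (y - y.mean()) / y.std()

# OLS parameter estimate for a single z-scored regressor
beta = np.dot(xz, yz) / np.dot(xz, xz)

# Fraction of variance explained by the fit
resid = yz - beta * xz
r_squared = 1 - np.var(resid) / np.var(yz)

# With unit-variance DV and IV, beta**2 equals the variance-explained fraction
print(np.isclose(beta**2, r_squared))  # True
```

Without the z-scoring, `beta` absorbs the scale of the data and the equality breaks, which may explain why squared parameter estimates only track (rather than match) variance explained in the current code.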
Do we have a PR open for this?
No PR. I don't know enough about variance explained in ICA to be sure about my conclusions. I was hoping that others could weigh in on it before trying to change it in tedana.
I'd say we could roll this into #84. Maybe once we document what we're doing, it will be clearer whether we need to change it?
Summary
Variance explained and normalized variance explained are calculated within dependence_metrics for PCA and ICA components. However, variance explained is also returned directly by the PCA-fitting method, and those values differ from the ones calculated by our function.
Additional Detail
Here is where we calculate variance explained and normalized variance explained in dependence_metrics: tedana/tedana/model/fit.py, lines 152 to 154 at commit 65f89e1.
The values are not very similar (for example, for PCA the "real" varex might be 37.8%, while the estimated varex is 57.6% and the estimated normalized varex is 53.8%), but they are highly correlated across components.