Calculation of variance explained appears wrong #317

tsalo · 2019-05-30T18:36:47Z

Summary

Variance explained and normalized variance explained are calculated within dependence_metrics for PCA and ICA components. However, variance explained is also directly returned by the PCA-fitting method, and the varex values from the original method are different from the ones calculated by our function.

Additional Detail

Here is where we calculate variance explained and normalized variance explained in dependence_metrics:

tedana/tedana/model/fit.py

Lines 152 to 154 in 65f89e1

    varex[i_comp] = (tsoc_B[:, i_comp]**2).sum() / totvar * 100.
    varex_norm[i_comp] = (utils.unmask(WTS, mask)[t2s != 0][:, i_comp]**2).sum() /\
        totvar_norm * 100.

The values are not very similar (for example, for PCA the "real" varex might be 37.8%, while the estimated varex is 57.6% and the estimated normalized varex is 53.8%), but they are highly correlated across components.
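One way the two sets of numbers can disagree while staying highly correlated is a difference in denominators. The NumPy sketch below uses random data and several assumptions that the excerpt above does not confirm: `totvar` is taken to be the sum of squared coefficients over the retained components, the mixing matrix is the z-scored time series of the retained PCA components, and the "real" PCA varex is each component's share of the total data variance. Under those idealized conditions, the squared-coefficient measure is exactly the PCA value rescaled to sum to 100 over the retained components. It is not a reproduction of tedana's pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vox, n_t, n_comp = 200, 120, 5  # synthetic sizes

# Random stand-in for the optimally combined data (voxels x time), demeaned over time
tsoc = rng.standard_normal((n_vox, n_t))
tsoc_dm = tsoc - tsoc.mean(axis=1, keepdims=True)

# PCA via SVD of the time x voxel matrix; "real" varex = share of TOTAL data variance
u, s, _ = np.linalg.svd(tsoc_dm.T, full_matrices=False)
pca_varex = s[:n_comp] ** 2 / (s ** 2).sum() * 100.0

# Assumption: mixing matrix = z-scored time series of the retained components
mmix = u[:, :n_comp]
mmix = (mmix - mmix.mean(axis=0)) / mmix.std(axis=0)

# Least-squares coefficients of each voxel on the mixing matrix (voxels x components)
tsoc_B = np.linalg.lstsq(mmix, tsoc_dm.T, rcond=None)[0].T

# Assumption: totvar = sum of squared coefficients over the retained components only
totvar = (tsoc_B ** 2).sum()
varex = (tsoc_B ** 2).sum(axis=0) / totvar * 100.0

print(np.round(pca_varex, 1))
print(np.round(varex, 1))
```

Because `pca_varex` sums to less than 100 whenever some components are discarded, `varex` comes out larger for every component while preserving their order. That is consistent with the pattern described above, but the sketch does not show that it is the actual cause.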


jbteves · 2019-06-01T02:42:27Z

Thanks for bringing this up. When you say that the variance explained is highly correlated across components, do you mean that the variance explained appears to at least grow with the amount of variance that a component is purported to explain?

tsalo · 2019-06-01T19:42:09Z

I guess you could say that. The values differ by calculation method, but are correlated across methods. I have high confidence in one method, but it only works for PCA, so I think we need to either come up with a more accurate method to implement in dependence_metrics or figure out how to calculate variance explained for ICA specifically (i.e., in tedica), so that we can use the "official" version for both decompositions.

tsalo · 2019-07-18T13:06:57Z

We can calculate voxel-wise variance explained and then average it across voxels to get an estimate of component-wise variance explained. These values are very similar to the PCA-based variance explained values, and can be calculated for both PCA and ICA.

Here's what the code could look like in dependence_metrics:

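# Demean each voxel's time series and compute its total variance
tsoc_dm = tsoc - np.mean(tsoc, axis=-1, keepdims=True)
totvar = np.var(tsoc_dm, axis=1)
...
LGR.info('Fitting TE- and S0-dependent models to components')
for i_comp in range(n_components):
    # Fitted signal for this component alone, shape (n_voxels, n_timepoints)
    comp_pred_data = np.dot(mmix[:, i_comp:i_comp+1], tsoc_B[:, i_comp:i_comp+1].T).T
    # Fraction of each voxel's variance captured by this component, averaged over voxels
    compvar = np.var(comp_pred_data, axis=1)
    comptable.loc[i_comp, 'variance explained'] = np.mean(compvar / totvar)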
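A self-contained version of the snippet above, as a sketch: synthetic arrays stand in for `tsoc`, `mmix`, and `tsoc_B`, the function name `voxelwise_varex` is hypothetical, and constant voxels are skipped to avoid dividing by zero (an addition not in the original snippet). `np.outer` builds the same single-component prediction as the `np.dot(...).T` expression.

```python
import numpy as np

def voxelwise_varex(tsoc, mmix, tsoc_B):
    """Hypothetical helper: mean over voxels of var(one component's fit) / var(voxel).

    tsoc: (n_voxels, n_timepoints); mmix: (n_timepoints, n_components);
    tsoc_B: (n_voxels, n_components) coefficients of tsoc on mmix.
    """
    tsoc_dm = tsoc - np.mean(tsoc, axis=-1, keepdims=True)
    totvar = np.var(tsoc_dm, axis=1)
    keep = totvar > 0  # added: skip constant voxels to avoid division by zero
    n_components = mmix.shape[1]
    varex = np.empty(n_components)
    for i_comp in range(n_components):
        # Fitted signal for this component alone, shape (n_voxels, n_timepoints)
        comp_pred_data = np.outer(tsoc_B[:, i_comp], mmix[:, i_comp])
        compvar = np.var(comp_pred_data, axis=1)
        varex[i_comp] = np.mean(compvar[keep] / totvar[keep])
    return varex

# Synthetic, noise-free data built from zero-mean, orthonormal component time courses
rng = np.random.default_rng(1)
n_vox, n_t, n_comp = 100, 80, 4
a = rng.standard_normal((n_t, n_comp))
mmix, _ = np.linalg.qr(a - a.mean(axis=0))
tsoc = rng.standard_normal((n_vox, n_comp)) @ mmix.T
tsoc_B = np.linalg.lstsq(mmix, tsoc.T, rcond=None)[0].T

varex = voxelwise_varex(tsoc, mmix, tsoc_B)
print(varex, varex.sum())
```

Because these synthetic data contain only uncorrelated, zero-mean components and no noise, the per-component values sum to 1. With added noise the sum drops below 1, and with correlated components it need not equal the total variance explained.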

Otherwise, I can sort of see the logic behind using parameter estimates as a proxy for variance explained, since the IVs (the mixing matrix) are all supposed to be z-scored, but I think that would only make sense if the data were also z-scored. I also don't know why we have separate "variance explained" and "normalized variance explained" values.
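The point about z-scoring the data can be checked directly. In this sketch (random data; `sq_coef_share` and `zscore_rows` are hypothetical helpers), the mixing matrix is z-scored but the data are not, and rescaling a single voxel by 1000 changes every component's share of the summed squared coefficients. Z-scoring each voxel's time series first removes that dependence on signal scale.

```python
import numpy as np

def sq_coef_share(data, mmix):
    """Hypothetical helper: each component's share (%) of the summed squared coefficients."""
    dm = data - data.mean(axis=1, keepdims=True)
    coefs = np.linalg.lstsq(mmix, dm.T, rcond=None)[0].T  # (n_voxels, n_components)
    return (coefs ** 2).sum(axis=0) / (coefs ** 2).sum() * 100.0

def zscore_rows(x):
    """Hypothetical helper: z-score each voxel's time series."""
    return (x - x.mean(axis=1, keepdims=True)) / x.std(axis=1, keepdims=True)

rng = np.random.default_rng(2)
n_vox, n_t, n_comp = 50, 100, 3
a = rng.standard_normal((n_t, n_comp))
mmix = (a - a.mean(axis=0)) / a.std(axis=0)  # z-scored regressors (IVs)
data = rng.standard_normal((n_vox, n_comp)) @ mmix.T + rng.standard_normal((n_vox, n_t))

scaled = data.copy()
scaled[0] *= 1000.0  # one voxel with a much larger signal scale

raw, raw_scaled = sq_coef_share(data, mmix), sq_coef_share(scaled, mmix)
z, z_scaled = sq_coef_share(zscore_rows(data), mmix), sq_coef_share(zscore_rows(scaled), mmix)
print(raw, raw_scaled)
print(z, z_scaled)
```

The raw shares shift toward whatever the rescaled voxel's coefficients look like, while the z-scored shares are unchanged up to floating-point error.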

tsalo · 2019-07-18T13:22:39Z

Since "normalized variance explained" is only used within the PCA decision tree, and "variance explained" is only used within the ICA decision trees (both v2.5 and v3.2), I propose that we merge the two. As mentioned above, I don't know what the conceptual difference between the two measures is; both are scaled to sum to a fixed total (100 or 1), and they at least appear interchangeable.

tsalo · 2019-07-20T22:48:17Z

Okay, so it looks like squared parameter estimates do match up with variance explained, but only if both the DVs and the IVs have unit variance (which I believe means that the parameter estimates are actually standardized beta values). If we z-score the mixing matrix and the optimally combined data within kundu_fit, and then square the resulting betas, we'll have voxel-wise estimates of variance explained. We can then just average them across voxels to get a variance explained value per component. I think this is a bit simpler than the method I described above.
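A sketch of this proposal, compared against the voxel-wise ratio from the earlier comment. Both helper names are hypothetical and the data are random. Because a least-squares fit is unaffected by rescaling, a standardized coefficient equals the raw coefficient times the regressor's standard deviation divided by the voxel's, so its square is exactly the fitted component's variance divided by the voxel's variance. The two approaches should therefore agree even when the component time series are correlated; the example deliberately makes two of them correlated.

```python
import numpy as np

def zscore(x, axis):
    """Z-score along the given axis (population standard deviation)."""
    return (x - x.mean(axis=axis, keepdims=True)) / x.std(axis=axis, keepdims=True)

def beta_sq_varex(data, mmix):
    """Hypothetical helper: square standardized betas, then average over voxels.

    data: (n_voxels, n_timepoints); mmix: (n_timepoints, n_components).
    """
    betas = np.linalg.lstsq(zscore(mmix, 0), zscore(data, 1).T, rcond=None)[0].T
    return (betas ** 2).mean(axis=0)

def ratio_varex(data, mmix):
    """Hypothetical helper: var(fitted component signal) / var(voxel), averaged over voxels."""
    dm = data - data.mean(axis=1, keepdims=True)
    mm = mmix - mmix.mean(axis=0)
    coefs = np.linalg.lstsq(mm, dm.T, rcond=None)[0].T  # (n_voxels, n_components)
    compvar = coefs ** 2 * mm.var(axis=0)  # variance of each component's fitted signal
    return (compvar / dm.var(axis=1, keepdims=True)).mean(axis=0)

rng = np.random.default_rng(3)
n_vox, n_t, n_comp = 60, 150, 4
mmix = rng.standard_normal((n_t, n_comp))
mmix[:, 1] += 0.5 * mmix[:, 0]  # make two components correlated on purpose
data = rng.standard_normal((n_vox, n_comp)) @ mmix.T + rng.standard_normal((n_vox, n_t))

print(beta_sq_varex(data, mmix))
print(ratio_varex(data, mmix))
```

One caveat applies to both: when components are correlated, as ICA time series can be, the per-component values need not add up to the total variance explained by the full model, because the cross terms between components are not assigned to any single component.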

emdupre · 2019-11-08T16:27:35Z

Do we have a PR open for this?

tsalo · 2019-11-08T16:31:11Z

No PR. I don't know enough about variance explained in ICA to be sure about my conclusions. I was hoping that others could weigh in on it before trying to change it in tedana.

emdupre · 2019-11-08T16:34:48Z

I'd say we could roll this into #84. Maybe once we document what we're doing, it will be clearer whether we need to change it?

stale · 2020-02-06T16:46:42Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions to tedana 🎉!

tsalo added the discussion issues that still need to be discussed label May 30, 2019

tsalo changed the title from "Calculation of variance explained within dependence_metrics appears wrong" to "Calculation of variance explained appears wrong" May 30, 2019

tsalo mentioned this issue May 30, 2019

Concerns regarding TE-(in)dependence metric calculation #223

Closed

tsalo added the TE-dependence issues related to TE dependence metrics and component selection label Oct 4, 2019

tsalo mentioned this issue Nov 10, 2019

[REF] Modularize metric calculation #436

Closed

stale bot added the stale label Feb 6, 2020

stale bot closed this as completed Feb 13, 2020

handwerkerd mentioned this issue Mar 13, 2020

Topics for March 2020 Developers’ call: Pandemic Edition #550

Closed
