
Concerns regarding TE-(in)dependence metric calculation #223

Closed
tsalo opened this issue Feb 19, 2019 · 11 comments
Labels
discussion (issues that still need to be discussed), help wanted (issues where we're particularly excited for contributions), TE-dependence (issues related to TE dependence metrics and component selection)

Comments

@tsalo
Member

tsalo commented Feb 19, 2019

Summary

I would like to better document and understand how the model metrics are computed. I have concerns about some of the steps, especially how WTS is calculated and how the regressions with get_coeffs and computefeats2 are performed.

Additional Detail

  1. Compute optimal combination from T2* and echo data.
  2. Compute un-normalized weights for each component for optimally combined data (WTS) with computefeats2 and normalized mixing matrix (when provided, otherwise regular mixing matrix).
    1. Use signs of skew values for weights (per component) to flip components and weights so that component time series positively align with the data.
      • PROBLEM: The global signal trend for the 5-echo data consistently comes out trending downward in the ICA step (although it trends upward, as expected, in the PCA step). Nothing I've tried (betas, z-statistics, mean, or skew) has fixed this, which makes me suspicious of the sign determination method.
    2. Estimate normalized total variance explained per component from weights.
    • PROBLEM: While the comments say "un-normalized weights", computefeats2 does not return un-normalized weights. It computes the weights from data that has been z-scored over time, then caps the weights at +/- 0.999 and applies Fisher's z-transform to them as if they were correlation coefficients.
    • PROBLEM: I'm pretty sure that the normalized mixing matrix, which is only used with PCA, is actually z-scored along the wrong axis. FIXED WITH [FIX] Normalize PCA mixing matrix over time, not component #228
      • SOLUTION: We can just normalize each component's time series over time (instead of each volume's values over components).
  3. Compute un-normalized weights for each component for demeaned optimally combined data (tsoc_B) with get_coeffs and unnormalized mixing matrix.
    1. Convert weights to percent signal change units.
    2. Estimate total variance explained per component from weights.
    • PROBLEM: Demeaning the data should only affect the weights if the mixing matrix is not demeaned; the mixing matrix should be demeaned, but sometimes isn't.
  4. Compute un-normalized weights for each component for each echo with get_coeffs.
  5. Compute voxel-wise means for each echo.
  6. Compute alpha map for each component: Squared echo coefficients from 4, summed over time.
    • QUESTION: What is alpha?
  7. Compute component's variance explained: Sum of squared un-normalized weights from 3 (tsoc_B) divided by total variance from 3b.
    • PROBLEM: The variance explained measures calculated within fitmodels_direct do not seem to align well with the variance explained values we get directly from PCA, which makes me doubt fitmodels_direct.
      • NOTE: The values may not be very similar (for example, for PCA the "real" varex might be 37.8%, while the estimated varex is 57.6% and the estimated normalized varex is 53.8%), but the values are highly correlated across components.
      • NOTE: This can be discussed in Calculation of variance explained appears wrong #317.
  8. Compute component's normalized variance explained: Sum of squared un-normalized weights from 2 (WTS) divided by total variance from 2b.
  9. Compute model-wise pseudo-F-statistics for each component:
    1. Set up design matrices (X) for S0 and T2* models.
      • S0 model: Echo-specific weights follow echo-specific averages
      • T2* model: Weight peaks at T2*
    2. Compute model coefficient maps:
      • Sum over time of the design matrix times the component echo weights from 4, divided by the sum over time of the squared design matrix.
    3. Compute predicted values for models:
      • Design matrix times model coefficient maps.
    4. Compute sum of squared error maps for models:
      • Echo coefficients minus predicted values, squared and summed over time.
    5. Compute pseudo-F-statistics:
      • Alpha minus SSE, multiplied by number of echoes minus one, all divided by SSE.
    6. Cap F-statistics at 500.
    • PROBLEM: Shouldn't these F-statistics be calculated only for voxels with positive betas? Signs of component time series should be flipped to match the actual data, and I don't think anticorrelations are meaningful in this context. I could definitely be wrong though.
  10. Compute averaging weight map for each component:
    1. Z-score component's un-normalized weights from 2 (WTS) over voxels.
      • PROBLEM: These values are already z-scored within computefeats2. Why would we do this again?
    2. Cap z-scores at +/- 8.
    3. Square z-score maps.
      • PROBLEM: By squaring the maps, we're treating negative weights as if they're equally as valid as positive weights. If the sign determination in 2.1 is accurate, then only positive weights should be used.
  11. Calculate Kappa and Rho metrics by weighted average of pseudo-F-statistics with weight map.
  12. If full metric calculation (full_sel) is desired, calculate additional metrics.
  13. Store several maps for each component in dictionary:
    1. Z-score maps (prior to squaring) originally used for weighted averaging
    2. Percent signal change of optimally combined data maps
    3. Pseudo-F-statistic maps
  14. Perform simple cluster-extent thresholding on pseudo-F-statistic maps using cluster-defining F-value threshold from getfbounds function and cluster size threshold based on number of voxels in mask.
    1. Store binarized maps of significant voxels in F_R2_clmaps and F_S0_clmaps arrays.
    2. Store number of voxels in significant clusters in countsigFR2 and countsigFS0 arrays.
  15. Perform simple cluster-extent thresholding on Z-score maps using cluster-defining threshold of 1.95 (p < 0.05, two-sided) and the same cluster-extent threshold as everything else.
    • NOTE: Cluster-extent thresholding is bidirectional, but mixes adjacent positive and negative values to define clusters. I think we should make it one-sided, based on my concerns regarding the signs of component weights.
  16. Perform simple cluster-extent thresholding on the rank-transformed, absolute-valued un-normalized weight maps from 3 (tsoc_B), using (a) a cluster-defining threshold of the number of voxels in the mask minus the number of significant voxels from the countsigFR2/countsigFS0 variables and (b) the same cluster-extent threshold as is applied to the F_R2_clmaps/F_S0_clmaps arrays.
    • PROBLEM: This should initially result in the same number of significant voxels in Br_R2_clmaps/Br_S0_clmaps (these arrays) as there are in the corresponding component maps from F_R2_clmaps/F_S0_clmaps, but then after cluster-extent thresholding they would have fewer. Does the difference in number of significant voxels cause any problems?
    • NOTE: Br_R2_clmaps and Br_S0_clmaps should differ in terms of the number of voxels that survive thresholding because of the different cluster-defining thresholds determined by F_R2_clmaps/F_S0_clmaps, but the underlying data (tsoc_B) is exactly the same.
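The core of steps 6 and 9–11 above can be sketched in NumPy. This is a simplified illustration, not the actual fitmodels_direct implementation: the variable names (comp_betas, X_r2, X_s0, wts) are hypothetical, and the sums are written over the echo dimension, which is what the n_echoes - 1 degrees of freedom in step 9.5 suggest.

```python
import numpy as np

def pseudo_f_stats(comp_betas, X_r2, X_s0, f_max=500):
    """Sketch of steps 6 and 9: pseudo-F maps for the T2* and S0 models.

    comp_betas : (n_voxels, n_echoes) un-normalized echo-wise weights for
        one component (step 4).
    X_r2, X_s0 : (n_voxels, n_echoes) echo-wise design matrices (step 9.1);
        the S0 design follows the echo-specific means, the T2* design peaks
        at T2*.
    """
    n_echoes = comp_betas.shape[-1]
    # Step 6: "alpha" is the total sum of squares of the echo-wise weights,
    # which serves as SS_total in the pseudo-F below.
    alpha = (comp_betas ** 2).sum(axis=-1)

    f_maps = []
    for X in (X_r2, X_s0):
        # Step 9.2: per-voxel one-parameter least-squares coefficient.
        coeffs = (X * comp_betas).sum(axis=-1) / (X ** 2).sum(axis=-1)
        # Steps 9.3-9.4: predicted values and sum of squared errors.
        pred = X * coeffs[:, None]
        sse = ((comp_betas - pred) ** 2).sum(axis=-1)
        # Steps 9.5-9.6: pseudo-F with n_echoes - 1 degrees of freedom,
        # capped at 500.
        f = (alpha - sse) * (n_echoes - 1) / sse
        f_maps.append(np.clip(f, None, f_max))
    return f_maps  # [F_R2 map, F_S0 map]

def kappa_rho(f_r2, f_s0, wts):
    """Sketch of steps 10-11: weighted averages of the pseudo-F maps."""
    # Step 10: z-score the component weights over voxels, cap at +/- 8,
    # and square (the squaring is what makes negative weights count
    # equally, per the PROBLEM noted in step 10.3).
    z = (wts - wts.mean()) / wts.std()
    w = np.clip(z, -8, 8) ** 2
    kappa = (f_r2 * w).sum() / w.sum()
    rho = (f_s0 * w).sum() / w.sum()
    return kappa, rho
```

For a component whose echo-wise betas scale with TE (i.e., follow the T2* design), F_R2 saturates at the cap while F_S0 stays modest, so Kappa exceeds Rho, which matches the intended behavior of the metrics.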

Next Steps

  1. Could everyone who is familiar with the metrics and fitmodels_direct look through the code and my description to see if I have anything wrong and to address any of the issues I've raised?
@tsalo tsalo added the help wanted and discussion labels Feb 19, 2019
@emdupre
Member

emdupre commented Mar 10, 2019

This is a great write-up! I think it would be valuable to take some of these points to the next developers' call to discuss as a group. WDYT, @tsalo ?

@tsalo
Member Author

tsalo commented Mar 11, 2019

Absolutely. Sounds like a plan.

@jbteves
Collaborator

jbteves commented May 23, 2019

There has been very little discussion on this since its posting, and this is a pretty big issue. I'm afraid that others (including myself) look at the size of the issue and immediately start having heart palpitations over how big it is. I would like to propose that we take all of the concerns above, many of which actually have a solution spelled out, and make them into their own issues. This is an excellent write-up, though, so maybe we should think about how to incorporate elements of it into the documentation or into comments in the code.

@tsalo
Member Author

tsalo commented May 30, 2019

I can take certain concerns/problems and open them as separate issues, but some of them only make sense (at least to me) when considered within the overall metric calculation function.

@jbteves
Collaborator

jbteves commented May 30, 2019

True. Perhaps this is a "hold out until the next hackathon" type of problem. WDYT?

@tsalo
Member Author

tsalo commented May 30, 2019

I've started opening specific issues and linking them here. I think this issue will still be useful, but we can focus discussion on those smaller issues. But I also agree that actually dealing with all of these issues will most likely have to wait until we have a hackathon.

@jbteves
Collaborator

jbteves commented May 30, 2019

Sounds great to me! Thanks!

@tsalo tsalo added the TE-dependence label Oct 4, 2019
@stale

stale bot commented Jan 2, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions to tedana 🎉!

@stale stale bot added the stale label Jan 2, 2020
@jbteves
Collaborator

jbteves commented Jan 2, 2020

Bot, you're right that this is a little stale, but please keep it open anyway. kthx

@stale stale bot removed the stale label Jan 2, 2020
@stale

stale bot commented Apr 1, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions to tedana 🎉!

@stale stale bot added the stale label Apr 1, 2020
@tsalo
Member Author

tsalo commented Oct 12, 2020

I'm going to close this since there are multiple smaller issues open and this issue is stale.

@tsalo tsalo closed this as completed Oct 12, 2020