
Concerns regarding TE-(in)dependence metric calculation #223

Closed
tsalo opened this issue Feb 19, 2019 · 11 comments
Labels
discussion (issues that still need to be discussed), help wanted (issues where we're particularly excited for contributions), TE-dependence (issues related to TE dependence metrics and component selection)

Comments

@tsalo
Member

tsalo commented Feb 19, 2019

Summary

I would like to better document and understand how the model metrics are computed. I have concerns about some of the steps, especially how WTS is calculated and how the regressions with get_coeffs and computefeats2 are performed.

Additional Detail

  1. Compute optimal combination from T2* and echo data.
  2. Compute un-normalized weights for each component for optimally combined data (WTS) with computefeats2 and normalized mixing matrix (when provided, otherwise regular mixing matrix).
    1. Use signs of skew values for weights (per component) to flip components and weights so that component time series positively align with the data.
      • PROBLEM: The global signal trend for the 5-echo data consistently comes out trending downward in the ICA step (although it trends upward, as expected, in the PCA step). Nothing I've tried (betas, z-statistics, mean, or skew) has fixed this, which makes me suspicious of the sign determination method.
    2. Estimate normalized total variance explained per component from weights.
    • PROBLEM: While the comments say "un-normalized weights", computefeats2 does not return un-normalized weights. It computes the weights from data that has been z-scored over time, then caps the weights at +/- 0.999 and applies Fisher's z-transform to them as if they were correlation coefficients.
    • PROBLEM: I'm pretty sure that the normalized mixing matrix, which is only used with PCA, is actually z-scored along the wrong axis. FIXED WITH [FIX] Normalize PCA mixing matrix over time, not component #228
      • SOLUTION: We can just normalize each component's time series over time (instead of each volume's values over components).
  3. Compute un-normalized weights for each component for demeaned optimally combined data (tsoc_B) with get_coeffs and unnormalized mixing matrix.
    1. Convert weights to percent signal change units.
    2. Estimate total variance explained per component from weights.
    • PROBLEM: Demeaning the data should only affect the weights if the mixing matrix is not demeaned; the mixing matrix should be demeaned, but sometimes isn't.
  4. Compute un-normalized weights for each component for each echo with get_coeffs.
  5. Compute voxel-wise means for each echo.
  6. Compute alpha map for each component: Squared echo coefficients from 4, summed over time.
    • QUESTION: What is alpha?
  7. Compute component's variance explained: Sum of squared un-normalized weights from 3 (tsoc_B) divided by total variance from 3b.
    • PROBLEM: The variance explained measures calculated within fitmodels_direct do not seem to align well with the variance explained values we get directly from PCA, which makes me doubt fitmodels_direct.
      • NOTE: The values may not be very similar (for example, for PCA the "real" varex might be 37.8%, while the estimated varex is 57.6% and the estimated normalized varex is 53.8%), but the values are highly correlated across components.
      • NOTE: This can be discussed in Calculation of variance explained appears wrong #317.
  8. Compute component's normalized variance explained: Sum of squared un-normalized weights from 2 (WTS) divided by total variance from 2b.
  9. Compute model-wise pseudo-F-statistics for each component:
    1. Set up design matrices (X) for S0 and T2* models.
      • S0 model: Echo-specific weights follow echo-specific averages
      • T2* model: Weight peaks at T2*
    2. Compute model coefficient maps:
      • Sum over time of the design matrix times the component echo weights from 4, divided by the sum over time of the squared design matrix.
    3. Compute predicted values for models:
      • Design matrix times model coefficient maps.
    4. Compute sum of squared error maps for models:
      • Echo coefficients minus predicted values, squared and summed over time.
    5. Compute pseudo-F-statistics:
      • Alpha minus SSE, multiplied by number of echoes minus one, all divided by SSE.
    6. Cap F-statistics at 500.
    • PROBLEM: Shouldn't these F-statistics be calculated only for voxels with positive betas? Signs of component time series should be flipped to match the actual data, and I don't think anticorrelations are meaningful in this context. I could definitely be wrong though.
  10. Compute averaging weight map for each component:
    1. Z-score component's un-normalized weights from 2 (WTS) over voxels.
      • PROBLEM: These values are already z-scored within computefeats2. Why would we do this again?
    2. Cap z-scores at +/- 8.
    3. Square z-score maps.
      • PROBLEM: By squaring the maps, we're treating negative weights as if they're equally as valid as positive weights. If the sign determination in 2.1 is accurate, then only positive weights should be used.
  11. Calculate Kappa and Rho metrics by weighted average of pseudo-F-statistics with weight map.
  12. If full metric calculation (full_sel) is desired, calculate additional metrics.
  13. Store several maps for each component in dictionary:
    1. Z-score maps (prior to squaring) originally used for weighted averaging
    2. Percent signal change of optimally combined data maps
    3. Pseudo-F-statistic maps
  14. Perform simple cluster-extent thresholding on pseudo-F-statistic maps using cluster-defining F-value threshold from getfbounds function and cluster size threshold based on number of voxels in mask.
    1. Store binarized maps of significant voxels in F_R2_clmaps and F_S0_clmaps arrays.
    2. Store number of voxels in significant clusters in countsigFR2 and countsigFS0 arrays.
  15. Perform simple cluster-extent thresholding on Z-score maps using cluster-defining threshold of 1.95 (p < 0.05, two-sided) and the same cluster-extent threshold as everything else.
    • NOTE: Cluster-extent thresholding is bidirectional, but mixes adjacent positive and negative values to define clusters. I think we should make it one-sided, based on my concerns regarding the signs of component weights.
  16. Perform simple cluster-extent thresholding on the rank-transformed, absolute-valued un-normalized weight maps from 3 (tsoc_B), using (a) a cluster-defining threshold of the number of voxels in the mask minus the number of significant voxels from the countsigFR2/countsigFS0 variables and (b) the same cluster-extent threshold as is applied to the F_R2_clmaps/F_S0_clmaps arrays.
    • PROBLEM: This should initially result in the same number of significant voxels in Br_R2_clmaps/Br_S0_clmaps (these arrays) as there are in the corresponding component maps from F_R2_clmaps/F_S0_clmaps, but then after cluster-extent thresholding they would have fewer. Does the difference in number of significant voxels cause any problems?
    • NOTE: Br_R2_clmaps and Br_S0_clmaps should differ in terms of the number of voxels that survive thresholding because of the different cluster-defining thresholds determined by F_R2_clmaps/F_S0_clmaps, but the underlying data (tsoc_B) is exactly the same.
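The core of steps 6 and 9–11 above can be sketched in NumPy. This is a simplified illustration, not the actual fitmodels_direct implementation: the variable names (comp_betas, X_r2, X_s0, wts) are hypothetical, and the sums are written over the echo dimension, which is what the n_echoes - 1 degrees of freedom in step 9.5 suggest.

```python
import numpy as np

def pseudo_f_stats(comp_betas, X_r2, X_s0, f_max=500):
    """Sketch of steps 6 and 9: pseudo-F maps for the T2* and S0 models.

    comp_betas : (n_voxels, n_echoes) un-normalized echo-wise weights for
        one component (step 4).
    X_r2, X_s0 : (n_voxels, n_echoes) echo-wise design matrices (step 9.1);
        the S0 design follows the echo-specific means, the T2* design peaks
        at T2*.
    """
    n_echoes = comp_betas.shape[-1]
    # Step 6: "alpha" is the total sum of squares of the echo-wise weights,
    # which serves as SS_total in the pseudo-F below.
    alpha = (comp_betas ** 2).sum(axis=-1)

    f_maps = []
    for X in (X_r2, X_s0):
        # Step 9.2: per-voxel one-parameter least-squares coefficient.
        coeffs = (X * comp_betas).sum(axis=-1) / (X ** 2).sum(axis=-1)
        # Steps 9.3-9.4: predicted values and sum of squared errors.
        pred = X * coeffs[:, None]
        sse = ((comp_betas - pred) ** 2).sum(axis=-1)
        # Steps 9.5-9.6: pseudo-F with n_echoes - 1 degrees of freedom,
        # capped at 500.
        f = (alpha - sse) * (n_echoes - 1) / sse
        f_maps.append(np.clip(f, None, f_max))
    return f_maps  # [F_R2 map, F_S0 map]

def kappa_rho(f_r2, f_s0, wts):
    """Sketch of steps 10-11: weighted averages of the pseudo-F maps."""
    # Step 10: z-score the component weights over voxels, cap at +/- 8,
    # and square (the squaring is what makes negative weights count
    # equally, per the PROBLEM noted in step 10.3).
    z = (wts - wts.mean()) / wts.std()
    w = np.clip(z, -8, 8) ** 2
    kappa = (f_r2 * w).sum() / w.sum()
    rho = (f_s0 * w).sum() / w.sum()
    return kappa, rho
```

For a component whose echo-wise betas scale with TE (i.e., follow the T2* design), F_R2 saturates at the cap while F_S0 stays modest, so Kappa exceeds Rho, which matches the intended behavior of the metrics.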

Next Steps

  1. Could everyone who is familiar with the metrics and fitmodels_direct look through the code and my description to see if I have anything wrong and to address any of the issues I've raised?
@tsalo tsalo added the help wanted and discussion labels Feb 19, 2019
@emdupre
Member

emdupre commented Mar 10, 2019

This is a great write-up! I think it would be valuable to take some of these points to the next developers' call to discuss as a group. WDYT, @tsalo ?

@tsalo
Member Author

tsalo commented Mar 11, 2019

Absolutely. Sounds like a plan.

@jbteves
Collaborator

jbteves commented May 23, 2019

There has been very little discussion on this since its posting, and this is a pretty big issue. I'm afraid that others (including myself) look at the size of the issue and immediately start having heart palpitations over how big it is. I would like to propose that we take all of the concerns above, many of which actually have a solution spelled out, and make them into their own issues. This is an excellent write-up, though, so maybe we should think about how to incorporate elements of it into the documentation or into comments in the code.

@tsalo
Member Author

tsalo commented May 30, 2019

I can take certain concerns/problems and open them as separate issues, but some of them only make sense (at least to me) when considered within the overall metric calculation function.

@jbteves
Collaborator

jbteves commented May 30, 2019

True. Perhaps this is a "hold out until the next hackathon" type of problem. WDYT?

@tsalo
Member Author

tsalo commented May 30, 2019

I've started opening specific issues and linking them here. I think this issue will still be useful, but we can focus discussion on those smaller issues. But I also agree that actually dealing with all of these issues will most likely have to wait until we have a hackathon.

@jbteves
Collaborator

jbteves commented May 30, 2019

Sounds great to me! Thanks!

@tsalo tsalo added the TE-dependence label Oct 4, 2019
@stale

stale bot commented Jan 2, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions to tedana 🎉!

@stale stale bot added the stale label Jan 2, 2020
@jbteves
Collaborator

jbteves commented Jan 2, 2020

Bot, you're right that this is a little stale, but please keep it open anyway. kthx

@stale stale bot removed the stale label Jan 2, 2020
@stale

stale bot commented Apr 1, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions to tedana 🎉!

@stale stale bot added the stale label Apr 1, 2020
@tsalo
Member Author

tsalo commented Oct 12, 2020

I'm going to close this since there are multiple smaller issues open and this issue is stale.

@tsalo tsalo closed this as completed Oct 12, 2020