How to deal with zero or near-zero mixture weights? #5

pcarbo · 2021-01-01T22:01:35Z

It doesn't make much sense to update prior covariance matrices with weights that are zero, or near zero. @stephens999 @yunqiyang0215 @zouyuxin Ideas are welcome.

Here's an example (thanks to Yuxin):

set.seed(1)
dat <- readRDS("dat.rds")
f0 <- ud_init(X = as.matrix(dat$data),V = dat$S,U_scaled = list(),
              U_unconstrained = dat$Ulist,n_rank1 = 0)
res <- ud_fit(f0,control = list(unconstrained.update = "teem",
                                resid.update = "none",
                                version = "R"))
# Performing Ultimate Deconvolution on 600 x 20 matrix (udr 0.3-30, "R"):
# data points are i.i.d. (same V)
# prior covariances: 0 scaled, 0 rank-1, 10 unconstrained
# prior covariance updates: none (scaled), none (rank-1), teem (unconstrained)
# mixture weights update: em
# residual covariance update: none
# max 20 updates, conv tol 1.0e-06
# iter          log-likelihood |w - w'| |U - U'| |V - V'|
#    1 -3.0699325059934326e+04 4.65e-01 1.22e+02 0.00e+00
#    2 -3.0330311442511782e+04 9.41e-02 1.11e+02 0.00e+00
#   ...
#   19 -2.9720957242869852e+04 2.79e-05 2.27e-01 0.00e+00
#   20 -2.9720891074856361e+04 7.07e-05 4.21e-01 0.00e+00
print(round(res$w,digits = 6))
#  FLASH_1  FLASH_2  FLASH_3  FLASH_4   tFLASH    PCA_1    PCA_2    PCA_3
# 0.000000 0.000000 0.001667 0.004884 0.229977 0.000000 0.000000 0.000116
#    tPCA       XX
# 0.265002 0.498354

dat.rds.gz

zouyuxin · 2021-01-02T17:35:08Z

I agree there is no need to update a covariance matrix if it has weight 0. I was thinking of updating ku here https://github.com/stephenslab/udr/blob/master/R/fit.R#L334 based on the weight w.

pcarbo · 2021-01-02T23:16:39Z

Related to this, there is the question of whether we should prune weights that are smaller than some pre-specified threshold, e.g., 1e-8.

yunqiyang0215 · 2021-01-04T20:51:56Z

As I understand it, we want to be able to specify many mixture components when fitting the model and let the data determine the weights. Some components should therefore end up with weight zero because they don't match the data. Specifying a threshold sounds like a good solution to me: if a weight is less than some threshold, say 1e-8, we set it directly to zero. Best, Yunqi
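The thresholding idea could be sketched roughly as follows. This is only an illustration: `prune_weights` and its `zero.threshold` argument are hypothetical names, not part of udr, and rescaling the surviving weights to sum to one is an assumption that the thread does not settle.

```r
# Hypothetical helper (not part of udr): set mixture weights below a
# threshold to exactly zero, then rescale the remaining weights so
# that they sum to one (the rescaling is an assumption).
prune_weights <- function (w, zero.threshold = 1e-8) {
  w[w < zero.threshold] <- 0
  if (all(w == 0))
    stop("All mixture weights are below zero.threshold")
  return(w / sum(w))
}

# Toy weights; the first one is far below the threshold.
w <- c(FLASH_1 = 1e-12, tFLASH = 0.23, tPCA = 0.265, XX = 0.505 - 1e-12)
w_pruned <- prune_weights(w)
print(w_pruned)
```

Setting the weight to exactly zero, rather than leaving it tiny, makes it easy for later steps to test `w[i] == 0`.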


yunqiyang0215 · 2021-12-28T19:18:09Z

I added two checks in the code. The idea is that if weight[i] < minval (the default is 1e-15 for now), we set weight[i] = 0 and skip updating U[[i]].

A check in update_mixture_weights.R to see if w[i] < minval.
A check in update_prior_covariances.R. If w[i] == 0, skip updating U[[i]].
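A minimal sketch of the second check, assuming the prior covariances are stored in a list `U` and the weights in a vector `w`. The function `update_nonzero_covariances` and the toy update passed to it are hypothetical stand-ins, not the code in update_prior_covariances.R.

```r
# Hypothetical sketch (not the udr code): apply an update function
# only to the prior covariances whose mixture weight is nonzero.
update_nonzero_covariances <- function (U, w, update_fun) {
  for (i in seq_along(U)) {
    if (w[i] == 0)
      next  # Weight was set to zero, so leave U[[i]] unchanged.
    U[[i]] <- update_fun(U[[i]], i)
  }
  return(U)
}

U <- list(diag(2), diag(2), diag(2))
w <- c(0.7, 0, 0.3)

# Toy update that doubles the matrix, standing in for the real update.
U_new <- update_nonzero_covariances(U, w, function (Ui, i) 2 * Ui)
```

Here `U_new[[2]]` is left as the identity matrix because `w[2]` is zero, while the other two matrices are updated.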

pcarbo · 2022-01-04T13:41:26Z

@yunqiyang0215 I wrote this in Slack but I'll post my comments here as well.

I would suggest defining a new control parameter, e.g., zero.threshold, with a default of, say, 1e-8, so that a prior covariance matrix is not updated when its corresponding prior weight is below this threshold.

Also, there is another subtlety in your check: you are checking the prior weights w, but the update for U doesn't actually depend on w; it depends on P.

So maybe if w[i] is "too small", then we should also set the corresponding "responsibilities" P[,i] to be all zeros?
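One way this suggestion might look, assuming `P` is an n x k matrix of responsibilities whose rows sum to one. The helper `zero_small_responsibilities` is hypothetical, and rescaling the rows afterwards is an assumption; the thread does not say whether the remaining responsibilities should be renormalized.

```r
# Hypothetical sketch (not the udr code): zero out the columns of the
# responsibilities matrix P for components with very small weights.
zero_small_responsibilities <- function (P, w, zero.threshold = 1e-8) {
  small <- which(w < zero.threshold)
  P[, small] <- 0
  # Rescale each row so the remaining responsibilities sum to one.
  # This is an assumption, and it yields NaN for a row in which every
  # retained responsibility is zero.
  return(P / rowSums(P))
}

w <- c(0.6, 1e-10, 0.4)
P <- rbind(c(0.7, 1e-9, 0.3 - 1e-9),
           c(0.2, 2e-9, 0.8 - 2e-9))
P_new <- zero_small_responsibilities(P, w)
```

With the second column of `P_new` set to zero, an update of U that is computed from P would no longer receive any contribution from that component.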
