Poor results with only scale-type arguments in irlba and prcomp_irlba #21

jsams · 2017-07-10T20:08:41Z

when I compare the results from the irlba package using the irlba function or prcomp_irlba and compare that to R's svd and prcomp functions, the results are usually quite good ('mean relative difference' on the order of 1e-7 or so as reported by all.equal). However, when using scale.=T or the sd's of the columns of the input matrix for scale in irlba(), the mean relative difference from the pca function is quite large, on the order of 0.1. When using center alone or with scale, the mean relative difference goes back to 1e-7 or so. Is this expected behavior?

I ran this on the latest git installed with install_github

Here is sample code, I left out the centered and scaled centered cases as those are working well

normalize_signs = function(X, Y) {
    for (i in 1:ncol(X)) {
        if (sign(X[1, i]) != sign(Y[1, i])) {
            Y[,i] = -Y[,i]
        }
    }
    return(Y)
}

all.equal_pca = function(X, Y) {
    Y = normalize_signs(X, Y)
    return(all.equal(X, Y, check.attributes=F))
}

set.seed(1)
X = matrix(rnorm(2000), ncol=40)
M = 5 # number of PCA components
centers = colMeans(X)
sds = apply(X, 2, sd)
Xc = sweep(X, 2, centers, `-`)
Xs = sweep(X, 2, sds, `/`)
Xcs = sweep(Xc, 2, sds, `/`)

# unscaled
scaled=F
centered=F
pca = prcomp(X, center=centered, scale.=scaled)
sv = svd(X)
svir = irlba(X, nv=M, nu=M)
pcair = prcomp_irlba(X, n=M, center=centered, scale.=scaled)

Xpca = predict(pca)[,1:M]

Xsvl = sv$u[,1:M] %*% diag(sv$d[1:M])
Xsvr = X %*% sv$v[,1:M]

Xsvirl = svir$u %*% diag(svir$d)
Xsvirr = X %*% svir$v

Xpcair = predict(pcair)
Xpcair2 = X %*% pcair$rotation

all.equal_pca(Xsvl, Xsvr)
all.equal_pca(Xpca, Xsvl)
all.equal_pca(Xsvirl, Xsvirr)
all.equal_pca(Xpca, Xsvirl)
all.equal_pca(Xpcair, Xpcair2)
all.equal_pca(Xpca, Xpcair)
all.equal_pca(Xpcair, Xsvirl)

# scaled, uncentered
scaled=T
centered=F
pca = prcomp(X, center=centered, scale.=scaled)
sv = svd(Xs)
svir = irlba(X, nv=M, nu=M, scale=sds)
pcair = prcomp_irlba(X, n=M, center=centered, scale.=scaled)

Xpca = predict(pca)[,1:M]

Xsvl = sv$u[,1:M] %*% diag(sv$d[1:M])
Xsvr = Xs %*% sv$v[,1:M]

Xsvirl = svir$u %*% diag(svir$d)
Xsvirr = Xs %*% svir$v

Xpcair = predict(pcair)
Xpcair2 = Xs %*% pcair$rotation

all.equal_pca(Xsvl, Xsvr)
all.equal_pca(Xpca, Xsvl)
all.equal_pca(Xsvirl, Xsvirr)
all.equal_pca(Xpca, Xsvirl)
all.equal_pca(Xpcair, Xpcair2)
all.equal_pca(Xpca, Xpcair)
all.equal_pca(Xpcair, Xsvirl)

The text was updated successfully, but these errors were encountered:

bwlewis · 2017-07-12T13:15:38Z

thanks for this detailed example. I will carefully work through it asap and respond after that. Best, Bryan.

…

On Jul 10, 2017 16:08, "jsams" ***@***.***> wrote: when I compare the results from the irlba package using the irlba function or prcomp_irlba and compare that to R's svd and prcomp functions, the results are usually quite good ('mean relative difference' on the order of 1e-7 or so as reported by all.equal). However, when using scale.=T or the sd's of the columns of the input matrix for scale in irlba(), the mean relative difference from the pca function is quite large, on the order of 0.1. When using center alone or with scale, the mean relative difference goes back to 1e-7 or so. Is this expected behavior? I ran this on the latest git installed with install_github Here is sample code, I left out the centered and scaled centered cases as those are working well normalize_signs = function(X, Y) { for (i in 1:ncol(X)) { if (sign(X[1, i]) != sign(Y[1, i])) { Y[,i] = -Y[,i] } } return(Y) } all.equal_pca = function(X, Y) { Y = normalize_signs(X, Y) return(all.equal(X, Y, check.attributes=F)) } set.seed(1) X = matrix(rnorm(2000), ncol=40) M = 5 # number of PCA components centers = colMeans(X) sds = apply(X, 2, sd) Xc = sweep(X, 2, centers, `-`) Xs = sweep(X, 2, sds, `/`) Xcs = sweep(Xc, 2, sds, `/`) # unscaled scaled=F centered=F pca = prcomp(X, center=centered, scale.=scaled) sv = svd(X) svir = irlba(X, nv=M, nu=M) pcair = prcomp_irlba(X, n=M, center=centered, scale.=scaled) Xpca = predict(pca)[,1:M] Xsvl = sv$u[,1:M] %*% diag(sv$d[1:M]) Xsvr = X %*% sv$v[,1:M] Xsvirl = svir$u %*% diag(svir$d) Xsvirr = X %*% svir$v Xpcair = predict(pcair) Xpcair2 = X %*% pcair$rotation all.equal_pca(Xsvl, Xsvr) all.equal_pca(Xpca, Xsvl) all.equal_pca(Xsvirl, Xsvirr) all.equal_pca(Xpca, Xsvirl) all.equal_pca(Xpcair, Xpcair2) all.equal_pca(Xpca, Xpcair) all.equal_pca(Xpcair, Xsvirl) # scaled, uncentered scaled=T centered=F pca = prcomp(X, center=centered, scale.=scaled) sv = svd(Xs) svir = irlba(X, nv=M, nu=M, scale=sds) pcair = prcomp_irlba(X, n=M, center=centered, scale.=scaled) Xpca = predict(pca)[,1:M] Xsvl = sv$u[,1:M] %*% diag(sv$d[1:M]) Xsvr = Xs %*% sv$v[,1:M] Xsvirl = svir$u %*% diag(svir$d) Xsvirr = Xs %*% svir$v Xpcair = predict(pcair) Xpcair2 = Xs %*% pcair$rotation all.equal_pca(Xsvl, Xsvr) all.equal_pca(Xpca, Xsvl) all.equal_pca(Xsvirl, Xsvirr) all.equal_pca(Xpca, Xsvirl) all.equal_pca(Xpcair, Xpcair2) all.equal_pca(Xpca, Xpcair) all.equal_pca(Xpcair, Xsvirl) — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#21>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAIsnqT2UYvmLBfaiMQtxsoXykW_Gzsdks5sMoTJgaJpZM4OTWUo> .

bwlewis · 2017-10-01T23:36:18Z

Sorry for the long latency.

The bug is in prcomp_irlba and its interpretation of scale. From the R documentation for scale:

If ‘scale’ is ‘TRUE’ then scaling is done by dividing the (centered) columns of ‘x’
by their standard deviations if ‘center’ is ‘TRUE’, and the root mean square otherwise.

(Note the root mean square in the non-centered case.) Unfortunately prcomp_irlba (and indeed your code with Xs above) uses standard deviation whether centered or not.

There is another minor complicating factor that can pop up simply due to the sign of the output components (which may vary). But the big issue is the incorrect scaling in prcomp_irlba.

Fixing now...

bwlewis · 2017-10-02T00:33:44Z

This should be fixed in the GitHub 2.2.2 version in the main branch.

See below for an adjusted example based on your code to use the RMS scaling. Thanks for finding this subtle bug!

library(irlba)

normalize_signs = function(X, Y) {
    for (i in 1:ncol(X)) {
        if (sign(X[1, i]) != sign(Y[1, i])) {
            Y[,i] = -Y[,i]
        }
    }
    return(Y)
}

all.equal_pca = function(X, Y) {
    Y = normalize_signs(X, Y)
    return(all.equal(X, Y, check.attributes=F))
}

set.seed(1)
X = matrix(rnorm(2000), ncol=40)
M = 5 # number of PCA components
centers = colMeans(X)
sds = apply(X, 2, sd)
rms = apply(X, 2, function(x) sqrt(sum(x^2) / (length(x) - 1)))
Xc = sweep(X, 2, centers, `-`)
Xs = sweep(X, 2, sds, `/`)
Xcs = sweep(Xc, 2, sds, `/`)
Xrms = sweep(X, 2, rms, `/`)

# scaled, uncentered
scaled=T
centered=F
pca = prcomp(X, center=centered, scale.=scaled)
sv = svd(Xrms)
svir = irlba(X, nv=M, nu=M, scale=rms)
pcair = prcomp_irlba(X, n=M, center=centered, scale.=scaled)

Xpca = predict(pca)[,1:M]

Xsvl = sv$u[,1:M] %*% diag(sv$d[1:M])
Xsvr = Xrms %*% sv$v[,1:M]

Xsvirl = svir$u %*% diag(svir$d)
Xsvirr = Xrms %*% svir$v

Xpcair = predict(pcair)
Xpcair2 = Xrms %*% pcair$rotation

all.equal_pca(Xsvl, Xsvr)
all.equal_pca(Xpca, Xsvl)
all.equal_pca(Xsvirl, Xsvirr)
all.equal_pca(Xpca, Xsvirl)
all.equal_pca(Xpcair, Xpcair2)
all.equal_pca(Xpca, Xpcair)
all.equal_pca(Xpcair, Xsvirl)

Incorrect prcomp_irlba scaling

bwlewis · 2017-11-14T15:44:56Z

pls. re-open if you still run into trouble

bwlewis mentioned this issue Aug 24, 2017

Discrepancy in irlba with center argument #22

Closed

bwlewis added a commit that referenced this issue Oct 2, 2017

Fix and test for issue #21

e50c06d

Incorrect prcomp_irlba scaling

bwlewis closed this as completed Nov 14, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Poor results with only scale-type arguments in irlba and prcomp_irlba #21

Poor results with only scale-type arguments in irlba and prcomp_irlba #21

jsams commented Jul 10, 2017

bwlewis commented Jul 12, 2017 via email

bwlewis commented Oct 1, 2017

bwlewis commented Oct 2, 2017

bwlewis commented Nov 14, 2017

Poor results with only scale-type arguments in irlba and prcomp_irlba #21

Poor results with only scale-type arguments in irlba and prcomp_irlba #21

Comments

jsams commented Jul 10, 2017

bwlewis commented Jul 12, 2017 via email

bwlewis commented Oct 1, 2017

bwlewis commented Oct 2, 2017

bwlewis commented Nov 14, 2017