-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Poor results with only scale-type arguments in irlba and prcomp_irlba #21
Comments
thanks for this detailed example. I will carefully work through it asap and
respond after that. Best, Bryan.
…On Jul 10, 2017 16:08, "jsams" ***@***.***> wrote:
when I compare the results from the irlba package using the irlba function
or prcomp_irlba and compare that to R's svd and prcomp functions, the
results are usually quite good ('mean relative difference' on the order of
1e-7 or so as reported by all.equal). However, when using scale.=T or the
sd's of the columns of the input matrix for scale in irlba(), the mean
relative difference from the pca function is quite large, on the order of
0.1. When using center alone or with scale, the mean relative difference
goes back to 1e-7 or so. Is this expected behavior?
I ran this on the latest git installed with install_github
Here is sample code, I left out the centered and scaled centered cases as
those are working well
normalize_signs = function(X, Y) {
for (i in 1:ncol(X)) {
if (sign(X[1, i]) != sign(Y[1, i])) {
Y[,i] = -Y[,i]
}
}
return(Y)
}
all.equal_pca = function(X, Y) {
Y = normalize_signs(X, Y)
return(all.equal(X, Y, check.attributes=F))
}
set.seed(1)
X = matrix(rnorm(2000), ncol=40)
M = 5 # number of PCA components
centers = colMeans(X)
sds = apply(X, 2, sd)
Xc = sweep(X, 2, centers, `-`)
Xs = sweep(X, 2, sds, `/`)
Xcs = sweep(Xc, 2, sds, `/`)
# unscaled
scaled=F
centered=F
pca = prcomp(X, center=centered, scale.=scaled)
sv = svd(X)
svir = irlba(X, nv=M, nu=M)
pcair = prcomp_irlba(X, n=M, center=centered, scale.=scaled)
Xpca = predict(pca)[,1:M]
Xsvl = sv$u[,1:M] %*% diag(sv$d[1:M])
Xsvr = X %*% sv$v[,1:M]
Xsvirl = svir$u %*% diag(svir$d)
Xsvirr = X %*% svir$v
Xpcair = predict(pcair)
Xpcair2 = X %*% pcair$rotation
all.equal_pca(Xsvl, Xsvr)
all.equal_pca(Xpca, Xsvl)
all.equal_pca(Xsvirl, Xsvirr)
all.equal_pca(Xpca, Xsvirl)
all.equal_pca(Xpcair, Xpcair2)
all.equal_pca(Xpca, Xpcair)
all.equal_pca(Xpcair, Xsvirl)
# scaled, uncentered
scaled=T
centered=F
pca = prcomp(X, center=centered, scale.=scaled)
sv = svd(Xs)
svir = irlba(X, nv=M, nu=M, scale=sds)
pcair = prcomp_irlba(X, n=M, center=centered, scale.=scaled)
Xpca = predict(pca)[,1:M]
Xsvl = sv$u[,1:M] %*% diag(sv$d[1:M])
Xsvr = Xs %*% sv$v[,1:M]
Xsvirl = svir$u %*% diag(svir$d)
Xsvirr = Xs %*% svir$v
Xpcair = predict(pcair)
Xpcair2 = Xs %*% pcair$rotation
all.equal_pca(Xsvl, Xsvr)
all.equal_pca(Xpca, Xsvl)
all.equal_pca(Xsvirl, Xsvirr)
all.equal_pca(Xpca, Xsvirl)
all.equal_pca(Xpcair, Xpcair2)
all.equal_pca(Xpca, Xpcair)
all.equal_pca(Xpcair, Xsvirl)
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#21>, or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAIsnqT2UYvmLBfaiMQtxsoXykW_Gzsdks5sMoTJgaJpZM4OTWUo>
.
|
Sorry for the long latency. The bug is in prcomp_irlba and its interpretation of scale. From the R documentation for
(Note the root mean square in the non-centered case.) Unfortunately prcomp_irlba (and indeed your code with Xs above) uses standard deviation whether centered or not. There is another minor complicating factor that can pop up simply due to the sign of the output components (which may vary). But the big issue is the incorrect scaling in prcomp_irlba. Fixing now... |
This should be fixed in the GitHub 2.2.2 version in the main branch. See below for an adjusted example based on your code to use the RMS scaling. Thanks for finding this subtle bug!
|
pls. re-open if you still run into trouble |
when I compare the results from the irlba package using the irlba function or prcomp_irlba and compare that to R's svd and prcomp functions, the results are usually quite good ('mean relative difference' on the order of 1e-7 or so as reported by all.equal). However, when using scale.=T or the sd's of the columns of the input matrix for scale in irlba(), the mean relative difference from the pca function is quite large, on the order of 0.1. When using center alone or with scale, the mean relative difference goes back to 1e-7 or so. Is this expected behavior?
I ran this on the latest git installed with install_github
Here is sample code, I left out the centered and scaled centered cases as those are working well
The text was updated successfully, but these errors were encountered: