Reproducibility issue with cosine metric #55

Open
juba opened this issue Mar 23, 2020 · 4 comments

juba commented Mar 23, 2020

This is a followup to issue #46.

The reproducibility issues described there were fixed for me in 0.1.8 by using approx_pow = TRUE with a euclidean or manhattan metric, but I still run into problems when using cosine.

Here's a result on my laptop (Ubuntu 18.04, R 3.6.3, uwot 0.1.8):

> set.seed(13); head(uwot::umap(iris, metric = "cosine", init="spca", a=1, b=1, approx_pow=TRUE), 5)
         [,1]      [,2]
[1,] 2.190465 -14.45460
[2,] 2.153269 -11.64510
[3,] 2.337686 -14.14382
[4,] 1.191009 -12.59075
[5,] 1.472325 -15.06042

And here's the same call on a server (CentOS 7, R 3.6.1, uwot 0.1.8):

> set.seed(13); head(uwot::umap(iris, metric = "cosine", init="spca", a=1, b=1, approx_pow=TRUE), 5)
          [,1]      [,2]
[1,] -15.45597 -4.156313
[2,] -17.59474 -4.357967
[3,] -15.25843 -4.456960
[4,] -17.01195 -2.813276
[5,] -14.92331 -3.548293

With metric = "euclidean", both machines give the same results.


tlz4320 commented Nov 27, 2022

But in the newest version, euclidean with approx_pow still has the same problem.

@jlmelville
Owner

Unfortunately, I cannot give you a satisfactory solution to these issues. As far as I can tell, we are at the mercy of whatever system libraries are part of the base OS.
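
When comparing two machines, it may help to record which numerical libraries each R session is linked against. The following is only a diagnostic sketch using base R functions (available in R 3.4 and later); the name env_info is made up for illustration, and differing output does not by itself prove that the libraries are the cause:

```r
# Collect details of the numerical libraries this R session is linked against.
# Run this on each machine and compare the output side by side.
env_info <- list(
  r_version      = R.version.string,
  platform       = R.version$platform,
  blas           = unname(extSoftVersion()["BLAS"]),  # path to the BLAS in use, if known
  lapack_library = La_library(),                      # path to the LAPACK in use, if known
  lapack_version = La_version()
)
str(env_info)
```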

@jlmelville
Owner

One thing that could cause issues is the spca initialization: here you are at the mercy of the SVD routine, which can also produce arbitrary signs, e.g. from the man page for prcomp:

The signs of the columns of the rotation matrix are arbitrary, and so may differ between different programs for PCA, and even between different builds of R.

But I assume the spectral initialization will have the same issue. I don't recommend using init="random" but if it gives consistent results across architectures then at least you know the initialization is the issue.
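
One way to take the sign ambiguity out of the picture is to compute the PCA yourself, force a sign convention, and pass the result as a matrix to init. This is a minimal sketch, not what uwot does internally: fix_pca_signs is a hypothetical helper, the convention (largest-magnitude loading positive) is one arbitrary choice among several, and the rescaling to a standard deviation of 1e-4 only approximates what init="spca" is documented to do. It removes only the sign flips; differences from nearest-neighbor search or floating-point arithmetic would remain.

```r
# Hypothetical helper (not part of uwot): make PCA column signs deterministic
# by forcing the largest-magnitude loading of each component to be positive.
fix_pca_signs <- function(pca) {
  flip <- apply(pca$rotation, 2, function(v) sign(v[which.max(abs(v))]))
  pca$rotation <- sweep(pca$rotation, 2, flip, `*`)
  pca$x <- sweep(pca$x, 2, flip, `*`)
  pca
}

# Use only the numeric columns of iris.
X <- as.matrix(iris[, 1:4])
pca <- fix_pca_signs(prcomp(X, center = TRUE, scale. = FALSE))

# Keep the first two components and rescale each to standard deviation 1e-4
# (assumption: this roughly mimics the scaling applied by init = "spca").
init <- pca$x[, 1:2]
init <- sweep(init, 2, apply(init, 2, sd), "/") * 1e-4

# Passing a matrix as `init` bypasses uwot's own initialization step.
if (requireNamespace("uwot", quietly = TRUE)) {
  set.seed(13)
  emb <- uwot::umap(X, metric = "cosine", init = init,
                    a = 1, b = 1, approx_pow = TRUE)
  print(head(emb, 5))
}
```

If the embeddings still differ between machines with a fixed init matrix, the initialization is probably not the only source of the discrepancy.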


tlz4320 commented Nov 28, 2022

Thanks. I will stick to a single OS to avoid this problem. I can now reproduce my results on Ubuntu 22 across different machines.


3 participants