Reproducibility issues #46

Closed
juba opened this issue Jan 30, 2020 · 35 comments

Labels: bug (Something isn't working)

@juba commented Jan 30, 2020

First of all, thanks for this package. It is very convenient to have a pure R implementation of UMAP that is fast and reliable!

I am running into a small but somewhat strange reproducibility problem. To get the same UMAP results twice, I call set.seed before calling umap, and locally on my machine this works well:

 set.seed(82223)
 umap <- uwot::umap(USArrests)

The problem is that if I run the same code on another machine, for example during the tests of a package CRAN check, the test sometimes fails because the results are different.

I've tried several things: checking that the uwot versions are the same, and even checking that set.seed gives the same sequence of random numbers on every machine. Both are true, yet when I compare the umap results they differ.

I'm not sure I'm being completely clear here... but if you have any idea why this could happen, I'd be glad to hear it :-)

@juba (Author) commented Jan 30, 2020

After looking a bit more at the uwot source code, I wonder whether the problem could come from the fact that the Rcpp optimization code uses its own pcg_prng or tau_prng functions, which are not affected by set.seed. But then what I don't understand is why the results are reproducible when I run umap on the same machine...

@jlmelville (Owner)

For reproducibility you must also set n_sgd_threads = 0. This will slow the optimization down, although for most settings the nearest neighbor search tends to be the bottleneck.

This is a consequence of the asynchronous SGD method used by UMAP (which is basically the same as LargeVis).

The internal PRNGs should be (indirectly) seeded via set.seed, so that shouldn't be an issue.
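
To be concrete, same-machine reproducibility should look like this (a minimal sketch reusing the USArrests example from above):

# Same seed + single-threaded SGD: two runs on the same machine should match.
set.seed(82223)
res1 <- uwot::umap(USArrests, n_sgd_threads = 0)

set.seed(82223)
res2 <- uwot::umap(USArrests, n_sgd_threads = 0)

all.equal(res1, res2)  # expected to be TRUE on a single machine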

@LTLA (Contributor) commented Jan 30, 2020

For reproducibility you must also set n_sgd_threads = 0.

@jlmelville Isn't this the default? I'm assuming this is related to the discussion we had about avoiding race conditions waaaay back.

Anyway, @juba, you don't mention what machines you're using. For example, 32-bit Windows is notorious for giving numerically different answers due to differences in the precision of the calculations - I think it is something to do with the x87 floating-point instruction set. Because UMAP is an iterative approach, even small differences propagate over the iterations and grow. t-SNE was definitely the worst: a difference in the 16th decimal place of the initialization values would yield completely different results after 1000 iterations.

The other possibility is that we have undefined behavior or a memory leak somewhere, but I hope not.

As an aside, I do not think it would be wise to hard-code expected UMAP return values in your tests. That makes your testing framework very fragile, liable to break upon changes to things that don't really matter to your end users (who just want to see a pretty plot, really).
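
As an illustration of that point (a hedged sketch using testthat; the exact expectations are up to you), a test can assert structural properties of the embedding rather than exact coordinate values:

library(testthat)

test_that("umap returns a well-formed embedding", {
  set.seed(82223)
  emb <- uwot::umap(iris[, 1:4])
  # Check shape and sanity, not exact values that may shift across
  # platforms or package versions.
  expect_true(is.matrix(emb))
  expect_identical(dim(emb), c(nrow(iris), 2L))
  expect_true(all(is.finite(emb)))
})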

@juba (Author) commented Jan 30, 2020

Thanks for your quick answer.

When I launch umap with set.seed and n_sgd_threads = 0 on 3 different machines, I get 3 different results:

> set.seed(13); head(uwot::umap(iris, n_sgd_threads = 0), 5)
          [,1]       [,2]
[1,] -7.319500  0.5256426
[2,] -6.069137 -0.3592826
[3,] -6.287758 -1.0141467
[4,] -6.164398 -0.9203850
[5,] -7.184044  0.4606612
> set.seed(13); head(uwot::umap(iris, n_sgd_threads = 0), 5)
           [,1]      [,2]
[1,]  -8.597481 -6.791648
[2,] -10.378400 -6.350265
[3,] -10.039313 -5.793700
[4,] -10.221246 -5.900794
[5,]  -8.641421 -6.677541
> set.seed(13); head(uwot::umap(iris, n_sgd_threads = 0), 5)
          [,1]      [,2]
[1,] -9.425414 -2.557605
[2,] -8.890960 -4.117106
[3,] -8.370828 -3.932996
[4,] -8.447809 -4.151634
[5,] -9.281413 -2.635931

Am I doing something wrong?

@LTLA (Contributor) commented Jan 30, 2020

Well, what are these machines? Linux? Mac? Windows? BSD? ... Solaris?

@juba (Author) commented Jan 30, 2020

@LTLA Sorry, I didn't see your message.

I've tested it on a total of 6 machines:

  • Four Linux boxes (one Debian, one CentOS, two Ubuntu 18.04)
  • One Windows and one macOS (via GitHub Actions)

The three outputs I copy/pasted above all come from Linux boxes.

As I am using umap inside a package function, it seems quite important to me that the results are reproducible from one run to another; otherwise a user could rerun an analysis later and get different outputs...

@LTLA (Contributor) commented Jan 30, 2020

Hm. Well, that's interesting. What happens if you turn off multithreading with n_threads=1?

Another possibility is processor intrinsics, but I don't think we use them. RcppAnnoy might use them but iris should not be big enough to trigger the use of approximate NN's anyway.

Edit: I get something different as well (Mac):

> set.seed(13); head(uwot::umap(iris, n_sgd_threads = 0), 5)
          [,1]      [,2]
[1,] -9.694707 -7.596921
[2,] -9.145758 -5.884610
[3,] -8.574868 -6.230881
[4,] -8.701025 -6.052582
[5,] -9.562439 -7.518391

... and the results don't change with n_threads=1, so that's probably not the cause. Hm...

@juba (Author) commented Jan 30, 2020

Yes, I can confirm that with n_threads = 1 nothing changes: same results on the same machine, different results on different ones.

@jlmelville (Owner)

Sorry about the incorrect statement about n_sgd_threads; I failed to understand that this was about reproducibility across different machines. I have confirmed the issue by comparing output between R on Windows and on a Linux VM with the iris example above.

@juba, @LTLA, could either of you try running:

set.seed(13); head(umap(iris, a = 1, b = 1), 5)

and seeing whether that gives better results in terms of reproducibility? For the record, on both Windows and my Linux VM I get:

          [,1]       [,2]
[1,] -8.185683 -2.3828293
[2,] -6.676588 -0.6949825
[3,] -6.222400 -1.0343811
[4,] -6.009416 -0.5351661
[5,] -8.006457 -2.1892876

If that checks out then it looks like there is a numerical issue. I haven't fully nailed it down yet, so here are my preliminary findings. a and b are found by a nonlinear least-squares fit and then used in a power calculation in the gradient, so the fact that setting them both to 1 fixes things is very suspicious. The find_ab_params routine does return different results for a with the default settings, but only in the eleventh decimal place, so it's probably not the culprit. On Windows:

UMAP embedding parameters a = 1.89560586631727 b = 0.800637844175666

On Linux:

UMAP embedding parameters a = 1.89560586636672 b = 0.800637844175666

Unfortunately, manually setting those values still isn't enough to make the results agree. However, if you just use the first four decimal places and set approx_pow = TRUE, then the results do seem to agree, i.e.:

set.seed(13); head(umap(iris, approx_pow = TRUE, a = 1.8956, b = 0.8006), 5)
          [,1]      [,2]
[1,] -13.89810 -3.735377
[2,] -13.35056 -2.117875
[3,] -12.75033 -2.292292
[4,] -12.84131 -2.176536
[5,] -13.94507 -3.658790

So the error probably lies in the power calculation in the gradient. I've not taken a closer look there yet, but I hope to find enough time shortly. Perhaps the use of approx_pow and fixed a and b values as above is an OK workaround for now?

jlmelville added the bug label on Jan 31, 2020
@LTLA (Contributor) commented Jan 31, 2020

@jlmelville Both of those examples give the same results as posted above, though I am on my Linux box (Ubuntu 18.04) at home rather than my Mac at work.

Incidentally, running set.seed(13); head(uwot::umap(iris, n_sgd_threads = 0), 5) gives me the same results as @juba's last set of results (starting -9.425414), so it is not entirely machine-specific chaos. Possibly it depends on the library's implementation of std::pow on each machine...

The find_ab_params routine does return different results for a with the default settings but only in the eleventh decimal place so it's probably not the culprit.

Those errors can amplify pretty quickly, if the t-SNE experience was anything to go by.

@juba (Author) commented Feb 2, 2020

Thanks for your detailed and quick replies! Here are the results on my machines.

First example

Linux box 1:

> set.seed(13); head(uwot::umap(iris, a = 1, b = 1), 5)
          [,1]      [,2]
[1,] -12.32907 -7.980761
[2,] -14.15007 -6.869046
[3,] -14.62219 -7.096888
[4,] -14.76138 -7.277305
[5,] -12.37633 -7.737141

Linux box 2:

>  set.seed(13); head(uwot::umap(iris, a = 1, b = 1), 5)
          [,1]       [,2]
[1,] -8.185683 -2.3828293
[2,] -6.676588 -0.6949825
[3,] -6.222400 -1.0343811
[4,] -6.009416 -0.5351661
[5,] -8.006457 -2.1892876

Linux box 3:

> set.seed(13); head(uwot::umap(iris, a = 1, b = 1), 5)
          [,1]         [,2]
[1,] -8.742520 -0.644353479
[2,] -6.608887 -1.108330991
[3,] -6.484410 -0.005383829
[4,] -6.325613 -0.405331981
[5,] -8.540133 -0.360579988

Second example

Linux box 1:

> set.seed(13); head(uwot::umap(iris, approx_pow = TRUE, a = 1.8956, b = 0.8006), 5)
          [,1]        [,2]
[1,] -13.29704 -1.89562185
[2,] -13.77207 -0.16590410
[3,] -13.77290  0.14170374
[4,] -13.45787  0.09581583
[5,] -13.35800 -1.76931200

Linux box 2:

> set.seed(13); head(uwot::umap(iris, approx_pow = TRUE, a = 1.8956, b = 0.8006), 5) 
          [,1]      [,2]
[1,] -13.89810 -3.735377
[2,] -13.35056 -2.117875
[3,] -12.75033 -2.292292
[4,] -12.84131 -2.176536
[5,] -13.94507 -3.658790

Linux box 3:

> set.seed(13); head(uwot::umap(iris, approx_pow = TRUE, a = 1.8956, b = 0.8006), 5) 
          [,1]       [,2]
[1,] -4.225160 -10.458575
[2,] -2.987507  -9.383705
[3,] -3.448075  -9.188345
[4,] -3.296139  -9.051793
[5,] -4.168506 -10.302560

So this seems a bit strange to me: I get the same results as you on one of the boxes (which runs R 3.6.1, by the way), but quite different results on the other two (which run R 3.6.2).

@SamGG commented Feb 2, 2020

@jlmelville I get the same results as you on my 64-bit Windows machine (R 3.6.2, uwot 0.1.5), in case it helps.

@jlmelville (Owner)

A couple of things to try:

  • By default n_epochs = 500. Set n_epochs = 1, and see if the output is similar in the early part of the optimization.
  • The default initialization for iris will use PCA, which may only be guaranteed to give identical results up to a rotation/reflection depending on the architecture, and would give different coordinate values as output even if the inter-point distances were identical. If head(rnorm(300), 5) gives the same result across machines, then using init = matrix(rnorm(300), ncol = 2) will remove that as a source of variability.

Some results based on the above on my Windows and Linux VM with R 3.6.2:

set.seed(13); head(uwot::umap(iris, a = 1.8956, b = 0.8006, init = matrix(rnorm(300), ncol=2), n_epochs = 1), 5) 
           [,1]        [,2]
[1,]  0.6147206 -2.76264281
[2,] -0.2198782  1.79669863
[3,]  1.8355571 -1.14669579
[4,]  0.2477138 -0.23195193
[5,]  1.2029198  0.01220871
set.seed(13); head(uwot::umap(iris, a = 1.8956, b = 0.8006, approx_pow = TRUE, init = matrix(rnorm(300), ncol=2), n_epochs = 1), 5) 
           [,1]        [,2]
[1,]  0.6147206 -2.76264281
[2,] -0.2198782  1.79669863
[3,]  1.8355571 -1.14669579
[4,]  0.2477138 -0.23195193
[5,]  1.2029198  0.01220871

At least early on then, the results agree.

Also with approx_pow = TRUE, the results still agree after 500 epochs on Windows and Linux:

set.seed(13); head(uwot::umap(iris, a = 1.8956, b = 0.8006, approx_pow = TRUE, init = matrix(rnorm(300), ncol=2)), 5) 
          [,1]     [,2]
[1,]  8.840334 12.03719
[2,] 10.490212 13.65378
[3,] 10.532321 12.99068
[4,] 10.511846 13.09358
[5,]  8.911314 12.19013

But not when using approx_pow = FALSE. On Windows I get:

set.seed(13); head(uwot::umap(iris, a = 1.8956, b = 0.8006, init = matrix(rnorm(300), ncol=2)), 5) 
         [,1]     [,2]
[1,] 4.076168 8.341118
[2,] 2.102707 9.429637
[3,] 1.973279 8.806576
[4,] 2.076288 8.900060
[5,] 3.891764 8.247353

On my Linux VM, I get:

set.seed(13); head(umap(iris, a = 1.8956, b = 0.8006, init = matrix(rnorm(300), ncol = 2)), 5)
          [,1]     [,2]
[1,] 1.8333098 6.916174
[2,] 0.2878910 5.478771
[3,] 0.1653603 6.152372
[4,] 0.2099545 6.045371
[5,] 1.7626021 6.757470

I haven't been very successful at finding out how std::pow behaves on different machines.

Thanks @SamGG for the extra testing.

@juba (Author) commented Feb 2, 2020

On your first two examples (first with n_epochs = 1, then with approx_pow = TRUE), I get the same results as you on the 3 Linux boxes I have access to.

When using the last example, I still get the same results on the three machines, but not the same as yours:

> set.seed(13); head(uwot::umap(iris, a = 1.8956, b = 0.8006, init = matrix(rnorm(300), ncol = 2)), 5)
         [,1]     [,2]
[1,] 4.191379 8.028221
[2,] 3.445829 9.990005
[3,] 2.992829 9.464233
[4,] 3.100481 9.648516
[5,] 4.226606 8.193149

@SamGG commented Feb 2, 2020

@jlmelville I still get exactly the same results on Windows.

@LTLA (Contributor) commented Feb 4, 2020

I get the same results as @juba on my Ubuntu machine for all three scenarios (including the last).

So it is at least somewhat reproducible between @juba and me! If it is a std::pow difference between our machines and @jlmelville's VM, perhaps our C++ compiler and library versions might help:

# From gcc -v
gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1) 

# From  /sbin/ldconfig -p | grep stdc++; strings <path from last step> | grep LIBCXX
GLIBCXX_3.4
GLIBCXX_3.4.1
...
GLIBCXX_3.4.24
GLIBCXX_3.4.25
GLIBCXX_DEBUG_MESSAGE_LENGTH

For completeness, the glibc is:

# ldd --version
ldd (Ubuntu GLIBC 2.27-3ubuntu1) 2.27

@jlmelville (Owner)

I am using Pop OS on my VM:

gcc version 9.2.1 20191008 (Ubuntu 9.2.1-9ubuntu2) 

GLIBCXX_3.4
GLIBCXX_3.4.1
...
GLIBCXX_3.4.27
GLIBCXX_3.4.28
GLIBCXX_DEBUG_MESSAGE_LENGTH

ldd (Ubuntu GLIBC 2.30-0ubuntu2) 2.30

@LTLA: I take it you are using the 18.04 LTS? If you confirm, I will see if I can reproduce results on the same OS.

@LTLA (Contributor) commented Feb 4, 2020

Yes, that's right. Nothing special with my set-up beyond keeping updated.

@juba (Author) commented Feb 4, 2020

The three Linux boxes I have access to run:

  • Ubuntu 18.04 with gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
  • Debian stable with gcc (Debian 8.3.0-6) 8.3.0
  • CentOS 7 with gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39)

@jlmelville (Owner)

Ok, running Ubuntu 18.04 LTS on my VM, I have the same compiler setup as @LTLA:

gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04.1) 

GLIBCXX_3.4
GLIBCXX_3.4.1
...
GLIBCXX_3.4.24
GLIBCXX_3.4.25
GLIBCXX_DEBUG_MESSAGE_LENGTH

ldd (Ubuntu GLIBC 2.27-3ubuntu1) 2.27

and I also get the same results:

> set.seed(13); head(uwot::umap(iris, a = 1.8956, b = 0.8006, init = matrix(rnorm(300), ncol=2)), 5) 
         [,1]     [,2]
[1,] 4.191379 8.028221
[2,] 3.445829 9.990005
[3,] 2.992829 9.464233
[4,] 3.100481 9.648516
[5,] 4.226606 8.193149

@LTLA (Contributor) commented Feb 5, 2020

Maybe it's worth having a look at what std::pow is returning on each of the platforms.

#include <iostream>
#include <cmath>
#include <limits>

typedef std::numeric_limits< double > dbl;

int main() {
    std::cout.precision(dbl::max_digits10);
    std::cout << std::pow(1.8956, 0.8006) << std::endl;
}

Compiling and running this gives me 1.6686452174198301. Would be interesting to see what happens on Windows and on the later GCC's.

If we can confirm that the discrepancy is introduced by std::pow alone, then it may be worth just setting approx_pow=TRUE as the default, and then noting the weirdness in the docs.

@juba (Author) commented Feb 5, 2020

Running this on Ubuntu 18.04 (gcc 7.4.0) and Debian stable (gcc 8.3.0) gives me the same result as you. On the machine running CentOS 7 (gcc 4.8.5), cout.precision doesn't seem to be supported, so the result only prints as 1.66865.

@jlmelville (Owner)

@LTLA the number you get is the same as what I get on Windows and both my Pop OS VM and the Ubuntu 18.04 VMs.

I have attempted to debug this further, but it is slow going. I can see discrepancies in the 12th decimal place of coordinate differences emerge pretty early, but I don't have a root cause yet.

@juba (Author) commented Feb 5, 2020

Thanks for your work on this. I can imagine how time-consuming tracking this down must be.

@jlmelville (Owner)

Here are some examples of std::pow giving slightly different results on different OSes.

#include <Rcpp.h>

#include <cmath>
#include <iostream>
#include <limits>

// [[Rcpp::export]]
void stdpow(double a = 1.8956, double b = 0.8006) {
  std::cout.precision(std::numeric_limits<double>::max_digits10);
  std::cout << std::pow(a, b) << std::endl;
}

On Windows:

> stdpow(62.906748749432943)
27.544634264263522
> stdpow(4.0950943003678653)
3.0915657259912637

On Ubuntu 18.04 LTS:

> stdpow(62.906748749432943)
27.544634264263525
> stdpow(4.0950943003678653)
3.0915657259912637

On Pop OS:

> stdpow(62.906748749432943)
27.544634264263525
> stdpow(4.0950943003678653)
3.0915657259912641

These sorts of differences seem sufficient to accumulate over time and cause noticeable changes. I am open to any suggestions for remediation, in addition to @LTLA's suggestion to make approx_pow = TRUE the default.

@peteroupc commented Feb 9, 2020

Note that these are not operating system differences so much as differences in how the underlying C++ or C standard library (e.g., libm, fdlibm, or msvcrt*.dll) implements std::pow.

Thus, a single (and ideally accurate) implementation of pow (and of other floating-point math functions) is needed to ensure reproducible results. See also these two articles on the reproducibility of floating-point computations, one of which is Bruce Dawson's blog post on floating-point determinism.

@LTLA (Contributor) commented Feb 10, 2020

Note that these are not operating system differences so much as differences in how the underlying implementation of the C++ or C standard library (e.g., libm, fdlibm, or msvcrt*.dll) implements std::pow.

Yes, I think we're all aware of that. Hence my comments on GLIBCXX and GLIBC above.

Thus, a single (and perhaps an accurate) implementation of pow (and other floating-point math functions) is needed to ensure reproducible results.

Yes, if one such implementation exists that is well-tested and portable.

Boost is my usual go-to for things where the standard library lets me down (cough <random> cough) but the pow utilities that I've found are based around compile-time knowledge of the exponent. Maybe there's something in multiprecision - eval_pow, perhaps?

I was wondering what R itself was doing, but it seems that it just calls into math.h's pow. I would be surprised if the C standard library's implementation was consistent across platforms while the C++ implementation was not - the latter probably just piggy-backs off the former on all platforms - but who knows, it might be worth a shot.
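
If it helps the comparison, R can print its own pow result at full precision from the prompt; assuming R and the compiled code end up in the same libm, this should match the standalone C++ program above (1.6686452174198301 on the machines reported so far):

# Print R's 1.8956^0.8006 with 17 significant digits for comparison.
sprintf("%.17g", 1.8956 ^ 0.8006)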

@jlmelville (Owner)

I've done some more debugging and the problem is not restricted to power calculations, sadly. tumap, which purposely has no power calculations in its gradient, still has issues, because of differences in the input data, among other things.

Consider:

#include <Rcpp.h>
#include <iostream>
#include <limits>

// [[Rcpp::export]]
void vecprint(Rcpp::NumericVector v, std::size_t idx) {
  std::cout.precision(std::numeric_limits<double>::max_digits10);
  std::cout << v[idx] << std::endl;
}

When I run:

set.seed(13); vecprint(rnorm(140000), 71718)

I get -1.8009308382274591 on my Windows and Ubuntu VM, but -1.8009308382274598 on Pop OS. This starts the ball rolling in accumulating slight differences in gradients.

In addition, there is some "clever" filtering of edges in the input graph based on the number of epochs, which also seems to result in differences between Windows and Ubuntu in when an edge is skipped during sampling. I have yet to see what effect turning this off has (it isn't going to solve the problem anyway).
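
For anyone following along, this is roughly the kind of filtering being referred to, sketched in R after the pattern of the Python UMAP reference implementation (uwot's internals differ in the details): edges whose weight falls below max(weight) / n_epochs are dropped before optimization, so a weight that differs only in its last bits can land on either side of the cutoff on different machines.

# Rough sketch, not uwot's actual code: epoch-based edge filtering as in the
# Python UMAP reference. Edges below max(w) / n_epochs are never sampled;
# the rest are sampled once every epochs_per_sample epochs.
filter_and_schedule_edges <- function(w, n_epochs = 500) {
  cutoff <- max(w) / n_epochs
  keep <- w >= cutoff  # a last-bit difference in w or cutoff can flip this
  list(keep = keep, epochs_per_sample = max(w) / w[keep])
}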

Thank you @peteroupc for the link to the Bruce Dawson blog which helps reframe the problem as being about floating point determinism rather than accuracy.

Would doing these calculations in float rather than double (or at least some kind of truncation somewhere) help?

@LTLA (Contributor) commented Feb 10, 2020

Truncating the input sounds like a sensible option, at least to resolve the input issue. Probably no one would notice a round-trip through float for input that's meant to be random anyway.

Internally, Boost has some arbitrary-precision types that could also be used to avoid throwing out too much precision when switching from double to float. For example, from the Boost.Multiprecision documentation:

number<cpp_bin_float<53, digit_base_2> >

gives the same precision as a double, so maybe if we knocked that down to 50 and made the output of pow do a round-trip through this type, we would get reproducible precision.
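
To illustrate the input-truncation idea from the R side (a crude sketch only: signif() to ~7 significant digits is a stand-in for a real float round-trip, and it only touches the user-supplied initialization, not uwot's internal arithmetic):

set.seed(13)
init <- matrix(rnorm(nrow(iris) * 2), ncol = 2)
# Discard the low-order digits where rnorm() output was seen to differ
# across machines, roughly emulating a round-trip through float.
init <- signif(init, 7)
res <- uwot::umap(iris, init = init, n_sgd_threads = 0)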

@peteroupc

If feasible, an alternative is to use fixed-point numbers and arithmetic rather than floating-point arithmetic. One case of its use I am aware of is in the article "The Butterfly Effect: Deterministic Physics in The Incredible Machine and Contraption Maker". I say "if feasible", however, because switching this code repository to use fixed-point rather than floating-point numbers is anything but trivial, and I am not aware of any popular "well-tested and portable" fixed-point math libraries. On the other hand, fdlibm is a popular library for reproducible floating-point math operations, and is the same one used in Java's StrictMath class.

@jlmelville (Owner)

I think that by using float for the coordinates (and the edge sampler code), fp determinism can be achieved, at least with MNIST and 500 epochs, using:

uwot::umap(mnist, n_epochs = 500, init = matrix(rnorm(nrow(mnist) * 2), ncol = 2))

However, with default initialization I still see divergence. This might be because the spectral initialization is not currently deterministic. So more digging needed on that.

Assuming some sort of resolution is reached, this will for sure be a breaking change to the output of uwot.

On the one hand I could make this an option, but it would require some rewriting of the C++ code to make it more templated.

On the other hand, now that I know uwot is being used by the likes of seurat and monocle I don't want to further break reproducibility on a single machine.

On an anatomically improbable third hand, I have yet to even get to uwot 0.2, and really want to be able to break things where necessary.

On a now metaphor-shattering fourth hand, I have never actually documented anywhere a deprecation policy or followed semantic versioning or anything like that, so I should probably at least do that.

I welcome opinions on this one.

@LTLA (Contributor) commented Feb 16, 2020

Assuming some sort of resolution is reached, this will for sure be a breaking change to the output of uwot.

Not a big deal. In fact, I wouldn't even say it's a breaking change to the output. uwot::umap is still giving me a 2-column numeric matrix, so none of my downstream functions need to behave differently. The values might be different but that doesn't really violate any programmatic contract AFAIC.

Now, we might be breaking the end-user's expectation of reproducibility, but that was always much more difficult to achieve, and I'm willing to bet that a version bump in uwot is the least of the user's problems with respect to updates across the scRNA-seq analysis stack. If people want exact same results every time, any time, they had better be using pinned versions in a container.

So I reckon: make the change and bump the version. Everyone's code will still run, it's just that the plots will be a bit different, and that's okay.

Bona fides: I'm using uwot to power the UMAPs throughout https://osca.bioconductor.org (via scater wrappers), and even then, it literally does not bother me if the figures change. I'll give them a once-over to check that they still look pretty but that's about the limit of my concern.

@peteroupc commented Feb 16, 2020

On the other hand, now that I know uwot is being used by the likes of seurat and monocle I don't want to further break reproducibility on a single machine.

On an anatomically improbable third hand, I have yet to even get to uwot 0.2, and really want to be able to break things where necessary.

On a now metaphor-shattering fourth hand, I have never actually documented anywhere a deprecation policy or followed semantic versioning or anything like that, so I should probably at least do that.

I welcome opinions on this one.

Note that projects still at version 0.x, such as this one, can theoretically include "breaking changes" between 0.x versions; as the semantic versioning spec says: "Anything MAY change at any time [between 0.x versions]. The public API SHOULD NOT be considered stable." Thus, changing the version from 0.1 to 0.2 should be unproblematic even if they differ by only one change to the code. But see the answer to "How do I know when to release 1.0.0?" in that spec's FAQ.

@jlmelville (Owner)

uwot 0.1.8 has introduced the changes in this discussion. Closing for now.

@juba (Author) commented Mar 19, 2020

Hi,

Just to let you know: since upgrading to 0.1.8, simply adding approx_pow = TRUE to my umap calls has solved my reproducibility issues.

Many thanks for your hard work on this; it is really appreciated.
