Non-deterministic result of antsAffineInitializer #444
How many threads are you using? I've been having issues today with antsRegistration and the number of threads (I haven't got a minimal test case working so was holding off on reporting it).
Toby
@spinicist that's hilarious: I had already written a very long issue about the reproducibility of antsRegistration w.r.t. the number of threads and was ready to click submit. I have that test case and can post it if you want. Bit-for-bit reproducibility of parallel computation is not trivial, so I thought I would need to set some tolerance and held off posting the issue. While investigating it and checking the inputs to antsRegistration, I realized that antsAffineInitializer gives different results regardless of the number of threads, which is more surprising. That said, the antsRegistration test case I mentioned is built without antsAffineInitializer (so all inputs to antsRegistration are exactly the same).
Ha! Good to know I'm not crazy. It's @stnava or @ntustison who would have to decide if such a test case was useful to them. I think I have additional problems with masks that I'm trying to isolate. But - maybe we should keep this thread about antsAffineInitializer? It sounds like your issue there is totally different, and threads was my only guess. |
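I don't know if antsAffineInitializer uses random sampling. Have you tried antsAI? The random seed for that is hard coded. The number of threads definitely has to be constant or the results will differ.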
https://www.nist.gov/sites/default/files/documents/itl/ssd/is/NRE-2015-07-Nguyen_slides.pdf
there are some deep computational issues at hand with such issues that
combine floating point error / randomization / resource availability.
see also
https://itk.org/pipermail/insight-developers/2014-March/023731.html
we tried compensated summation and "cheap" rounding tricks ... in my own
experiments, the latter provided the easiest solution but the approach was
really hacky and probably would not truly fix the problem across platforms
so we abandoned it.
in the end, we know there are reproducibility issues. the reasons go all
the way back to the itk pipeline.
surely there would be a better solution than what we have but no
convergence yet.
brian
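For readers unfamiliar with the compensated summation mentioned above, here is a minimal sketch in Python. It is not the ITK or ANTs implementation; the function names and the input values are made up for illustration. It only shows the general idea: a running correction term recovers the low-order bits that a plain running sum discards.

```python
import math


def naive_sum(values):
    """Plain left-to-right accumulation in double precision."""
    total = 0.0
    for v in values:
        total += v
    return total


def kahan_sum(values):
    """Kahan compensated summation (illustrative, not the ITK code)."""
    total = 0.0
    compensation = 0.0  # running estimate of the rounding error lost so far
    for v in values:
        y = v - compensation            # re-inject the previously lost part
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # recover what was just lost
        total = t
    return total


# Hypothetical data: one large term followed by terms too small to
# change it when added one at a time.
values = [1.0] + [1e-16] * 10
reference = math.fsum(values)  # correctly rounded sum from the standard library

print("naive:", naive_sum(values))
print("kahan:", kahan_sum(values))
print("fsum :", reference)
```

Compensation reduces the accumulated error, but on its own it does not promise identical bits across platforms or thread counts, which is consistent with the comment above that it did not settle the problem.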
…On Thu, May 11, 2017 at 4:17 PM, Philip Cook ***@***.***> wrote:
I don't know if antsAffineInitializer uses random sampling. Have you
tried antsAI? The random seed for that is hard coded.
The number of threads definitely has to be constant or the results will
differ.
@stnava thanks, that is why I held off posting anything about the reproducibility of antsRegistration w.r.t. the number of threads. Is @cookpa |
I just checked that setting |
This PR just enables general control of the number of threads for this tool. I just learned that the tool behaves differently depending on that number (ANTsX/ANTs#444).
For posterity, the issues I was having were at least partly to do with specifying |
This makes the output deterministic, at the cost of running unnecessarily slowly. The floating-point sums used internally are numerically unstable with respect to the order of addition. See https://github.com/ANTsX/ANTs/wiki/antsRegistration-reproducibility-issues, ANTsX/ANTs#444 (comment), and ANTsX/ANTsR#210 (comment).
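To illustrate why the order of addition matters, here is a small Python sketch, assuming a simplified model in which each worker sums a contiguous chunk and the partial sums are then added together. The function `chunked_sum` and the input values are hypothetical; this is not how ITK partitions work. It only shows that regrouping the same numbers can change the floating-point result.

```python
import math


def chunked_sum(values, n_chunks):
    """Sum contiguous chunks separately, then add the partial sums.

    A toy stand-in for a reduction split across n_chunks workers.
    """
    size = -(-len(values) // n_chunks)  # ceiling division
    partials = []
    for start in range(0, len(values), size):
        partial = 0.0
        for v in values[start:start + size]:
            partial += v
        partials.append(partial)
    total = 0.0
    for p in partials:
        total += p
    return total


# Hypothetical data chosen so that rounding depends on the grouping.
values = [1e16, 1.0, -1e16, 1.0]

print("1 chunk :", chunked_sum(values, 1))
print("2 chunks:", chunked_sum(values, 2))
print("exact   :", math.fsum(values))
```

With this input, one chunk and two chunks give different totals, and neither matches the correctly rounded sum from `math.fsum`. Forcing a single fixed order removes the variation between runs, at the cost of parallel speed.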
This is also to make the output deterministic, at the cost of running slowly. It turned out that using dense sampling wasn't enough; there was still some numerical instability that came from the order of addition:
* https://github.com/ANTsX/ANTs/wiki/antsRegistration-reproducibility-issues#variance-due-to-floating-point-precision-errors
For some reason it only appeared on OS X, and only about 10% of the time, and never on Linux. I [showed](#2642 (comment)) that the instability in isct_antsSliceRegularizedRegistration did exist on Linux, so something still unknown about how we call it was hiding it there.
See:
* https://github.com/ANTsX/ANTs/wiki/antsRegistration-reproducibility-issues
* ANTsX/ANTs#444 (comment)
* ANTsX/ANTsR#210 (comment)
This is also to make the output deterministic, at the cost of running slowly. It turned out that using dense sampling wasn't enough; there was still some numerical instability that came from the order of addition:
* https://github.com/ANTsX/ANTs/wiki/antsRegistration-reproducibility-issues#variance-due-to-floating-point-precision-errors
For some reason it only appeared on OS X, and only about 10% of the time, and never on Linux. I [showed](#2642 (comment)) that the instability in isct_antsSliceRegularizedRegistration did exist on Linux, so something still unknown about how we call it was hiding it there. This was actually supposed to be in place already but the code had atrophied, so all this does is fix it up.
See:
* https://github.com/ANTsX/ANTs/wiki/antsRegistration-reproducibility-issues
* ANTsX/ANTs#444 (comment)
* ANTsX/ANTsR#210 (comment)
I've run antsAffineInitializer 10 times on the same inputs, and the 10 resulting transform.mat files are all different (judging from their md5 sums). I imagine the 10 transforms are very close to one another but not exactly the same. Does transform.mat encode some variable metadata (like date and time) that invalidates checking the md5 sums?
If checking the md5 is ok, is there a way to get deterministic results from this utility?
Thanks very much
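As a side note on checking md5 sums versus checking closeness, here is a minimal Python sketch. It assumes the transform parameters have already been read into plain lists of floats by some other means; it does not parse transform.mat. The parameter values, the tolerances, and the helper names are all made up for illustration.

```python
import hashlib
import math
import struct


def md5_of_params(params):
    """md5 of the parameters packed as little-endian doubles."""
    packed = struct.pack(f"<{len(params)}d", *params)
    return hashlib.md5(packed).hexdigest()


def params_close(a, b, rel_tol=1e-9, abs_tol=1e-12):
    """True if two parameter lists agree element-wise within a tolerance."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        for x, y in zip(a, b)
    )


# Hypothetical affine parameters from two runs: 9 matrix entries + 3 offsets.
run_a = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 2.5, -1.25, 0.75]
run_b = list(run_a)
run_b[9] = 2.5 + 1e-13  # tiny perturbation, as rounding differences might cause

print("same md5:     ", md5_of_params(run_a) == md5_of_params(run_b))
print("within tol:   ", params_close(run_a, run_b))
```

A hash comparison flags any single-bit difference, while a tolerance check reports whether two runs agree to a chosen precision. Which tolerance is appropriate depends on the application and is not something this sketch can decide.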