Non-deterministic result of antsAffineInitializer #444

Closed
oesteban opened this issue May 11, 2017 · 8 comments

Comments

@oesteban

I ran antsAffineInitializer 10 times on the same inputs, and the 10 resulting transform.mat files are different (judging from their md5 sums). I imagine the 10 transforms are very close to one another, just not bit-identical. Does transform.mat encode some variable metadata (like date and time) that would invalidate a check based on md5 sums?

If checking the md5 is ok, is there a way to get deterministic results from this utility?

Thanks very much
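
The two kinds of check discussed in this question (exact md5 equality versus closeness within a tolerance) can be sketched as follows. This is only an illustrative sketch: `file_md5` and `params_close` are hypothetical helper names, the tolerances are arbitrary, and the transform parameters are assumed to have already been read into flat lists of floats (reading transform.mat itself is not shown).

```python
import hashlib
import math


def file_md5(path, block_size=1 << 20):
    """Return the hex md5 digest of a file, read in blocks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def params_close(a, b, rel_tol=1e-6, abs_tol=1e-9):
    """True if two parameter lists have equal length and agree within tolerance.

    The tolerances are placeholder values, not recommendations from ANTs.
    """
    if len(a) != len(b):
        return False
    return all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        for x, y in zip(a, b)
    )
```

A bit-exact check would compare `file_md5(run_a) == file_md5(run_b)`; a tolerant check would compare the parameter lists with `params_close`.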

@spinicist
Contributor

spinicist commented May 11, 2017 via email

@oesteban
Author

@spinicist that's hilarious: I had a very long issue already written, ready to click on the submit button about the reproducibility of antsRegistration w.r.t. the number of threads. I have that test case, I can post it if you want. Bit-to-bit reproducibility of parallel computing is not trivial, so I thought I would need to set some tolerance and held off posting the issue.

While investigating that issue, when I was checking the inputs to antsRegistration, I realized that antsAffineInitializer gives you different results regardless of the number of threads. And that is more surprising.

That said, the test case for antsRegistration that I mentioned before is built without antsAffineInitializer (so all inputs to antsRegistration are exactly the same)

@spinicist
Contributor

Ha! Good to know I'm not crazy. It's @stnava or @ntustison who would have to decide if such a test case was useful to them. I think I have additional problems with masks that I'm trying to isolate.

But - maybe we should keep this thread about antsAffineInitializer? It sounds like your issue there is totally different, and threads was my only guess.

@cookpa
Member

cookpa commented May 11, 2017

I don't know if antsAffineInitializer uses random sampling. Have you tried antsAI? The random seed for that is hard coded.

The number of threads definitely has to be constant or the results will differ.

@stnava
Member

stnava commented May 11, 2017 via email

@oesteban
Author

@stnava thanks, that is why I held off posting anything about the reproducibility of antsRegistration w.r.t. the number of threads. Is antsAffineInitializer affected by the very same issue? I'm under the impression that here we are facing a different problem.

@cookpa antsAI is not in the previous 2.1.0 release, and since there are no binaries (yet) for the latest release, I will have to wait before trying it. I don't see any random sampling in the antsAffineInitializer code.

@oesteban
Author

I just checked that setting ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS=1 yields a deterministic result for antsAffineInitializer. So this is the same reproducibility issue that affects antsRegistration and, more generally, any parallelized floating-point calculation. Thank you all for your answers.
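
For anyone scripting this, a minimal sketch of launching a tool with that variable set might look like the following. `run_single_threaded` is a hypothetical helper name, and the actual antsAffineInitializer arguments are deliberately not shown; pass whatever command line you already use.

```python
import os
import subprocess


def run_single_threaded(cmd):
    """Run a command with ITK limited to one thread via the environment.

    `cmd` is a list such as ["antsAffineInitializer", ...] with your own
    arguments; the variable name comes from the comment above.
    """
    env = dict(os.environ)  # copy, so the parent environment is untouched
    env["ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS"] = "1"
    return subprocess.run(
        cmd, env=env, check=True, capture_output=True, text=True
    )
```

Only the child process sees the setting, so other tools started from the same script keep their usual thread count.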

oesteban added a commit to oesteban/nipype that referenced this issue May 11, 2017
This PR just enables the general control over the number of threads for
this tool. I just learned that its results differ depending on that number.
ANTsX/ANTs#444
@spinicist
Contributor

For posterity, the issues I was having were at least partly to do with specifying --float in my antsRegistration call. Using double precision appears to make my metric values stable across runs with a limited number of threads (4). This is completely unsurprising! I should never have got into the habit of specifying --float in the first place.
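
To illustrate why single precision is more sensitive to summation order than double precision, here is a small sketch that emulates float32 accumulation in pure Python. It is not how ANTs computes its metrics; `to_float32` and `accumulate` are hypothetical names and the input values are chosen only to make the effect visible.

```python
import struct


def to_float32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(values, single_precision):
    """Sum values left to right, optionally rounding to float32 at each step."""
    total = 0.0
    for v in values:
        total += v
        if single_precision:
            total = to_float32(total)
    return total


values = [16777216.0, 1.0, 1.0]  # 16777216 is 2**24
reordered = [1.0, 1.0, 16777216.0]

# In emulated float32 the two orders disagree; in double they agree.
print(accumulate(values, True) == accumulate(reordered, True))    # False
print(accumulate(values, False) == accumulate(reordered, False))  # True
```

In single precision each `+ 1.0` applied to 2**24 is lost to rounding, whereas adding the two small values first preserves them.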

kousu added a commit to spinalcordtoolbox/spinalcordtoolbox that referenced this issue May 4, 2020
This makes the output deterministic, at the cost of running unnecessarily slowly.

The floating-point sums used internally are numerically unstable: the result depends on the order of addition.

See

https://github.com/ANTsX/ANTs/wiki/antsRegistration-reproducibility-issues
ANTsX/ANTs#444 (comment)
ANTsX/ANTsR#210 (comment)
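
As an illustration of the order-of-addition effect mentioned in this commit message, the sketch below sums the same numbers with different chunk sizes, loosely mimicking how work might be split across threads and the partial sums combined afterwards. It is an assumption-laden toy, not the ANTs or ITK implementation; `naive_sum` and `sum_in_chunks` are hypothetical names.

```python
def naive_sum(values):
    """Plain left-to-right accumulation with no compensation."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_in_chunks(values, chunk_size):
    """Sum contiguous chunks separately, then add up the partial sums."""
    partials = [
        naive_sum(values[i:i + chunk_size])
        for i in range(0, len(values), chunk_size)
    ]
    return naive_sum(partials)


values = [1e16, 1.0, 1.0, 1.0]

one_chunk = sum_in_chunks(values, 4)   # behaves like a single thread
two_chunks = sum_in_chunks(values, 2)  # behaves like two workers

print(one_chunk != two_chunks)  # True
```

The inputs are identical in both calls; only the grouping of the additions changes, which is enough to change the last bits of the result.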
kousu added a commit to spinalcordtoolbox/spinalcordtoolbox that referenced this issue May 4, 2020
This is also to make the output deterministic, at the cost of running slow.

It turned out that using dense sampling wasn't enough; there was still some
numerical instability that came from the order of addition:

* https://github.com/ANTsX/ANTs/wiki/antsRegistration-reproducibility-issues#variance-due-to-floating-point-precision-errors

For some reason it only appeared on OS X, and only about 10% of the time, and never on Linux.
I [showed](#2642 (comment))
that the instability in isct_antsSliceRegularizedRegistration did exist on Linux, so something
still unknown about how we call it was hiding it there.

See:

* https://github.com/ANTsX/ANTs/wiki/antsRegistration-reproducibility-issues
* ANTsX/ANTs#444 (comment)
* ANTsX/ANTsR#210 (comment)
kousu added a commit to spinalcordtoolbox/spinalcordtoolbox that referenced this issue May 4, 2020
This is also to make the output deterministic, at the cost of running slow.

It turned out that using dense sampling wasn't enough; there was still some
numerical instability that came from the order of addition:

* https://github.com/ANTsX/ANTs/wiki/antsRegistration-reproducibility-issues#variance-due-to-floating-point-precision-errors

For some reason it only appeared on OS X, and only about 10% of the time, and never on Linux.
I [showed](#2642 (comment))
that the instability in isct_antsSliceRegularizedRegistration did exist on Linux, so something
still unknown about how we call it was hiding it there.

This was actually supposed to be in place already but the code had atrophied,
so all this does is fix it up.

See:

* https://github.com/ANTsX/ANTs/wiki/antsRegistration-reproducibility-issues
* ANTsX/ANTs#444 (comment)
* ANTsX/ANTsR#210 (comment)
kousu added a commit to spinalcordtoolbox/spinalcordtoolbox that referenced this issue May 9, 2020
kousu added a commit to spinalcordtoolbox/spinalcordtoolbox that referenced this issue May 11, 2020
kousu added a commit to spinalcordtoolbox/spinalcordtoolbox that referenced this issue May 17, 2020
jcohenadad pushed a commit to spinalcordtoolbox/spinalcordtoolbox that referenced this issue May 22, 2020