CI for FisherExactTest is extremely slow for large numbers #148

Open
ilia-kats opened this issue Feb 5, 2019 · 5 comments

Comments

@ilia-kats

When trying to calculate a confidence interval for FisherExactTest with large numbers, e.g. confint(FisherExactTest(2216,9338, 3172,1335)), Julia gets stuck at 100% CPU for 10-15 minutes. For comparison, in R, fisher.test(matrix(c(2216, 9338, 3712, 1335), ncol=2, byrow=TRUE)) returns instantly and shows a confidence interval. confint(FisherExactTest(226,938, 312,135)) is also slow and crashes with:

ERROR: ArgumentError: The interval [a,b] is not a bracketing interval.
You need f(a) and f(b) to have different signs (f(a) * f(b) < 0).
Consider a different bracket or try fzero(f, c) with an initial guess c.

(perhaps related to #122 ?)
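
The error message above comes from a bracketing root finder, which requires the function to change sign between the two endpoints. As a minimal sketch of that requirement, here is plain bisection in Julia. The helper `bisect_sketch` is hypothetical and written only for illustration; it is not the root finder that the package actually calls.

```julia
# Hypothetical helper for illustration only.
function bisect_sketch(f, a::Real, b::Real; tol = 1e-10, maxiter = 200)
    fa, fb = f(a), f(b)
    # A bracketing method needs f(a) and f(b) to have opposite signs.
    fa * fb < 0 || throw(ArgumentError("[a, b] is not a bracketing interval"))
    for _ in 1:maxiter
        m = (a + b) / 2
        fm = f(m)
        # Stop on an exact root or when the interval is small enough.
        if fm == 0 || (b - a) / 2 < tol
            return m
        end
        # Keep the half-interval that still contains the sign change.
        if fa * fm < 0
            b, fb = m, fm
        else
            a, fa = m, fm
        end
    end
    return (a + b) / 2
end

root = bisect_sketch(x -> x^2 - 2, 0.0, 2.0)   # sign change on [0, 2]
# bisect_sketch(x -> x^2 + 1, 0.0, 2.0)        # would throw: no sign change
```

If the function has the same sign at both endpoints, no amount of iteration helps, which is why the message suggests a different bracket or an initial guess instead.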

@josephmarturano

I have also experienced this - FisherExactTest appears unusable when the sum of the arguments exceeds ~5000. The function hangs even without calling confint. I also occasionally see the error "ArgumentError: The interval [a,b] is not a bracketing interval".

I noticed that when I run your example with the time macro:

using HypothesisTests; @time begin; FisherExactTest(2216,9338, 3172,1335); end

I get the following result:
2.333884 seconds (4.72 M allocations: 236.939 MiB, 8.12% gc time)

So the time macro prints the elapsed time after two seconds but there is no result from the FisherExactTest function. I am not familiar enough with the source code to understand the cause.

@nalimilan
Member

> So the time macro prints the elapsed time after two seconds but there is no result from the FisherExactTest function. I am not familiar enough with the source code to understand the cause.

That's probably because the p-value and confidence interval are only computed when the object is printed.
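
One way to check this is to time each step separately instead of timing only the constructor. The sketch below assumes HypothesisTests is installed and uses its exported `FisherExactTest`, `pvalue`, and `confint`; the helper name `time_fisher_steps` is hypothetical and not part of the package.

```julia
using HypothesisTests

# Hypothetical helper: times construction, p-value, and confidence
# interval separately, so printing the object is never involved.
function time_fisher_steps(a, b, c, d)
    t_build = @elapsed test = FisherExactTest(a, b, c, d)
    t_pvalue = @elapsed p = pvalue(test)
    t_confint = @elapsed ci = confint(test)
    return (build = t_build, pvalue = t_pvalue, confint = t_confint,
            p = p, ci = ci)
end

# Small table first, so compilation time is paid before the real measurement.
warmup = time_fisher_steps(1, 1, 1, 1)
```

Calling `time_fisher_steps(2216, 9338, 3172, 1335)` afterwards should show which of the three steps dominates. On affected versions that call may itself take minutes or fail with the bracketing error reported above.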

@MitsuhaMiyamizu

> I have also experienced this - FisherExactTest appears unusable when the sum of the arguments exceeds ~5000. The function hangs even without calling confint. I also occasionally see the error "ArgumentError: The interval [a,b] is not a bracketing interval".
>
> I noticed that when I run your example with the time macro:
>
> using HypothesisTests; @time begin; FisherExactTest(2216,9338, 3172,1335); end
>
> I get the following result: 2.333884 seconds (4.72 M allocations: 236.939 MiB, 8.12% gc time)
>
> So the time macro prints the elapsed time after two seconds but there is no result from the FisherExactTest function. I am not familiar enough with the source code to understand the cause.

Same here, cannot get correct P values for Fisher's Exact Test when args are too large.
FYI:

Fisher's exact test
-------------------
Population details:
    parameter of interest:   Odds ratio
    value under h_0:         1.0
    point estimate:          0.742925
Error showing value of type FisherExactTest:
ERROR: ArgumentError: The interval [a,b] is not a bracketing interval.
You need f(a) and f(b) to have different signs (f(a) * f(b) < 0).
Consider a different bracket or try fzero(f, c) with an initial guess c.

While I can get the correct answer using a C++ version of Fisher's Exact Test:

{0.0008557, 1.825331208626037*10^(-31)}

where the left is the absolute time it consumes, and the right is the P value.

I guess the reason is that the Julia implementation differs from the one used in bedtools?

I dug deeper into the Julia implementation and found that the root finding might be the cause?
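
For reference, a two-sided p-value can be computed directly from the hypergeometric distribution without any root finding. The following is a minimal sketch in plain Julia, not the HypothesisTests or bedtools implementation. The function name `fisher_pvalue_sketch` and the tie tolerance are assumptions made for illustration. It sums the probabilities of all tables that are no more likely than the observed one, which may differ from the package's default two-sided method.

```julia
# Hypothetical sketch: two-sided Fisher exact p-value for the table [a b; c d].
function fisher_pvalue_sketch(a::Integer, b::Integer, c::Integer, d::Integer)
    n = a + b + c + d
    # Precompute log-factorials: logfact[k + 1] == log(k!)
    logfact = zeros(Float64, n + 1)
    for k in 1:n
        logfact[k + 1] = logfact[k] + log(k)
    end
    logbinom(m, k) = logfact[m + 1] - logfact[k + 1] - logfact[m - k + 1]

    r1, r2, c1 = a + b, c + d, a + c   # fixed row and column margins
    # Log-probability that the top-left cell equals x, given the margins.
    logp(x) = logbinom(r1, x) + logbinom(r2, c1 - x) - logbinom(n, c1)

    lo, hi = max(0, c1 - r2), min(r1, c1)   # feasible range of the cell
    lobs = logp(a)
    tol = 1e-7   # assumed tolerance so that ties count as "equally likely"
    total = 0.0
    for x in lo:hi
        lx = logp(x)
        if lx <= lobs + tol
            total += exp(lx)
        end
    end
    return min(total, 1.0)
end

p_small = fisher_pvalue_sketch(3, 1, 1, 3)
p_large = fisher_pvalue_sketch(2216, 9338, 3172, 1335)
```

The loop visits each feasible table once, so the cost grows linearly with the margins. For very extreme tables the sum can underflow to zero in double precision.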

@devmotion
Member

> (perhaps related to #122 ?)

Yes, the bracketing error seems to be a duplicate of #122.

@devmotion
Member

BTW the general performance issues should have been fixed by JuliaStats/Distributions.jl#1277.
