Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fischer's Exact test does extraneous calculations when printing #224

Closed
pdeffebach opened this issue Feb 4, 2021 · 6 comments
Closed

Fischer's Exact test does extraneous calculations when printing #224

pdeffebach opened this issue Feb 4, 2021 · 6 comments

Comments

@pdeffebach
Copy link
Contributor

See this discussion on Reddit.

using HypothesisTests
HypothesisTests.FisherExactTest(777,888,999,1000);

With the ; the calculation is very fast.

But when we omit the ; it takes a very long time to print the result, because it's calculating other things, like computing the p-value, on the fly.

@pdeffebach
Copy link
Contributor Author

pdeffebach commented Feb 6, 2021

The cause is the calculation of the confidence intervals.

In particular, the calculation of a unit root.

MWE

julia> x = FisherExactTest(777,888,999,1000);

julia> level = .95; tail = :right;

julia> dist(ω) = HypothesisTests.FisherNoncentralHypergeometric(x.a+x.b, x.c+x.d, x.a+x.c, ω);

julia> obj(ω) = pvalue(dist(ω), x.a, tail=tail) - (1-level);

julia> using BenchmarkTools;

julia> HypothesisTests.fzero(obj, HypothesisTests.find_brackets(obj)...)
0.7835029530244386

julia> @time HypothesisTests.fzero(obj, HypothesisTests.find_brackets(obj)...)
 19.567936 seconds (51.44 k allocations: 640.463 MiB, 0.15% gc time)
0.7835029530244386

The HypothesisTests.find_brackets(obj) line returns values for which obj(x1) < 0 and obj(x2) > , so fzero knows which interval to search within.

Frustratingly, our implementation of confint is almost exactly the same as R's here. They also find a root with anonymous functions.

However the R code generates an extra branch based on the value of the p value calculated from the hypergeometric function.

This is probably above my paygrade so I am cc-ing @andreasnoack as someone who might have deeper insight into what is going on.

R code to verify is

x = fisher.test(matrix (c(777,888,999,1000),ncol=2,byrow=T))    

@andreasnoack
Copy link
Member

I think there are two issues here. One is that the show method is doing heavy computations. It's a design issue here in HypothesisTests and we should try to get rid of that behavior. Anything expensive should be computed before it's shown and if it's too expensive to always compute it then it shouldn't always be printed.

The other issue is the cdf method for FisherNoncentralHypergeometric since it evaluates the pdf of all outcomes at and below the argument. We should probably consider modifying it to use the tricks mentioned in

Liao, J. G; Rosen, Ori (2001). Fast and Stable Algorithms for Computing and Sampling From the Noncentral Hypergeometric Distribution. The American Statistician, 55(4), 366–369. doi:10.1198/000313001753272547

to reduce the costs.

@pdeffebach
Copy link
Contributor Author

I don't think the first issue is what's going on. R prints the same information that is costly to compute in Julia, the confidence interval.

So if we get our calculation of the confidence interval to be as fast as R's, the show method would not be an issue.

@andreasnoack
Copy link
Member

I decided to implement the method of Liao and Rosen. With JuliaStats/Distributions.jl#1277 I'm getting

julia> @time show(x);
Fisher's exact test
-------------------
Population details:
    parameter of interest:   Odds ratio
    value under h_0:         1.0
    point estimate:          0.875908
    95% confidence interval: (0.7673, 0.9999)

Test summary:
    outcome with 95% confidence: reject h_0
    two-sided p-value:           0.0497

Details:
    contingency table:
        777   888
        999  1000
  0.002868 seconds (627 allocations: 20.422 KiB)

instead of 23 seconds on my machine. @pdeffebach it would be great if you could review the PR.

@pdeffebach
Copy link
Contributor Author

I can confirm that this works on Distributions.jl master.

Does this deserve a patch release?

@andreasnoack
Copy link
Member

I've made a patch release of Distributions, see JuliaRegistries/General#30355, so the fix will be available shortly. Nothing should be needed here so I'll close.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants