The quantile function can return incorrect results for integer arrays (Int8, Int16, Int32) #119

Closed
yurivish opened this issue Jun 27, 2022 · 1 comment

yurivish commented Jun 27, 2022

For example, using quantile to compute the maximum of the array [-128, 127] incorrectly returns -129 if the array consists of Int8 values:

julia> using Statistics

julia> quantile([-128, 127], 1)     # right
127

julia> quantile(Int8[-128, 127], 1) # wrong
-129

This happens because of the following code in _quantile:

return a + γ*(b-a)

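For the above array, with γ = 1, that line evaluates the following expression. The subtraction b - a is performed in Int8 arithmetic, where 127 - (-128) = 255 wraps around to -1, causing the incorrect result: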

julia> Int8(-128) + Int64(Int8(127) - Int8(-128))
-129
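
To make the wraparound explicit, the intermediate values can be checked one at a time. This is only an illustrative sketch; the variable names `a`, `b`, `d`, and `wide` are chosen here for the example and are not taken from the `_quantile` source.

```julia
a, b = Int8(-128), Int8(127)

# Int8 - Int8 stays in Int8, so the true difference (255) wraps around.
d = b - a
@assert d === Int8(-1)

# Adding the wrapped difference back reproduces the wrong maximum.
@assert a + Int64(d) == -129

# Converting to Int before subtracting keeps the true difference.
wide = Int(b) - Int(a)
@assert wide == 255
@assert a + wide == 127
```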

This behavior occurs with arrays of Int8, Int16, and Int32:

julia> function test(type)
           A = [typemin(type), typemax(type)]
           result = quantile(A, 1)
           correct_result = quantile(map(Int, A), 1)
           (; result, correct_result)
       end;

julia> test(Int8)
(result = -129, correct_result = 127)

julia> test(Int16)
(result = -32769, correct_result = 32767)

julia> test(Int32)
(result = -2147483649, correct_result = 2147483647)

Using quantile to compute the median also returns an incorrect answer (the correct value is -0.5):

julia> quantile(Int8[-128, 127], .5)
-128.5
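
One possible way to avoid the overflow is to widen both endpoints before subtracting. The sketch below uses a hypothetical helper, `interp_widened`, written only for this illustration; it is not the code in Statistics.jl and is not claimed to be how the issue was eventually fixed. It relies on `Base.widen`, which converts a value to the next larger type (for example Int8 to Int16).

```julia
# Hypothetical helper: same formula as `a + γ*(b-a)`, but the difference
# is computed in a wider type so it cannot wrap around.
interp_widened(a, b, γ) = a + γ * (widen(b) - widen(a))

for T in (Int8, Int16, Int32)
    lo, hi = typemin(T), typemax(T)
    @assert interp_widened(lo, hi, 1) == hi      # maximum
    @assert interp_widened(lo, hi, 0.5) == -0.5  # median
end
```

Widening fits every difference for Int8, Int16, and Int32 inputs; whether it is the right trade-off for other element types is a separate question that this sketch does not address.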

The equivalent function in NumPy, np.quantile, returns the correct result in all cases:

julia> using PyCall

julia> np = pyimport("numpy");

julia> a = np.array(Int8[-128, 127], dtype=np.int8);

julia> np.quantile(a, 1)
127

julia> function test_python()
           quantiles = (1, 0.5)
           types = (
               Int8 => np.int8,
               Int16 => np.int16,
               Int32 => np.int32
           )
           for q in quantiles, (type, dtype) in types
               A = np.array([typemin(type), typemax(type)]; dtype)
               result = np.quantile(A, q)
               correct_result = quantile(map(Int, A), q)
               println((; q, type, result, correct_result))
           end
       end;

julia> test_python()
(q = 1, type = Int8, result = 127, correct_result = 127)
(q = 1, type = Int16, result = 32767, correct_result = 32767)
(q = 1, type = Int32, result = 2147483647, correct_result = 2147483647)
(q = 0.5, type = Int8, result = -0.5, correct_result = -0.5)
(q = 0.5, type = Int16, result = -0.5, correct_result = -0.5)
(q = 0.5, type = Int32, result = -0.5, correct_result = -0.5)
@yurivish yurivish changed the title The quantile function can return incorrect results for some integer arrays (Int8, Int16, Int32) The quantile function can return incorrect results for integer arrays (Int8, Int16, Int32) Jun 27, 2022
@nalimilan
Member

Fixed by #145.
