qcut bin formatting bugs #1979
Comments
Related: see also #1978.
Could you give some sufficiently anonymized data that illustrates these issues?
Sure, please allow me some time to post it.
array([[0.6, 0.991], (0.991, 0.992], (0.992, 0.993], (0.993, 0.993],
Note (0.993, 0.993] and (0.998, 0.1].
1.001
Cool, can you give me the qcut function calls and arguments that fail?
For case (3), using the above data as a column tmpcol:
array([[0.6, 0.991], (0.991, 0.992], (0.992, 0.993], (0.993, 0.993],
Note (0.993, 0.993] and (0.998, 0.1]; these seem to be two different kinds of error.
Case (2) can be replicated by executing qcut on any column containing 2 distinct values.
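To make case (2) concrete, here is a minimal sketch of why a column with only two distinct values can yield a bin like (2005, 2005]. It assumes quantile edges are computed with linear interpolation; `quantile_edges` is a hypothetical helper written for illustration, not the code qcut actually runs, and the column values are made up.

```python
# Hypothetical helper for illustration; this is NOT pandas' qcut implementation.
def quantile_edges(values, n_bins):
    """Return n_bins + 1 quantile edges using linear interpolation (an assumption)."""
    data = sorted(values)
    last = len(data) - 1
    edges = []
    for i in range(n_bins + 1):
        pos = i / n_bins * last          # fractional position of the i-th quantile
        lo = int(pos)                    # index just below that position
        hi = min(lo + 1, last)           # index just above, clamped to the end
        edges.append(data[lo] + (data[hi] - data[lo]) * (pos - lo))
    return edges


# Made-up column with only two distinct values, mirroring case (2).
column = [1, 2005, 2005, 2005]
edges = quantile_edges(column, 2)

# When the median equals the maximum, two edges coincide and one bin is empty.
has_duplicate_edges = len(set(edges)) < len(edges)
print(edges, has_duplicate_edges)
```

Under these assumptions the middle and upper edges are both 2005, which would match the reported [1, 2005], (2005, 2005] pair.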
I'm reading CSV tables and then using qcut to bin the continuous values. Given the number of bins, qcut works fine most of the time. However, sometimes it doesn't work correctly. The following are some odd cases I got in the resulting levels index. Sorry, I can't share the experimental data now because I'm running this on a very large data collection and I'm currently not allowed to publish the data.
(1) When dealing with negative numbers, it sometimes gives [-117.-1, 1]. This happens at the first bin index in my test case.
(2) When the number of bins is two, it gives results like [1, 2005], (2005, 2005].
(3) A bin result like this (please note the values 0.993 and 0.1):
array([[0.6, 0.991], (0.991, 0.992], (0.992, 0.993], (0.993, 0.993],
(0.993, 0.994], (0.994, 0.995], (0.995, 0.996], (0.996, 0.997],
(0.997, 0.998], (0.998, 0.1], (0.1, 1.2], (1.2, 1.5], (1.5, 2.1],
(2.1, 3.5], (3.5, 5.2], (5.2, 7.1], (7.1, 8.5], (8.5, 11],
(11, 13.7], (13.7, 65.8]], dtype=object)
(2) and (3) might have the same cause.
I'm using pandas 0.8.1. Thanks.
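One possible explanation for the (0.993, 0.993] label in case (3) is that distinct bin edges are rounded to the same number of decimals when the labels are built. The sketch below illustrates that idea only; `format_bins` and `min_distinct_precision` are hypothetical helpers, the edge values are made up, and none of this is taken from the pandas source.

```python
# Hypothetical helpers for illustration; NOT the label formatting used by pandas.
def format_bins(edges, precision=3):
    """Render consecutive edges as (left, right] labels with fixed decimals."""
    text = [f"{e:.{precision}f}" for e in edges]
    return [f"({a}, {b}]" for a, b in zip(text, text[1:])]


def min_distinct_precision(edges, start=3, limit=17):
    """Smallest number of decimals at which all formatted edges stay distinct."""
    for precision in range(start, limit + 1):
        if len({f"{e:.{precision}f}" for e in edges}) == len(edges):
            return precision
    return limit


# Made-up edges that differ only in the fourth decimal place.
edges = [0.9921, 0.9926, 0.9934]
labels = format_bins(edges)              # two edges collapse at 3 decimals
needed = min_distinct_precision(edges)   # decimals required to tell them apart
print(labels, needed)
```

With three decimals the last two edges both print as 0.993, so the second label looks like an empty interval even though the underlying edges differ. This does not account for the 0.1 edge, which appears to be a separate formatting problem.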