Discretizer gives error with NaNs #39
Comments
This happens when a continuous variable has
This should be printed in the log (how many infs there were). The replacement has to happen before any preprocessing starts (the error is caused because pandas interval cannot handle these values), probably in this function:
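A minimal sketch of the replacement step described above, assuming the infinite values are replaced with NaN and that the count is reported through the standard `logging` module. The function name `replace_infs_with_nan` and its signature are hypothetical and not taken from the project:

```python
import logging

import numpy as np
import pandas as pd

log = logging.getLogger(__name__)


def replace_infs_with_nan(data: pd.DataFrame, columns: list) -> tuple:
    """Replace +/-inf with NaN in the given columns and log how many were found.

    Hypothetical helper: the replacement value (NaN) is an assumption.
    Returns a cleaned copy of the data and a dict of inf counts per column.
    """
    cleaned = data.copy()
    counts = {}
    for col in columns:
        # Boolean mask of +inf / -inf entries in this column
        mask = np.isinf(cleaned[col].astype(float))
        n_inf = int(mask.sum())
        counts[col] = n_inf
        if n_inf > 0:
            log.warning(
                "Column %s: replaced %d infinite value(s) with NaN", col, n_inf
            )
            cleaned.loc[mask, col] = np.nan
    return cleaned, counts
```

Called before any preprocessing starts, for example `cleaned, counts = replace_infs_with_nan(df, continuous_columns)`, this leaves the input frame untouched and makes the number of infs per column visible in the log.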
After more investigation, I was able to reproduce the bug, and the conclusion from the previous comment is wrong. The correct explanation is below.
Description
What actually happens is that, during binning, sorting the bin edges does not always produce the same result.
So the unsorted bins look like this:
And here the unsorted bins look like this:
However, after sorting, the result is different:
You can easily test this:
Interestingly, when the float is smaller (04):
Now, when we construct the interval index
but for
And this raises the error during
Extra sources:
(reported by user, to be investigated)
When fitting the preprocessor, some continuous variables raised a ValueError.
I suspect this happens if a continuous variable of type np.float64 has only NaN values (or has only NaN values in one of the splits).
The error is probably raised because pandas cannot set the interval index properly.
To be investigated (attached picture and traceback in text file)
traceback.txt
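One way to check whether pandas accepts a given set of bin edges is a small sketch like the one below. It is not the project's code: `can_build_interval_index` is a hypothetical helper, and the edge values are made up for illustration. It shows that `pd.IntervalIndex.from_breaks` raises a `ValueError` when the edges are not in increasing order, which is one way an interval index can fail to be built:

```python
import pandas as pd


def can_build_interval_index(breaks) -> bool:
    """Return True if pandas accepts `breaks` as interval edges.

    Hypothetical helper for illustration only.
    """
    try:
        pd.IntervalIndex.from_breaks(breaks, closed="right")
    except ValueError:
        return False
    return True


# Increasing edges are accepted.
sorted_ok = can_build_interval_index([0.0, 1.0, 2.0])

# Edges that are out of order are rejected, because an interval's
# left side would be greater than its right side.
unsorted_ok = can_build_interval_index([0.0, 2.0, 1.0])
```

Running the same check on the actual bin edges computed for a failing variable would show whether the edges themselves are the problem.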