[WIP] update type inference for string columns #343
Merged
Overview
Closes #249
I noticed a type deduction error when a field contains NaNs, as described in the linked issue.
I further noticed that we don't check whether string columns can be cast to a float/int type. I've added an extra check to see if a string column can be cast to an int/float column and to deduce the proper data_type accordingly.
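To make the idea concrete, here is a minimal sketch of that kind of check. It is not the code in this PR: it works on a plain Python list rather than a dataframe column, and the names is_missing and infer_string_column_type are hypothetical. It assumes missing values show up as None, float NaN, or blank/"nan" strings, and that those should be ignored when deciding the type.

```python
import math


def is_missing(value):
    """Return True for None, float NaN, or blank / 'nan' strings (assumed markers)."""
    if value is None:
        return True
    if isinstance(value, float):
        return math.isnan(value)
    if isinstance(value, str):
        return value.strip().lower() in ("", "nan")
    return False


def infer_string_column_type(values):
    """Hypothetical helper: return 'integer', 'float', or 'string' for a column."""
    # Ignore missing entries so that NaNs do not force the column to 'string'.
    present = [v for v in values if not is_missing(v)]
    if not present:
        return "string"  # nothing to infer from; keep the column as string

    def all_castable(cast):
        # Cast the string form so that a stray float like 3.5 is not
        # silently truncated by int().
        for v in present:
            try:
                cast(str(v))
            except ValueError:
                return False
        return True

    # Try the narrowest type first, then widen.
    if all_castable(int):
        return "integer"
    if all_castable(float):
        return "float"
    return "string"


if __name__ == "__main__":
    print(infer_string_column_type(["150", "4177", float("nan")]))
    print(infer_string_column_type(["1.5", "2", None]))
    print(infer_string_column_type(["abc", "1"]))
```

Trying int before float keeps whole-number columns from being widened unnecessarily; a column falls back to string as soon as one present value fails both casts.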
Update (04/10/2021)
execute_binning and is_numeric_nan_column now run 2x faster.
Changes
I've added a logic to
Example Output
The result shows that the two columns mentioned in the issue, "# Instances" and "# Attributes", now have the correct data_type.