-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-2432: [Python] Fix Pandas decimal type conversion with None values #1878
Changes from all commits
916aa8d
89e53fc
dabdcda
ca16983
7c3977d
e60ec09
00a1b4d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -743,7 +743,9 @@ Status NumPyConverter::ConvertDecimals() { | |
|
||
if (type_ == NULLPTR) { | ||
for (PyObject* object : objects) { | ||
RETURN_NOT_OK(max_decimal_metadata.Update(object)); | ||
if (!internal::PandasObjectIsNull(object)) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we care about accepting other NULL-like objects such as There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure, is it possible to get NaNs from operations on Decimals? Or is that something the user might mix in somehow? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Python decimal objects can be nan, unfortunately: >>> import decimal
>>> decimal.Decimal('nan')
Decimal('NaN') There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Seems like it could be NaN also:
|
||
RETURN_NOT_OK(max_decimal_metadata.Update(object)); | ||
} | ||
} | ||
|
||
type_ = | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. By the way, what happens here if all items are None? Do we have a test for that? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'll add that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done |
||
|
@@ -758,22 +760,19 @@ Status NumPyConverter::ConvertDecimals() { | |
for (PyObject* object : objects) { | ||
const int is_decimal = PyObject_IsInstance(object, decimal_type_.obj()); | ||
|
||
if (ARROW_PREDICT_FALSE(is_decimal == 0)) { | ||
if (is_decimal == 1) { | ||
Decimal128 value; | ||
RETURN_NOT_OK(internal::DecimalFromPythonDecimal(object, decimal_type, &value)); | ||
RETURN_NOT_OK(builder.Append(value)); | ||
} else if (is_decimal == 0 && internal::PandasObjectIsNull(object)) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same question as above: do we care about other NULL-like values than simply None? |
||
RETURN_NOT_OK(builder.AppendNull()); | ||
} else { | ||
// PyObject_IsInstance could error and set an exception | ||
RETURN_IF_PYERROR(); | ||
std::stringstream ss; | ||
ss << "Error converting from Python objects to Decimal: "; | ||
RETURN_NOT_OK(InvalidConversion(object, "decimal.Decimal", &ss)); | ||
return Status::Invalid(ss.str()); | ||
} else if (ARROW_PREDICT_FALSE(is_decimal == -1)) { | ||
DCHECK_NE(PyErr_Occurred(), nullptr); | ||
RETURN_IF_PYERROR(); | ||
} | ||
|
||
if (internal::PandasObjectIsNull(object)) { | ||
RETURN_NOT_OK(builder.AppendNull()); | ||
} else { | ||
Decimal128 value; | ||
RETURN_NOT_OK(internal::DecimalFromPythonDecimal(object, decimal_type, &value)); | ||
RETURN_NOT_OK(builder.Append(value)); | ||
} | ||
} | ||
return PushBuilderResult(&builder); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's ok to do this in optimized build.
DecimalMetadata
expects you to pass a decimal object. @cpcloud may confirm.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't necessary because I added a check before calling
Update
but it does prevent a seg fault if for some reason it's called with non Decimal objects - which is not nice to get. If it's hurts an optimization though, I can remove it.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now we are doing the check twice in optimized builds, which is not nice IMHO.
DecimalMetadata::Update
is a private API so it's up to the caller to provide appropriate input.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you mean remove
PyDecimal_Check
all together? This is only called when the type is not specified by the user and then yes, it will end doing 2 passes over the objects and checks both times if they are decimal. It might be possible to do less checks on the second pass if we keep a list of which ones are decimal objects, but I'm not sure that would be worth it.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough, we can optimize later if we find it too slow. The conversion itself is very slow anyway :-)