combine_first not retaining dtypes #7509
Comments
can you ref the issues? |
none of those address empty types (and thus there are probably no tests). care to do a pull-request to put in tests/fix? |
I haven't looked under the hood; I've no clue how to go about fixing this. Also, there seem to be two separate problems here (perhaps I should have opened two issues):
|
the first is not feasible to fix, since The 2nd could be fixed (as why don't you write up some tests then... gets you started :) |
Oh, of course. The
In [3]: dfa = pd.DataFrame([[datetime.now(), 2]], columns=['a','b'])
In [4]: dfb = pd.DataFrame([[4]], columns=['b'])
In [5]: dfa.combine_first(dfb).dtypes
Out[5]:
a    datetime64[ns]
b             int64
dtype: object
I should have thought of that. |
When you call combine_first, the original float32 column is converted to float64:
"""Example of Pandas bug."""
from pandas import DataFrame
from numpy import float32
print('-' * 15)
d1 = DataFrame(index=[0])
d1['A'] = [3.5]
d1['A'] = d1['A'].astype(float32)
print(d1.dtypes)
print('-' * 15)
d2 = DataFrame(index=[0])
d2['B'] = [35] # if uncomment this line the result is correct, nonsense for me
d2 = d2.combine_first(d1)
print(d2.dtypes)
Current output:
Expected output:
|
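For the float32 case above, one possible workaround is to cast the combined columns back to the dtypes recorded from the inputs. This is a minimal sketch, not something proposed in the thread: the explicit `astype` call is an assumption on my part, and it is only safe when the combined columns contain no missing values.

```python
import numpy as np
import pandas as pd

d1 = pd.DataFrame({"A": [3.5]}, index=[0]).astype({"A": np.float32})
d2 = pd.DataFrame({"B": [35]}, index=[0])

combined = d2.combine_first(d1)

# Restore the dtypes taken from the inputs. This is safe here because both
# frames share the same index, so alignment introduced no NaN values.
combined = combined.astype({"A": d1["A"].dtype, "B": d2["B"].dtype})
print(combined.dtypes)
```

If a column still holds NaN after combining, casting it back to an integer dtype would raise, so a real fix would need to check for missing values first.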
This is still a bug in 2020:
>>> dfa = pd.DataFrame({"A": [0, 1, 2]}, index=[0, 1, 2])
>>> dfb = pd.DataFrame({"A": [7, 8, 9]}, index=[1, 2, 3])
>>> dfa.combine_first(dfb).dtypes  # Expect to see int64
A    float64
dtype: object
Any plans on addressing this issue? |
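A likely explanation for the float64 result is that aligning two frames on the union of their indexes introduces missing rows, and an int64 column cannot hold NaN. The sketch below illustrates this with the public `DataFrame.align` method only; whether `combine_first` takes exactly this path internally is an assumption.

```python
import pandas as pd

dfa = pd.DataFrame({"A": [0, 1, 2]}, index=[0, 1, 2])
dfb = pd.DataFrame({"A": [7, 8, 9]}, index=[1, 2, 3])

# Outer alignment gives both frames the union index [0, 1, 2, 3].
a_aligned, b_aligned = dfa.align(dfb, join="outer")

# Each aligned frame gains one missing row, so its int64 column is upcast.
print(a_aligned["A"].isna().sum(), a_aligned["A"].dtype)
print(b_aligned["A"].isna().sum(), b_aligned["A"].dtype)
```

Once the gaps are filled from the other frame no NaN remains, which is why an int64 result is a reasonable expectation for this input.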
pandas is an all-volunteer project. If you would like to submit a PR, then one of the volunteers would be able to code review it |
I think this is a serious issue. If I had the knowledge to fix it myself I would. Maybe someone from the core developers can have a look |
@danielhrisca there are many 'serious issues'. Again, pandas is all-volunteer and there are quite a number of open issues |
There is a dtype promotion going on in take_nd. Does it make sense to try to apply the original dtype there, or just before returning from |
On a second look, the root cause seems to be that the alignment is done with
I think that if we check that the two DataFrames have identical columns, then we can provide a better
@jreback what do you think? |
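The identical-columns check suggested here could be sketched as follows. The helper name `combine_first_same_columns` is hypothetical, and restoring dtypes after the fact is my assumption about one way to use the check, not the fix that was actually discussed or merged.

```python
import pandas as pd


def combine_first_same_columns(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: combine_first, then restore dtypes when that is safe."""
    combined = left.combine_first(right)
    # Only attempt the restore when both frames have identical columns and dtypes.
    same_columns = left.columns.equals(right.columns)
    same_dtypes = left.dtypes.tolist() == right.dtypes.tolist()
    if not (same_columns and same_dtypes):
        return combined
    for col, dtype in left.dtypes.items():
        # A column that still holds NaN cannot go back to an integer dtype.
        if combined[col].dtype != dtype and not combined[col].isna().any():
            combined[col] = combined[col].astype(dtype)
    return combined


dfa = pd.DataFrame({"A": [0, 1, 2]}, index=[0, 1, 2])
dfb = pd.DataFrame({"A": [7, 8, 9]}, index=[1, 2, 3])
result = combine_first_same_columns(dfa, dfb)
print(result.dtypes)
```

Frames with differing columns fall through unchanged, so the helper never does worse than plain `combine_first`.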
sure can check if they have identical columns then no action, also |
I found a number of issues that seemed related, all closed over a year ago, but there still seem to be some inconsistencies here: