BUG: combine_first fails when one series is timezone-aware and the other is empty #41800

Closed
2 of 3 tasks
johnands opened this issue Jun 3, 2021 · 3 comments
Labels: good first issue, Needs Tests (unit test(s) needed to prevent regressions), Timezones (timezone data dtype)

johnands commented Jun 3, 2021

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandas.
  • (optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

import pandas as pd
import numpy as np
from datetime import datetime

# s1 has a timezone-aware hourly index; s2 is completely empty
time_index = pd.date_range(datetime(2021, 1, 1, 1), datetime(2021, 1, 1, 10), freq='H', tz='Europe/Oslo')
s1 = pd.Series(np.random.random(10), index=time_index)
s2 = pd.Series()
s1.combine_first(s2)  # raises AttributeError on pandas 1.2.4

Traceback

AttributeError                            Traceback (most recent call last)
<ipython-input-14-9af79f353bf5> in <module>
----> 1 s1.combine_first(pd.Series())

~/anaconda3/envs/powerml_dev/lib/python3.7/site-packages/pandas/core/series.py in combine_first(self, other)
   2983             other = to_datetime(other)
   2984
-> 2985         return this.where(notna(this), other)
   2986
   2987     def update(self, other) -> None:

~/anaconda3/envs/powerml_dev/lib/python3.7/site-packages/pandas/core/generic.py in where(self, cond, other, inplace, axis, level, errors, try_cast)
   9285         other = com.apply_if_callable(other, self)
   9286         return self._where(
-> 9287             cond, other, inplace, axis, level, errors=errors, try_cast=try_cast
   9288         )
   9289

~/anaconda3/envs/powerml_dev/lib/python3.7/site-packages/pandas/core/generic.py in _where(self, cond, other, inplace, axis, level, errors, try_cast)
   9019         cond = com.apply_if_callable(cond, self)
   9020         if isinstance(cond, NDFrame):
-> 9021             cond, _ = cond.align(self, join="right", broadcast_axis=1)
   9022         else:
   9023             if not hasattr(cond, "shape"):

~/anaconda3/envs/powerml_dev/lib/python3.7/site-packages/pandas/core/series.py in align(self, other, join, axis, level, copy, fill_value, method, limit, fill_axis, broadcast_axis)
   4228             limit=limit,
   4229             fill_axis=fill_axis,
-> 4230             broadcast_axis=broadcast_axis,
   4231         )
   4232

~/anaconda3/envs/powerml_dev/lib/python3.7/site-packages/pandas/core/generic.py in align(self, other, join, axis, level, copy, fill_value, method, limit, fill_axis, broadcast_axis)
   8832                 method=method,
   8833                 limit=limit,
-> 8834                 fill_axis=fill_axis,
   8835             )
   8836         else:  # pragma: no cover

~/anaconda3/envs/powerml_dev/lib/python3.7/site-packages/pandas/core/generic.py in _align_series(self, other, join, axis, level, copy, fill_value, method, limit, fill_axis)
   8985         if is_series or (not is_series and axis == 0):
   8986             if is_datetime64tz_dtype(left.index.dtype):
-> 8987                 if left.index.tz != right.index.tz:
   8988                     if join_index is not None:
   8989                         # GH#33671 ensure we don't change the index on

AttributeError: 'Index' object has no attribute 'tz'

Problem description

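combine_first raises an AttributeError when the calling series has a timezone-aware datetime index and the other series is empty. Per the traceback, the alignment step compares left.index.tz with right.index.tz, but the empty series has a plain Index with no tz attribute. This does not happen if s1 has naive datetimes.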

Expected output

The output should be equal to s1.
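
On affected versions (the report is against pandas 1.2.4), one possible workaround is to skip the call when the other series is empty, on the assumption that an empty series has nothing to contribute. A minimal sketch; combine_first_safe is a hypothetical helper name, not part of pandas, and the data simply mirrors the example above:

```python
import numpy as np
import pandas as pd


def combine_first_safe(left: pd.Series, right: pd.Series) -> pd.Series:
    """Hypothetical helper: avoid calling combine_first with an empty `right`."""
    if right.empty:
        # An empty series has no values to fill gaps with,
        # so the expected result is just a copy of `left`.
        return left.copy()
    return left.combine_first(right)


# Lowercase "h" is the hourly alias in recent pandas releases.
time_index = pd.date_range("2021-01-01 01:00", periods=10, freq="h", tz="Europe/Oslo")
s1 = pd.Series(np.random.random(10), index=time_index)
s2 = pd.Series(dtype="float64")  # explicit dtype avoids the empty-Series dtype warning

result = combine_first_safe(s1, s2)
print(result.equals(s1))
```

The guard only covers the empty case; any non-empty series still goes through the regular combine_first path.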

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 2cb9652
python : 3.7.7.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-142-generic
Version : #146-Ubuntu SMP Tue Apr 13 01:11:19 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.4
numpy : 1.19.0
pytz : 2020.5
dateutil : 2.8.1
pip : 20.1.1
setuptools : 49.2.0.post20200714
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.3.7
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.3 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.1
numexpr : None
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : 2.0.0
pyxlsb : None
s3fs : None
scipy : 1.2.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
numba : None

mroeschke added the Bug and Timezones labels Aug 21, 2021
MarcoGorelli added the Needs Tests label and removed the Bug label Dec 17, 2022
MarcoGorelli (Member) commented

This works now:

In [2]: time_index = pd.date_range(datetime(2021, 1, 1, 1), datetime(2021, 1, 1, 10), freq='H', tz='Europe/Oslo')

In [3]: s1 = pd.Series(np.random.random(10), index=time_index)

In [4]: s2 = pd.Series()

In [5]: s1.combine_first(s2)
Out[5]: 
2021-01-01 01:00:00+01:00    0.859067
2021-01-01 02:00:00+01:00    0.553345
2021-01-01 03:00:00+01:00    0.836079
2021-01-01 04:00:00+01:00    0.695629
2021-01-01 05:00:00+01:00    0.824679
2021-01-01 06:00:00+01:00    0.429457
2021-01-01 07:00:00+01:00    0.917812
2021-01-01 08:00:00+01:00    0.234022
2021-01-01 09:00:00+01:00    0.100995
2021-01-01 10:00:00+01:00    0.239752
dtype: float64

A PR to add a test would be welcome
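
A regression test along these lines might look like the sketch below. It is not the test that was eventually merged; the function name is made up for illustration. It compares index labels and values rather than using a strict series-equality helper, on the assumption that the index dtype or freq of the result may vary between pandas versions:

```python
import numpy as np
import pandas as pd


def test_combine_first_tz_aware_with_empty_series():
    # Setup mirrors the report in this issue: a tz-aware hourly index
    # on one side and a completely empty Series on the other.
    time_index = pd.date_range(
        "2021-01-01 01:00", periods=10, freq="h", tz="Europe/Oslo"
    )
    s1 = pd.Series(np.arange(10, dtype="float64"), index=time_index)
    s2 = pd.Series()

    # On affected versions this call raised AttributeError.
    result = s1.combine_first(s2)

    # The result should carry the same labels and values as s1.
    assert len(result) == len(s1)
    assert list(result.index) == list(s1.index)
    assert np.allclose(result.to_numpy(dtype="float64"), s1.to_numpy())
```

Run it with pytest, or call the function directly.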

lpizzinidev added a commit to lpizzinidev/pandas that referenced this issue Jan 11, 2023
lpizzinidev added a commit to lpizzinidev/pandas that referenced this issue Jan 12, 2023
mroeschke pushed a commit that referenced this issue Jan 12, 2023
* TEST: added test case for issue 41800

addresses #41800

* fix lint error

* fix lint error

* added assert_series_equal control and changed date_range call
phofl pushed a commit to phofl/pandas that referenced this issue Jan 12, 2023
phofl pushed a commit to phofl/pandas that referenced this issue Jan 13, 2023
MarcoGorelli (Member) commented

closed in #50677
