Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Series(DTI/TDI) loses the frequency information from original DTI/TDI #20937

Open
jschendel opened this issue May 2, 2018 · 3 comments
Open
Assignees
Labels
Bug Datetime Datetime data dtype freq retention User expects "freq" attribute to be preserved Frequency DateOffsets Needs Discussion Requires discussion from core team before further action

Comments

@jschendel
Copy link
Member

Code Sample, a copy-pastable example if possible

DTI example below, same general behavior occurs for TDI:

In [2]: dti = pd.date_range('20180101', periods=3, freq='B')

In [3]: dti.freq
Out[3]: <BusinessDay>

In [4]: s = pd.Series(dti)

In [5]: s.dt.freq
Out[5]: 'D'

This appears to be caused by the dt accessor overriding the freq definition, which was done for perf reasons in #17210:

@property
def freq(self):
return self._get_values().inferred_freq

If I remove the code above, the resulting frequency of the Series is None. This is consistent with one of the error messages displayed when trying to add an integer to the Series, which works at the DTI/TDI level when a frequency is present:

In [6]: dti + 4
Out[6]: DatetimeIndex(['2018-01-05', '2018-01-08', '2018-01-09'], dtype='datetime64[ns]', freq='B')

In [7]: s + 4
---------------------------------------------------------------------------
NullFrequencyError: Cannot shift with no freq

During handling of the above exception, another exception occurred:

TypeError: incompatible type for a datetime/timedelta operation [add]

Problem description

Series ignores the original frequency, and subsequent frequency dependent operations do not work.

Expected Output

I'd expect frequency to be retained, and frequency dependent operations to work when one is set.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: b02c69a
python: 3.6.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.0rc2
pytest: 3.1.2
pip: 9.0.1
setuptools: 39.0.1
Cython: 0.28.2
numpy: 1.13.3
scipy: 1.0.0
pyarrow: 0.6.0
xarray: 0.9.6
IPython: 6.1.0
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: 0.4.0
matplotlib: 2.0.2
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 0.9.8
lxml: 3.8.0
bs4: None
html5lib: 0.999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: 0.1.5
pandas_gbq: None
pandas_datareader: None

@jschendel
Copy link
Member Author

Not really sure if there's much benefit to actually implementing this. AFAIK, the only thing this currently impacts is add/sub, since the other freq dependent method for DTI/TDI is shift, but that has another meaning for Series.

Primarily documenting this for #20892, as this issue prevents set_freq from working with the dt accessor.

@jorisvandenbossche
Copy link
Member

Not really sure if there's much benefit to actually implementing this. AFAIK

Yes, I would also raise the question to what extent this property is needed on a Series.
As we could also say that .freq is something specific for a DatetimeIndex.

Apart from shift, are there other use cases that we would be preventing here? The only thing I can think of is if you would want to check the frequency of the data, and then would need to infer this each time again. But, this is already like that in the implementation ..

@jbrockmendel
Copy link
Member

"Fixing" this would require effectively making freq part of the dtype.

Upsides:

  • ser.shift would behave more nicely at the endpoints
  • Make Series[datetime64.freq] behave more like Index[datetime64.freq]
  • Series addition would broadcast correctly, i.e. ser + other == Series([x + other for x in ser]) (ATM this holds for DatetimeIndex but not for Series)
  • In some sense the dtype is the "natural" place to hold information that affects how arithmetic behaves.

Downsides:

  • Implementation is very much non-trivial
  • changing a freq attribute would be messy, exactly how messy depending on whether it is actually part of the dtype or "next to" the `dtype.
  • It's somewhat ambiguous whether the freq should be retained when a Series[datetime64.freq] is modified.

@jbrockmendel jbrockmendel self-assigned this Jan 28, 2019
@mroeschke mroeschke added the Frequency DateOffsets label Apr 1, 2020
@mroeschke mroeschke added the Needs Discussion Requires discussion from core team before further action label Jun 19, 2021
@jbrockmendel jbrockmendel added the freq retention User expects "freq" attribute to be preserved label Aug 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Datetime Datetime data dtype freq retention User expects "freq" attribute to be preserved Frequency DateOffsets Needs Discussion Requires discussion from core team before further action
Projects
None yet
Development

No branches or pull requests

5 participants