DEPR: Deprecate pandas/io/date_converters.py #35741

Merged
merged 21 commits into from
Sep 12, 2020
Changes from 14 commits
Commits
4281f16
DEPR: Deprecate parse_date_time in pandas.io.date_converters and upda…
avinashpancham Aug 15, 2020
852d4d9
DEPR: Deprecate parse_date_fields in pandas.io.date_converters and up…
avinashpancham Aug 15, 2020
eed8f16
DEPR: Deprecate parse_all_fields in pandas.io.date_converters and upd…
avinashpancham Aug 15, 2020
65ec570
DEPR: Deprecate generic_parser in pandas.io.date_converters
avinashpancham Aug 15, 2020
344e5de
DOC: Update docstrings
avinashpancham Aug 15, 2020
04d3416
DOC: remove mentions of the generic_parser functionality in the docum…
avinashpancham Aug 15, 2020
3df6bf9
CLN: remove date_parser argument from test where it is not necessary
avinashpancham Aug 15, 2020
c995111
TYP: Add MutableMapping to type hinting for pd.to_datetime
avinashpancham Aug 15, 2020
ef02e73
ENH: Add overloading to pd.to_datetime for MutableMapping
avinashpancham Aug 15, 2020
3376f49
TST: Update tests where generic_parser is used
avinashpancham Aug 18, 2020
71297cf
TST: Update tests for generic_parser in line with pandas styleguide
avinashpancham Aug 18, 2020
a4a3203
Assert warnings in test_parse_dates.py instead of filtering warnings
avinashpancham Aug 20, 2020
8d970ca
Update deprecated version and remove double doc string text
avinashpancham Aug 25, 2020
2033dc3
Revert to the old implementation of the date_converters function and …
avinashpancham Aug 25, 2020
8f90441
Remove superfluous noqa statement
avinashpancham Aug 26, 2020
2a2a271
Only assert warnings for lines that produce a warning
avinashpancham Aug 26, 2020
427bbf0
Add pytest parametrize to test old and new dateparser function
avinashpancham Aug 29, 2020
b4ed5be
Merge branch 'master' of https://github.com/pandas-dev/pandas into de…
avinashpancham Sep 5, 2020
2d80bdc
Add whatsnew entry
avinashpancham Sep 6, 2020
1458bc7
Update references in whatsnew message
avinashpancham Sep 12, 2020
a33a604
Merge remote-tracking branch 'upstream/master' into deprecate_date_co…
avinashpancham Sep 12, 2020
15 changes: 1 addition & 14 deletions doc/source/user_guide/io.rst
@@ -927,7 +927,7 @@ take full advantage of the flexibility of the date parsing API:
.. ipython:: python

df = pd.read_csv('tmp.csv', header=None, parse_dates=date_spec,
date_parser=pd.io.date_converters.parse_date_time)
date_parser=pd.to_datetime)
df

Pandas will try to call the ``date_parser`` function in three different ways. If
@@ -939,11 +939,6 @@ an exception is raised, the next one is tried:
2. If #1 fails, ``date_parser`` is called with all the columns
concatenated row-wise into a single array (e.g., ``date_parser(['2013 1', '2013 2'])``).

3. If #2 fails, ``date_parser`` is called once for every row with one or more
string arguments from the columns indicated with `parse_dates`
(e.g., ``date_parser('2013', '1')`` for the first row, ``date_parser('2013', '2')``
for the second, etc.).

Note that performance-wise, you should try these methods of parsing dates in order:

1. Try to infer the format using ``infer_datetime_format=True`` (see section below).
@@ -955,14 +950,6 @@ Note that performance-wise, you should try these methods of parsing dates in ord
For optimal performance, this should be vectorized, i.e., it should accept arrays
as arguments.
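
As a minimal sketch of what "vectorized" means here, the snippet below parses an already-concatenated column in one call and compares it with a row-by-row loop. The column values and the `"%Y %m"` format are illustrative assumptions, not taken from `tmp.csv`.

```python
import pandas as pd

# Illustrative data: two columns already joined row-wise into one string each,
# similar to the ['2013 1', '2013 2'] array mentioned above.
combined = pd.Series(["2013 1", "2013 2"])

# Vectorized: a single call handles the whole array.
parsed = pd.to_datetime(combined, format="%Y %m")

# Row-wise equivalent, shown only for comparison; it makes one call per value.
row_wise = [pd.to_datetime(value, format="%Y %m") for value in combined]
```

Both approaches yield the same timestamps; the vectorized call avoids the per-row Python overhead.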

You can explore the date parsing functionality in
`date_converters.py <https://github.com/pandas-dev/pandas/blob/master/pandas/io/date_converters.py>`__
and add your own. We would love to turn this module into a community supported
set of date/time parsers. To get you started, ``date_converters.py`` contains
functions to parse dual date and time columns, year/month/day columns,
and year/month/day/hour/minute/second columns. It also contains a
``generic_parser`` function so you can curry it with a function that deals with
a single date rather than the entire array.

.. ipython:: python
:suppress:
62 changes: 62 additions & 0 deletions pandas/io/date_converters.py
@@ -1,23 +1,71 @@
"""This module is designed for community supported date conversion functions"""
import warnings

import numpy as np

from pandas._libs.tslibs import parsing


def parse_date_time(date_col, time_col):
"""
Parse columns with dates and times into a single datetime column.

.. deprecated:: 1.2
"""
warnings.warn(
"""
Use pd.to_datetime(date_col + " " + time_col) instead to get a Pandas Series.
Use pd.to_datetime(date_col + " " + time_col).to_pydatetime() instead to get a Numpy array.
""", # noqa: E501
FutureWarning,
stacklevel=2,
)
date_col = _maybe_cast(date_col)
time_col = _maybe_cast(time_col)
return parsing.try_parse_date_and_time(date_col, time_col)


def parse_date_fields(year_col, month_col, day_col):
"""
Parse columns with years, months and days into a single date column.

.. deprecated:: 1.2
"""
warnings.warn(
"""
Use pd.to_datetime({"year": year_col, "month": month_col, "day": day_col}) instead to get a Pandas Series.
Use ser = pd.to_datetime({"year": year_col, "month": month_col, "day": day_col}) and
np.array([s.to_pydatetime() for s in ser]) instead to get a Numpy array.
""", # noqa: E501
FutureWarning,
stacklevel=2,
)

year_col = _maybe_cast(year_col)
month_col = _maybe_cast(month_col)
day_col = _maybe_cast(day_col)
return parsing.try_parse_year_month_day(year_col, month_col, day_col)


def parse_all_fields(year_col, month_col, day_col, hour_col, minute_col, second_col):
"""
Parse columns with datetime information into a single datetime column.

.. deprecated:: 1.2
"""

warnings.warn(
"""
Use pd.to_datetime({"year": year_col, "month": month_col, "day": day_col,
"hour": hour_col, "minute": minute_col, "second": second_col}) instead to get a Pandas Series.
Use ser = pd.to_datetime({"year": year_col, "month": month_col, "day": day_col,
"hour": hour_col, "minute": minute_col, "second": second_col}) and
np.array([s.to_pydatetime() for s in ser]) instead to get a Numpy array.
""", # noqa: E501
FutureWarning,
stacklevel=2,
)

year_col = _maybe_cast(year_col)
month_col = _maybe_cast(month_col)
day_col = _maybe_cast(day_col)
@@ -30,6 +78,20 @@ def parse_all_fields(year_col, month_col, day_col, hour_col, minute_col, second_


def generic_parser(parse_func, *cols):
"""
Use dateparser to parse columns with date information into a single datetime column.

.. deprecated:: 1.2
"""

warnings.warn(
"""
Use pd.to_datetime instead.
""", # noqa: E501
FutureWarning,
stacklevel=2,
)

N = _check_columns(cols)
results = np.empty(N, dtype=object)

45 changes: 18 additions & 27 deletions pandas/tests/io/parser/test_parse_dates.py
@@ -23,8 +23,6 @@
import pandas._testing as tm
from pandas.core.indexes.datetimes import date_range

import pandas.io.date_converters as conv

# constant
_DEFAULT_DATETIME = datetime(1, 1, 1)

@@ -383,11 +381,7 @@ def test_multiple_date_cols_int_cast(all_parsers):
parser = all_parsers

result = parser.read_csv(
StringIO(data),
header=None,
date_parser=conv.parse_date_time,
parse_dates=parse_dates,
prefix="X",
StringIO(data), header=None, parse_dates=parse_dates, prefix="X",
)
expected = DataFrame(
[
@@ -808,7 +802,9 @@ def test_parse_dates_custom_euro_format(all_parsers, kwargs):
tm.assert_frame_equal(df, expected)
else:
msg = "got an unexpected keyword argument 'day_first'"
with pytest.raises(TypeError, match=msg):
with pytest.raises(TypeError, match=msg), tm.assert_produces_warning(
FutureWarning
):
parser.read_csv(
StringIO(data),
names=["time", "Q", "NTU"],
@@ -1175,10 +1171,7 @@ def test_parse_date_time_multi_level_column_name(all_parsers):
"""
parser = all_parsers
result = parser.read_csv(
StringIO(data),
header=[0, 1],
parse_dates={"date_time": [0, 1]},
date_parser=conv.parse_date_time,
StringIO(data), header=[0, 1], parse_dates={"date_time": [0, 1]},
)

expected_data = [
@@ -1263,7 +1256,7 @@ def test_parse_date_time_multi_level_column_name(all_parsers):
)
def test_parse_date_time(all_parsers, data, kwargs, expected):
parser = all_parsers
result = parser.read_csv(StringIO(data), date_parser=conv.parse_date_time, **kwargs)
result = parser.read_csv(StringIO(data), **kwargs)

# Python can sometimes be flaky about how
# the aggregated columns are entered, so
@@ -1275,12 +1268,7 @@ def test_parse_date_time(all_parsers, data, kwargs, expected):
def test_parse_date_fields(all_parsers):
parser = all_parsers
data = "year,month,day,a\n2001,01,10,10.\n2001,02,1,11."
result = parser.read_csv(
StringIO(data),
header=0,
parse_dates={"ymd": [0, 1, 2]},
date_parser=conv.parse_date_fields,
)
result = parser.read_csv(StringIO(data), header=0, parse_dates={"ymd": [0, 1, 2]},)

expected = DataFrame(
[[datetime(2001, 1, 10), 10.0], [datetime(2001, 2, 1), 11.0]],
@@ -1290,6 +1278,7 @@ def test_parse_date_fields(all_parsers):


def test_parse_date_all_fields(all_parsers):

parser = all_parsers
data = """\
year,month,day,hour,minute,second,a,b
@@ -1299,7 +1288,7 @@ def test_parse_date_all_fields(all_parsers):
result = parser.read_csv(
StringIO(data),
header=0,
date_parser=conv.parse_all_fields,
date_parser=lambda x: pd.to_datetime(x, format="%Y %m %d %H %M %S"),
parse_dates={"ymdHMS": [0, 1, 2, 3, 4, 5]},
)
expected = DataFrame(
@@ -1322,7 +1311,8 @@ def test_datetime_fractional_seconds(all_parsers):
result = parser.read_csv(
StringIO(data),
header=0,
date_parser=conv.parse_all_fields,
# date_parser=conv.parse_all_fields,
date_parser=lambda x: pd.to_datetime(x, format="%Y %m %d %H %M %S.%f"),
parse_dates={"ymdHMS": [0, 1, 2, 3, 4, 5]},
)
expected = DataFrame(
@@ -1339,12 +1329,13 @@ def test_generic(all_parsers):
parser = all_parsers
data = "year,month,day,a\n2001,01,10,10.\n2001,02,1,11."

result = parser.read_csv(
StringIO(data),
header=0,
parse_dates={"ym": [0, 1]},
date_parser=lambda y, m: date(year=int(y), month=int(m), day=1),
)
with tm.assert_produces_warning(FutureWarning, check_stacklevel=False):
result = parser.read_csv(
StringIO(data),
header=0,
parse_dates={"ym": [0, 1]},
date_parser=lambda y, m: date(year=int(y), month=int(m), day=1),
)
expected = DataFrame(
[[date(2001, 1, 1), 10, 10.0], [date(2001, 2, 1), 1, 11.0]],
columns=["ym", "day", "a"],
21 changes: 12 additions & 9 deletions pandas/tests/io/test_date_converters.py
@@ -8,22 +8,24 @@


def test_parse_date_time():

dates = np.array(["2007/1/3", "2008/2/4"], dtype=object)
times = np.array(["05:07:09", "06:08:00"], dtype=object)
expected = np.array([datetime(2007, 1, 3, 5, 7, 9), datetime(2008, 2, 4, 6, 8, 0)])

result = conv.parse_date_time(dates, times)
tm.assert_numpy_array_equal(result, expected)
with tm.assert_produces_warning(FutureWarning):
result = conv.parse_date_time(dates, times)
tm.assert_numpy_array_equal(result, expected)


def test_parse_date_fields():
days = np.array([3, 4])
months = np.array([1, 2])
years = np.array([2007, 2008])
result = conv.parse_date_fields(years, months, days)

expected = np.array([datetime(2007, 1, 3), datetime(2008, 2, 4)])
tm.assert_numpy_array_equal(result, expected)

with tm.assert_produces_warning(FutureWarning):
result = conv.parse_date_fields(years, months, days)
tm.assert_numpy_array_equal(result, expected)


def test_parse_all_fields():
@@ -34,7 +36,8 @@ def test_parse_all_fields():
days = np.array([3, 4])
years = np.array([2007, 2008])
months = np.array([1, 2])

result = conv.parse_all_fields(years, months, days, hours, minutes, seconds)
expected = np.array([datetime(2007, 1, 3, 5, 7, 9), datetime(2008, 2, 4, 6, 8, 0)])
tm.assert_numpy_array_equal(result, expected)

with tm.assert_produces_warning(FutureWarning):
result = conv.parse_all_fields(years, months, days, hours, minutes, seconds)
tm.assert_numpy_array_equal(result, expected)
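
The tests above use the pandas-internal helper `tm.assert_produces_warning`. For code outside the pandas test suite, a rough equivalent can be sketched with the standard library alone. `deprecated_parser` below is a hypothetical stand-in for a deprecated function, not part of pandas.

```python
import warnings


def deprecated_parser(values):
    """Hypothetical stand-in for a deprecated converter (not a pandas function)."""
    warnings.warn("Use pd.to_datetime instead.", FutureWarning, stacklevel=2)
    return list(values)


# Record every warning raised inside the block so it can be inspected afterwards.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = deprecated_parser(["2007-01-03"])

# Keep only the FutureWarning entries, mirroring what the tests above assert on.
future_warnings = [w for w in caught if issubclass(w.category, FutureWarning)]
```

A test would then assert that `future_warnings` is non-empty and that `result` still matches the expected value, so the deprecated path stays covered until it is removed.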