Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate out _convert_datetime_to_tsobject #17715

Merged
merged 7 commits into from
Oct 2, 2017

Conversation

jbrockmendel
Copy link
Member

A bunch of calls to convert_to_tsobject are made in cases where we already know that the input is a datetime object. This PR separates out the datetime case into a new function _convert_datetime_to_tsobject.

Second, to make the dependency between _TSObject and Timestamp one-way, this removes the three references to Timestamp that are made in convert_str_to_tsobject. Verifying the last of these changes may be non-trivial.

This is a prelude to separating _TSobject logic out into tslibs.conversion.

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@codecov
Copy link

codecov bot commented Sep 29, 2017

Codecov Report

Merging #17715 into master will decrease coverage by 0.02%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17715      +/-   ##
==========================================
- Coverage   91.27%   91.25%   -0.03%     
==========================================
  Files         163      163              
  Lines       49783    49783              
==========================================
- Hits        45439    45429      -10     
- Misses       4344     4354      +10
Flag Coverage Δ
#multiple 89.05% <ø> (-0.01%) ⬇️
#single 40.34% <ø> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.73% <0%> (-0.1%) ⬇️
pandas/core/indexes/datetimes.py 95.44% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bbf0dda...6ced748. Read the comment docs.

@codecov
Copy link

codecov bot commented Sep 29, 2017

Codecov Report

Merging #17715 into master will decrease coverage by 0.05%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #17715      +/-   ##
==========================================
- Coverage   91.27%   91.21%   -0.06%     
==========================================
  Files         163      163              
  Lines       49783    49796      +13     
==========================================
- Hits        45439    45421      -18     
- Misses       4344     4375      +31
Flag Coverage Δ
#multiple 89.01% <ø> (-0.05%) ⬇️
#single 40.33% <ø> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/plotting/_converter.py 63.38% <0%> (-1.82%) ⬇️
pandas/core/tools/datetimes.py 82.97% <0%> (-0.83%) ⬇️
pandas/core/indexes/multi.py 96.39% <0%> (-0.51%) ⬇️
pandas/core/indexes/category.py 97.46% <0%> (-0.29%) ⬇️
pandas/core/groupby.py 92.04% <0%> (-0.2%) ⬇️
pandas/core/frame.py 97.73% <0%> (-0.11%) ⬇️
pandas/core/resample.py 96.14% <0%> (-0.03%) ⬇️
pandas/core/indexes/interval.py 92.85% <0%> (ø) ⬆️
pandas/core/dtypes/inference.py 98.36% <0%> (+0.02%) ⬆️
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bbf0dda...9deb75b. Read the comment docs.

@@ -728,7 +728,7 @@ class Timestamp(_Timestamp):
# reconstruct & check bounds
ts_input = datetime(dts.year, dts.month, dts.day, dts.hour, dts.min,
dts.sec, dts.us, tzinfo=_tzinfo)
ts = convert_to_tsobject(ts_input, _tzinfo, None, 0, 0)
ts = _convert_datetime_to_tsobject(ts_input, _tzinfo)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to break with spelling convention. call convert_from_datetime_to_tsobject

obj2.dts.day,
obj2.dts.hour, obj2.dts.min, obj2.dts.sec,
obj2.dts.us, obj2.tzinfo)
obj2 = _convert_datetime_to_tsobject(dtime, tz)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have convert_datetime_to_tsobject take an optional ps= to facilitate this type of post-conversion.

also then you don't need a obj2 avoiding more confusion here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea.

@@ -1577,7 +1589,17 @@ cpdef convert_str_to_tsobject(object ts, object tz, object unit,
return obj
else:
# Keep the converter same as PyDateTime's
ts = Timestamp(obj.value, tz=obj.tzinfo)
obj2 = convert_to_tsobject(obj.value, obj.tzinfo,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is just a variation on create_datatime_from_ts

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow.

@@ -1538,12 +1497,64 @@ cdef convert_to_tsobject(object ts, object tz, object unit,
return obj


cdef _TSObject _convert_datetime_to_tsobject(datetime ts, object tz):
cdef:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls pls always add doc-strings

@sinhrks sinhrks added the Clean label Sep 29, 2017
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does perf change? (pos or neg)?

int32_t nanos=0):
"""
Extract datetime and int64 from any of:
- python datetime object
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can u put a proper doc string

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is adapted from the convert_to_tsobject docstring.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

take the opportunity to improve it

ts = Timestamp(obj.value, tz=obj.tzinfo)
obj = convert_to_tsobject(obj.value, obj.tzinfo,
None, 0, 0)
dtime = datetime(obj.dts.year, obj.dts.month, obj.dts.day,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use the existing routine to do this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you referring to create_datetime_from_ts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create_datetime_from_ts is a compatibility func specifically for ints_to_pydatetime (you can tell because it has several unused arguments that are retained to match the signature of create_timestamp_from_ts).

Replacing a one-line call to datetime(...) with a call to a one-line func that does exactly the same thing (but with less clarity because of the unused args) doesn't achieve much, with the possible exception of function call overhead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know I wrote it!

yet you are repeating code, I would prefer not to.

with the possible exception of function call overhead.

these take / return objects so the overhead is the same.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine.

@jbrockmendel
Copy link
Member Author

No significant changes

$asv continuous -f 1.1 -E virtualenv master HEAD -b timeseries
[...]
       before           after         ratio
     [cc58b84f]       [a7b66c1a]
+        11.8±0ms         15.9±2ms     1.35  timeseries.DatetimeIndex.time_infer_freq_none
+           1.55s            2.05s     1.32  timeseries.AsOfDataFrame.time_asof_nan
+     19.6±0.07μs         24.0±3μs     1.23  timeseries.Offsets.time_timeseries_day_apply
+      16.6±0.1μs       18.3±0.1μs     1.10  timeseries.Offsets.time_custom_bday_apply
-       174±0.9μs        158±0.3μs     0.91  timeseries.Offsets.time_custom_bmonthend_decr_n
-        422±20ms        380±0.2ms     0.90  timeseries.DatetimeIndex.time_add_offset_slow
-     22.1±0.09μs      19.8±0.05μs     0.90  timeseries.Offsets.time_timeseries_day_incr
-           1.99s        1.48±0.3s     0.74  timeseries.AsOfDataFrame.time_asof

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.

@@ -1577,7 +1614,14 @@ cpdef convert_str_to_tsobject(object ts, object tz, object unit,
return obj
else:
# Keep the converter same as PyDateTime's
ts = Timestamp(obj.value, tz=obj.tzinfo)
obj = convert_to_tsobject(obj.value, obj.tzinfo,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok so this is getting cumbersome. why don't you make a separate routine to do these 3 (and I guess go back to consrructing datetime). But why does the

ts = Timestamp(obj.value, tz=obj.tzinfo) not work here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why don't you make a separate routine to do these 3 (and I guess go back to consrructing datetime)

OK

But why does the ts = Timestamp(obj.value, tz=obj.tzinfo) not work here?

It works, but the goal is to disentangle Timestamp from _TSObject logic. Or more specifically, to make the dependency go in one direction.

@jreback jreback added this to the 0.21.0 milestone Oct 1, 2017
@jreback
Copy link
Contributor

jreback commented Oct 1, 2017

ok ping on green. Still would like to simplify some of this logic generally (as the is pr just moves it around), but can do a bit later (maybe add to TODO).

@jbrockmendel
Copy link
Member Author

Still would like to simplify some of this logic generally

Yah. As I look at it, some pieces might be simpler/clearer as methods (or classmethods) of _TSObject). Would that defeat the purpose of _TSObject, since it is explicitly supposed to be "lightweight"?

@jbrockmendel
Copy link
Member Author

Not sure what's up here. Tests started failing when create_datetime_from_ts was introduced, continue failing with just _convert_tsobject_tz. My inclination is to revert to two commits ago when it was passing and add this mystery to the TODO list.

@jreback jreback removed this from the 0.21.0 milestone Oct 1, 2017
@jbrockmendel
Copy link
Member Author

ping, though note reversion mentioned above.

@jreback jreback added this to the 0.21.0 milestone Oct 2, 2017
@jreback jreback merged commit a3d538a into pandas-dev:master Oct 2, 2017
@jreback
Copy link
Contributor

jreback commented Oct 2, 2017

thanks

ghost pushed a commit to reef-technologies/pandas that referenced this pull request Oct 2, 2017
* 'master' of github.com:pandas-dev/pandas: (188 commits)
  Separate out _convert_datetime_to_tsobject (pandas-dev#17715)
  DOC: remove whatsnew note for xref pandas-dev#17131
  BUG: Regression in .loc accepting a boolean Index as an indexer (pandas-dev#17738)
  DEPR: Deprecate cdate_range and merge into bdate_range (pandas-dev#17691)
  CLN: replace %s syntax with .format in pandas.core: categorical, common, config, config_init (pandas-dev#17735)
  Fixed the memory usage explanation of categorical in gotchas from O(nm) to O(n+m) (pandas-dev#17736)
  TST: add backward compat for offset testing for pickles (pandas-dev#17733)
  remove unused time conversion funcs (pandas-dev#17711)
  DEPR: Deprecate convert parameter in take (pandas-dev#17352)
  BUG:Time Grouper bug fix when applied for list groupers (pandas-dev#17587)
  BUG: Fix some PeriodIndex resampling issues (pandas-dev#16153)
  BUG: Fix unexpected sort in groupby (pandas-dev#17621)
  DOC: Fixed typo in documentation for 'pandas.DataFrame.replace' (pandas-dev#17731)
  BUG: Fix series rename called with str altering name rather index (GH17407) (pandas-dev#17654)
  DOC: Add examples for MultiIndex.get_locs + cleanups (pandas-dev#17675)
  Doc improvements for IntervalIndex and Interval (pandas-dev#17714)
  BUG: DataFrame sort_values and multiple "by" columns fails to order NaT correctly
  Last of the timezones funcs (pandas-dev#17669)
  Add missing file to _pyxfiles, delete commented-out (pandas-dev#17712)
  update imports of DateParseError, remove unused imports from tslib (pandas-dev#17713)
  ...
@jbrockmendel jbrockmendel deleted the tslibs-pre_conversion branch October 30, 2017 16:25
alanbato pushed a commit to alanbato/pandas that referenced this pull request Nov 10, 2017
No-Stream pushed a commit to No-Stream/pandas that referenced this pull request Nov 28, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants