RFE: a faster way to construct a pandas.Timestamp from an epoch time #14658

radekholy24 · 2016-11-14T18:22:51Z

pandas.to_datetime called with an int is too slow for my use case. Basically, I have a loop that sequentially gets an integer from a generator of about 1 000 000 numbers, converts it to pandas.Timestamp and passes it to a function. A profiler says that the call of pandas.to_datetime takes about 40 % of the total run time of my program.

Compared to datetime.datetime.fromtimestamp, it's about 70 times slower (62.8 usec vs. 0.889 usec per loop):

$ python -m timeit -n 1000000 -s 'import datetime' 'datetime.datetime.fromtimestamp(30, tz=datetime.timezone.utc)'
1000000 loops, best of 3: 0.889 usec per loop
$ python -m timeit -n 1000000 -s 'import pandas' 'pandas.to_datetime(30, utc=True, unit="s")'
1000000 loops, best of 3: 62.8 usec per loop

$ python -c 'import pandas;pandas.show_versions()'

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-47-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.1
nose: None
pip: 9.0.1
setuptools: 20.7.0
Cython: None
numpy: 1.11.2
scipy: None
statsmodels: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None

Can you please provide/document a faster way to instantiate a pandas.Timestamp instance from an epoch time?

jreback · 2016-11-14T18:37:43Z

why would you do this in a loop?
simply pass the entire list

jreback · 2016-11-14T18:50:43Z

In [10]: r = list(range(100000))

In [11]: %timeit [ datetime.datetime.fromtimestamp(30+v, tz=datetime.timezone.utc) for v in r ]
1 loop, best of 3: 251 ms per loop

In [12]: %timeit pd.to_datetime(r, utc=True, unit='s')
10 loops, best of 3: 84.5 ms per loop

radekholy24 · 2016-11-14T19:21:58Z

Because the data from the generator does not all fit into memory?
But yeah, I can do that in my case.
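
For a generator too large to hold in memory, one option is to convert it in fixed-size chunks so that each chunk still goes through the vectorized pd.to_datetime path. The sketch below is only an illustration: the helper name timestamps_in_chunks and the chunk_size default are hypothetical and do not come from pandas or from this thread.

```python
from itertools import islice

import pandas as pd


def timestamps_in_chunks(epoch_seconds, chunk_size=100000):
    """Yield tz-aware pandas.Timestamp objects from an iterable of epoch seconds.

    Hypothetical helper (not part of pandas): converts chunk_size values at a
    time with the vectorized pd.to_datetime, so only one chunk is held in
    memory at once.
    """
    iterator = iter(epoch_seconds)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        # One vectorized call per chunk instead of one call per value.
        for ts in pd.to_datetime(chunk, unit="s", utc=True):
            yield ts


# Small demonstration: a generator of 10 epoch values, converted 4 at a time.
stamps = list(timestamps_in_chunks((30 + v for v in range(10)), chunk_size=4))
```

The chunk size trades memory for per-call overhead; a suitable value would have to be measured for the actual workload.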

TomAugspurger · 2016-11-14T19:30:57Z

FYI @PyDeQ

In [652]: %timeit pd.Timestamp.utcfromtimestamp(30)
The slowest run took 11.44 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.76 µs per loop

vs

In [653]: %timeit datetime.datetime.fromtimestamp(30, tz=datetime.timezone.utc)
The slowest run took 14.00 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.62 µs per loop

But agreed with @jreback, you're much better off using vectorized methods in pandas.

radekholy24 · 2016-11-14T19:36:42Z

@TomAugspurger thanks. Unfortunately, pd.Timestamp.utcfromtimestamp is not documented.

TomAugspurger · 2016-11-14T19:45:33Z

Mind opening a PR to fix that?

radekholy24 · 2016-11-14T19:48:38Z

No promises but I can consider doing a PR in case of spare time, sure.

TomAugspurger · 2016-11-14T19:50:52Z

#5218 seems to be the reason it's not in the API docs at the moment.

jorisvandenbossche · 2016-11-15T08:45:03Z

Using just the plain Timestamp constructor is also fast:

In [51]: %timeit pd.Timestamp.utcfromtimestamp(30)
The slowest run took 39.10 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 1.77 µs per loop

In [52]: %timeit pd.Timestamp(30, unit='s', tz='UTC')
The slowest run took 14.56 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 1.97 µs per loop

And this one is documented (so I would prefer this over Timestamp.utcfromtimestamp)

(and it also gives me the impression that the performance of to_datetime can certainly be improved for this case)
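
As a quick sanity check, the snippet below sketches that the constructor form yields the same instant as the other two approaches mentioned in this thread. It uses only calls that appear above; the variable names are illustrative, and no timing claims are made here.

```python
import datetime

import pandas as pd

epoch_seconds = 30

# Documented constructor form: integer epoch value plus unit and timezone.
ts = pd.Timestamp(epoch_seconds, unit="s", tz="UTC")

# The two alternatives from the original report.
dt = datetime.datetime.fromtimestamp(epoch_seconds, tz=datetime.timezone.utc)
via_to_datetime = pd.to_datetime(epoch_seconds, unit="s", utc=True)

# All three describe the same tz-aware instant.
same_instant = (ts == dt) and (ts == via_to_datetime)
```

Since pandas.Timestamp is a subclass of datetime.datetime, the result can be passed to code that expects a standard datetime.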

radekholy24 · 2016-11-15T22:25:14Z

@jorisvandenbossche, which documentation do you mean? So far, I've found only examples with a string as the first argument and no unit or tz arguments.
Anyway, thanks. That is actually what I expected to be the resolution of this request.

radekholy24 changed the title ~~RFE: a faster way to construct pandas.Timestamp from epoch times~~ RFE: a faster way to construct a pandas.Timestamp from an epoch time Nov 14, 2016

jreback closed this as completed Nov 14, 2016

jreback added the Performance (memory or execution speed performance), Datetime (datetime data dtype), and Timezones (timezone data dtype) labels Nov 14, 2016

jreback added this to the No action milestone Nov 14, 2016
