
Performance: speeding up pickling of cftime arrays #253

Open
aulemahal opened this issue Aug 10, 2021 · 7 comments

@aulemahal

I used cftime 1.4.1 and 1.5.0 when exploring this.

My workflows involve large datasets and complex functions. I use xarray, backed by dask. In one of the more complex processing steps, I use xarray's map_blocks and a handful of other dask-lazy methods on a large dataset that uses the NoLeap calendar. The dataset is large, with 950 chunks and a 55114-element time coordinate, and it seems a lot of time is spent pickling the latter.

More precisely, this line of dask: https://github.com/dask/dask/blob/1c4a84225d1bd26e58d716d2844190cc23ebcfec/dask/base.py#L1028 calls pickle.dumps on the object-dtype numpy array that stores the cftime.datetime objects.

When profiling the graph creation (no computation triggered yet), I can see that this step is the one that takes the most time, slightly more than another function involved in xarray's CFTimeIndex creation.

MWE:

import pickle
import numpy as np
import pandas as pd
import xarray as xr


cft = xr.cftime_range('1950-01-01', '2100-01-01', freq='D')  # CFTimeIndex of cftime objects
npy = pd.date_range('1950-01-01', '2100-01-01', freq='D')  # same length, but numpy datetime64 values
oar = np.array([1] * npy.size, dtype='O')  # sanity check: object-dtype array with builtin element type

timeit calls in a notebook:
[screenshot of %timeit results for the three arrays]
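The calls behind the screenshot were presumably along these lines (a reconstruction, exact numbers not reproduced; np.asarray is used here to compare the underlying arrays rather than the index objects):

%timeit pickle.dumps(np.asarray(cft))  # object array of cftime datetimes -- slowest
%timeit pickle.dumps(np.asarray(npy))  # datetime64 array -- fast
%timeit pickle.dumps(oar)              # object array of plain ints -- the baseline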

So even if it is expected that pickling an object array is slower, the cftime array is still two orders of magnitude slower than a basic object array. I am not very knowledgeable about how pickle works, but I believe something could be done to speed this up.

Any ideas?

@spencerkclark
Collaborator

Very interesting. I agree it seems there is significant room for improvement. I think we may be able to follow a line of thought similar to the one that led to speedups in the creation of cftime objects (see discussion in pangeo-data/pangeo#764 (comment) and #158).

Pickling in cftime calls the _getstate method on each cftime.datetime instance. The kwargs dictionary it returns still includes the _dayofwk and _dayofyr attributes, which are slow to compute and are not needed to reconstruct a datetime instance. I wonder whether simply deleting those from the kwargs dictionary would improve your performance metrics?

cftime/src/cftime/_cftime.pyx

Lines 1249 to 1250 in 4f28eb6

'dayofwk': self._dayofwk,
'dayofyr': self._dayofyr,

@spencerkclark
Collaborator

Oops, _dayofwk and _dayofyr are the cached values, so they do not trigger computation. Therefore I suspect my proposed strategy will not make a difference.

@spencerkclark
Collaborator

spencerkclark commented Aug 11, 2021

A fairer comparison might be to look at the performance of pickling an array of datetime.datetime objects. There we find that pickling an array of cftime objects is ~3x slower -- still a meaningful difference, but not extreme:

In [1]: import pickle; import pandas as pd; import xarray as xr

In [2]: times = pd.date_range("1950-01-01", "2100-01-01", freq="D")

In [3]: datetimes = times.to_pydatetime()

In [4]: %timeit serialized = pickle.dumps(datetimes)
36.1 ms ± 611 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [5]: cftimes = xr.cftime_range("1950-01-01", "2100-01-01", freq="D").values

In [6]: %timeit serialized = pickle.dumps(cftimes)
102 ms ± 2.34 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Unfortunately nothing obvious now sticks out to me to close this gap.

np.datetime64 values are stored in memory as single integers -- I think a np.datetime64 array is basically a view of an integer dtype array -- so it is not a surprise that pickling an array of those is much faster.
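A minimal illustration of that point, using only NumPy -- the datetime64 buffer can be reinterpreted as int64 directly:

import numpy as np

days = np.arange('2000-01-01', '2000-01-11', dtype='datetime64[D]')
print(days.view('int64'))  # the same buffer, seen as integer days since the epoch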

@aulemahal
Author

aulemahal commented Aug 11, 2021

Indeed, I am not surprised that it takes more time than a datetime64 array.

While this might not be the best place, @spencerkclark, you might be able to give an opinion. This problem happens in xarray workflows when using map_blocks because of dask.base.tokenize. The latter tries to obtain a deterministic hash of the array, and as it is an 'O'-dtype ndarray, dask's current approach is to pickle it. Thus, speeding up the pickling would be the most general way of optimizing this, but the use case is rather narrow. Do you think we could add something specific in xarray.map_blocks to work around the performance issue instead? You can look at the xclim issue I linked above for an idea (transform cftime arrays to integers and then back to cftime within the wrapper); a rough sketch is below.
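For concreteness, a minimal sketch of that idea -- the units and calendar here are arbitrary choices for the example, not something prescribed by xarray or xclim:

import cftime
import xarray as xr

calendar = 'noleap'
units = 'days since 1950-01-01'

times = xr.cftime_range('1950-01-01', '2100-01-01', freq='D', calendar=calendar).values
encoded = cftime.date2num(times, units, calendar=calendar)    # numeric ndarray, cheap to tokenize
decoded = cftime.num2date(encoded, units, calendar=calendar)  # cftime objects restored inside the wrapped function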

@spencerkclark
Collaborator

spencerkclark commented Aug 11, 2021

I'm somewhat surprised that encoding a cftime array is faster than pickling it (encoding requires repeated timedelta arithmetic, which is not needed for pickling). Have you done timing experiments to demonstrate this?

Or is the issue that dask.base.tokenize is called many times and it's better to take the up-front cost of converting to integers first?

@aulemahal
Author

Exactly, the difference is in scale. Let's say we have a 3D (spatial + temporal) array divided into 100 spatial chunks (chunks={'time': -1}). Currently a call to map_blocks will create 100 tasks, and for each task the full time coordinate is tokenized. That's 100 calls to pickle.dumps().
When encoding the time coordinate beforehand, that's only one call to cftime.date2num and 100 calls to tokenize on an integer ndarray. Above I calculated that tokenizing a cftime array is 1000x slower than tokenizing an integer array; a small comparison sketch is below.
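Something along these lines, with the same date2num encoding as above (the exact ratio will depend on the machine):

from dask.base import tokenize
import cftime
import xarray as xr

times = xr.cftime_range('1950-01-01', '2100-01-01', freq='D', calendar='noleap').values
ints = cftime.date2num(times, 'days since 1950-01-01', calendar='noleap')

%timeit tokenize(times)  # falls back to pickling the object array
%timeit tokenize(ints)   # hashes the numeric buffer directly, much faster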

@spencerkclark
Collaborator

Ah...that makes perfect sense now, thanks. Indeed it does seem like the optimization might best take place before cftime is involved. If you can put together a simple example that demonstrates this performance bottleneck it might be interesting to get folks' thoughts in an xarray issue.
