subtracting CFTimeIndex can cause pd.TimedeltaIndex to overflow #3535
This happens in combine_by_coords:
import xarray as xr
i1 = xr.cftime_range("4500-12-31", periods=1)
i2 = xr.cftime_range("4600-12-31", periods=1)
i3 = xr.cftime_range("5100-12-31", periods=1)
d1 = xr.DataArray([0], dims=("time", ), coords={"time": ("time", i1)}).to_dataset(name="a")
d2 = xr.DataArray([1], dims=("time", ), coords={"time": ("time", i2)}).to_dataset(name="a")
d3 = xr.DataArray([2], dims=("time", ), coords={"time": ("time", i3)}).to_dataset(name="a")
xr.combine_by_coords([d1, d2, d3]).time
returns:
<xarray.DataArray 'time' (time: 2)>
array([cftime.DatetimeGregorian(4500-12-31 00:00:00),
cftime.DatetimeGregorian(5100-12-31 00:00:00)], dtype=object)
Coordinates:
* time (time) object 4500-12-31 00:00:00 5100-12-31 00:00:00
Note how the 4600-12-31 entry is missing.
Within _infer_concat_order_from_coords (line 98 in 7b4a286), the concatenation order is computed like this:
import pandas as pd
indexes = [i1, i2, i3]
# ascending is True here, because every index is monotonically increasing
ascending = True
# the code from _infer_concat_order_from_coords
first_items = pd.Index([index.take([0]) for index in indexes])
series = first_items.to_series()
rank = series.rank(method="dense", ascending=ascending)
order = rank.astype(int).values - 1
order
# array([0, 1, 1])
This causes the second item to be dropped.
Thanks for raising this issue @mathause. In hindsight this does not surprise me. pandas's strict use of nanosecond-resolution datetimes and timedeltas was part of the motivation for the …
Perhaps a more robust (yet more complex) solution for #2484 would be to write a version of a …
Regarding the … (line 91 in 56c16e4):
It appears that if we just select the first value of each index (i.e. a …):
first_items = pd.Index([index[0] for index in indexes])
pandas's …
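A minimal sketch of why scalar first values should rank cleanly, under stated assumptions: standard-library datetime.datetime objects stand in for cftime.DatetimeGregorian (both are ordinary comparable Python objects for these dates), and dense_rank_order is a hypothetical helper that mimics rank(method="dense") followed by the "- 1" step. It is not xarray's code. Distinct first values get distinct positions, while a tie collapses two positions into one, which is what drops a dataset.

```python
from datetime import datetime


def dense_rank_order(values, ascending=True):
    """Hypothetical helper: 0-based dense ranks, like
    series.rank(method="dense").astype(int).values - 1 in pandas."""
    # Equal values share a rank, and ranks have no gaps.
    unique_sorted = sorted(set(values), reverse=not ascending)
    position = {value: i for i, value in enumerate(unique_sorted)}
    return [position[value] for value in values]


# Scalar first values of i1, i2, i3 (stand-ins for cftime objects).
first_items = [
    datetime(4500, 12, 31),
    datetime(4600, 12, 31),
    datetime(5100, 12, 31),
]
print(dense_rank_order(first_items))  # [0, 1, 2]

# If the second and third items compared equal, they would share a
# position, giving the [0, 1, 1] order seen above.
tied = [datetime(4500, 12, 31), datetime(4600, 12, 31), datetime(4600, 12, 31)]
print(dense_rank_order(tied))  # [0, 1, 1]
```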
MCVE Code Sample
Expected Output
a timedelta
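For comparison, the expected kind of result can be sketched with the standard library, whose datetime.timedelta spans up to 999,999,999 days. The dates below are assumptions borrowed from the combine_by_coords example above, because the original code sample is not shown. Python's datetime uses the proleptic Gregorian calendar, which agrees with cftime's default calendar for dates after 1582.

```python
from datetime import datetime, timedelta

# Stand-ins for the first dates of i1 and i3; after 1582 the proleptic
# Gregorian calendar of datetime matches cftime's default calendar.
start = datetime(4500, 12, 31)
end = datetime(5100, 12, 31)

difference = end - start  # a plain datetime.timedelta, no overflow
print(type(difference).__name__, difference.days)

# timedelta itself is limited to +/- 999,999,999 days.
print(timedelta.max.days)  # 999999999
```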
Problem Description
Instead, the subtraction raises
OverflowError: Python int too large to convert to C long
Originally I stumbled upon this when trying to open files from a long simulation (piControl) with open_mfdataset. I have not yet figured out where this subtraction happens in open_mfdataset (opening the single files and using xr.concat works).
The offending lines are here: xarray/xarray/coding/cftimeindex.py, line 433 in 40588dc.
Ultimately this is probably a pandas problem, as it tries to convert datetime.timedelta(days=803532) to '<m8[ns]'. pd.TimedeltaIndex has an (undocumented) dtype argument, but I was not able to make anything else work (e.g. '<m8[D]').
@spencerkclark
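The arithmetic behind the overflow, as a plain-Python sketch: '<m8[ns]' stores a signed 64-bit count of nanoseconds, so the largest representable span is 106,751 whole days (roughly 292 years), and datetime.timedelta(days=803532) needs about 7.5 times that. The helpers to_nanoseconds and fits_in_timedelta64_ns are hypothetical illustrations, not pandas or xarray functions.

```python
from datetime import timedelta

INT64_MAX = 2**63 - 1        # largest signed 64-bit integer
NS_PER_DAY = 86_400 * 10**9  # nanoseconds in one day


def to_nanoseconds(td):
    """Exact integer nanoseconds in a datetime.timedelta."""
    # timedelta // timedelta gives an int; microseconds is the finest unit.
    return (td // timedelta(microseconds=1)) * 1_000


def fits_in_timedelta64_ns(td):
    """Hypothetical check: can td be stored as '<m8[ns]' without overflow?
    The minimum int64 is excluded because numpy reserves it for NaT."""
    return -INT64_MAX <= to_nanoseconds(td) <= INT64_MAX


span = timedelta(days=803532)
print(to_nanoseconds(span))          # 69425164800000000000
print(INT64_MAX // NS_PER_DAY)       # 106751
print(fits_in_timedelta64_ns(span))  # False
```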
Output of xr.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 21:52:21)
[GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.12.14-lp151.28.25-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.2
xarray: 0.14.0+44.g4dce93f1
pandas: 0.25.2
numpy: 1.17.3
scipy: 1.3.1
netCDF4: 1.5.0.1
pydap: None
h5netcdf: 0.7.4
h5py: 2.9.0
Nio: None
zarr: None
cftime: 1.0.4.2
nc_time_axis: 1.2.0
PseudoNetCDF: None
rasterio: 1.0.22
cfgrib: None
iris: None
bottleneck: 1.2.1
dask: 2.6.0
distributed: 2.6.0
matplotlib: 3.1.1
cartopy: 0.17.0
seaborn: 0.9.0
numbagg: None
setuptools: 41.4.0
pip: 19.3.1
conda: None
pytest: 5.2.2
IPython: 7.9.0
sphinx: 2.2.1