BUG: using dtype='int64' argument of Series causes ValueError: values cannot be losslessly cast to int64 for integer strings #44923

Closed
3 tasks done
5j9 opened this issue Dec 16, 2021 · 2 comments · Fixed by #48333
Labels
Bug; Constructors (Series/DataFrame/Index/pd.array Constructors); Regression (Functionality that used to work in a prior pandas version)

Comments

@5j9
Contributor

5j9 commented Dec 16, 2021

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import pandas as pd
pd.Series(['1', '2'], dtype='int64')

Issue Description

C:\Users\a\AppData\Local\Programs\Python\Python310\lib\site-packages\numpy\core\numeric.py:2446: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  return bool(asarray(a1 == a2).all())
Traceback (most recent call last):
  File "C:\Users\a\AppData\Roaming\JetBrains\PyCharmCE2021.3\scratches\scratch_3.py", line 2, in <module>
    pd.Series(['1', '2'], dtype='int64')
  File "f:\prog\pandas\pandas\core\series.py", line 450, in __init__
    data = sanitize_array(data, index, dtype, copy)
  File "f:\prog\pandas\pandas\core\construction.py", line 583, in sanitize_array
    subarr = _try_cast(data, dtype, copy, raise_cast_failure)
  File "f:\prog\pandas\pandas\core\construction.py", line 768, in _try_cast
    subarr = maybe_cast_to_integer_array(arr, dtype)
  File "f:\prog\pandas\pandas\core\dtypes\cast.py", line 2100, in maybe_cast_to_integer_array
    raise ValueError(f"values cannot be losslessly cast to {dtype}")
ValueError: values cannot be losslessly cast to int64

Expected Behavior

I expect
pd.Series(['1', '2'], dtype='int64')
to work fine and return a Series object with dtype int64, just like how
pd.Series(['1', '2'], dtype='float64')
returns a float64 Series.
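
For comparison, here is a small sketch of conversions that do not go through the failing constructor path. Neither `astype` nor `pd.to_numeric` is mentioned in this report; they are shown only as an assumption about how to get the expected int64 result explicitly, and the variable names are illustrative.

```python
import pandas as pd

# Integer strings held in an object-dtype Series (no dtype conversion yet).
strings = pd.Series(["1", "2"], dtype=object)

# Explicit conversions after construction.
via_astype = strings.astype("int64")
via_to_numeric = pd.to_numeric(strings)

# The float64 constructor case from the report, which already works.
as_float = pd.Series(["1", "2"], dtype="float64")

print(via_astype.dtype, via_to_numeric.dtype, as_float.dtype)
```

The expectation in this report is that passing dtype='int64' to the constructor gives the same values and dtype as `via_astype`.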

Installed Versions

INSTALLED VERSIONS

commit : 39ccb35
python : 3.10.1.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19043
machine : AMD64
processor : Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252
pandas : 0+untagged.1.g39ccb35
numpy : 1.21.4
pytz : 2021.3
dateutil : 2.8.2
pip : 21.3.1
setuptools : 59.5.0
Cython : 0.29.25
pytest : 6.2.5
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 3.0.2
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.0.3
IPython : 7.29.0
pandas_datareader: None
bs4 : 4.10.0
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.5.0
numexpr : None
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

5j9 added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels on Dec 16, 2021
simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue Dec 16, 2021
@simonjayhawkins
Member

Expected Behavior

I expect pd.Series(['1', '2'], dtype='int64') to work fine and return a Series object with dtype int64

This was the result in 1.2.5

first bad commit: [db6e71b] DEPR: silent overflow on Series construction (#41734)

cc @jbrockmendel

Also note that a rogue FutureWarning bubbles up from numpy; this looks like a more long-standing issue (present since at least 0.25.3):

/home/simon/miniconda3/envs/pandas-1.2.5/lib/python3.9/site-packages/numpy/core/numeric.py:2446: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  return bool(asarray(a1 == a2).all())

simonjayhawkins added the Constructors (Series/DataFrame/Index/pd.array Constructors) and Regression (Functionality that used to work in a prior pandas version) labels and removed the Needs Triage (Issue that has not been reviewed by a pandas team member) label on Dec 16, 2021
@jbrockmendel
Member

It looks like the np.array_equal(arr, casted) check in maybe_cast_to_integer_array isn't the appropriate check in this case, since it compares the original strings directly against the casted integers. We may need to cast back to the original dtype before doing the array_equal check.
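
A minimal NumPy-only sketch of that idea, not the pandas implementation: comparing a string array directly against its integer cast never reports equality, while casting the integers back to strings first does. The helper name `cast_integer_strings` is hypothetical, and using `astype(str)` for the round trip is an assumption about what "cast back to the original dtype" could look like.

```python
import warnings

import numpy as np

arr = np.array(["1", "2"])    # integer strings, dtype '<U1'
casted = arr.astype("int64")  # NumPy parses the strings into integers

# Direct comparison of strings against integers: older NumPy emits the
# "elementwise comparison failed" FutureWarning seen in this report.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", FutureWarning)
    direct_equal = bool(np.array_equal(arr, casted))

# Round trip: cast the integers back to strings, then compare.
round_trip_equal = bool((casted.astype(str) == arr).all())


def cast_integer_strings(values, dtype="int64"):
    """Hypothetical helper: cast strings to an integer dtype, checked by round trip."""
    original = np.asarray(values)
    result = original.astype(dtype)
    if not (result.astype(str) == original).all():
        raise ValueError(f"string values cannot be losslessly cast to {dtype}")
    return result


print(direct_equal, round_trip_equal)
print(cast_integer_strings(["1", "2"]))
```

One consequence of a round-trip check like this is that strings such as "01" are rejected, because the integer 1 casts back to "1" rather than "01".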
