BUG: Raise on parse int overflow #47167 #47168
Conversation
squash! TST: integer overflow on parsing with insufficient user dtype
@github-actions pre-commit
```diff
             result = _try_uint64(self.parser, i, start, end,
                                  na_filter, na_hashset)
             na_count = 0

         if result is not None and dtype != 'int64':
-            result = result.astype(dtype)
+            casted = result.astype(dtype)
+            if (casted == result).all():
```
This looks expensive. Can you run asvs?
Good point, thanks. I thought the same thing when I saw exactly this check being applied for parsing (and, more generally, setting) with the nullable integer extension dtypes:
pandas/core/arrays/integer.py, lines 53 to 55 in 3bf2cb1:

```python
casted = values.astype(dtype, copy=copy)
if (casted == values).all():
    return casted
```
Just to clarify: the additional check is never performed when parsing with an automatically inferred dtype (since TextReader.dtype_cast_order contains only int64 and no other integer dtypes), nor with a user-specified dtype of int64. I just committed another change that prevents unnecessarily running into this check when parsing with a user-specified dtype of uint64.
The check therefore only affects parsing with a user-specified integer dtype narrower than 64 bits. As far as I can see, none of the existing asvs covers this.
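For context, the round-trip check under discussion amounts to the following (a minimal NumPy sketch of the idea, outside the actual Cython parser):

```python
import numpy as np

# Values are first parsed into an int64 buffer.
result = np.array([1, 300, -7], dtype="int64")

# Downcast to the user-requested dtype, then verify nothing was lost:
# under uint8, 300 wraps to 44 and -7 wraps to 249, so the check fails.
casted = result.astype("uint8")
if not (casted == result).all():
    raise OverflowError("values do not fit into the requested dtype")
```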
Just to be sure, I ran some existing asvs that seemed most relevant and found no significant change:

```
before after ratio
[c355145c] [994a6345]
<raise-on-parse-int-overflow~7> <raise-on-parse-int-overflow>
6.97±0.3ms 7.46±0.7ms 1.07 io.csv.ParseDateComparison.time_read_csv_dayfirst(False)
3.43±0.1ms 3.39±0.04ms 0.99 io.csv.ParseDateComparison.time_read_csv_dayfirst(True)
6.93±0.3ms 7.02±0.1ms 1.01 io.csv.ParseDateComparison.time_to_datetime_dayfirst(False)
3.52±0.3ms 3.28±0.08ms 0.93 io.csv.ParseDateComparison.time_to_datetime_dayfirst(True)
17.2±0.2ms 17.3±0.06ms 1.00 io.csv.ParseDateComparison.time_to_datetime_format_DD_MM_YYYY(False)
3.33±0.2ms 3.35±0.5ms 1.01 io.csv.ParseDateComparison.time_to_datetime_format_DD_MM_YYYY(True)
1.74±0.01ms 1.74±0.09ms 1.00 io.csv.ReadCSVCachedParseDates.time_read_csv_cached(False, 'c')
2.82±0.03ms 2.81±0.03ms 0.99 io.csv.ReadCSVCachedParseDates.time_read_csv_cached(False, 'python')
1.77±0.08ms 1.79±0.02ms 1.01 io.csv.ReadCSVCachedParseDates.time_read_csv_cached(True, 'c')
2.85±0.02ms 2.87±0.1ms 1.01 io.csv.ReadCSVCachedParseDates.time_read_csv_cached(True, 'python')
25.8±0.8ms 26.3±0.8ms 1.02 io.csv.ReadCSVCategorical.time_convert_direct('c')
179±3ms 180±4ms 1.01 io.csv.ReadCSVCategorical.time_convert_direct('python')
44.0±1ms 44.4±1ms 1.01 io.csv.ReadCSVCategorical.time_convert_post('c')
169±4ms 168±3ms 0.99 io.csv.ReadCSVCategorical.time_convert_post('python')
21.8±1ms 21.5±1ms 0.99 io.csv.ReadCSVComment.time_comment('c')
22.5±0.5ms 21.8±0.7ms 0.97 io.csv.ReadCSVComment.time_comment('python')
24.9±0.9ms 25.1±2ms 1.01 io.csv.ReadCSVConcatDatetime.time_read_csv
13.1±1ms 12.0±0.3ms 0.92 io.csv.ReadCSVConcatDatetimeBadDateValue.time_read_csv('')
9.37±0.3ms 9.38±0.1ms 1.00 io.csv.ReadCSVConcatDatetimeBadDateValue.time_read_csv('0')
14.7±0.4ms 15.0±1ms 1.02 io.csv.ReadCSVConcatDatetimeBadDateValue.time_read_csv('nan')
86.1±2ms 87.7±2ms 1.02 io.csv.ReadCSVDInferDatetimeFormat.time_read_csv(False, 'custom')
1.55±0.02ms 1.57±0.02ms 1.01 io.csv.ReadCSVDInferDatetimeFormat.time_read_csv(False, 'iso8601')
1.47±0.01ms 1.51±0.04ms 1.03 io.csv.ReadCSVDInferDatetimeFormat.time_read_csv(False, 'ymd')
4.39±0.06ms 4.45±0.1ms 1.01 io.csv.ReadCSVDInferDatetimeFormat.time_read_csv(True, 'custom')
1.80±0.05ms 1.86±0.06ms 1.03 io.csv.ReadCSVDInferDatetimeFormat.time_read_csv(True, 'iso8601')
2.00±0.01ms 2.00±0.02ms 1.00 io.csv.ReadCSVDInferDatetimeFormat.time_read_csv(True, 'ymd')
19.4±0.3ms 19.8±1ms 1.02 io.csv.ReadCSVEngine.time_read_bytescsv('c')
6.25±0.4ms 6.45±0.6ms 1.03 io.csv.ReadCSVEngine.time_read_bytescsv('pyarrow')
303±2ms 301±3ms 0.99 io.csv.ReadCSVEngine.time_read_bytescsv('python')
20.0±0.6ms 19.3±0.5ms 0.96 io.csv.ReadCSVEngine.time_read_stringcsv('c')
7.43±0.5ms 6.93±0.1ms 0.93 io.csv.ReadCSVEngine.time_read_stringcsv('pyarrow')
264±2ms 262±6ms 0.99 io.csv.ReadCSVEngine.time_read_stringcsv('python')
1.33±0.02ms 1.34±0.02ms 1.01 io.csv.ReadCSVFloatPrecision.time_read_csv(',', '.', 'high')
2.23±0.02ms 2.22±0.02ms 1.00 io.csv.ReadCSVFloatPrecision.time_read_csv(',', '.', 'round_trip')
1.35±0.06ms 1.36±0.03ms 1.01 io.csv.ReadCSVFloatPrecision.time_read_csv(',', '.', None)
1.39±0.01ms 1.41±0.01ms 1.01 io.csv.ReadCSVFloatPrecision.time_read_csv(',', '_', 'high')
1.41±0.01ms 1.44±0.04ms 1.02 io.csv.ReadCSVFloatPrecision.time_read_csv(',', '_', 'round_trip')
1.47±0.2ms 1.43±0.02ms 0.98 io.csv.ReadCSVFloatPrecision.time_read_csv(',', '_', None)
1.30±0.02ms 1.33±0.04ms 1.03 io.csv.ReadCSVFloatPrecision.time_read_csv(';', '.', 'high')
2.21±0.06ms 2.25±0.1ms 1.02 io.csv.ReadCSVFloatPrecision.time_read_csv(';', '.', 'round_trip')
1.35±0.07ms 1.33±0.03ms 0.99 io.csv.ReadCSVFloatPrecision.time_read_csv(';', '.', None)
1.42±0.03ms 1.41±0.01ms 1.00 io.csv.ReadCSVFloatPrecision.time_read_csv(';', '_', 'high')
1.41±0.01ms 1.47±0.2ms 1.05 io.csv.ReadCSVFloatPrecision.time_read_csv(';', '_', 'round_trip')
1.40±0.01ms 1.41±0.01ms 1.01 io.csv.ReadCSVFloatPrecision.time_read_csv(';', '_', None)
3.43±0.3ms 3.40±0.2ms 0.99 io.csv.ReadCSVFloatPrecision.time_read_csv_python_engine(',', '.', 'high')
3.45±0.2ms 3.34±0.03ms 0.97 io.csv.ReadCSVFloatPrecision.time_read_csv_python_engine(',', '.', 'round_trip')
3.41±0.03ms 3.42±0.2ms 1.00 io.csv.ReadCSVFloatPrecision.time_read_csv_python_engine(',', '.', None)
2.83±0.03ms 2.82±0.02ms 1.00 io.csv.ReadCSVFloatPrecision.time_read_csv_python_engine(',', '_', 'high')
2.81±0.01ms 2.82±0.03ms 1.00 io.csv.ReadCSVFloatPrecision.time_read_csv_python_engine(',', '_', 'round_trip')
2.83±0.09ms 2.80±0.02ms 0.99 io.csv.ReadCSVFloatPrecision.time_read_csv_python_engine(',', '_', None)
3.35±0.03ms 3.32±0.03ms 0.99 io.csv.ReadCSVFloatPrecision.time_read_csv_python_engine(';', '.', 'high')
3.52±0.2ms 3.55±0.2ms 1.01 io.csv.ReadCSVFloatPrecision.time_read_csv_python_engine(';', '.', 'round_trip')
3.50±0.1ms 3.76±0.2ms 1.08 io.csv.ReadCSVFloatPrecision.time_read_csv_python_engine(';', '.', None)
2.80±0.02ms 2.79±0.02ms 0.99 io.csv.ReadCSVFloatPrecision.time_read_csv_python_engine(';', '_', 'high')
2.84±0.07ms 2.83±0.07ms 1.00 io.csv.ReadCSVFloatPrecision.time_read_csv_python_engine(';', '_', 'round_trip')
2.81±0.03ms 2.81±0.04ms 1.00 io.csv.ReadCSVFloatPrecision.time_read_csv_python_engine(';', '_', None)
9.98±0.07ms 9.88±0.3ms 0.99 io.csv.ReadCSVIndexCol.time_read_csv_index_col
69.0±0.4ms 74.1±0.3ms 1.07 io.csv.ReadCSVMemMapUTF8.time_read_memmapped_utf8
0 0 n/a io.csv.ReadCSVMemoryGrowth.mem_parser_chunks('c')
0 0 n/a io.csv.ReadCSVMemoryGrowth.mem_parser_chunks('python')
1.45±0.01ms 1.56±0.06ms 1.07 io.csv.ReadCSVParseDates.time_baseline('c')
1.58±0.03ms 1.57±0.02ms 0.99 io.csv.ReadCSVParseDates.time_baseline('python')
1.78±0.01ms 1.81±0.06ms 1.01 io.csv.ReadCSVParseDates.time_multiple_date('c')
1.99±0.03ms 2.30±0.3ms ~1.15 io.csv.ReadCSVParseDates.time_multiple_date('python')
2.99±0.03ms 2.93±0.02ms 0.98 io.csv.ReadCSVParseSpecialDate.time_read_special_date('hm', 'c')
13.3±0.1ms 13.2±0.2ms 0.99 io.csv.ReadCSVParseSpecialDate.time_read_special_date('hm', 'python')
7.70±0.2ms 7.54±0.07ms 0.98 io.csv.ReadCSVParseSpecialDate.time_read_special_date('mY', 'c')
40.0±0.8ms 39.9±2ms 1.00 io.csv.ReadCSVParseSpecialDate.time_read_special_date('mY', 'python')
3.41±0.05ms 3.42±0.03ms 1.00 io.csv.ReadCSVParseSpecialDate.time_read_special_date('mdY', 'c')
14.2±0.8ms 13.9±0.6ms 0.98 io.csv.ReadCSVParseSpecialDate.time_read_special_date('mdY', 'python')
10.3±0.1ms 11.0±0.2ms 1.06 io.csv.ReadCSVSkipRows.time_skipprows(10000, 'c')
8.88±0.6ms 8.13±0.3ms 0.92 io.csv.ReadCSVSkipRows.time_skipprows(10000, 'pyarrow')
46.3±1ms 46.4±2ms 1.00 io.csv.ReadCSVSkipRows.time_skipprows(10000, 'python')
14.8±0.4ms 15.0±0.3ms 1.02 io.csv.ReadCSVSkipRows.time_skipprows(None, 'c')
8.15±0.6ms 8.34±0.3ms 1.02 io.csv.ReadCSVSkipRows.time_skipprows(None, 'pyarrow')
68.0±1ms 64.1±2ms 0.94 io.csv.ReadCSVSkipRows.time_skipprows(None, 'python')
14.1±0.3ms 14.5±0.3ms 1.03 io.csv.ReadCSVThousands.time_thousands(',', ',', 'c')
166±4ms 161±3ms 0.97 io.csv.ReadCSVThousands.time_thousands(',', ',', 'python')
11.4±0.2ms 11.6±0.2ms 1.02 io.csv.ReadCSVThousands.time_thousands(',', None, 'c')
58.4±0.9ms 58.0±1ms 0.99 io.csv.ReadCSVThousands.time_thousands(',', None, 'python')
13.4±0.2ms 13.8±0.2ms 1.02 io.csv.ReadCSVThousands.time_thousands('|', ',', 'c')
168±3ms 164±6ms 0.98 io.csv.ReadCSVThousands.time_thousands('|', ',', 'python')
11.3±0.2ms 11.8±0.4ms 1.04 io.csv.ReadCSVThousands.time_thousands('|', None, 'c')
59.2±1ms 57.7±0.9ms 0.97 io.csv.ReadCSVThousands.time_thousands('|', None, 'python')
3.38±0.6ms 3.55±0.2ms 1.05 io.csv.ReadUint64Integers.time_read_uint64
5.53±0.2ms 5.67±0.2ms 1.03 io.csv.ReadUint64Integers.time_read_uint64_na_values
5.62±0.4ms 5.43±0.2ms 0.96 io.csv.ReadUint64Integers.time_read_uint64_neg_values
114±2ms 112±1ms 0.98 io.csv.ToCSV.time_frame('long')
16.5±0.2ms 16.3±0.2ms 0.99 io.csv.ToCSV.time_frame('mixed')
91.3±1ms 94.2±3ms 1.03 io.csv.ToCSV.time_frame('wide')
7.71±0.4ms 7.79±0.3ms 1.01 io.csv.ToCSVDatetime.time_frame_date_formatting
7.15±0.06ms 7.26±0.08ms 1.02 io.csv.ToCSVDatetimeBig.time_frame(1000)
66.7±1ms 66.7±0.6ms 1.00 io.csv.ToCSVDatetimeBig.time_frame(10000)
671±5ms 668±6ms 1.00 io.csv.ToCSVDatetimeBig.time_frame(100000)
380±2ms 379±7ms 1.00 io.csv.ToCSVDatetimeIndex.time_frame_date_formatting_index
142±1ms 142±2ms 1.00 io.csv.ToCSVDatetimeIndex.time_frame_date_no_format_index
711±9ms 715±9ms 1.01 io.csv.ToCSVIndexes.time_head_of_multiindex
721±5ms 718±6ms 1.00 io.csv.ToCSVIndexes.time_multiindex
580±8ms 579±6ms 1.00 io.csv.ToCSVIndexes.time_standard_index
238±5ms 234±4ms 0.98 io.csv.ToCSVMultiIndexUnusedLevels.time_full_frame
21.5±0.1ms 22.1±0.3ms 1.03 io.csv.ToCSVMultiIndexUnusedLevels.time_single_index_frame
22.5±0.5ms 22.6±0.2ms 1.00 io.csv.ToCSVMultiIndexUnusedLevels.time_sliced_frame
798±60ms 747±10ms 0.94 io.excel.ReadExcel.time_read_excel('odf')
196±10ms 180±3ms 0.92 io.excel.ReadExcel.time_read_excel('openpyxl')
43.5±2ms 41.0±1ms 0.94 io.excel.ReadExcel.time_read_excel('xlrd')
439±20ms 420±7ms 0.96 io.excel.WriteExcel.time_write_excel('openpyxl')
233±10ms 230±10ms 0.99 io.excel.WriteExcel.time_write_excel('xlsxwriter')
220±8ms 227±4ms 1.03 io.excel.WriteExcel.time_write_excel('xlwt')
```
I added a simple asv benchmark that triggers the relevant check. It is likewise unaffected:

```
before after ratio
[c355145c] [994a6345]
<raise-on-parse-int-overflow~7> <raise-on-parse-int-overflow>
1.34±0.1ms 1.31±0.04ms 0.98 io.csv.ReadUint8Integers.time_read_uint8
```
A separate timeit on the comparison `(arr1 == arr2).all()` shows that it takes ~5 µs, compared to ~1 ms for the total read_csv.
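For reference, a rough way to reproduce such a measurement (illustrative only; the array size is an assumption, not the benchmark's actual data):

```python
import timeit
import numpy as np

arr1 = np.arange(10_000, dtype="int64")
arr2 = arr1.astype("uint8")  # lossy for values >= 256

# Time only the round-trip comparison that the new code adds.
per_call = timeit.timeit(lambda: (arr1 == arr2).all(), number=1_000) / 1_000
print(f"{per_call * 1e6:.1f} µs per comparison")
```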
Some additional remarks: to run the asvs in io.csv I had to:
- Adapt asv.conf.json to use Python 3.9 instead of 3.8 in order to get build isolation + the conda compilers env working (the same problem as discussed e.g. in https://stackoverflow.com/questions/71340058/conda-does-not-look-for-libpthread-and-libpthread-nonshared-at-the-right-place-w)
- Bump the Cython version to 0.29.30 in asv.conf.json (see `min_cython_ver` in setup.py)
- Fix the existing `ReadCSVIndexCol.time_read_csv_index_col` benchmark (see 61a36a5)
@jbrockmendel should `_dtype_can_hold_range` be used here to check lossless int64 conversion?
`dtype_can_hold_range` is specific to range objects. Looks like we have an ndarray here?
Yes, comparing ndarrays here. I could change it to `np.array_equal` for readability, but apart from some safety boilerplate the same check is performed there: https://github.com/numpy/numpy/blob/50a74fb65fc752e77a2f9e9e2b7227629c2ba953/numpy/core/numeric.py#L2468
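A quick illustration of that equivalence (assuming equal shapes, which holds here):

```python
import numpy as np

result = np.array([1, 300], dtype="int64")
casted = result.astype("uint8")  # 300 wraps to 44

# Both spellings detect the lossy cast the same way.
assert bool((casted == result).all()) == np.array_equal(casted, result)
```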
```python
if user_dtype and dtype == 'uint64':
    do_try_uint64 = True
else:
    try:
```
The pattern here is to `_try_<dtype>`, then if that fails, try another one. Can you just do that here (rather than all of this if/then logic)? E.g. add a `_try_uint64` if needed.
I tried to stay with the pattern `_try_int64` -> if fail -> `_try_uint64`, but shortcut the two cases with `user_dtype` and `dtype in ["int64", "uint64"]`, where we can fail right after the specific `_try_<dtype>`. I added another `_try_uint64` instead of the `do_try_uint64` flag. Is that okay?
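For concreteness, the flow described here looks roughly like this (Python pseudocode; the real code is Cython in parsers.pyx, and the helper names and signatures are assumptions mirroring the discussion):

```python
def _parse_integer_column(parser, i, start, end, user_dtype, dtype):
    # Shortcut: user explicitly asked for uint64, so skip the int64 attempt.
    if user_dtype and dtype == "uint64":
        return _try_uint64(parser, i, start, end)

    result = _try_int64(parser, i, start, end)
    if result is not None:
        return result

    # Shortcut: user explicitly asked for int64, so a uint64 retry cannot help.
    if user_dtype and dtype == "int64":
        raise OverflowError("values do not fit into int64")

    # Inferred dtype: fall back to uint64 for large positive values.
    return _try_uint64(parser, i, start, end)
```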
Hm I agree with @jreback, this is hard to read
I also agree, and added a `_try_uint64` in commit 927ddad like @jreback suggested. Were you also referring to the former version with `do_try_uint64`, @phofl? The latest version is https://github.com/SandroCasagrande/pandas/blob/5896e017ca2960e0d535c8c0a0b9db978377bc91/pandas/_libs/parsers.pyx#L1182-L1198. Sorry if I did not signal clearly that I had made changes and that the newest version should be reviewed again. Can you please have a look, @jreback or @phofl?
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.
I am still interested in this. I just added changes that address requests raised by @jreback and @simonjayhawkins. Local tests run successfully. I decided to add the check for engine="python", as I outlined above, inside the base parser (with the only difference being that the check is also necessary for user dtype uint64). Concerning the question I raised about having an option to check for overflows in

Please reopen and review.
Thanks @mroeschke for reopening. |
I just recognized that the changes in this PR have a good side effect. Up to now, implicit lossy coercion of float to int could occur for some read_csv engine/dtype variants, while other, similar variants already raised errors.
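For illustration, the calls at issue are of roughly this shape (a hypothetical example, not the exact snippets from the original comment):

```python
from io import StringIO
import pandas as pd

data = "a\n2.5\n"

# Pre-PR, depending on the engine and the integer width, a call like this
# could silently truncate 2.5 to 2 instead of raising, while similar
# variants with another engine or dtype already raised.
df = pd.read_csv(StringIO(data), dtype={"a": "int16"}, engine="c")
```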
The changes in this PR make all of the above variants raise an error, so the "c" and "python" engines behave consistently. Two remarks concerning this:
doc/source/whatsnew/v1.5.0.rst (outdated):

```diff
@@ -1134,6 +1134,8 @@ I/O
 - Bug in :func:`read_parquet` with ``use_nullable_dtypes=True`` where ``float64`` dtype was returned instead of nullable ``Float64`` dtype (:issue:`45694`)
 - Bug in :meth:`DataFrame.to_json` where ``PeriodDtype`` would not make the serialization roundtrip when read back with :meth:`read_json` (:issue:`44720`)
 - Bug in :func:`read_xml` when reading XML files with Chinese character tags and would raise ``XMLSyntaxError`` (:issue:`47902`)
+- Bug in :func:`read_csv` with specified (non-extension) integer ``dtype`` can cause silent overflow or unexpected return dtype (:issue:`47167`)
```
Why "non-extension"? If you want to reference numpy dtypes you can say "numpy integer dtype".
Okay, did that. Should I also move it to target v1.6.0 already?
Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen.