BUG: to_json - prevent various segfault conditions (GH14256) #17857
doc/source/whatsnew/v0.21.0.txt
Outdated
@@ -940,3 +940,5 @@ Other
^^^^^
- Bug where some inplace operators were not being wrapped and produced a copy when invoked (:issue:`12962`)
- Bug in :func:`eval` where the ``inplace`` parameter was being incorrectly handled (:issue:`16732`)
- Bug in :func:`to_json` where several conditions (including objects with unprintable symbols, objects with deep recursion, overlong labels) caused segfaults instead of raising the appropriate exception (:issue:`14256`)
This is an I/O bug, so add it under that sub-section.
Buffer_Realloc((__enc), (__len)); \
}

void Buffer_Realloc(JSONObjectEncoder *enc, size_t cbNeeded);
Where do you implement this?
It's already in ultrajsonenc.c. No changes, just exposed in the header to be usable from objToJSON.c
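For readers unfamiliar with the reserve-then-write pattern behind the macro and `Buffer_Realloc` in the diff above, here is a minimal Python sketch of the idea. It is a toy model only: `GrowableBuffer`, `reserve`, `_realloc`, and `write` are hypothetical names chosen for illustration, the doubling strategy is an assumption, and none of this is the actual C code in ultrajsonenc.c or objToJSON.c.

```python
class GrowableBuffer:
    """Toy model of an encoder output buffer with explicit reservation.

    Hypothetical illustration; not the ujson/pandas implementation.
    """

    def __init__(self, capacity=8):
        self.capacity = capacity   # bytes currently "allocated"
        self.data = bytearray()    # bytes written so far

    def reserve(self, needed):
        """Make sure `needed` more bytes fit before anything is written."""
        if self.capacity - len(self.data) < needed:
            self._realloc(needed)

    def _realloc(self, needed):
        # Assumed growth policy: keep doubling until the request fits.
        new_capacity = max(self.capacity, 1) * 2
        while new_capacity < len(self.data) + needed:
            new_capacity *= 2
        self.capacity = new_capacity

    def write(self, chunk):
        # Every write path reserves first; skipping this step is the kind
        # of mistake that lets a buffer run past its reserved space.
        self.reserve(len(chunk))
        assert len(self.data) + len(chunk) <= self.capacity
        self.data += chunk


buf = GrowableBuffer(capacity=8)
buf.write(b'{"A":')                      # fits in the initial capacity
buf.write(b'"' + b"bar" * 10 + b'"}')    # forces the buffer to grow
print(buf.capacity, len(buf.data))
```

The point of the sketch is that any code path appending to the buffer has to go through the reservation check, which is presumably why the function needed to be visible from a second source file.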
pandas/tests/io/json/test_pandas.py
Outdated
assert df_printable.to_json() == '{"A":{"0":"%s"}}' % hexed
df_nonprintable = DataFrame({'A': [binthing]})
pytest.raises(exc_type, df_nonprintable.to_json)
# GH14256: failing column caused segfaults, if it is not the last one
Better to reference this at the top of the function definition.
pandas/tests/io/json/test_pandas.py
Outdated
'{"A":{"0":"%s"},"B":{"0":1}}' % hexed

def test_label_overflow(self):
    df = pd.DataFrame({'foo': [1337], 'bar' * 100000: [1]})
Reference the issue number above this line.
Codecov Report
@@ Coverage Diff @@
## master #17857 +/- ##
==========================================
- Coverage 91.22% 91.2% -0.02%
==========================================
Files 163 163
Lines 50069 50038 -31
==========================================
- Hits 45673 45639 -34
- Misses 4396 4399 +3
Codecov Report
@@ Coverage Diff @@
## master #17857 +/- ##
==========================================
+ Coverage 91.23% 91.24% +<.01%
==========================================
Files 163 163
Lines 50075 50075
==========================================
+ Hits 45688 45691 +3
+ Misses 4387 4384 -3
Thank you for the comments. I moved the test comments and the whatsnew entry, and pushed the changes.
lgtm. Just some minor formatting comments on the tests.
pandas/tests/io/json/test_pandas.py
Outdated
hexed = '574b4454ba8c5eb4f98a8f45'
exc_type = OverflowError
binthing = BinaryThing(hexed)
df_printable = DataFrame({'A': [binthing.hexed]})
Before each sub-section, put a comment on what you are testing, and leave blank lines between sub-sections.
pandas/tests/io/json/test_pandas.py
Outdated
pytest.raises(exc_type, df_nonprintable.to_json)
df_mixed = DataFrame({'A': [binthing], 'B': [1]},
                     columns=['A', 'B'])
pytest.raises(exc_type, df_mixed.to_json)
Use the `with` version to test raising, i.e. `with pytest.raises(...):`.
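As an illustration of the context-manager form requested here, a minimal sketch follows. `encode_label` and its `max_len` limit are hypothetical stand-ins so the example runs without pandas; the real tests in this PR call `DataFrame.to_json` instead.

```python
import pytest


def encode_label(label, max_len=1024):
    """Hypothetical stand-in for a serializer that rejects overlong labels."""
    if len(label) > max_len:
        raise OverflowError("label too long to encode")
    return '"%s"' % label


def test_label_overflow():
    # Short labels encode normally.
    assert encode_label("foo") == '"foo"'

    # Overlong labels must raise instead of crashing. The exception type
    # is written inline rather than stored in a separate variable.
    with pytest.raises(OverflowError):
        encode_label("bar" * 100000)
```

Compared with the call form `pytest.raises(exc_type, func)`, the `with` block keeps the failing call visible as ordinary code and makes it easy to see exactly which statement is expected to raise.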
pandas/tests/io/json/test_pandas.py
Outdated
return self.hexed

hexed = '574b4454ba8c5eb4f98a8f45'
exc_type = OverflowError
don't define this separately, just inline the exceptions you are checking
@matthiashuschle also pls rebase on master. some CI things were updated to make circleci work with the new version of mpl.
thanks, I just incorporated your suggestions.
thanks @matthiashuschle nice patch!
* upstream/master: (76 commits)
  CategoricalDtype construction: actually use fastpath (pandas-dev#17891)
  DEPR: Deprecate tupleize_cols in to_csv (pandas-dev#17877)
  BUG: Fix wrong column selection in drop_duplicates when duplicate column names (pandas-dev#17879)
  DOC: Adding examples to update docstring (pandas-dev#16812) (pandas-dev#17859)
  TST: Skip if no openpyxl in test_excel (pandas-dev#17883)
  TST: Catch read_html slow test warning (pandas-dev#17874)
  flake8 cleanup (pandas-dev#17873)
  TST: remove moar warnings (pandas-dev#17872)
  ENH: tolerance now takes list-like argument for reindex and get_indexer. (pandas-dev#17367)
  ERR: Raise ValueError when week is passed in to_datetime format witho… (pandas-dev#17819)
  TST: remove some deprecation warnings (pandas-dev#17870)
  Refactor index-as-string groupby tests and fix spurious warning (Bug 17383) (pandas-dev#17843)
  BUG: merging with a boolean/int categorical column (pandas-dev#17841)
  DEPR: Deprecate read_csv arguments fully (pandas-dev#17865)
  BUG: to_json - prevent various segfault conditions (GH14256) (pandas-dev#17857)
  CLN: Use pandas.core.common for None checks (pandas-dev#17816)
  BUG: set tz on DTI from fixed format HDFStore (pandas-dev#17844)
  RLS: v0.21.0rc1
  Whatsnew cleanup (pandas-dev#17858)
  DEPR: Deprecate the convert parameter completely (pandas-dev#17831)
  ...
git diff upstream/master -u -- "*.py" | flake8 --diff
There were several sources for the JSON string buffer at enc->start exceeding the reserved space: