BUG: to_json failing on PyPy #40525

mzeitlin11 · 2021-03-20T00:20:37Z

closes BUG: AttributeError: 'BlockManager' object has no attribute 'is_mixed_type' when trying to jsonify dataframe. #39837
Ensure all linting tests pass, see here for how to run them
whatsnew entry

Not sure how best to test this - confirmed this now works when run on PyPy, but since we don't test against PyPy the examples in the issue (or any other simple case) which fail on PyPy will still pass on master.

Benchmarks look similar, might be minor performance improvement for the single block case:

before           after         ratio
     [b524462e]       [b1f4bcc8]
     <master>         <bug/is_mixed_type>
          194±5ms          190±2ms     0.98  io.json.ToJSONISO.time_iso_format('columns')
          208±5ms          198±6ms     0.95  io.json.ToJSONISO.time_iso_format('index')
          186±4ms          173±3ms     0.93  io.json.ToJSONISO.time_iso_format('records')
         193±10ms          187±3ms     0.97  io.json.ToJSONISO.time_iso_format('split')
          164±4ms          165±3ms     1.00  io.json.ToJSONISO.time_iso_format('values')
          115±4ms          114±3ms     0.99  io.json.ToJSONLines.time_delta_int_tstamp_lines
          142±4ms          132±6ms     0.93  io.json.ToJSONLines.time_float_int_lines
          140±4ms          142±4ms     1.02  io.json.ToJSONLines.time_float_int_str_lines
          152±2ms          153±3ms     1.01  io.json.ToJSONLines.time_float_longint_str_lines
          110±3ms          107±3ms     0.97  io.json.ToJSONLines.time_floats_with_dt_index_lines
          112±3ms          104±3ms     0.93  io.json.ToJSONLines.time_floats_with_int_idex_lines
          54.3M            55M         1.01  io.json.ToJSONMem.peakmem_float
          55.4M               55.2M    1.00  io.json.ToJSONMem.peakmem_int
          88.6±2ms         78.8±2ms    0.89  io.json.ToJSON.time_to_json('index', 'df')
          70.7±2ms         61.7±2ms    0.87  io.json.ToJSON.time_to_json('values', 'df')
          94.9±3ms         81.4±1ms    0.86  io.json.ToJSON.time_to_json('index', 'df_date_idx')
          76.3±0.9ms       65.1±1ms    0.85  io.json.ToJSON.time_to_json('records', 'df_date_idx')

doc/source/whatsnew/v1.2.4.rst

…o bug/is_mixed_type

jbrockmendel · 2021-03-22T03:57:34Z

Looks harmless cc @WillAyd

jreback · 2021-03-22T13:00:37Z

looks fine and backport ok.

WillAyd · 2021-03-22T20:56:33Z

pandas/_libs/src/ujson/python/objToJSON.c

@@ -144,6 +144,7 @@ typedef struct __PyObjectEncoder {
 enum PANDAS_FORMAT { SPLIT, RECORDS, INDEX, COLUMNS, VALUES };

 int PdBlock_iterNext(JSOBJ, JSONTypeContext *);
+static Py_ssize_t get_attr_length(PyObject *obj, char *attr);


Rather than forward declaring this can you just swap the position the of get_attr_length and is_simple_frame functions?

Will make that change

WillAyd · 2021-03-22T21:00:06Z

pandas/_libs/src/ujson/python/objToJSON.c

        return 0;
    }
+    int ret = (get_attr_length(mgr, "blocks") <= 1);


Hmm I'm not really sure about this change. The problem with how this code is currently set up is that when is_simple_frame is True then we will just serialize whatever .values is for that DataFrame. However, the rules for extension arrays are a bit different.

Do we have test coverage for DataFrames that only contain 1 block of an extension array? If not I'd be worried this could change things that we aren't testing currently

My motivation here was to replace the is_mixed_type property the same way it was handled in #40525 (so theoretically this should revert behavior to pre #40525).

Will look into your question and see if that case is covered

Didn't see any coverage, have added some simple tests for the one ExtensionBlock case (which match 1.1.5 behavior). Let me know if this is what you had in mind!

WillAyd · 2021-03-23T01:22:49Z

Thanks for the updates. Actually reading through the history a little more I understand the length check that you've added - I didn't realize that was the case before in internals

The test isn't really what I was looking for but let's just go ahead and remove that to get this in. For context, the extension arrays I were worried about are likely commented here:

pandas/pandas/_libs/src/ujson/python/objToJSON.c

Line 222 in ced764d

// The special cases to worry about are dt64tz and category[dt64tz].

We've had a rather strange history with .values and what actually gets read in the JSON code. At this point it appears pretty much everything uses that though. If you are interested in contributing more could definitely use help cleaning up the comments and code around that...

But yea for now let's just remove the test and merge this in

mzeitlin11 · 2021-03-23T01:31:59Z

Removed the test. Thanks for pointing that out, I see what you mean now

WillAyd

lgtm when green

jreback · 2021-03-23T15:18:01Z

thanks @mzeitlin11

jreback · 2021-03-23T15:18:13Z

@meeseeksdev backport 1.2.x

Co-authored-by: Matthew Zeitlin <37011898+mzeitlin11@users.noreply.github.com>

mzeitlin11 added 2 commits March 19, 2021 19:15

WIP

197a2f1

BUG: to_json failing on PyPy

b1f4bcc

mzeitlin11 added IO JSON read_json, to_json, json_normalize Regression Functionality that used to work in a prior pandas version labels Mar 20, 2021

mzeitlin11 commented Mar 20, 2021

View reviewed changes

doc/source/whatsnew/v1.2.4.rst Outdated Show resolved Hide resolved

mzeitlin11 and others added 3 commits March 20, 2021 11:14

Update doc/source/whatsnew/v1.2.4.rst

fc59ec4

Merge remote-tracking branch 'origin/master' into bug/is_mixed_type

16d13d0

Merge branch 'bug/is_mixed_type' of github.com:/mzeitlin11/pandas int…

0900f23

…o bug/is_mixed_type

jreback added this to the 1.2.4 milestone Mar 22, 2021

WillAyd requested changes Mar 22, 2021

View reviewed changes

mzeitlin11 added 3 commits March 22, 2021 19:09

Merge remote-tracking branch 'origin/master' into bug/is_mixed_type

f606811

Reorder functions to remove forward dec

6582051

Add simple extension array tests

51f37e0

Revert test addition

9af4785

WillAyd approved these changes Mar 23, 2021

View reviewed changes

jreback merged commit 1dab473 into pandas-dev:master Mar 23, 2021

mzeitlin11 deleted the bug/is_mixed_type branch March 23, 2021 15:19

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Mar 23, 2021

Backport PR pandas-dev#40525: BUG: to_json failing on PyPy

28a242d

meeseeksmachine mentioned this pull request Mar 23, 2021

Backport PR #40525 on branch 1.2.x (BUG: to_json failing on PyPy) #40590

Merged

simonjayhawkins pushed a commit that referenced this pull request Mar 25, 2021

Backport PR #40525: BUG: to_json failing on PyPy (#40590)

97f9a1e

Co-authored-by: Matthew Zeitlin <37011898+mzeitlin11@users.noreply.github.com>

vladu pushed a commit to vladu/pandas that referenced this pull request Apr 5, 2021

BUG: to_json failing on PyPy (pandas-dev#40525)

0906d35

JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021

BUG: to_json failing on PyPy (pandas-dev#40525)

85402c9

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: to_json failing on PyPy #40525

BUG: to_json failing on PyPy #40525

mzeitlin11 commented Mar 20, 2021

jbrockmendel commented Mar 22, 2021

jreback commented Mar 22, 2021

WillAyd Mar 22, 2021

mzeitlin11 Mar 22, 2021

WillAyd Mar 22, 2021

mzeitlin11 Mar 22, 2021

mzeitlin11 Mar 23, 2021

WillAyd commented Mar 23, 2021

mzeitlin11 commented Mar 23, 2021

WillAyd left a comment

jreback commented Mar 23, 2021

jreback commented Mar 23, 2021

BUG: to_json failing on PyPy #40525

BUG: to_json failing on PyPy #40525

Conversation

mzeitlin11 commented Mar 20, 2021

jbrockmendel commented Mar 22, 2021

jreback commented Mar 22, 2021

WillAyd Mar 22, 2021

Choose a reason for hiding this comment

mzeitlin11 Mar 22, 2021

Choose a reason for hiding this comment

WillAyd Mar 22, 2021

Choose a reason for hiding this comment

mzeitlin11 Mar 22, 2021

Choose a reason for hiding this comment

mzeitlin11 Mar 23, 2021

Choose a reason for hiding this comment

WillAyd commented Mar 23, 2021

mzeitlin11 commented Mar 23, 2021

WillAyd left a comment

Choose a reason for hiding this comment

jreback commented Mar 23, 2021

jreback commented Mar 23, 2021