json round trip exception #3867

Closed
hayd opened this issue Jun 12, 2013 · 13 comments

Comments

@hayd
Contributor

hayd commented Jun 12, 2013

This CSV (from the baseball database) reads fine into a DataFrame and converts fine to JSON, but reading the JSON back raises a TypeError:

In [6]: df = pd.read_csv('https://raw.github.com/hayd/lahman2012/master/csvs/Teams.csv')

In [7]: s = df.to_json()

In [8]: pd.read_json(s)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-ebde42cd0695> in <module>()
----> 1 pd.read_json(s)

/Users/234BroadWalk/pandas/pandas/io/json.pyc in read_json(path_or_buf, orient, typ, dtype, numpy, parse_dates, keep_default_dates)
    158     obj = None
    159     if typ == 'frame':
--> 160         obj = FrameParser(json, orient, dtype, numpy, parse_dates, keep_default_dates).parse()
    161
    162     if typ == 'series' or obj is None:

/Users/234BroadWalk/pandas/pandas/io/json.pyc in parse(self)
    185
    186     def parse(self):
--> 187         self._parse()
    188         if self.obj is not None:
    189             self._convert_axes()

/Users/234BroadWalk/pandas/pandas/io/json.pyc in _parse(self)
    284             try:
    285                 if orient == "columns":
--> 286                     args = loads(json, dtype=dtype, numpy=True, labelled=True)
    287                     if args:
    288                         args = (args[0].T, args[2], args[1])

TypeError: long() argument must be a string or a number, not 'NoneType'

cc #3804

@jreback
Contributor

jreback commented Jun 12, 2013

was a bug, but ran into another feature/bug

here's my new test:

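import pandas as pd
from pandas.testing import assert_frame_equal

df = pd.read_csv('https://raw.github.com/hayd/lahman2012/master/csvs/Teams.csv')
s = df.to_json()
result = pd.read_json(s)
# JSON object keys come back as strings, so restore the integer index
result.index = result.index.astype(int)
# restore the original row and column order before comparing
result = result.reindex(columns=df.columns, index=df.index)
assert_frame_equal(result, df)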

so, I am not sure json guarantees order?
and should I try to do automatic index conversion on other types (I am doing it on datetimes now)?

@hayd
Contributor Author

hayd commented Jun 12, 2013

Guess it's not so surprising, python dictionaries don't... (I don't think?). Quite a big file to test against!

Not sure, what were you thinking?

@jreback
Contributor

jreback commented Jun 12, 2013

I think @cpcloud had sort of the same problem in html; he added the infer_types kw. I am doing that for dates now. I mean it's not hard to do a soft conversion, e.g. no forcing.

@cpcloud
Member

cpcloud commented Jun 12, 2013

do all valid json objects have a total ordering in python? if they do why not guarantee ordering, unless of course that goes against json spec...

python dicts don't because there are hashable objects that don't define an ordering, e.g. complex numbers and custom objects, among other reasons
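
On the ordering question: the JSON RFCs describe an object as an unordered collection of name/value pairs, so a parser is not obliged to preserve key order. Python's standard `json` module can be asked to keep document order explicitly. A minimal sketch, assuming only the standard library (the sample text is made up for illustration):

```python
import json
from collections import OrderedDict

text = '{"b": 1, "a": 2, "c": 3}'

# object_pairs_hook receives the (key, value) pairs in document order,
# so an OrderedDict built from them keeps that order.
parsed = json.loads(text, object_pairs_hook=OrderedDict)
keys = list(parsed.keys())
print(keys)
```

This only shows that order can be carried through on the Python side; whether the writer of the JSON emitted the keys in a meaningful order is a separate matter.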

@hayd
Contributor Author

hayd commented Jun 12, 2013

Hmmm, different bug?

In [5]: pd.read_json('[{"a": 1, "b": 2}, {"b":2, "a" :1}]')
Out[5]:
   0  1
a  1  2
b  2  1
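
For comparison, here is a stdlib-only sketch of what key-aligned parsing of that input would give: each record becomes a row and values are matched by key rather than by position. `records_to_columns` is a hypothetical helper written for this illustration, not pandas' parser.

```python
import json


def records_to_columns(text):
    """Hypothetical helper: turn a JSON list of records into {column: [values]}."""
    records = json.loads(text)
    # Collect column names in first-seen order across all records.
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    # Look every value up by key; a key missing from a record becomes None.
    return {col: [record.get(col) for record in records] for col in columns}


table = records_to_columns('[{"a": 1, "b": 2}, {"b": 2, "a": 1}]')
print(table)
```

With key alignment both rows carry a=1 and b=2, regardless of the order in which each record lists its keys.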

@jreback
Contributor

jreback commented Jun 12, 2013

which one is more useful to round-trip exactly?

import numpy as np
from pandas import DataFrame

biggie = DataFrame(np.zeros((200, 4)),
                   columns=[str(i) for i in range(4)],
                   index=[str(i) for i in range(200)])
biggie2 = DataFrame(np.zeros((200, 4)),
                    columns=range(4),
                    index=range(200))

@jreback
Contributor

jreback commented Jun 12, 2013

@cpcloud any thoughts?

@cpcloud
Member

cpcloud commented Jun 12, 2013

roundtrip doesn't look like it can be invertible...they both json'd the same because of json's rules about keys in objects (must be string).
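
The key rule is easy to see with the standard `json` module alone. A minimal sketch (the two small dicts stand in for the string-labelled and integer-labelled frames above):

```python
import json

str_keys = {"0": 0.0, "1": 0.0}
int_keys = {0: 0.0, 1: 0.0}

# Integer keys are coerced to strings on encoding, so both dicts
# serialize to the same JSON text.
encoded_str = json.dumps(str_keys)
encoded_int = json.dumps(int_keys)

# Decoding can only give back string keys; the original key type is lost.
decoded = json.loads(encoded_int)
print(encoded_str, encoded_int, list(decoded))
```

Since the encoded text is identical, a reader has to pick one interpretation of the labels or be told which one to use.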

@jreback
Contributor

jreback commented Jun 12, 2013

I am going to set up some options so that the second will round-trip by default, hence convert_axes=True,
while the first will work if you pass convert_axes=False.
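
The trade-off can be sketched without pandas. `convert_axis` below is a hypothetical helper written for illustration, not the code from the eventual fix; it assumes the labels arrive as strings (as JSON object keys always do) and only converts them when every label parses as an integer.

```python
def convert_axis(labels, convert_axes=True):
    """Hypothetical soft conversion of string axis labels decoded from JSON."""
    if not convert_axes:
        # Caller wants the labels exactly as they appeared in the JSON.
        return list(labels)
    try:
        return [int(label) for label in labels]
    except ValueError:
        # Soft conversion: if any label is not an integer, leave all untouched.
        return list(labels)


decoded = ["0", "1", "2"]
as_ints = convert_axis(decoded)                      # integer-labelled frame round-trips
as_strs = convert_axis(decoded, convert_axes=False)  # string-labelled frame round-trips
mixed = convert_axis(["0", "x"])                     # not all integers, so nothing is forced
print(as_ints, as_strs, mixed)
```

Defaulting the flag to True favours the integer-labelled case, which matches the choice described above.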

@cpcloud
Member

cpcloud commented Jun 12, 2013

this might present a problem for nested json, no? that's a different beast though so for "frame/series-able" json that's probably ok

@jreback
Contributor

jreback commented Jun 12, 2013

conversion is done at the end,
so it should work

@jreback
Contributor

jreback commented Jun 13, 2013

fixed by #3876

@jreback
Contributor

jreback commented Jun 13, 2013

closing this as incorporated in #3876
