Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

INT: the json C code should not deal with blocks #27164

Closed
jorisvandenbossche opened this issue Jul 1, 2019 · 4 comments · Fixed by #41081
Closed

INT: the json C code should not deal with blocks #27164

jorisvandenbossche opened this issue Jul 1, 2019 · 4 comments · Fixed by #41081
Labels
Internals Related to non-user accessible pandas implementation IO JSON read_json, to_json, json_normalize Refactor Internal refactoring of code
Milestone

Comments

@jorisvandenbossche
Copy link
Member

PR #26409 introduced a get_block_values to get the values of the Block, used in https://github.com/pandas-dev/pandas/blob/master/pandas/_libs/src/ujson/python/objToJSON.c

Ideally, the JSON C code should not deal with blocks (eg by converting to a numpy array before passing the data to the C code).

So to do item for this issue: update the JSON code so the call from objToJSON.c to get_block_values can be removed.

@jorisvandenbossche jorisvandenbossche added Internals Related to non-user accessible pandas implementation IO JSON read_json, to_json, json_normalize labels Jul 1, 2019
@jorisvandenbossche jorisvandenbossche added this to the Contributions Welcome milestone Jul 1, 2019
@WillAyd
Copy link
Member

WillAyd commented Jul 1, 2019

Took a look and I think getting rid of this really simplifies the code. Have one failure I haven't seen yet and need to check for memory leaks but will post a PR for you to review

@jbrockmendel
Copy link
Member

+1 on this idea. IIRC there was a discussion a few months ago about trying to get internals out of the ujson code and the conclusion was "wait and see if we can use the arrow parser instead". Since the latter is a non-starter for the moment, making this code more maintainable would be great.

@jbrockmendel
Copy link
Member

just noticed that get_block_values is just an alias for get_values. Any reason not to just use get_values (until block usage can be removed entirely)?

@mroeschke mroeschke added the Refactor Internal refactoring of code label Apr 27, 2020
@jorisvandenbossche
Copy link
Member Author

Experiments of @WillAyd related to this: #27166, #32343

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Internals Related to non-user accessible pandas implementation IO JSON read_json, to_json, json_normalize Refactor Internal refactoring of code
Projects
None yet
5 participants