feat(python): Parse JSON data in Utf8 to polars dtype #6885

Merged
merged 1 commit into from
Feb 15, 2023

Conversation

Contributor

@josh josh commented Feb 15, 2023

I was looking for a native replacement for a simple apply(json.loads) UDF that also works well on lazy frames. I saw str.json_path_match, but I really wanted a parsed struct (or whatever dtype) back, not a string value.
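For context, the UDF being replaced amounts to calling json.loads on every string, one row at a time, in Python. The sketch below uses only the standard library; the sample data and the `decode_column` helper are made up for illustration and are not part of this PR. It shows that the result is a nested object (what a struct dtype would hold), not a string.

```python
import json

# Hypothetical sample column of JSON-encoded strings, with one null.
raw_column = ['{"a": 1, "b": {"c": "x"}}', '{"a": 2, "b": {"c": "y"}}', None]


def decode_column(values):
    """Row-by-row equivalent of apply(json.loads); nulls pass through."""
    return [None if v is None else json.loads(v) for v in values]


decoded = decode_column(raw_column)

# Each decoded row is a nested dict, so inner fields are reachable by key.
print(decoded[0]["b"]["c"])
```

Every row here goes through the Python interpreter, which is presumably the overhead a native expression is meant to avoid.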

It looks like initial work on this started back in #3413 and was partially exposed in #5140. A private Utf8Chunked.json_extract helper was added, but it was never fully exposed on the Python Series or Expr APIs. This PR exposes it.

The API optionally supports dtype inference on eager frames and series. On a lazy frame the dtype cannot be inferred, so leaving it at the default (unknown) correctly raises an error. Additionally, a nice advantage over apply(json.loads) is that a partial dtype can be supplied to omit struct keys you're not interested in decoding. This seems to be a nice property of using read_ndjson under the hood.
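The two behaviours described above (a partial dtype dropping unwanted keys, and inference being possible only when data is available) can be modelled in plain Python. This is a stdlib-only sketch of the idea, not how Polars implements it: the helper names `decode_with_schema`, `infer_schema`, and `resolve_schema`, the `lazy` flag, and the use of Python types as a stand-in for a struct dtype are all assumptions made for illustration.

```python
import json


def decode_with_schema(values, schema):
    """Decode JSON strings, keeping only the keys named in `schema`.

    `schema` maps key -> Python type (a stand-in for a struct dtype).
    Keys absent from the schema are dropped; missing keys become None.
    """
    out = []
    for v in values:
        if v is None:
            out.append(None)
            continue
        obj = json.loads(v)
        out.append(
            {k: (None if obj.get(k) is None else t(obj[k])) for k, t in schema.items()}
        )
    return out


def infer_schema(values):
    """Eager-style inference: read keys and types off the first non-null row."""
    for v in values:
        if v is not None:
            return {k: type(x) for k, x in json.loads(v).items()}
    return {}


def resolve_schema(values, schema=None, lazy=False):
    """Model of the eager/lazy split: a lazy plan has no data to inspect."""
    if schema is not None:
        return schema
    if lazy:
        raise ValueError("dtype must be given explicitly in lazy mode")
    return infer_schema(values)


rows = [
    '{"id": 1, "name": "a", "extra": [1, 2]}',
    '{"id": 2, "name": "b", "extra": []}',
]

# Partial schema: only "id" is decoded into the result.
partial = decode_with_schema(rows, {"id": int})

# Eager mode: the schema is inferred from the data itself.
inferred = resolve_schema(rows)
```

In this model, calling `resolve_schema(rows, lazy=True)` without a schema raises, mirroring the error described for lazy frames, while passing an explicit schema works in both modes.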

How does the name json_extract sound? That's the existing name of the internal function, so I just went with it. parse_json might have been my first instinct, but I'll defer to the maintainers' preference.

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars labels Feb 15, 2023
@ritchie46
Member

Thanks a lot! That's great functionality indeed. Can you expose the methods in the docs (in alphabetical order)?

@josh
Contributor Author

josh commented Feb 15, 2023

Can you expose the methods in the docs (in alphabetical order)?

✅ Done

@ritchie46 ritchie46 changed the title feat(python): Parse JSON columns feat(python): Parse JSON data in Utf8 to polars dtype Feb 15, 2023
@ritchie46 ritchie46 merged commit 8c119ab into pola-rs:master Feb 15, 2023
@josh josh deleted the json_extract branch February 15, 2023 18:01
Labels
enhancement New feature or an improvement of an existing feature python Related to Python Polars
2 participants