Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add method to convert a data frame to a JSON string #94

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

mtomko
Copy link

@mtomko mtomko commented Apr 2, 2024

Adds support for getting data out of a DataFrame as JSON.

The current scala-polars bindings appear to support file I/O, but I couldn't find a good way to load data in to a DataFrame, run some transformations, and then get the results back out for programmatic access. We'd be interested in working on ways to export columns of data as Scala Vector or Array data types as well, but this gets us started.

I welcome suggestions, as I'm a novice when it comes both to Polars and Rust.

@mtomko
Copy link
Author

mtomko commented Apr 2, 2024

PS, we'd probably also love to have the JSON come back as a byte array and skip the string conversion

@chitralverma
Copy link
Owner

let me check this out and get back to you by Tuesday

@tmgreen
Copy link

tmgreen commented May 13, 2024

Thanks for looking into this, @chitralverma . Are there any updates yet? My team is considering using/contributing to this project but are wondering if anyone is still actively working on it.

Copy link
Owner

@chitralverma chitralverma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My apologies for the late review, I was traveling and busy with other things at work.
I've added the review comments now.

Comment on lines +11 to +14
.bloop
.metals
metals.sbt
.vscode
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these can be excluded

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wanna keep the build light.

Comment on lines +159 to +161
def toJsonString(pretty: Boolean, rowOriented: Boolean): String = json(ptr, pretty, rowOriented)

def toJsonBytes(pretty: Boolean, rowOriented: Boolean): Array[Byte] = jsonBytes(ptr, pretty, rowOriented)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

three things here,

  1. Writeable contains things strictly for writing to stores, not for returning, so if we want to return json as string, there should be some other place to materialize this, maybe via Series or something.
  2. Instead of toJsonString and toJsonBytes, I suggest adding 2 methods - json and ndJson (new line separated json). signature will look similar to that of avro etc.
  3. any kind of additional options for a writable must come from options or option this is already in place, just need to use this on rust side.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for point 1, another option can be via df.rows.asJsonString, df.rows.asJsonBytes

Comment on lines +33 to +36
@native def json(ptr: Long, pretty: Boolean, rowOriented: Boolean): String

@native def jsonBytes(ptr: Long, pretty: Boolean, rowOriented: Boolean): Array[Byte]

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs changes as per the suggestions above.

@@ -0,0 +1,47 @@
package examples.scala.io
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a separate example might not be needed for every format, but it will be great if this can be converted to a bunch of unit tests instead. current coverage is 0 and is the next thing on my agenda :)

@@ -0,0 +1,61 @@
#![allow(non_snake_case)]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this whole thing needs changes as per the suggestions above.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can follow the same file contents as parquet.rs on same level.

For inspiration you can see write_json and write_ndjson

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants