-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add method to convert a data frame to a JSON string #94
base: main
Are you sure you want to change the base?
Conversation
PS, we'd probably also love to have the JSON come back as a byte array and skip the string conversion |
let me check this out and get back to you by Tuesday |
Thanks for looking into this, @chitralverma . Are there any updates yet? My team is considering using/contributing to this project but are wondering if anyone is still actively working on it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My apologies for the late review, I was traveling and busy with other things at work.
I've added the review comments now.
.bloop | ||
.metals | ||
metals.sbt | ||
.vscode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these can be excluded
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wanna keep the build light.
def toJsonString(pretty: Boolean, rowOriented: Boolean): String = json(ptr, pretty, rowOriented) | ||
|
||
def toJsonBytes(pretty: Boolean, rowOriented: Boolean): Array[Byte] = jsonBytes(ptr, pretty, rowOriented) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
three things here,
- Writeable contains things strictly for writing to stores, not for returning, so if we want to return json as string, there should be some other place to materialize this, maybe via
Series
or something. - Instead of toJsonString and toJsonBytes, I suggest adding 2 methods - json and ndJson (new line separated json). signature will look similar to that of avro etc.
- any kind of additional options for a writable must come from
options
oroption
this is already in place, just need to use this on rust side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for point 1, another option can be via df.rows.asJsonString
, df.rows.asJsonBytes
@native def json(ptr: Long, pretty: Boolean, rowOriented: Boolean): String | ||
|
||
@native def jsonBytes(ptr: Long, pretty: Boolean, rowOriented: Boolean): Array[Byte] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs changes as per the suggestions above.
@@ -0,0 +1,47 @@ | |||
package examples.scala.io |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a separate example might not be needed for every format, but it will be great if this can be converted to a bunch of unit tests instead. current coverage is 0 and is the next thing on my agenda :)
@@ -0,0 +1,61 @@ | |||
#![allow(non_snake_case)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this whole thing needs changes as per the suggestions above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can follow the same file contents as parquet.rs
on same level.
For inspiration you can see write_json and write_ndjson
Adds support for getting data out of a
DataFrame
as JSON.The current
scala-polars
bindings appear to support file I/O, but I couldn't find a good way to load data in to aDataFrame
, run some transformations, and then get the results back out for programmatic access. We'd be interested in working on ways to export columns of data as ScalaVector
orArray
data types as well, but this gets us started.I welcome suggestions, as I'm a novice when it comes both to Polars and Rust.