Add method to convert a data frame to a JSON string #94

Status: Open. Wants to merge 5 commits into base: main.

6 changes: 6 additions & 0 deletions .gitignore
@@ -7,6 +7,12 @@
.fleet/
*.iml

# VS Code specific
.bloop
.metals
metals.sbt
.vscode
Comment on lines +11 to +14

Owner: These can be excluded.

Owner: I want to keep the build light.


# SBT specific
.bsp/
coverage.xml
@@ -155,4 +155,9 @@ class Writeable private[polars] (ptr: Long) {
writeMode = _mode
)
}

def toJsonString(pretty: Boolean, rowOriented: Boolean): String = json(ptr, pretty, rowOriented)

def toJsonBytes(pretty: Boolean, rowOriented: Boolean): Array[Byte] = jsonBytes(ptr, pretty, rowOriented)
Comment on lines +159 to +161

Owner: Three things here:

  1. Writeable contains methods strictly for writing to stores, not for returning values. If we want to return JSON as a string, it should be materialized somewhere else, perhaps via Series or something similar.
  2. Instead of toJsonString and toJsonBytes, I suggest adding two methods, json and ndJson (newline-delimited JSON), with signatures similar to those of avro and the other writers.
  3. Any additional options for a writable must come from options or option. This is already in place; it just needs to be used on the Rust side.
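For point 2, the only difference between json and ndJson is framing: json wraps all row objects in a single array, while ndJson writes one object per line. The dependency-free Rust sketch below illustrates this on a toy frame; the Cell and Row types and the to_json_array and to_ndjson helpers are hypothetical and are not Polars or scala-polars APIs.

```rust
/// A single cell in a toy frame (hypothetical; not a Polars type).
#[derive(Debug)]
enum Cell {
    Int(i64),
    Str(String),
}

/// One row: (column name, value) pairs in column order (hypothetical).
type Row = Vec<(String, Cell)>;

/// Quote and escape a string as a JSON string literal.
fn quote(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Render one cell as a JSON value.
fn cell_to_json(cell: &Cell) -> String {
    match cell {
        Cell::Int(v) => v.to_string(),
        Cell::Str(s) => quote(s),
    }
}

/// Render one row as a JSON object: {"col":value,...}.
fn row_to_object(row: &Row) -> String {
    let fields: Vec<String> = row
        .iter()
        .map(|(name, cell)| format!("{}:{}", quote(name), cell_to_json(cell)))
        .collect();
    format!("{{{}}}", fields.join(","))
}

/// "json" framing: every row object inside a single JSON array.
fn to_json_array(rows: &[Row]) -> String {
    let objects: Vec<String> = rows.iter().map(row_to_object).collect();
    format!("[{}]", objects.join(","))
}

/// "ndJson" framing: one row object per line, each terminated by '\n'.
fn to_ndjson(rows: &[Row]) -> String {
    rows.iter().map(|r| row_to_object(r) + "\n").collect()
}
```

On the native side, Polars' JsonWriter already exposes both framings through JsonFormat::Json and JsonFormat::JsonLines, so the two Scala methods could plausibly map onto those variants.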

Owner: For point 1, another option could be df.rows.asJsonString and df.rows.asJsonBytes.


}
@@ -30,4 +30,8 @@ private[polars] object write extends Natively {
writeMode: String
): Unit

@native def json(ptr: Long, pretty: Boolean, rowOriented: Boolean): String

@native def jsonBytes(ptr: Long, pretty: Boolean, rowOriented: Boolean): Array[Byte]

Comment on lines +33 to +36

Owner: This needs changes as per the suggestions above.

}
47 changes: 47 additions & 0 deletions examples/src/main/scala/examples/scala/io/Json.scala
@@ -0,0 +1,47 @@
package examples.scala.io
Owner: A separate example might not be needed for every format, but it would be great if this could be converted into a set of unit tests instead. Current coverage is 0, and improving it is the next thing on my agenda :)


import org.polars.scala.polars.Polars
import org.polars.scala.polars.api.DataFrame

import examples.scala.utils.CommonUtils

/** Polars supports exporting the contents of a [[DataFrame]] to JSON.
*
* It supports two formats:
* - a row-oriented format, which represents the frame as an array of objects whose keys are
* the column names and whose values are the row’s corresponding values.
* - a column-oriented format, which represents the frame as an array of objects, each containing a
* column name, type, and the array of column values.
*
* The column-oriented format may be pretty-printed. The row-oriented format is less efficient,
* but may be more convenient for downstream applications.
*/
object Json {

def main(args: Array[String]): Unit = {

val path = CommonUtils.getResource("/files/web-ds/data.csv")
val df: DataFrame = Polars.csv.scan(path).collect

println("Showing CSV file as a DataFrame to stdout.")
df.show()

println("Showing the DataFrame as column-oriented JSON.")
val colOriented = df.write().toJsonString(pretty = false, rowOriented = false)
println(colOriented)

println("Showing the DataFrame as pretty-printed column-oriented JSON.")
val prettyColOriented = df.write().toJsonString(pretty = true, rowOriented = false)
println(prettyColOriented)

// The pretty flag has no effect on row-oriented output.
println("Showing the DataFrame as row-oriented JSON.")
val rowOriented = df.write().toJsonString(pretty = false, rowOriented = true)
println(rowOriented)

println("Showing the DataFrame as pretty-printed column-oriented JSON, decoded from bytes.")
val prettyColOrientedBytes = df.write().toJsonBytes(pretty = true, rowOriented = false)
println(new String(prettyColOrientedBytes, "UTF-8"))
}

}
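The two layouts described in the Scaladoc above can be contrasted with a small, dependency-free Rust sketch. The Frame type is hypothetical and only holds i64 columns, and the column-oriented shape is simplified (Polars' own serde output also records each column's type), so treat it as an illustration rather than the exact Polars format.

```rust
/// A toy column-major frame of i64 columns (hypothetical; not a Polars type).
/// Assumes all columns have the same length and that column names need no JSON escaping.
struct Frame {
    columns: Vec<(String, Vec<i64>)>,
}

impl Frame {
    /// Number of rows, taken from the first column (0 for an empty frame).
    fn height(&self) -> usize {
        self.columns.first().map_or(0, |(_, v)| v.len())
    }

    /// Row-oriented: an array with one object per row, keyed by column name.
    fn to_row_json(&self) -> String {
        let rows: Vec<String> = (0..self.height())
            .map(|i| {
                let fields: Vec<String> = self
                    .columns
                    .iter()
                    .map(|(name, vals)| format!("\"{}\":{}", name, vals[i]))
                    .collect();
                format!("{{{}}}", fields.join(","))
            })
            .collect();
        format!("[{}]", rows.join(","))
    }

    /// Column-oriented (simplified): one object per column with its name and all values.
    fn to_column_json(&self) -> String {
        let cols: Vec<String> = self
            .columns
            .iter()
            .map(|(name, vals)| {
                let vs: Vec<String> = vals.iter().map(|v| v.to_string()).collect();
                format!("{{\"name\":\"{}\",\"values\":[{}]}}", name, vs.join(","))
            })
            .collect();
        format!("{{\"columns\":[{}]}}", cols.join(","))
    }
}
```

The row-oriented output repeats every column name once per row, which is one reason it tends to be larger than the column-oriented form.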
61 changes: 61 additions & 0 deletions native/src/internal_jni/io/write/json.rs
@@ -0,0 +1,61 @@
#![allow(non_snake_case)]
Owner: This whole thing needs changes as per the suggestions above.

Owner: You can follow the same structure as parquet.rs in the same directory. For inspiration, see write_json and write_ndjson.

use jni::objects::{JObject, JPrimitiveArray};
use jni::sys::{jboolean, jlong, jstring};
use jni::JNIEnv;
use jni_fn::jni_fn;
use polars::prelude::*;

use crate::j_data_frame::JDataFrame;

// Serializes the frame and returns it to the JVM as a java.lang.String.
#[jni_fn("org.polars.scala.polars.internal.jni.io.write$")]
pub fn json(
env: JNIEnv,
_object: JObject,
df_ptr: jlong,
pretty: jboolean,
row_oriented: jboolean,
) -> jstring {
let buf = json_bytes(df_ptr, pretty, row_oriented);
// Both serde_json and JsonWriter emit UTF-8, so this conversion should not fail.
let rust_string = String::from_utf8(buf).expect("JSON output was not valid UTF-8");

let output = env
.new_string(rust_string)
.expect("Couldn't create Java string!");

output.into_raw()
}

#[jni_fn("org.polars.scala.polars.internal.jni.io.write$")]
pub fn jsonBytes<'a>(
env: JNIEnv<'a>,
_object: JObject,
df_ptr: jlong,
pretty: jboolean,
row_oriented: jboolean,
) -> JPrimitiveArray<'a, i8> {
let buf = json_bytes(df_ptr, pretty, row_oriented);
env.byte_array_from_slice(&buf).unwrap()
}

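// Shared helper: serializes the DataFrame behind df_ptr into a JSON byte buffer.
// Row-oriented output ignores `pretty`; column-oriented output goes through serde.
fn json_bytes(df_ptr: jlong, pretty: jboolean, row_oriented: jboolean) -> Vec<u8> {
let j_df = unsafe { &mut *(df_ptr as *mut JDataFrame) };
// Work on a clone of the frame (cheap, since Polars columns are reference-counted).
let mut data_frame = j_df.to_owned().df;
// Rechunk into a single contiguous chunk before serializing.
let df = data_frame.as_single_chunk_par();

let mut buf: Vec<u8> = Vec::new();
match (pretty == 1, row_oriented == 1) {
// Row-oriented: an array of {column: value} objects.
(_, true) => JsonWriter::new(&mut buf)
.with_json_format(JsonFormat::Json)
.finish(df),
// Column-oriented: serde representation of the frame, optionally pretty-printed.
(true, false) => serde_json::to_writer_pretty(&mut buf, &*df)
.map_err(|e| polars_err!(ComputeError: "{e}")),
(false, false) => {
serde_json::to_writer(&mut buf, &*df).map_err(|e| polars_err!(ComputeError: "{e}"))
},
}
.expect("Unable to format JSON");

buf
}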
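Point 3 of the review asks for writer options to drive the output instead of extra boolean JNI parameters. Below is a minimal sketch of how json_bytes could pick its format, assuming the options reach Rust as a string map (standing in for however parquet.rs receives them); the json_format key, its accepted values, and the Format enum are hypothetical (Format mirrors Polars' JsonFormat but is not that type).

```rust
use std::collections::HashMap;

/// Output framing for the JSON writer (hypothetical stand-in for polars' JsonFormat).
#[derive(Debug, PartialEq, Clone, Copy)]
enum Format {
    Json,
    NdJson,
}

/// Resolve the output format from writer options rather than boolean arguments.
/// The "json_format" key and its accepted values are assumptions for illustration.
fn format_from_options(options: &HashMap<String, String>) -> Result<Format, String> {
    match options.get("json_format").map(|s| s.to_ascii_lowercase()) {
        // Default to a plain JSON array when the option is absent.
        None => Ok(Format::Json),
        Some(v) if v == "json" => Ok(Format::Json),
        Some(v) if v == "ndjson" || v == "jsonlines" => Ok(Format::NdJson),
        Some(other) => Err(format!("unsupported json_format: {other}")),
    }
}
```

Returning a Result rather than calling expect leaves the caller free to turn a bad option into a Java exception instead of panicking across the JNI boundary.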
1 change: 1 addition & 0 deletions native/src/internal_jni/io/write/mod.rs
@@ -1,3 +1,4 @@
pub mod avro;
pub mod ipc;
pub mod json;
pub mod parquet;