do not materialize rows of spark DataSet until prompted #7393

Closed
scottdraves opened this issue May 21, 2018 · 2 comments

scottdraves commented May 21, 2018

If a user returns a Dataset, the current Spark integration shows a preview of its rows.
This materializes rows, which triggers more work than users expect.
Instead, it should show only the shape of the table (column names and types) plus a button.
If the user clicks the button, it then collects the rows for a preview.

For example, without our magic, Spark prints this by default for tornadoes_2014.csv:

[Date: string, Time: string ... 9 more fields]

We want that, but with all the columns listed instead of "9 more fields".

We could do this with a table that has 0 rows and, in place of the rows, a button that fetches some.
Or we could show the shape as text (one column per line), plus the button.

@scottdraves

If ds is a Dataset, then ds.schema.fields is an array of StructField values, each with a name and a dataType.
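A minimal sketch of how that array could produce the full column listing, assuming only Spark's `StructType` and `StructField` classes. The `SchemaShape` object and its `describe` helper are hypothetical names for illustration, not part of Spark or of this project:

```scala
import org.apache.spark.sql.types.StructType

object SchemaShape {
  // List every column as "name: type", with no "... N more fields" cut-off.
  // Only the schema is read; no rows are collected.
  def describe(schema: StructType): String =
    schema.fields
      .map(f => s"${f.name}: ${f.dataType.typeName}")
      .mkString("[", ", ", "]")
}
```

With a real Dataset this would be called as `SchemaShape.describe(ds.schema)`. Note that `typeName` gives names such as `string` and `integer`; Spark's own default output uses the shorter `simpleString` form, so the two may differ slightly for some types.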

scottdraves commented May 29, 2018

Generate the preview like this:

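// Builds a one-row table mapping each column name to its Spark type name; no rows are collected.
// Note: .toMap does not guarantee that the columns stay in schema order.
def preview(df: org.apache.spark.sql.DataFrame): TableDisplay = {
    new TableDisplay(Seq(df.schema.fields.map { col => col.name -> col.dataType.typeName }.toMap))
}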

Here's an example:
[screenshot: screen shot 2018-05-29 at 12 39 38 pm]

Above the table there should be 2 buttons: one that says "show 10 rows" and one that says "count rows".
If there is any difficulty, skip the "count rows" button; we can address it in another PR.
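One way the state behind those two buttons might look, as a rough sketch rather than the actual implementation. `DeferredPreview` and its members are hypothetical names; the only Spark calls assumed are `DataFrame.schema`, `take`, and `count`. Nothing is collected until a button handler reads one of the lazy values:

```scala
import org.apache.spark.sql.{DataFrame, Row}

// Hypothetical holder for the preview state; not an existing class in this project.
final class DeferredPreview(df: DataFrame, previewRows: Int = 10) {
  // Column names and type names in schema order. Reading the schema collects no rows.
  val shape: Seq[(String, String)] =
    df.schema.fields.toSeq.map(f => f.name -> f.dataType.typeName)

  // Runs only when the "show 10 rows" handler first reads it.
  lazy val firstRows: Array[Row] = df.take(previewRows)

  // Runs only when the "count rows" handler first reads it; this is a full Spark job.
  lazy val rowCount: Long = df.count()
}
```

Keeping the count behind its own lazy value means the "count rows" button can be dropped without touching the rest.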

jaroslawmalekcodete added a commit that referenced this issue Jun 4, 2018
jaroslawmalekcodete added a commit that referenced this issue Jun 6, 2018
scottdraves pushed a commit that referenced this issue Jun 6, 2018
* #7393: add preview for dataset

* #7393: add count button

* #7393: row = 10

* #7393: show a warning "Note: showing a preview of a non-materialized Spark RDD"

* #7393: explicit return types for toJavaMap and toJavaCollection

* #7393: remove count button and change warning

* #7393: keep column order for Spark Datasets