Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-26191][SQL] Control truncation of Spark plans via maxFields parameter #23159

Closed
wants to merge 9 commits into from

Conversation

MaxGekk
Copy link
Member

@MaxGekk MaxGekk commented Nov 27, 2018

What changes were proposed in this pull request?

In the PR, I propose to add maxFields parameter to all functions involved in creation of textual representation of spark plans such as simpleString and verboseString. New parameter restricts number of fields converted to truncated strings. Any elements beyond the limit will be dropped and replaced by a "... N more fields" placeholder. The threshold is bumped up to Int.MaxValue for toFile().

How was this patch tested?

Added a test to QueryExecutionSuite which checks maxFields impacts on number of truncated fields in LocalRelation.

@MaxGekk
Copy link
Member Author

MaxGekk commented Nov 27, 2018

ping @hvanhovell

@SparkQA
Copy link

SparkQA commented Nov 28, 2018

Test build #99342 has finished for PR 23159 at commit e9e7893.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Copy link
Member Author

MaxGekk commented Nov 28, 2018

@gatorsmile @cloud-fan Could you look at the changes - extracted from another PR: #22429

@SparkQA
Copy link

SparkQA commented Nov 30, 2018

Test build #99513 has finished for PR 23159 at commit bf968be.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Nov 30, 2018

Test build #99520 has finished for PR 23159 at commit a5e82f7.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Copy link
Member Author

MaxGekk commented Nov 30, 2018

jenkins, retest this, please

@SparkQA
Copy link

SparkQA commented Dec 1, 2018

Test build #99527 has finished for PR 23159 at commit a5e82f7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Copy link
Member Author

MaxGekk commented Dec 4, 2018

@HyukjinKwon @dongjoon-hyun @srowen @zsxwing Do you have any objections of this PR?

@srowen
Copy link
Member

srowen commented Dec 4, 2018

Rather than change every single call to this method, if this should generally be the value of the argument, then why not make it the default value or something?

@SparkQA
Copy link

SparkQA commented Dec 4, 2018

Test build #99678 has finished for PR 23159 at commit b6fa959.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Copy link
Member Author

MaxGekk commented Dec 4, 2018

Rather than change every single call to this method, if this should generally be the value of the argument, then why not make it the default value or something?

New parameter aims to solve the problem when there are multiple callers, and each of them needs different maximum fields. So, a feasible approach is to propagate maxFields from callers to truncatedString. Changing global SQL config does not solve the problem.

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK so not every call would just pass the value SQLConf.get.maxToStringFields? It looked like it from the code here but I didn't examine the whole diff. If it were really always SQLConf.get.maxToStringFields and configured that way this could be simpler, but I suppose it isn't.

@@ -1777,7 +1777,7 @@ class Analyzer(

case p if p.expressions.exists(hasGenerator) =>
throw new AnalysisException("Generators are not supported outside the SELECT clause, but " +
"got: " + p.simpleString)
"got: " + p.simpleString((SQLConf.get.maxToStringFields)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: are there extra parens here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed here and other places

@SparkQA
Copy link

SparkQA commented Dec 5, 2018

Test build #99715 has finished for PR 23159 at commit e0aa626.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Copy link
Member Author

MaxGekk commented Dec 5, 2018

jenkins, retest this, please

@SparkQA
Copy link

SparkQA commented Dec 5, 2018

Test build #99735 has finished for PR 23159 at commit e0aa626.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dongjoon-hyun
Copy link
Member

cc @cloud-fan and @gatorsmile .

@HyukjinKwon
Copy link
Member

Looks okay but let me leave it to @cloud-fan, @gatorsmile and @hvanhovell

@MaxGekk
Copy link
Member Author

MaxGekk commented Dec 11, 2018

@hvanhovell We all are waiting for your decision ;-). Please, review the PR.

@SparkQA
Copy link

SparkQA commented Dec 21, 2018

Test build #100349 has finished for PR 23159 at commit 9acdf9d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@MaxGekk
Copy link
Member Author

MaxGekk commented Dec 22, 2018

I am going to close the PR since nobody interested in it :-(

@hvanhovell
Copy link
Contributor

@MaxGekk can you resolve the merge conflicts?

@hvanhovell
Copy link
Contributor

LGTM

@SparkQA
Copy link

SparkQA commented Dec 24, 2018

Test build #100425 has finished for PR 23159 at commit c6ad275.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • public class ShuffleMetrics implements MetricSet
  • sealed abstract class UserDefinedFunction

@hvanhovell
Copy link
Contributor

merging to master.

@asfgit asfgit closed this in a1c1dd3 Dec 27, 2018
holdenk pushed a commit to holdenk/spark that referenced this pull request Jan 5, 2019
…rameter

## What changes were proposed in this pull request?

In the PR, I propose to add `maxFields` parameter to all functions involved in creation of textual representation of spark plans such as `simpleString` and `verboseString`. New parameter restricts number of fields converted to truncated strings. Any elements beyond the limit will be dropped and replaced by a `"... N more fields"` placeholder. The threshold is bumped up to `Int.MaxValue` for `toFile()`.

## How was this patch tested?

Added a test to `QueryExecutionSuite` which checks `maxFields` impacts on number of truncated fields in `LocalRelation`.

Closes apache#23159 from MaxGekk/to-file-max-fields.

Lead-authored-by: Maxim Gekk <max.gekk@gmail.com>
Co-authored-by: Maxim Gekk <maxim.gekk@databricks.com>
Signed-off-by: Herman van Hovell <hvanhovell@databricks.com>
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019
…rameter

## What changes were proposed in this pull request?

In the PR, I propose to add `maxFields` parameter to all functions involved in creation of textual representation of spark plans such as `simpleString` and `verboseString`. New parameter restricts number of fields converted to truncated strings. Any elements beyond the limit will be dropped and replaced by a `"... N more fields"` placeholder. The threshold is bumped up to `Int.MaxValue` for `toFile()`.

## How was this patch tested?

Added a test to `QueryExecutionSuite` which checks `maxFields` impacts on number of truncated fields in `LocalRelation`.

Closes apache#23159 from MaxGekk/to-file-max-fields.

Lead-authored-by: Maxim Gekk <max.gekk@gmail.com>
Co-authored-by: Maxim Gekk <maxim.gekk@databricks.com>
Signed-off-by: Herman van Hovell <hvanhovell@databricks.com>
@MaxGekk MaxGekk deleted the to-file-max-fields branch August 17, 2019 13:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants