Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-6990] [Build] Add Java linting script; fix minor warnings #9867

Closed
wants to merge 38 commits into from

Conversation

dskrvk
Copy link
Contributor

@dskrvk dskrvk commented Nov 20, 2015

This replaces #9696

Invoke Checkstyle and print any errors to the console, failing the step.
Use Google's style rules modified according to
https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide
Some important checks are disabled (see TODOs in checkstyle.xml) due to
multiple violations being present in the codebase.

Suggest fixing those TODOs in a separate PR(s).

More on Checkstyle can be found on the official website.

Sample output (from build 46345) (duplicated because I run the build twice with different profiles):

Checkstyle checks failed at following occurrences:
[ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/UnsafeRowParquetRecordReader.java:217,7 MissingSwitchDefault: switch without "default" clause.
[ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java:198,10 ModifierOrder: 'protected' modifier out of order with the JLS suggestions.
[ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/UnsafeRowParquetRecordReader.java:217,7 MissingSwitchDefault: switch without "default" clause.
[ERROR] src/main/java/org/apache/spark/sql/execution/datasources/parquet/SpecificParquetRecordReaderBase.java:198,10 ModifierOrder: 'protected' modifier out of order with the JLS suggestions.
[error] running /home/jenkins/workspace/SparkPullRequestBuilder@2/dev/lint-java ; received return code 1

Also fix some of the minor violations that didn't require sweeping changes.

Apologies for the previous botched PRs - I finally figured out the issue.

cr: @JoshRosen, @pwendell

I state that the contribution is my original work, and I license the work to the project under the project's open source license.

Closing '>' and method name shouldn't have whitespace between them, according
to http://checkstyle.sourceforge.net/config_whitespace.html#GenericWhitespace
Invoke Checkstyle and print any errors to the console, failing the step.
Use Google's style rules modified according to
https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide
Some important checks are disabled (see TODOs in checkstyle.xml) due to
multiple violations being present in the codebase.
The code was copied from a third-party source and needs to be in sync with
that, so we shouldn't make our own modifications to it.
The file contains some style violations, so suppress the checks.
The code was copied from a third-party source and needs to be in sync with
that, so we shouldn't make our own modifications to it.
The file contains some style violations, so suppress the checks.
The code was copied from a third-party source and needs to be in sync with
that, so we shouldn't make our own modifications to it.
The file contains some style violations, so suppress the checks.
Checks fails in UnsafeRowParquetRecordReader.java. Let's enable the check in a
separate change.
The code was copied from a third-party source and needs to be in sync with
that, so we shouldn't make our own modifications to it.
The file contains some style violations, so suppress the checks.
Checks fails in UnsafeRowParquetRecordReader.java. Let's enable the check in a
separate change.
@JoshRosen
Copy link
Contributor

Jenkins, this is ok to test.

@SparkQA
Copy link

SparkQA commented Nov 20, 2015

Test build #46433 has finished for PR 9867 at commit fd6d0e0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * public class JavaLDAExample\n * public abstract static class PrefixComputer\n

@SparkQA
Copy link

SparkQA commented Nov 21, 2015

Test build #46482 has finished for PR 9867 at commit 7a49ad7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * public abstract static class PrefixComputer\n * abstract class Aggregator[-I, B, O] extends Serializable\n

@dskrvk
Copy link
Contributor Author

dskrvk commented Nov 21, 2015

Added some more commits so that new changes are in line with the style guide.

@dskrvk
Copy link
Contributor Author

dskrvk commented Nov 24, 2015

Ping? I've just merged the latest changes locally and verified the checks still pass.

@rxin
Copy link
Contributor

rxin commented Nov 25, 2015

I haven't looked at it super closely yet but I think this is definitely good to have!

@SparkQA
Copy link

SparkQA commented Nov 25, 2015

Test build #46692 has finished for PR 9867 at commit c773e90.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * public abstract static class PrefixComputer\n

@dskrvk
Copy link
Contributor Author

dskrvk commented Dec 2, 2015

Thanks @rxin. Would appreciate a "Ship it" on this (unless there are issues). Don't mean to whine, but the longer we wait, the bigger this PR becomes as I have to fix any new code that doesn't pass the checks. Right now the latest merge from upstream passes successfully.

Added some more details in the description.

@JoshRosen
Copy link
Contributor

Jenkins, retest this please.

@JoshRosen
Copy link
Contributor

Hey @dskrvk, sorry to let this slip through the cracks. I'm going to shepherd this today to try to get it merged.

SPARK_ROOT_DIR="$(dirname $SCRIPT_DIR)"

$SCRIPT_DIR/../build/mvn -Pkinesis-asl -Phive -Phive-thriftserver checkstyle:check > checkstyle.txt
$SCRIPT_DIR/../build/mvn -Pkinesis-asl -Pyarn -Phadoop-2.2 checkstyle:check >> checkstyle.txt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick question: why do you need to run twice with different profiles? AFAIK the set of source files should be the same under all of the Hadoop profiles, so I don't think we need to set -Phadoop-2.2 here.

Why can't we just use one Maven run with the profiles -Pkinesis-asl -Phive -Phive-thriftserver -Pyarn?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't actually observe any differences between the two profiles in terms of Checkstyle warnings, but decided to add the second run just to be thorough.

My reasoning was that since some profiles omit some of the modules, we need to exercise all of the possible ones, even though at the moment the set of Java sources may be the same. In any case, this only adds a few seconds to the build - negligible compared to the overall run-tests time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should be able to have a single run with "-Pkinesis-asl -Pyarn -Phive -Phive-thriftserver" - I even think "-Phive" is unnecessary, I think it only affects packaging right now.

"-Phadoop2.2" is unnecessary, that's the default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point; changed.

@JoshRosen
Copy link
Contributor

Changes look good to me and ready to merge today; my only question concerns why we need to run Checkstyle twice with different sets of profiles.

@SparkQA
Copy link

SparkQA commented Dec 2, 2015

Test build #47074 has finished for PR 9867 at commit c773e90.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * public abstract static class PrefixComputer\n

@vanzin
Copy link
Contributor

vanzin commented Dec 3, 2015

Minor, but there's a typo in the title: "mix" -> "fix".

@dskrvk dskrvk changed the title [SPARK-6990] [Build] Add Java linting script; mix minor warnings [SPARK-6990] [Build] Add Java linting script; fix minor warnings Dec 4, 2015
Add -Pyarn to the first run so that all modules are covered at once. No need to
execute checkstyle twice any more.
@SparkQA
Copy link

SparkQA commented Dec 4, 2015

Test build #47183 has finished for PR 9867 at commit b079f29.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * public abstract static class PrefixComputer\n

@dskrvk
Copy link
Contributor Author

dskrvk commented Dec 4, 2015

Jenkins, retest this please.

@vanzin
Copy link
Contributor

vanzin commented Dec 4, 2015

retest this please

@SparkQA
Copy link

SparkQA commented Dec 4, 2015

Test build #47201 has finished for PR 9867 at commit b079f29.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):\n * public abstract static class PrefixComputer\n

@JoshRosen
Copy link
Contributor

LGTM, so I'm going to merge this into master. Thanks for being so patient, @dskrvk!

@asfgit asfgit closed this in d0d8222 Dec 4, 2015
@JoshRosen
Copy link
Contributor

Hey @dskrvk, what's your Apache JIRA username? I need it in order to assign the JIRA to you so that you are properly credit by our release-notes generation script.

@dskrvk
Copy link
Contributor Author

dskrvk commented Dec 4, 2015

Hey @JoshRosen, thanks, my handle is dskrvk there as well.

cloud-fan added a commit that referenced this pull request Oct 17, 2024
…ating many Scala List objects for deep expression trees

### What changes were proposed in this pull request?
In some use cases with deep expression trees, the driver's heap shows many `scala.collection.immutable.$colon$colon` objects from the heap. The objects are allocated due to deep recursion in the `gatherCommutative` method which uses `flatmap` recursively. Each invocation of `flatmap` creates a new temporary Scala collection. Our claim is based on the following stack trace (>1K lines) of a thread in the driver below, truncated here for brevity:
```
"HiveServer2-Background-Pool: Thread-9867" #9867 daemon prio=5 os_prio=0 tid=0x00007f35080bf000 nid=0x33e7 runnable [0x00007f3393372000]
   java.lang.Thread.State: RUNNABLE
   	at scala.collection.immutable.List$Appender$1.apply(List.scala:350)
	at scala.collection.immutable.List$Appender$1.apply(List.scala:341)
	at scala.collection.immutable.List.flatMap(List.scala:431)
	at org.apache.spark.sql.catalyst.expressions.CommutativeExpression.gatherCommutative(Expression.scala:1479)
	at org.apache.spark.sql.catalyst.expressions.CommutativeExpression.$anonfun$gatherCommutative$1(Expression.scala:1479)
	at org.apache.spark.sql.catalyst.expressions.CommutativeExpression$$Lambda$5280/143713747.apply(Unknown Source)
	at scala.collection.immutable.List.flatMap(List.scala:366)
....
	at org.apache.spark.sql.catalyst.expressions.CommutativeExpression.gatherCommutative(Expression.scala:1479)
	at org.apache.spark.sql.catalyst.expressions.CommutativeExpression.$anonfun$gatherCommutative$1(Expression.scala:1479)
	at org.apache.spark.sql.catalyst.expressions.CommutativeExpression$$Lambda$5280/143713747.apply(Unknown Source)
	at scala.collection.immutable.List.flatMap(List.scala:366)
....
```

This PR fixes the issue by using a stack-based iterative computation, completely avoiding the creation of temporary Scala objects.

### Why are the changes needed?

Reduce heap usage of the driver

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests, refactor

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #48481 from utkarsh39/SPARK-49977.

Lead-authored-by: Utkarsh <utkarsh.agarwal@databricks.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants