[SPARK-6990] [Build] Add Java linting script; fix minor warnings #9867
Conversation
Closing '>' and method name shouldn't have whitespace between them, according to http://checkstyle.sourceforge.net/config_whitespace.html#GenericWhitespace
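For illustration, a hedged sketch (not code from the PR) of the pattern the GenericWhitespace check flags — whitespace between the closing `>` of an explicit type argument and the method name:

```java
import java.util.Collections;
import java.util.List;

public class GenericWhitespaceExample {
    static List<String> bad() {
        // Non-compliant per GenericWhitespace: a space sits between the
        // closing '>' of the explicit type argument and the method name.
        return Collections.<String> emptyList();
    }

    static List<String> good() {
        // Compliant: the closing '>' is flush against the method name.
        return Collections.<String>emptyList();
    }

    public static void main(String[] args) {
        // Both compile and behave identically; only the style differs.
        System.out.println(bad().equals(good()));
    }
}
```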
Invoke Checkstyle and print any errors to the console, failing the step. Use Google's style rules modified according to https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide Some important checks are disabled (see TODOs in checkstyle.xml) due to multiple violations being present in the codebase.
The code was copied from a third-party source and needs to be in sync with that, so we shouldn't make our own modifications to it. The file contains some style violations, so suppress the checks.
This makes sure all case statements end with a break. See http://checkstyle.sourceforge.net/config_coding.html#FallThrough
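As a hedged sketch (not code from the PR), the kind of switch statement the FallThrough check cares about — an unterminated case is reported unless a relief comment marks the fall-through as intentional:

```java
public class FallThroughExample {
    // Intentional fall-through: case 1 continues into case 2. Checkstyle's
    // FallThrough check would report this unless the relief comment below
    // is present.
    static int classifyLenient(int code) {
        int result = 0;
        switch (code) {
            case 1:
                result += 1;
                // fall through
            case 2:
                result += 2;
                break;
            default:
                result = -1;
                break;
        }
        return result;
    }

    // Compliant without any relief comment: every case ends with a break.
    static int classifyStrict(int code) {
        int result;
        switch (code) {
            case 1:
                result = 1;
                break;
            case 2:
                result = 2;
                break;
            default:
                result = -1;
                break;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(classifyLenient(1)); // falls through: 1 + 2
        System.out.println(classifyStrict(1));
    }
}
```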
The check fails in UnsafeRowParquetRecordReader.java; let's enable it in a separate change.
Jenkins, this is ok to test.
Test build #46433 has finished for PR 9867 at commit
Test build #46482 has finished for PR 9867 at commit
Added some more commits so that the new changes are in line with the style guide.
Ping? I've just merged the latest changes locally and verified the checks still pass.
I haven't looked at it super closely yet, but I think this is definitely good to have!
Test build #46692 has finished for PR 9867 at commit
Thanks @rxin. Would appreciate a "Ship it" on this (unless there are issues). Don't mean to whine, but the longer we wait, the bigger this PR becomes, as I have to fix any new code that doesn't pass the checks. Right now the latest merge from upstream passes successfully. Added some more details in the description.
Jenkins, retest this please.
Hey @dskrvk, sorry to let this slip through the cracks. I'm going to shepherd this today to try to get it merged.
```shell
SPARK_ROOT_DIR="$(dirname $SCRIPT_DIR)"
...
$SCRIPT_DIR/../build/mvn -Pkinesis-asl -Phive -Phive-thriftserver checkstyle:check > checkstyle.txt
$SCRIPT_DIR/../build/mvn -Pkinesis-asl -Pyarn -Phadoop-2.2 checkstyle:check >> checkstyle.txt
```
Quick question: why do you need to run twice with different profiles? AFAIK the set of source files should be the same under all of the Hadoop profiles, so I don't think we need to set -Phadoop-2.2 here. Why can't we just use one Maven run with the profiles -Pkinesis-asl -Phive -Phive-thriftserver -Pyarn?
I didn't actually observe any differences between the two profiles in terms of Checkstyle warnings, but decided to add the second run just to be thorough. My reasoning was that since some profiles omit some of the modules, we need to exercise all of the possible ones, even though at the moment the set of Java sources may be the same. In any case, this only adds a few seconds to the build - negligible compared to the overall run-tests time.
You should be able to have a single run with "-Pkinesis-asl -Pyarn -Phive -Phive-thriftserver" - I even think "-Phive" is unnecessary; I believe it only affects packaging right now. "-Phadoop-2.2" is unnecessary; that's the default.
Good point; changed.
Changes look good to me and ready to merge today; my only question concerns why we need to run Checkstyle twice with different sets of profiles.
Test build #47074 has finished for PR 9867 at commit
Minor, but there's a typo in the title: "mix" -> "fix".
Add -Pyarn to the first run so that all modules are covered at once. No need to execute checkstyle twice any more.
Test build #47183 has finished for PR 9867 at commit
Jenkins, retest this please.
retest this please
Test build #47201 has finished for PR 9867 at commit
LGTM, so I'm going to merge this into master. Thanks for being so patient, @dskrvk!
Hey @dskrvk, what's your Apache JIRA username? I need it in order to assign the JIRA to you so that you are properly credited by our release-notes generation script.
Hey @JoshRosen, thanks, my handle is dskrvk there as well. |
This replaces #9696
Invoke Checkstyle and print any errors to the console, failing the step. Use Google's style rules, modified according to https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide. Some important checks are disabled (see TODOs in checkstyle.xml) due to multiple violations being present in the codebase. I suggest fixing those TODOs in separate PR(s). More on Checkstyle can be found on the official website.
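As a rough illustration of what such a configuration looks like (a hedged sketch, not the actual checkstyle.xml from this PR; the module and file names here are assumptions based on Checkstyle's documented configuration format):

```xml
<?xml version="1.0"?>
<!DOCTYPE module PUBLIC
    "-//Puppy Crawl//DTD Check Configuration 1.3//EN"
    "http://www.puppycrawl.com/dtds/configuration_1_3.dtd">
<module name="Checker">
  <module name="TreeWalker">
    <!-- Checks enabled by this change -->
    <module name="GenericWhitespace"/>
    <module name="FallThrough"/>
    <!-- TODO: enable once the existing violations in the codebase are fixed -->
    <!-- <module name="LineLength"/> -->
  </module>
  <!-- Third-party code kept in sync with upstream is exempted via a
       suppressions file (path here is illustrative) -->
  <module name="SuppressionFilter">
    <property name="file" value="dev/checkstyle-suppressions.xml"/>
  </module>
</module>
```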
Sample output (from build 46345), duplicated because I run the build twice with different profiles:
Also fix some of the minor violations that didn't require sweeping changes.
Apologies for the previous botched PRs - I finally figured out the issue.
cr: @JoshRosen, @pwendell