[SPARK-4205][SQL] Timestamp and Date classes which work in the catalyst DSL. #3066

Closed
wants to merge 80 commits

Conversation

@culler commented Nov 3, 2014

These Timestamp and Date classes support the usual comparison operators, so they can be used in catalyst Expressions. They come with implicit conversions to and from the underlying Java classes. They also have a convenient initializer which accepts a String.

Additional implicit conversions fix an issue with the DSL where expressions like 0 < 'x are not recognized.
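A minimal sketch of the intended usage (the import path, the timestamp string format, and the attribute names 'ts and 'x are illustrative assumptions, not taken from the patch):

```scala
import org.apache.spark.sql.catalyst.dsl.expressions._

// The convenient String initializers described above.
val day    = Date("2014-01-01")
val cutoff = Timestamp("2014-01-01 00:00:00")  // assumed timestamp format

// Because the classes support the usual comparison operators, they can be
// used directly in catalyst Expressions:
val afterCutoff = 'ts >= cutoff

// The additional implicit conversions let a bare literal appear on the
// left-hand side of a comparison:
val positive = 0 < 'x
```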

operators, as well as implicit conversions to support using these classes
in the catalyst DSL.

This commit also adds a method to Row which builds a row from a schema
and a list of strings.
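The new Row method's signature isn't shown in this thread, so the following is only a hypothetical sketch of a helper along the lines described (the name, signature, and set of handled types are invented for illustration):

```scala
import org.apache.spark.sql.catalyst.expressions.{GenericRow, Row}
import org.apache.spark.sql.catalyst.types._

// Hypothetical: parse each string according to its field's data type and
// pack the parsed values into a Row.
def rowFromStrings(schema: StructType, values: Seq[String]): Row = {
  val parsed: Seq[Any] = schema.fields.zip(values).map { case (field, raw) =>
    field.dataType match {
      case IntegerType => raw.toInt
      case LongType    => raw.toLong
      case DoubleType  => raw.toDouble
      case StringType  => raw
      case other       => sys.error(s"unhandled type: $other")
    }
  }
  new GenericRow(parsed.toArray)
}
```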
@AmplabJenkins

Can one of the admins verify this patch?

@rxin
Contributor

rxin commented Nov 3, 2014

@liancheng - can you take a look at this first? Thanks.

@culler - can you create a JIRA ticket at https://issues.apache.org/jira/browse/SPARK and then update the pull request title to include the JIRA ticket number? e.g. "[SPARK-9999][SQL] DSL support for TimeStamp & Date"

rxin and others added 2 commits November 2, 2014 21:56
Author: wangfei <wangfei1@huawei.com>

Closes #3042 from scwf/patch-9 and squashes the following commits:

3784ed1 [wangfei] remove 'TODO'
1891553 [wangfei] update build doc since JDBC/CLI support hive 13
def >= (other: Symbol):Expression = {literal >= other}
def === (other: Symbol):Expression = {literal === other}
def <=> (other: Symbol):Expression = {literal <=> other}
def !== (other: Symbol):Expression = {literal !== other}
Contributor

Please add a space after :, and remove the braces.
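For reference, the requested style applied to the lines above:

```scala
def >= (other: Symbol): Expression = literal >= other
def === (other: Symbol): Expression = literal === other
def <=> (other: Symbol): Expression = literal <=> other
def !== (other: Symbol): Expression = literal !== other
```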

Author

Thanks.

This is a PR to send the fetch failure message back to the Web UI.
Before:
![f1](https://cloud.githubusercontent.com/assets/1000778/4856595/1f036c80-60be-11e4-956f-335147fbccb7.png)
![f2](https://cloud.githubusercontent.com/assets/1000778/4856596/1f11cbea-60be-11e4-8fe9-9f9b2b35c884.png)

After (please ignore the meaning of the exception; I threw it in the code directly because it's hard to simulate a fetch failure):
![e1](https://cloud.githubusercontent.com/assets/1000778/4856600/2657ea38-60be-11e4-9f2d-d56c5f900f10.png)
![e2](https://cloud.githubusercontent.com/assets/1000778/4856601/26595008-60be-11e4-912b-2744af786991.png)

Author: zsxwing <zsxwing@gmail.com>

Closes #3032 from zsxwing/SPARK-4163 and squashes the following commits:

f7e1faf [zsxwing] Discard changes for FetchFailedException and minor modification
4e946f7 [zsxwing] Add e as the cause of SparkException
316767d [zsxwing] Add private[storage] to FetchResult
d51b0b6 [zsxwing] Set e as the cause of FetchFailedException
b88c919 [zsxwing] Use 'private[storage]' for case classes instead of 'sealed'
62103fd [zsxwing] Update as per review
0c07d1f [zsxwing] Backward-compatible support
a3bca65 [zsxwing] Send the fetch failure message back to Web UI
* where a literal is being combined with a symbol. Without these an
* expression such as 0 < 'x is not recognized.
*/
implicit class InitialLiteral(x: Any) {
Contributor

How about LhsLiteral (left-hand-side literal)? Also, converting Any to this class implicitly seems a bit dangerous. I'd prefer this style:

case class LhsLiteral(x: Any) {
  ...
}

// Similar to all the xxToLiteral implicit conversions:
implicit def booleanToLhsLiteral(b: Boolean): LhsLiteral = new LhsLiteral(b)
implicit def byteToLhsLiteral(b: Byte): LhsLiteral = new LhsLiteral(b)
...
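A sketch of the effect, assuming an analogous intToLhsLiteral conversion and that LhsLiteral defines the comparison operators shown earlier:

```scala
// Int has no < method that accepts a Symbol, so without a left-hand-side
// conversion `0 < 'x` does not compile. With intToLhsLiteral in implicit
// scope, the compiler expands it to:
val predicate = intToLhsLiteral(0) < 'x   // a catalyst Expression
```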

@liancheng
Contributor

Hey @culler, in general I like the DSL improvements, thanks for the work! I've left some inline comments, plus two additional high-level comments here:

  1. Usually we encourage one PR for one code change. It's harder to track and review if developers mix several unrelated or loosely related changes into a single PR.
  2. Please always add sufficient tests for changes made in the PR unless they are trivial (e.g. renaming or styling changes).

* use in DSL expressions. Implicit conversions to java.sql.Date
* are provided. The class initializer accepts a String, e.g.
*
* val ts = Date("2014-01-01")
Contributor

Please use proper Scaladoc format here, i.e.:

{{{
  val ts = Date("2014-01-01")
}}}

Author

Thanks!

Author

And more thanks for reading the code so carefully and for all of the thoughtful comments and helpful advice. I am pushing fixes.

@liancheng
Contributor

test this please

@SparkQA

SparkQA commented Nov 3, 2014

Test build #22811 has started for PR 3066 at commit 0c389a7.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 3, 2014

Test build #22811 has finished for PR 3066 at commit 0c389a7.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • implicit class InitialLiteral(x: Any)
    • final class MutableDate extends MutableValue
    • final class MutableTimestamp extends MutableValue
    • class Date(milliseconds: Long) extends JDate(milliseconds)
    • class Timestamp(milliseconds: Long) extends JTimestamp(milliseconds)

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22811/
Test FAILed.

@culler
Author

culler commented Nov 3, 2014

I have corrected the issues pointed out in the comments. Please retest.

@culler
Author

culler commented Nov 3, 2014

Hmmm. The GitHub docs seem to suggest that I am supposed to mention @liancheng and @rxin in my comments. So I am doing that. My apologies if this is superfluous or annoying. - @culler

We reference a specific branch in two places. This patch makes it one place.

Author: Nicholas Chammas <nicholas.chammas@gmail.com>

Closes #3008 from nchammas/mesos-spark-ec2-branch and squashes the following commits:

10a6089 [Nicholas Chammas] factor out mess spark-ec2 branch
@culler changed the title from "Timestamp and Date classes which work in the catalyst DSL." to "[SPARK-4205][SQL] Timestamp and Date classes which work in the catalyst DSL." on Nov 3, 2014
@culler
Author

culler commented Nov 6, 2014

The file sql/core/src/test/scala/org/apache/spark/sql/UserDefinedTypeSuite.scala was added after this PR, and caused a test to fail due to the collision between === operators when I tested my changes against the current master branch. I added the whole file to this PR (with explicit conversions for === added to it) hoping to avoid a test failure. Now I see that merge conflicts are being reported. Not sure how to handle this.
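For context, the collision looks roughly like this (a sketch; ScalaTest's Assertions is assumed to be the source of the competing ===):

```scala
// Both ScalaTest (via convertToEqualizer) and the catalyst DSL put an ===
// in implicit scope, so a plain assertion becomes ambiguous:
assert(x === y)                        // ambiguous implicit conversion

// Making ScalaTest's conversion explicit disambiguates it, which is the
// transformation mentioned in the commit log further below:
assert(convertToEqualizer(x).===(y))
```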

Andrew Or and others added 23 commits November 6, 2014 15:31
This was added by me in 61a5cce. The real fix will be added in [SPARK-4281](https://issues.apache.org/jira/browse/SPARK-4281).

Author: Andrew Or <andrew@databricks.com>

Closes #3145 from andrewor14/fix-make-distribution and squashes the following commits:

c78be61 [Andrew Or] Hot fix make distribution
I did not realize there was a `network.util.JavaUtils` when I wrote this code. This PR moves the `ByteBuffer` string conversion to the appropriate place. I tested the changes on a stable yarn cluster.

Author: Andrew Or <andrew@databricks.com>

Closes #3144 from andrewor14/yarn-shuffle-util and squashes the following commits:

b6c08bf [Andrew Or] Remove unused import
94e205c [Andrew Or] Use netty Unpooled
85202a5 [Andrew Or] Use guava Charsets
057135b [Andrew Or] Reword comment
adf186d [Andrew Or] Move byte buffer String conversion logic to JavaUtils
Author: Aaron Davidson <aaron@databricks.com>

Closes #3142 from aarondav/worker and squashes the following commits:

3780bd7 [Aaron Davidson] Address comments
2dcdfc1 [Aaron Davidson] Add private[worker]
47f49d3 [Aaron Davidson] NettyBlockTransferService shouldn't care about app ids (it's only b/t executors)
258417c [Aaron Davidson] [SPARK-4277] Support external shuffle service on executor
This adds a RetryingBlockFetcher to the NettyBlockTransferService which is wrapped around our typical OneForOneBlockFetcher, adding retry logic in the event of an IOException.

This sort of retry allows us to avoid marking an entire executor as failed due to garbage collection or high network load.

TODO:
- [x] unit tests
- [x] put in ExternalShuffleClient too

Author: Aaron Davidson <aaron@databricks.com>

Closes #3101 from aarondav/retry and squashes the following commits:

72a2a32 [Aaron Davidson] Add that we should remove the condition around the retry thingy
c7fd107 [Aaron Davidson] Fix unit tests
e80e4c2 [Aaron Davidson] Address initial comments
6f594cd [Aaron Davidson] Fix unit test
05ff43c [Aaron Davidson] Add to external shuffle client and add unit test
66e5a24 [Aaron Davidson] [SPARK-4238] [Core] Perform network-level retry of shuffle file fetches
This relies on a hook from whoever is hosting the shuffle service to invoke removeApplication() when the application is completed. Once invoked, we will clean up all the executors' shuffle directories we know about.

Author: Aaron Davidson <aaron@databricks.com>

Closes #3126 from aarondav/cleanup and squashes the following commits:

33a64a9 [Aaron Davidson] Missing brace
e6e428f [Aaron Davidson] Address comments
16a0d27 [Aaron Davidson] Cleanup
e4df3e7 [Aaron Davidson] [SPARK-4236] Cleanup removed applications' files in shuffle service
… inner exceptions and make the error information in Web UI more friendly

This PR fixed `Utils.exceptionString` to output the full exception information. However, the stack trace can become very large, so I also updated the Web UI to collapse the error information by default (it displays only the first line; clicking `+detail` shows the full info).

Here are the screenshots:

Stages:
![stages](https://cloud.githubusercontent.com/assets/1000778/4882441/66d8cc68-6356-11e4-8346-6318677d9470.png)

Details for one stage:
![stage](https://cloud.githubusercontent.com/assets/1000778/4882513/1311043c-6357-11e4-8804-ca14240a9145.png)

The full information in the gray text field is:
```Java
org.apache.spark.shuffle.FetchFailedException: Connection reset by peer
	at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.org$apache$spark$shuffle$hash$BlockStoreShuffleFetcher$$unpackBlock$1(BlockStoreShuffleFetcher.scala:67)
	at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
	at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
	at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
	at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:129)
	at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:160)
	at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:159)
	at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
	at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
	at org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:159)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
	at org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
	at org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
	at org.apache.spark.scheduler.Task.run(Task.scala:56)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:189)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Connection reset by peer
	at sun.nio.ch.FileDispatcher.read0(Native Method)
	at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
	at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
	at sun.nio.ch.IOUtil.read(IOUtil.java:166)
	at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
	at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311)
	at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
	at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:225)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
	... 1 more
```

/cc aarondav

Author: zsxwing <zsxwing@gmail.com>

Closes #3073 from zsxwing/SPARK-4204 and squashes the following commits:

176d1e3 [zsxwing] Add comments to explain the stack trace difference
ca509d3 [zsxwing] Add fullStackTrace to the constructor of ExceptionFailure
a07057b [zsxwing] Core style fix
dfb0032 [zsxwing] Backward compatibility for old history server
1e50f71 [zsxwing] Update as per review and increase the max height of the stack trace details
94f2566 [zsxwing] Change Utils.exceptionString to contain the inner exceptions and make the error information in Web UI more friendly
operators, as well as implicit conversions to support using these classes
in the catalyst DSL.

This commit also adds a method to Row which builds a row from a schema
and a list of strings.
… for Date

and Timestamp.  Added Date and Timestamp to NativeType.defaultSizeOf.
conversion (due to the existence of an === operator), applied the
transformation assert(X == Y) --> assert(convertToEqualizer(X).===(Y))
in the code generator, and this PR is already too big.  Now passes the
standard test suite (except SparkSubmitSuite, which I think is unrelated
to these changes.)
aliases for RichDate and RichTimestamp when importing an SQLContext.
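A sketch of how those aliases would come into scope, following the Spark 1.x idiom of importing an SQLContext's members (sc is an existing SparkContext):

```scala
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
// Importing the context's members brings its implicits, including the
// RichDate and RichTimestamp aliases described above, into scope.
import sqlContext._
```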
@culler
Author

culler commented Nov 7, 2014

Hi @liancheng, now that I have completely screwed up this PR by attempting to rebase the repository, I will close it and open a new one which will hopefully be clean.

@liancheng
Contributor

@culler Don't worry, I guess every contributor has screwed up rebasing at least once :)

@culler
Author

culler commented Nov 8, 2014

Thanks! I'm proud to be a member of the club. :^)

