[SPARK-4205][SQL] Timestamp and Date classes which work in the catalyst DSL. #3066

Closed
wants to merge 80 commits

Conversation

@culler commented Nov 3, 2014

These Timestamp and Date classes support the usual comparison operators, so they can be used in catalyst Expressions. They come with implicit conversions to and from the underlying Java classes. They also have a convenient initializer which accepts a String.

Additional implicit conversions fix an issue with the DSL where expressions like 0 < 'x are not recognized.
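A minimal sketch of the intended usage (the import path, the timestamp string format, and the attribute names 'ts and 'x are illustrative assumptions, not taken from the patch):

```scala
import org.apache.spark.sql.catalyst.dsl.expressions._

// The convenient String initializers described above.
val day    = Date("2014-01-01")
val cutoff = Timestamp("2014-01-01 00:00:00")  // assumed timestamp format

// Because the classes support the usual comparison operators, they can be
// used directly in catalyst Expressions:
val afterCutoff = 'ts >= cutoff

// The additional implicit conversions let a bare literal appear on the
// left-hand side of a comparison:
val positive = 0 < 'x
```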

operators, as well as implicit conversions to support using these classes
in the catalyst DSL.

This commit also adds a method to Row which builds a row from a schema
and a list of strings.
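The new Row method's signature isn't shown in this thread, so the following is only a hypothetical sketch of a helper along the lines described (the name, signature, and set of handled types are invented for illustration):

```scala
import org.apache.spark.sql.catalyst.expressions.{GenericRow, Row}
import org.apache.spark.sql.catalyst.types._

// Hypothetical: parse each string according to its field's data type and
// pack the parsed values into a Row.
def rowFromStrings(schema: StructType, values: Seq[String]): Row = {
  val parsed: Seq[Any] = schema.fields.zip(values).map { case (field, raw) =>
    field.dataType match {
      case IntegerType => raw.toInt
      case LongType    => raw.toLong
      case DoubleType  => raw.toDouble
      case StringType  => raw
      case other       => sys.error(s"unhandled type: $other")
    }
  }
  new GenericRow(parsed.toArray)
}
```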
@AmplabJenkins

Can one of the admins verify this patch?

@rxin
Contributor

rxin commented Nov 3, 2014

@liancheng - can you take a look at this first? Thanks.

@culler - can you create a JIRA ticket at https://issues.apache.org/jira/browse/SPARK and then update the pull request title to include the JIRA ticket number? e.g. "[SPARK-9999][SQL] DSL support for TimeStamp & Date"

rxin and others added 2 commits November 2, 2014 21:56
Author: wangfei <wangfei1@huawei.com>

Closes #3042 from scwf/patch-9 and squashes the following commits:

3784ed1 [wangfei] remove 'TODO'
1891553 [wangfei] update build doc since JDBC/CLI support hive 13
def >= (other: Symbol):Expression = {literal >= other}
def === (other: Symbol):Expression = {literal === other}
def <=> (other: Symbol):Expression = {literal <=> other}
def !== (other: Symbol):Expression = {literal !== other}
Contributor

Please add a space after :, and remove the braces.
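For reference, the requested style applied to the lines above:

```scala
def >= (other: Symbol): Expression = literal >= other
def === (other: Symbol): Expression = literal === other
def <=> (other: Symbol): Expression = literal <=> other
def !== (other: Symbol): Expression = literal !== other
```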

Author

Thanks.

This is a PR to send the fetch failure message back to the Web UI.
Before:
![f1](https://cloud.githubusercontent.com/assets/1000778/4856595/1f036c80-60be-11e4-956f-335147fbccb7.png)
![f2](https://cloud.githubusercontent.com/assets/1000778/4856596/1f11cbea-60be-11e4-8fe9-9f9b2b35c884.png)

After (please ignore the meaning of the exception; I threw it in the code directly because it's hard to simulate a fetch failure):
![e1](https://cloud.githubusercontent.com/assets/1000778/4856600/2657ea38-60be-11e4-9f2d-d56c5f900f10.png)
![e2](https://cloud.githubusercontent.com/assets/1000778/4856601/26595008-60be-11e4-912b-2744af786991.png)

Author: zsxwing <zsxwing@gmail.com>

Closes #3032 from zsxwing/SPARK-4163 and squashes the following commits:

f7e1faf [zsxwing] Discard changes for FetchFailedException and minor modification
4e946f7 [zsxwing] Add e as the cause of SparkException
316767d [zsxwing] Add private[storage] to FetchResult
d51b0b6 [zsxwing] Set e as the cause of FetchFailedException
b88c919 [zsxwing] Use 'private[storage]' for case classes instead of 'sealed'
62103fd [zsxwing] Update as per review
0c07d1f [zsxwing] Backward-compatible support
a3bca65 [zsxwing] Send the fetch failure message back to Web UI
* where a literal is being combined with a symbol. Without these an
* expression such as 0 < 'x is not recognized.
*/
implicit class InitialLiteral(x: Any) {
Contributor

How about LhsLiteral (left-hand-side literal)? Also, converting Any to this class implicitly seems a bit dangerous. I'd prefer this style:

case class LhsLiteral(x: Any) {
  ...
}

// Similar to all the xxToLiteral implicit conversions:
implicit def booleanToLhsLiteral(b: Boolean): LhsLiteral = new LhsLiteral(b)
implicit def byteToLhsLiteral(b: Byte): LhsLiteral = new LhsLiteral(b)
...
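A sketch of the effect, assuming an analogous intToLhsLiteral conversion and that LhsLiteral defines the comparison operators shown earlier:

```scala
// Int has no < method that accepts a Symbol, so without a left-hand-side
// conversion `0 < 'x` does not compile. With intToLhsLiteral in implicit
// scope, the compiler expands it to:
val predicate = intToLhsLiteral(0) < 'x   // a catalyst Expression
```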

@liancheng
Contributor

Hey @culler, in general I like the DSL improvements, thanks for the work! I've left some inline comments, plus two additional high-level comments here:

  1. Usually we encourage one PR for one code change. It's harder to track and review if developers mix several unrelated or loosely related changes into a single PR.
  2. Please always add sufficient tests for changes made in the PR unless they are trivial (e.g. renaming or styling changes).

* use in DSL expressions. Implicit conversions to java.sql.Date
* are provided. The class initializer accepts a String, e.g.
*
* val ts = Date("2014-01-01")
Contributor

Please use proper Scaladoc format here, i.e.:

{{{
  val ts = Date("2014-01-01")
}}}

Author

Thanks!

Author

And more thanks for reading the code so carefully and for all of the thoughtful comments and helpful advice. I am pushing fixes.

@liancheng
Contributor

test this please

@SparkQA

SparkQA commented Nov 3, 2014

Test build #22811 has started for PR 3066 at commit 0c389a7.

  • This patch merges cleanly.

@SparkQA

SparkQA commented Nov 3, 2014

Test build #22811 has finished for PR 3066 at commit 0c389a7.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • implicit class InitialLiteral(x: Any)
    • final class MutableDate extends MutableValue
    • final class MutableTimestamp extends MutableValue
    • class Date(milliseconds: Long) extends JDate(milliseconds)
    • class Timestamp(milliseconds: Long) extends JTimestamp(milliseconds)

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22811/
Test FAILed.

@culler
Author

culler commented Nov 3, 2014

I have corrected the issues pointed out in the comments. Please retest.

@culler
Author

culler commented Nov 3, 2014

Hmmm. The GitHub docs seem to suggest that I am supposed to mention @liancheng and @rxin in my comments. So I am doing that. My apologies if this is superfluous or annoying. - @culler

We reference a specific branch in two places. This patch makes it one place.

Author: Nicholas Chammas <nicholas.chammas@gmail.com>

Closes #3008 from nchammas/mesos-spark-ec2-branch and squashes the following commits:

10a6089 [Nicholas Chammas] factor out mess spark-ec2 branch
@culler changed the title from "Timestamp and Date classes which work in the catalyst DSL." to "[SPARK-4205][SQL] Timestamp and Date classes which work in the catalyst DSL." on Nov 3, 2014
@culler
Author

culler commented Nov 6, 2014

The file sql/core/src/test/scala/org/apache/spark/sql/UserDefinedTypeSuite.scala was added after this PR, and caused a test to fail due to the collision between === operators when I tested my changes against the current master branch. I added the whole file to this PR (with explicit conversions for === added to it) hoping to avoid a test failure. Now I see that merge conflicts are being reported. Not sure how to handle this.
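For context, the collision looks roughly like this (a sketch; ScalaTest's Assertions is assumed to be the source of the competing ===):

```scala
// Both ScalaTest (via convertToEqualizer) and the catalyst DSL put an ===
// in implicit scope, so a plain assertion becomes ambiguous:
assert(x === y)                        // ambiguous implicit conversion

// Making ScalaTest's conversion explicit disambiguates it, which is the
// transformation mentioned in the commit log further below:
assert(convertToEqualizer(x).===(y))
```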

Andrew Or and others added 23 commits November 6, 2014 15:31
This was added by me in 61a5cce. The real fix will be added in [SPARK-4281](https://issues.apache.org/jira/browse/SPARK-4281).

Author: Andrew Or <andrew@databricks.com>

Closes #3145 from andrewor14/fix-make-distribution and squashes the following commits:

c78be61 [Andrew Or] Hot fix make distribution
I did not realize there was a `network.util.JavaUtils` when I wrote this code. This PR moves the `ByteBuffer` string conversion to the appropriate place. I tested the changes on a stable yarn cluster.

Author: Andrew Or <andrew@databricks.com>

Closes #3144 from andrewor14/yarn-shuffle-util and squashes the following commits:

b6c08bf [Andrew Or] Remove unused import
94e205c [Andrew Or] Use netty Unpooled
85202a5 [Andrew Or] Use guava Charsets
057135b [Andrew Or] Reword comment
adf186d [Andrew Or] Move byte buffer String conversion logic to JavaUtils
Author: Aaron Davidson <aaron@databricks.com>

Closes #3142 from aarondav/worker and squashes the following commits:

3780bd7 [Aaron Davidson] Address comments
2dcdfc1 [Aaron Davidson] Add private[worker]
47f49d3 [Aaron Davidson] NettyBlockTransferService shouldn't care about app ids (it's only b/t executors)
258417c [Aaron Davidson] [SPARK-4277] Support external shuffle service on executor
This adds a RetryingBlockFetcher to the NettyBlockTransferService which is wrapped around our typical OneForOneBlockFetcher, adding retry logic in the event of an IOException.

This sort of retry allows us to avoid marking an entire executor as failed due to garbage collection or high network load.

TODO:
- [x] unit tests
- [x] put in ExternalShuffleClient too

Author: Aaron Davidson <aaron@databricks.com>

Closes #3101 from aarondav/retry and squashes the following commits:

72a2a32 [Aaron Davidson] Add that we should remove the condition around the retry thingy
c7fd107 [Aaron Davidson] Fix unit tests
e80e4c2 [Aaron Davidson] Address initial comments
6f594cd [Aaron Davidson] Fix unit test
05ff43c [Aaron Davidson] Add to external shuffle client and add unit test
66e5a24 [Aaron Davidson] [SPARK-4238] [Core] Perform network-level retry of shuffle file fetches
This relies on a hook from whoever is hosting the shuffle service to invoke removeApplication() when the application is completed. Once invoked, we will clean up all the executors' shuffle directories we know about.

Author: Aaron Davidson <aaron@databricks.com>

Closes #3126 from aarondav/cleanup and squashes the following commits:

33a64a9 [Aaron Davidson] Missing brace
e6e428f [Aaron Davidson] Address comments
16a0d27 [Aaron Davidson] Cleanup
e4df3e7 [Aaron Davidson] [SPARK-4236] Cleanup removed applications' files in shuffle service
… inner exceptions and make the error information in Web UI more friendly

This PR fixed `Utils.exceptionString` to output the full exception information. However, the stack trace can become very large, so I also updated the Web UI to collapse the error information by default (it displays only the first line; clicking `+detail` shows the full info).

Here are the screenshots:

Stages:
![stages](https://cloud.githubusercontent.com/assets/1000778/4882441/66d8cc68-6356-11e4-8346-6318677d9470.png)

Details for one stage:
![stage](https://cloud.githubusercontent.com/assets/1000778/4882513/1311043c-6357-11e4-8804-ca14240a9145.png)

The full information in the gray text field is:
```Java
org.apache.spark.shuffle.FetchFailedException: Connection reset by peer
	at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$.org$apache$spark$shuffle$hash$BlockStoreShuffleFetcher$$unpackBlock$1(BlockStoreShuffleFetcher.scala:67)
	at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
	at org.apache.spark.shuffle.hash.BlockStoreShuffleFetcher$$anonfun$3.apply(BlockStoreShuffleFetcher.scala:83)
	at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
	at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
	at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:129)
	at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:160)
	at org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:159)
	at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
	at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
	at org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:159)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
	at org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
	at org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
	at org.apache.spark.scheduler.Task.run(Task.scala:56)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:189)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Connection reset by peer
	at sun.nio.ch.FileDispatcher.read0(Native Method)
	at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
	at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
	at sun.nio.ch.IOUtil.read(IOUtil.java:166)
	at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)
	at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:311)
	at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
	at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:225)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
	... 1 more
```

/cc aarondav

Author: zsxwing <zsxwing@gmail.com>

Closes #3073 from zsxwing/SPARK-4204 and squashes the following commits:

176d1e3 [zsxwing] Add comments to explain the stack trace difference
ca509d3 [zsxwing] Add fullStackTrace to the constructor of ExceptionFailure
a07057b [zsxwing] Core style fix
dfb0032 [zsxwing] Backward compatibility for old history server
1e50f71 [zsxwing] Update as per review and increase the max height of the stack trace details
94f2566 [zsxwing] Change Utils.exceptionString to contain the inner exceptions and make the error information in Web UI more friendly
operators, as well as implicit conversions to support using these classes
in the catalyst DSL.

This commit also adds a method to Row which builds a row from a schema
and a list of strings.
… for Date

and Timestamp.  Added Date and Timestamp to NativeType.defaultSizeOf.
conversion (due to the existence of an === operator), applied the
transformation assert(X == Y) --> assert(convertToEqualizer(X).===(Y))
in the code generator, and this PR is already too big.  Now passes the
standard test suite (except SparkSubmitSuite, which I think is unrelated
to these changes.)
aliases for RichDate and RichTimestamp when importing an SQLContext.
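A sketch of how those aliases would come into scope, following the Spark 1.x idiom of importing an SQLContext's members (sc is an existing SparkContext):

```scala
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
// Importing the context's members brings its implicits, including the
// RichDate and RichTimestamp aliases described above, into scope.
import sqlContext._
```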
@culler
Author

culler commented Nov 7, 2014

Hi @liancheng, now that I have completely screwed up this PR by attempting to rebase the repository, I will close it and open a new one which will hopefully be clean.

@liancheng
Contributor

@culler Don't worry, I guess every contributor has screwed up rebasing at least once :)

@culler
Author

culler commented Nov 8, 2014

Thanks! I'm proud to be a member of the club. :^)

