SKIPME merged Apache branch-1.6 #139

Merged 17 commits on Dec 29, 2015

Commits on Dec 22, 2015

  1. Commit 4062cda
  2. Commit 5b19e7c
  3. [MINOR] Fix typos in JavaStreamingContext

    Author: Shixiong Zhu <shixiong@databricks.com>
    
    Closes apache#10424 from zsxwing/typo.
    
    (cherry picked from commit 93da856)
    Signed-off-by: Reynold Xin <rxin@databricks.com>
    zsxwing authored and rxin committed Dec 22, 2015
    Commit 309ef35
  4. [SPARK-11823][SQL] Fix flaky JDBC cancellation test in HiveThriftBinaryServerSuite
    
    This patch fixes a flaky "test jdbc cancel" test in HiveThriftBinaryServerSuite. This test is prone to a race condition which causes it to block indefinitely while waiting for an extremely slow query to complete, which caused many Jenkins builds to time out.
    
    For more background, see my comments on apache#6207 (the PR which introduced this test).
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes apache#10425 from JoshRosen/SPARK-11823.
    
    (cherry picked from commit 2235cd4)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    JoshRosen committed Dec 22, 2015
    Commit 0f905d7
  5. [SPARK-12487][STREAMING][DOCUMENT] Add docs for Kafka message handler

    Author: Shixiong Zhu <shixiong@databricks.com>
    
    Closes apache#10439 from zsxwing/kafka-message-handler-doc.
    
    (cherry picked from commit 93db50d)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
    zsxwing authored and tdas committed Dec 22, 2015
    Commit 94fb5e8

Commits on Dec 23, 2015

  1. [SPARK-12429][STREAMING][DOC] Add Accumulator and Broadcast example for Streaming
    
    This PR adds Scala, Java and Python examples to show how to use Accumulator and Broadcast in Spark Streaming to support checkpointing.
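
    A minimal Scala sketch of the lazily instantiated singleton pattern used for this purpose, so an Accumulator can be re-created on the driver after recovery from a checkpoint (the object and accumulator names here are illustrative, not the PR's exact example):
    ```
    import org.apache.spark.{Accumulator, SparkContext}

    // Lazily instantiated singleton: after a restart from a checkpoint, the
    // accumulator is re-created on first use instead of being restored from state.
    object DroppedRecordsCounter {
      @volatile private var instance: Accumulator[Long] = null

      def getInstance(sc: SparkContext): Accumulator[Long] = {
        if (instance == null) {
          synchronized {
            if (instance == null) {
              instance = sc.accumulator(0L, "DroppedRecordsCounter")
            }
          }
        }
        instance
      }
    }
    ```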
    
    Author: Shixiong Zhu <shixiong@databricks.com>
    
    Closes apache#10385 from zsxwing/accumulator-broadcast-example.
    
    (cherry picked from commit 20591af)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
    zsxwing authored and tdas committed Dec 23, 2015
    Commit 942c057
  2. [SPARK-12477][SQL] - Tungsten projection fails for null values in array fields
    
    Accessing null elements in an array field fails when Tungsten is enabled.
    It works in Spark 1.3.1, and in Spark > 1.5 with Tungsten disabled.

    This PR fixes this by checking, in the generated code, whether the accessed array element is null.
    
    Example:
    ```
    // Array of String
    case class AS( as: Seq[String] )
    val dfAS = sc.parallelize( Seq( AS ( Seq("a",null,"b") ) ) ).toDF
    dfAS.registerTempTable("T_AS")
    for (i <- 0 to 2) { println(i + " = " + sqlContext.sql(s"select as[$i] from T_AS").collect.mkString(","))}
    ```
    
    With Tungsten disabled:
    ```
    0 = [a]
    1 = [null]
    2 = [b]
    ```
    
    With Tungsten enabled:
    ```
    0 = [a]
    15/12/22 09:32:50 ERROR Executor: Exception in task 7.0 in stage 1.0 (TID 15)
    java.lang.NullPointerException
    	at org.apache.spark.sql.catalyst.expressions.UnsafeRowWriters$UTF8StringWriter.getSize(UnsafeRowWriters.java:90)
    	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
    	at org.apache.spark.sql.execution.TungstenProject$$anonfun$3$$anonfun$apply$3.apply(basicOperators.scala:90)
    	at org.apache.spark.sql.execution.TungstenProject$$anonfun$3$$anonfun$apply$3.apply(basicOperators.scala:88)
    	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    ```
    
    Author: pierre-borckmans <pierre.borckmans@realimpactanalytics.com>
    
    Closes apache#10429 from pierre-borckmans/SPARK-12477_Tungsten-Projection-Null-Element-In-Array.
    
    (cherry picked from commit 43b2a63)
    Signed-off-by: Reynold Xin <rxin@databricks.com>
    pierre-borckmans authored and rxin committed Dec 23, 2015
    Commit c6c9bf9

Commits on Dec 24, 2015

  1. [SPARK-12499][BUILD] don't force MAVEN_OPTS

    Allow the user to override MAVEN_OPTS (2 GB wasn't sufficient for me).
    
    Author: Adrian Bridgett <adrian@smop.co.uk>
    
    Closes apache#10448 from abridgett/feature/do_not_force_maven_opts.
    
    (cherry picked from commit ead6abf)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    abridgett authored and JoshRosen committed Dec 24, 2015
    Commit 5987b16
  2. [SPARK-12411][CORE] Decrease executor heartbeat timeout to match heartbeat interval
    
    Previously, the rpc timeout was the default network timeout, which is the same value
    the driver uses to determine dead executors. This means if there is a network issue,
    the executor is determined dead after one heartbeat attempt. There is a separate config
    for the heartbeat interval which is a better value to use for the heartbeat RPC. With
    this change, the executor will make multiple heartbeat attempts even with RPC issues.
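
    For reference, a hedged SparkConf sketch of the two settings involved (the values shown are, to my knowledge, the defaults in this branch):
    ```
    import org.apache.spark.SparkConf

    // spark.executor.heartbeatInterval: how often the executor heartbeats to the driver.
    // spark.network.timeout: how long the driver waits before declaring an executor dead.
    // With this change the heartbeat RPC times out on the shorter interval, so several
    // attempts fit inside the driver-side network timeout.
    val conf = new SparkConf()
      .set("spark.executor.heartbeatInterval", "10s")
      .set("spark.network.timeout", "120s")
    ```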
    
    Author: Nong Li <nong@databricks.com>
    
    Closes apache#10365 from nongli/spark-12411.
    nongli authored and Andrew Or committed Dec 24, 2015
    Commit b49856a
  3. [SPARK-12502][BUILD][PYTHON] Script /dev/run-tests fails when IBM Java is used
    
    Fix an exception with the IBM JDK by removing the update field from the JavaVersion tuple, because the IBM JDK does not report the update level ('_xx').
    
    Author: Kazuaki Ishizaki <ishizaki@jp.ibm.com>
    
    Closes apache#10463 from kiszk/SPARK-12502.
    
    (cherry picked from commit 9e85bb7)
    Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
    kiszk authored and sarutak committed Dec 24, 2015
    Commit 4dd8712
  4. [SPARK-12010][SQL] Spark JDBC requires support for column-name-free INSERT syntax
    
    In the past Spark JDBC write only worked with technologies which support the following INSERT statement syntax (JdbcUtils.scala: insertStatement()):
    
    INSERT INTO $table VALUES ( ?, ?, ..., ? )
    
    But some technologies require a list of column names:
    
    INSERT INTO $table ( $colNameList ) VALUES ( ?, ?, ..., ? )
    
    This was blocking the use of e.g. the Progress JDBC Driver for Cassandra.
    
    Another limitation is that the first syntax relies on the dataframe field ordering matching that of the target table. This works fine as long as the target table has been created by writer.jdbc().

    If the target table contains more columns (i.e. it was not created by writer.jdbc()), then the insert fails due to a mismatch in the number of columns or their data types.

    This PR switches to the recommended second INSERT syntax. Column names are taken from the dataframe field names.
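
    A minimal sketch of the second syntax built from the DataFrame schema (the helper name is illustrative, not JdbcUtils' actual code):
    ```
    import org.apache.spark.sql.DataFrame

    // Build "INSERT INTO table (c1, c2, ...) VALUES (?, ?, ...)" from the schema,
    // so the dataframe column order no longer has to match the target table.
    def insertStatementSql(df: DataFrame, table: String): String = {
      val columns = df.schema.fields.map(_.name).mkString(", ")
      val placeholders = df.schema.fields.map(_ => "?").mkString(", ")
      s"INSERT INTO $table ($columns) VALUES ($placeholders)"
    }
    ```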
    
    Author: CK50 <christian.kurz@oracle.com>
    
    Closes apache#10380 from CK50/master-SPARK-12010-2.
    
    (cherry picked from commit 502476e)
    Signed-off-by: Sean Owen <sowen@cloudera.com>
    CK50 authored and srowen committed Dec 24, 2015
    Commit 865dd8b

Commits on Dec 28, 2015

  1. [SPARK-12520] [PYSPARK] Correct Descriptions and Add Use Cases in Equi-Join
    
    After reading the JIRA https://issues.apache.org/jira/browse/SPARK-12520, I double checked the code.
    
    For example, users can do the Equi-Join like
      ```df.join(df2, 'name', 'outer').select('name', 'height').collect()```
    - There exists a bug in 1.4 and 1.5: the code simply ignores the third parameter (the join type) that users pass, so the join actually performed is `Inner` even if the user specified another type (e.g., `Outer`).
    - After PR apache#8600, 1.6 no longer has this issue, but the description had not been updated.
    
    Plan to submit another PR to fix 1.5 and issue an error message if users specify a non-inner join type when using Equi-Join.
    
    Author: gatorsmile <gatorsmile@gmail.com>
    
    Closes apache#10477 from gatorsmile/pyOuterJoin.
    gatorsmile authored and davies committed Dec 28, 2015
    Commit b8da77e
  2. [SPARK-12517] add default RDD name for one created via sc.textFile

    The feature was first added in commit 7b877b2 but was later removed (probably by mistake) in commit fc8b581.
    This change sets the default name of RDDs created via sc.textFile(...) to the path argument.
    
    Here is the symptom:
    
    * Using spark-1.5.2-bin-hadoop2.6:
    
    scala> sc.textFile("/home/root/.bashrc").name
    res5: String = null
    
    scala> sc.binaryFiles("/home/root/.bashrc").name
    res6: String = /home/root/.bashrc
    
    * while using Spark 1.3.1:
    
    scala> sc.textFile("/home/root/.bashrc").name
    res0: String = /home/root/.bashrc
    
    scala> sc.binaryFiles("/home/root/.bashrc").name
    res1: String = /home/root/.bashrc
    
    Author: Yaron Weinsberg <wyaron@gmail.com>
    Author: yaron <yaron@il.ibm.com>
    
    Closes apache#10456 from wyaron/master.
    
    (cherry picked from commit 73b70f0)
    Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
    wyaron authored and sarutak committed Dec 28, 2015
    Commit 1fbcb6e
  3. [SPARK-12424][ML] The implementation of ParamMap#filter is wrong.

    ParamMap#filter uses `mutable.Map#filterKeys`. The return type of `filterKeys` is collection.Map, not mutable.Map, but the result is cast to mutable.Map using `asInstanceOf`, so we get a `ClassCastException`.
    Also, the map returned by Map#filterKeys is not Serializable. This is a Scala issue (https://issues.scala-lang.org/browse/SI-6654).
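
    A tiny sketch of the failure mode (illustrative, not Spark's ParamMap code):
    ```
    import scala.collection.mutable

    val m = mutable.Map("a" -> 1, "b" -> 2)
    // filterKeys comes from collection.MapLike and returns a collection.Map view,
    // so casting the result back to mutable.Map fails at runtime.
    val view = m.filterKeys(_ == "a")              // static type: collection.Map[String, Int]
    // view.asInstanceOf[mutable.Map[String, Int]] // throws ClassCastException
    ```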
    
    Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
    
    Closes apache#10381 from sarutak/SPARK-12424.
    
    (cherry picked from commit 07165ca)
    Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
    sarutak committed Dec 28, 2015
    Commit 7c7d76f
  4. [SPARK-12222][CORE] Deserialize RoaringBitmap using Kryo serializer throws Buffer underflow exception
    
    Since we only need to implement `def skipBytes(n: Int)`, the code in apache#10213 could be simplified.
    davies scwf
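
    A hedged sketch of the idea (class and method names are illustrative, not the PR's actual bridge class): when adapting Kryo's Input to a java.io.DataInput-style interface, skipBytes only needs to advance the underlying stream.
    ```
    import com.esotericsoftware.kryo.io.Input

    // Kryo's Input extends java.io.InputStream, so skipping bytes is just a stream skip.
    // (A real implementation would loop until n bytes are skipped or EOF is reached.)
    class KryoDataInputSketch(input: Input) {
      def skipBytes(n: Int): Int = {
        input.skip(n.toLong)
        n
      }
    }
    ```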
    
    Author: Daoyuan Wang <daoyuan.wang@intel.com>
    
    Closes apache#10253 from adrian-wang/kryo.
    
    (cherry picked from commit a6d3853)
    Signed-off-by: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
    adrian-wang authored and sarutak committed Dec 28, 2015
    Commit a9c52d4
  5. [SPARK-12489][CORE][SQL][MLIB] Fix minor issues found by FindBugs

    Include the following changes:
    
    1. Close `java.sql.Statement` (see the sketch after this list)
    2. Fix incorrect `asInstanceOf`.
    3. Remove unnecessary `synchronized` and `ReentrantLock`.
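
    For item 1, a generic sketch of the pattern (not the exact Spark call site):
    ```
    import java.sql.Connection

    // Close the Statement in a finally block so it is released even if the update throws.
    def runUpdate(conn: Connection, sql: String): Int = {
      val stmt = conn.createStatement()
      try {
        stmt.executeUpdate(sql)
      } finally {
        stmt.close()
      }
    }
    ```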
    
    Author: Shixiong Zhu <shixiong@databricks.com>
    
    Closes apache#10440 from zsxwing/findbugs.
    
    (cherry picked from commit 710b411)
    Signed-off-by: Shixiong Zhu <shixiong@databricks.com>
    zsxwing committed Dec 28, 2015
    Commit fd20248

Commits on Dec 29, 2015

  1. Commit d545dfe