
REL-505 merge Apache branch-1.1 bug fixes and add new ByteswapPartitioner #27

Closed
wants to merge 78 commits

Commits on Nov 17, 2014

  1. Revert "[SPARK-4075] [Deploy] Jar url validation is not enough for Jar file"
    
    This reverts commit 098f83c.
    Andrew Or committed Nov 17, 2014
    SHA: b528367
  2. Revert "[maven-release-plugin] prepare for next development iteration"

    This reverts commit 685bdd2.
    Andrew Or committed Nov 17, 2014
    SHA: cf8d0ef
  3. Revert "[maven-release-plugin] prepare release v1.1.1-rc1"

    This reverts commit 72a4fdb.
    Andrew Or committed Nov 17, 2014
    SHA: e4f5695

Commits on Nov 18, 2014

  1. [SPARK-4467] Partial fix for fetch failure in sort-based shuffle (1.1)

    This is the 1.1 version of apache#3302. There has been some refactoring in master so we can't cherry-pick that PR.
    
    Author: Andrew Or <andrew@databricks.com>
    
    Closes apache#3330 from andrewor14/sort-fetch-fail and squashes the following commits:
    
    486fc49 [Andrew Or] Reset `elementsRead`
    Andrew Or committed Nov 18, 2014
    SHA: aa9ebda
  2. [SPARK-4393] Fix memory leak in ConnectionManager ACK timeout TimerTasks; use HashedWheelTimer (For branch-1.1)
    
    This patch is intended to fix a subtle memory leak in ConnectionManager's ACK timeout TimerTasks: in the old code, each TimerTask held a reference to the message being sent, and a cancelled TimerTask won't necessarily be garbage-collected until it is scheduled to run. This caused huge buildups of messages that weren't garbage-collected until their timeouts expired, leading to OOMs.
    
    This patch addresses this problem by capturing only the message ID in the TimerTask instead of the whole message, and by keeping a WeakReference to the promise in the TimerTask. I've also modified this code to use Netty's HashedWheelTimer, whose performance characteristics should be better for this use-case.
    
    Author: Kousuke Saruta <sarutak@oss.nttdata.co.jp>
    
    Closes apache#3321 from sarutak/connection-manager-timeout-bugfix and squashes the following commits:
    
    786af91 [Kousuke Saruta] Fixed memory leak issue of ConnectionManager
    sarutak authored and JoshRosen committed Nov 18, 2014
    SHA: 91b5fa8
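The shape of this fix can be sketched without Netty or Spark internals: the timeout task captures only the message ID plus a `WeakReference` to the completion promise, so a cancelled task can no longer pin the full message in memory. The names below (`AckContext`, `TimeoutTask`) are illustrative, not Spark's actual classes.

```scala
import java.lang.ref.WeakReference
import scala.concurrent.Promise

// Hypothetical sketch of the fix's idea: the task holds the message ID and a
// WeakReference to the promise, never the (potentially large) message itself,
// so a cancelled task cannot keep the message alive until the timer fires.
case class AckContext(messageId: Long, promise: Promise[Unit])

class TimeoutTask(messageId: Long, promiseRef: WeakReference[Promise[Unit]]) {
  def run(): Unit = {
    val p = promiseRef.get() // null once the promise has been collected
    if (p != null && !p.isCompleted) {
      p.tryFailure(new java.io.IOException(s"ack timeout for message $messageId"))
    }
  }
}

val ctx = AckContext(42L, Promise[Unit]())
new TimeoutTask(ctx.messageId, new WeakReference(ctx.promise)).run()
assert(ctx.promise.future.value.get.isFailure) // timed out, message not retained
```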

Commits on Nov 19, 2014

  1. [SPARK-4433] fix a racing condition in zipWithIndex

    Spark hangs with the following code:
    
    ~~~
    sc.parallelize(1 to 10).zipWithIndex.repartition(10).count()
    ~~~
    
    This is because ZippedWithIndexRDD triggers a job in getPartitions, which causes a deadlock in DAGScheduler.getPreferredLocs (a synchronized method). The fix is to compute `startIndices` eagerly during construction.
    
    This should be applied to branch-1.0, branch-1.1, and branch-1.2.
    
    pwendell
    
    Author: Xiangrui Meng <meng@databricks.com>
    
    Closes apache#3291 from mengxr/SPARK-4433 and squashes the following commits:
    
    c284d9f [Xiangrui Meng] fix a racing condition in zipWithIndex
    
    (cherry picked from commit bb46046)
    Signed-off-by: Xiangrui Meng <meng@databricks.com>
    mengxr committed Nov 19, 2014
    SHA: ae9b1f6
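The eager-construction idea can be sketched in miniature (this is not ZippedWithIndexRDD's actual code; the class and field names are illustrative):

```scala
// Computing the per-partition start offsets eagerly in the constructor means
// no job runs inside getPartitions -- which is what deadlocked against the
// synchronized DAGScheduler.getPreferredLocs in the original code.
class ZippedWithIndex(partitionCounts: Array[Long]) {
  // eager `val`, not a lazy computation triggered during scheduling
  val startIndices: Array[Long] = partitionCounts.scanLeft(0L)(_ + _).init
}

val z = new ZippedWithIndex(Array(3L, 4L, 3L))
assert(z.startIndices.toSeq == Seq(0L, 3L, 7L))
```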
  2. [SPARK-4468][SQL] Backports apache#3334 to branch-1.1

    <!-- Reviewable:start -->
    [<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/3338)
    <!-- Reviewable:end -->
    
    Author: Cheng Lian <lian@databricks.com>
    
    Closes apache#3338 from liancheng/spark-3334-for-1.1 and squashes the following commits:
    
    bd17512 [Cheng Lian] Backports apache#3334 to branch-1.1
    liancheng authored and marmbrus committed Nov 19, 2014
    SHA: f9739b9
  3. [SPARK-4380] Log more precise number of bytes spilled (1.1)

    This is the branch-1.1 version of apache#3243.
    
    Author: Andrew Or <andrew@databricks.com>
    
    Closes apache#3355 from andrewor14/spill-log-bytes-1.1 and squashes the following commits:
    
    36ec152 [Andrew Or] Log more precise representation of bytes in spilling code
    Andrew Or committed Nov 19, 2014
    SHA: e22a759
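"More precise representation of bytes" amounts to formatting the actual byte count with an appropriate unit instead of a coarse, truncated figure. A hedged sketch of such a helper (hypothetical, not Spark's own `Utils.bytesToString`):

```scala
import java.util.Locale

// Illustrative formatter: pick the largest unit that fits and keep one
// decimal place, so a 1536-byte spill logs as "1.5 KB" rather than "0 MB".
def bytesToString(size: Long): String = {
  val KB = 1L << 10; val MB = 1L << 20; val GB = 1L << 30
  if (size >= GB) "%.1f GB".formatLocal(Locale.US, size.toDouble / GB)
  else if (size >= MB) "%.1f MB".formatLocal(Locale.US, size.toDouble / MB)
  else if (size >= KB) "%.1f KB".formatLocal(Locale.US, size.toDouble / KB)
  else s"$size B"
}

assert(bytesToString(1536L) == "1.5 KB")
assert(bytesToString(512L) == "512 B")
```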
  4. Merge branch 'branch-1.1' of github.com:apache/spark into csd-1.1

    Conflicts:
    	assembly/pom.xml
    	bagel/pom.xml
    	core/pom.xml
    	examples/pom.xml
    	external/flume-sink/pom.xml
    	external/flume/pom.xml
    	external/kafka/pom.xml
    	external/mqtt/pom.xml
    	external/twitter/pom.xml
    	external/zeromq/pom.xml
    	extras/kinesis-asl/pom.xml
    	extras/spark-ganglia-lgpl/pom.xml
    	graphx/pom.xml
    	mllib/pom.xml
    	pom.xml
    	repl/pom.xml
    	sql/catalyst/pom.xml
    	sql/core/pom.xml
    	sql/hive-thriftserver/pom.xml
    	sql/hive/pom.xml
    	streaming/pom.xml
    	tools/pom.xml
    	yarn/pom.xml
    	yarn/stable/pom.xml
    markhamstra committed Nov 19, 2014
    SHA: 1713c7e
  5. [SPARK-4480] Avoid many small spills in external data structures (1.1)

    This is the branch-1.1 version of apache#3353. This requires a separate PR because the code in master has been refactored a little to eliminate duplicate code. I have tested this on a standalone cluster. The goal is to merge this into 1.1.1.
    
    Author: Andrew Or <andrew@databricks.com>
    
    Closes apache#3354 from andrewor14/avoid-small-spills-1.1 and squashes the following commits:
    
    f2e552c [Andrew Or] Fix tests
    7012595 [Andrew Or] Avoid many small spills
    Andrew Or committed Nov 19, 2014
    SHA: 16bf5f3
  6. SHA: aa3c794
  7. SHA: 3693ae5
  8. SHA: 1df1c1d

Commits on Nov 24, 2014

  1. Merge tag 'v1.1.1-rc2' of github.com:apache/spark into csd-1.1

    [maven-release-plugin]  copy for tag v1.1.1-rc2
    
    Conflicts:
    	assembly/pom.xml
    	bagel/pom.xml
    	core/pom.xml
    	examples/pom.xml
    	external/flume-sink/pom.xml
    	external/flume/pom.xml
    	external/kafka/pom.xml
    	external/mqtt/pom.xml
    	external/twitter/pom.xml
    	external/zeromq/pom.xml
    	extras/kinesis-asl/pom.xml
    	extras/spark-ganglia-lgpl/pom.xml
    	graphx/pom.xml
    	mllib/pom.xml
    	pom.xml
    	repl/pom.xml
    	sql/catalyst/pom.xml
    	sql/core/pom.xml
    	sql/hive-thriftserver/pom.xml
    	sql/hive/pom.xml
    	streaming/pom.xml
    	tools/pom.xml
    	yarn/pom.xml
    	yarn/stable/pom.xml
    markhamstra committed Nov 24, 2014
    SHA: b838cef
  2. Fixed merge typo

    markhamstra committed Nov 24, 2014
    SHA: 1b2b7dd
  3. Update versions to 1.1.2-SNAPSHOT

    Andrew Or committed Nov 24, 2014
    SHA: 6371737

Commits on Nov 25, 2014

  1. [SPARK-4196][SPARK-4602][Streaming] Fix serialization issue in PairDStreamFunctions.saveAsNewAPIHadoopFiles
    
    Solves two JIRAs in one shot
    - Makes the ForEachDStream created by saveAsNewAPIHadoopFiles serializable for checkpoints
    - Makes the default configuration object used by saveAsNewAPIHadoopFiles be Spark's Hadoop configuration
    
    Author: Tathagata Das <tathagata.das1565@gmail.com>
    
    Closes apache#3457 from tdas/savefiles-fix and squashes the following commits:
    
    bb4729a [Tathagata Das] Same treatment for saveAsHadoopFiles
    b382ea9 [Tathagata Das] Fix serialization issue in PairDStreamFunctions.saveAsNewAPIHadoopFiles.
    
    (cherry picked from commit 8838ad7)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
    tdas committed Nov 25, 2014
    SHA: 7aa592c
  2. SHA: 1a7f414

Commits on Nov 27, 2014

  1. [Release] Automate generation of contributors list

    This commit provides a script that computes the contributors list
    by linking the github commits with JIRA issues. Automatically
    translating github usernames remains a TODO at this point.
    Andrew Or committed Nov 27, 2014
    SHA: a59c445

Commits on Nov 28, 2014

  1. [BRANCH-1.1][SPARK-4626] Kill a task only if the executorId is (still) registered with the scheduler
    
    v1.1 backport for apache#3483
    
    Author: roxchkplusony <roxchkplusony@gmail.com>
    
    Closes apache#3503 from roxchkplusony/bugfix/4626-1.1 and squashes the following commits:
    
    234d350 [roxchkplusony] [SPARK-4626] Kill a task only if the executorId is (still) registered with the scheduler
    roxchkplusony authored and rxin committed Nov 28, 2014
    SHA: f8a4fd3

Commits on Nov 29, 2014

  1. [SPARK-4597] Use proper exception and reset variable in Utils.createTempDir()
    
    `File.exists()` and `File.mkdirs()` only throw `SecurityException` instead of `IOException`. Then, when an exception is thrown, `dir` should be reset too.
    
    Author: Liang-Chi Hsieh <viirya@gmail.com>
    
    Closes apache#3449 from viirya/fix_createtempdir and squashes the following commits:
    
    36cacbd [Liang-Chi Hsieh] Use proper exception and reset variable.
    
    (cherry picked from commit 49fe879)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    viirya authored and JoshRosen committed Nov 29, 2014
    SHA: 24b5c03
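The corrected pattern can be sketched as follows. This is an illustrative reconstruction from the description, not Spark's exact source: the loop catches `SecurityException` (which is what `File.exists()`/`File.mkdirs()` can actually throw) and resets `dir` to null on any failure so a half-initialized value never escapes.

```scala
import java.io.{File, IOException}
import java.util.UUID

// Sketch: retry with a fresh candidate directory; reset `dir` on failure.
def createTempDir(root: String, maxAttempts: Int = 10): File = {
  var attempts = 0
  var dir: File = null
  while (dir == null) {
    attempts += 1
    if (attempts > maxAttempts) {
      throw new IOException(s"Failed to create a temp directory under $root after $maxAttempts attempts")
    }
    try {
      dir = new File(root, "tmp-" + UUID.randomUUID.toString)
      if (dir.exists() || !dir.mkdirs()) dir = null // reset on failure
    } catch {
      case _: SecurityException => dir = null // reset on failure, then retry
    }
  }
  dir
}

val d = createTempDir(System.getProperty("java.io.tmpdir"))
assert(d.isDirectory)
```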

Commits on Nov 30, 2014

  1. SPARK-2143 [WEB UI] Add Spark version to UI footer

    This PR adds the Spark version number to the UI footer; this is how it looks:
    
    ![screen shot 2014-11-21 at 22 58 40](https://cloud.githubusercontent.com/assets/822522/5157738/f4822094-7316-11e4-98f1-333a535fdcfa.png)
    
    Author: Sean Owen <sowen@cloudera.com>
    
    Closes apache#3410 from srowen/SPARK-2143 and squashes the following commits:
    
    e9b3a7a [Sean Owen] Add Spark version to footer
    srowen authored and JoshRosen committed Nov 30, 2014
    SHA: 1a2508b
  2. [HOTFIX] Fix build break in 1a2508b

    org.apache.spark.SPARK_VERSION is new in 1.2; in earlier versions,
    we have to use SparkContext.SPARK_VERSION.
    JoshRosen committed Nov 30, 2014
    SHA: 90d90b2

Commits on Dec 1, 2014

  1. [DOC] Fixes formatting typo in SQL programming guide

    <!-- Reviewable:start -->
    [<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/3498)
    <!-- Reviewable:end -->
    
    Author: Cheng Lian <lian@databricks.com>
    
    Closes apache#3498 from liancheng/fix-sql-doc-typo and squashes the following commits:
    
    865ecd7 [Cheng Lian] Fixes formatting typo in SQL programming guide
    
    (cherry picked from commit 2a4d389)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    liancheng authored and JoshRosen committed Dec 1, 2014
    SHA: 91eadd2

Commits on Dec 2, 2014

  1. [SPARK-4686] Link to allowed master URLs is broken

    The link points to the old scala programming guide; it should point to the submitting applications page.
    
    This should be backported to 1.1.2 (it's been broken as of 1.0).
    
    Author: Kay Ousterhout <kayousterhout@gmail.com>
    
    Closes apache#3542 from kayousterhout/SPARK-4686 and squashes the following commits:
    
    a8fc43b [Kay Ousterhout] [SPARK-4686] Link to allowed master URLs is broken
    
    (cherry picked from commit d9a148b)
    Signed-off-by: Kay Ousterhout <kayousterhout@gmail.com>
    kayousterhout committed Dec 2, 2014
    SHA: f333e4f

Commits on Dec 3, 2014

  1. SHA: aec20af
  2. [SPARK-4701] Typo in sbt/sbt

    Modified typo.
    
    Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>
    
    Closes apache#3560 from tsudukim/feature/SPARK-4701 and squashes the following commits:
    
    ed2a3f1 [Masayoshi TSUZUKI] Another whitespace position error.
    1af3a35 [Masayoshi TSUZUKI] [SPARK-4701] Typo in sbt/sbt
    
    (cherry picked from commit 96786e3)
    Signed-off-by: Andrew Or <andrew@databricks.com>
    tsudukim authored and Andrew Or committed Dec 3, 2014
    SHA: e484b8a
  3. [SPARK-4715][Core] Make sure tryToAcquire won't return a negative value

    ShuffleMemoryManager.tryToAcquire may return a negative value. The unit test demonstrates this bug. It will output `0 did not equal -200 granted is negative`.
    
    Author: zsxwing <zsxwing@gmail.com>
    
    Closes apache#3575 from zsxwing/SPARK-4715 and squashes the following commits:
    
    a193ae6 [zsxwing] Make sure tryToAcquire won't return a negative value
    zsxwing authored and Andrew Or committed Dec 3, 2014
    SHA: af76954
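The invariant this restores is simple to state: a memory grant must never be negative, even when the caller already holds more than its fair share. A minimal sketch (the formula is illustrative, not ShuffleMemoryManager's exact bookkeeping):

```scala
// Clamp the grant at zero: if the thread already exceeds its fair share,
// it gets nothing rather than a negative amount.
def tryToAcquire(requested: Long, fairShare: Long, alreadyHeld: Long): Long =
  math.max(0L, math.min(requested, fairShare - alreadyHeld))

assert(tryToAcquire(200L, 1000L, 1200L) == 0L) // before the fix: -200
assert(tryToAcquire(200L, 1000L, 900L) == 100L)
```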
  4. [SPARK-4642] Add description about spark.yarn.queue to running-on-YARN document.
    
    Added descriptions about these parameters.
    - spark.yarn.queue
    
    Modified the description about the default value of this parameter.
    - spark.yarn.submit.file.replication
    
    Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>
    
    Closes apache#3500 from tsudukim/feature/SPARK-4642 and squashes the following commits:
    
    ce99655 [Masayoshi TSUZUKI] better gramatically.
    21cf624 [Masayoshi TSUZUKI] Removed intentionally undocumented properties.
    88cac9b [Masayoshi TSUZUKI] [SPARK-4642] Documents about running-on-YARN needs update
    tsudukim authored and Andrew Or committed Dec 3, 2014
    SHA: 3e3cd5a
  5. [SPARK-4498][core] Don't transition ExecutorInfo to RUNNING until Driver adds Executor
    
    The ExecutorInfo only reaches the RUNNING state if the Driver is alive to send the ExecutorStateChanged message to master.  Else, appInfo.resetRetryCount() is never called and failing Executors will eventually exceed ApplicationState.MAX_NUM_RETRY, resulting in the application being removed from the master's accounting.
    
    Author: Mark Hamstra <markhamstra@gmail.com>
    
    Closes apache#3550 from markhamstra/SPARK-4498 and squashes the following commits:
    
    8f543b1 [Mark Hamstra] Don't transition ExecutorInfo to RUNNING until Executor is added by Driver
    markhamstra authored and JoshRosen committed Dec 3, 2014
    SHA: 17dfd41

Commits on Dec 4, 2014

  1. [Release] Correctly translate contributors name in release notes

    This commit involves three main changes:
    
    (1) It separates the translation of contributor names from the
    generation of the contributors list. This is largely motivated
    by the Github API limit; even if we exceed this limit, we should
    at least be able to proceed manually as before. This is why the
    translation logic is abstracted into its own script
    translate-contributors.py.
    
    (2) When we look for candidate replacements for invalid author
    names, we should look for the assignees of the associated JIRAs
    too. As a result, the intermediate file must keep track of these.
    
    (3) This provides an interactive mode with which the user can
    sit at the terminal and manually pick the candidate replacement
    that he/she thinks makes the most sense. As before, there is a
    non-interactive mode that picks the first candidate that the
    script considers "valid."
    
    TODO: We should have a known_contributors file that stores
    known mappings so we don't have to go through all of this
    translation every time. This is also valuable because some
    contributors simply cannot be automatically translated.
    
    Conflicts:
    	.gitignore
    Andrew Or committed Dec 4, 2014
    SHA: 6c53225
  2. [SPARK-4253] Ignore spark.driver.host in yarn-cluster and standalone-cluster modes
    
    In yarn-cluster and standalone-cluster modes, we don't know where the driver will run until it is launched. If the `spark.driver.host` property is set on the submitting machine and propagated to the driver through SparkConf, then this will lead to errors when the driver launches.
    
    This patch fixes this issue by dropping the `spark.driver.host` property in SparkSubmit when running in a cluster deploy mode.
    
    Author: WangTaoTheTonic <barneystinson@aliyun.com>
    Author: WangTao <barneystinson@aliyun.com>
    
    Closes apache#3112 from WangTaoTheTonic/SPARK4253 and squashes the following commits:
    
    ed1a25c [WangTaoTheTonic] revert unrelated formatting issue
    02c4e49 [WangTao] add comment
    32a3f3f [WangTaoTheTonic] ingore it in SparkSubmit instead of SparkContext
    667cf24 [WangTaoTheTonic] document fix
    ff8d5f7 [WangTaoTheTonic] also ignore it in standalone cluster mode
    2286e6b [WangTao] ignore spark.driver.host in yarn-cluster mode
    
    (cherry picked from commit 8106b1e)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    
    Conflicts:
    	core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
    WangTaoTheTonic authored and JoshRosen committed Dec 4, 2014
    SHA: 5ac55c8
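The behavior can be sketched as a small conf transform (simplified; `prepareConf` is a hypothetical name, not SparkSubmit's code): in a cluster deploy mode the driver's host is unknown at submit time, so a locally-set `spark.driver.host` is dropped before the conf reaches the driver.

```scala
// Drop spark.driver.host when the driver will run somewhere we can't predict.
def prepareConf(conf: Map[String, String], deployMode: String): Map[String, String] =
  if (deployMode == "cluster") conf - "spark.driver.host" else conf

val submitted = Map("spark.driver.host" -> "laptop.local", "spark.app.name" -> "demo")
assert(prepareConf(submitted, "cluster") == Map("spark.app.name" -> "demo"))
assert(prepareConf(submitted, "client") == submitted)
```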
  3. [SPARK-4745] Fix get_existing_cluster() function with multiple security groups
    
    The current get_existing_cluster() function would only find an instance belonging to a cluster if the instance's security groups == cluster_name + "-master" (or "-slaves"). This fix allows for multiple security groups by checking whether the cluster_name + "-master" security group is in the list of groups for a particular instance.
    
    Author: alexdebrie <alexdebrie1@gmail.com>
    
    Closes apache#3596 from alexdebrie/master and squashes the following commits:
    
    9d51232 [alexdebrie] Fix get_existing_cluster() function with multiple security groups
    
    (cherry picked from commit 794f3ae)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    alexdebrie authored and JoshRosen committed Dec 4, 2014
    SHA: d01fdd3
  4. [SPARK-4459] Change groupBy type parameter from K to U

    Please see https://issues.apache.org/jira/browse/SPARK-4459
    
    Author: Saldanha <saldaal1@phusca-l24858.wlan.na.novartis.net>
    
    Closes apache#3327 from alokito/master and squashes the following commits:
    
    54b1095 [Saldanha] [SPARK-4459] changed type parameter for keyBy from K to U
    d5f73c3 [Saldanha] [SPARK-4459] added keyBy test
    316ad77 [Saldanha] SPARK-4459 changed type parameter for groupBy from K to U.
    62ddd4b [Saldanha] SPARK-4459 added failing unit test
    (cherry picked from commit 743a889)
    
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    Saldanha authored and JoshRosen committed Dec 4, 2014
    SHA: e98aa54
  5. [SPARK-4652][DOCS] Add docs about spark-git-repo option

    There might be some cases when a WIP Spark version needs to be run on an EC2 cluster. In order to set up this type of cluster more easily, add a --spark-git-repo option description to the EC2 documentation.
    
    Author: lewuathe <lewuathe@me.com>
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes apache#3513 from Lewuathe/doc-for-development-spark-cluster and squashes the following commits:
    
    6dae8ee [lewuathe] Wrap consistent with other descriptions
    cfaf9be [lewuathe] Add docs about spark-git-repo option
    
    (Editing / cleanup by Josh Rosen)
    
    (cherry picked from commit ab8177d)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    Lewuathe authored and JoshRosen committed Dec 4, 2014
    SHA: bf637e0

Commits on Dec 5, 2014

  1. [SPARK-4421] Wrong link in spark-standalone.html

    Modified the link of building Spark. (backport version of apache#3279.)
    
    Author: Masayoshi TSUZUKI <tsudukim@oss.nttdata.co.jp>
    
    Closes apache#3280 from tsudukim/feature/SPARK-4421-2 and squashes the following commits:
    
    3b4d38d [Masayoshi TSUZUKI] [SPARK-4421] Wrong link in spark-standalone.html
    tsudukim authored and JoshRosen committed Dec 5, 2014
    SHA: b09382a
  2. Fix typo in Spark SQL docs.

    Author: Andy Konwinski <andykonwinski@gmail.com>
    
    Closes apache#3611 from andyk/patch-3 and squashes the following commits:
    
    7bab333 [Andy Konwinski] Fix typo in Spark SQL docs.
    
    (cherry picked from commit 15cf3b0)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    andyk authored and JoshRosen committed Dec 5, 2014
    SHA: 8ee2d18
  3. Merge branch 'branch-1.1' of github.com:apache/spark into csd-1.1

    Conflicts:
    	assembly/pom.xml
    	bagel/pom.xml
    	core/pom.xml
    	docs/_config.yml
    	examples/pom.xml
    	external/flume-sink/pom.xml
    	external/flume/pom.xml
    	external/kafka/pom.xml
    	external/mqtt/pom.xml
    	external/twitter/pom.xml
    	external/zeromq/pom.xml
    	extras/kinesis-asl/pom.xml
    	extras/spark-ganglia-lgpl/pom.xml
    	graphx/pom.xml
    	mllib/pom.xml
    	pom.xml
    	repl/pom.xml
    	sql/catalyst/pom.xml
    	sql/core/pom.xml
    	sql/hive-thriftserver/pom.xml
    	sql/hive/pom.xml
    	streaming/pom.xml
    	tools/pom.xml
    	yarn/alpha/pom.xml
    	yarn/pom.xml
    	yarn/stable/pom.xml
    markhamstra committed Dec 5, 2014
    SHA: a290486

Commits on Dec 8, 2014

  1. [SPARK-4764] Ensure that files are fetched atomically

    tempFile is created in the same directory as targetFile, so that the move from tempFile to targetFile is always atomic.
    
    Author: Christophe Préaud <christophe.preaud@kelkoo.com>
    
    Closes apache#2855 from preaudc/master and squashes the following commits:
    
    9ba89ca [Christophe Préaud] Ensure that files are fetched atomically
    54419ae [Christophe Préaud] Merge remote-tracking branch 'upstream/master'
    c6a5590 [Christophe Préaud] Revert commit 8ea871f
    7456a33 [Christophe Préaud] Merge remote-tracking branch 'upstream/master'
    8ea871f [Christophe Préaud] Ensure that files are fetched atomically
    
    (cherry picked from commit ab2abcb)
    Signed-off-by: Josh Rosen <rosenville@gmail.com>
    
    Conflicts:
    	core/src/main/scala/org/apache/spark/util/Utils.scala
    Christophe Préaud authored and JoshRosen committed Dec 8, 2014
    SHA: 16bc77b
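The same-directory trick can be demonstrated with plain `java.nio` (an illustrative sketch of the idea, not Spark's `Utils` code): the temp file shares the target's directory, hence its filesystem, which is what makes an atomic rename possible; a temp file on an unrelated /tmp mount could not guarantee that.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, StandardCopyOption}

// Write to a temp file beside the target, then atomically move it into place.
val targetDir = Files.createTempDirectory("fetch-demo")
val target = targetDir.resolve("data.bin")
val tmp = Files.createTempFile(targetDir, "fetch", ".tmp") // same dir as target
Files.write(tmp, "payload".getBytes(StandardCharsets.UTF_8))
Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE)
assert(Files.exists(target) && !Files.exists(tmp))
```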

Commits on Dec 9, 2014

  1. SPARK-3926 [CORE] Reopened: result of JavaRDD collectAsMap() is not serializable
    
    My original 'fix' didn't fix at all. Now, there's a unit test to check whether it works. Of the two options to really fix it -- copy the `Map` to a `java.util.HashMap`, or copy and modify Scala's implementation in `Wrappers.MapWrapper`, I went with the latter.
    
    Author: Sean Owen <sowen@cloudera.com>
    
    Closes apache#3587 from srowen/SPARK-3926 and squashes the following commits:
    
    8586bb9 [Sean Owen] Remove unneeded no-arg constructor, and add additional note about copied code in LICENSE
    7bb0e66 [Sean Owen] Make SerializableMapWrapper actually serialize, and add unit test
    
    (cherry picked from commit e829bfa)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    srowen authored and JoshRosen committed Dec 9, 2014
    SHA: fe7d7a9
  2. SHA: 7bf3aa3
  3. [SPARK-4714] BlockManager.dropFromMemory() should check whether block has been removed after synchronizing on BlockInfo instance.
    
    After synchronizing on the `info` lock in the `removeBlock`/`dropOldBlocks`/`dropFromMemory` methods in BlockManager, the block that `info` represented may have already been removed.
    
    The three methods have the same logic to get the `info` lock:
    ```
       info = blockInfo.get(id)
       if (info != null) {
         info.synchronized {
           // do something
         }
       }
    ```
    
    So, there is a chance that when a thread enters the `info.synchronized` block, `info` has already been removed from the `blockInfo` map by some other thread that entered `info.synchronized` first.
    
    The `removeBlock` and `dropOldBlocks` methods are idempotent, so it's safe for them to run on blocks that have already been removed.
    But `dropFromMemory` may be problematic, since it may drop block data that has already been removed to the disk store, and this calls data store operations that are not designed to handle missing blocks.
    
    This patch fixes this issue by adding a check to `dropFromMemory` to test whether blocks have been removed by a racing thread.
    
    Author: hushan[胡珊] <hushan@xiaomi.com>
    
    Closes apache#3574 from suyanNone/refine-block-concurrency and squashes the following commits:
    
    edb989d [hushan[胡珊]] Refine code style and comments position
    55fa4ba [hushan[胡珊]] refine code
    e57e270 [hushan[胡珊]] add check info is already remove or not while having gotten info.syn
    
    (cherry picked from commit 30dca92)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    suyanNone authored and JoshRosen committed Dec 9, 2014
    SHA: 9b99237
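The added check can be sketched with simplified types (this is not BlockManager itself): after winning the lock on `info`, re-verify that the block is still present before dropping it, since a racing thread may have removed it while we waited for the lock.

```scala
import scala.collection.concurrent.TrieMap

// Simplified stand-in for the blockInfo map.
val blockInfo = TrieMap[String, Object]("rdd_0_0" -> new Object)

def dropFromMemory(id: String): Boolean =
  blockInfo.get(id) match {
    case Some(info) =>
      info.synchronized {
        // the added check: bail out if the block was removed in the meantime
        if (!blockInfo.contains(id)) false
        else { blockInfo.remove(id); true }
      }
    case None => false
  }

assert(dropFromMemory("rdd_0_0"))
assert(!dropFromMemory("rdd_0_0")) // already removed: a no-op, not a bad drop
```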

Commits on Dec 10, 2014

  1. [SPARK-4772] Clear local copies of accumulators as soon as we're done with them
    
    Accumulators keep thread-local copies of themselves.  These copies were only cleared at the beginning of a task.  This meant that (a) the memory they used was tied up until the next task ran on that thread, and (b) if a thread died, the memory it had used for accumulators was locked up forever on that worker.
    
    This PR clears the thread-local copies of accumulators at the end of each task, in the task's finally block, to make sure they are cleaned up between tasks. It also stores them in a ThreadLocal object, so that if, for some reason, the thread dies, any memory they are using at the time should be freed up.
    
    Author: Nathan Kronenfeld <nkronenfeld@oculusinfo.com>
    
    Closes apache#3570 from nkronenfeld/Accumulator-Improvements and squashes the following commits:
    
    a581f3f [Nathan Kronenfeld] Change Accumulators to private[spark] instead of adding mima exclude to get around false positive in mima tests
    b6c2180 [Nathan Kronenfeld] Include MiMa exclude as per build error instructions - this version incompatibility should be irrelevent, as it will only surface if a master is talking to a worker running a different version of spark.
    537baad [Nathan Kronenfeld] Fuller refactoring as intended, incorporating JR's suggestions for ThreadLocal localAccums, and keeping clear(), but also calling it in tasks' finally block, rather than just at the beginning of the task.
    39a82f2 [Nathan Kronenfeld] Clear local copies of accumulators as soon as we're done with them
    
    (cherry picked from commit 94b377f)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    
    Conflicts:
    	core/src/main/scala/org/apache/spark/Accumulators.scala
    	core/src/main/scala/org/apache/spark/executor/Executor.scala
    Nathan Kronenfeld authored and JoshRosen committed Dec 10, 2014
    SHA: 6dcafa7
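The shape of the fix can be sketched like this (`LocalAccums` and `runTask` are hypothetical names, not Spark's Accumulators object): per-thread copies live in a `ThreadLocal` and are cleared in the task's finally block, so the memory is released when the task ends rather than when the next task happens to start on that thread.

```scala
object LocalAccums {
  private val local = new ThreadLocal[collection.mutable.Map[Long, Long]] {
    override def initialValue() = collection.mutable.Map.empty[Long, Long]
  }
  def add(id: Long, v: Long): Unit = {
    val m = local.get(); m(id) = m.getOrElse(id, 0L) + v
  }
  def snapshot(): Map[Long, Long] = local.get().toMap
  def clear(): Unit = local.remove() // frees the per-thread copy
}

def runTask(): Map[Long, Long] =
  try {
    LocalAccums.add(1L, 10L); LocalAccums.add(1L, 5L)
    LocalAccums.snapshot()
  } finally LocalAccums.clear() // the fix: clear at task end, in finally

assert(runTask() == Map(1L -> 15L))
assert(LocalAccums.snapshot().isEmpty) // nothing lingers between tasks
```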
  2. [SPARK-4771][Docs] Document standalone cluster supervise mode

    tdas looks like streaming already refers to the supervise mode. The link from there is broken though.
    
    Author: Andrew Or <andrew@databricks.com>
    
    Closes apache#3627 from andrewor14/document-supervise and squashes the following commits:
    
    9ca0908 [Andrew Or] Wording changes
    2b55ed2 [Andrew Or] Document standalone cluster supervise mode
    Andrew Or committed Dec 10, 2014
    SHA: 273f2c8
  3. [SPARK-4759] Fix driver hanging from coalescing partitions

    The driver hangs sometimes when we coalesce RDD partitions. See JIRA for more details and reproduction.
    
    This is because our use of the empty string as the default preferred location in `CoalescedRDDPartition` causes the `TaskSetManager` to schedule the corresponding task on host `""` (empty string). The intended semantics, however, are that the partition has no preferred location, and the TSM should schedule the corresponding task accordingly.
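    The intended semantics are cleaner with `Option` than with a sentinel empty string. A minimal sketch (the types below are illustrative, not `CoalescedRDDPartition`'s actual fields):

    ```scala
    // Illustrative partition type: "no preferred location" is None, never "".
    case class PartitionLoc(preferredLocation: Option[String])

    // What a scheduler should see: an empty host list, not Seq("").
    def preferredHosts(p: PartitionLoc): Seq[String] =
      p.preferredLocation.filter(_.nonEmpty).toSeq
    ```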
    
    Author: Andrew Or <andrew@databricks.com>
    
    Closes apache#3633 from andrewor14/coalesce-preferred-loc and squashes the following commits:
    
    e520d6b [Andrew Or] Oops
    3ebf8bd [Andrew Or] A few comments
    f370a4e [Andrew Or] Fix tests
    2f7dfb6 [Andrew Or] Avoid using empty string as default preferred location
    
    (cherry picked from commit 4f93d0c)
    Signed-off-by: Andrew Or <andrew@databricks.com>
    Andrew Or committed Dec 10, 2014
    Commit 396de67

Commits on Dec 14, 2014

  1. fixed spelling errors in documentation

    changed "form" to "from" in 3 documentation entries for Kafka integration
    
    Author: Peter Klipfel <peter@klipfel.me>
    
    Closes apache#3691 from peterklipfel/master and squashes the following commits:
    
    0fe7fc5 [Peter Klipfel] fixed spelling errors in documentation
    
    (cherry picked from commit 2a2983f)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    peterklipfel authored and JoshRosen committed Dec 14, 2014
    Commit 0faea17

Commits on Dec 16, 2014

  1. SPARK-785 [CORE] ClosureCleaner not invoked on most PairRDDFunctions

    This looked like perhaps a simple and important one. `combineByKey` looks like it should clean its arguments' closures, and that in turn covers apparently all remaining functions in `PairRDDFunctions` which delegate to it.
    
    Author: Sean Owen <sowen@cloudera.com>
    
    Closes apache#3690 from srowen/SPARK-785 and squashes the following commits:
    
    8df68fe [Sean Owen] Clean context of most remaining functions in PairRDDFunctions, which ultimately call combineByKey
    
    (cherry picked from commit 2a28bc6)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    srowen authored and JoshRosen committed Dec 16, 2014
    Commit fa3b3e3
  2. SPARK-4814 [CORE] Enable assertions in SBT, Maven tests / AssertionEr…

    …ror from Hive's LazyBinaryInteger
    
    This enables assertions for the Maven and SBT build, but overrides the Hive module to not enable assertions.
    
    Author: Sean Owen <sowen@cloudera.com>
    
    Closes apache#3692 from srowen/SPARK-4814 and squashes the following commits:
    
    caca704 [Sean Owen] Disable assertions just for Hive
    f71e783 [Sean Owen] Enable assertions for SBT and Maven build
    
    (cherry picked from commit 81112e4)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    
    Conflicts:
    	pom.xml
    srowen authored and JoshRosen committed Dec 16, 2014
    Commit 892685b

Commits on Dec 17, 2014

  1. [Release] Major improvements to generate contributors script

    This commit introduces several major improvements to the script
    that generates the contributors list for release notes, notably:
    
    (1) Use release tags instead of a range of commits. Across branches,
    commit history is not strictly linear, so it is not sufficient to
    specify a start hash and an end hash. Otherwise, we end up counting
    commits that were already merged in an older branch.
    
    (2) Match PR numbers in addition to commit hashes. This is related
    to the first point in that if a PR is already merged in an older
    minor release tag, it should be filtered out here. This requires us
    to do some intelligent regex parsing on the commit description in
    addition to just relying on the GitHub API.
    
    (3) Relax author validity check. The old code fails on a name that
    has many middle names, for instance. The test was just too strict.
    
    (4) Use GitHub authentication. This allows us to make far more
    requests through the GitHub API than before (5000 as opposed to 60
    per hour).
    
    (5) Translate from Github username, not commit author name. This is
    important because the commit author name is not always configured
    correctly by the user. For instance, the username "falaki" used to
    resolve to just "Hossein", which was treated as a github username
    and translated to something else that is completely arbitrary.
    
    (6) Add an option to use the untranslated name. If there is not
    a satisfactory candidate to replace the untranslated name with,
    at least allow the user to not translate it.
    Andrew Or committed Dec 17, 2014
    Commit 581f866
  2. [Release] Cache known author translations locally

    This bypasses unnecessary calls to the Github and JIRA API.
    Additionally, having a local cache allows us to remember names
    that we had to manually discover ourselves.
    Andrew Or committed Dec 17, 2014
    Commit 991748d
  3. [Release] Update contributors list format and sort it

    Additionally, we now warn the user when a duplicate author name
    arises, in which case he/she needs to resolve it manually.
    
    Conflicts:
    	.rat-excludes
    Andrew Or committed Dec 17, 2014
    Commit 0efd691
  4. [HOTFIX] Fix RAT exclusion for known_translations file

    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes apache#3719 from JoshRosen/rat-fix and squashes the following commits:
    
    1542886 [Josh Rosen] [HOTFIX] Fix RAT exclusion for known_translations file
    
    (cherry picked from commit 3d0c37b)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    JoshRosen committed Dec 17, 2014
    Commit c15e7f2

Commits on Dec 18, 2014

  1. Commit bed4807

Commits on Dec 19, 2014

  1. [SPARK-4884]: Improve Partition docs

    Rewording was based on this discussion: http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-data-flow-td9804.html
    This is the associated JIRA ticket: https://issues.apache.org/jira/browse/SPARK-4884
    
    Author: Madhu Siddalingaiah <madhu@madhu.com>
    
    Closes apache#3722 from msiddalingaiah/master and squashes the following commits:
    
    79e679f [Madhu Siddalingaiah] [DOC]: improve documentation
    51d14b9 [Madhu Siddalingaiah] Merge remote-tracking branch 'upstream/master'
    38faca4 [Madhu Siddalingaiah] Merge remote-tracking branch 'upstream/master'
    cbccbfe [Madhu Siddalingaiah] Documentation: replace <b> with <code> (again)
    332f7a2 [Madhu Siddalingaiah] Documentation: replace <b> with <code>
    cd2b05a [Madhu Siddalingaiah] Merge remote-tracking branch 'upstream/master'
    0fc12d7 [Madhu Siddalingaiah] Documentation: add description for repartitionAndSortWithinPartitions
    
    (cherry picked from commit d5a596d)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    msiddalingaiah authored and JoshRosen committed Dec 19, 2014
    Commit f4e6ffc
  2. SPARK-3428. TaskMetrics for running tasks is missing GC time metrics

    Author: Sandy Ryza <sandy@cloudera.com>
    
    Closes apache#3684 from sryza/sandy-spark-3428 and squashes the following commits:
    
    cb827fe [Sandy Ryza] SPARK-3428. TaskMetrics for running tasks is missing GC time metrics
    
    (cherry picked from commit 283263f)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    sryza authored and JoshRosen committed Dec 19, 2014
    Commit 2d66463
  3. [SPARK-4896] don’t redundantly overwrite executor JAR deps

    Author: Ryan Williams <ryan.blake.williams@gmail.com>
    
    Closes apache#2848 from ryan-williams/fetch-file and squashes the following commits:
    
    c14daff [Ryan Williams] Fix copy that was changed to a move inadvertently
    8e39c16 [Ryan Williams] code review feedback
    788ed41 [Ryan Williams] don’t redundantly overwrite executor JAR deps
    
    (cherry picked from commit 7981f96)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    
    Conflicts:
    	core/src/main/scala/org/apache/spark/util/Utils.scala
    ryan-williams authored and JoshRosen committed Dec 19, 2014
    Commit 546a239

Commits on Dec 20, 2014

  1. SPARK-2641: Passing num executors to spark arguments from properties …

    …file
    
    Since we can set executor memory and executor cores from a properties file, we should also be able to set the number of executor instances.
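    For reference, the three settings in question sit side by side in `conf/spark-defaults.conf`; `spark.executor.instances` is the key this change wires up for properties files (values below are arbitrary examples):

    ```
    spark.executor.memory     4g
    spark.executor.cores      2
    spark.executor.instances  8
    ```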
    
    Author: Kanwaljit Singh <kanwaljit.singh@guavus.com>
    
    Closes apache#1657 from kjsingh/branch-1.0 and squashes the following commits:
    
    d8a5a12 [Kanwaljit Singh] SPARK-2641: Fixing how spark arguments are loaded from properties file for num executors
    
    Conflicts:
    	core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala
    Kanwaljit Singh authored and Andrew Or committed Dec 20, 2014
    Commit 3597c2e
  2. [Minor] Build Failed: value defaultProperties not found

    Mvn build failed: value defaultProperties not found. Maybe related to this PR:
    apache@1d64812
    andrewor14 can you look at this problem?
    
    Author: huangzhaowei <carlmartinmax@gmail.com>
    
    Closes apache#3749 from SaintBacchus/Mvn-Build-Fail and squashes the following commits:
    
    8e2917c [huangzhaowei] Build Failed: value defaultProperties not found
    
    (cherry picked from commit a764960)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    SaintBacchus authored and JoshRosen committed Dec 20, 2014
    Commit e5f2752

Commits on Dec 22, 2014

  1. [SPARK-4818][Core] Add 'iterator' to reduce memory consumed by join

    In Scala, `map` and `flatMap` of `Iterable` will copy the contents of the `Iterable` to a new `Seq`. For example:
    ```Scala
      val iterable = Seq(1, 2, 3).map(v => {
        println(v)
        v
      })
      println("Iterable map done")
    
      val iterator = Seq(1, 2, 3).iterator.map(v => {
        println(v)
        v
      })
      println("Iterator map done")
    ```
    outputs
    ```
    1
    2
    3
    Iterable map done
    Iterator map done
    ```
    So we should use 'iterator' to reduce memory consumed by join.
    
    Found by Johannes Simon in http://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3C5BE70814-9D03-4F61-AE2C-0D63F2DE4446%40mail.de%3E
    
    Author: zsxwing <zsxwing@gmail.com>
    
    Closes apache#3671 from zsxwing/SPARK-4824 and squashes the following commits:
    
    48ee7b9 [zsxwing] Remove the explicit types
    95d59d6 [zsxwing] Add 'iterator' to reduce memory consumed by join
    
    (cherry picked from commit c233ab3)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    
    Conflicts:
    	core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala
    zsxwing authored and JoshRosen committed Dec 22, 2014
    Commit 3bce43f

Commits on Dec 23, 2014

  1. [SPARK-4802] [streaming] Remove receiverInfo once receiver is de-regi…

    …stered
    
    Once the streaming receiver is de-registered at the executor, the `ReceiverTrackerActor` needs to
    remove the corresponding receiverInfo entry from the `receiverInfo` map at `ReceiverTracker`.
    
    Author: Ilayaperumal Gopinathan <igopinathan@pivotal.io>
    
    Closes apache#3647 from ilayaperumalg/receiverInfo-RTracker and squashes the following commits:
    
    6eb97d5 [Ilayaperumal Gopinathan] Polishing based on the review
    3640c86 [Ilayaperumal Gopinathan] Remove receiverInfo once receiver is de-registered
    
    (cherry picked from commit 10d69e9)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
    
    Conflicts:
    	streaming/src/main/scala/org/apache/spark/streaming/scheduler/ReceiverTracker.scala
    ilayaperumalg authored and tdas committed Dec 23, 2014
    Commit b1de461

Commits on Dec 24, 2014

  1. [SPARK-4606] Send EOF to child JVM when there's no more data to read.

    Author: Marcelo Vanzin <vanzin@cloudera.com>
    
    Closes apache#3460 from vanzin/SPARK-4606 and squashes the following commits:
    
    031207d [Marcelo Vanzin] [SPARK-4606] Send EOF to child JVM when there's no more data to read.
    
    (cherry picked from commit 7e2deb7)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    Marcelo Vanzin authored and JoshRosen committed Dec 24, 2014
    Commit dd0287c

Commits on Dec 26, 2014

  1. [SPARK-4537][Streaming] Expand StreamingSource to add more metrics

    Add `processingDelay`, `schedulingDelay` and `totalDelay` for the last completed batch. Add `lastReceivedBatchRecords` and `totalReceivedBatchRecords` to the received records counting.
    
    Author: jerryshao <saisai.shao@intel.com>
    
    Closes apache#3466 from jerryshao/SPARK-4537 and squashes the following commits:
    
    00f5f7f [jerryshao] Change the code style and add totalProcessedRecords
    44721a6 [jerryshao] Further address the comments
    c097ddc [jerryshao] Address the comments
    02dd44f [jerryshao] Fix the addressed comments
    c7a9376 [jerryshao] Expand StreamingSource to add more metrics
    
    (cherry picked from commit f205fe4)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
    jerryshao authored and tdas committed Dec 26, 2014
    Commit d21347d

Commits on Dec 27, 2014

  1. [SPARK-4952][Core]Handle ConcurrentModificationExceptions in SparkEnv…

    ….environmentDetails
    
    Author: GuoQiang Li <witgo@qq.com>
    
    Closes apache#3788 from witgo/SPARK-4952 and squashes the following commits:
    
    d903529 [GuoQiang Li] Handle ConcurrentModificationExceptions in SparkEnv.environmentDetails
    
    (cherry picked from commit 080ceb7)
    Signed-off-by: Patrick Wendell <pwendell@gmail.com>
    witgo authored and pwendell committed Dec 27, 2014
    Commit 3442b7b

Commits on Dec 30, 2014

  1. [HOTFIX] Add SPARK_VERSION to Spark package object.

    This helps to avoid build breaks when backporting patches that use
    org.apache.spark.SPARK_VERSION.
    JoshRosen committed Dec 30, 2014
    Commit d5e0a45
  2. [SPARK-4882] Register PythonBroadcast with Kryo so that PySpark works…

    … with KryoSerializer
    
    This PR fixes an issue where PySpark broadcast variables caused NullPointerExceptions if KryoSerializer was used.  The fix is to register PythonBroadcast with Kryo so that it's deserialized with a KryoJavaSerializer.
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes apache#3831 from JoshRosen/SPARK-4882 and squashes the following commits:
    
    0466c7a [Josh Rosen] Register PythonBroadcast with Kryo.
    d5b409f [Josh Rosen] Enable registrationRequired, which would have caught this bug.
    069d8a7 [Josh Rosen] Add failing test for SPARK-4882
    
    (cherry picked from commit efa80a5)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    JoshRosen committed Dec 30, 2014
    Commit 822a0b4
  3. Revert "[SPARK-4882] Register PythonBroadcast with Kryo so that PySpa…

    …rk works with KryoSerializer"
    
    This reverts commit 822a0b4.
    
    This fix does not apply to branch-1.1 or branch-1.0, since PythonBroadcast
    is new in 1.2.
    JoshRosen committed Dec 30, 2014
    Commit d6b8d2c
  4. [SPARK-4813][Streaming] Fix the issue that ContextWaiter didn't handl…

    …e 'spurious wakeup'
    
    Used `Condition` to rewrite `ContextWaiter` because it provides a convenient API `awaitNanos` for timeout.
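    The `awaitNanos` pattern the rewrite relies on can be sketched independently of Spark. The class and method names below are illustrative, not `ContextWaiter`'s actual API: because the wait loops on its predicate and `awaitNanos` returns the time still remaining, a spurious wakeup neither ends the wait early nor resets the deadline.

    ```scala
    import java.util.concurrent.TimeUnit
    import java.util.concurrent.locks.ReentrantLock

    // Illustrative waiter: predicate loop + awaitNanos handles spurious wakeups.
    class SimpleWaiter {
      private val lock = new ReentrantLock()
      private val condition = lock.newCondition()
      private var stopped = false

      def notifyStop(): Unit = {
        lock.lock()
        try { stopped = true; condition.signalAll() }
        finally lock.unlock()
      }

      /** Returns true iff stop was signalled before the timeout elapsed. */
      def waitForStop(timeoutMs: Long): Boolean = {
        lock.lock()
        try {
          var remaining = TimeUnit.MILLISECONDS.toNanos(timeoutMs)
          while (!stopped && remaining > 0) {
            // awaitNanos returns the time still left, so the deadline survives
            // spurious wakeups; the loop re-checks the predicate each time.
            remaining = condition.awaitNanos(remaining)
          }
          stopped
        } finally lock.unlock()
      }
    }
    ```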
    
    Author: zsxwing <zsxwing@gmail.com>
    
    Closes apache#3661 from zsxwing/SPARK-4813 and squashes the following commits:
    
    52247f5 [zsxwing] Add explicit unit type
    be42bcf [zsxwing] Update as per review suggestion
    e06bd4f [zsxwing] Fix the issue that ContextWaiter didn't handle 'spurious wakeup'
    
    (cherry picked from commit 6a89782)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
    zsxwing authored and tdas committed Dec 30, 2014
    Commit eac740e

Commits on Dec 31, 2014

  1. [SPARK-1010] Clean up uses of System.setProperty in unit tests

    Several of our tests call System.setProperty (or test code which implicitly sets system properties) and don't always reset/clear the modified properties, which can create ordering dependencies between tests and cause hard-to-diagnose failures.
    
    This patch removes most uses of System.setProperty from our tests, since in most cases we can use SparkConf to set these configurations (there are a few exceptions, including the tests of SparkConf itself).
    
    For the cases where we continue to use System.setProperty, this patch introduces a `ResetSystemProperties` ScalaTest mixin class which snapshots the system properties before individual tests and automatically restores them on test completion or failure.  See the block comment at the top of the ResetSystemProperties class for more details.
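    The snapshot-and-restore idea reduces to a few lines of plain Scala. This is a hedged sketch, not the actual `ResetSystemProperties` trait (which clones the properties more defensively and hooks into ScalaTest's lifecycle):

    ```scala
    import java.util.Properties

    // Run a test body with the system properties snapshotted beforehand and
    // restored afterwards, even if the body throws.
    def withResetSystemProperties[T](body: => T): T = {
      val snapshot = new Properties()
      snapshot.putAll(System.getProperties) // defensive copy, not a live view
      try body
      finally System.setProperties(snapshot)
    }
    ```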
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes apache#3739 from JoshRosen/cleanup-system-properties-in-tests and squashes the following commits:
    
    0236d66 [Josh Rosen] Replace setProperty uses in two example programs / tools
    3888fe3 [Josh Rosen] Remove setProperty use in LocalJavaStreamingContext
    4f4031d [Josh Rosen] Add note on why SparkSubmitSuite needs ResetSystemProperties
    4742a5b [Josh Rosen] Clarify ResetSystemProperties trait inheritance ordering.
    0eaf0b6 [Josh Rosen] Remove setProperty call in TaskResultGetterSuite.
    7a3d224 [Josh Rosen] Fix trait ordering
    3fdb554 [Josh Rosen] Remove setProperty call in TaskSchedulerImplSuite
    bee20df [Josh Rosen] Remove setProperty calls in SparkContextSchedulerCreationSuite
    655587c [Josh Rosen] Remove setProperty calls in JobCancellationSuite
    3f2f955 [Josh Rosen] Remove System.setProperty calls in DistributedSuite
    cfe9cce [Josh Rosen] Remove use of system properties in SparkContextSuite
    8783ab0 [Josh Rosen] Remove TestUtils.setSystemProperty, since it is subsumed by the ResetSystemProperties trait.
    633a84a [Josh Rosen] Remove use of system properties in FileServerSuite
    25bfce2 [Josh Rosen] Use ResetSystemProperties in UtilsSuite
    1d1aa5a [Josh Rosen] Use ResetSystemProperties in SizeEstimatorSuite
    dd9492b [Josh Rosen] Use ResetSystemProperties in AkkaUtilsSuite
    b0daff2 [Josh Rosen] Use ResetSystemProperties in BlockManagerSuite
    e9ded62 [Josh Rosen] Use ResetSystemProperties in TaskSchedulerImplSuite
    5b3cb54 [Josh Rosen] Use ResetSystemProperties in SparkListenerSuite
    0995c4b [Josh Rosen] Use ResetSystemProperties in SparkContextSchedulerCreationSuite
    c83ded8 [Josh Rosen] Use ResetSystemProperties in SparkConfSuite
    51aa870 [Josh Rosen] Use withSystemProperty in ShuffleSuite
    60a63a1 [Josh Rosen] Use ResetSystemProperties in JobCancellationSuite
    14a92e4 [Josh Rosen] Use withSystemProperty in FileServerSuite
    628f46c [Josh Rosen] Use ResetSystemProperties in DistributedSuite
    9e3e0dd [Josh Rosen] Add ResetSystemProperties test fixture mixin; use it in SparkSubmitSuite.
    4dcea38 [Josh Rosen] Move withSystemProperty to TestUtils class.
    
    (cherry picked from commit 352ed6b)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    
    Conflicts:
    	core/src/test/scala/org/apache/spark/ShuffleSuite.scala
    	core/src/test/scala/org/apache/spark/SparkConfSuite.scala
    	core/src/test/scala/org/apache/spark/SparkContextSchedulerCreationSuite.scala
    	core/src/test/scala/org/apache/spark/SparkContextSuite.scala
    	core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala
    	core/src/test/scala/org/apache/spark/util/UtilsSuite.scala
    	external/flume/src/test/java/org/apache/spark/streaming/LocalJavaStreamingContext.java
    	external/mqtt/src/test/java/org/apache/spark/streaming/LocalJavaStreamingContext.java
    	external/twitter/src/test/java/org/apache/spark/streaming/LocalJavaStreamingContext.java
    	external/zeromq/src/test/java/org/apache/spark/streaming/LocalJavaStreamingContext.java
    	tools/src/main/scala/org/apache/spark/tools/StoragePerfTester.scala
    JoshRosen committed Dec 31, 2014
    Commit babcafa
  2. [SPARK-4298][Core] - The spark-submit cannot read Main-Class from Man…

    …ifest.
    
    Resolves a bug where the `Main-Class` from a .jar file wasn't being read in properly. This was caused by the fact that the `primaryResource` object was a URI and needed to be normalized through a call to `.getPath` before it could be passed into the `JarFile` object.
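    The normalization step the fix describes can be sketched as follows; `mainClassOf` is a hypothetical helper, not spark-submit's actual code path:

    ```scala
    import java.net.URI
    import java.util.jar.JarFile

    // JarFile wants a filesystem path; handing it the raw URI string
    // (e.g. "file:/tmp/app.jar") fails, so normalize via URI.getPath first.
    def mainClassOf(jarUri: URI): String = {
      val jar = new JarFile(jarUri.getPath)
      try jar.getManifest.getMainAttributes.getValue("Main-Class")
      finally jar.close()
    }
    ```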
    
    Author: Brennon York <brennon.york@capitalone.com>
    
    Closes apache#3561 from brennonyork/SPARK-4298 and squashes the following commits:
    
    5e0fce1 [Brennon York] Use string interpolation for error messages, moved comment line from original code to above its necessary code segment
    14daa20 [Brennon York] pushed mainClass assignment into match statement, removed spurious spaces, removed { } from case statements, removed return values
    c6dad68 [Brennon York] Set case statement to support multiple jar URI's and enabled the 'file' URI to load the main-class
    8d20936 [Brennon York] updated to reset the error message back to the default
    a043039 [Brennon York] updated to split the uri and jar vals
    8da7cbf [Brennon York] fixes SPARK-4298
    
    (cherry picked from commit 8e14c5e)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    
    Conflicts:
    	core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala
    Brennon York authored and JoshRosen committed Dec 31, 2014
    Commit 08d4f70
  3. [HOTFIX] Disable Spark UI in SparkSubmitSuite tests

    This should fix a major cause of build breaks when running many parallel tests.
    JoshRosen committed Dec 31, 2014
    Commit 1034707

Commits on Jan 1, 2015

  1. [SPARK-5035] [Streaming] ReceiverMessage trait should extend Serializ…

    …able
    
    Spark Streaming's ReceiverMessage trait should extend Serializable in order to fix a subtle bug that only occurs when running on a real cluster:
    
    If you attempt to send a fire-and-forget message to a remote Akka actor and that message cannot be serialized, then this seems to lead to more-or-less silent failures. As an optimization, Akka skips message serialization for messages sent within the same JVM. As a result, Spark's unit tests will never fail due to non-serializable Akka messages, but these will cause mostly-silent failures when running on a real cluster.
    
    Before this patch, here was the code for ReceiverMessage:
    
    ```
    /** Messages sent to the NetworkReceiver. */
    private[streaming] sealed trait ReceiverMessage
    private[streaming] object StopReceiver extends ReceiverMessage
    ```
    
    Since ReceiverMessage does not extend Serializable and StopReceiver is a regular `object`, not a `case object`, StopReceiver will throw serialization errors. As a result, graceful receiver shutdown is broken on real clusters (and local-cluster mode) but works in local modes. If you want to reproduce this, try running the word count example from the Streaming Programming Guide in the Spark shell:
    
    ```
    import org.apache.spark._
    import org.apache.spark.streaming._
    import org.apache.spark.streaming.StreamingContext._
    val ssc = new StreamingContext(sc, Seconds(10))
    // Create a DStream that will connect to hostname:port, like localhost:9999
    val lines = ssc.socketTextStream("localhost", 9999)
    // Split each line into words
    val words = lines.flatMap(_.split(" "))
    import org.apache.spark.streaming.StreamingContext._
    // Count each word in each batch
    val pairs = words.map(word => (word, 1))
    val wordCounts = pairs.reduceByKey(_ + _)
    // Print the first ten elements of each RDD generated in this DStream to the console
    wordCounts.print()
    ssc.start()
    Thread.sleep(10000)
    ssc.stop(true, true)
    ```
    
    Prior to this patch, this would work correctly in local mode but fail when running against a real cluster (it would report that some receivers were not shut down).
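    The serialization requirement is easy to check with a plain Java-serialization round trip (the trait and object below are illustrative stand-ins, not Spark's). With the fix applied, the `case object` survives the trip and deserializes back to the same singleton:

    ```scala
    import java.io.{ByteArrayInputStream, ByteArrayOutputStream,
      ObjectInputStream, ObjectOutputStream}

    // Illustrative message hierarchy with the fix applied: the trait extends
    // Serializable, so messages survive a remote send.
    sealed trait ReceiverMsg extends Serializable
    case object StopMsg extends ReceiverMsg

    // Round-trip an object through Java serialization, as a network send would.
    def roundTrip[T <: AnyRef](obj: T): T = {
      val buf = new ByteArrayOutputStream()
      val out = new ObjectOutputStream(buf)
      out.writeObject(obj)
      out.close()
      new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
        .readObject().asInstanceOf[T]
    }
    ```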
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes apache#3857 from JoshRosen/SPARK-5035 and squashes the following commits:
    
    71d0eae [Josh Rosen] [SPARK-5035] ReceiverMessage trait should extend Serializable.
    
    (cherry picked from commit fe6efac)
    Signed-off-by: Tathagata Das <tathagata.das1565@gmail.com>
    JoshRosen authored and tdas committed Jan 1, 2015
    Commit 61eb9be
  2. [HOTFIX] Bind web UI to ephemeral port in DriverSuite

    The job launched by DriverSuite should bind the web UI to an ephemeral port, since it looks like port contention in this test has caused a large number of Jenkins failures when many builds are started simultaneously.  Our tests already disable the web UI, but this doesn't affect subprocesses launched by our tests.  In this case, I've opted to bind to an ephemeral port instead of disabling the UI because disabling features in this test may mask its ability to catch certain bugs.
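    The ephemeral-port trick is simply binding to port 0 and letting the OS choose a free port, sketched below with a raw `ServerSocket` (the actual fix presumably routes this through the web UI's port configuration rather than a socket of its own):

    ```scala
    import java.net.ServerSocket

    // Port 0 asks the OS for any free ephemeral port, so concurrently launched
    // test JVMs cannot contend for a fixed port.
    def bindEphemeral(): ServerSocket = new ServerSocket(0)
    ```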
    
    See also: e24d3a9
    
    Author: Josh Rosen <joshrosen@databricks.com>
    
    Closes apache#3873 from JoshRosen/driversuite-webui-port and squashes the following commits:
    
    48cd05c [Josh Rosen] [HOTFIX] Bind web UI to ephemeral port in DriverSuite.
    
    (cherry picked from commit 0128398)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    JoshRosen committed Jan 1, 2015
    Commit c532acf

Commits on Jan 4, 2015

  1. [SPARK-4787] Stop SparkContext if a DAGScheduler init error occurs

    Author: Dale <tigerquoll@outlook.com>
    
    Closes apache#3809 from tigerquoll/SPARK-4787 and squashes the following commits:
    
    5661e01 [Dale] [SPARK-4787] Ensure that call to stop() doesn't lose the exception by using a finally block.
    2172578 [Dale] [SPARK-4787] Stop context properly if an exception occurs during DAGScheduler initialization.
    
    (cherry picked from commit 3fddc94)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    tigerquoll authored and JoshRosen committed Jan 4, 2015
    Commit 6d1ca23

Commits on Jan 7, 2015

  1. [SPARK-5132][Core]Correct stage Attempt Id key in stageInfofromJson

    SPARK-5132:
    stageInfoToJson: Stage Attempt Id
    stageInfoFromJson: Attempt Id
    
    Author: hushan[胡珊] <hushan@xiaomi.com>
    
    Closes apache#3932 from suyanNone/json-stage and squashes the following commits:
    
    41419ab [hushan[胡珊]] Correct stage Attempt Id key in stageInfofromJson
    
    (cherry picked from commit d345ebe)
    Signed-off-by: Josh Rosen <joshrosen@databricks.com>
    suyanNone authored and JoshRosen committed Jan 7, 2015
    Commit 55325af

Commits on Jan 9, 2015

  1. Commit c07a691
  2. Commit 677281e
  3. Commit 470f026
  4. Commit 99910b1