Update #10

Status: Merged (68 commits, Nov 11, 2014)

Commits (68)
bcecd73
fixed MLlib Naive-Bayes java example bug
dkobylarz Nov 4, 2014
f90ad5d
[Spark-4060] [MLlib] exposing special rdd functions to the public
Nov 4, 2014
5e73138
[SPARK-2938] Support SASL authentication in NettyBlockTransferService
aarondav Nov 5, 2014
515abb9
[SQL] Add String option for DSL AS
marmbrus Nov 5, 2014
c8abddc
[SPARK-3964] [MLlib] [PySpark] add Hypothesis test Python API
Nov 5, 2014
5f13759
[SPARK-4029][Streaming] Update streaming driver to reliably save and …
tdas Nov 5, 2014
5b3b6f6
[SPARK-4197] [mllib] GradientBoosting API cleanup and examples in Sca…
jkbradley Nov 5, 2014
4c42986
[SPARK-4242] [Core] Add SASL to external shuffle service
aarondav Nov 5, 2014
a46497e
[SPARK-3984] [SPARK-3983] Fix incorrect scheduler delay and display t…
kayousterhout Nov 5, 2014
f37817b
SPARK-4222 [CORE] use readFully in FixedLengthBinaryRecordReader
industrial-sloth Nov 5, 2014
61a5cce
[SPARK-3797] Run external shuffle service in Yarn NM
Nov 5, 2014
868cd4c
SPARK-4040. Update documentation to exemplify use of local (n) value,…
Nov 5, 2014
f7ac8c2
SPARK-3223 runAsSparkUser cannot change HDFS write permission properl…
jongyoul Nov 5, 2014
cb0eae3
[SPARK-4158] Fix for missing resources.
brndnmtthws Nov 6, 2014
c315d13
[SPARK-4254] [mllib] MovieLensALS bug fix
jkbradley Nov 6, 2014
3d2b5bc
[SPARK-4262][SQL] add .schemaRDD to JavaSchemaRDD
mengxr Nov 6, 2014
db45f5a
[SPARK-4137] [EC2] Don't change working dir on user
nchammas Nov 6, 2014
5f27ae1
[SPARK-4255] Fix incorrect table striping
kayousterhout Nov 6, 2014
b41a39e
[SPARK-4186] add binaryFiles and binaryRecords in Python
Nov 6, 2014
23eaf0e
[SPARK-4264] Completion iterator should only invoke callback once
aarondav Nov 6, 2014
d15c6e9
[SPARK-4249][GraphX]fix a problem of EdgePartitionBuilder in Graphx
lianhuiwang Nov 6, 2014
470881b
[HOT FIX] Make distribution fails
Nov 6, 2014
96136f2
[SPARK-3797] Minor addendum to Yarn shuffle service
Nov 7, 2014
6e9ef10
[SPARK-4277] Support external shuffle service on Standalone Worker
aarondav Nov 7, 2014
f165b2b
[SPARK-4188] [Core] Perform network-level retry of shuffle file fetches
aarondav Nov 7, 2014
48a19a6
[SPARK-4236] Cleanup removed applications' files in shuffle service
aarondav Nov 7, 2014
3abdb1b
[SPARK-4204][Core][WebUI] Change Utils.exceptionString to contain the…
zsxwing Nov 7, 2014
d4fa04e
[SPARK-4187] [Core] Switch to binary protocol for external shuffle se…
aarondav Nov 7, 2014
636d7bc
[SQL][DOC][Minor] Spark SQL Hive now support dynamic partitioning
scwf Nov 7, 2014
86e9eaa
[SPARK-4225][SQL] Resorts to SparkContext.version to inspect Spark ve…
liancheng Nov 7, 2014
8154ed7
[SQL] Support ScalaReflection of schema in different universes
marmbrus Nov 7, 2014
68609c5
[SQL] Modify keyword val location according to ordering
Nov 7, 2014
14c54f1
[SPARK-4213][SQL] ParquetFilters - No support for LT, LTE, GT, GTE op…
sarutak Nov 7, 2014
60ab80f
[SPARK-4272] [SQL] Add more unwrapper functions for primitive type in…
chenghao-intel Nov 7, 2014
a6405c5
[SPARK-4270][SQL] Fix Cast from DateType to DecimalType.
ueshin Nov 7, 2014
ac70c97
[SPARK-4203][SQL] Partition directories in random order when insertin…
tbfenet Nov 7, 2014
d6e5552
[SPARK-4292][SQL] Result set iterator bug in JDBC/ODBC
scwf Nov 7, 2014
7c9ec52
Update JavaCustomReceiver.java
xiao321 Nov 7, 2014
5923dd9
MAINTENANCE: Automated closing of pull requests.
pwendell Nov 7, 2014
7779109
[SPARK-4304] [PySpark] Fix sort on empty RDD
Nov 8, 2014
7e9d975
[MLLIB] [PYTHON] SPARK-4221: Expose nonnegative ALS in the python API
Nov 8, 2014
7afc856
[SPARK-4291][Build] Rename network module projects
Nov 8, 2014
4af5c7e
[Minor] [Core] Don't NPE on closeQuietly(null)
aarondav Nov 8, 2014
7b41b17
[SPARK-4301] StreamingContext should not allow start() to be called a…
JoshRosen Nov 9, 2014
8c99a47
SPARK-971 [DOCS] Link to Confluence wiki from project website / docum…
srowen Nov 10, 2014
d136265
SPARK-1344 [DOCS] Scala API docs for top methods
srowen Nov 10, 2014
f73b56f
MAINTENANCE: Automated closing of pull requests.
pwendell Nov 10, 2014
f8e5732
SPARK-1209 [CORE] (Take 2) SparkHadoop{MapRed,MapReduce}Util should n…
srowen Nov 10, 2014
3c2cff4
SPARK-3179. Add task OutputMetrics.
sryza Nov 10, 2014
227488d
MAINTENANCE: Automated closing of pull requests.
pwendell Nov 10, 2014
bd86cb1
[SPARK-2703][Core]Make Tachyon related unit tests execute without dep…
RongGu Nov 10, 2014
894a724
[SQL] support udt to hive types conversion (hive->udt is not supported)
mengxr Nov 10, 2014
ed8bf1e
[SPARK-4169] [Core] Accommodate non-English Locales in unit tests
Nov 10, 2014
3a02d41
SPARK-2548 [STREAMING] JavaRecoverableWordCount is missing
srowen Nov 10, 2014
0340c56
Update RecoverableNetworkWordCount.scala
comcmipi Nov 10, 2014
c5db8e2
[SPARK-4312] bash doesn't have "die"
jey Nov 10, 2014
c6f4e70
SPARK-4230. Doc for spark.default.parallelism is incorrect
sryza Nov 10, 2014
b32734e
SPARK-1297 Upgrade HBase dependency to 0.98
tedyu Nov 10, 2014
974d334
[SPARK-4047] - Generate runtime warnings for example implementation o…
varadharajan Nov 10, 2014
6e7a309
Revert "[SPARK-2703][Core]Make Tachyon related unit tests execute wit…
pwendell Nov 10, 2014
dbf1058
[SPARK-4319][SQL] Enable an ignored test "null count".
ueshin Nov 10, 2014
534b231
[SPARK-4000][Build] Uploads HiveCompatibilitySuite logs
liancheng Nov 11, 2014
acb55ae
[SPARK-4308][SQL] Sets SQL operation state to ERROR when exception is…
liancheng Nov 11, 2014
d793d80
[SQL] remove a decimal case branch that has no effect at runtime
mengxr Nov 11, 2014
fa77783
[SPARK-4250] [SQL] Fix bug of constant null value mapping to Constant…
chenghao-intel Nov 11, 2014
a1fc059
[SPARK-4149][SQL] ISO 8601 support for json date time strings
adrian-wang Nov 11, 2014
ce6ed2a
[SPARK-3954][Streaming] Optimization to FileInputDStream
surongquan Nov 11, 2014
c764d0a
[SPARK-4274] [SQL] Fix NPE in printing the details of the query plan
chenghao-intel Nov 11, 2014
Files changed
21 changes: 20 additions & 1 deletion LICENSE
@@ -754,7 +754,7 @@ SUCH DAMAGE.


========================================================================
For Timsort (core/src/main/java/org/apache/spark/util/collection/Sorter.java):
For Timsort (core/src/main/java/org/apache/spark/util/collection/TimSort.java):
========================================================================
Copyright (C) 2008 The Android Open Source Project

@@ -771,6 +771,25 @@ See the License for the specific language governing permissions and
limitations under the License.


========================================================================
For LimitedInputStream
(network/common/src/main/java/org/apache/spark/network/util/LimitedInputStream.java):
========================================================================
Copyright (C) 2007 The Guava Authors

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.


========================================================================
BSD-style licenses
========================================================================
3 changes: 2 additions & 1 deletion README.md
@@ -13,7 +13,8 @@ and Spark Streaming for stream processing.
## Online Documentation

You can find the latest Spark documentation, including a programming
guide, on the [project web page](http://spark.apache.org/documentation.html).
guide, on the [project web page](http://spark.apache.org/documentation.html)
and [project wiki](https://cwiki.apache.org/confluence/display/SPARK).
This README file only contains basic setup instructions.

## Building Spark
@@ -39,6 +39,8 @@ $(function() {
var column = "table ." + $(this).attr("name");
$(column).hide();
});
// Stripe table rows after rows have been hidden to ensure correct striping.
stripeTables();

$("input:checkbox").click(function() {
var column = "table ." + $(this).attr("name");
5 changes: 0 additions & 5 deletions core/src/main/resources/org/apache/spark/ui/static/table.js
@@ -28,8 +28,3 @@ function stripeTables() {
});
});
}

/* Stripe all tables after pages finish loading. */
$(function() {
stripeTables();
});
14 changes: 14 additions & 0 deletions core/src/main/resources/org/apache/spark/ui/static/webui.css
@@ -120,6 +120,20 @@ pre {
border: none;
}

.stacktrace-details {
max-height: 300px;
overflow-y: auto;
margin: 0;
transition: max-height 0.5s ease-out, padding 0.5s ease-out;
}

.stacktrace-details.collapsed {
max-height: 0;
padding-top: 0;
padding-bottom: 0;
border: none;
}

span.expand-additional-metrics {
cursor: pointer;
}
@@ -66,7 +66,6 @@ private[spark] class ExecutorAllocationManager(sc: SparkContext) extends Logging
// Lower and upper bounds on the number of executors. These are required.
private val minNumExecutors = conf.getInt("spark.dynamicAllocation.minExecutors", -1)
private val maxNumExecutors = conf.getInt("spark.dynamicAllocation.maxExecutors", -1)
verifyBounds()

// How long there must be backlogged tasks for before an addition is triggered
private val schedulerBacklogTimeout = conf.getLong(
@@ -77,9 +76,14 @@ private[spark] class ExecutorAllocationManager(sc: SparkContext) extends Logging
"spark.dynamicAllocation.sustainedSchedulerBacklogTimeout", schedulerBacklogTimeout)

// How long an executor must be idle for before it is removed
private val removeThresholdSeconds = conf.getLong(
private val executorIdleTimeout = conf.getLong(
"spark.dynamicAllocation.executorIdleTimeout", 600)

// During testing, the methods to actually kill and add executors are mocked out
private val testing = conf.getBoolean("spark.dynamicAllocation.testing", false)

validateSettings()

// Number of executors to add in the next round
private var numExecutorsToAdd = 1

@@ -103,17 +107,14 @@ private[spark] class ExecutorAllocationManager(sc: SparkContext) extends Logging
// Polling loop interval (ms)
private val intervalMillis: Long = 100

// Whether we are testing this class. This should only be used internally.
private val testing = conf.getBoolean("spark.dynamicAllocation.testing", false)

// Clock used to schedule when executors should be added and removed
private var clock: Clock = new RealClock

/**
* Verify that the lower and upper bounds on the number of executors are valid.
* Verify that the settings specified through the config are valid.
* If not, throw an appropriate exception.
*/
private def verifyBounds(): Unit = {
private def validateSettings(): Unit = {
if (minNumExecutors < 0 || maxNumExecutors < 0) {
throw new SparkException("spark.dynamicAllocation.{min/max}Executors must be set!")
}
@@ -124,6 +125,22 @@ private[spark] class ExecutorAllocationManager(sc: SparkContext) extends Logging
throw new SparkException(s"spark.dynamicAllocation.minExecutors ($minNumExecutors) must " +
s"be less than or equal to spark.dynamicAllocation.maxExecutors ($maxNumExecutors)!")
}
if (schedulerBacklogTimeout <= 0) {
throw new SparkException("spark.dynamicAllocation.schedulerBacklogTimeout must be > 0!")
}
if (sustainedSchedulerBacklogTimeout <= 0) {
throw new SparkException(
"spark.dynamicAllocation.sustainedSchedulerBacklogTimeout must be > 0!")
}
if (executorIdleTimeout <= 0) {
throw new SparkException("spark.dynamicAllocation.executorIdleTimeout must be > 0!")
}
// Require external shuffle service for dynamic allocation
// Otherwise, we may lose shuffle files when killing executors
if (!conf.getBoolean("spark.shuffle.service.enabled", false) && !testing) {
throw new SparkException("Dynamic allocation of executors requires the external " +
"shuffle service. You may enable this through spark.shuffle.service.enabled.")
}
}

/**
@@ -254,7 +271,7 @@ private[spark] class ExecutorAllocationManager(sc: SparkContext) extends Logging
val removeRequestAcknowledged = testing || sc.killExecutor(executorId)
if (removeRequestAcknowledged) {
logInfo(s"Removing executor $executorId because it has been idle for " +
s"$removeThresholdSeconds seconds (new desired total will be ${numExistingExecutors - 1})")
s"$executorIdleTimeout seconds (new desired total will be ${numExistingExecutors - 1})")
executorsPendingToRemove.add(executorId)
true
} else {
@@ -329,8 +346,8 @@ private[spark] class ExecutorAllocationManager(sc: SparkContext) extends Logging
private def onExecutorIdle(executorId: String): Unit = synchronized {
if (!removeTimes.contains(executorId) && !executorsPendingToRemove.contains(executorId)) {
logDebug(s"Starting idle timer for $executorId because there are no more tasks " +
s"scheduled to run on the executor (to expire in $removeThresholdSeconds seconds)")
removeTimes(executorId) = clock.getTimeMillis + removeThresholdSeconds * 1000
s"scheduled to run on the executor (to expire in $executorIdleTimeout seconds)")
removeTimes(executorId) = clock.getTimeMillis + executorIdleTimeout * 1000
}
}

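Note on the change above: `validateSettings()` now rejects a configuration unless both executor bounds are set, every timeout is positive, and the external shuffle service is enabled. A minimal sketch of a conf that passes these checks; the numeric values and the `spark.dynamicAllocation.enabled` flag are illustrative assumptions, not recommendations.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: the values below are placeholders, tune them for your cluster.
val conf = new SparkConf()
  .setAppName("dynamic-allocation-sketch")
  .set("spark.dynamicAllocation.enabled", "true")                          // assumed entry point
  .set("spark.dynamicAllocation.minExecutors", "2")                        // required, must be >= 0
  .set("spark.dynamicAllocation.maxExecutors", "20")                       // required, must be >= min
  .set("spark.dynamicAllocation.schedulerBacklogTimeout", "60")            // seconds, must be > 0
  .set("spark.dynamicAllocation.sustainedSchedulerBacklogTimeout", "60")   // seconds, must be > 0
  .set("spark.dynamicAllocation.executorIdleTimeout", "600")               // seconds, must be > 0
  // Required so shuffle files survive executor removal.
  .set("spark.shuffle.service.enabled", "true")

val sc = new SparkContext(conf)
```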
15 changes: 12 additions & 3 deletions core/src/main/scala/org/apache/spark/SecurityManager.scala
@@ -22,6 +22,7 @@ import java.net.{Authenticator, PasswordAuthentication}
import org.apache.hadoop.io.Text

import org.apache.spark.deploy.SparkHadoopUtil
import org.apache.spark.network.sasl.SecretKeyHolder

/**
* Spark class responsible for security.
@@ -84,7 +85,7 @@ import org.apache.spark.deploy.SparkHadoopUtil
* Authenticator installed in the SecurityManager to how it does the authentication
* and in this case gets the user name and password from the request.
*
* - ConnectionManager -> The Spark ConnectionManager uses java nio to asynchronously
* - BlockTransferService -> The Spark BlockTransferServices uses java nio to asynchronously
* exchange messages. For this we use the Java SASL
* (Simple Authentication and Security Layer) API and again use DIGEST-MD5
* as the authentication mechanism. This means the shared secret is not passed
@@ -98,7 +99,7 @@ import org.apache.spark.deploy.SparkHadoopUtil
* of protection they want. If we support those, the messages will also have to
* be wrapped and unwrapped via the SaslServer/SaslClient.wrap/unwrap API's.
*
* Since the connectionManager does asynchronous messages passing, the SASL
* Since the NioBlockTransferService does asynchronous messages passing, the SASL
* authentication is a bit more complex. A ConnectionManager can be both a client
* and a Server, so for a particular connection is has to determine what to do.
* A ConnectionId was added to be able to track connections and is used to
@@ -107,6 +108,10 @@ import org.apache.spark.deploy.SparkHadoopUtil
* and waits for the response from the server and does the handshake before sending
* the real message.
*
* The NettyBlockTransferService ensures that SASL authentication is performed
* synchronously prior to any other communication on a connection. This is done in
* SaslClientBootstrap on the client side and SaslRpcHandler on the server side.
*
* - HTTP for the Spark UI -> the UI was changed to use servlets so that javax servlet filters
* can be used. Yarn requires a specific AmIpFilter be installed for security to work
* properly. For non-Yarn deployments, users can write a filter to go through a
@@ -139,7 +144,7 @@ import org.apache.spark.deploy.SparkHadoopUtil
* can take place.
*/

private[spark] class SecurityManager(sparkConf: SparkConf) extends Logging {
private[spark] class SecurityManager(sparkConf: SparkConf) extends Logging with SecretKeyHolder {

// key used to store the spark secret in the Hadoop UGI
private val sparkSecretLookupKey = "sparkCookie"
@@ -337,4 +342,8 @@ private[spark] class SecurityManager(sparkConf: SparkConf) extends Logging {
* @return the secret key as a String if authentication is enabled, otherwise returns null
*/
def getSecretKey(): String = secretKey

// Default SecurityManager only has a single secret key, so ignore appId.
override def getSaslUser(appId: String): String = getSaslUser()
override def getSecretKey(appId: String): String = getSecretKey()
}
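With `SecurityManager` implementing `SecretKeyHolder`, the Netty block transfer service and the external shuffle service can perform SASL authentication against the existing shared secret. A hedged configuration sketch follows; the literal secret is a placeholder, and on YARN the secret is generated and distributed automatically, so setting it explicitly applies to non-YARN deployments.

```scala
import org.apache.spark.SparkConf

// Sketch only: enables shared-secret authentication so the SASL bootstraps
// added in this PR have a secret to negotiate with.
val conf = new SparkConf()
  .set("spark.authenticate", "true")
  .set("spark.authenticate.secret", "replace-with-a-real-secret") // placeholder, non-YARN only
```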
6 changes: 6 additions & 0 deletions core/src/main/scala/org/apache/spark/SparkConf.scala
@@ -217,6 +217,12 @@ class SparkConf(loadDefaults: Boolean) extends Cloneable with Logging {
*/
getAll.filter { case (k, _) => isAkkaConf(k) }

/**
* Returns the Spark application id, valid in the Driver after TaskScheduler registration and
* from the start in the Executor.
*/
def getAppId: String = get("spark.app.id")

/** Does the configuration contain a given parameter? */
def contains(key: String): Boolean = settings.contains(key)

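The new `getAppId` simply reads `spark.app.id`, which the driver sets during `SparkContext` construction (see the `SparkContext` diff below), so it is safe to call once the context exists. A small sketch, with the app name and master as placeholder values:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("app-id-demo").setMaster("local[2]"))

// By the time the constructor returns, the TaskScheduler has registered the
// application and "spark.app.id" is set, so this lookup succeeds.
val appId: String = sc.getConf.getAppId
println(s"Running as application $appId")
```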
6 changes: 6 additions & 0 deletions core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -313,6 +313,8 @@ class SparkContext(config: SparkConf) extends SparkStatusAPI with Logging {
val applicationId: String = taskScheduler.applicationId()
conf.set("spark.app.id", applicationId)

env.blockManager.initialize(applicationId)

val metricsSystem = env.metricsSystem

// The metrics system for Driver need to be set spark.app.id to app ID.
@@ -558,6 +560,8 @@ class SparkContext(config: SparkConf) extends SparkStatusAPI with Logging {


/**
* :: Experimental ::
*
* Get an RDD for a Hadoop-readable dataset as PortableDataStream for each file
* (useful for binary data)
*
@@ -600,6 +604,8 @@ class SparkContext(config: SparkConf) extends SparkStatusAPI with Logging {
}

/**
* :: Experimental ::
*
* Load data from a flat binary file, assuming the length of each record is constant.
*
* @param path Directory to the input data files
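The doc comments above mark `binaryFiles` and `binaryRecords` as experimental; `binaryRecords` reads a directory of fixed-length binary records into an `RDD[Array[Byte]]`. A hedged usage sketch, where the path and the 16-byte record length are assumptions for illustration:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("binary-records-sketch").setMaster("local[2]"))

// Assumption: /data/readings contains records that are exactly 16 bytes each.
val records = sc.binaryRecords("/data/readings", recordLength = 16)

// Each element is one raw 16-byte record; decode as needed.
println(records.map(_.length).take(5).mkString(", "))
```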
5 changes: 3 additions & 2 deletions core/src/main/scala/org/apache/spark/SparkEnv.scala
@@ -276,7 +276,7 @@ object SparkEnv extends Logging {
val blockTransferService =
conf.get("spark.shuffle.blockTransferService", "netty").toLowerCase match {
case "netty" =>
new NettyBlockTransferService(conf)
new NettyBlockTransferService(conf, securityManager)
case "nio" =>
new NioBlockTransferService(conf, securityManager)
}
@@ -285,8 +285,9 @@
"BlockManagerMaster",
new BlockManagerMasterActor(isLocal, conf, listenerBus)), conf, isDriver)

// NB: blockManager is not valid until initialize() is called later.
val blockManager = new BlockManager(executorId, actorSystem, blockManagerMaster,
serializer, conf, mapOutputTracker, shuffleManager, blockTransferService)
serializer, conf, mapOutputTracker, shuffleManager, blockTransferService, securityManager)

val broadcastManager = new BroadcastManager(isDriver, conf, securityManager)

@@ -26,6 +26,7 @@ import org.apache.hadoop.mapred._
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path

import org.apache.spark.mapred.SparkHadoopMapRedUtil
import org.apache.spark.rdd.HadoopRDD

/**