
[WIP] SPARK-2157 Ability to write tight firewall rules for Spark #1107

Closed
wants to merge 14 commits into from

Conversation

ash211
Contributor

@ash211 commented Jun 17, 2014

https://issues.apache.org/jira/browse/SPARK-2157

This pull request adds the ability to configure every port that Spark opens, as far as I could find. Specifically, it adds the following four configuration options:

  • spark.fileserver.port
  • spark.broadcast.port
  • spark.replClassServer.port
  • spark.blockManager.port

Of these options, most will fall back to port n+1 if the configured port is already in use. This incremental fallback logic is implemented for:

  • spark.broadcast.port
  • spark.blockManager.port
  • spark.replClassServer.port

but not for spark.fileserver.port.
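The fallback behavior described above can be sketched as a small retry helper (a minimal illustration of the n+1 scheme, not Spark's actual implementation; the name `startWithRetries` and the default retry count are made up here):

```scala
import java.net.{BindException, ServerSocket}

// Minimal sketch of incremental port fallback: try to bind on `port`;
// if the address is already in use, retry on port + 1, up to
// `maxRetries` further attempts before letting the exception propagate.
def startWithRetries(port: Int, maxRetries: Int = 16): ServerSocket =
  try {
    new ServerSocket(port)
  } catch {
    case e: BindException if maxRetries > 0 =>
      println(s"Could not bind on port $port. Attempting port ${port + 1}")
      startWithRetries(port + 1, maxRetries - 1)
  }
```

A service reading, say, spark.blockManager.port would call this with the configured value; once the retries are exhausted the original BindException propagates to the caller.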

Things I'm still thinking about for this PR:

  • I'm still observing instances of CoarseGrainedExecutorBackend listening on ephemeral ports
  • In earlier documentation, spark.driver.port is described as the way to configure :7077 on the Spark Master, and I'm not sure that's the case. If so, the same config item is used for multiple things, since it's also the port on which the application starts listening with Akka.

Many thanks to @epahomov, whose work this is based on.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15851/

@@ -102,7 +102,8 @@ import org.apache.spark.util.Utils

val virtualDirectory = new PlainFile(outputDir) // "directory" for classfiles
/** Jetty server that will serve our classes to worker nodes */
val classServer = new HttpServer(outputDir, new SecurityManager(conf))
val classServerListenPort: Int = conf.getInt("spark.replClassServer.port", 0)
Contributor

In "val classServerListenPort: Int", the word "Port" already makes it obvious that this is an Int, so there is no need for the explicit type annotation.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16258/

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@ash211
Contributor Author

ash211 commented Jun 30, 2014

With these changes, I think I'm able to make this work for the majority case.

Some work is still needed, though, to handle the situation where a Spark application runs on the same machine as a worker. Currently the application takes the port intended for an executor, so the executor on that server ends up on port n+1 while all others are on port n. This occurs for spark.fileserver.port and spark.blockManager.port.

Fresh cluster, no applications:

$ clear; jps | grep -v ' $' | grep -v 'Jps' | sort ; lsof -i 4tcp -P | grep java | awk '{print "pid", $2, $9}'
85073 Master
85187 Worker
86636 sbt-launch-0.13.2.jar
pid 85073 aash-mbp:7077
pid 85073 *:8080
pid 85073 aash-mbp:7077->aash-mbp:49582
pid 85187 aash-mbp:49581
pid 85187 *:8081
pid 85187 aash-mbp:49582->aash-mbp:7077

The connections can be summarized as:

  • Worker -> Master:7077

After starting spark-shell like this:
./bin/spark-shell --master spark://aash-mbp.local:7077 --driver-java-options "-Dspark.fileserver.port=40010 -Dspark.broadcast.port=40020 -Dspark.replClassServer.port=40030 -Dspark.blockManager.port=40040 -Dspark.driver.port=40050"

$ clear; jps | grep -v ' $' | grep -v 'Jps' | sort ; lsof -i 4tcp -P | grep java | awk '{print "pid", $2, $9}'
85073 Master
85187 Worker
86247 SparkSubmit
86325 CoarseGrainedExecutorBackend
pid 85073 aash-mbp:7077
pid 85073 *:8080
pid 85073 aash-mbp:7077->aash-mbp:49582
pid 85073 aash-mbp:7077->aash-mbp:49932
pid 85187 aash-mbp:49581
pid 85187 *:8081
pid 85187 aash-mbp:49582->aash-mbp:7077
pid 85187 aash-mbp:49581->aash-mbp:49935
pid 86247 *:40030
pid 86247 aash-mbp:40050
pid 86247 *:40040
pid 86247 *:40020
pid 86247 *:40010
pid 86247 *:4040
pid 86247 aash-mbp:49932->aash-mbp:7077
pid 86247 aash-mbp:40050->aash-mbp:49934
pid 86247 aash-mbp:40050->aash-mbp:49937
pid 86325 aash-mbp:49933
pid 86325 aash-mbp:49934->aash-mbp:40050
pid 86325 aash-mbp:49935->aash-mbp:49581
pid 86325 aash-mbp:49936
pid 86325 aash-mbp:49937->aash-mbp:40050
pid 86325 *:40041
pid 86325 *:40011

These are summarized as:

  • Worker:eph -> Master:7077
  • SparkSubmit:eph -> Master:7077
  • Executor:eph -> SparkSubmit:40050
  • Executor:eph -> SparkSubmit:40050 (again?)
  • Worker:eph -> Executor:eph

The outstanding things I'd like to do for this still are:

  • resolve the port conflict between an app and an executor running on the same server. The right way to do this might be separate config settings, spark.fileserver.app.port and spark.fileserver.executor.port, with the same split for spark.blockManager.port. That would allow ops teams to configure the cluster and executor ports separately from the dev teams running jobs on the cluster.
  • better logging for port failover, ideally specific to the service name (turn "failed to start service on port X" into "failed to start block manager on port X")

And lower priority:

  • figure out why two connections are made from executor -> app on the app's spark.driver.port rather than just one

Because all the listening ports ("*:40010" etc.) are now explicitly specified, though, I think this is moving closer to being ready.
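With the listeners pinned, an ops team could express the firewall policy concretely. The following is purely an illustrative iptables sketch using the port assignments from the spark-shell invocation earlier; it is not part of the PR, and a real policy would also scope rules by source address:

```shell
# Illustrative only: admit the now-fixed Spark ports, drop other inbound TCP.
iptables -A INPUT -p tcp -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -p tcp --dport 7077 -j ACCEPT         # standalone Master
iptables -A INPUT -p tcp --dport 40010:40050 -j ACCEPT  # fileserver..driver, incl. n+1 fallbacks
iptables -A INPUT -p tcp -j DROP
```

Note that the 40010:40050 range also covers the executor's fallback ports (40011, 40041) seen in the lsof output above.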

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16259/

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16261/

@pwendell
Contributor

@ash211 do you mind bringing this up to date? I'd like to do a review and merge in the next few days.

Fails when running master+worker+executor+shell on the same machine.  I think
the issue is that both the shell and the executor attempt to start a
ConnectionManager, which causes port conflicts.  Solution is to attempt and
increment on BindExceptions
- spark.fileserver.port
- spark.broadcast.port
- spark.replClassServer.port
- spark.blockManager.port
@ash211
Contributor Author

ash211 commented Jul 25, 2014

Hi @pwendell I had a minor conflict with the fix for SPARK-2392 in #1335 but it's rebased now and merges cleanly.

@@ -19,6 +19,7 @@ package org.apache.spark

import java.io.File

import org.apache.spark.network.PortManager

@pwendell
Contributor

@ash211 made some small comments but overall looks good. One thing - have you tested this on a cluster? I want to make sure we cover every possible connection here (I wonder if there are some we might have missed). I think the only real way to do it is to totally lock down a cluster and see if Spark can run.

@ash211
Contributor Author

ash211 commented Aug 1, 2014

I haven't tested this on an actual locked-down cluster yet -- I've just been looking at netstat output so far.

@ash211
Contributor Author

ash211 commented Aug 2, 2014

Looking at netstat more closely, we realized that there is still a port that's not configurable: the port that the driver connects to the executor on with Akka. The worker connects to its executor on this port as well, but because that's a local connection it's not as important from the firewall perspective. Looking at the master branch now, it also looks like the driverPropsFetcher right before needs a configurable port as well.

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala#L129

@pwendell
Contributor

pwendell commented Aug 2, 2014

Yeah good call, we need to cover that one as well.

}
logInfo("Could not bind on port: " + tryPort + ". Attempting port " + (tryPort + 1))
}
case e: Exception => throw e
Contributor

I don't believe this does anything.
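The reviewer's point, illustrated in isolation (a standalone sketch, not the PR's code): a catch case that only re-throws is behaviorally identical to having no handler at all, so a line like `case e: Exception => throw e` can simply be deleted.

```scala
// Catching an exception and immediately re-throwing it...
def withRethrow(f: () => Int): Int =
  try f() catch { case e: Exception => throw e }

// ...propagates exactly the same exception as not catching at all.
def withoutCatch(f: () => Int): Int = f()
```

Both versions behave identically for the caller, whether `f` succeeds or throws.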

@pwendell
Contributor

pwendell commented Aug 4, 2014

Hey @ash211 I think @andrewor14 is gonna take a stab at fixing these issues so we can get it merged today. We're past the window on this release and I want to make sure it goes in so people can test it. We'll include all of your commits and just add a few changes on top of them.

@ash211
Contributor Author

ash211 commented Aug 4, 2014

Sounds great -- my apologies for not being on top of responding to the code
review


@andrewor14
Contributor

No problem, thanks for working on this @ash211

asfgit pushed a commit that referenced this pull request Aug 6, 2014
The goal of this PR is to allow users of Spark to write tight firewall rules for their clusters. This is currently not possible because Spark uses random ports in many places, notably the communication between executors and drivers. The changes in this PR are based on top of ash211's changes in #1107.

The list covered here may or may not be the complete set of ports needed for Spark to operate perfectly. However, as of the latest commit there are no known sources of random ports (except in tests). I have not documented a few of the more obscure configs.

My spark-env.sh looks like this:
```
export SPARK_MASTER_PORT=6060
export SPARK_WORKER_PORT=7070
export SPARK_MASTER_WEBUI_PORT=9090
export SPARK_WORKER_WEBUI_PORT=9091
```
and my spark-defaults.conf looks like this:
```
spark.master spark://andrews-mbp:6060
spark.driver.port 5001
spark.fileserver.port 5011
spark.broadcast.port 5021
spark.replClassServer.port 5031
spark.blockManager.port 5041
spark.executor.port 5051
```

Author: Andrew Or <andrewor14@gmail.com>
Author: Andrew Ash <andrew@andrewash.com>

Closes #1777 from andrewor14/configure-ports and squashes the following commits:

621267b [Andrew Or] Merge branch 'master' of github.com:apache/spark into configure-ports
8a6b820 [Andrew Or] Use a random UI port during tests
7da0493 [Andrew Or] Fix tests
523c30e [Andrew Or] Add test for isBindCollision
b97b02a [Andrew Or] Minor fixes
c22ad00 [Andrew Or] Merge branch 'master' of github.com:apache/spark into configure-ports
93d359f [Andrew Or] Executors connect to wrong port when collision occurs
d502e5f [Andrew Or] Handle port collisions when creating Akka systems
a2dd05c [Andrew Or] Patrick's comment nit
86461e2 [Andrew Or] Remove spark.executor.env.port and spark.standalone.client.port
1d2d5c6 [Andrew Or] Fix ports for standalone cluster mode
cb3be88 [Andrew Or] Various doc fixes (broken link, format etc.)
e837cde [Andrew Or] Remove outdated TODOs
bfbab28 [Andrew Or] Merge branch 'master' of github.com:apache/spark into configure-ports
de1b207 [Andrew Or] Update docs to reflect new ports
b565079 [Andrew Or] Add spark.ports.maxRetries
2551eb2 [Andrew Or] Remove spark.worker.watcher.port
151327a [Andrew Or] Merge branch 'master' of github.com:apache/spark into configure-ports
9868358 [Andrew Or] Add a few miscellaneous ports
6016e77 [Andrew Or] Add spark.executor.port
8d836e6 [Andrew Or] Also document SPARK_{MASTER/WORKER}_WEBUI_PORT
4d9e6f3 [Andrew Or] Fix super subtle bug
3f8e51b [Andrew Or] Correct erroneous docs...
e111d08 [Andrew Or] Add names for UI services
470f38c [Andrew Or] Special case non-"Address already in use" exceptions
1d7e408 [Andrew Or] Treat 0 ports specially + return correct ConnectionManager port
ba32280 [Andrew Or] Minor fixes
6b550b0 [Andrew Or] Assorted fixes
73fbe89 [Andrew Or] Move start service logic to Utils
ec676f4 [Andrew Or] Merge branch 'SPARK-2157' of github.com:ash211/spark into configure-ports
038a579 [Andrew Ash] Trust the server start function to report the port the service started on
7c5bdc4 [Andrew Ash] Fix style issue
0347aef [Andrew Ash] Unify port fallback logic to a single place
24a4c32 [Andrew Ash] Remove type on val to match surrounding style
9e4ad96 [Andrew Ash] Reformat for style checker
5d84e0e [Andrew Ash] Document new port configuration options
066dc7a [Andrew Ash] Fix up HttpServer port increments
cad16da [Andrew Ash] Add fallover increment logic for HttpServer
c5a0568 [Andrew Ash] Fix ConnectionManager to retry with increment
b80d2fd [Andrew Ash] Make Spark's block manager port configurable
17c79bb [Andrew Ash] Add a configuration option for spark-shell's class server
f34115d [Andrew Ash] SPARK-1176 Add port configuration for HttpBroadcast
49ee29b [Andrew Ash] SPARK-1174 Add port configuration for HttpFileServer
1c0981a [Andrew Ash] Make port in HttpServer configurable
(cherry picked from commit 09f7e45)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>
asfgit pushed a commit that referenced this pull request Aug 6, 2014
@pwendell
Contributor

pwendell commented Aug 6, 2014

Hey @ash211 I think this is covered now in #1777. Do you mind closing this issue?

@ash211
Contributor Author

ash211 commented Aug 7, 2014

Of course. Thanks for getting this in and dealing with the fallout I saw come through.

@ash211 closed this Aug 7, 2014
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
wangyum added a commit that referenced this pull request May 26, 2023
…ssions (#1107)

* [SPARK-40609][SQL] Casts types according to bucket info for Equality expressions

* Update TypeCoercion.scala
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants