Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-45431][DOCS] Document new SSL RPC feature #43240

Closed
wants to merge 4 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
109 changes: 104 additions & 5 deletions docs/security.md
Original file line number Diff line number Diff line change
Expand Up @@ -147,7 +147,26 @@ Note that when using files, Spark will not mount these files into the containers
you to ensure that the secret files are deployed securely into your containers and that the driver's
secret file agrees with the executors' secret file.

## Encryption
# Network Encryption

Spark supports two mutually exclusive forms of encryption for RPC connections.

The first is an AES-based encryption which relies on a shared secret, and thus requires
RPC authentication to also be enabled.

The second is an SSL based encryption mechanism utilizing Netty's support for SSL. This requires
keys and certificates to be properly configured. It can be used with or without the authentication
mechanism discussed earlier.

One may prefer to use the SSL based encryption in scenarios where compliance mandates the usage
of specific protocols; or to leverage the security of a more standard encryption library. However,
the AES based encryption is simpler to configure and may be preferred if the only requirement
is that data be encrypted in transit.

If both options are enabled in the configuration, the SSL based RPC encryption takes precedence
and the AES based encryption will not be used (and a warning message will be emitted).

## AES based Encryption

Spark supports AES-based encryption for RPC connections. For encryption to be enabled, RPC
authentication must also be enabled and properly configured. AES encryption uses the
Expand Down Expand Up @@ -209,6 +228,17 @@ The following table describes the different options available for configuring th
</tr>
</table>

## SSL Encryption

Spark supports SSL based encryption for RPC connections. Please refer to the SSL Configuration
section below to understand how to configure it. The SSL settings are mostly similar across the UI
and RPC, however there are a few additional settings which are specific to the RPC implementation.
The RPC implementation uses Netty under the hood (while the UI uses Jetty), which supports a
different set of options.

Unlike the other SSL settings for the UI, the RPC SSL is *not* automatically enabled if
`spark.ssl.enabled` is set. It must be explicitly enabled, to ensure a safe migration path for users
upgrading Spark versions.

# Local Storage Encryption

Expand Down Expand Up @@ -437,8 +467,10 @@ application configurations will be ignored.
Configuration for SSL is organized hierarchically. The user can configure the default SSL settings
which will be used for all the supported communication protocols unless they are overwritten by
protocol-specific settings. This way the user can easily provide the common settings for all the
protocols without disabling the ability to configure each one individually. The following table
describes the SSL configuration namespaces:
protocols without disabling the ability to configure each one individually. Note that all settings
are inherited this way, *except* for `spark.ssl.rpc.enabled` which must be explicitly set.

The following table describes the SSL configuration namespaces:

<table class="table table-striped">
<thead>
Expand Down Expand Up @@ -466,17 +498,22 @@ describes the SSL configuration namespaces:
<td><code>spark.ssl.historyServer</code></td>
<td>History Server Web UI</td>
</tr>
<tr>
<td><code>spark.ssl.rpc</code></td>
<td>Spark RPC communication</td>
</tr>
</table>

The full breakdown of available SSL options can be found below. The `${ns}` placeholder should be
replaced with one of the above namespaces.

<table class="table table-striped">
<thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr></thead>
<thead><tr><th>Property Name</th><th>Default</th><th>Meaning</th><th>Supported Namespaces</th></tr></thead>
<tr>
<td><code>${ns}.enabled</code></td>
<td>false</td>
<td>Enables SSL. When enabled, <code>${ns}.ssl.protocol</code> is required.</td>
<td>ui,standalone,historyServer,rpc</td>
</tr>
<tr>
<td><code>${ns}.port</code></td>
Expand All @@ -490,6 +527,7 @@ replaced with one of the above namespaces.
<br />When not set, the SSL port will be derived from the non-SSL port for the
same service. A value of "0" will make the service bind to an ephemeral port.
</td>
<td>ui,standalone,historyServer</td>
</tr>
<tr>
<td><code>${ns}.enabledAlgorithms</code></td>
Expand All @@ -504,13 +542,15 @@ replaced with one of the above namespaces.

<br />Note: If not set, the default cipher suite for the JRE will be used.
</td>
<td>ui,standalone,historyServer,rpc</td>
</tr>
<tr>
<td><code>${ns}.keyPassword</code></td>
<td>None</td>
<td>
The password to the private key in the key store.
</td>
<td>ui,standalone,historyServer,rpc</td>
</tr>
<tr>
<td><code>${ns}.keyStore</code></td>
Expand All @@ -519,16 +559,19 @@ replaced with one of the above namespaces.
Path to the key store file. The path can be absolute or relative to the directory in which the
process is started.
</td>
<td>ui,standalone,historyServer,rpc</td>
</tr>
<tr>
<td><code>${ns}.keyStorePassword</code></td>
<td>None</td>
<td>Password to the key store.</td>
<td>ui,standalone,historyServer,rpc</td>
</tr>
<tr>
<td><code>${ns}.keyStoreType</code></td>
<td>JKS</td>
<td>The type of the key store.</td>
<td>ui,standalone,historyServer</td>
</tr>
<tr>
<td><code>${ns}.protocol</code></td>
Expand All @@ -541,11 +584,15 @@ replaced with one of the above namespaces.
<a href="https://docs.oracle.com/en/java/javase/17/docs/specs/security/standard-names.html#additional-jsse-standard-names">this</a>
page.
</td>
<td>ui,standalone,historyServer,rpc</td>
</tr>
<tr>
<td><code>${ns}.needClientAuth</code></td>
<td>false</td>
<td>Whether to require client authentication.</td>
<td>
Whether to require client authentication.
</td>
<td>ui,standalone,historyServer</td>
</tr>
<tr>
<td><code>${ns}.trustStore</code></td>
Expand All @@ -554,16 +601,68 @@ replaced with one of the above namespaces.
Path to the trust store file. The path can be absolute or relative to the directory in which
the process is started.
</td>
<td>ui,standalone,historyServer,rpc</td>
</tr>
<tr>
<td><code>${ns}.trustStorePassword</code></td>
<td>None</td>
<td>Password for the trust store.</td>
<td>ui,standalone,historyServer,rpc</td>
</tr>
<tr>
<td><code>${ns}.trustStoreType</code></td>
<td>JKS</td>
<td>The type of the trust store.</td>
<td>ui,standalone,historyServer</td>
</tr>
<tr>
<td><code>${ns}.openSSLEnabled</code></td>
<td>false</td>
<td>
Whether to use OpenSSL for cryptographic operations instead of the JDK SSL provider.
This setting requires the `certChain` and `privateKey` settings to be set.
This takes precedence over the `keyStore` and `trustStore` settings if both are specified.
If the OpenSSL library is not available at runtime, we will fall back to the JDK provider.
</td>
<td>rpc</td>
</tr>
<tr>
<td><code>${ns}.privateKey</code></td>
<td>None</td>
<td>
Path to the private key file in PEM format. The path can be absolute or relative to the
directory in which the process is started.
This setting is required when using the OpenSSL implementation.
</td>
<td>rpc</td>
</tr>
<tr>
<td><code>${ns}.certChain</code></td>
<td>None</td>
<td>
Path to the certificate chain file in PEM format. The path can be absolute or relative to the
directory in which the process is started.
This setting is required when using the OpenSSL implementation.
</td>
<td>rpc</td>
</tr>
<tr>
<td><code>${ns}.trustStoreReloadingEnabled</code></td>
<td>false</td>
<td>
Whether the trust store should be reloaded periodically.
This setting is mostly only useful in standalone deployments, not k8s or yarn deployments.
</td>
<td>rpc</td>
</tr>
<tr>
<td><code>${ns}.trustStoreReloadIntervalMs</code></td>
<td>10000</td>
<td>
The interval at which the trust store should be reloaded (in milliseconds).
This setting is mostly only useful in standalone deployments, not k8s or yarn deployments.
</td>
<td>rpc</td>
</tr>
</table>

Expand Down