From a128609bf1f2ac4d59a0e4c704551f449168036a Mon Sep 17 00:00:00 2001 From: Hasnain Lakhani Date: Thu, 5 Oct 2023 20:02:14 -0700 Subject: [PATCH 1/4] working --- docs/security.md | 90 +++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 85 insertions(+), 5 deletions(-) diff --git a/docs/security.md b/docs/security.md index 0a6a3cf089185..28d1734f8fe57 100644 --- a/docs/security.md +++ b/docs/security.md @@ -209,6 +209,21 @@ The following table describes the different options available for configuring th +## SSL Encryption + +Spark supports SSL based encryption for RPC connections. Please refer to the SSL Configuration +section below to understand how to configure it. The SSL settings are mostly similar across the UI +and RPC, however there are a few additional settings which are specific to the RPC implementation. +The RPC implementation uses Netty under the hood (while the UI uses Jetty), which supports a +different set of options. + +Unlike the other SSL settings for the UI, the RPC SSL is *not* automatically enabled if +`spark.ssl.enabled` is set. It must be explicitly enabled, to ensure a safe migration path for users +upgrading Spark versions. + +The SSL encryption support supersedes the authentication and encryption settings mentioned +earlier. If both are enabled, the SSL settings take precedence and the prior settings will be +disabled at runtime, and a warning message will be emitted. # Local Storage Encryption @@ -437,8 +452,10 @@ application configurations will be ignored. Configuration for SSL is organized hierarchically. The user can configure the default SSL settings which will be used for all the supported communication protocols unless they are overwritten by protocol-specific settings. This way the user can easily provide the common settings for all the -protocols without disabling the ability to configure each one individually. 
The following table -describes the SSL configuration namespaces: +protocols without disabling the ability to configure each one individually. Note that all settings +are inherited this way, *except* for `spark.ssl.rpc.enabled` which must be explicitly set. + +The following table describes the SSL configuration namespaces: @@ -466,6 +483,10 @@ describes the SSL configuration namespaces: + + + +
spark.ssl.historyServer History Server Web UI
spark.ssl.rpc Spark RPC communication
The full breakdown of available SSL options can be found below. The `${ns}` placeholder should be @@ -489,6 +510,8 @@ replaced with one of the above namespaces.
When not set, the SSL port will be derived from the non-SSL port for the same service. A value of "0" will make the service bind to an ephemeral port. + +
This setting is not applicable to the `rpc` namespace. @@ -528,7 +551,7 @@ replaced with one of the above namespaces. ${ns}.keyStoreType JKS - The type of the key store. + The type of the key store. This setting is not applicable to the `rpc` namespace. ${ns}.protocol @@ -545,7 +568,10 @@ replaced with one of the above namespaces. ${ns}.needClientAuth false - Whether to require client authentication. + + Whether to require client authentication. This setting is not applicable to the `rpc` + namespace. + ${ns}.trustStore @@ -563,8 +589,62 @@ replaced with one of the above namespaces. ${ns}.trustStoreType JKS - The type of the trust store. + The type of the trust store. This setting is not applicable to the `rpc` namespace. + + + ${ns}.openSSLEnabled + false + + Whether to use OpenSSL for cryptographic operations instead of the JDK SSL provider. + This setting is only applicable to the `rpc` namespace, and also requires the `certChain` + and `privateKey` settings to be set. + + + + ${ns}.privateKey + None + + Path to the private key file in PEM format. The path can be absolute or relative to the + directory in which the process is started. + This setting is only applicable to the `rpc` namespace, and is required when using the + OpenSSL implementation. + + + ${ns}.certChain + None + + Path to the certificate chain file in PEM format. The path can be absolute or relative to the + directory in which the process is started. + This setting is only applicable to the `rpc` namespace, and is required when using the + OpenSSL implementation. + + + + ${ns}.trustStoreReloadingEnabled + false + + Whether the trust store should be reloaded periodically. + This setting is only applicable to the `rpc` namespace. + + + + ${ns}.trustStoreReloadIntervalMs + 10000 + + The interval at which the trust store should be reloaded (in milliseconds). + This setting is only applicable to the `rpc` namespace. 
+ + + + ${ns}.dangerouslyFallbackIfKeysNotPresent + false + + Whether we should fall back to unencrypted connections if SSL is enabled but the required + key files are not present on the file system (instead of throwing a fatal error). + This setting is only applicable to the `rpc` namespace. This is a dangerous option and + should only be used under exceptional circumstances. + Spark also supports retrieving `${ns}.keyPassword`, `${ns}.keyStorePassword` and `${ns}.trustStorePassword` from From 7258c6f041da0014fbc73a7b24a3bd51d95bce50 Mon Sep 17 00:00:00 2001 From: Hasnain Lakhani Date: Tue, 10 Oct 2023 22:34:05 -0700 Subject: [PATCH 2/4] remove option --- docs/security.md | 15 +++------------ 1 file changed, 3 insertions(+), 12 deletions(-) diff --git a/docs/security.md b/docs/security.md index 28d1734f8fe57..92c6cb124576c 100644 --- a/docs/security.md +++ b/docs/security.md @@ -221,9 +221,9 @@ Unlike the other SSL settings for the UI, the RPC SSL is *not* automatically ena `spark.ssl.enabled` is set. It must be explicitly enabled, to ensure a safe migration path for users upgrading Spark versions. -The SSL encryption support supersedes the authentication and encryption settings mentioned -earlier. If both are enabled, the SSL settings take precedence and the prior settings will be -disabled at runtime, and a warning message will be emitted. +The SSL encryption support supersedes the encryption settings mentioned earlier. If both are +enabled, the SSL settings take precedence and the prior settings will be disabled at runtime, +and a warning message will be emitted. # Local Storage Encryption @@ -636,15 +636,6 @@ replaced with one of the above namespaces. This setting is only applicable to the `rpc` namespace. - - ${ns}.dangerouslyFallbackIfKeysNotPresent - false - - Whether we should fall back to unencrypted connections if SSL is enabled but the required - key files are not present on the file system (instead of throwing a fatal error). 
- This setting is only applicable to the `rpc` namespace. This is a dangerous option and - should only be used under exceptional circumstances. - Spark also supports retrieving `${ns}.keyPassword`, `${ns}.keyStorePassword` and `${ns}.trustStorePassword` from From a5f7c65d8ae25b494e771adb302a1c6e4545f90e Mon Sep 17 00:00:00 2001 From: Hasnain Lakhani Date: Wed, 11 Oct 2023 21:32:54 -0700 Subject: [PATCH 3/4] rework a little --- docs/security.md | 25 ++++++++++++++++++++----- 1 file changed, 20 insertions(+), 5 deletions(-) diff --git a/docs/security.md b/docs/security.md index 92c6cb124576c..ecfd04ae62f85 100644 --- a/docs/security.md +++ b/docs/security.md @@ -147,7 +147,26 @@ Note that when using files, Spark will not mount these files into the containers you to ensure that the secret files are deployed securely into your containers and that the driver's secret file agrees with the executors' secret file. -## Encryption +# Network Encryption + +Spark supports two mutually exclusive forms of encryption for RPC connections. + +The first is an AES-based encryption which relies on a shared secret, and thus requires +RPC authentication to also be enabled. + +The second is an SSL based encryption mechanism utilizing Netty's support for SSL. This requires +keys and certificates to be properly configured. It can be used with or without the authentication +mechanism discussed earlier. + +One may prefer to use the SSL based encryption in scenarios where compliance mandates the usage +of specific protocols; or to leverage the security of a more standard encryption library. However, +the AES based encryption is simpler to configure and may be preferred if the only requirement +is that data be encrypted in transit. + +If both options are enabled in the configuration, the SSL based RPC encryption takes precedence +and the AES based encryption will not be used (and a warning message will be emitted). 
+ +## AES based Encryption Spark supports AES-based encryption for RPC connections. For encryption to be enabled, RPC authentication must also be enabled and properly configured. AES encryption uses the @@ -221,10 +240,6 @@ Unlike the other SSL settings for the UI, the RPC SSL is *not* automatically ena `spark.ssl.enabled` is set. It must be explicitly enabled, to ensure a safe migration path for users upgrading Spark versions. -The SSL encryption support supersedes the encryption settings mentioned earlier. If both are -enabled, the SSL settings take precedence and the prior settings will be disabled at runtime, -and a warning message will be emitted. - # Local Storage Encryption Spark supports encrypting temporary data written to local disks. This covers shuffle files, shuffle From fa9ca554def370eed8942f4b16f39e2ead127aee Mon Sep 17 00:00:00 2001 From: Hasnain Lakhani Date: Mon, 30 Oct 2023 13:32:00 -0700 Subject: [PATCH 4/4] comments --- docs/security.md | 43 ++++++++++++++++++++++++++++--------------- 1 file changed, 28 insertions(+), 15 deletions(-) diff --git a/docs/security.md b/docs/security.md index ecfd04ae62f85..2a1105fea33fe 100644 --- a/docs/security.md +++ b/docs/security.md @@ -508,11 +508,12 @@ The full breakdown of available SSL options can be found below. The `${ns}` plac replaced with one of the above namespaces. - + + @@ -525,9 +526,8 @@ replaced with one of the above namespaces.
When not set, the SSL port will be derived from the non-SSL port for the same service. A value of "0" will make the service bind to an ephemeral port. - -
This setting is not applicable to the `rpc` namespace. + @@ -542,6 +542,7 @@ replaced with one of the above namespaces.
Note: If not set, the default cipher suite for the JRE will be used. + @@ -549,6 +550,7 @@ replaced with one of the above namespaces. + @@ -557,16 +559,19 @@ replaced with one of the above namespaces. Path to the key store file. The path can be absolute or relative to the directory in which the process is started. + + - + + @@ -579,14 +584,15 @@ replaced with one of the above namespaces. this page. + + @@ -595,25 +601,30 @@ replaced with one of the above namespaces. Path to the trust store file. The path can be absolute or relative to the directory in which the process is started. + + - + + + @@ -621,9 +632,9 @@ replaced with one of the above namespaces. + @@ -631,25 +642,27 @@ replaced with one of the above namespaces. + + +
Property Name Default Meaning
Property Name Default Meaning Supported Namespaces
${ns}.enabled false Enables SSL. When enabled, ${ns}.ssl.protocol is required. ui,standalone,historyServer,rpc
${ns}.port ui,standalone,historyServer
${ns}.enabledAlgorithms ui,standalone,historyServer,rpc
${ns}.keyPassword The password to the private key in the key store. ui,standalone,historyServer,rpc
${ns}.keyStore ui,standalone,historyServer,rpc
${ns}.keyStorePassword None Password to the key store. ui,standalone,historyServer,rpc
${ns}.keyStoreType JKS The type of the key store. This setting is not applicable to the `rpc` namespace. The type of the key store. ui,standalone,historyServer
${ns}.protocol ui,standalone,historyServer,rpc
${ns}.needClientAuth false - Whether to require client authentication. This setting is not applicable to the `rpc` - namespace. + Whether to require client authentication. ui,standalone,historyServer
${ns}.trustStore ui,standalone,historyServer,rpc
${ns}.trustStorePassword None Password for the trust store. ui,standalone,historyServer,rpc
${ns}.trustStoreType JKS The type of the trust store. This setting is not applicable to the `rpc` namespace. The type of the trust store. ui,standalone,historyServer
${ns}.openSSLEnabled false Whether to use OpenSSL for cryptographic operations instead of the JDK SSL provider. - This setting is only applicable to the `rpc` namespace, and also requires the `certChain` - and `privateKey` settings to be set. + This setting requires the `certChain` and `privateKey` settings to be set. + This takes precedence over the `keyStore` and `trustStore` settings if both are specified. + If the OpenSSL library is not available at runtime, we will fall back to the JDK provider. rpc
${ns}.privateKey Path to the private key file in PEM format. The path can be absolute or relative to the directory in which the process is started. - This setting is only applicable to the `rpc` namespace, and is required when using the - OpenSSL implementation. + This setting is required when using the OpenSSL implementation. rpc
${ns}.certChain Path to the certificate chain file in PEM format. The path can be absolute or relative to the directory in which the process is started. - This setting is only applicable to the `rpc` namespace, and is required when using the - OpenSSL implementation. + This setting is required when using the OpenSSL implementation. rpc
${ns}.trustStoreReloadingEnabled false Whether the trust store should be reloaded periodically. - This setting is only applicable to the `rpc` namespace. + This setting is mainly useful in standalone deployments, not in Kubernetes or YARN deployments. rpc
${ns}.trustStoreReloadIntervalMs 10000 The interval at which the trust store should be reloaded (in milliseconds). - This setting is only applicable to the `rpc` namespace. + This setting is mainly useful in standalone deployments, not in Kubernetes or YARN deployments. rpc
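
The namespace rules in the table above can be sketched as a small pre-flight check. This is an illustration only, not part of Spark or of this patch: `RPC_SUPPORTED` and `render_rpc_ssl_conf` are hypothetical names, the file paths and the `TLSv1.2` value are placeholders, and treating a missing `privateKey` or `certChain` as a hard error is an assumption about how one might want to validate a config before launching.

```python
# Hypothetical helper, not part of Spark: renders `spark.ssl.rpc.*` settings
# and applies the namespace rules listed in the table above.

# Settings the table marks as supported in the `rpc` namespace.
RPC_SUPPORTED = {
    "enabled", "enabledAlgorithms", "keyPassword", "keyStore",
    "keyStorePassword", "protocol", "trustStore", "trustStorePassword",
    "openSSLEnabled", "privateKey", "certChain",
    "trustStoreReloadingEnabled", "trustStoreReloadIntervalMs",
}


def render_rpc_ssl_conf(settings):
    """Return spark-defaults.conf style lines for the `spark.ssl.rpc` namespace."""
    # Reject settings the table lists for other namespaces only (e.g. `port`).
    unsupported = sorted(set(settings) - RPC_SUPPORTED)
    if unsupported:
        raise ValueError(f"not applicable to the rpc namespace: {unsupported}")
    # The table states that openSSLEnabled requires certChain and privateKey.
    if settings.get("openSSLEnabled") == "true":
        missing = [k for k in ("privateKey", "certChain") if k not in settings]
        if missing:
            raise ValueError(f"openSSLEnabled requires: {missing}")
    return [f"spark.ssl.rpc.{key} {value}" for key, value in settings.items()]


# Placeholder values; the paths below are assumptions for illustration.
example = {
    "enabled": "true",
    "protocol": "TLSv1.2",
    "openSSLEnabled": "true",
    "privateKey": "/path/to/key.pem",
    "certChain": "/path/to/chain.pem",
}
for line in render_rpc_ssl_conf(example):
    print(line)
```

Note that `spark.ssl.rpc.enabled` is set explicitly here, since the patch text says it is the one setting that is not inherited from `spark.ssl.enabled`.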