spark.master.ui.port |
@@ -403,9 +403,7 @@ SPARK_MASTER_OPTS supports the following system properties:
Path to a resources file which is used to find various resources while the worker is starting up.
The content of the resources file should be formatted like
- [{"id":{"componentName":
- "spark.worker", "resourceName":"gpu"},
- "addresses":["0","1","2"]}] .
+ [{"id":{"componentName": "spark.worker", "resourceName":"gpu"}, "addresses":["0","1","2"]}] .
If a particular resource is not found in the resources file, the discovery script will be used to
find that resource. If the discovery script also does not find the resource, the worker will fail
to start up.
@@ -416,7 +414,7 @@ SPARK_MASTER_OPTS supports the following system properties:
SPARK_WORKER_OPTS supports the following system properties:
-
+
Property Name | Default | Meaning | Since Version |
spark.worker.initialRegistrationRetries |
@@ -549,8 +547,8 @@ You can also pass an option `--total-executor-cores ` to control the n
Spark applications support the following configuration properties specific to standalone mode:
-
- Property Name | Default Value | Meaning | Since Version |
+
+ Property Name | Default Value | Meaning | Since Version |
spark.standalone.submit.waitAppCompletion |
false |
@@ -599,8 +597,8 @@ via http://[host:port]/[version]/submissions/[action] where
version is a protocol version, v1 as of today, and
action is one of the following supported actions.
-
- Command | Description | HTTP METHOD | Since Version |
+
+ Command | Description | HTTP METHOD | Since Version |
create |
Create a Spark driver via cluster mode. |
@@ -778,8 +776,8 @@ ZooKeeper is the best way to go for production-level high availability, but if y
In order to enable this recovery mode, you can set SPARK_DAEMON_JAVA_OPTS in spark-env using this configuration:
-
- System property | Default Value | Meaning | Since Version |
+
+ System property | Default Value | Meaning | Since Version |
spark.deploy.recoveryMode |
NONE |
diff --git a/docs/sql-data-sources-avro.md b/docs/sql-data-sources-avro.md
index ddfdc89370b1f..cbc3367e5f852 100644
--- a/docs/sql-data-sources-avro.md
+++ b/docs/sql-data-sources-avro.md
@@ -233,8 +233,8 @@ Data source options of Avro can be set via:
* the `.option` method on `DataFrameReader` or `DataFrameWriter`.
* the `options` parameter in function `from_avro`.
-
- Property Name | Default | Meaning | Scope | Since Version |
+
+ Property Name | Default | Meaning | Scope | Since Version |
avroSchema |
None |
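For illustration, here is a minimal spark-shell sketch of the first two mechanisms (setting `avroSchema` through `.option`, and passing an options map to `from_avro`). The schema, column names, and the `FAILFAST` value are placeholders chosen for the sketch, not taken from the documentation above, and it assumes a shell started with the `spark-avro` package on the classpath.

```scala
// Sketch for spark-shell started with the spark-avro package
// (e.g. --packages org.apache.spark:spark-avro_2.12:3.5.1; the version is illustrative).
import java.util.Collections
import org.apache.spark.sql.functions.struct
import org.apache.spark.sql.avro.functions.{from_avro, to_avro}

// Hypothetical two-field record schema matching the struct built below.
val recordSchema =
  """{"type":"record","name":"User","fields":[
    |{"name":"age","type":"int"},{"name":"name","type":"string"}]}""".stripMargin

// Stand-in data; in practice the binary Avro column would typically come from Kafka.
val df = spark.range(3).selectExpr("CAST(id AS INT) AS age", "CONCAT('user_', id) AS name")

// Encode to a binary Avro column with the schema, then decode it back with from_avro;
// the options map ("mode" -> "FAILFAST") makes decoding fail on malformed records.
val binary  = df.select(to_avro(struct($"age", $"name"), recordSchema).as("value"))
val decoded = binary.select(
  from_avro($"value", recordSchema, Collections.singletonMap("mode", "FAILFAST")).as("user"))
decoded.show()

// The same schema string can also be supplied through .option on DataFrameReader,
// so a user-provided schema takes precedence over the one embedded in the files:
//   spark.read.format("avro").option("avroSchema", recordSchema).load("/path/to/data")
```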
@@ -331,8 +331,8 @@ Data source options of Avro can be set via:
## Configuration
Configuration of Avro can be done via `spark.conf.set` or by running `SET key=value` commands using SQL.
-
- Property Name | Default | Meaning | Since Version |
+
+ Property Name | Default | Meaning | Since Version |
spark.sql.legacy.replaceDatabricksSparkAvro.enabled |
true |
diff --git a/docs/sql-data-sources-hive-tables.md b/docs/sql-data-sources-hive-tables.md
index 0d16272ed6f86..b51cde53bd8fd 100644
--- a/docs/sql-data-sources-hive-tables.md
+++ b/docs/sql-data-sources-hive-tables.md
@@ -123,7 +123,7 @@ will compile against built-in Hive and use those classes for internal execution
The following options can be used to configure the version of Hive that is used to retrieve metadata:
-
+
Property Name | Default | Meaning | Since Version |
spark.sql.hive.metastore.version |
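As a rough sketch of how such an option is applied, the metastore client version has to be fixed before the Hive-enabled session is created; the version number `2.3.9`, the `maven` jars setting, and the app name below are illustrative assumptions, not values prescribed by the docs.

```scala
import org.apache.spark.sql.SparkSession

// Metastore settings must be in place before the session starts, so they are
// passed to the builder; "2.3.9" and "maven" are illustrative values only.
val spark = SparkSession.builder()
  .appName("hive-metastore-version-sketch")
  .config("spark.sql.hive.metastore.version", "2.3.9")
  .config("spark.sql.hive.metastore.jars", "maven")   // fetch matching client jars from Maven
  .enableHiveSupport()
  .getOrCreate()

spark.sql("SHOW DATABASES").show()
```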
diff --git a/docs/sql-data-sources-orc.md b/docs/sql-data-sources-orc.md
index abd1901d24e4b..8267d39e949e5 100644
--- a/docs/sql-data-sources-orc.md
+++ b/docs/sql-data-sources-orc.md
@@ -129,8 +129,8 @@ When reading from Hive metastore ORC tables and inserting to Hive metastore ORC
### Configuration
-
- Property Name | Default | Meaning | Since Version |
+
+ Property Name | Default | Meaning | Since Version |
spark.sql.orc.impl |
native |
diff --git a/docs/sql-data-sources-parquet.md b/docs/sql-data-sources-parquet.md
index 7d80343214815..e944db24d76be 100644
--- a/docs/sql-data-sources-parquet.md
+++ b/docs/sql-data-sources-parquet.md
@@ -434,7 +434,7 @@ Other generic options can be found in
Property Name | Default | Meaning | Since Version |
spark.sql.parquet.binaryAsString |
diff --git a/docs/sql-performance-tuning.md b/docs/sql-performance-tuning.md
index 4ede18d1938bf..1dbe1bb7e1a26 100644
--- a/docs/sql-performance-tuning.md
+++ b/docs/sql-performance-tuning.md
@@ -34,7 +34,7 @@ memory usage and GC pressure. You can call `spark.catalog.uncacheTable("tableNam
Configuration of in-memory caching can be done via `spark.conf.set` or by running
`SET key=value` commands using SQL.
-
+
Property Name | Default | Meaning | Since Version |
spark.sql.inMemoryColumnarStorage.compressed |
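A minimal sketch of both mechanisms, assuming an existing SparkSession named `spark` (for example in spark-shell); the temp view name and the batch size are placeholders for the sketch.

```scala
// Programmatic form and SQL SET form of the same kind of setting.
spark.conf.set("spark.sql.inMemoryColumnarStorage.compressed", "true")
spark.sql("SET spark.sql.inMemoryColumnarStorage.batchSize=10000")

spark.range(0, 100000).toDF("id").createOrReplaceTempView("events")
spark.catalog.cacheTable("events")       // columnar cache built with the settings above
spark.sql("SELECT COUNT(*) FROM events").show()
spark.catalog.uncacheTable("events")     // release the cached data when done
```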
@@ -62,7 +62,7 @@ Configuration of in-memory caching can be done via `spark.conf.set` or by runnin
The following options can also be used to tune the performance of query execution. It is possible
that these options will be deprecated in a future release as more optimizations are performed automatically.
-
+
Property Name | Default | Meaning | Since Version |
spark.sql.files.maxPartitionBytes |
@@ -253,7 +253,7 @@ Adaptive Query Execution (AQE) is an optimization technique in Spark SQL that ma
### Coalescing Post Shuffle Partitions
This feature coalesces the post-shuffle partitions based on the map output statistics when both the `spark.sql.adaptive.enabled` and `spark.sql.adaptive.coalescePartitions.enabled` configurations are true. It simplifies the tuning of the shuffle partition number when running queries: you do not need to set a shuffle partition number that fits your dataset, because Spark can pick a proper value at runtime as long as you set a large enough initial number of shuffle partitions via the `spark.sql.adaptive.coalescePartitions.initialPartitionNum` configuration.
-
+
Property Name | Default | Meaning | Since Version |
spark.sql.adaptive.coalescePartitions.enabled |
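As a quick sketch of the setup described above (again assuming an existing `spark` session; the partition count, row count, and modulus are illustrative):

```scala
// Values are illustrative; the right initial partition count depends on the dataset.
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.initialPartitionNum", "1000")

// A shuffle-heavy aggregation: AQE starts from 1000 shuffle partitions and merges
// small post-shuffle partitions at runtime based on map output statistics.
spark.range(0, 1000000)
  .selectExpr("id % 17 AS k", "id AS v")
  .groupBy("k").count()
  .collect()
```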
@@ -298,7 +298,7 @@ This feature coalesces the post shuffle partitions based on the map output stati
### Splitting skewed shuffle partitions
-
+
Property Name | Default | Meaning | Since Version |
spark.sql.adaptive.optimizeSkewsInRebalancePartitions.enabled |
@@ -320,7 +320,7 @@ This feature coalesces the post shuffle partitions based on the map output stati
### Converting sort-merge join to broadcast join
AQE converts sort-merge join to broadcast hash join when the runtime statistics of either join side are smaller than the adaptive broadcast hash join threshold. This is not as efficient as planning a broadcast hash join in the first place, but it is better than continuing with the sort-merge join, as it avoids sorting both join sides and can read shuffle files locally to save network traffic (if `spark.sql.adaptive.localShuffleReader.enabled` is true).
-
+
Property Name | Default | Meaning | Since Version |
spark.sql.adaptive.autoBroadcastJoinThreshold |
@@ -342,7 +342,7 @@ AQE converts sort-merge join to broadcast hash join when the runtime statistics
### Converting sort-merge join to shuffled hash join
AQE converts sort-merge join to shuffled hash join when all post-shuffle partitions are smaller than a threshold; the maximum threshold is controlled by the config `spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold`.
-
+
Property Name | Default | Meaning | Since Version |
spark.sql.adaptive.maxShuffledHashJoinLocalMapThreshold |
@@ -356,7 +356,7 @@ AQE converts sort-merge join to shuffled hash join when all post shuffle partiti
### Optimizing Skew Join
Data skew can severely degrade the performance of join queries. This feature dynamically handles skew in sort-merge join by splitting (and replicating if needed) skewed tasks into roughly evenly sized tasks. It takes effect when both the `spark.sql.adaptive.enabled` and `spark.sql.adaptive.skewJoin.enabled` configurations are enabled.
-
+
Property Name | Default | Meaning | Since Version |
spark.sql.adaptive.skewJoin.enabled |
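A small sketch of enabling the feature via SQL `SET` statements (an alternative to `spark.conf.set`); the factor and threshold values and the demo data shape are illustrative assumptions, and whether a given partition is actually split depends on its size relative to those settings.

```scala
// SQL SET form as an alternative to spark.conf.set; values are illustrative.
spark.sql("SET spark.sql.adaptive.enabled=true")
spark.sql("SET spark.sql.adaptive.skewJoin.enabled=true")
spark.sql("SET spark.sql.adaptive.skewJoin.skewedPartitionFactor=5")
spark.sql("SET spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes=256MB")

// A join where one key dominates the left side is the shape of skew this feature
// targets; a partition is split only if it exceeds the factor and threshold above.
val skewed = spark.range(0, 2000000).selectExpr("IF(id % 10 < 8, 0, id) AS k", "id AS v")
val lookup = spark.range(0, 2000000).selectExpr("id AS k", "id AS label")
skewed.join(lookup, "k").count()
```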
@@ -393,7 +393,7 @@ Data skew can severely downgrade the performance of join queries. This feature d
### Misc
-
+
Property Name | Default | Meaning | Since Version |
spark.sql.adaptive.optimizer.excludedRules |
diff --git a/docs/sql-ref-ansi-compliance.md b/docs/sql-ref-ansi-compliance.md
index 93af3e6698474..9b933ec1f65c1 100644
--- a/docs/sql-ref-ansi-compliance.md
+++ b/docs/sql-ref-ansi-compliance.md
@@ -28,10 +28,40 @@ The casting behaviours are defined as store assignment rules in the standard.
When `spark.sql.storeAssignmentPolicy` is set to `ANSI`, Spark SQL complies with the ANSI store assignment rules. This is a separate configuration because its default value is `ANSI`, while the configuration `spark.sql.ansi.enabled` is disabled by default.
-|Property Name|Default| Meaning |Since Version|
-|-------------|-------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------|
-|`spark.sql.ansi.enabled`|false| When true, Spark tries to conform to the ANSI SQL specification: 1. Spark SQL will throw runtime exceptions on invalid operations, including integer overflow errors, string parsing errors, etc. 2. Spark will use different type coercion rules for resolving conflicts among data types. The rules are consistently based on data type precedence. |3.0.0|
-|`spark.sql.storeAssignmentPolicy`|ANSI| When inserting a value into a column with different data type, Spark will perform type conversion. Currently, we support 3 policies for the type coercion rules: ANSI, legacy and strict. 1. With ANSI policy, Spark performs the type coercion as per ANSI SQL. In practice, the behavior is mostly the same as PostgreSQL. It disallows certain unreasonable type conversions such as converting string to int or double to boolean. On inserting a numeric type column, an overflow error will be thrown if the value is out of the target data type's range. 2. With legacy policy, Spark allows the type coercion as long as it is a valid Cast, which is very loose. e.g. converting string to int or double to boolean is allowed. It is also the only behavior in Spark 2.x and it is compatible with Hive. 3. With strict policy, Spark doesn't allow any possible precision loss or data truncation in type coercion, e.g. converting double to int or decimal to double is not allowed. |3.0.0|
+
+Property Name | Default | Meaning | Since Version |
+
+ spark.sql.ansi.enabled |
+ false |
+
+ When true, Spark tries to conform to the ANSI SQL specification:
+ 1. Spark SQL will throw runtime exceptions on invalid operations, including integer overflow
+ errors, string parsing errors, etc.
+ 2. Spark will use different type coercion rules for resolving conflicts among data types.
+ The rules are consistently based on data type precedence.
+ |
+ 3.0.0 |
+
+
+ spark.sql.storeAssignmentPolicy |
+ ANSI |
+
+ When inserting a value into a column with different data type, Spark will perform type
+ conversion. Currently, we support 3 policies for the type coercion rules: ANSI, legacy and
+ strict.
+ 1. With ANSI policy, Spark performs the type coercion as per ANSI SQL. In practice, the behavior
+ is mostly the same as PostgreSQL. It disallows certain unreasonable type conversions such as
+ converting string to int or double to boolean. On inserting a numeric type column, an overflow
+ error will be thrown if the value is out of the target data type's range.
+ 2. With legacy policy, Spark allows the type coercion as long as it is a valid Cast, which is
+ very loose. e.g. converting string to int or double to boolean is allowed. It is also the only
+ behavior in Spark 2.x and it is compatible with Hive.
+ 3. With strict policy, Spark doesn't allow any possible precision loss or data truncation in
+ type coercion, e.g. converting double to int or decimal to double is not allowed.
+ |
+ 3.0.0 |
+
+
The following subsections present behaviour changes in arithmetic operations, type conversions, and SQL parsing when ANSI mode is enabled. For type conversions in Spark SQL, there are three kinds, and this article will introduce them one by one: cast, store assignment and type coercion.
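A minimal sketch of the runtime-exception behaviour described for `spark.sql.ansi.enabled`, assuming an existing `spark` session; the overflowing literal is just a demo value.

```scala
spark.conf.set("spark.sql.ansi.enabled", "false")
spark.sql("SELECT 2147483647 + 1 AS v").show()     // non-ANSI: wraps around to -2147483648

spark.conf.set("spark.sql.ansi.enabled", "true")
try {
  spark.sql("SELECT 2147483647 + 1 AS v").show()   // ANSI: integer overflow is a runtime error
} catch {
  case e: Exception => println(s"ANSI mode rejected the overflow: ${e.getMessage}")
}
```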
diff --git a/docs/structured-streaming-kafka-integration.md b/docs/structured-streaming-kafka-integration.md
index c5ffdf025b173..37846216fc758 100644
--- a/docs/structured-streaming-kafka-integration.md
+++ b/docs/structured-streaming-kafka-integration.md
@@ -607,7 +607,7 @@ The caching key is built up from the following information:
The following properties are available to configure the consumer pool:
-
+
Property Name | Default | Meaning | Since Version |
spark.kafka.consumer.cache.capacity |
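These pool properties are ordinary Spark configs that have to be fixed when the application starts (for example via `spark-submit --conf` or on the session builder). A sketch under that assumption; the capacity of 128 and the 10m idle timeout are illustrative values, not recommendations.

```scala
import org.apache.spark.sql.SparkSession

// Consumer pool sizing per executor, fixed at application start.
val spark = SparkSession.builder()
  .appName("kafka-consumer-pool-sketch")
  .config("spark.kafka.consumer.cache.capacity", "128")   // soft max of cached consumers
  .config("spark.kafka.consumer.cache.timeout", "10m")    // evict idle consumers after 10m
  .getOrCreate()
```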
@@ -657,7 +657,7 @@ Note that it doesn't leverage Apache Commons Pool due to the difference of chara
The following properties are available to configure the fetched data pool:
-
+
Property Name | Default | Meaning | Since Version |
spark.kafka.consumer.fetchedData.cache.timeout |
@@ -912,7 +912,7 @@ It will use different Kafka producer when delegation token is renewed; Kafka pro
The following properties are available to configure the producer pool:
-
+
Property Name | Default | Meaning | Since Version |
spark.kafka.producer.cache.timeout |
@@ -1039,7 +1039,7 @@ When none of the above applies then unsecure connection assumed.
Delegation tokens can be obtained from multiple clusters and ${cluster} is an arbitrary unique identifier which helps to group different configurations.
-
+
Property Name | Default | Meaning | Since Version |
spark.kafka.clusters.${cluster}.auth.bootstrap.servers |
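As a sketch of how the `${cluster}` placeholder is used, the identifier `analytics`, the broker hosts, and the matching regex below are all placeholders; delegation tokens additionally require the usual Kafka authentication setup (keytab/principal or a TGT), which is omitted here.

```scala
import org.apache.spark.SparkConf

// "analytics" is an arbitrary ${cluster} identifier grouping one cluster's settings.
val kafkaTokenConf = new SparkConf()
  .set("spark.kafka.clusters.analytics.auth.bootstrap.servers",
       "kafka-1.example.com:9092,kafka-2.example.com:9092")
  // Tokens obtained for this cluster are applied to sources/sinks whose
  // bootstrap servers match the pattern below.
  .set("spark.kafka.clusters.analytics.target.bootstrap.servers.regex",
       "kafka-[0-9]+\\.example\\.com:9092")
```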
diff --git a/sql/gen-sql-config-docs.py b/sql/gen-sql-config-docs.py
index 83334b6a1f539..b69a903b44f90 100644
--- a/sql/gen-sql-config-docs.py
+++ b/sql/gen-sql-config-docs.py
@@ -56,7 +56,7 @@ def generate_sql_configs_table_html(sql_configs, path):
The table will look something like this:
```html
-
+
Property Name | Default | Meaning | Since Version |
@@ -76,7 +76,7 @@ def generate_sql_configs_table_html(sql_configs, path):
with open(path, 'w') as f:
f.write(dedent(
"""
-
+
Property Name | Default | Meaning | Since Version |
"""
))
|