From 1c6184ad9073f948a324b37b43992f800b3e4f35 Mon Sep 17 00:00:00 2001 From: Nicholas Chammas Date: Fri, 26 Jan 2024 10:41:13 -0500 Subject: [PATCH 01/10] explain terminology --- .../utils/src/main/resources/error/README.md | 100 ++++++++++++------ 1 file changed, 68 insertions(+), 32 deletions(-) diff --git a/common/utils/src/main/resources/error/README.md b/common/utils/src/main/resources/error/README.md index 46294d57d797f..6cc498c7fb7f4 100644 --- a/common/utils/src/main/resources/error/README.md +++ b/common/utils/src/main/resources/error/README.md @@ -1,8 +1,32 @@ -# Guidelines +# Guidelines for Throwing User-Facing Errors To throw a standardized user-facing error or exception, developers should specify the error class, a SQLSTATE, and message parameters rather than an arbitrary error message. +## Terminology + +Though we will mainly talk about "error classes" in front of users, user-facing errors having many parts. + +The hierarchy is as follows: +1. Error category +2. Error sub-category +3. Error state / SQLSTATE +4. Error class +5. Error sub-class + +The 5-character error state is simply the concatenation of the 2-character category with the 3-character sub-category. + +Here is an example: +* Error category: `42` - "Syntax Error or Access Rule Violation" +* Error sub-category: `K01` +* Error state / SQLSTATE: `42K01` - "data type not fully specified" + * Error class: `INCOMPLETE_TYPE_DEFINITION` + * Error sub-class: `ARRAY` + * Error sub-class: `MAP` + * Error sub-class: `STRUCT` + * Error class: `DATATYPE_MISSING_SIZE` + + ## Usage 1. Check if the error is an internal error. @@ -10,43 +34,54 @@ and message parameters rather than an arbitrary error message. If true, use the error class `INTERNAL_ERROR` and skip to step 4. 2. Check if an appropriate error class already exists in `error-classes.json`. If true, use the error class and skip to step 4. -3. Add a new class with a new or existing SQLSTATE to `error-classes.json`; keep in mind the invariants below. +3. Add a new class with a new or existing SQLSTATE to `error-classes.json`; keep in mind the invariants below, which are also [checked here][error-invariants]. 4. Check if the exception type already extends `SparkThrowable`. If true, skip to step 6. 5. Mix `SparkThrowable` into the exception. 6. Throw the exception with the error class and message parameters. If the same exception is thrown in several places, create an util function in a central place such as `QueryCompilationErrors.scala` to instantiate the exception. 
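To make the category/sub-category concatenation described under "Terminology" concrete, here is a minimal Scala sketch; the `splitSqlState` helper is hypothetical and exists only for illustration — it is not part of the Spark codebase.

```scala
// Hypothetical helper, for illustration only; not part of the Spark codebase.
// A SQLSTATE is exactly 5 characters: a 2-character error category
// followed by a 3-character error sub-category.
def splitSqlState(sqlState: String): (String, String) = {
  require(sqlState.length == 5, s"SQLSTATE must be 5 characters, got: $sqlState")
  (sqlState.take(2), sqlState.drop(2))
}

// splitSqlState("42K01") returns ("42", "K01"): category "42"
// ("Syntax Error or Access Rule Violation") plus sub-category "K01".
```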
+[error-invariants]: https://github.com/apache/spark/blob/40574bb36647a35d7ac1fe8b7b1efcb98b058065/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala#L138-L141 + ### Before Throw with arbitrary error message: - throw new TestException("Problem A because B") +```scala +throw new TestException("Problem A because B") +``` ### After `error-classes.json` - "PROBLEM_BECAUSE" : { - "message" : ["Problem because "], - "sqlState" : "XXXXX" - } +```json +"PROBLEM_BECAUSE" : { + "message" : ["Problem because "], + "sqlState" : "XXXXX" +} +``` `SparkException.scala` - class SparkTestException( - errorClass: String, - messageParameters: Map[String, String]) - extends TestException(SparkThrowableHelper.getMessage(errorClass, messageParameters)) - with SparkThrowable { - - override def getMessageParameters: java.util.Map[String, String] = messageParameters.asJava +```scala +class SparkTestException( + errorClass: String, + messageParameters: Map[String, String]) + extends TestException(SparkThrowableHelper.getMessage(errorClass, messageParameters)) + with SparkThrowable { + + override def getMessageParameters: java.util.Map[String, String] = + messageParameters.asJava - override def getErrorClass: String = errorClass - } + override def getErrorClass: String = errorClass +} +``` Throw with error class and message parameters: - throw new SparkTestException("PROBLEM_BECAUSE", Map("problem" -> "A", "cause" -> "B")) +```scala +throw new SparkTestException("PROBLEM_BECAUSE", Map("problem" -> "A", "cause" -> "B")) +``` ## Access fields @@ -54,13 +89,14 @@ To access error fields, catch exceptions that extend `org.apache.spark.SparkThro - Error class with `getErrorClass` - SQLSTATE with `getSqlState` - - try { - ... - } catch { - case e: SparkThrowable if Option(e.getSqlState).forall(_.startsWith("42")) => - warn("Syntax error") - } +```scala +try { + ... +} catch { + case e: SparkThrowable if Option(e.getSqlState).forall(_.startsWith("42")) => + warn("Syntax error") +} +``` ## Fields @@ -81,9 +117,9 @@ You should not introduce new uncategorized errors. Instead, convert them to prop ### Message Error messages provide a descriptive, human-readable representation of the error. -The message format accepts string parameters via the HTML tag syntax: e.g. . +The message format accepts string parameters via the HTML tag syntax: e.g. ``. -The values passed to the message shoudl not themselves be messages. +The values passed to the message should not themselves be messages. They should be: runtime-values, keywords, identifiers, or other values that are not translated. The quality of the error message should match the @@ -96,13 +132,13 @@ The quality of the error message should match the ### SQLSTATE SQLSTATE is an mandatory portable error identifier across SQL engines. -SQLSTATE comprises a 2-character class value followed by a 3-character subclass value. +SQLSTATE comprises a 2-character category followed by a 3-character sub-category. Spark prefers to re-use existing SQLSTATEs, preferably used by multiple vendors. -For extension Spark claims the 'K**' subclass range. -If a new class is needed it will also claim the 'K0' class. +For extension Spark claims the `K**` sub-category range. +If a new category is needed it will also claim the `K0` category. -Internal errors should use the 'XX' class. You can subdivide internal errors by component. -For example: The existing 'XXKD0' is used for an internal analyzer error. +Internal errors should use the `XX` category. You can subdivide internal errors by component. 
+For example: The existing `XXKD0` is used for an internal analyzer error. #### Invariants @@ -110,7 +146,7 @@ For example: The existing 'XXKD0' is used for an internal analyzer error. #### ANSI/ISO standard -The following SQLSTATEs are collated from: +The SQLSTATEs in `error-states.json` are collated from: - SQL2016 - DB2 zOS/LUW - PostgreSQL 15 From e9e072ee577b0c63317ebad7734d91f22cc45906 Mon Sep 17 00:00:00 2001 From: Nicholas Chammas Date: Sun, 28 Jan 2024 14:30:08 -0500 Subject: [PATCH 02/10] update error README per discussion on SPARK-46810 --- .../utils/src/main/resources/error/README.md | 94 +++++++++++-------- 1 file changed, 56 insertions(+), 38 deletions(-) diff --git a/common/utils/src/main/resources/error/README.md b/common/utils/src/main/resources/error/README.md index 6cc498c7fb7f4..fd2a6f01df00c 100644 --- a/common/utils/src/main/resources/error/README.md +++ b/common/utils/src/main/resources/error/README.md @@ -1,46 +1,62 @@ # Guidelines for Throwing User-Facing Errors -To throw a standardized user-facing error or exception, developers should specify the error class, a SQLSTATE, -and message parameters rather than an arbitrary error message. +To throw a user-facing error or exception, developers should specify a standardized SQLSTATE, an error condition, and message parameters rather than an arbitrary error message. -## Terminology +This guide will describe how to do this. -Though we will mainly talk about "error classes" in front of users, user-facing errors having many parts. +## Error Hierarchy and Terminology -The hierarchy is as follows: -1. Error category -2. Error sub-category -3. Error state / SQLSTATE -4. Error class -5. Error sub-class +The error hierarchy is as follows: +1. Error state / SQLSTATE +2. Error condition +3. Error sub-condition -The 5-character error state is simply the concatenation of the 2-character category with the 3-character sub-category. +The error state / SQLSTATE itself is comprised of two parts: +1. Error class +2. Error sub-class -Here is an example: -* Error category: `42` - "Syntax Error or Access Rule Violation" -* Error sub-category: `K01` -* Error state / SQLSTATE: `42K01` - "data type not fully specified" - * Error class: `INCOMPLETE_TYPE_DEFINITION` - * Error sub-class: `ARRAY` - * Error sub-class: `MAP` - * Error sub-class: `STRUCT` - * Error class: `DATATYPE_MISSING_SIZE` +Acceptable values for these various error parts are defined in the following files: +* `error-categories.json` +* `error-states.json` +* `error-classes.json` +### Illustrative Example +* Error state / SQLSTATE: `42K01` (Class: `42`; Sub-class: `K01`) + * Error condition: `DATATYPE_MISSING_SIZE` + * Error condition: `INCOMPLETE_TYPE_DEFINITION` + * Error sub-condition: `ARRAY` + * Error sub-condition: `MAP` + * Error sub-condition: `STRUCT` +* Error state / SQLSTATE: `42604` (Class: `42`; Sub-class: `604`) + * Error condition: `INVALID_ESCAPE_CHAR` + * Error condition: `AS_OF_JOIN` + * Error sub-condition: `TOLERANCE_IS_NON_NEGATIVE` + * Error sub-condition: `TOLERANCE_IS_UNFOLDABLE` + +### Inconsistent Use of the Term "Error Class" + +Unfortunately, we have historically used the term "error class" inconsistently to refer both to a proper error class like `42` and also to an error condition like `DATATYPE_MISSING_SIZE`. + +Fixing this would require renaming `SparkException.errorClass` to `SparkException.errorCondition` and making similar changes to `ErrorClassesJsonReader` and other parts of the codebase. 
This may not be practical or even possible, depending on the impact of such a change on Spark's public API. + +Unless and until we refactor the codebase to bring it in line with the proper error terminology, we will have to live with the fact that a string like `DATATYPE_MISSING_SIZE` is called an "error condition" in our user-facing documentation but an "error class" in the code. + +For more details, please see [SPARK-46810][SPARK-46810]. + +[SPARK-46810]: https://issues.apache.org/jira/browse/SPARK-46810 ## Usage 1. Check if the error is an internal error. Internal errors are bugs in the code that we do not expect users to encounter; this does not include unsupported operations. - If true, use the error class `INTERNAL_ERROR` and skip to step 4. -2. Check if an appropriate error class already exists in `error-classes.json`. - If true, use the error class and skip to step 4. -3. Add a new class with a new or existing SQLSTATE to `error-classes.json`; keep in mind the invariants below, which are also [checked here][error-invariants]. + If true, use the error condition `INTERNAL_ERROR` and skip to step 4. +2. Check if an appropriate error condition already exists in `error-classes.json`. + If true, use the error condition and skip to step 4. +3. Add a new condition to `error-classes.json`. If the new condition requires a new error state, add the new error state to `error-states.json`. 4. Check if the exception type already extends `SparkThrowable`. If true, skip to step 6. 5. Mix `SparkThrowable` into the exception. -6. Throw the exception with the error class and message parameters. If the same exception is thrown in several places, create an util function in a central place such as `QueryCompilationErrors.scala` to instantiate the exception. - -[error-invariants]: https://github.com/apache/spark/blob/40574bb36647a35d7ac1fe8b7b1efcb98b058065/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala#L138-L141 +6. Throw the exception with the error condition and message parameters. If the same exception is thrown in several places, create an util function in a central place such as `QueryCompilationErrors.scala` to instantiate the exception. ### Before @@ -77,16 +93,16 @@ class SparkTestException( } ``` -Throw with error class and message parameters: +Throw with error condition and message parameters: ```scala throw new SparkTestException("PROBLEM_BECAUSE", Map("problem" -> "A", "cause" -> "B")) ``` -## Access fields +### Access fields To access error fields, catch exceptions that extend `org.apache.spark.SparkThrowable` and access - - Error class with `getErrorClass` + - Error condition with `getErrorClass` - SQLSTATE with `getSqlState` ```scala @@ -100,14 +116,16 @@ try { ## Fields -### Error class +### Error condition -Error classes are a succinct, human-readable representation of the error category. +Error conditions are a succinct, human-readable representation of the error category. -An uncategorized errors can be assigned to a legacy error class with the prefix `_LEGACY_ERROR_TEMP_` and an unused sequential number, for instance `_LEGACY_ERROR_TEMP_0053`. +An uncategorized errors can be assigned to a legacy error condition with the prefix `_LEGACY_ERROR_TEMP_` and an unused sequential number, for instance `_LEGACY_ERROR_TEMP_0053`. You should not introduce new uncategorized errors. Instead, convert them to proper errors whenever encountering them in new code. 
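For illustration, a legacy entry of this shape might look like the following sketch in the JSON file; the message text and parameter here are hypothetical, and note that such legacy entries typically omit the `sqlState` field.

```json
"_LEGACY_ERROR_TEMP_0053" : {
  "message" : [
    "A hypothetical uncategorized message mentioning <someValue>."
  ]
}
```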
+**Note:** Though the proper term for this field is an "error condition", it is called `errorClass` in the codebase due to an unfortunate accident of history. For more details, please refer to [SPARK-46810]. + #### Invariants - Unique @@ -131,13 +149,13 @@ The quality of the error message should match the ### SQLSTATE -SQLSTATE is an mandatory portable error identifier across SQL engines. -SQLSTATE comprises a 2-character category followed by a 3-character sub-category. +SQLSTATE is a mandatory portable error identifier across SQL engines. +SQLSTATE comprises a 2-character class followed by a 3-character sub-class. Spark prefers to re-use existing SQLSTATEs, preferably used by multiple vendors. -For extension Spark claims the `K**` sub-category range. -If a new category is needed it will also claim the `K0` category. +For extension Spark claims the `K**` sub-class range. +If a new class is needed it will also claim the `K0` class. -Internal errors should use the `XX` category. You can subdivide internal errors by component. +Internal errors should use the `XX` class. You can subdivide internal errors by component. For example: The existing `XXKD0` is used for an internal analyzer error. #### Invariants From 6537240fe4883ce1b9f1a912616b1bb4a54a167e Mon Sep 17 00:00:00 2001 From: Nicholas Chammas Date: Sun, 28 Jan 2024 14:37:44 -0500 Subject: [PATCH 03/10] error-classes.json -> error-conditions.json --- common/utils/src/main/resources/error/README.md | 8 ++++---- .../error/{error-classes.json => error-conditions.json} | 0 .../scala/org/apache/spark/SparkThrowableHelper.scala | 5 ++++- ...fka-error-classes.json => kafka-error-conditions.json} | 0 .../org/apache/spark/sql/kafka010/KafkaExceptions.scala | 5 ++++- .../test/scala/org/apache/spark/SparkThrowableSuite.scala | 6 +++++- 6 files changed, 17 insertions(+), 7 deletions(-) rename common/utils/src/main/resources/error/{error-classes.json => error-conditions.json} (100%) rename connector/kafka-0-10-sql/src/main/resources/error/{kafka-error-classes.json => kafka-error-conditions.json} (100%) diff --git a/common/utils/src/main/resources/error/README.md b/common/utils/src/main/resources/error/README.md index fd2a6f01df00c..0ed66f3fcf653 100644 --- a/common/utils/src/main/resources/error/README.md +++ b/common/utils/src/main/resources/error/README.md @@ -18,7 +18,7 @@ The error state / SQLSTATE itself is comprised of two parts: Acceptable values for these various error parts are defined in the following files: * `error-categories.json` * `error-states.json` -* `error-classes.json` +* `error-conditions.json` ### Illustrative Example * Error state / SQLSTATE: `42K01` (Class: `42`; Sub-class: `K01`) @@ -50,9 +50,9 @@ For more details, please see [SPARK-46810][SPARK-46810]. 1. Check if the error is an internal error. Internal errors are bugs in the code that we do not expect users to encounter; this does not include unsupported operations. If true, use the error condition `INTERNAL_ERROR` and skip to step 4. -2. Check if an appropriate error condition already exists in `error-classes.json`. +2. Check if an appropriate error condition already exists in `error-conditions.json`. If true, use the error condition and skip to step 4. -3. Add a new condition to `error-classes.json`. If the new condition requires a new error state, add the new error state to `error-states.json`. +3. Add a new condition to `error-conditions.json`. If the new condition requires a new error state, add the new error state to `error-states.json`. 4. 
Check if the exception type already extends `SparkThrowable`. If true, skip to step 6. 5. Mix `SparkThrowable` into the exception. @@ -68,7 +68,7 @@ throw new TestException("Problem A because B") ### After -`error-classes.json` +`error-conditions.json` ```json "PROBLEM_BECAUSE" : { diff --git a/common/utils/src/main/resources/error/error-classes.json b/common/utils/src/main/resources/error/error-conditions.json similarity index 100% rename from common/utils/src/main/resources/error/error-classes.json rename to common/utils/src/main/resources/error/error-conditions.json diff --git a/common/utils/src/main/scala/org/apache/spark/SparkThrowableHelper.scala b/common/utils/src/main/scala/org/apache/spark/SparkThrowableHelper.scala index a44d36ff85b55..b05980c6c23e5 100644 --- a/common/utils/src/main/scala/org/apache/spark/SparkThrowableHelper.scala +++ b/common/utils/src/main/scala/org/apache/spark/SparkThrowableHelper.scala @@ -32,7 +32,10 @@ private[spark] object ErrorMessageFormat extends Enumeration { */ private[spark] object SparkThrowableHelper { val errorReader = new ErrorClassesJsonReader( - Seq(SparkClassUtils.getSparkClassLoader.getResource("error/error-classes.json"))) + // Note that though we call them "error classes" here, the proper name is "error conditions", + // hence why the name of the JSON file different. + // For more information, please see: https://issues.apache.org/jira/browse/SPARK-46810 + Seq(SparkClassUtils.getSparkClassLoader.getResource("error/error-conditions.json"))) def getMessage( errorClass: String, diff --git a/connector/kafka-0-10-sql/src/main/resources/error/kafka-error-classes.json b/connector/kafka-0-10-sql/src/main/resources/error/kafka-error-conditions.json similarity index 100% rename from connector/kafka-0-10-sql/src/main/resources/error/kafka-error-classes.json rename to connector/kafka-0-10-sql/src/main/resources/error/kafka-error-conditions.json diff --git a/connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaExceptions.scala b/connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaExceptions.scala index b0e30f37af51f..dcbd16a87ab95 100644 --- a/connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaExceptions.scala +++ b/connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaExceptions.scala @@ -24,7 +24,10 @@ import org.apache.spark.{ErrorClassesJsonReader, SparkException} object KafkaExceptions { private val errorClassesJsonReader: ErrorClassesJsonReader = new ErrorClassesJsonReader( - Seq(getClass.getClassLoader.getResource("error/kafka-error-classes.json"))) + // Note that though we call them "error classes" here, the proper name is "error conditions", + // hence why the name of the JSON file different. 
+ // For more information, please see: https://issues.apache.org/jira/browse/SPARK-46810 + Seq(getClass.getClassLoader.getResource("error/kafka-error-conditions.json"))) def mismatchedTopicPartitionsBetweenEndOffsetAndPrefetched( tpsForPrefetched: Set[TopicPartition], diff --git a/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala b/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala index 1f3b28968025d..e513e85c08796 100644 --- a/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala +++ b/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala @@ -56,7 +56,10 @@ class SparkThrowableSuite extends SparkFunSuite { }}} */ private val errorJsonFilePath = getWorkspaceFilePath( - "common", "utils", "src", "main", "resources", "error", "error-classes.json") + // Note that though we call them "error classes" here, the proper name is "error conditions", + // hence why the name of the JSON file different. + // For more information, please see: https://issues.apache.org/jira/browse/SPARK-46810 + "common", "utils", "src", "main", "resources", "error", "error-conditions.json") private val errorReader = new ErrorClassesJsonReader(Seq(errorJsonFilePath.toUri.toURL)) @@ -162,6 +165,7 @@ class SparkThrowableSuite extends SparkFunSuite { checkIfUnique(messageFormats) } + // TODO: Delete test("Error classes match with document") { val errors = errorReader.errorInfoMap From 5684213fe8ee91b4038b297c86d20e91fb2fb9a1 Mon Sep 17 00:00:00 2001 From: Nicholas Chammas Date: Sun, 28 Jan 2024 14:42:31 -0500 Subject: [PATCH 04/10] error-categories.json -> error-classes.json --- .../utils/src/main/resources/error/README.md | 2 +- ...ror-categories.json => error-classes.json} | 0 .../apache/spark/SparkThrowableSuite.scala | 19 +++++++++++-------- 3 files changed, 12 insertions(+), 9 deletions(-) rename common/utils/src/main/resources/error/{error-categories.json => error-classes.json} (100%) diff --git a/common/utils/src/main/resources/error/README.md b/common/utils/src/main/resources/error/README.md index 0ed66f3fcf653..dd12e1d272d9e 100644 --- a/common/utils/src/main/resources/error/README.md +++ b/common/utils/src/main/resources/error/README.md @@ -16,7 +16,7 @@ The error state / SQLSTATE itself is comprised of two parts: 2. 
Error sub-class Acceptable values for these various error parts are defined in the following files: -* `error-categories.json` +* `error-classes.json` * `error-states.json` * `error-conditions.json` diff --git a/common/utils/src/main/resources/error/error-categories.json b/common/utils/src/main/resources/error/error-classes.json similarity index 100% rename from common/utils/src/main/resources/error/error-categories.json rename to common/utils/src/main/resources/error/error-classes.json diff --git a/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala b/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala index e513e85c08796..d51ebe2588213 100644 --- a/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala +++ b/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala @@ -125,23 +125,26 @@ class SparkThrowableSuite extends SparkFunSuite { s"Error classes without SQLSTATE: ${errorClassesNoSqlState.mkString(", ")}") } - test("Error category and error state / SQLSTATE invariants") { - val errorCategoriesJson = Utils.getSparkClassLoader.getResource("error/error-categories.json") + test("Error class and error state / SQLSTATE invariants") { + // Unlike in the rest of the codebase, the term "error class" is used as it is in our + // documentation as well as in the SQL standard. + // For more information, please see: https://issues.apache.org/jira/browse/SPARK-46810 + val errorClassesJson = Utils.getSparkClassLoader.getResource("error/error-classes.json") val errorStatesJson = Utils.getSparkClassLoader.getResource("error/error-states.json") val mapper = JsonMapper.builder() .addModule(DefaultScalaModule) .enable(STRICT_DUPLICATE_DETECTION) .build() - val errorCategories = mapper.readValue( - errorCategoriesJson, new TypeReference[Map[String, String]]() {}) + val errorClasses = mapper.readValue( + errorClassesJson, new TypeReference[Map[String, String]]() {}) val errorStates = mapper.readValue( errorStatesJson, new TypeReference[Map[String, ErrorStateInfo]]() {}) - val errorClassStates = errorReader.errorInfoMap.values.toSeq.flatMap(_.sqlState).toSet + val errorConditionStates = errorReader.errorInfoMap.values.toSeq.flatMap(_.sqlState).toSet assert(Set("22012", "22003", "42601").subsetOf(errorStates.keySet)) - assert(errorCategories.keySet.filter(!_.matches("[A-Z0-9]{2}")).isEmpty) + assert(errorClasses.keySet.filter(!_.matches("[A-Z0-9]{2}")).isEmpty) assert(errorStates.keySet.filter(!_.matches("[A-Z0-9]{5}")).isEmpty) - assert(errorStates.keySet.map(_.substring(0, 2)).diff(errorCategories.keySet).isEmpty) - assert(errorClassStates.diff(errorStates.keySet).isEmpty) + assert(errorStates.keySet.map(_.substring(0, 2)).diff(errorClasses.keySet).isEmpty) + assert(errorConditionStates.diff(errorStates.keySet).isEmpty) } test("Message invariants") { From b2b9acf52714ae9d7c21f6be6f40da77743f998d Mon Sep 17 00:00:00 2001 From: Nicholas Chammas Date: Sun, 28 Jan 2024 14:53:07 -0500 Subject: [PATCH 05/10] wording tweaks --- common/utils/src/main/resources/error/README.md | 2 ++ core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala | 2 +- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/common/utils/src/main/resources/error/README.md b/common/utils/src/main/resources/error/README.md index dd12e1d272d9e..05df1ba859d8d 100644 --- a/common/utils/src/main/resources/error/README.md +++ b/common/utils/src/main/resources/error/README.md @@ -20,6 +20,8 @@ Acceptable values for these various error parts are defined in the following fil * `error-states.json` * 
`error-conditions.json` +The terms error class, state, and condition come from the SQL standard. + ### Illustrative Example * Error state / SQLSTATE: `42K01` (Class: `42`; Sub-class: `K01`) * Error condition: `DATATYPE_MISSING_SIZE` diff --git a/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala b/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala index d51ebe2588213..21417aa516956 100644 --- a/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala +++ b/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala @@ -126,7 +126,7 @@ class SparkThrowableSuite extends SparkFunSuite { } test("Error class and error state / SQLSTATE invariants") { - // Unlike in the rest of the codebase, the term "error class" is used as it is in our + // Unlike in the rest of the codebase, the term "error class" is used here as it is in our // documentation as well as in the SQL standard. // For more information, please see: https://issues.apache.org/jira/browse/SPARK-46810 val errorClassesJson = Utils.getSparkClassLoader.getResource("error/error-classes.json") From e3ce5ec58281aa4c597427cc03e94c0a7bc9dd66 Mon Sep 17 00:00:00 2001 From: Nicholas Chammas Date: Sun, 28 Jan 2024 14:54:18 -0500 Subject: [PATCH 06/10] remove todo --- core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala | 1 - 1 file changed, 1 deletion(-) diff --git a/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala b/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala index 21417aa516956..de615abee5eb2 100644 --- a/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala +++ b/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala @@ -168,7 +168,6 @@ class SparkThrowableSuite extends SparkFunSuite { checkIfUnique(messageFormats) } - // TODO: Delete test("Error classes match with document") { val errors = errorReader.errorInfoMap From 483c97d5fc5e1bf0ee801a5a6f015e4840ca556d Mon Sep 17 00:00:00 2001 From: Nicholas Chammas Date: Sat, 16 Mar 2024 12:18:00 -0400 Subject: [PATCH 07/10] update conditions from master --- .../resources/error/error-conditions.json | 705 ++++++++++++++++-- 1 file changed, 632 insertions(+), 73 deletions(-) diff --git a/common/utils/src/main/resources/error/error-conditions.json b/common/utils/src/main/resources/error/error-conditions.json index 64d65fd4beed1..9362b8342abf1 100644 --- a/common/utils/src/main/resources/error/error-conditions.json +++ b/common/utils/src/main/resources/error/error-conditions.json @@ -40,7 +40,7 @@ "AMBIGUOUS_COLUMN_REFERENCE" : { "message" : [ "Column is ambiguous. It's because you joined several DataFrame together, and some of these DataFrames are the same.", - "This column points to one of the DataFrame but Spark is unable to figure out which one.", + "This column points to one of the DataFrames but Spark is unable to figure out which one.", "Please alias the DataFrames with different names via `DataFrame.alias` before joining them,", "and specify the column using qualified name, e.g. `df.alias(\"a\").join(df.alias(\"b\"), col(\"a.id\") > col(\"b.id\"))`." ], @@ -433,6 +433,12 @@ ], "sqlState" : "56000" }, + "CLASS_NOT_OVERRIDE_EXPECTED_METHOD" : { + "message" : [ + " must override either or ." + ], + "sqlState" : "38000" + }, "CLASS_UNSUPPORTED_BY_MAP_OBJECTS" : { "message" : [ "`MapObjects` does not support the class as resulting collection." @@ -441,8 +447,20 @@ }, "CODEC_NOT_AVAILABLE" : { "message" : [ - "The codec is not available. Consider to set the config to ." + "The codec is not available." 
], + "subClass" : { + "WITH_AVAILABLE_CODECS_SUGGESTION" : { + "message" : [ + "Available codecs are ." + ] + }, + "WITH_CONF_SUGGESTION" : { + "message" : [ + "Consider to set the config to ." + ] + } + }, "sqlState" : "56038" }, "CODEC_SHORT_NAME_NOT_FOUND" : { @@ -451,6 +469,12 @@ ], "sqlState" : "42704" }, + "COLLATION_INVALID_NAME" : { + "message" : [ + "The value does not represent a correct collation name. Suggested valid collation name: []." + ], + "sqlState" : "42704" + }, "COLLECTION_SIZE_LIMIT_EXCEEDED" : { "message" : [ "Can't create array with elements which exceeding the array size limit ," @@ -474,15 +498,15 @@ }, "sqlState" : "54000" }, - "COLUMN_ALIASES_IS_NOT_ALLOWED" : { + "COLUMN_ALIASES_NOT_ALLOWED" : { "message" : [ - "Columns aliases are not allowed in ." + "Column aliases are not allowed in ." ], "sqlState" : "42601" }, "COLUMN_ALREADY_EXISTS" : { "message" : [ - "The column already exists. Consider to choose another name or rename the existing column." + "The column already exists. Choose another name or rename the existing column." ], "sqlState" : "42711" }, @@ -672,6 +696,11 @@ "To convert values from to , you can use the functions instead." ] }, + "COLLATION_MISMATCH" : { + "message" : [ + "Collations and are not compatible. Please use the same collation for both strings." + ] + }, "CREATE_MAP_KEY_DIFF_TYPES" : { "message" : [ "The given keys of function should all be the same type, but they are ." @@ -852,7 +881,7 @@ }, "UNEXPECTED_INPUT_TYPE" : { "message" : [ - "Parameter requires the type, however has the type ." + "The parameter requires the type, however has the type ." ] }, "UNEXPECTED_NULL" : { @@ -1224,6 +1253,12 @@ ], "sqlState" : "22018" }, + "FAILED_READ_FILE" : { + "message" : [ + "Encountered error while reading file ." + ], + "sqlState" : "KD001" + }, "FAILED_REGISTER_CLASS_WITH_KRYO" : { "message" : [ "Failed to register classes with Kryo." @@ -1242,6 +1277,12 @@ ], "sqlState" : "58030" }, + "FAILED_ROW_TO_JSON" : { + "message" : [ + "Failed to convert the row value of the class to the target SQL type in the JSON format." + ], + "sqlState" : "2203G" + }, "FIELDS_ALREADY_EXISTS" : { "message" : [ "Cannot column, because already exists in ." @@ -1260,6 +1301,12 @@ ], "sqlState" : "42809" }, + "FOREACH_BATCH_USER_FUNCTION_ERROR" : { + "message" : [ + "An error occurred in the user provided function in foreach batch sink. Reason: " + ], + "sqlState" : "39000" + }, "FOUND_MULTIPLE_DATA_SOURCES" : { "message" : [ "Detected multiple data sources with the name ''. Please check the data source isn't simultaneously registered and located in the classpath." @@ -1356,6 +1403,24 @@ ], "sqlState" : "42601" }, + "ILLEGAL_STATE_STORE_VALUE" : { + "message" : [ + "Illegal value provided to the State Store" + ], + "subClass" : { + "EMPTY_LIST_VALUE" : { + "message" : [ + "Cannot write empty list values to State Store for StateName ." + ] + }, + "NULL_VALUE" : { + "message" : [ + "Cannot write null values to State Store for StateName ." + ] + } + }, + "sqlState" : "42601" + }, "INCOMPARABLE_PIVOT_COLUMN" : { "message" : [ "Invalid pivot column . Pivot columns must be comparable." @@ -1481,6 +1546,13 @@ "2) You can form a valid datetime pattern with the guide from '/sql-ref-datetime-pattern.html'." ] }, + "DATETIME_WEEK_BASED_PATTERN" : { + "message" : [ + "Spark >= 3.0:", + "All week-based patterns are unsupported since Spark 3.0, detected week-based character: .", + "Please use the SQL function EXTRACT instead." 
+ ] + }, "PARSE_DATETIME_BY_NEW_PARSER" : { "message" : [ "Spark >= 3.0:", @@ -1656,6 +1728,12 @@ ], "sqlState" : "XX000" }, + "INTERNAL_ERROR_TWS" : { + "message" : [ + "" + ], + "sqlState" : "XX000" + }, "INTERVAL_ARITHMETIC_OVERFLOW" : { "message" : [ "." @@ -1704,6 +1782,12 @@ }, "sqlState" : "22003" }, + "INVALID_BUCKET_COLUMN_DATA_TYPE" : { + "message" : [ + "Cannot use for bucket column. Collated data types are not supported for bucketing." + ], + "sqlState" : "42601" + }, "INVALID_BUCKET_FILE" : { "message" : [ "Invalid bucket file: ." @@ -1728,6 +1812,19 @@ ], "sqlState" : "42000" }, + "INVALID_CONF_VALUE" : { + "message" : [ + "The value '' in the config \"\" is invalid." + ], + "subClass" : { + "TIME_ZONE" : { + "message" : [ + "Cannot resolve the given timezone." + ] + } + }, + "sqlState" : "22022" + }, "INVALID_CURSOR" : { "message" : [ "The cursor is invalid." @@ -1756,6 +1853,24 @@ }, "sqlState" : "HY109" }, + "INVALID_DATETIME_PATTERN" : { + "message" : [ + "Unrecognized datetime pattern: ." + ], + "subClass" : { + "ILLEGAL_CHARACTER" : { + "message" : [ + "Illegal pattern character found in datetime pattern: . Please provide legal character." + ] + }, + "LENGTH" : { + "message" : [ + "Too many letters in datetime pattern: . Please reduce pattern length." + ] + } + }, + "sqlState" : "22007" + }, "INVALID_DEFAULT_VALUE" : { "message" : [ "Failed to execute command because the destination column or variable has a DEFAULT value ," @@ -1766,6 +1881,11 @@ "which requires type, but the statement provided a value of incompatible type." ] }, + "NOT_CONSTANT" : { + "message" : [ + "which is not a constant expression whose equivalent value is known at query planning time." + ] + }, "SUBQUERY_EXPRESSION" : { "message" : [ "which contains subquery expressions." @@ -1779,6 +1899,34 @@ }, "sqlState" : "42623" }, + "INVALID_DELIMITER_VALUE" : { + "message" : [ + "Invalid value for delimiter." + ], + "subClass" : { + "DELIMITER_LONGER_THAN_EXPECTED" : { + "message" : [ + "Delimiter cannot be more than one character: ." + ] + }, + "EMPTY_STRING" : { + "message" : [ + "Delimiter cannot be empty string." + ] + }, + "SINGLE_BACKSLASH" : { + "message" : [ + "Single backslash is prohibited. It has special meaning as beginning of an escape sequence. To get the backslash character, pass a string with two backslashes as the delimiter." + ] + }, + "UNSUPPORTED_SPECIAL_CHARACTER" : { + "message" : [ + "Unsupported special character for delimiter: ." + ] + } + }, + "sqlState" : "42602" + }, "INVALID_DRIVER_MEMORY" : { "message" : [ "System memory must be at least .", @@ -1811,6 +1959,12 @@ ], "sqlState" : "F0000" }, + "INVALID_EXPRESSION_ENCODER" : { + "message" : [ + "Found an invalid expression encoder. Expects an instance of ExpressionEncoder but got . For more information consult '/api/java/index.html?org/apache/spark/sql/Encoder.html'." + ], + "sqlState" : "42001" + }, "INVALID_EXTRACT_BASE_FIELD_TYPE" : { "message" : [ "Can't extract a value from . Need a complex type [STRUCT, ARRAY, MAP] but got ." @@ -1942,15 +2096,12 @@ }, "sqlState" : "HY000" }, - "INVALID_HIVE_COLUMN_NAME" : { - "message" : [ - "Cannot create the table having the column whose name contains invalid characters in Hive metastore." - ], - "sqlState" : "42K05" - }, "INVALID_IDENTIFIER" : { "message" : [ - "The identifier is invalid. Please, consider quoting it with back-quotes as ``." 
+ "The unquoted identifier is invalid and must be back quoted as: ``.", + "Unquoted identifiers can only contain ASCII letters ('a' - 'z', 'A' - 'Z'), digits ('0' - '9'), and underbar ('_').", + "Unquoted identifiers must also not start with a digit.", + "Different data sources and meta stores may impose additional restrictions on valid identifiers." ], "sqlState" : "42602" }, @@ -1988,6 +2139,74 @@ }, "sqlState" : "42000" }, + "INVALID_INTERVAL_FORMAT" : { + "message" : [ + "Error parsing '' to interval. Please ensure that the value provided is in a valid format for defining an interval. You can reference the documentation for the correct format." + ], + "subClass" : { + "ARITHMETIC_EXCEPTION" : { + "message" : [ + "Uncaught arithmetic exception while parsing ''." + ] + }, + "INPUT_IS_EMPTY" : { + "message" : [ + "Interval string cannot be empty." + ] + }, + "INPUT_IS_NULL" : { + "message" : [ + "Interval string cannot be null." + ] + }, + "INVALID_FRACTION" : { + "message" : [ + " cannot have fractional part." + ] + }, + "INVALID_PRECISION" : { + "message" : [ + "Interval can only support nanosecond precision, is out of range." + ] + }, + "INVALID_PREFIX" : { + "message" : [ + "Invalid interval prefix ." + ] + }, + "INVALID_UNIT" : { + "message" : [ + "Invalid unit ." + ] + }, + "INVALID_VALUE" : { + "message" : [ + "Invalid value ." + ] + }, + "MISSING_NUMBER" : { + "message" : [ + "Expect a number after but hit EOL." + ] + }, + "MISSING_UNIT" : { + "message" : [ + "Expect a unit name after but hit EOL." + ] + }, + "UNKNOWN_PARSING_ERROR" : { + "message" : [ + "Unknown error when parsing ." + ] + }, + "UNRECOGNIZED_NUMBER" : { + "message" : [ + "Unrecognized number ." + ] + } + }, + "sqlState" : "22006" + }, "INVALID_INVERSE_DISTRIBUTION_FUNCTION" : { "message" : [ "Invalid inverse distribution function ." @@ -2011,6 +2230,12 @@ }, "sqlState" : "42K0K" }, + "INVALID_JSON_DATA_TYPE" : { + "message" : [ + "Failed to convert the JSON string '' to a data type. Please enter a valid data type." + ], + "sqlState" : "2203G" + }, "INVALID_JSON_ROOT_FIELD" : { "message" : [ "Cannot convert JSON root field to target Spark type." @@ -2238,6 +2463,12 @@ }, "sqlState" : "22023" }, + "INVALID_PARTITION_COLUMN_DATA_TYPE" : { + "message" : [ + "Cannot use for partition column." + ], + "sqlState" : "0A000" + }, "INVALID_PARTITION_OPERATION" : { "message" : [ "The partition command is invalid." @@ -2437,6 +2668,12 @@ ], "sqlState" : "07501" }, + "INVALID_STATEMENT_OR_CLAUSE" : { + "message" : [ + "The statement or clause: is not valid." + ], + "sqlState" : "42601" + }, "INVALID_SUBQUERY_EXPRESSION" : { "message" : [ "Invalid subquery:" @@ -2723,6 +2960,12 @@ ], "sqlState" : "07501" }, + "NONEXISTENT_FIELD_NAME_IN_LIST" : { + "message" : [ + "Field(s) do(es) not exist. Available fields: " + ], + "sqlState" : "HV091" + }, "NON_FOLDABLE_ARGUMENT" : { "message" : [ "The function requires the parameter to be a foldable expression of the type , but the actual argument is a non-foldable." @@ -2872,6 +3115,12 @@ }, "sqlState" : "0A000" }, + "NOT_UNRESOLVED_ENCODER" : { + "message" : [ + "Unresolved encoder expected, but was found." + ], + "sqlState" : "42601" + }, "NO_DEFAULT_COLUMN_VALUE_AVAILABLE" : { "message" : [ "Can't determine the default value for since it is not nullable and it has no default value." 
@@ -2920,6 +3169,12 @@ ], "sqlState" : "2200E" }, + "NULL_QUERY_STRING_EXECUTE_IMMEDIATE" : { + "message" : [ + "Execute immediate requires a non-null variable as the query string, but the provided variable is null." + ], + "sqlState" : "22004" + }, "NUMERIC_OUT_OF_SUPPORTED_RANGE" : { "message" : [ "The value cannot be interpreted as a numeric since it has more than 38 digits." @@ -3062,6 +3317,12 @@ ], "sqlState" : "38000" }, + "PYTHON_STREAMING_DATA_SOURCE_RUNTIME_ERROR" : { + "message" : [ + "Failed when Python streaming data source perform : " + ], + "sqlState" : "38000" + }, "RECURSIVE_PROTOBUF_SCHEMA" : { "message" : [ "Found recursive reference in Protobuf schema, which can not be processed by Spark by default: . try setting the option `recursive.fields.max.depth` 0 to 10. Going beyond 10 levels of recursion is not allowed." @@ -3230,6 +3491,49 @@ ], "sqlState" : "0A000" }, + "STATEFUL_PROCESSOR_CANNOT_PERFORM_OPERATION_WITH_INVALID_HANDLE_STATE" : { + "message" : [ + "Failed to perform stateful processor operation= with invalid handle state=." + ], + "sqlState" : "42802" + }, + "STATEFUL_PROCESSOR_CANNOT_PERFORM_OPERATION_WITH_INVALID_TIMEOUT_MODE" : { + "message" : [ + "Failed to perform stateful processor operation= with invalid timeoutMode=" + ], + "sqlState" : "42802" + }, + "STATE_STORE_CANNOT_CREATE_COLUMN_FAMILY_WITH_RESERVED_CHARS" : { + "message" : [ + "Failed to create column family with unsupported starting character and name=." + ], + "sqlState" : "42802" + }, + "STATE_STORE_CANNOT_USE_COLUMN_FAMILY_WITH_INVALID_NAME" : { + "message" : [ + "Failed to perform column family operation= with invalid name=. Column family name cannot be empty or include leading/trailing spaces or use the reserved keyword=default" + ], + "sqlState" : "42802" + }, + "STATE_STORE_HANDLE_NOT_INITIALIZED" : { + "message" : [ + "The handle has not been initialized for this StatefulProcessor.", + "Please only use the StatefulProcessor within the transformWithState operator." + ], + "sqlState" : "42802" + }, + "STATE_STORE_UNSUPPORTED_OPERATION" : { + "message" : [ + " operation not supported with " + ], + "sqlState" : "XXKST" + }, + "STATE_STORE_UNSUPPORTED_OPERATION_ON_MISSING_COLUMN_FAMILY" : { + "message" : [ + "State store operation= not supported on missing column family=." + ], + "sqlState" : "42802" + }, "STATIC_PARTITION_COLUMN_IN_INSERT_COLUMN_LIST" : { "message" : [ "Static partition column is also specified in the column list." @@ -3311,6 +3615,13 @@ ], "sqlState" : "42601" }, + "STREAMING_STATEFUL_OPERATOR_NOT_MATCH_IN_STATE_METADATA" : { + "message" : [ + "Streaming stateful operator name does not match with the operator in state metadata. This likely to happen when user adds/removes/changes stateful operator of existing streaming query.", + "Stateful operators in the metadata: []; Stateful operators in current batch: []." + ], + "sqlState" : "42K03" + }, "STREAM_FAILED" : { "message" : [ "Query [id = , runId = ] terminated with exception: " @@ -3397,6 +3708,12 @@ ], "sqlState" : "42802" }, + "UDTF_INVALID_REQUESTED_SELECTED_EXPRESSION_FROM_ANALYZE_METHOD_REQUIRES_ALIAS" : { + "message" : [ + "Failed to evaluate the user-defined table function because its 'analyze' method returned a requested 'select' expression () that does not include a corresponding alias; please update the UDTF to specify an alias there and then try the query again." + ], + "sqlState" : "42802" + }, "UNABLE_TO_ACQUIRE_MEMORY" : { "message" : [ "Unable to acquire bytes of memory, got ." 
@@ -3614,6 +3931,18 @@ "message" : [ "Cannot call the method \"\" of the class \"\"." ], + "subClass" : { + "FIELD_INDEX" : { + "message" : [ + "The row shall have a schema to get an index of the field ." + ] + }, + "WITHOUT_SUGGESTION" : { + "message" : [ + "" + ] + } + }, "sqlState" : "0A000" }, "UNSUPPORTED_CHAR_OR_VARCHAR_AS_STRING" : { @@ -3623,6 +3952,19 @@ ], "sqlState" : "0A000" }, + "UNSUPPORTED_COLLATION" : { + "message" : [ + "Collation is not supported for:" + ], + "subClass" : { + "FOR_FUNCTION" : { + "message" : [ + "function . Please try to use a different collation." + ] + } + }, + "sqlState" : "0A000" + }, "UNSUPPORTED_DATASOURCE_FOR_DIRECT_QUERY" : { "message" : [ "Unsupported data source type for direct query on files: " @@ -3751,6 +4093,11 @@ "Catalog does not support ." ] }, + "COLLATION" : { + "message" : [ + "Collation is not yet supported." + ] + }, "COMBINATION_QUERY_RESULT_CLAUSES" : { "message" : [ "Combination of ORDER BY/SORT BY/DISTRIBUTE BY/CLUSTER BY." @@ -3916,6 +4263,16 @@ " is a VARIABLE and cannot be updated using the SET statement. Use SET VARIABLE = ... instead." ] }, + "STATE_STORE_MULTIPLE_COLUMN_FAMILIES" : { + "message" : [ + "Creating multiple column families with is not supported." + ] + }, + "STATE_STORE_REMOVING_COLUMN_FAMILIES" : { + "message" : [ + "Removing column families with is not supported." + ] + }, "TABLE_OPERATION" : { "message" : [ "Table does not support . Please check the current catalog and namespace to make sure the qualified table name is expected, and also check the catalog implementation which is configured by \"spark.sql.catalog\"." @@ -3951,7 +4308,7 @@ "subClass" : { "MULTI_GENERATOR" : { "message" : [ - "only one generator allowed per clause but found : ." + "only one generator allowed per SELECT clause but found : ." ] }, "NESTED_IN_EXPRESSIONS" : { @@ -4188,6 +4545,13 @@ ], "sqlState" : "42883" }, + "VARIANT_SIZE_LIMIT" : { + "message" : [ + "Cannot build variant bigger than in .", + "Please avoid large input strings to this expression (for example, add function calls(s) to check the expression size and convert it to NULL first if it is too big)." + ], + "sqlState" : "22023" + }, "VIEW_ALREADY_EXISTS" : { "message" : [ "Cannot create view because it already exists.", @@ -4915,11 +5279,6 @@ "Fail to resolve data source for the table since the table serde property has the duplicated key with extra options specified for this scan operation. To fix this, you can rollback to the legacy behavior of ignoring the extra options by setting the config to `false`, or address the conflicts of the same config." ] }, - "_LEGACY_ERROR_TEMP_1153" : { - "message" : [ - "Cannot use for partition column." - ] - }, "_LEGACY_ERROR_TEMP_1155" : { "message" : [ "Partition column `` not found in schema ." @@ -5011,11 +5370,6 @@ "Unrecognized Parquet type: ." ] }, - "_LEGACY_ERROR_TEMP_1175" : { - "message" : [ - "Unsupported data type ." - ] - }, "_LEGACY_ERROR_TEMP_1181" : { "message" : [ "Stream-stream join without equality predicate is not supported." @@ -5028,7 +5382,7 @@ }, "_LEGACY_ERROR_TEMP_1183" : { "message" : [ - "Cannot use interval type in the table schema." + "Cannot use \"INTERVAL\" type in the table schema." ] }, "_LEGACY_ERROR_TEMP_1184" : { @@ -5597,31 +5951,6 @@ "not resolved." ] }, - "_LEGACY_ERROR_TEMP_2020" : { - "message" : [ - "Couldn't find a valid constructor on ." - ] - }, - "_LEGACY_ERROR_TEMP_2021" : { - "message" : [ - "Couldn't find a primary constructor on ." 
- ] - }, - "_LEGACY_ERROR_TEMP_2023" : { - "message" : [ - "Unresolved encoder expected, but was found." - ] - }, - "_LEGACY_ERROR_TEMP_2024" : { - "message" : [ - "Only expression encoders are supported for now." - ] - }, - "_LEGACY_ERROR_TEMP_2025" : { - "message" : [ - " must override either or ." - ] - }, "_LEGACY_ERROR_TEMP_2026" : { "message" : [ "Failed to convert value (class of ) with the type of to JSON." @@ -5702,11 +6031,6 @@ ". If necessary set to false to bypass this error." ] }, - "_LEGACY_ERROR_TEMP_2043" : { - "message" : [ - "- caused overflow." - ] - }, "_LEGACY_ERROR_TEMP_2045" : { "message" : [ "Unsupported table change: " @@ -5796,11 +6120,6 @@ "Parquet column cannot be converted in file . Column: , Expected: , Found: ." ] }, - "_LEGACY_ERROR_TEMP_2064" : { - "message" : [ - "Encountered error while reading file . Details:" - ] - }, "_LEGACY_ERROR_TEMP_2065" : { "message" : [ "Cannot create columnar reader." @@ -5998,17 +6317,17 @@ }, "_LEGACY_ERROR_TEMP_2109" : { "message" : [ - "Cannot build HashedRelation with more than 1/3 billions unique keys." + "Cannot build HashedRelation with more than 1/3 billion unique keys." ] }, "_LEGACY_ERROR_TEMP_2110" : { "message" : [ - "Can not build a HashedRelation that is larger than 8G." + "Cannot build a HashedRelation that is larger than 8G." ] }, "_LEGACY_ERROR_TEMP_2111" : { "message" : [ - "failed to push a row into ." + "Failed to push a row into ." ] }, "_LEGACY_ERROR_TEMP_2112" : { @@ -6076,11 +6395,6 @@ "Exception when registering StreamingQueryListener." ] }, - "_LEGACY_ERROR_TEMP_2134" : { - "message" : [ - "Cannot parse field value for pattern as target spark data type []." - ] - }, "_LEGACY_ERROR_TEMP_2138" : { "message" : [ "Cannot have circular references in bean class, but got the circular reference of class ." @@ -6421,11 +6735,6 @@ "Primitive types are not supported." ] }, - "_LEGACY_ERROR_TEMP_2231" : { - "message" : [ - "fieldIndex on a Row without schema is undefined." - ] - }, "_LEGACY_ERROR_TEMP_2232" : { "message" : [ "Value at index is null." @@ -7437,6 +7746,256 @@ "Datatype not supported
" ] }, + "_LEGACY_ERROR_TEMP_3198" : { + "message" : [ + "Cannot grow BufferHolder by size because the size is negative" + ] + }, + "_LEGACY_ERROR_TEMP_3199" : { + "message" : [ + "Cannot grow BufferHolder by size because the size after growing exceeds size limitation " + ] + }, + "_LEGACY_ERROR_TEMP_3200" : { + "message" : [ + "Read-ahead limit < 0" + ] + }, + "_LEGACY_ERROR_TEMP_3201" : { + "message" : [ + "'note' is malformed in the expression []. It should start with a newline and 4 leading spaces; end with a newline and two spaces; however, got []." + ] + }, + "_LEGACY_ERROR_TEMP_3202" : { + "message" : [ + "'group' is malformed in the expression []. It should be a value in ; however, got ." + ] + }, + "_LEGACY_ERROR_TEMP_3203" : { + "message" : [ + "'source' is malformed in the expression []. It should be a value in ; however, got []." + ] + }, + "_LEGACY_ERROR_TEMP_3204" : { + "message" : [ + "'since' is malformed in the expression []. It should not start with a negative number; however, got []." + ] + }, + "_LEGACY_ERROR_TEMP_3205" : { + "message" : [ + "'deprecated' is malformed in the expression []. It should start with a newline and 4 leading spaces; end with a newline and two spaces; however, got []." + ] + }, + "_LEGACY_ERROR_TEMP_3206" : { + "message" : [ + " is not a boolean string." + ] + }, + "_LEGACY_ERROR_TEMP_3207" : { + "message" : [ + "Unexpected V2 expression: " + ] + }, + "_LEGACY_ERROR_TEMP_3208" : { + "message" : [ + "The number of fields () in the partition identifier is not equal to the partition schema length (). The identifier might not refer to one partition." + ] + }, + "_LEGACY_ERROR_TEMP_3209" : { + "message" : [ + "Illegal input for day of week: " + ] + }, + "_LEGACY_ERROR_TEMP_3210" : { + "message" : [ + "Interval string does not match second-nano format of ss.nnnnnnnnn" + ] + }, + "_LEGACY_ERROR_TEMP_3211" : { + "message" : [ + "Error parsing interval day-time string: " + ] + }, + "_LEGACY_ERROR_TEMP_3212" : { + "message" : [ + "Cannot support (interval '' to ) expression" + ] + }, + "_LEGACY_ERROR_TEMP_3213" : { + "message" : [ + "Error parsing interval string: " + ] + }, + "_LEGACY_ERROR_TEMP_3214" : { + "message" : [ + "Interval string does not match format of when cast to : " + ] + }, + "_LEGACY_ERROR_TEMP_3215" : { + "message" : [ + "Expected a Boolean type expression in replaceNullWithFalse, but got the type in ." + ] + }, + "_LEGACY_ERROR_TEMP_3216" : { + "message" : [ + "Unsupported join type ''. Supported join types include: ." + ] + }, + "_LEGACY_ERROR_TEMP_3217" : { + "message" : [ + "Unsupported as-of join direction ''. Supported as-of join direction include: ." + ] + }, + "_LEGACY_ERROR_TEMP_3218" : { + "message" : [ + "Must be 2 children: " + ] + }, + "_LEGACY_ERROR_TEMP_3219" : { + "message" : [ + "The value () of the type () cannot be converted to the type." 
+ ] + }, + "_LEGACY_ERROR_TEMP_3220" : { + "message" : [ + "The value () of the type () cannot be converted to an array of " + ] + }, + "_LEGACY_ERROR_TEMP_3221" : { + "message" : [ + "The value () of the type () cannot be converted to a map type with key type () and value type ()" + ] + }, + "_LEGACY_ERROR_TEMP_3222" : { + "message" : [ + "Only literals are allowed in the partition spec, but got " + ] + }, + "_LEGACY_ERROR_TEMP_3223" : { + "message" : [ + "Cannot find field: in " + ] + }, + "_LEGACY_ERROR_TEMP_3224" : { + "message" : [ + "Cannot delete array element" + ] + }, + "_LEGACY_ERROR_TEMP_3225" : { + "message" : [ + "Cannot delete map value" + ] + }, + "_LEGACY_ERROR_TEMP_3226" : { + "message" : [ + "Cannot delete map key" + ] + }, + "_LEGACY_ERROR_TEMP_3227" : { + "message" : [ + "Cannot find field: " + ] + }, + "_LEGACY_ERROR_TEMP_3228" : { + "message" : [ + "AFTER column not found: " + ] + }, + "_LEGACY_ERROR_TEMP_3229" : { + "message" : [ + "Not a struct: " + ] + }, + "_LEGACY_ERROR_TEMP_3230" : { + "message" : [ + "Field not found: " + ] + }, + "_LEGACY_ERROR_TEMP_3231" : { + "message" : [ + "Intervals greater than a month is not supported ()." + ] + }, + "_LEGACY_ERROR_TEMP_3232" : { + "message" : [ + "Unknown EvalMode value: " + ] + }, + "_LEGACY_ERROR_TEMP_3233" : { + "message" : [ + "cannot generate code for unsupported type: " + ] + }, + "_LEGACY_ERROR_TEMP_3235" : { + "message" : [ + "The numbers of zipped arrays and field names should be the same" + ] + }, + "_LEGACY_ERROR_TEMP_3238" : { + "message" : [ + "Failed to convert value (class of ) in type
to XML." + ] + }, + "_LEGACY_ERROR_TEMP_3239" : { + "message" : [ + "Failed to parse data with unexpected event " + ] + }, + "_LEGACY_ERROR_TEMP_3240" : { + "message" : [ + "Failed to parse a value for data type
with event " + ] + }, + "_LEGACY_ERROR_TEMP_3241" : { + "message" : [ + "" + ] + }, + "_LEGACY_ERROR_TEMP_3242" : { + "message" : [ + "sequence step must be an of day granularity if start and end values are dates" + ] + }, + "_LEGACY_ERROR_TEMP_3243" : { + "message" : [ + "Illegal sequence boundaries: to by " + ] + }, + "_LEGACY_ERROR_TEMP_3244" : { + "message" : [ + "Unsupported type: " + ] + }, + "_LEGACY_ERROR_TEMP_3245" : { + "message" : [ + "For input string: " + ] + }, + "_LEGACY_ERROR_TEMP_3246" : { + "message" : [ + "Failed to parse a value for data type ." + ] + }, + "_LEGACY_ERROR_TEMP_3250" : { + "message" : [ + "Failed to convert the JSON string '' to a field." + ] + }, + "_LEGACY_ERROR_TEMP_3260" : { + "message" : [ + "'' is an invalid timestamp" + ] + }, + "_LEGACY_ERROR_TEMP_3261" : { + "message" : [ + "Unknown output mode . Accepted output modes are 'append', 'complete', 'update'" + ] + }, + "_LEGACY_ERROR_TEMP_3262" : { + "message" : [ + "Doesn't support month or year interval: " + ] + }, "_LEGACY_ERROR_USER_RAISED_EXCEPTION" : { "message" : [ "" From ab3838df3d6858f8652ac92352c543c85a16d284 Mon Sep 17 00:00:00 2001 From: Nicholas Chammas Date: Sat, 16 Mar 2024 12:46:37 -0400 Subject: [PATCH 08/10] update comments to anticipate SPARK-47429 --- common/utils/src/main/resources/error/README.md | 5 ++--- .../scala/org/apache/spark/SparkThrowableHelper.scala | 4 ++-- .../org/apache/spark/sql/kafka010/KafkaExceptions.scala | 4 ++-- .../test/scala/org/apache/spark/SparkThrowableSuite.scala | 8 ++++---- 4 files changed, 10 insertions(+), 11 deletions(-) diff --git a/common/utils/src/main/resources/error/README.md b/common/utils/src/main/resources/error/README.md index 05df1ba859d8d..e2f68a1af9f4a 100644 --- a/common/utils/src/main/resources/error/README.md +++ b/common/utils/src/main/resources/error/README.md @@ -39,13 +39,12 @@ The terms error class, state, and condition come from the SQL standard. Unfortunately, we have historically used the term "error class" inconsistently to refer both to a proper error class like `42` and also to an error condition like `DATATYPE_MISSING_SIZE`. -Fixing this would require renaming `SparkException.errorClass` to `SparkException.errorCondition` and making similar changes to `ErrorClassesJsonReader` and other parts of the codebase. This may not be practical or even possible, depending on the impact of such a change on Spark's public API. - -Unless and until we refactor the codebase to bring it in line with the proper error terminology, we will have to live with the fact that a string like `DATATYPE_MISSING_SIZE` is called an "error condition" in our user-facing documentation but an "error class" in the code. +Fixing this will require renaming `SparkException.errorClass` to `SparkException.errorCondition` and making similar changes to `ErrorClassesJsonReader` and other parts of the codebase. We will address this in [SPARK-47429]. Until that is complete, we will have to live with the fact that a string like `DATATYPE_MISSING_SIZE` is called an "error condition" in our user-facing documentation but an "error class" in the code. For more details, please see [SPARK-46810][SPARK-46810]. 
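To see the terminology mismatch in practice, here is a minimal Scala sketch, assuming a running `spark` session and a deliberately malformed query; the condition name and SQLSTATE in the comments are illustrative.

```scala
import org.apache.spark.SparkThrowable

try {
  spark.sql("SELEC 1")  // deliberately malformed SQL
} catch {
  case e: SparkThrowable =>
    // Despite its name, getErrorClass returns what the documentation
    // calls an error *condition*, e.g. "PARSE_SYNTAX_ERROR".
    val condition = e.getErrorClass
    // The first two characters of the 5-character SQLSTATE (e.g. "42"
    // from "42601") form the error *class* in the SQL-standard sense.
    val errorClass = Option(e.getSqlState).map(_.take(2))
    println(s"condition=$condition, class=${errorClass.getOrElse("unknown")}")
}
```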
[SPARK-46810]: https://issues.apache.org/jira/browse/SPARK-46810 +[SPARK-47429]: https://issues.apache.org/jira/browse/SPARK-47429 ## Usage diff --git a/common/utils/src/main/scala/org/apache/spark/SparkThrowableHelper.scala b/common/utils/src/main/scala/org/apache/spark/SparkThrowableHelper.scala index b05980c6c23e5..6bdafb11e4bdf 100644 --- a/common/utils/src/main/scala/org/apache/spark/SparkThrowableHelper.scala +++ b/common/utils/src/main/scala/org/apache/spark/SparkThrowableHelper.scala @@ -33,8 +33,8 @@ private[spark] object ErrorMessageFormat extends Enumeration { private[spark] object SparkThrowableHelper { val errorReader = new ErrorClassesJsonReader( // Note that though we call them "error classes" here, the proper name is "error conditions", - // hence why the name of the JSON file different. - // For more information, please see: https://issues.apache.org/jira/browse/SPARK-46810 + // hence why the name of the JSON file different. We will address this inconsistency as part + // of this ticket: https://issues.apache.org/jira/browse/SPARK-47429 Seq(SparkClassUtils.getSparkClassLoader.getResource("error/error-conditions.json"))) def getMessage( diff --git a/connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaExceptions.scala b/connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaExceptions.scala index c6a21b8e3b595..4398fea67ce0a 100644 --- a/connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaExceptions.scala +++ b/connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaExceptions.scala @@ -27,8 +27,8 @@ private object KafkaExceptionsHelper { val errorClassesJsonReader: ErrorClassesJsonReader = new ErrorClassesJsonReader( // Note that though we call them "error classes" here, the proper name is "error conditions", - // hence why the name of the JSON file different. - // For more information, please see: https://issues.apache.org/jira/browse/SPARK-46810 + // hence why the name of the JSON file different. We will address this inconsistency as part + // of this ticket: https://issues.apache.org/jira/browse/SPARK-47429 Seq(getClass.getClassLoader.getResource("error/kafka-error-conditions.json"))) object KafkaExceptions { diff --git a/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala b/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala index 9da858e0963fa..16f31264481c6 100644 --- a/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala +++ b/core/src/test/scala/org/apache/spark/SparkThrowableSuite.scala @@ -60,8 +60,8 @@ class SparkThrowableSuite extends SparkFunSuite { private val errorJsonFilePath = getWorkspaceFilePath( // Note that though we call them "error classes" here, the proper name is "error conditions", - // hence why the name of the JSON file different. - // For more information, please see: https://issues.apache.org/jira/browse/SPARK-46810 + // hence why the name of the JSON file different. We will address this inconsistency as part + // of this ticket: https://issues.apache.org/jira/browse/SPARK-47429 "common", "utils", "src", "main", "resources", "error", "error-conditions.json") private val errorReader = new ErrorClassesJsonReader(Seq(errorJsonFilePath.toUri.toURL)) @@ -130,8 +130,8 @@ class SparkThrowableSuite extends SparkFunSuite { test("Error class and error state / SQLSTATE invariants") { // Unlike in the rest of the codebase, the term "error class" is used here as it is in our - // documentation as well as in the SQL standard. 
- // For more information, please see: https://issues.apache.org/jira/browse/SPARK-46810 + // documentation as well as in the SQL standard. We can remove this comment as part of this + // ticket: https://issues.apache.org/jira/browse/SPARK-47429 val errorClassesJson = Utils.getSparkClassLoader.getResource("error/error-classes.json") val errorStatesJson = Utils.getSparkClassLoader.getResource("error/error-states.json") val mapper = JsonMapper.builder() From 69291871527ce28f37ef5fe22f1f5f777ada49d9 Mon Sep 17 00:00:00 2001 From: Nicholas Chammas Date: Sat, 16 Mar 2024 12:46:48 -0400 Subject: [PATCH 09/10] restore missing brace --- .../scala/org/apache/spark/sql/kafka010/KafkaExceptions.scala | 1 + 1 file changed, 1 insertion(+) diff --git a/connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaExceptions.scala b/connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaExceptions.scala index 4398fea67ce0a..735184db3c1af 100644 --- a/connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaExceptions.scala +++ b/connector/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaExceptions.scala @@ -30,6 +30,7 @@ private object KafkaExceptionsHelper { // hence why the name of the JSON file different. We will address this inconsistency as part // of this ticket: https://issues.apache.org/jira/browse/SPARK-47429 Seq(getClass.getClassLoader.getResource("error/kafka-error-conditions.json"))) +} object KafkaExceptions { def mismatchedTopicPartitionsBetweenEndOffsetAndPrefetched( From c7a83edc631da220f055d7fa4403cd37d02cae3d Mon Sep 17 00:00:00 2001 From: Nicholas Chammas Date: Mon, 15 Apr 2024 10:32:41 -0400 Subject: [PATCH 10/10] update error-conditions from master --- .../resources/error/error-conditions.json | 274 +++++++++++++----- 1 file changed, 209 insertions(+), 65 deletions(-) diff --git a/common/utils/src/main/resources/error/error-conditions.json b/common/utils/src/main/resources/error/error-conditions.json index 9362b8342abf1..e1c8c881f98f3 100644 --- a/common/utils/src/main/resources/error/error-conditions.json +++ b/common/utils/src/main/resources/error/error-conditions.json @@ -244,7 +244,7 @@ }, "UNRELEASED_THREAD_ERROR" : { "message" : [ - ": RocksDB instance could not be acquired by as it was not released by after ms.", + ": RocksDB instance could not be acquired by for operationType= as it was not released by after ms.", "Thread holding the lock has trace: " ] } @@ -304,14 +304,6 @@ ], "sqlState" : "22007" }, - "CANNOT_READ_FILE_FOOTER" : { - "message" : [ - "Could not read footer for file: . Please ensure that the file is in either ORC or Parquet format.", - "If not, please convert it to a valid format. If the file is in the valid format, please check if it is corrupt.", - "If it is, you can choose to either ignore it or fix the corruption." - ], - "sqlState" : "KD001" - }, "CANNOT_RECOGNIZE_HIVE_TYPE" : { "message" : [ "Cannot recognize hive type string: , column: . The specified data type for the field cannot be recognized by Spark SQL. Please check the data type of the specified field and ensure that it is a valid Spark SQL data type. Refer to the Spark SQL documentation for a list of valid data types and their format. If the data type is correct, please ensure that you are using a supported version of Spark SQL." @@ -475,6 +467,24 @@ ], "sqlState" : "42704" }, + "COLLATION_MISMATCH" : { + "message" : [ + "Could not determine which collation to use for string functions and operators." 
+ ], + "subClass" : { + "EXPLICIT" : { + "message" : [ + "Error occurred due to the mismatch between explicit collations: . Decide on a single explicit collation and remove others." + ] + }, + "IMPLICIT" : { + "message" : [ + "Error occurred due to the mismatch between multiple implicit non-default collations. Use COLLATE function to set the collation explicitly." + ] + } + }, + "sqlState" : "42P21" + }, "COLLECTION_SIZE_LIMIT_EXCEEDED" : { "message" : [ "Can't create array with elements which exceeding the array size limit ," @@ -696,11 +706,6 @@ "To convert values from to , you can use the functions instead." ] }, - "COLLATION_MISMATCH" : { - "message" : [ - "Collations and are not compatible. Please use the same collation for both strings." - ] - }, "CREATE_MAP_KEY_DIFF_TYPES" : { "message" : [ "The given keys of function should all be the same type, but they are ." @@ -1257,6 +1262,31 @@ "message" : [ "Encountered error while reading file ." ], + "subClass" : { + "CANNOT_READ_FILE_FOOTER" : { + "message" : [ + "Could not read footer. Please ensure that the file is in either ORC or Parquet format.", + "If not, please convert it to a valid format. If the file is in the valid format, please check if it is corrupt.", + "If it is, you can choose to either ignore it or fix the corruption." + ] + }, + "FILE_NOT_EXIST" : { + "message" : [ + "File does not exist. It is possible the underlying files have been updated.", + "You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved." + ] + }, + "NO_HINT" : { + "message" : [ + "" + ] + }, + "PARQUET_COLUMN_DATA_TYPE_MISMATCH" : { + "message" : [ + "Data type mismatches when reading Parquet column . Expected Spark type , actual Parquet type ." + ] + } + }, "sqlState" : "KD001" }, "FAILED_REGISTER_CLASS_WITH_KRYO" : { @@ -1587,6 +1617,12 @@ ], "sqlState" : "22003" }, + "INDETERMINATE_COLLATION" : { + "message" : [ + "Function called requires knowledge of the collation it should apply, but indeterminate collation was found. Use COLLATE function to set the collation explicitly." + ], + "sqlState" : "42P22" + }, "INDEX_ALREADY_EXISTS" : { "message" : [ "Cannot create the index on table because it already exists." @@ -1746,6 +1782,34 @@ ], "sqlState" : "22012" }, + "INVALID_AGGREGATE_FILTER" : { + "message" : [ + "The FILTER expression in an aggregate function is invalid." + ], + "subClass" : { + "CONTAINS_AGGREGATE" : { + "message" : [ + "Expected a FILTER expression without an aggregation, but found ." + ] + }, + "CONTAINS_WINDOW_FUNCTION" : { + "message" : [ + "Expected a FILTER expression without a window function, but found ." + ] + }, + "NON_DETERMINISTIC" : { + "message" : [ + "Expected a deterministic FILTER expression." + ] + }, + "NOT_BOOLEAN" : { + "message" : [ + "Expected a FILTER expression of the BOOLEAN type." + ] + } + }, + "sqlState" : "42903" + }, "INVALID_ARRAY_INDEX" : { "message" : [ "The index is out of bounds. The array has elements. Use the SQL function `get()` to tolerate accessing element at invalid index and return NULL instead. If necessary set to \"false\" to bypass this error." @@ -1817,6 +1881,11 @@ "The value '' in the config \"\" is invalid." ], "subClass" : { + "DEFAULT_COLLATION" : { + "message" : [ + "Cannot resolve the given default collation. Did you mean ''?" + ] + }, "TIME_ZONE" : { "message" : [ "Cannot resolve the given timezone." @@ -2083,6 +2152,11 @@ "Operation not found." 
] }, + "SESSION_CHANGED" : { + "message" : [ + "The existing Spark server driver instance has restarted. Please reconnect." + ] + }, "SESSION_CLOSED" : { "message" : [ "Session was closed." @@ -2757,6 +2831,19 @@ ], "sqlState" : "42K09" }, + "INVALID_VARIANT_CAST" : { + "message" : [ + "The variant value `` cannot be cast into ``. Please use `try_variant_get` instead." + ], + "sqlState" : "22023" + }, + "INVALID_VARIANT_GET_PATH" : { + "message" : [ + "The path `` is not a valid variant extraction path in ``.", + "A valid path should start with `$` and is followed by zero or more segments like `[123]`, `.name`, `['name']`, or `[\"name\"]`." + ], + "sqlState" : "22023" + }, "INVALID_VIEW_TEXT" : { "message" : [ "The view cannot be displayed due to invalid view text: . This may be caused by an unauthorized modification of the view or an incorrect query syntax. Please check your query syntax and verify that the view has not been tampered with." @@ -2871,6 +2958,12 @@ }, "sqlState" : "22023" }, + "MALFORMED_VARIANT" : { + "message" : [ + "Variant binary is malformed. Please check the data source is valid." + ], + "sqlState" : "22023" + }, "MERGE_CARDINALITY_VIOLATION" : { "message" : [ "The ON search condition of the MERGE statement matched a single row from the target table with multiple rows of the source table.", @@ -3183,8 +3276,20 @@ }, "NUMERIC_VALUE_OUT_OF_RANGE" : { "message" : [ - " cannot be represented as Decimal(, ). If necessary set to \"false\" to bypass this error, and return NULL instead." + "" ], + "subClass" : { + "WITHOUT_SUGGESTION" : { + "message" : [ + "The rounded half up from cannot be represented as Decimal(, )." + ] + }, + "WITH_SUGGESTION" : { + "message" : [ + " cannot be represented as Decimal(, ). If necessary set to \"false\" to bypass this error, and return NULL instead." + ] + } + }, "sqlState" : "22003" }, "NUM_COLUMNS_MISMATCH" : { @@ -3497,9 +3602,27 @@ ], "sqlState" : "42802" }, - "STATEFUL_PROCESSOR_CANNOT_PERFORM_OPERATION_WITH_INVALID_TIMEOUT_MODE" : { + "STATEFUL_PROCESSOR_CANNOT_PERFORM_OPERATION_WITH_INVALID_TIME_MODE" : { + "message" : [ + "Failed to perform stateful processor operation= with invalid timeMode=" + ], + "sqlState" : "42802" + }, + "STATEFUL_PROCESSOR_CANNOT_REINITIALIZE_STATE_ON_KEY" : { + "message" : [ + "Cannot re-initialize state on the same grouping key during initial state handling for stateful processor. Invalid grouping key=." + ], + "sqlState" : "42802" + }, + "STATEFUL_PROCESSOR_INCORRECT_TIME_MODE_TO_ASSIGN_TTL" : { + "message" : [ + "Cannot use TTL for state= in timeMode=, use TimeMode.ProcessingTime() instead." + ], + "sqlState" : "42802" + }, + "STATEFUL_PROCESSOR_TTL_DURATION_MUST_BE_POSITIVE" : { "message" : [ - "Failed to perform stateful processor operation= with invalid timeoutMode=" + "TTL duration must be greater than zero for State store operation= on state=." ], "sqlState" : "42802" }, @@ -3522,18 +3645,48 @@ ], "sqlState" : "42802" }, + "STATE_STORE_INCORRECT_NUM_ORDERING_COLS_FOR_RANGE_SCAN" : { + "message" : [ + "Incorrect number of ordering ordinals= for range scan encoder. The number of ordering ordinals cannot be zero or greater than number of schema columns." + ], + "sqlState" : "42802" + }, + "STATE_STORE_INCORRECT_NUM_PREFIX_COLS_FOR_PREFIX_SCAN" : { + "message" : [ + "Incorrect number of prefix columns= for prefix scan encoder. Prefix columns cannot be zero or greater than or equal to num of schema columns." 
+ ], + "sqlState" : "42802" + }, + "STATE_STORE_NULL_TYPE_ORDERING_COLS_NOT_SUPPORTED" : { + "message" : [ + "Null type ordering column with name= at index= is not supported for range scan encoder." + ], + "sqlState" : "42802" + }, "STATE_STORE_UNSUPPORTED_OPERATION" : { "message" : [ " operation not supported with " ], "sqlState" : "XXKST" }, + "STATE_STORE_UNSUPPORTED_OPERATION_BINARY_INEQUALITY" : { + "message" : [ + "Binary inequality column is not supported with state store. Provided schema: ." + ], + "sqlState" : "XXKST" + }, "STATE_STORE_UNSUPPORTED_OPERATION_ON_MISSING_COLUMN_FAMILY" : { "message" : [ "State store operation= not supported on missing column family=." ], "sqlState" : "42802" }, + "STATE_STORE_VARIABLE_SIZE_ORDERING_COLS_NOT_SUPPORTED" : { + "message" : [ + "Variable size ordering column with name= at index= is not supported for range scan encoder." + ], + "sqlState" : "42802" + }, "STATIC_PARTITION_COLUMN_IN_INSERT_COLUMN_LIST" : { "message" : [ "Static partition column is also specified in the column list." @@ -4273,6 +4426,11 @@ "Removing column families with is not supported." ] }, + "STATE_STORE_TTL" : { + "message" : [ + "State TTL with is not supported. Please use RocksDBStateStoreProvider." + ] + }, "TABLE_OPERATION" : { "message" : [ "Table does not support . Please check the current catalog and namespace to make sure the qualified table name is expected, and also check the catalog implementation which is configured by \"spark.sql.catalog\"." @@ -4340,6 +4498,11 @@ "Can't insert into the target." ], "subClass" : { + "MULTI_PATH" : { + "message" : [ + "Can only write data to relations with a single path but given paths are ." + ] + }, "NOT_ALLOWED" : { "message" : [ "The target relation does not allow insertion." @@ -4429,7 +4592,8 @@ "subClass" : { "ACCESSING_OUTER_QUERY_COLUMN_IS_NOT_ALLOWED" : { "message" : [ - "Accessing outer query column is not allowed in this location." + "Accessing outer query column is not allowed in this location:", + "" ] }, "AGGREGATE_FUNCTION_MIXED_OUTER_LOCAL_REFERENCES" : { @@ -4439,7 +4603,8 @@ }, "CORRELATED_COLUMN_IS_NOT_ALLOWED_IN_PREDICATE" : { "message" : [ - "Correlated column is not allowed in predicate: ." + "Correlated column is not allowed in predicate:", + "" ] }, "CORRELATED_COLUMN_NOT_FOUND" : { @@ -4452,6 +4617,11 @@ "Expressions referencing the outer query are not supported outside of WHERE/HAVING clauses: ." ] }, + "HIGHER_ORDER_FUNCTION" : { + "message" : [ + "Subquery expressions are not supported within higher-order functions. Please remove all subquery expressions from higher-order functions and then try the query again." + ] + }, "LATERAL_JOIN_CONDITION_NON_DETERMINISTIC" : { "message" : [ "Lateral join condition cannot be non-deterministic: ." @@ -4469,7 +4639,8 @@ }, "NON_DETERMINISTIC_LATERAL_SUBQUERIES" : { "message" : [ - "Non-deterministic lateral subqueries are not supported when joining with outer relations that produce more than one row." + "Non-deterministic lateral subqueries are not supported when joining with outer relations that produce more than one row:", + "" ] }, "UNSUPPORTED_CORRELATED_REFERENCE_DATA_TYPE" : { @@ -4479,17 +4650,20 @@ }, "UNSUPPORTED_CORRELATED_SCALAR_SUBQUERY" : { "message" : [ - "Correlated scalar subqueries can only be used in filters, aggregations, projections, and UPDATE/MERGE/DELETE commands." 
+ "Correlated scalar subqueries can only be used in filters, aggregations, projections, and UPDATE/MERGE/DELETE commands:", + "" ] }, "UNSUPPORTED_IN_EXISTS_SUBQUERY" : { "message" : [ - "IN/EXISTS predicate subqueries can only be used in filters, joins, aggregations, window functions, projections, and UPDATE/MERGE/DELETE commands." + "IN/EXISTS predicate subqueries can only be used in filters, joins, aggregations, window functions, projections, and UPDATE/MERGE/DELETE commands:", + "" ] }, "UNSUPPORTED_TABLE_ARGUMENT" : { "message" : [ - "Table arguments are used in a function where they are not supported." + "Table arguments are used in a function where they are not supported:", + "" ] } }, @@ -4545,6 +4719,18 @@ ], "sqlState" : "42883" }, + "VARIANT_CONSTRUCTOR_SIZE_LIMIT" : { + "message" : [ + "Cannot construct a Variant larger than 16 MiB. The maximum allowed size of a Variant value is 16 MiB." + ], + "sqlState" : "22023" + }, + "VARIANT_DUPLICATE_KEY" : { + "message" : [ + "Failed to build variant because of a duplicate object key ``." + ], + "sqlState" : "22023" + }, "VARIANT_SIZE_LIMIT" : { "message" : [ "Cannot build variant bigger than in .", @@ -4868,26 +5054,6 @@ "count(.*) is not allowed. Please use count(*) or expand the columns manually, e.g. count(col1, col2)." ] }, - "_LEGACY_ERROR_TEMP_1024" : { - "message" : [ - "FILTER expression is non-deterministic, it cannot be used in aggregate functions." - ] - }, - "_LEGACY_ERROR_TEMP_1025" : { - "message" : [ - "FILTER expression is not of type boolean. It cannot be used in an aggregate function." - ] - }, - "_LEGACY_ERROR_TEMP_1026" : { - "message" : [ - "FILTER expression contains aggregate. It cannot be used in an aggregate function." - ] - }, - "_LEGACY_ERROR_TEMP_1027" : { - "message" : [ - "FILTER expression contains window function. It cannot be used in an aggregate function." - ] - }, "_LEGACY_ERROR_TEMP_1030" : { "message" : [ "Window aggregate function with filter predicate is not supported yet." @@ -5264,11 +5430,6 @@ "The ordering of partition columns is . All partition columns having constant values need to appear before other partition columns that do not have an assigned constant value." ] }, - "_LEGACY_ERROR_TEMP_1148" : { - "message" : [ - "Can only write data to relations with a single path." - ] - }, "_LEGACY_ERROR_TEMP_1149" : { "message" : [ "Fail to rebuild expression: missing key in `translatedFilterToExpr`." @@ -6071,12 +6232,6 @@ "buildReader is not supported for ." ] }, - "_LEGACY_ERROR_TEMP_2055" : { - "message" : [ - "", - "It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved." - ] - }, "_LEGACY_ERROR_TEMP_2056" : { "message" : [ "Unable to clear output directory prior to writing to it." @@ -6109,17 +6264,6 @@ "No records should be returned from EmptyDataReader." ] }, - "_LEGACY_ERROR_TEMP_2062" : { - "message" : [ - "", - "It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by recreating the Dataset/DataFrame involved." - ] - }, - "_LEGACY_ERROR_TEMP_2063" : { - "message" : [ - "Parquet column cannot be converted in file . Column: , Expected: , Found: ." - ] - }, "_LEGACY_ERROR_TEMP_2065" : { "message" : [ "Cannot create columnar reader."