[SPARK-46810][DOCS] Align error class terminology with SQL standard #44902

Closed
147 changes: 101 additions & 46 deletions common/utils/src/main/resources/error/README.md
@@ -1,77 +1,132 @@
# Guidelines
# Guidelines for Throwing User-Facing Errors

To throw a standardized user-facing error or exception, developers should specify the error class, a SQLSTATE,
and message parameters rather than an arbitrary error message.
To throw a user-facing error or exception, developers should specify a standardized SQLSTATE, an error condition, and message parameters rather than an arbitrary error message.

This guide will describe how to do this.

## Error Hierarchy and Terminology

The error hierarchy is as follows:
1. Error state / SQLSTATE
2. Error condition
3. Error sub-condition

The error state / SQLSTATE itself is comprised of two parts:
1. Error class
2. Error sub-class

Acceptable values for these various error parts are defined in the following files:
* `error-classes.json`
* `error-states.json`
* `error-conditions.json`

The terms error class, state, and condition come from the SQL standard.

### Illustrative Example
* Error state / SQLSTATE: `42K01` (Class: `42`; Sub-class: `K01`)
  * Error condition: `DATATYPE_MISSING_SIZE`
  * Error condition: `INCOMPLETE_TYPE_DEFINITION`
    * Error sub-condition: `ARRAY`
    * Error sub-condition: `MAP`
    * Error sub-condition: `STRUCT`
* Error state / SQLSTATE: `42604` (Class: `42`; Sub-class: `604`)
  * Error condition: `INVALID_ESCAPE_CHAR`
  * Error condition: `AS_OF_JOIN`
    * Error sub-condition: `TOLERANCE_IS_NON_NEGATIVE`
    * Error sub-condition: `TOLERANCE_IS_UNFOLDABLE`
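
The hierarchy in this example can be sketched as plain data. The snippet below is a minimal illustration only: `SqlState`, `ErrorCondition`, and `ErrorHierarchySketch` are hypothetical names that do not exist in Spark, and the values are copied from the `42K01` entry of the example.

```scala
// Hypothetical types for illustration only; none of these are Spark APIs.
final case class SqlState(value: String) {
  require(value.length == 5, s"SQLSTATE must have 5 characters: $value")
  def errorClass: String = value.take(2) // first 2 characters
  def subClass: String = value.drop(2)   // remaining 3 characters
}

final case class ErrorCondition(
    name: String,
    sqlState: SqlState,
    subConditions: Seq[String] = Seq.empty)

object ErrorHierarchySketch {
  // Values taken from the illustrative example in this section.
  val incompleteTypeDefinition: ErrorCondition = ErrorCondition(
    name = "INCOMPLETE_TYPE_DEFINITION",
    sqlState = SqlState("42K01"),
    subConditions = Seq("ARRAY", "MAP", "STRUCT"))

  def main(args: Array[String]): Unit = {
    val c = incompleteTypeDefinition
    println(s"state=${c.sqlState.value} class=${c.sqlState.errorClass} sub-class=${c.sqlState.subClass}")
    println(s"condition=${c.name} sub-conditions=${c.subConditions.mkString(", ")}")
  }
}
```

The point of the sketch is the direction of containment: a state splits into class and sub-class, while a condition belongs to one state and may carry sub-conditions.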

### Inconsistent Use of the Term "Error Class"

Unfortunately, we have historically used the term "error class" inconsistently to refer both to a proper error class like `42` and also to an error condition like `DATATYPE_MISSING_SIZE`.

Fixing this will require renaming `SparkException.errorClass` to `SparkException.errorCondition` and making similar changes to `ErrorClassesJsonReader` and other parts of the codebase. We will address this in [SPARK-47429]. Until that is complete, we will have to live with the fact that a string like `DATATYPE_MISSING_SIZE` is called an "error condition" in our user-facing documentation but an "error class" in the code.

For more details, please see [SPARK-46810][SPARK-46810].
Contributor


Why do we need to write [SPARK-46810] twice here? Is it because of the special syntax for markdown? Or do we actually want to write: [SPARK-46810] [SPARK-47429]

Contributor Author


Nope, I meant to link to SPARK-46810. The first [SPARK-46810] is the text I want to link, and the second [SPARK-46810] is the reference to the link URL I have defined just below in the same document:

[SPARK-46810]: https://issues.apache.org/jira/browse/SPARK-46810


[SPARK-46810]: https://issues.apache.org/jira/browse/SPARK-46810
[SPARK-47429]: https://issues.apache.org/jira/browse/SPARK-47429

## Usage

1. Check if the error is an internal error.
Internal errors are bugs in the code that we do not expect users to encounter; this does not include unsupported operations.
If true, use the error class `INTERNAL_ERROR` and skip to step 4.
2. Check if an appropriate error class already exists in `error-classes.json`.
If true, use the error class and skip to step 4.
3. Add a new class with a new or existing SQLSTATE to `error-classes.json`; keep in mind the invariants below.
If true, use the error condition `INTERNAL_ERROR` and skip to step 4.
2. Check if an appropriate error condition already exists in `error-conditions.json`.
If true, use the error condition and skip to step 4.
3. Add a new condition to `error-conditions.json`. If the new condition requires a new error state, add the new error state to `error-states.json`.
4. Check if the exception type already extends `SparkThrowable`.
If true, skip to step 6.
5. Mix `SparkThrowable` into the exception.
6. Throw the exception with the error class and message parameters. If the same exception is thrown in several places, create a utility function in a central place such as `QueryCompilationErrors.scala` to instantiate the exception.
6. Throw the exception with the error condition and message parameters. If the same exception is thrown in several places, create a utility function in a central place such as `QueryCompilationErrors.scala` to instantiate the exception.

### Before

Throw with arbitrary error message:

throw new TestException("Problem A because B")
```scala
throw new TestException("Problem A because B")
```

### After

`error-classes.json`
`error-conditions.json`

"PROBLEM_BECAUSE" : {
"message" : ["Problem <problem> because <cause>"],
"sqlState" : "XXXXX"
}
```json
"PROBLEM_BECAUSE" : {
  "message" : ["Problem <problem> because <cause>"],
  "sqlState" : "XXXXX"
}
```

`SparkException.scala`

class SparkTestException(
errorClass: String,
messageParameters: Map[String, String])
extends TestException(SparkThrowableHelper.getMessage(errorClass, messageParameters))
with SparkThrowable {

override def getMessageParameters: java.util.Map[String, String] = messageParameters.asJava
```scala
class SparkTestException(
errorClass: String,
messageParameters: Map[String, String])
extends TestException(SparkThrowableHelper.getMessage(errorClass, messageParameters))
with SparkThrowable {

override def getMessageParameters: java.util.Map[String, String] =
messageParameters.asJava

override def getErrorClass: String = errorClass
}
override def getErrorClass: String = errorClass
}
```

Throw with error class and message parameters:
Throw with error condition and message parameters:

throw new SparkTestException("PROBLEM_BECAUSE", Map("problem" -> "A", "cause" -> "B"))
```scala
throw new SparkTestException("PROBLEM_BECAUSE", Map("problem" -> "A", "cause" -> "B"))
```
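
To make the relationship between the message template and the message parameters concrete, here is a minimal, self-contained sketch of placeholder substitution. `MessageTemplateSketch` and its `format` method are hypothetical and written only for this illustration; they are not Spark's implementation, which the snippet above reaches through `SparkThrowableHelper.getMessage`.

```scala
import scala.util.matching.Regex

// Hypothetical helper for illustration; it is not Spark's implementation.
object MessageTemplateSketch {
  // Matches placeholders such as <problem> or <relationName>.
  private val Placeholder: Regex = "<([A-Za-z0-9_]+)>".r

  def format(template: String, params: Map[String, String]): String =
    Placeholder.replaceAllIn(template, m => {
      val name = m.group(1)
      val value = params.getOrElse(
        name, throw new IllegalArgumentException(s"Missing message parameter: $name"))
      // Escape '$' and '\' so the value is inserted literally.
      Regex.quoteReplacement(value)
    })

  def main(args: Array[String]): Unit = {
    val template = "Problem <problem> because <cause>"
    // prints "Problem A because B"
    println(format(template, Map("problem" -> "A", "cause" -> "B")))
  }
}
```

Failing on a missing parameter is a choice made for this sketch, not a statement about how Spark behaves.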

## Access fields
### Access fields

To access error fields, catch exceptions that extend `org.apache.spark.SparkThrowable` and access
- Error class with `getErrorClass`
- Error condition with `getErrorClass`
- SQLSTATE with `getSqlState`


try {
...
} catch {
case e: SparkThrowable if Option(e.getSqlState).forall(_.startsWith("42")) =>
warn("Syntax error")
}
```scala
try {
  ...
} catch {
  case e: SparkThrowable if Option(e.getSqlState).forall(_.startsWith("42")) =>
    warn("Syntax error")
}
```
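
One detail of the guard above is worth checking before copying it: `Option.forall` returns `true` for `None`, so the guard also matches a throwable whose SQLSTATE is absent. The sketch below isolates that behavior with plain strings. It assumes `getSqlState` can return `null`, which the `Option(...)` wrapper suggests but the text does not state, and the method names are hypothetical.

```scala
object SqlStateGuardSketch {
  // The sqlState argument stands in for e.getSqlState (assumed to be nullable).
  def matchesWithForall(sqlState: String): Boolean =
    Option(sqlState).forall(_.startsWith("42"))

  def matchesWithExists(sqlState: String): Boolean =
    Option(sqlState).exists(_.startsWith("42"))

  def main(args: Array[String]): Unit = {
    println(matchesWithForall("42K01")) // true
    println(matchesWithForall("XXKD0")) // false
    println(matchesWithForall(null))    // true: None.forall is vacuously true
    println(matchesWithExists(null))    // false: None.exists is always false
  }
}
```

If only errors that actually carry a class `42` SQLSTATE should be treated as syntax errors, `exists` expresses that intent more directly.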

## Fields

### Error class
### Error condition

Error classes are a succinct, human-readable representation of the error category.
Error conditions are a succinct, human-readable representation of the error category.

An uncategorized error can be assigned to a legacy error class with the prefix `_LEGACY_ERROR_TEMP_` and an unused sequential number, for instance `_LEGACY_ERROR_TEMP_0053`.
An uncategorized error can be assigned to a legacy error condition with the prefix `_LEGACY_ERROR_TEMP_` and an unused sequential number, for instance `_LEGACY_ERROR_TEMP_0053`.

You should not introduce new uncategorized errors. Instead, convert them to proper errors whenever encountering them in new code.

**Note:** Though the proper term for this field is an "error condition", it is called `errorClass` in the codebase due to an unfortunate accident of history. For more details, please refer to [SPARK-46810].

#### Invariants

- Unique
@@ -81,9 +136,9 @@ You should not introduce new uncategorized errors. Instead, convert them to prop
### Message

Error messages provide a descriptive, human-readable representation of the error.
The message format accepts string parameters via the HTML tag syntax: e.g. <relationName>.
The message format accepts string parameters via the HTML tag syntax: e.g. `<relationName>`.

The values passed to the message shoudl not themselves be messages.
The values passed to the message should not themselves be messages.
They should be: runtime-values, keywords, identifiers, or other values that are not translated.

The quality of the error message should match the
@@ -95,22 +150,22 @@ The quality of the error message should match the

### SQLSTATE

SQLSTATE is an mandatory portable error identifier across SQL engines.
SQLSTATE comprises a 2-character class value followed by a 3-character subclass value.
SQLSTATE is a mandatory portable error identifier across SQL engines.
SQLSTATE comprises a 2-character class followed by a 3-character sub-class.
Spark prefers to re-use existing SQLSTATEs, preferably used by multiple vendors.
For extension Spark claims the 'K**' subclass range.
If a new class is needed it will also claim the 'K0' class.
For extension Spark claims the `K**` sub-class range.
If a new class is needed it will also claim the `K0` class.

Internal errors should use the 'XX' class. You can subdivide internal errors by component.
For example: The existing 'XXKD0' is used for an internal analyzer error.
Internal errors should use the `XX` class. You can subdivide internal errors by component.
For example: The existing `XXKD0` is used for an internal analyzer error.
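
The class and sub-class rules above can be expressed as a few string predicates. This is a sketch under the assumption that the `K**` range means "sub-class starting with `K`"; `SqlStateRangesSketch` and its methods are hypothetical names, not part of Spark.

```scala
// Hypothetical helpers for illustration; not part of Spark.
object SqlStateRangesSketch {
  def errorClass(sqlState: String): String = sqlState.take(2)
  def subClass(sqlState: String): String = sqlState.drop(2)

  // Assumption: the `K**` range means a 3-character sub-class starting with 'K'.
  def isSparkSubClass(sqlState: String): Boolean = subClass(sqlState).startsWith("K")

  // Class `K0` is the one Spark claims if a new class is needed.
  def isSparkClass(sqlState: String): Boolean = errorClass(sqlState) == "K0"

  // Class `XX` is reserved for internal errors.
  def isInternal(sqlState: String): Boolean = errorClass(sqlState) == "XX"

  def main(args: Array[String]): Unit = {
    for (s <- Seq("42K01", "42604", "XXKD0")) {
      println(s"$s class=${errorClass(s)} sub-class=${subClass(s)} " +
        s"sparkSubClass=${isSparkSubClass(s)} internal=${isInternal(s)}")
    }
  }
}
```

With these definitions, `42K01` falls in the Spark sub-class range, `42604` is a reused existing SQLSTATE, and `XXKD0` is both internal and in the Spark sub-class range.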

#### Invariants

- Consistent across releases unless the error is internal.

#### ANSI/ISO standard

The following SQLSTATEs are collated from:
The SQLSTATEs in `error-states.json` are collated from:
- SQL2016
- DB2 zOS/LUW
- PostgreSQL 15
90 changes: 0 additions & 90 deletions common/utils/src/main/resources/error/error-categories.json

This file was deleted.
