
[SPARK-48806][SQL] Pass actual exception when url_decode fails #47211

Closed
wants to merge 2 commits into from


@wForget (Member) commented Jul 4, 2024

What changes were proposed in this pull request?

Pass the actual exception through as the cause when url_decode fails.

Follow-up to https://issues.apache.org/jira/browse/SPARK-40156

Why are the changes needed?

Currently, the url_decode function discards the underlying exception, even though that exception contains information that is useful for quickly locating the problem.

For example, executing this SQL:
```
select url_decode('https%3A%2F%2spark.apache.org');
```
only produces this error message:
```
org.apache.spark.SparkIllegalArgumentException: [CANNOT_DECODE_URL] The provided URL cannot be decoded: https%3A%2F%2spark.apache.org. Please ensure that the URL is properly formatted and try again.
    at org.apache.spark.sql.errors.QueryExecutionErrors$.illegalUrlError(QueryExecutionErrors.scala:376)
    at org.apache.spark.sql.catalyst.expressions.UrlCodec$.decode(urlExpressions.scala:118)
    at org.apache.spark.sql.catalyst.expressions.UrlCodec.decode(urlExpressions.scala)
```

However, the underlying exception, which carries the useful detail, is dropped:
```
java.lang.IllegalArgumentException: URLDecoder: Illegal hex characters in escape (%) pattern - Error at index 1 in: "2s"
```
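The dropped message comes from the JDK itself, so you can reproduce it outside Spark by calling java.net.URLDecoder directly on the same input. The input fails on the escape %2s, because s is not a hex digit. The minimal sketch below rests on a few assumptions. The class name DecodeRepro is hypothetical, and UTF-8 decoding is assumed. The URLDecoder.decode(String, Charset) overload requires Java 10 or later. The exact wording after "pattern -" may also vary between JDK versions.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not part of Spark.
class DecodeRepro {
    // Returns the JDK's error message for a failed decode, or null if decoding succeeds.
    static String decodeError(String input) {
        try {
            URLDecoder.decode(input, StandardCharsets.UTF_8);
            return null;
        } catch (IllegalArgumentException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // "%3A" and "%2F" decode fine; "%2s" is rejected because 's' is not a hex digit.
        System.out.println(decodeError("https%3A%2F%2spark.apache.org"));
        // A well-formed input decodes without error, so this prints "null".
        System.out.println(decodeError("https%3A%2F%2Fspark.apache.org"));
    }
}
```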

After this PR, we get:

```
org.apache.spark.SparkIllegalArgumentException: [CANNOT_DECODE_URL] The provided URL cannot be decoded: https%3A%2F%2spark.apache.org. Please ensure that the URL is properly formatted and try again. SQLSTATE: 22546
	at org.apache.spark.sql.errors.QueryExecutionErrors$.illegalUrlError(QueryExecutionErrors.scala:372)
	at org.apache.spark.sql.catalyst.expressions.UrlCodec$.decode(urlExpressions.scala:119)
	at org.apache.spark.sql.catalyst.expressions.UrlCodec.decode(urlExpressions.scala)
	...
Caused by: java.lang.IllegalArgumentException: URLDecoder: Illegal hex characters in escape (%) pattern - Error at index 1 in: "2s"
	at java.base/java.net.URLDecoder.decode(URLDecoder.java:237)
	at java.base/java.net.URLDecoder.decode(URLDecoder.java:147)
	at org.apache.spark.sql.catalyst.expressions.UrlCodec$.decode(urlExpressions.scala:116)
	... 135 more
```
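The change is an instance of exception chaining. Instead of building the Spark error without a cause, the caught IllegalArgumentException is attached as its cause. That is what produces the "Caused by:" section in the trace. The sketch below shows the before/after pattern with plain JDK types only. UrlDecodeWithCause, CannotDecodeUrlException and both decode methods are hypothetical stand-ins, not Spark's actual classes or signatures, and UTF-8 decoding is assumed.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the pattern; not Spark's UrlCodec.
class UrlDecodeWithCause {
    // Hypothetical stand-in for Spark's SparkIllegalArgumentException.
    static final class CannotDecodeUrlException extends IllegalArgumentException {
        CannotDecodeUrlException(String url, Throwable cause) {
            super("[CANNOT_DECODE_URL] The provided URL cannot be decoded: " + url, cause);
        }
    }

    // Before: the JDK exception is caught and dropped, so getCause() is null.
    static String decodeDroppingCause(String url) {
        try {
            return URLDecoder.decode(url, StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            throw new CannotDecodeUrlException(url, null);
        }
    }

    // After: the JDK exception is attached as the cause, so a printed
    // stack trace ends with a "Caused by:" section pointing at URLDecoder.
    static String decodeKeepingCause(String url) {
        try {
            return URLDecoder.decode(url, StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            throw new CannotDecodeUrlException(url, e);
        }
    }

    public static void main(String[] args) {
        try {
            decodeKeepingCause("https%3A%2F%2spark.apache.org");
        } catch (CannotDecodeUrlException e) {
            System.out.println(e.getMessage());
            System.out.println("Caused by: " + e.getCause());
        }
    }
}
```

Passing the cause through the constructor, rather than copying its message into the outer one, keeps the user-facing error text unchanged. It also preserves the full original stack trace for debugging.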

Does this PR introduce any user-facing change?

No

How was this patch tested?

Unit test.

Was this patch authored or co-authored using generative AI tooling?

No

github-actions bot added the SQL label Jul 4, 2024
@wForget (Member, Author) commented Jul 4, 2024

cc @zzzzming95 @MaxGekk, could you please take a look?

yaooqinn closed this in 310f8ea Jul 4, 2024
yaooqinn added a commit that referenced this pull request Jul 4, 2024
Closes #47211 from wForget/SPARK-48806.

Lead-authored-by: wforget <643348094@qq.com>
Co-authored-by: Kent Yao <yao@apache.org>
Signed-off-by: Kent Yao <yao@apache.org>
(cherry picked from commit 310f8ea)
Signed-off-by: Kent Yao <yao@apache.org>
@yaooqinn (Member) commented Jul 4, 2024

Merged to master and 3.5. Thank you @wForget

jerryzhou196 pushed a commit to jerryzhou196/spark that referenced this pull request Jul 4, 2024
Signed-off-by: Jerry Zhou <j448zhou@uwaterloo.ca>
ericm-db pushed a commit to ericm-db/spark that referenced this pull request Jul 10, 2024
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024