List state ttl #2

Closed
ericm-db wants to merge 16 commits

Conversation

ericm-db (Collaborator)

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

ericm-db changed the base branch from master to state-ttl on March 27, 2024 at 17:42
sahnib force-pushed the state-ttl branch 2 times, most recently from ab4000f to 69ad1e2, on April 1, 2024 at 15:48
ericm-db closed this on Apr 1, 2024
sahnib pushed a commit that referenced this pull request on Aug 13, 2024
… throw internal error

### What changes were proposed in this pull request?

This PR fixes the error messages and classes when Python UDFs are used in higher order functions.

### Why are the changes needed?

To show the proper user-facing exceptions with error classes.

### Does this PR introduce _any_ user-facing change?

Yes. Previously, the following threw an internal error:

```python
from pyspark.sql.functions import transform, udf, col, array
spark.range(1).select(transform(array("id"), lambda x: udf(lambda y: y)(x))).collect()
```

Before:

```
py4j.protocol.Py4JJavaError: An error occurred while calling o74.collectToPython.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 15 in stage 0.0 failed 1 times, most recent failure: Lost task 15.0 in stage 0.0 (TID 15) (ip-192-168-123-103.ap-northeast-2.compute.internal executor driver): org.apache.spark.SparkException: [INTERNAL_ERROR] Cannot evaluate expression: <lambda>(lambda x_0#3L)#2 SQLSTATE: XX000
	at org.apache.spark.SparkException$.internalError(SparkException.scala:92)
	at org.apache.spark.SparkException$.internalError(SparkException.scala:96)
```

After:

```
pyspark.errors.exceptions.captured.AnalysisException: [INVALID_LAMBDA_FUNCTION_CALL.UNEVALUABLE] Invalid lambda function call. Python UDFs should be used in a lambda function at a higher order function. However, "<lambda>(lambda x_0#3L)" was a Python UDF. SQLSTATE: 42K0D;
Project [transform(array(id#0L), lambdafunction(<lambda>(lambda x_0#3L)#2, lambda x_0#3L, false)) AS transform(array(id), lambdafunction(<lambda>(lambda x_0#3L), namedlambdavariable()))#4]
+- Range (0, 1, step=1, splits=Some(16))
```
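
For reference, here is a minimal sketch (not part of this patch) of how calling code can now catch the properly classified exception; it assumes an active `SparkSession` named `spark`:

```python
# Hypothetical usage sketch: catch the new user-facing AnalysisException
# instead of the previous Py4JJavaError wrapping an INTERNAL_ERROR.
from pyspark.errors import AnalysisException
from pyspark.sql.functions import array, transform, udf

try:
    spark.range(1).select(
        transform(array("id"), lambda x: udf(lambda y: y)(x))
    ).collect()
except AnalysisException as e:
    # getErrorClass() exposes the error class shown in the message above.
    print(e.getErrorClass())  # INVALID_LAMBDA_FUNCTION_CALL.UNEVALUABLE
```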

### How was this patch tested?

A unit test was added.
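
A rough sketch of what such a regression test could look like (the class name and setup below are illustrative, not the actual test added by the patch):

```python
import unittest

from pyspark.errors import AnalysisException
from pyspark.sql import SparkSession
from pyspark.sql.functions import array, transform, udf


class HigherOrderUDFErrorTest(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        cls.spark = SparkSession.builder.master("local[1]").getOrCreate()

    @classmethod
    def tearDownClass(cls):
        cls.spark.stop()

    def test_python_udf_in_lambda_raises_analysis_exception(self):
        # A Python UDF inside the lambda should now fail with a classified
        # AnalysisException rather than an internal error at execution time.
        with self.assertRaises(AnalysisException) as ctx:
            self.spark.range(1).select(
                transform(array("id"), lambda x: udf(lambda y: y)(x))
            ).collect()
        self.assertIn(
            "INVALID_LAMBDA_FUNCTION_CALL.UNEVALUABLE", str(ctx.exception)
        )
```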

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#47079 from HyukjinKwon/SPARK-48706.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Kent Yao <yao@apache.org>