
[FEA] Support arrays with nulls in the input of GpuRowBasedUDF for Spark 3.1.1+ #3942

Closed
firestarman opened this issue Oct 28, 2021 · 6 comments · Fixed by #4052

firestarman commented Oct 28, 2021

This is a follow-on issue for #3897.

When the input of GpuRowBasedUDF contains arrays with nulls, the UDF runs into the error below on Spark 3.1.1+ during our unit tests, where Java assertions are enabled.

  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 27.0 failed 1 times, most recent failure: Lost task 0.0 in stage 27.0 (TID 27) (liangcail-ubuntu18 executor driver): java.lang.AssertionError:  value at 6 is null
	at ai.rapids.cudf.HostColumnVectorCore.assertsForGet(HostColumnVectorCore.java:228)
	at ai.rapids.cudf.HostColumnVectorCore.getInt(HostColumnVectorCore.java:254)
	at com.nvidia.spark.rapids.RapidsHostColumnVectorCore.getInt(RapidsHostColumnVectorCore.java:107)
	at org.apache.spark.sql.vectorized.ColumnarArray.getInt(ColumnarArray.java:128)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.MapObjects_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.apply(Unknown Source)
	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:182)
	at org.apache.spark.sql.rapids.GpuRowBasedScalaUDF.$anonfun$scalaConverter$2(GpuScalaUDF.scala:123)
	at org.apache.spark.sql.rapids.GpuRowBasedScalaUDF.$anonfun$childAccessors$2(GpuScalaUDF.scala:156)
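
For reference, here is a minimal repro sketch. It assumes a session with the RAPIDS plugin and the row-based UDF path enabled; the UDF body and column names are illustrative, not the actual unit test.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.udf

    val spark = SparkSession.builder().getOrCreate()
    import spark.implicits._

    // A plain Scala UDF over array<int>. Deserializing its input walks the
    // array with getInt, which trips the host-side assertion on null slots.
    val sumNonNull = udf { (arr: Seq[java.lang.Integer]) =>
      arr.filter(_ != null).map(_.toInt).sum
    }

    val df = Seq(Seq[java.lang.Integer](1, null, 3)).toDF("a")
    df.select(sumNonNull($"a")).collect() // AssertionError with -ea on Spark 3.1.1+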

So in PR #3897, we let it fall back to the CPU for this case for now.
However, we eventually need to support this case for Spark 3.1.1+, since we already do for Spark 3.0.x.

[Update] Everything works the same as on the CPU in production environments, where Java assertions are disabled.
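
The shape of that fallback check is simple; here is an illustrative sketch (the helper name below is hypothetical, not the plugin's actual tagging API). We only need to find array inputs that may contain nulls.

    import org.apache.spark.sql.types.{ArrayType, DataType}

    // Hypothetical helper: report whether any UDF input is an array type
    // whose elements may be null, in which case we fall back to the CPU.
    def hasNullableArrayInput(inputTypes: Seq[DataType]): Boolean =
      inputTypes.exists {
        case ArrayType(_, containsNull) => containsNull
        case _ => false
      }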

@firestarman added the "bug (Something isn't working)" and "? - Needs Triage (Need team to review and classify)" labels on Oct 28, 2021
@firestarman (Collaborator, Author) commented:

Here is a related issue #3855

@firestarman (Collaborator, Author) commented:

Just found that this test passes if code generation for SafeProjection is disabled.
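
For anyone trying to reproduce that: one way to force the interpreted SafeProjection is Spark's internal codegen factory mode conf. Whether the test toggles it exactly this way is an assumption on my part.

    // NO_CODEGEN makes SafeProjection use the interpreted path instead of
    // generated code ("spark.sql.codegen.factoryMode" is an internal conf).
    spark.conf.set("spark.sql.codegen.factoryMode", "NO_CODEGEN")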


jlowe commented Oct 28, 2021

> This is low priority, since the Spark 3.0.x implementation, which we adopt for now, is compatible with Spark 3.1.1+.

I'm confused. Is this a bug report for code that we are not going to commit? If so, I say we just close this, and the PR that wants to update to the 3.1.1 conversion code will need to resolve the test failures before it is committed.

@firestarman firestarman changed the title [BUG] One test in CPUBasedUDFSuite fails when running under Spark 3.1+ [FEA] Support arrays with nulls in the UDF input for GpuRowBasedUDF Oct 29, 2021
@firestarman firestarman changed the title [FEA] Support arrays with nulls in the UDF input for GpuRowBasedUDF [FEA] Support arrays with nulls in the input of GpuRowBasedUDF for Spark 3.1.1+ Oct 29, 2021

firestarman commented Oct 29, 2021

> I'm confused. Is this a bug report for code that we are not going to commit? If so, I say we just close this, and the PR that wants to update to the 3.1.1 conversion code will need to resolve the test failures before it is committed.

I updated it to be a feature request after adding shim layers for GpuRowBasedUDF.


firestarman commented Nov 2, 2021

The null error mentioned in the description is probably a bug in Spark's code generation.

The code below (extracted from the scala-test-detailed-output.log file for this test case) is generated from lines 1095-1097.

/* 076 */       while (loopIndex_0 < dataLength_0) {
/* 077 */         value_MapObject_lambda_variable_1 = (int) (value_4.getInt(loopIndex_0));
/* 078 */         isNull_MapObject_lambda_variable_1 = value_4.isNullAt(loopIndex_0);

value_4 is an ArrayData, and you can see above that getInt is called before isNullAt.

/* 061 */     boolean isNull_4 = i.isNullAt(0);
/* 062 */     ArrayData value_4 = isNull_4 ?
/* 063 */     null : (i.getArray(0));

And the logging I added for debugging in RapidsHostColumnVectorCore also confirmed this unexpected call order.

Array of Int
====> lilc getArray
====> lilc getInt: row at 0 is not null
====> lilc isNullAt: row at 0 is not null
====> lilc getInt: row at 1 is not null
====> lilc isNullAt: row at 1 is not null
====> lilc getArray
====> lilc getInt: row at 2 is not null
====> lilc isNullAt: row at 2 is not null
====> lilc getInt: row at 3 is not null
====> lilc isNullAt: row at 3 is not null
...

But as we know, isNullAt should be called first, as the null check.
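
For comparison, here is a sketch of the order we would expect, written as plain Scala over ArrayData (illustrative only, not the generated code):

    import org.apache.spark.sql.catalyst.util.ArrayData

    // Expected pattern: check isNullAt first, and only call the typed
    // getter for slots that are known to be non-null.
    def copyIntArray(arr: ArrayData): Array[java.lang.Integer] = {
      val out = new Array[java.lang.Integer](arr.numElements())
      var i = 0
      while (i < arr.numElements()) {
        out(i) = if (arr.isNullAt(i)) null else Int.box(arr.getInt(i))
        i += 1
      }
      out
    }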


firestarman commented Nov 2, 2021

On the other hand, it seems it is not always reasonable to throw an exception when trying to get a null value.
In this case, even though getInt is called first, the returned value (value_MapObject_lambda_variable_1) is never used when the element is null. So we could return any value when reading nulls instead of throwing an exception, as the generated loop below and the sketch after it illustrate.

/* 076 */       while (loopIndex_0 < dataLength_0) {
/* 077 */         value_MapObject_lambda_variable_1 = (int) (value_4.getInt(loopIndex_0));
/* 078 */         isNull_MapObject_lambda_variable_1 = value_4.isNullAt(loopIndex_0);
/* 079 */
/* 080 */
/* 081 */         if (isNull_MapObject_lambda_variable_1) {
/* 082 */           convertedArray_0[loopIndex_0] = null;
/* 083 */         } else {
/* 084 */           convertedArray_0[loopIndex_0] = value_MapObject_lambda_variable_1;
/* 085 */         }
/* 086 */
/* 087 */         loopIndex_0 += 1;
/* 088 */       }
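
A minimal sketch of that idea (a hypothetical wrapper for illustration, not the actual change that went into #4052): return a harmless default instead of asserting, since any caller that respects isNullAt discards the value anyway.

    import org.apache.spark.sql.vectorized.ColumnVector

    // Hypothetical null-tolerant accessor: getInt on a null slot returns a
    // dummy value instead of failing; isNullAt still reports the truth.
    class NullTolerantIntAccessor(underlying: ColumnVector) {
      def isNullAt(row: Int): Boolean = underlying.isNullAt(row)
      def getInt(row: Int): Int =
        if (underlying.isNullAt(row)) 0 // value is discarded when the slot is null
        else underlying.getInt(row)
    }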

@Salonijain27 removed the "? - Needs Triage (Need team to review and classify)" label on Nov 2, 2021
@firestarman added the "task (Work required that improves the product but is not user facing)" label and removed the "bug (Something isn't working)" label on Nov 8, 2021