
[FEA] Support arrays with nulls in the input of GpuRowBasedUDF for Spark 3.1.1+ #3942

Closed
firestarman opened this issue Oct 28, 2021 · 6 comments · Fixed by #4052

firestarman commented Oct 28, 2021

This is a follow-on issue for #3897.

When the input of GpuRowBasedUDF contains arrays with nulls, the UDF runs into the error below on Spark 3.1.1+ during our unit tests, where Java assertions are enabled.

  org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 27.0 failed 1 times, most recent failure: Lost task 0.0 in stage 27.0 (TID 27) (liangcail-ubuntu18 executor driver): java.lang.AssertionError:  value at 6 is null
	at ai.rapids.cudf.HostColumnVectorCore.assertsForGet(HostColumnVectorCore.java:228)
	at ai.rapids.cudf.HostColumnVectorCore.getInt(HostColumnVectorCore.java:254)
	at com.nvidia.spark.rapids.RapidsHostColumnVectorCore.getInt(RapidsHostColumnVectorCore.java:107)
	at org.apache.spark.sql.vectorized.ColumnarArray.getInt(ColumnarArray.java:128)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.MapObjects_0$(Unknown Source)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificSafeProjection.apply(Unknown Source)
	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$Deserializer.apply(ExpressionEncoder.scala:182)
	at org.apache.spark.sql.rapids.GpuRowBasedScalaUDF.$anonfun$scalaConverter$2(GpuScalaUDF.scala:123)
	at org.apache.spark.sql.rapids.GpuRowBasedScalaUDF.$anonfun$childAccessors$2(GpuScalaUDF.scala:156)
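
For reference, here is a minimal repro sketch. It assumes a session with the RAPIDS plugin and the row-based UDF path enabled; the UDF body and column names are illustrative, not the actual unit test.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.udf

    val spark = SparkSession.builder().getOrCreate()
    import spark.implicits._

    // A plain Scala UDF over array<int>. Deserializing its input walks the
    // array with getInt, which trips the host-side assertion on null slots.
    val sumNonNull = udf { (arr: Seq[java.lang.Integer]) =>
      arr.filter(_ != null).map(_.toInt).sum
    }

    val df = Seq(Seq[java.lang.Integer](1, null, 3)).toDF("a")
    df.select(sumNonNull($"a")).collect() // AssertionError with -ea on Spark 3.1.1+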

So in PR #3897, we let it fall back to the CPU for this case for now.
However, we eventually need to support this case for Spark 3.1.1+, since we already do for Spark 3.0.x.

[Update] Everything works the same as on the CPU in production environments, where Java assertions are disabled.
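
The shape of that fallback check is simple; here is an illustrative sketch (the helper name below is hypothetical, not the plugin's actual tagging API). We only need to find array inputs that may contain nulls.

    import org.apache.spark.sql.types.{ArrayType, DataType}

    // Hypothetical helper: report whether any UDF input is an array type
    // whose elements may be null, in which case we fall back to the CPU.
    def hasNullableArrayInput(inputTypes: Seq[DataType]): Boolean =
      inputTypes.exists {
        case ArrayType(_, containsNull) => containsNull
        case _ => false
      }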

@firestarman added the "bug (Something isn't working)" and "? - Needs Triage (Need team to review and classify)" labels on Oct 28, 2021
@firestarman (Collaborator, Author) commented:

Here is a related issue #3855

@firestarman (Collaborator, Author) commented:

Just found that this test passes if code generation for SafeProjection is disabled.
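
For anyone trying to reproduce that: one way to force the interpreted SafeProjection is Spark's internal codegen factory mode conf. Whether the test toggles it exactly this way is an assumption on my part.

    // NO_CODEGEN makes SafeProjection use the interpreted path instead of
    // generated code ("spark.sql.codegen.factoryMode" is an internal conf).
    spark.conf.set("spark.sql.codegen.factoryMode", "NO_CODEGEN")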


jlowe commented Oct 28, 2021

> This is low priority, since the Spark 3.0.x implementation, which we adopt for now, is compatible with Spark 3.1.1+.

I'm confused. Is this a bug report for code that we are not going to commit? If so, I say we just close this, and the PR that wants to update to the 3.1.1 conversion code will need to resolve the test failures before it is committed.

@firestarman firestarman changed the title [BUG] One test in CPUBasedUDFSuite fails when running under Spark 3.1+ [FEA] Support arrays with nulls in the UDF input for GpuRowBasedUDF Oct 29, 2021
@firestarman firestarman changed the title [FEA] Support arrays with nulls in the UDF input for GpuRowBasedUDF [FEA] Support arrays with nulls in the input of GpuRowBasedUDF for Spark 3.1.1+ Oct 29, 2021

firestarman commented Oct 29, 2021

> I'm confused. Is this a bug report for code that we are not going to commit? If so, I say we just close this, and the PR that wants to update to the 3.1.1 conversion code will need to resolve the test failures before it is committed.

I updated it to be a feature request after adding shim layers for GpuRowBasedUDF.


firestarman commented Nov 2, 2021

The null error mentioned in the description is probably a bug in Spark's code generation.

The code below (extracted from the scala-test-detailed-output.log file for this test case) is generated from lines 1095-1097.

/* 076 */       while (loopIndex_0 < dataLength_0) {
/* 077 */         value_MapObject_lambda_variable_1 = (int) (value_4.getInt(loopIndex_0));
/* 078 */         isNull_MapObject_lambda_variable_1 = value_4.isNullAt(loopIndex_0);

value_4 is an ArrayData, and you can see above that getInt is called before isNullAt.

/* 061 */     boolean isNull_4 = i.isNullAt(0);
/* 062 */     ArrayData value_4 = isNull_4 ?
/* 063 */     null : (i.getArray(0));

And the logging I added for debugging in RapidsHostColumnVectorCore also confirmed this unexpected call order.

Array of Int
====> lilc getArray
====> lilc getInt: row at 0 is not null
====> lilc isNullAt: row at 0 is not null
====> lilc getInt: row at 1 is not null
====> lilc isNullAt: row at 1 is not null
====> lilc getArray
====> lilc getInt: row at 2 is not null
====> lilc isNullAt: row at 2 is not null
====> lilc getInt: row at 3 is not null
====> lilc isNullAt: row at 3 is not null
...

But as we know, isNullAt should be called first, as the null check.
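
For comparison, here is a sketch of the order we would expect, written as plain Scala over ArrayData (illustrative only, not the generated code):

    import org.apache.spark.sql.catalyst.util.ArrayData

    // Expected pattern: check isNullAt first, and only call the typed
    // getter for slots that are known to be non-null.
    def copyIntArray(arr: ArrayData): Array[java.lang.Integer] = {
      val out = new Array[java.lang.Integer](arr.numElements())
      var i = 0
      while (i < arr.numElements()) {
        out(i) = if (arr.isNullAt(i)) null else Int.box(arr.getInt(i))
        i += 1
      }
      out
    }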


firestarman commented Nov 2, 2021

On the other hand, it seems it is not always reasonable to throw an exception when trying to get a null value.
In this case, even though getInt is called first, the returned value (value_MapObject_lambda_variable_1) is never used when the element is null. So we could return any value when reading nulls instead of throwing an exception, as the generated loop below and the sketch after it illustrate.

/* 076 */       while (loopIndex_0 < dataLength_0) {
/* 077 */         value_MapObject_lambda_variable_1 = (int) (value_4.getInt(loopIndex_0));
/* 078 */         isNull_MapObject_lambda_variable_1 = value_4.isNullAt(loopIndex_0);
/* 079 */
/* 080 */
/* 081 */         if (isNull_MapObject_lambda_variable_1) {
/* 082 */           convertedArray_0[loopIndex_0] = null;
/* 083 */         } else {
/* 084 */           convertedArray_0[loopIndex_0] = value_MapObject_lambda_variable_1;
/* 085 */         }
/* 086 */
/* 087 */         loopIndex_0 += 1;
/* 088 */       }
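
A minimal sketch of that idea (a hypothetical wrapper for illustration, not the actual change that went into #4052): return a harmless default instead of asserting, since any caller that respects isNullAt discards the value anyway.

    import org.apache.spark.sql.vectorized.ColumnVector

    // Hypothetical null-tolerant accessor: getInt on a null slot returns a
    // dummy value instead of failing; isNullAt still reports the truth.
    class NullTolerantIntAccessor(underlying: ColumnVector) {
      def isNullAt(row: Int): Boolean = underlying.isNullAt(row)
      def getInt(row: Int): Int =
        if (underlying.isNullAt(row)) 0 // value is discarded when the slot is null
        else underlying.getInt(row)
    }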

@Salonijain27 removed the "? - Needs Triage (Need team to review and classify)" label on Nov 2, 2021
@firestarman added the "task (Work required that improves the product but is not user facing)" label and removed the "bug (Something isn't working)" label on Nov 8, 2021