Porting GpuRowToColumnar converters to InternalColumnarRDDConverter #4206

Merged

Conversation

@wjxiz1992 (Collaborator) commented Nov 24, 2021

  • allow converting array, map, struct and decimal
  • the converters in GpuRowToColumnarExec are for InternalRow, while this one is for Row (see the sketch below)

Signed-off-by: Allen Xu <allxu@nvidia.com>
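For context, a minimal sketch of the API difference the second bullet refers to (illustrative only; the helper names below are assumptions, not the PR's actual converter code). The existing GpuRowToColumnarExec converters read Catalyst values from InternalRow, while this converter receives external Row objects, so the accessors and value types differ:

  import org.apache.spark.sql.Row
  import org.apache.spark.sql.catalyst.InternalRow
  import org.apache.spark.sql.types.Decimal

  // Catalyst path (GpuRowToColumnarExec): values come from InternalRow and use
  // Catalyst types, e.g. org.apache.spark.sql.types.Decimal.
  def readDecimalFromInternalRow(row: InternalRow, ordinal: Int,
      precision: Int, scale: Int): Decimal =
    row.getDecimal(ordinal, precision, scale)

  // External path (InternalColumnarRDDConverter): values come from Row and use
  // external Java/Scala types, e.g. java.math.BigDecimal.
  def readDecimalFromRow(row: Row, column: Int): java.math.BigDecimal =
    row.getDecimal(column)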

- allow converting array, map, struct and decimal

Signed-off-by: Allen Xu <allxu@nvidia.com>
@wjxiz1992 wjxiz1992 requested review from wbo4958 and revans2 November 24, 2021 10:53
@wjxiz1992 wjxiz1992 self-assigned this Nov 24, 2021
@jlowe (Contributor) left a comment


It would be nice to add tests for this. To what extent has it been tested already?

@firestarman (Collaborator) left a comment


You only need to enable nested-type support for your case here, instead of updating the API definition for all cases.

  def convert(df: DataFrame): RDD[Table] = {
    val schema = df.schema
-   if (!GpuOverrides.areAllSupportedTypes(schema.map(_.dataType) :_*)) {
-     val unsupported = schema.map(_.dataType).filter(!GpuOverrides.isSupportedType(_)).toSet
+   val unsupported = schema.map(_.dataType).filter( d => 
+     !GpuOverrides.isSupportedType(d, allowArray=true, allowNesting=true, ....)
+   ).toSet
+   if (unsupported.nonEmpty) {
      throw new IllegalArgumentException(s"Cannot convert $df to GPU columnar $unsupported are " +
        s"not currently supported data types for columnar.")
    }
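For reference, a possible expansion of the reviewer's suggestion. Only allowArray and allowNesting appear in the snippet above, so the remaining flag names are assumptions based on the types listed in the PR description, not the verified GpuOverrides API:

  // Hedged sketch: flag names other than allowArray/allowNesting are assumptions.
  val unsupported = schema.map(_.dataType).filter { d =>
    !GpuOverrides.isSupportedType(d,
      allowDecimal = true,   // assumed flag
      allowArray   = true,
      allowMaps    = true,   // assumed flag
      allowStruct  = true,   // assumed flag
      allowNesting = true)
  }.toSet
  if (unsupported.nonEmpty) {
    throw new IllegalArgumentException(
      s"Cannot convert $df to GPU columnar: $unsupported are not currently supported data types.")
  }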

Signed-off-by: Allen Xu <allxu@nvidia.com>
@wjxiz1992 wjxiz1992 marked this pull request as draft November 26, 2021 09:49
Signed-off-by: Allen Xu <allxu@nvidia.com>
Signed-off-by: Allen Xu <allxu@nvidia.com>
@jlowe (Contributor) commented Nov 30, 2021

java.lang.UnsatisfiedLinkError: ai.rapids.cudf.ColumnView.getNativeValidPointerSize(I)J

The problem is caused by a bug in cudf where ColumnView does not guarantee the cudf native libraries are loaded before attempting a JNI call into those libraries. I've posted a fix at rapidsai/cudf#9800.

Signed-off-by: Allen Xu <allxu@nvidia.com>
@wjxiz1992 wjxiz1992 marked this pull request as ready for review December 2, 2021 13:46
@wjxiz1992 (Collaborator, Author) commented:

build

@wjxiz1992 wjxiz1992 requested a review from jlowe December 2, 2021 13:51
@wjxiz1992 (Collaborator, Author) commented:

build

Signed-off-by: Allen Xu <allxu@nvidia.com>
Signed-off-by: Allen Xu <allxu@nvidia.com>
@wjxiz1992 (Collaborator, Author) commented:

build

@wjxiz1992 (Collaborator, Author) commented:

build

@wjxiz1992 wjxiz1992 requested a review from jlowe December 6, 2021 07:20
@wjxiz1992 (Collaborator, Author) left a comment


Hi @jlowe, it is not covered in the unit tests, but in the performance tests for PCA training I saw several "ERROR ColumnVector: A DEVICE COLUMN VECTOR WAS LEAKED" messages.
After enabling ai.rapids.refcount.debug, I can see the detailed logs:

21/12/06 16:54:40 ERROR ColumnVector: A DEVICE COLUMN VECTOR WAS LEAKED (ID: 503900 7f3d5c8a8730)
21/12/06 16:54:40 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/12/06 16:54:40 ERROR MemoryCleaner: Leaked vector (ID: 503900): 2021-12-06 16:54:08.0841 CST: INC
java.lang.Thread.getStackTrace(Thread.java:1556)
ai.rapids.cudf.MemoryCleaner$RefCountDebugItem.<init>(MemoryCleaner.java:301)
ai.rapids.cudf.MemoryCleaner$Cleaner.addRef(MemoryCleaner.java:82)
ai.rapids.cudf.ColumnVector.incRefCountInternal(ColumnVector.java:245)
ai.rapids.cudf.ColumnVector.<init>(ColumnVector.java:134)
ai.rapids.cudf.ColumnView$NestedColumnVector.createColumnVector(ColumnView.java:3835)
ai.rapids.cudf.HostColumnVector.copyToDevice(HostColumnVector.java:220)
ai.rapids.cudf.HostColumnVector$ColumnBuilder.buildAndPutOnDevice(HostColumnVector.java:1290)
com.nvidia.spark.rapids.GpuColumnVector$GpuColumnarBatchBuilder.buildAndPutOnDevice(GpuColumnVector.java:402)
com.nvidia.spark.rapids.GpuColumnVector$GpuColumnarBatchBuilderBase.build(GpuColumnVector.java:277)
org.apache.spark.sql.rapids.execution.ExternalRowToColumnarIterator.buildBatch(InternalColumnarRddConverter.scala:614)
org.apache.spark.sql.rapids.execution.ExternalRowToColumnarIterator.next(InternalColumnarRddConverter.scala:578)
org.apache.spark.sql.rapids.execution.ExternalRowToColumnarIterator.next(InternalColumnarRddConverter.scala:561)
scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
...

2021-12-06 16:54:08.0841 CST: INC
java.lang.Thread.getStackTrace(Thread.java:1556)
ai.rapids.cudf.MemoryCleaner$RefCountDebugItem.<init>(MemoryCleaner.java:301)
ai.rapids.cudf.MemoryCleaner$Cleaner.addRef(MemoryCleaner.java:82)
ai.rapids.cudf.ColumnVector.incRefCountInternal(ColumnVector.java:245)
ai.rapids.cudf.ColumnVector.incRefCount(ColumnVector.java:241)
ai.rapids.cudf.Table.<init>(Table.java:71)
com.nvidia.spark.rapids.GpuColumnVector.from(GpuColumnVector.java:610)
org.apache.spark.sql.rapids.execution.InternalColumnarRddConverter$.$anonfun$convert$10(InternalColumnarRddConverter.scala:724)
scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
...

2021-12-06 16:54:08.0841 CST: DEC
java.lang.Thread.getStackTrace(Thread.java:1556)
ai.rapids.cudf.MemoryCleaner$RefCountDebugItem.<init>(MemoryCleaner.java:301)
ai.rapids.cudf.MemoryCleaner$Cleaner.delRef(MemoryCleaner.java:90)
ai.rapids.cudf.ColumnVector.close(ColumnVector.java:213)
com.nvidia.spark.rapids.GpuColumnVector.close(GpuColumnVector.java:1045)
org.apache.spark.sql.vectorized.ColumnarBatch.close(ColumnarBatch.java:48)
org.apache.spark.sql.rapids.execution.InternalColumnarRddConverter$.$anonfun$convert$10(InternalColumnarRddConverter.scala:726)
scala.collection.Iterator$$anon$10.next(Iterator.scala:459)
...

The pattern is two INCs followed by one DEC. Checking the code accordingly, the reference that never gets a DEC comes from this line, but the comments on the lines below say the consumer should be responsible for closing it. I'm confused here: should it be closed somewhere manually?

@revans2 (Collaborator) commented Dec 6, 2021

The InternalColumnarRDDConverter assumes that whatever is consuming the Tables it produces will close them. That is not happening in this case and the stack traces do not show enough information to know which part of the code is calling this and not closing the input Table. If it is only happening in PCA training then you need to look at that code.

@jlowe (Contributor) commented Dec 6, 2021

According to the leak stacktraces, the leak is caused by whoever is responsible for the resulting RDD[Table]. Note that we see a Table.<init> call that is incrementing references, but we do not see a corresponding Table.close call that is decrementing the column references. As documented in the ml integration docs, the resulting tables from the RDD[Table] must be closed, and that is not happening in this case. The PCA code probably has a bug where it's not always closing the tables it is using.
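To make the ownership contract concrete, here is a minimal consumer sketch (illustrative, not the actual PCA code), assuming convert is the method shown in the earlier snippet that produces the RDD[Table]; every Table pulled from that RDD must be closed by the caller:

  import ai.rapids.cudf.Table
  import org.apache.spark.rdd.RDD
  import org.apache.spark.sql.rapids.execution.InternalColumnarRddConverter

  // given some existing DataFrame `df`
  val tables: RDD[Table] = InternalColumnarRddConverter.convert(df)
  val rowCounts = tables.map { table =>
    try {
      table.getRowCount   // do whatever work is needed with the table
    } finally {
      table.close()       // required: the converter hands ownership to the consumer
    }
  }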

jlowe previously approved these changes Dec 6, 2021

@jlowe (Contributor) left a comment


Minor nit but otherwise lgtm.

}
}

override def getNullSize: Double = OFFSET + VALIDITY
(Contributor review comment on the diff)

Nit: Can use VALIDITY_N_OFFSET here, also applies to the instance a few lines below.
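Applied, the nit amounts to the following (assuming VALIDITY_N_OFFSET is the existing constant defined as VALIDITY + OFFSET):

  override def getNullSize: Double = VALIDITY_N_OFFSET  // instead of OFFSET + VALIDITY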

@wjxiz1992 (Collaborator, Author) commented:

build

@wjxiz1992 (Collaborator, Author) commented:

Oh yes, I created a table on my PCA side and didn't close it. Thanks for the analysis.


private object BooleanConverter extends TypeConverter {
override def append(row: Row,
column: Int,
(Collaborator review comment on the diff)

Suggested change: indent `column: Int,` by two more spaces.

(Collaborator review comment on the diff)

NIT: 2 more spaces for parameters would be better.

@wjxiz1992 wjxiz1992 merged commit df4a047 into NVIDIA:branch-22.02 Dec 7, 2021
@sameerz sameerz added the task Work required that improves the product but is not user facing label Dec 29, 2021