-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add sequence support [databricks] #4376
Conversation
Signed-off-by: Bobby Wang <wbo4958@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looking really good. I assume all of this issues I am pointing out are just because this is still a work in progress.
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/collectionOperations.scala
Show resolved
Hide resolved
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/collectionOperations.scala
Outdated
Show resolved
Hide resolved
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/collectionOperations.scala
Outdated
Show resolved
Hide resolved
@revans2, Thx for the review. Yeah, this PR is still WIP, but it can work for IntegerType. Will refine this and add more types for support. But I'd not like to add TimeStamp and DateType for support in this PR, since the size calculation may be quite different which may cause this PR pretty big. |
Seq[ColumnVector] = { | ||
withResource(stop.sub(start)) { difference => | ||
withResource(Scalar.fromInt(1)) { scalarOne => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be GpuScalar(1, dataType)
or similar, so we can support various types not just integer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Thx
withResource(difference.floorDiv(step)) { quotient => | ||
withResource(Scalar.fromInt(1)) { scalarOne => | ||
withResource(quotient.add(scalarOne)) { sizeWithNegative => | ||
withResource(Scalar.fromInt(0)) { scalarZero => | ||
withResource(sizeWithNegative.greaterOrEqualTo(scalarZero)) { pred => | ||
withResource(pred.ifElse(sizeWithNegative, scalarZero)) { tmpSize => | ||
// when start==stop, step==0, size will be 0. but we should change size to 1 | ||
withResource(difference.equalTo(scalarZero)) { diffHasZero => | ||
step match { | ||
case stepScalar: Scalar => | ||
withResource(ColumnVector.fromScalar(stepScalar, rows)) { stepV => | ||
withResource(stepV.equalTo(scalarZero)) { stepHasZero => | ||
withResource(diffHasZero.and(stepHasZero)) { predWithZero => | ||
predWithZero.ifElse(scalarOne, tmpSize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any way to get rid of such nested withResource
? Otherwise, this looks so cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, refined this.
withResource(numberScalar(dt, 1)) { one => | ||
withResource(quotient.add(one)) { sizeWithNegative => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When withResource
nests this deeply, that's usually an indication that we're holding onto one or more GPU results longer than necessary, adding undesired and avoidable memory pressure. For example, we compute quotient
here and only need it to compute sizeWithNegative
, yet we hold onto the GPU memory for the quotient
result until after the entire calculation completes. The memory can be freed earlier with something like this:
withResource(numberScalar(dt, 1)) { one =>
val sizeWithNegative = withResource(difference.floorDiv(step)) { quotient =>
quotient.add(one)
}
withResource(sizeWithNegative) { sizeWithNegative =>
....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx @jlowe. Changed this accordingly.
FYI: cudf PR has been merged. |
Thx for the information |
build |
build |
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/collectionOperations.scala
Outdated
Show resolved
Hide resolved
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/collectionOperations.scala
Outdated
Show resolved
Hide resolved
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/collectionOperations.scala
Outdated
Show resolved
Hide resolved
build |
build |
1 similar comment
build |
This is to close #3512. and this PR depends on rapidsai/cudf#9839 and #4376
For now, the PR only supports sequence on IntegerType.