-
Notifications
You must be signed in to change notification settings - Fork 931
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JNI: Add generateListOffsets API #10683
JNI: Add generateListOffsets API #10683
Conversation
Codecov Report
@@ Coverage Diff @@
## branch-22.06 #10683 +/- ##
================================================
+ Coverage 86.38% 86.39% +0.01%
================================================
Files 142 142
Lines 22333 22304 -29
================================================
- Hits 19292 19270 -22
+ Misses 3041 3034 -7
Continue to review full report at Codecov.
|
#include <cudf/detail/iterator.cuh> | ||
#include <cudf/detail/valid_if.cuh> | ||
#include <cudf/strings/detail/utilities.cuh> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you're having to include a header from a detail
folder, something is wrong. These can break at any time, for any reason, and there will be no sympathy for any code that breaks.
auto const begin_iter = list_length.template begin<cudf::size_type>(); | ||
auto const end_iter = list_length.template end<cudf::size_type>(); | ||
|
||
return cudf::strings::detail::make_offsets_child_column(begin_iter, end_iter, stream); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's no reason to use a detail API here. This is just a scan and can use cudf::scan
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jrhemstad, we need to allocate a output buffer whose size is N + 1. Then, project the result of inclusive scan to the output buffer (with the offset 1).
Is thrust::inclusive_scan
more flexible to do this job than cudf::scan
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I am not wrong, we need to run an extra memcpy with the cudf::scan
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Humm, if we can't use the detail API then I hate to recommend this: just copy exactly the code from cudf::strings::detail::make_offsets_child_column
. It will break such dependency, with the cost of having duplicate implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway, I rollbacked the change as it is unsafe to use a detail API outside libcudf.
Co-authored-by: Nghia Truong <ttnghia@users.noreply.github.com>
@@ -3473,6 +3473,16 @@ public final ColumnVector listSortRows(boolean isDescending, boolean isNullSmall | |||
return new ColumnVector(listSortRows(getNativeView(), isDescending, isNullSmallest)); | |||
} | |||
|
|||
/** | |||
* Generate list offsets from sizes of each list. | |||
* NOTICE: This column should type in INT32. And no null and negative value is allowed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this requirement for the input column ? What if it is not matched ? Better to add comment for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
rerun tests |
1 similar comment
rerun tests |
@gpucibot merge |
Add generateListOffsets API, converting list lengths to list offsets, which is useful in the development of spark-rapids.
For example, the support of array_repeat and arrays_zip relies on this API.