-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix reading ORC/PARQUET over empty clipped schema #5674
Fix reading ORC/PARQUET over empty clipped schema #5674
Conversation
Signed-off-by: sperlingxx <lovedreamf@gmail.com>
build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks OK but there are questions about the consistency of acquiring the semaphore on empty batches. Could address this as part of #4568.
// not reading any data, so return a degenerate ColumnarBatch with the row count | ||
if (currentChunkMeta.numTotalRows == 0) { | ||
None | ||
} else { | ||
val rows = currentChunkMeta.numTotalRows.toInt | ||
// Someone is going to process this data, even if it is just a row count | ||
GpuSemaphore.acquireIfNecessary(TaskContext.get(), metrics(SEMAPHORE_WAIT_TIME)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is inconsistent with the semantics of the ORC scan below. In this case, it will acquire the semaphore despite returning a row-count-only batch, while the ORC case below will avoid acquiring the semaphore unless it actually generates null columns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aligned all GPU semaphore acquisitions for empty batches.
case meta: HostMemoryEmptyMetaData => | ||
// Not reading any data, but add in partition data if needed | ||
// Someone is going to process this data, even if it is just a row count | ||
GpuSemaphore.acquireIfNecessary(TaskContext.get(), metrics(SEMAPHORE_WAIT_TIME)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same inconsistent acquisition on empty batches issue here.
// Not reading any data, but add in partition data if needed | ||
// Someone is going to process this data, even if it is just a row count | ||
GpuSemaphore.acquireIfNecessary(TaskContext.get(), metrics(SEMAPHORE_WAIT_TIME)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same semaphore acquisition on empty batch consistency here.
build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
needs upmerged
@sperlingxx does this also fix #5694? If not can you please take a look at that issue. |
55791f2
to
eeb3a61
Compare
Yes, @tgravescs. It will fix #5694 as well. |
I apologize that I did an upmerge to 22.08 by mistake. To fix the blunder, I had to run a force update to shift the base. |
build |
2 similar comments
build |
build |
build |
Fixes #5672
Fixes #5694
Fix reading ORC/PARQUET over empty clipped schema through bypassing the call of cuDF table reader.
Signed-off-by: sperlingxx lovedreamf@gmail.com