Is your feature request related to a problem or challenge? Please describe what you are trying to do.
While working on filter pushdown for iceberg-rs (apache/iceberg-rust#295), I am going to use APIs like `ArrowPredicateFn` and `RowFilter`.
When constructing an `ArrowPredicateFn` for an Iceberg predicate, we provide a filtering function that takes a `RecordBatch` built from the given projection.
The `RecordBatch` contains the columns specified in the projection, and we need to access the correct column in the batch to evaluate the predicate.
For a top-level column this is straightforward. For a nested column, however, there seems to be no way to access the particular array from the `RecordBatch`.
We only have the projection (i.e., the `ProjectionMask`), which contains the indices of the leaf columns in the batch.
For example, suppose the schema has the top-level columns `[a, b, c]`, where `b` is a struct column with the children `[aa, bb, cc]`. Given a predicate like `cc > 1`, we know that the leaf index of the nested column `cc` is 3.
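Leaf indices are assigned depth-first, so in this schema `a` is leaf 0, `aa`, `bb` and `cc` are leaves 1 to 3, and `c` is leaf 4. The sketch below shows how a leaf index could be mapped back to a path of child indices. It uses a hypothetical, simplified `Field` type (not an arrow-rs or parquet type) and hypothetical helpers `leaf_count`, `leaf_path` and `example_schema`. For `cc` the path is `[1, 2]`: top-level column `b`, then its third child.

```rust
/// Hypothetical, simplified schema node used only for illustration;
/// it is not an arrow-rs or parquet type.
#[derive(Debug, Clone, PartialEq)]
pub enum Field {
    Leaf(String),
    Struct(String, Vec<Field>),
}

/// Number of leaf (primitive) columns at or below `field`.
pub fn leaf_count(field: &Field) -> usize {
    match field {
        Field::Leaf(_) => 1,
        Field::Struct(_, children) => children.iter().map(leaf_count).sum(),
    }
}

/// Returns the child-index path to the `target`-th leaf, counting leaves
/// depth-first. `[1, 2]` means "top-level column 1, then its child 2".
pub fn leaf_path(fields: &[Field], target: usize) -> Option<Vec<usize>> {
    let mut offset = 0; // number of leaves that precede `fields[i]`
    for (i, field) in fields.iter().enumerate() {
        let n = leaf_count(field);
        if target < offset + n {
            let mut path = vec![i];
            if let Field::Struct(_, children) = field {
                // Recurse with the index relative to this struct's own leaves.
                path.extend(leaf_path(children, target - offset)?);
            }
            return Some(path);
        }
        offset += n;
    }
    None // `target` is past the last leaf
}

/// Builds the example schema from this issue: [a, b: {aa, bb, cc}, c].
pub fn example_schema() -> Vec<Field> {
    let leaf = |s: &str| Field::Leaf(s.to_string());
    vec![
        leaf("a"),
        Field::Struct("b".to_string(), vec![leaf("aa"), leaf("bb"), leaf("cc")]),
        leaf("c"),
    ]
}
```

Here `leaf_path(&example_schema(), 3)` returns `Some(vec![1, 2])`. Note that this path is relative to the full schema; if the batch passed to the predicate prunes `b` down to its projected children, the path inside that batch would be different.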
Is there an API we can use to access the array of `cc` in the `RecordBatch`?
Describe the solution you'd like
Describe alternatives you've considered
Additional context
I don't believe we currently have a mechanism for nested projection of `RecordBatch`, but this is something that I think would be generally useful and a worthwhile addition.
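One possible shape for such a mechanism is sketched below with hypothetical in-memory types rather than arrow-rs ones (`Column`, `projected_leaf` and `greater_than` are illustrative names only). Given the sorted projected leaf indices (what a `ProjectionMask` selects) and the file-level leaf index of the wanted column, it finds that leaf's position among the projected leaves and then descends through struct columns depth-first until it reaches that leaf. It assumes the batch handed to the predicate contains only the projected leaves, with structs pruned to their projected children. With arrow-rs, the descent would presumably use `RecordBatch::column` and, at each struct level, a downcast to `StructArray` followed by `StructArray::column`.

```rust
/// Hypothetical columnar value used only for illustration (not arrow-rs):
/// either a leaf column of i64 values or a struct of child columns.
#[derive(Debug, Clone, PartialEq)]
pub enum Column {
    Int64(Vec<i64>),
    Struct(Vec<Column>),
}

/// Number of leaf columns at or below `col`.
fn leaves(col: &Column) -> usize {
    match col {
        Column::Int64(_) => 1,
        Column::Struct(children) => children.iter().map(leaves).sum(),
    }
}

/// Finds the `target`-th leaf (depth-first) among `columns`.
fn nth_leaf(columns: &[Column], mut target: usize) -> Option<&Column> {
    for col in columns {
        let n = leaves(col);
        if target < n {
            return match col {
                Column::Int64(_) => Some(col),
                Column::Struct(children) => nth_leaf(children, target),
            };
        }
        target -= n; // skip this column's leaves
    }
    None
}

/// Locates the leaf whose file-level index is `leaf` inside a batch that
/// contains only the leaves listed in `projected` (assumed sorted, unique).
pub fn projected_leaf<'a>(batch: &'a [Column], projected: &[usize], leaf: usize) -> Option<&'a Column> {
    // The leaf's rank among the projected leaves is its leaf index in the batch.
    let pos = projected.binary_search(&leaf).ok()?;
    nth_leaf(batch, pos)
}

/// Evaluates `value > threshold` for each row of an Int64 leaf column.
pub fn greater_than(col: &Column, threshold: i64) -> Option<Vec<bool>> {
    match col {
        Column::Int64(values) => Some(values.iter().map(|v| *v > threshold).collect()),
        Column::Struct(_) => None,
    }
}
```

For the predicate `cc > 1` with only leaf 3 projected, the batch would hold a single struct `b` containing just `cc`; for `cc` values `[0, 2, 5]`, `projected_leaf(&batch, &[3], 3)` finds that column and `greater_than` yields `[false, true, true]`.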