bug: FileScanTask project_field_ids order could be inconsistent with the RecordBatch schema #627

chenzl25 · 2024-09-11T11:22:48Z

As we know, FileScanTask has two fields, project_field_ids and schema. I think the RecordBatch produced by the reader for a FileScanTask should always follow the schema specified in that FileScanTask. However, in some cases the RecordBatch schema can be inconsistent with it.

Consider an Iceberg table with schema (c1 int, c2 int, c3 int). If we select from the table with the column order c3, c2, c1, the RecordBatch schema is still c1, c2, c3, which confuses me a lot.

pub struct FileScanTask {
    data_file_path: String,
    project_field_ids: Vec<i32>,
    schema: SchemaRef,
    ...
}
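
To make the expected behavior concrete, here is a minimal sketch of the reordering I have in mind. It uses only the standard library and plain slices instead of Arrow types. The helper names projection_indices and reorder_columns are hypothetical (they are not part of iceberg-rust), and the field ids 1, 2, 3 for c1, c2, c3 are assumed for illustration only.

```rust
use std::collections::HashMap;

/// Hypothetical helper: for each id in `project_field_ids`, find the position
/// of the column carrying that field id in the batch as it was read.
/// Returns `None` if a projected id is not present in the batch.
fn projection_indices(batch_field_ids: &[i32], project_field_ids: &[i32]) -> Option<Vec<usize>> {
    // Map field id -> column position in the batch produced by the reader.
    let position: HashMap<i32, usize> = batch_field_ids
        .iter()
        .enumerate()
        .map(|(idx, id)| (*id, idx))
        .collect();

    // Walk the projection in the requested order; collecting into
    // Option<Vec<_>> yields None as soon as one id is missing.
    project_field_ids
        .iter()
        .map(|id| position.get(id).copied())
        .collect()
}

/// Hypothetical helper: build a new column list in the order given by `indices`.
/// Panics if an index is out of bounds for `columns`.
fn reorder_columns<T: Clone>(columns: &[T], indices: &[usize]) -> Vec<T> {
    indices.iter().map(|&i| columns[i].clone()).collect()
}
```

With batch field ids [1, 2, 3] and project_field_ids [3, 2, 1], projection_indices gives the positions [2, 1, 0], and applying them to the columns c1, c2, c3 gives c3, c2, c1, which is the order I would expect the RecordBatch to have.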

liurenjie1024 · 2024-09-26T02:47:38Z

I think this could be solved together with other problems like type promotion.

chenzl25 · 2024-11-01T04:38:30Z

I think this issue has been resolved by type promotion.

chenzl25 mentioned this issue Sep 12, 2024

fix: reorder record batch #629

Closed

chenzl25 closed this as completed Nov 1, 2024
