bug: FileScanTask project_field_ids order could be inconsistent with the RecordBatch schema #627

Closed
chenzl25 opened this issue Sep 11, 2024 · 2 comments

Comments

@chenzl25
Contributor

As we know, FileScanTask has two fields, project_field_ids and schema. I think the RecordBatch produced by the reader for a FileScanTask should always follow the schema specified in that FileScanTask. However, in some cases the two can be inconsistent.

Consider an Iceberg table with schema (c1 int, c2 int, c3 int). If we select from the table with the column order c3, c2, c1, the RecordBatch schema is still c1, c2, c3, which confuses me a lot.

pub struct FileScanTask {
    data_file_path: String,
    project_field_ids: Vec<i32>,
    schema: SchemaRef,
    ...
}
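
To make the expected behavior concrete, here is a minimal sketch of reordering columns so they follow project_field_ids. It uses only the standard library and is not the iceberg-rust implementation: `projection_indices` and `reorder_columns` are hypothetical helpers, and the field ids 1, 2, 3 for c1, c2, c3 are assumed for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not part of iceberg-rust): for each projected field id,
/// look up the index of the column holding that id in the file's column order.
/// Returns None if any projected id is absent from the file.
fn projection_indices(file_field_ids: &[i32], project_field_ids: &[i32]) -> Option<Vec<usize>> {
    // Map each field id to its position in the file's column order.
    let position: HashMap<i32, usize> = file_field_ids
        .iter()
        .enumerate()
        .map(|(idx, id)| (*id, idx))
        .collect();

    // Walk the projection in the requested order; any missing id yields None.
    project_field_ids
        .iter()
        .map(|id| position.get(id).copied())
        .collect()
}

/// Hypothetical helper: rearrange columns read in file order so that they
/// follow the projected order. Panics if an index is out of bounds.
fn reorder_columns<T: Clone>(columns: &[T], indices: &[usize]) -> Vec<T> {
    indices.iter().map(|&i| columns[i].clone()).collect()
}
```

With assumed file field ids [1, 2, 3] and project_field_ids [3, 2, 1], `projection_indices` gives the indices [2, 1, 0], and applying `reorder_columns` to the columns c1, c2, c3 yields c3, c2, c1, which is the order I would expect the RecordBatch to have.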
@liurenjie1024
Collaborator

I think this could be solved together with other problems like type promotion.

@chenzl25
Contributor Author

chenzl25 commented Nov 1, 2024

I think this issue has been resolved by type promotion.

@chenzl25 chenzl25 closed this as completed Nov 1, 2024