fix: use projected_table_schema for projection in DeltaSchemaAdapter
After upgrading from deltalake 0.20.1 to 0.22.3 it looks like Parquet
column projection is broken when using DeltaTable::scan. Instead of
scanning only a single column, it looks like all columns are
fetched from storage.

Inspection with a debugger reveals that the adapted_projections are
wrong here:
https://github.com/apache/datafusion/blob/88f58bf929167c5c5e2250ad87caa88d4dff11e5/datafusion/core/src/datasource/physical_plan/parquet/opener.rs#L153-L159
The adapted_projections are obtained in
https://github.com/delta-io/delta-rs/blob/5b2f46b06e0eb508f932a8b39feb11b568a78a32/crates/core/src/delta_datafusion/schema_adapter.rs#L46-L60
Changing line 49 to use the projected_table_schema seems to solve the
problem.
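
A minimal sketch of why the schema choice matters, assuming plain string
slices in place of Arrow schemas. The names map_projection and demo are
hypothetical and are not part of delta-rs or DataFusion; the loop only
mirrors the shape of the one changed in this commit.

```rust
/// Returns the indices of the file columns whose names appear in `wanted`.
/// Hypothetical stand-in for the projection loop in the schema adapter.
fn map_projection(file_fields: &[&str], wanted: &[&str]) -> Vec<usize> {
    let mut projection = Vec::with_capacity(file_fields.len());
    for (file_idx, name) in file_fields.iter().enumerate() {
        if wanted.contains(name) {
            projection.push(file_idx);
        }
    }
    projection
}

/// Compares matching against the full table schema with matching against
/// the projected table schema for a query that selects only `value`.
fn demo() -> (Vec<usize>, Vec<usize>) {
    // Assumed example columns; not taken from any real table.
    let file_schema = ["id", "name", "value"];
    let table_schema = ["id", "name", "value"];
    let projected_table_schema = ["value"];

    // Full table schema: every file column matches, so all are read.
    let before = map_projection(&file_schema, &table_schema);
    // Projected schema: only the requested column is read.
    let after = map_projection(&file_schema, &projected_table_schema);
    (before, after)
}
```

In this sketch, matching against the full table schema selects every
file column, while matching against the projected schema selects only
the requested one, which is consistent with the behaviour described above.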

Signed-off-by: Jonas Irgens Kylling <jonas@dune.xyz>
jkylling authored and ion-elgreco committed Dec 20, 2024
1 parent 4874c12 commit 2272ff7
Showing 1 changed file with 6 additions and 1 deletion.
7 changes: 6 additions & 1 deletion crates/core/src/delta_datafusion/schema_adapter.rs
@@ -46,7 +46,12 @@ impl SchemaAdapter for DeltaSchemaAdapter {
         let mut projection = Vec::with_capacity(file_schema.fields().len());

         for (file_idx, file_field) in file_schema.fields.iter().enumerate() {
-            if self.table_schema.fields().find(file_field.name()).is_some() {
+            if self
+                .projected_table_schema
+                .fields()
+                .find(file_field.name())
+                .is_some()
+            {
                 projection.push(file_idx);
             }
         }