fix: Reading Parquet with Null dictionary page
This fixes an issue with files from some Parquet writers that emit dictionary pages for Null arrays (why? I have no idea).

Fixes pola-rs#18085.
Fixes pola-rs#18079.

Possibly also pola-rs#18061.
coastalwhite committed Aug 8, 2024
1 parent 3dda47e commit de3db83
Showing 2 changed files with 10 additions and 6 deletions.
2 changes: 2 additions & 0 deletions crates/polars-parquet/src/arrow/read/deserialize/null.rs
@@ -124,6 +124,8 @@ pub fn iter_to_arrays(
     data_type: ArrowDataType,
     mut filter: Option<Filter>,
 ) -> ParquetResult<Box<dyn Array>> {
+    _ = iter.read_dict_page()?;
+
     let num_rows = Filter::opt_num_rows(&filter, iter.total_num_values());
 
     let mut len = 0usize;
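As we read the `null.rs` hunk, the fix consumes the optional dictionary page before the data-page loop runs. A Null column has no values to look up, so the dictionary's contents can simply be dropped. The sketch below models this with simplified, hypothetical types (`Page`, `PageIter`, `Error` and `null_column_len` are illustrative stand-ins, not polars-parquet's API; only the name `read_dict_page` is borrowed from the diff), and in it the column length is just the sum of the data pages' value counts.

```rust
use std::collections::VecDeque;

/// Simplified page model (hypothetical; not polars-parquet's `Page`).
enum Page {
    /// A dictionary page; its contents are irrelevant for a Null column.
    Dict,
    /// A data page carrying its number of (null) values.
    Data { num_values: usize },
}

#[derive(Debug, PartialEq)]
struct Error(String);

/// Hypothetical page source for one column chunk.
struct PageIter {
    pages: VecDeque<Page>,
}

impl PageIter {
    /// Consumes the dictionary page only if it is the first page.
    fn read_dict_page(&mut self) -> Result<Option<Page>, Error> {
        if matches!(self.pages.front(), Some(Page::Dict)) {
            Ok(self.pages.pop_front())
        } else {
            Ok(None)
        }
    }
}

impl Iterator for PageIter {
    // Yields each data page's value count; a dictionary page seen here is an error.
    type Item = Result<usize, Error>;

    fn next(&mut self) -> Option<Self::Item> {
        match self.pages.pop_front()? {
            Page::Data { num_values } => Some(Ok(num_values)),
            Page::Dict => Some(Err(Error(
                "Found dictionary page beyond the first page of a column chunk".to_string(),
            ))),
        }
    }
}

/// Hypothetical length of a Null column: discard any leading dictionary
/// page, then sum the value counts of the remaining data pages.
fn null_column_len(mut iter: PageIter) -> Result<usize, Error> {
    // Same idea as `_ = iter.read_dict_page()?;` in the commit.
    _ = iter.read_dict_page()?;

    let mut len = 0usize;
    for num_values in iter {
        len += num_values?;
    }
    Ok(len)
}
```

Without the up-front `read_dict_page` call, the leading dictionary page would reach the data-page loop and be rejected there.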
14 changes: 8 additions & 6 deletions crates/polars-parquet/src/parquet/read/compression.rs
@@ -215,12 +215,14 @@ impl Iterator for BasicDecompressor {
             Some(Ok(p)) => p,
         };
 
-        Some(decompress(page, &mut self.buffer).map(|p| {
-            if let Page::Data(p) = p {
-                p
-            } else {
-                panic!("Found compressed page in the middle of the pages")
-            }
+        Some(decompress(page, &mut self.buffer).and_then(|p| {
+            let Page::Data(p) = p else {
+                return Err(ParquetError::oos(
+                    "Found dictionary page beyond the first page of a column chunk",
+                ));
+            };
+
+            Ok(p)
         }))
     }
 
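The `compression.rs` change swaps `Result::map` for `Result::and_then`. With `map`, the closure can only produce a new success value, so an unexpected dictionary page could only be handled by panicking. With `and_then`, the closure itself returns a `Result`, so a malformed column chunk surfaces as an error the caller can handle. A minimal sketch of the same pattern, assuming stand-in types: `Page`, `ParquetError` (with a hypothetical `oos` constructor named after the one in the diff), the identity `decompress` function and the `DataPages` adapter are illustrative, not polars-parquet's real definitions.

```rust
/// Simplified page model (hypothetical; not polars-parquet's `Page`).
enum Page {
    Dict(Vec<u8>),
    Data(Vec<u8>),
}

#[derive(Debug, PartialEq)]
struct ParquetError(String);

impl ParquetError {
    /// Hypothetical "out of spec" constructor.
    fn oos(msg: &str) -> Self {
        ParquetError(msg.to_string())
    }
}

/// Stand-in for real decompression: returns the page unchanged.
fn decompress(page: Page) -> Result<Page, ParquetError> {
    Ok(page)
}

/// Hypothetical adapter that yields only data-page payloads.
/// It assumes any leading dictionary page has already been consumed.
struct DataPages<I> {
    inner: I,
}

impl<I: Iterator<Item = Result<Page, ParquetError>>> Iterator for DataPages<I> {
    type Item = Result<Vec<u8>, ParquetError>;

    fn next(&mut self) -> Option<Self::Item> {
        let page = match self.inner.next()? {
            Err(e) => return Some(Err(e)),
            Ok(p) => p,
        };

        // `and_then` lets the closure fail: a dictionary page here becomes
        // an `Err` item instead of a panic.
        Some(decompress(page).and_then(|p| {
            let Page::Data(p) = p else {
                return Err(ParquetError::oos(
                    "Found dictionary page beyond the first page of a column chunk",
                ));
            };
            Ok(p)
        }))
    }
}
```

Because the error is yielded per page rather than aborting the process, a caller that collects into `Result<Vec<_>, _>` stops at the first misplaced dictionary page, while a caller iterating manually can decide how to proceed.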
