Async FileStreamer does not write statistics #139

Closed
TurnOfACard opened this issue May 10, 2022 · 2 comments · Fixed by #144
Labels: bug (Something isn't working), good first issue (Good for newcomers), help wanted (Extra attention is needed)

Comments

@TurnOfACard
Contributor

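Compare the synchronous `end`, which writes the column indexes (when `write_statistics` is set) and the offset indexes before the footer, with the async `end` below, which writes only the footer: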

parquet2/src/write/file.rs

Lines 123 to 186 in 47edd88

```rust
    pub fn end(&mut self, key_value_metadata: Option<Vec<KeyValue>>) -> Result<u64> {
        // compute file stats
        let num_rows = self.row_groups.iter().map(|group| group.num_rows).sum();
        if self.options.write_statistics {
            // write column indexes (require page statistics)
            self.row_groups
                .iter_mut()
                .zip(self.page_specs.iter())
                .try_for_each(|(group, pages)| {
                    group.columns.iter_mut().zip(pages.iter()).try_for_each(
                        |(column, pages)| {
                            let offset = self.offset;
                            column.column_index_offset = Some(offset as i64);
                            self.offset += write_column_index(&mut self.writer, pages)?;
                            let length = self.offset - offset;
                            column.column_index_length = Some(length as i32);
                            Result::Ok(())
                        },
                    )?;
                    Result::Ok(())
                })?;
        };
        // write offset index
        self.row_groups
            .iter_mut()
            .zip(self.page_specs.iter())
            .try_for_each(|(group, pages)| {
                group
                    .columns
                    .iter_mut()
                    .zip(pages.iter())
                    .try_for_each(|(column, pages)| {
                        let offset = self.offset;
                        column.offset_index_offset = Some(offset as i64);
                        self.offset += write_offset_index(&mut self.writer, pages)?;
                        column.offset_index_length = Some((self.offset - offset) as i32);
                        Result::Ok(())
                    })?;
                Result::Ok(())
            })?;
        let metadata = FileMetaData::new(
            self.options.version.into(),
            self.schema.clone().into_thrift(),
            num_rows,
            self.row_groups.clone(),
            key_value_metadata,
            self.created_by.clone(),
            None,
            None,
            None,
        );
        let len = end_file(&mut self.writer, metadata)?;
        Ok(self.offset + len)
    }
    /// Returns the underlying writer.
    pub fn into_inner(self) -> W {
        self.writer
    }
}
```

```rust
    pub async fn end(mut self, key_value_metadata: Option<Vec<KeyValue>>) -> Result<(u64, W)> {
        // compute file stats
        let num_rows = self.row_groups.iter().map(|group| group.num_rows).sum();
        let metadata = FileMetaData::new(
            self.options.version.into(),
            self.schema.into_thrift(),
            num_rows,
            self.row_groups,
            key_value_metadata,
            self.created_by,
            None,
            None,
            None,
        );
        let len = end_file(&mut self.writer, metadata).await?;
        Ok((self.offset + len, self.writer))
    }
```
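A minimal sketch of the bookkeeping the async `end` would need to replicate. It assumes simplified, hypothetical types and helpers (`ColumnMeta`, `ColumnIndexes`, `write_blob`, `write_indexes` are illustrative only, not parquet2's API) and uses pre-serialized index bytes in place of `write_column_index`/`write_offset_index`. It follows the order of the synchronous `end`: all column indexes (only when `write_statistics` is set), then all offset indexes, recording each blob's offset and length before the footer is written. It uses a blocking `std::io::Write` sink so it runs without an async runtime; an async port would await the equivalent writes on the async writer instead.

```rust
use std::io::{self, Write};

/// Hypothetical, simplified stand-in for the per-column metadata that `end`
/// fills in (not parquet2's actual column chunk type).
#[derive(Debug, Default, Clone, PartialEq)]
struct ColumnMeta {
    column_index_offset: Option<i64>,
    column_index_length: Option<i32>,
    offset_index_offset: Option<i64>,
    offset_index_length: Option<i32>,
}

/// Hypothetical pre-serialized index bytes for one column chunk.
struct ColumnIndexes {
    column_index: Vec<u8>,
    offset_index: Vec<u8>,
}

/// Writes `bytes` at the current position, advances `offset` by the number
/// of bytes written, and returns (start offset, length) of the blob.
fn write_blob<W: Write>(writer: &mut W, offset: &mut u64, bytes: &[u8]) -> io::Result<(i64, i32)> {
    let start = *offset;
    writer.write_all(bytes)?;
    *offset += bytes.len() as u64;
    Ok((start as i64, (*offset - start) as i32))
}

/// Writes column indexes (only if `write_statistics`), then offset indexes,
/// filling in each column's offset/length. Returns the offset after the last
/// index, i.e. where the footer would start.
fn write_indexes<W: Write>(
    writer: &mut W,
    mut offset: u64,
    write_statistics: bool,
    row_groups: &mut [Vec<ColumnMeta>],
    indexes: &[Vec<ColumnIndexes>],
) -> io::Result<u64> {
    if write_statistics {
        for (group, group_indexes) in row_groups.iter_mut().zip(indexes) {
            for (column, idx) in group.iter_mut().zip(group_indexes) {
                let (start, len) = write_blob(writer, &mut offset, &idx.column_index)?;
                column.column_index_offset = Some(start);
                column.column_index_length = Some(len);
            }
        }
    }
    for (group, group_indexes) in row_groups.iter_mut().zip(indexes) {
        for (column, idx) in group.iter_mut().zip(group_indexes) {
            let (start, len) = write_blob(writer, &mut offset, &idx.offset_index)?;
            column.offset_index_offset = Some(start);
            column.offset_index_length = Some(len);
        }
    }
    Ok(offset)
}

fn main() -> io::Result<()> {
    // One row group with two columns; pretend 100 bytes of pages were already written.
    let mut sink: Vec<u8> = vec![0; 100];
    let mut row_groups = vec![vec![ColumnMeta::default(); 2]];
    let indexes = vec![vec![
        ColumnIndexes { column_index: vec![1; 10], offset_index: vec![2; 4] },
        ColumnIndexes { column_index: vec![3; 6], offset_index: vec![4; 8] },
    ]];
    let footer_start = write_indexes(&mut sink, 100, true, &mut row_groups, &indexes)?;
    println!("footer starts at {footer_start}");
    println!("{:?}", row_groups[0]);
    Ok(())
}
```

With 100 bytes already written, the two column indexes occupy bytes 100..116, the two offset indexes occupy bytes 116..128, and the footer would start at offset 128; this is the position the real `end` would add the footer length to.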

@jorgecarleitao added the bug, help wanted, and good first issue labels on May 10, 2022
@jorgecarleitao
Owner

Indeed, it slipped. Would you like to PR it, or should I?

@TurnOfACard
Contributor Author

I'm happy to look into it; I could perhaps check it out next weekend?
