-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-43427: [C++][Parquet] Deprecate ColumnChunk::file_offset field and no longer write Metadata at end of Chunk #43428
Conversation
|
Also cc @emkornfield @etseidl |
We should not write arrow/cpp/src/parquet/column_writer.cc Line 670 in 62ee676
|
cpp/src/parquet/column_writer.cc
Outdated
// Write metadata at end of column chunk | ||
metadata_->WriteTo(sink_.get()); | ||
|
||
// Not write metadata at end of column chunk since we will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather removing these lines to not confuse readers.
cpp/src/parquet/column_writer.cc
Outdated
@@ -667,8 +669,9 @@ class BufferedPageWriter : public PageWriter { | |||
has_dictionary, fallback, pager_->dict_encoding_stats_, | |||
pager_->data_encoding_stats_, pager_->meta_encryptor_); | |||
|
|||
// Write metadata at end of column chunk | |||
metadata_->WriteTo(in_memory_sink_.get()); | |||
// Not write metadata at end of column chunk since we will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
cpp/src/parquet/metadata.cc
Outdated
// https://github.com/apache/parquet-format/pull/440 | ||
// The `file_offset` field is deprecated and should be set to 0 for writer | ||
// if the column chunk has not been written outsidethe footer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// https://github.com/apache/parquet-format/pull/440 | |
// The `file_offset` field is deprecated and should be set to 0 for writer | |
// if the column chunk has not been written outsidethe footer. | |
// The `file_offset` field is deprecated and should be set to 0. | |
// See https://github.com/apache/parquet-format/pull/440 for detail. |
cpp/src/parquet/metadata.h
Outdated
// column chunk | ||
// Byte offset of `ColumnMetaData` in `file_path()`. | ||
// | ||
// Note that the meaning of this field has been inconsistent between implementations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Note that the meaning of this field has been inconsistent between implementations | |
// Note that the meaning of this field has been inconsistent among implementations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few nits. Looks great. Thanks for doing this @mapleFU!
cpp/src/parquet/metadata.h
Outdated
// Byte offset of `ColumnMetaData` in `file_path()`. | ||
// | ||
// Note that the meaning of this field has been inconsistent between implementations | ||
// so its use has since been deprecated in the Parquet specification. Moder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// so its use has since been deprecated in the Parquet specification. Moder | |
// so its use has since been deprecated in the Parquet specification. Modern |
cpp/src/parquet/metadata.cc
Outdated
} | ||
// https://github.com/apache/parquet-format/pull/440 | ||
// The `file_offset` field is deprecated and should be set to 0 for writer | ||
// if the column chunk has not been written outsidethe footer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// if the column chunk has not been written outsidethe footer. | |
// if the column chunk has not been written outside the footer. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is removed lol
Would fix this later:
|
Failed CI is unrelated. Will merge if no more negative comment in August 6th |
Merge because collect 1 approve |
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 9b58454. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 31 possible false positives for unstable benchmarks that are known to sometimes produce them. |
Rationale for this change
See github issue
What changes are included in this PR?
Force
ColumnChunk::file_offset
tobe 0Are these changes tested?
No
Are there any user-facing changes?
Might affect user using legacy
ColumnChunk::file_offset