Describe the usage question you have. Please include as many useful details as possible.
parquet.thrift in the parquet-format repo describes the meaning of the RowGroup total_byte_size field as:
Total byte size of all the uncompressed column data in this row group
This is also the case for the C++ implementation of Parquet in this repo.
But in the Go implementation, total_byte_size is described as:
TotalByteSize is the total size of this rowgroup on disk
The difference between these two values can be large when compression is applied to column chunks.
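To make the gap concrete, here is a minimal sketch that sums the two per-column sizes recorded in ColumnMetaData (total_uncompressed_size and total_compressed_size in parquet.thrift). The columnChunk type, the helper functions and the byte counts are all made up for illustration; this is not the API of the Go or C++ implementation.

```go
package main

import "fmt"

// columnChunk is a hypothetical stand-in for the two size fields of
// ColumnMetaData in parquet.thrift. It is not a type from any Parquet library.
type columnChunk struct {
	totalUncompressedSize int64 // bytes before compression
	totalCompressedSize   int64 // bytes as stored on disk
}

// uncompressedSize sums the uncompressed sizes of all column chunks,
// which is what the parquet.thrift comment describes for total_byte_size.
func uncompressedSize(chunks []columnChunk) int64 {
	var total int64
	for _, c := range chunks {
		total += c.totalUncompressedSize
	}
	return total
}

// compressedSize sums the compressed sizes of all column chunks,
// which is closer to "the total size of this rowgroup on disk".
func compressedSize(chunks []columnChunk) int64 {
	var total int64
	for _, c := range chunks {
		total += c.totalCompressedSize
	}
	return total
}

func main() {
	// Made-up sizes for a row group with two compressed column chunks.
	chunks := []columnChunk{
		{totalUncompressedSize: 1000, totalCompressedSize: 250},
		{totalUncompressedSize: 500, totalCompressedSize: 125},
	}
	fmt.Println("uncompressed:", uncompressedSize(chunks))
	fmt.Println("compressed:", compressedSize(chunks))
}
```

With these invented numbers the two interpretations differ by a factor of four, so a reader relying on one definition would badly misjudge the other.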
My question is: is this an intentional inconsistency with the format definition and the other implementations? And if so, why was this distinction made?
Component(s)
Go, Parquet