[C++][Parquet] Parquet writer supports writing int32/int64 for decimal type #15239
Comments
I will work on it shortly. cc @emkornfield @pitrou |
…15244)

As the Parquet [spec](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal) states, DECIMAL can be used to annotate the following types:

- int32: for 1 <= precision <= 9
- int64: for 1 <= precision <= 18; precision < 10 will produce a warning
- fixed_len_byte_array: precision is limited by the array size. Length n can store <= floor(log_10(2^(8*n - 1) - 1)) base-10 digits
- binary: precision is not limited, but is required. The minimum number of bytes to store the unscaled value should be used.

The aim of this patch is to provide a writer option to use int32 to annotate decimal when 1 <= precision <= 9 and int64 when 10 <= precision <= 18.

* Closes: #15239

Authored-by: Gang Wu <ustcwg@gmail.com>
Signed-off-by: Will Jones <willjones127@gmail.com>
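To make the fixed_len_byte_array bound above concrete, here is a minimal sketch of the formula floor(log_10(2^(8*n - 1) - 1)). The function name `MaxDecimalPrecision` is hypothetical and is not part of the Arrow or Parquet API. The sketch assumes that 2^k is never a power of ten for k >= 1, so subtracting 1 does not change the digit count and the bound can be computed as floor((8*n - 1) * log_10(2)).

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper (not an Arrow/Parquet API): the largest decimal
// precision that a signed two's-complement value of `num_bytes` bytes
// can always hold, per floor(log10(2^(8*n - 1) - 1)).
inline int32_t MaxDecimalPrecision(int32_t num_bytes) {
  if (num_bytes <= 0) return 0;  // no storage, no digits
  // One bit is reserved for the sign, leaving 8*n - 1 value bits.
  const int32_t value_bits = 8 * num_bytes - 1;
  // 2^k is never a power of 10 for k >= 1, so
  // floor(log10(2^k - 1)) == floor(k * log10(2)).
  return static_cast<int32_t>(std::floor(value_bits * std::log10(2.0)));
}
```

With this sketch, 4 bytes give 9 digits and 8 bytes give 18 digits, which matches the int32 and int64 precision limits listed above.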
For datasets (multiple Parquet files), are mixed physical types supported? For example, some files written the old way and some written with the improved physical type. |
The physical type does not change the logical type in the Parquet file, just how the data is serialized. Datasets shouldn't care about the Parquet physical type; they should only care about the logical one. |
🥳 thanks! |
Describe the enhancement requested
As the Parquet spec states, decimal types with small precision can use the int32/int64 physical types.
The aim of this issue is to provide a writer option to write decimal types using int32 when 1 <= precision <= 9 and int64 when 10 <= precision <= 18.
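The selection rule requested here can be sketched as follows. This is only an illustration under stated assumptions: the `PhysicalType` enum, the `DecimalPhysicalType` function, and the `store_as_integer` flag are hypothetical names standing in for the proposed writer option, not the actual Arrow API, and the fallback to a fixed-length byte array is assumed to be the existing behavior.

```cpp
#include <cstdint>

// Hypothetical stand-in for the Parquet physical types relevant here.
enum class PhysicalType { kInt32, kInt64, kFixedLenByteArray };

// Hypothetical helper: choose the physical type for a decimal column.
// `store_as_integer` models the proposed writer option; when it is off,
// or when the precision exceeds 18, fall back to a fixed-length byte
// array (assumed to be the pre-existing behavior).
inline PhysicalType DecimalPhysicalType(int32_t precision,
                                        bool store_as_integer) {
  if (store_as_integer) {
    if (precision >= 1 && precision <= 9) return PhysicalType::kInt32;
    if (precision >= 10 && precision <= 18) return PhysicalType::kInt64;
  }
  return PhysicalType::kFixedLenByteArray;
}
```

Keeping the option off by default in such a design would leave files written by existing callers unchanged.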
Component(s)
C++, Parquet