
Fix failure when converting json stats to parquet in Delta Lake #15517

Merged: 2 commits, merged Jan 10, 2023


@ebyhr (Member) commented Dec 23, 2022

Description

Fixes #15496

Release notes

(x) Release notes are required, with the following suggested text:

```markdown
# Delta Lake
* Fix failure when converting JSON statistics to Parquet format. ({issue}`15496`)
```

@cla-bot added the cla-signed label Dec 23, 2022
```diff
@@ -110,7 +110,13 @@ public static Object jsonValueToTrinoValue(Type type, @Nullable Object jsonValue
             return (long) (int) jsonValue;
         }
         if (type == BIGINT) {
-            return (long) (int) jsonValue;
+            if (jsonValue instanceof Long) {
+                return (long) jsonValue;
```
Contributor

nit (IntelliJ eye candy): can we add `//noinspection RedundantCast` here, the same as here:

```java
if (icebergType instanceof Types.LongType) {
    //noinspection RedundantCast
    return (long) value;
}
```

```diff
+            }
+            if (jsonValue instanceof Integer) {
+                return (long) (int) jsonValue;
+            }
```
Contributor

I tried this out locally (the snippet was run in `io.trino.plugin.deltalake.transactionlog.statistics.DeltaLakeJsonFileStatistics`):

```java
parseJson(OBJECT_MAPPER, "{\"numRecords\":1,\"minValues\":{\"col\":0},\"maxValues\":{\"col\":9223372036854775807324},\"nullCount\":{\"col\":0}}", DeltaLakeJsonFileStatistics.class)
```

and the parsed value is a `BigInteger`.

However, in this case I agree that an exception should be thrown, because the provided value is greater than the maximum `bigint` value in Trino: https://trino.io/docs/current/language/types.html#bigint
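The thread turns on how the BIGINT branch handles different boxed number types. As a minimal, hedged sketch (class and method names are hypothetical, and Trino's `Type` checks are left out), the stand-alone Java below contrasts the pre-fix cast `(long) (int) jsonValue`, which throws `ClassCastException` when the deserialized value is a `Long`, with the `Long` and `Integer` branches added in this PR. The final `throw` for any other type, such as the `BigInteger` produced by the oversized value, is an assumption: the exact fallback used by the PR is not shown in this thread.

```java
import java.math.BigInteger;

// Stand-alone sketch of the BIGINT branch of jsonValueToTrinoValue before and after the fix.
// Class and method names are hypothetical; Trino's Type/BIGINT checks are omitted.
public class BigintStatsConversionSketch {

    // Before the fix: assumes every JSON integer was deserialized as an Integer.
    static long convertBefore(Object jsonValue) {
        // Throws ClassCastException when jsonValue is a Long
        return (long) (int) jsonValue;
    }

    // After the fix: mirrors the Long and Integer branches added by this PR.
    static long convertAfter(Object jsonValue) {
        if (jsonValue instanceof Long) {
            return (long) jsonValue;
        }
        if (jsonValue instanceof Integer) {
            return (long) (int) jsonValue;
        }
        // Assumption: any other type (e.g. a BigInteger beyond the bigint range) is rejected.
        // The exact exception used by the PR is not shown in this thread.
        throw new IllegalArgumentException("Unsupported BIGINT statistics value: " + jsonValue.getClass().getName());
    }

    public static void main(String[] args) {
        Object small = 1000;                  // boxed as Integer
        Object large = 9223372036854775807L;  // boxed as Long (bigint max value)
        Object tooLarge = new BigInteger("9223372036854775807324");

        System.out.println(convertBefore(small)); // 1000
        try {
            convertBefore(large);
        }
        catch (ClassCastException e) {
            System.out.println("before fix: ClassCastException for Long");
        }

        System.out.println(convertAfter(small)); // 1000
        System.out.println(convertAfter(large)); // 9223372036854775807
        try {
            convertAfter(tooLarge);
        }
        catch (IllegalArgumentException e) {
            System.out.println("after fix: " + e.getMessage());
        }
    }
}
```

Running `main` prints `1000`, reports the `ClassCastException` for the `Long` input under the old cast, then prints `1000` and `9223372036854775807` with the new branches and rejects the `BigInteger`.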

```java
{"integer", "1", "2147483647", "0.0", "1", "2147483647"},
{"tinyint", "2", "127", "0.0", "2", "127"},
{"smallint", "3", "32767", "0.0", "3", "32767"},
{"bigint", "1000", "9223372036854775807", "0.0", "1000", "9223372036854775807"},
```
Contributor

Please add

```java
{"bigint", "1001", "2147483647", "0.0", "1001", "2147483647"},
```

in order to also test the `if (jsonValue instanceof Integer) {` branch in `io.trino.plugin.deltalake.transactionlog.DeltaLakeParquetStatisticsUtils#jsonValueToTrinoValue`.

@ebyhr (Member Author), Dec 23, 2022

Both branches are already covered by the existing case: 1000 is deserialized as an Integer and 9223372036854775807 as a Long.
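To illustrate the reply, here is a stdlib-only model of the integer typing the thread reports: 1000 as `Integer`, 9223372036854775807 as `Long`, and the oversized value from the local experiment as `BigInteger`. It assumes `OBJECT_MAPPER` keeps Jackson's default integer handling, that is, neither `DeserializationFeature.USE_LONG_FOR_INTS` nor `DeserializationFeature.USE_BIG_INTEGER_FOR_INTS` is enabled. `JsonIntegerTypingSketch` and `narrow` are hypothetical names, and this is not Jackson's own code, only a reproduction of the observed outcome.

```java
import java.math.BigInteger;

// Stdlib-only model of the number typing described in this thread:
// a JSON integer becomes the smallest of Integer, Long, or BigInteger that holds it.
// Not Jackson's implementation; it only reproduces the observed outcome.
public class JsonIntegerTypingSketch {

    static Number narrow(String literal) {
        BigInteger value = new BigInteger(literal);
        // bitLength() excludes the sign bit, so <= 31 means the value fits in an int
        if (value.bitLength() <= 31) {
            return value.intValue();
        }
        // <= 63 means the value fits in a long
        if (value.bitLength() <= 63) {
            return value.longValue();
        }
        return value;
    }

    public static void main(String[] args) {
        String[] literals = {
                "1000",                   // existing test value
                "2147483647",             // value from the requested extra row
                "9223372036854775807",    // existing test value (bigint max)
                "9223372036854775807324", // value from the local parseJson experiment
        };
        for (String literal : literals) {
            // Prints e.g. "1000 -> Integer"
            System.out.println(literal + " -> " + narrow(literal).getClass().getSimpleName());
        }
    }
}
```

Under this model, `2147483647` from the requested extra row is an `Integer`, the same branch that 1000 already exercises, which is consistent with the point that the existing case covers both branches.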

@ebyhr force-pushed the ebi/delta-json-stats branch from 92e338e to d007838 on December 23, 2022 10:31
@ebyhr requested a review from findepi January 4, 2023 00:34
@ebyhr force-pushed the ebi/delta-json-stats branch from 4d6d895 to a4ed68d on January 6, 2023 23:56
@ebyhr (Member Author) commented Jan 6, 2023

Just rebased on upstream without changes.

@ebyhr force-pushed the ebi/delta-json-stats branch from a4ed68d to f6b4863 on January 7, 2023 03:00
@ebyhr self-assigned this Jan 10, 2023
@ebyhr merged commit 7f9587a into trinodb:master Jan 10, 2023
@ebyhr deleted the ebi/delta-json-stats branch January 10, 2023 10:20
@ebyhr mentioned this pull request Jan 10, 2023
@github-actions added this to the 406 milestone Jan 10, 2023

Successfully merging this pull request may close these issues.

Converting Delta Lake JSON statistics to Parquet format fails
3 participants