Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support timestamp type in Iceberg migrate procedure #17391

Conversation

marcinsbd
Copy link
Contributor

@marcinsbd marcinsbd commented May 8, 2023

Description

Fixes #17006

Hive:
It occurs that Hive takes the timestamp and adjust the timestamp to get UTC value according to the writer's TZ. It saves the timestamp in UTC and adds also the writer TZ info to the footer.
During read, it reads the timestamp in UTC and adjust it according to the writer's TZ.

Trino uses the following property hive.parquet.time-zone for setting Parquet reader's and Parquet writer's timezone when dealing with Hive.

Iceberg:
Trino always sets the UTC timezone for Parquet reader's and Parquet writer's timezone.
In this way, it always return UTC values.

The introduced change:

  • For Orc the timestamp without timezone will be map as timestamp without timezone.
  • For Parquet the timestamp without the timezone will be map to timestamp with timezone. There is an assumption made that timezones values stored in Hive files are in the UTC.

#17785

Release notes

(x) Release notes are required, with the following suggested text:

# Iceberg
* Support migrating hive tables with `timestamp` type columns through `migrate` procedure. ({issue}`17006`)

@cla-bot cla-bot bot added the cla-signed label May 8, 2023
@marcinsbd marcinsbd requested a review from ebyhr May 8, 2023 15:29
@github-actions github-actions bot added the iceberg Iceberg connector label May 8, 2023
ebyhr
ebyhr previously requested changes May 8, 2023
Copy link
Member

@ebyhr ebyhr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix the commit title likes:

Support timestamp(3) type in Iceberg migrate procedure

You don't need to include "Fix issue 17006" in the PR and commit title. Please add "Fixes #17006" to the PR description so that we can close the issue automatically.

@marcinsbd marcinsbd changed the title Fix issue 17006, migrating timestamp(3) col into iceberg Support timestamp(3) type in Iceberg migrate procedure May 9, 2023
@ebyhr
Copy link
Member

ebyhr commented May 16, 2023

@marcinsbd What's the current status of this PR?

@findinpath findinpath self-requested a review May 17, 2023 04:15
@marcinsbd marcinsbd marked this pull request as draft May 17, 2023 09:46
@rotem-ad
Copy link
Member

I'd love to see this PR merged as well (:
I'm currently unable to migrate our most important table due to this bug.

@marcinsbd marcinsbd force-pushed the migrate_table_from_hive_to_iceber_with_timestamp3 branch from 9d48ba0 to b4bfb24 Compare May 18, 2023 00:41
@marcinsbd marcinsbd force-pushed the migrate_table_from_hive_to_iceber_with_timestamp3 branch 4 times, most recently from 9075fb3 to daa31f5 Compare May 19, 2023 01:46
@marcinsbd marcinsbd requested a review from ebyhr May 19, 2023 01:48
@marcinsbd marcinsbd marked this pull request as ready for review May 19, 2023 05:50
@marcinsbd marcinsbd force-pushed the migrate_table_from_hive_to_iceber_with_timestamp3 branch from daa31f5 to 797bfb1 Compare May 19, 2023 10:31
@marcinsbd marcinsbd requested review from homar and ebyhr May 19, 2023 10:42
@marcinsbd marcinsbd force-pushed the migrate_table_from_hive_to_iceber_with_timestamp3 branch from 797bfb1 to aa49221 Compare May 19, 2023 17:38
@ebyhr ebyhr dismissed their stale review May 24, 2023 08:41

Comment addressed

@marcinsbd marcinsbd force-pushed the migrate_table_from_hive_to_iceber_with_timestamp3 branch from aa49221 to 27294a0 Compare May 24, 2023 09:51
@marcinsbd marcinsbd force-pushed the migrate_table_from_hive_to_iceber_with_timestamp3 branch from 27294a0 to 782d3ab Compare May 30, 2023 09:30
@marcinsbd marcinsbd force-pushed the migrate_table_from_hive_to_iceber_with_timestamp3 branch 2 times, most recently from 064da03 to dc8eec1 Compare September 27, 2024 14:43
@marcinsbd marcinsbd force-pushed the migrate_table_from_hive_to_iceber_with_timestamp3 branch 3 times, most recently from cf70bb3 to 8ec037d Compare October 1, 2024 10:13
Copy link
Member

@raunaqmorarka raunaqmorarka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm % comments

@marcinsbd marcinsbd force-pushed the migrate_table_from_hive_to_iceber_with_timestamp3 branch from 8ec037d to 012c96e Compare October 4, 2024 15:34
@marcinsbd marcinsbd force-pushed the migrate_table_from_hive_to_iceber_with_timestamp3 branch from 012c96e to d9c5e6c Compare October 4, 2024 19:12
@marcinsbd marcinsbd force-pushed the migrate_table_from_hive_to_iceber_with_timestamp3 branch from d9c5e6c to 0cd9b4d Compare October 7, 2024 14:13
Copy link
Member

@ebyhr ebyhr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you fix CI failures?

@marcinsbd marcinsbd force-pushed the migrate_table_from_hive_to_iceber_with_timestamp3 branch from 0cd9b4d to 4d8367f Compare October 8, 2024 11:16
@marcinsbd marcinsbd force-pushed the migrate_table_from_hive_to_iceber_with_timestamp3 branch from 4d8367f to 32b4e68 Compare October 8, 2024 11:41
@raunaqmorarka raunaqmorarka force-pushed the migrate_table_from_hive_to_iceber_with_timestamp3 branch from 32b4e68 to 0ba2aad Compare October 8, 2024 12:06
@marcinsbd marcinsbd force-pushed the migrate_table_from_hive_to_iceber_with_timestamp3 branch from 0ba2aad to 01efcfd Compare October 8, 2024 12:15
@marcinsbd marcinsbd force-pushed the migrate_table_from_hive_to_iceber_with_timestamp3 branch from 01efcfd to 725c3e4 Compare October 8, 2024 12:47
@raunaqmorarka raunaqmorarka merged commit 1b758af into trinodb:master Oct 8, 2024
49 checks passed
@github-actions github-actions bot added this to the 461 milestone Oct 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed docs iceberg Iceberg connector stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed.
Development

Successfully merging this pull request may close these issues.

iceberg cannot migrate table from hive with timestamp(3) column
8 participants