Guard Hive's OPTIMIZE table procedure with session property #9761

Merged
merged 1 commit into from
Oct 25, 2021

Conversation

losipiuk
Member

The OPTIMIZE procedure is disabled by default. Even though the code is
written to avoid data loss, calling the procedure is inherently unsafe
because changes to a Hive table are not committed transactionally.
If Trino loses connectivity to the HDFS cluster while deleting data files
after the optimize step, duplicate rows will be left in the table and the
user will have to clean them up manually.
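
To illustrate the failure mode described above, here is a minimal toy sketch in Python. It is not Trino's implementation: the in-memory `files` dict, the `optimize` function, the `ConnectionLost` exception, and the `fail_after_deletes` knob are all hypothetical names made up for this example. It only assumes the general shape "write the merged file first, then delete the originals without a transaction".

```python
class ConnectionLost(Exception):
    """Raised by the toy file store to mimic losing the storage connection."""


def optimize(files, fail_after_deletes=None):
    """Merge all files into one, then delete the originals one by one.

    `files` maps a file name to its list of rows and is modified in place.
    `fail_after_deletes` (hypothetical knob) aborts after that many deletes.
    """
    originals = sorted(files)
    merged = [row for name in originals for row in files[name]]
    # Step 1: write the merged file. The originals still exist at this point.
    files["merged"] = merged
    # Step 2: delete the originals. Nothing makes this loop atomic.
    for deleted, name in enumerate(originals):
        if fail_after_deletes is not None and deleted >= fail_after_deletes:
            raise ConnectionLost(f"lost connection before deleting {name}")
        del files[name]


def table_rows(files):
    """Return every row visible in the table, across all remaining files."""
    return sorted(row for rows in files.values() for row in rows)


def demo():
    # Happy path: all originals are deleted, each row appears once.
    ok = {"a": [1, 2], "b": [3], "c": [4]}
    optimize(ok)

    # Failure after one delete: "b" and "c" survive next to "merged".
    broken = {"a": [1, 2], "b": [3], "c": [4]}
    try:
        optimize(broken, fail_after_deletes=1)
    except ConnectionLost:
        pass
    return table_rows(ok), table_rows(broken)
```

In this sketch no row is ever lost, because the merged file is written before anything is deleted, but an interrupted delete phase leaves rows from the surviving originals visible twice. That is the kind of state that would need manual cleanup.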

@cla-bot cla-bot bot added the cla-signed label Oct 25, 2021
@losipiuk losipiuk requested a review from findepi October 25, 2021 14:04
@losipiuk losipiuk force-pushed the lo/guard-hive-optimize branch from d791773 to 2568c61 Compare October 25, 2021 14:06
@losipiuk losipiuk merged commit 7b727c9 into trinodb:master Oct 25, 2021
@github-actions github-actions bot added this to the 364 milestone Oct 25, 2021
@losipiuk losipiuk mentioned this pull request Oct 26, 2021
12 tasks
2 participants