[HUDI-3743] Support DELETE_PARTITION for metadata table #5169

Merged (8 commits) on Apr 1, 2022

Conversation

codope
Member

@codope codope commented Mar 29, 2022

What is the purpose of the pull request

To drop any metadata partition (index), we can reuse the DELETE_PARTITION operation in the metadata table. Once this is in place, we can support drop index (with a table config update) for the async metadata indexer.

Brief change log

  • Add a new API in HoodieTableMetadataWriter
  • Currently only supported for the Spark metadata writer

Verify this pull request

Added a unit test in TestHoodieBackedMetadata that creates multiple metadata partitions and then drops one. It asserts that no file slices remain from the dropped partition.
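
For readers who want the shape of that check without the Hudi test harness, here is a minimal, self-contained sketch. It is a toy in-memory model, not the code in this PR: `MetadataPartitionDropSketch`, `PartitionType`, `addFileSlice`, `deletePartitions`, and `fileSlicesOf` are all hypothetical names made up for illustration, and the real writer API and test may look different.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model of a metadata table. None of these types are Hudi classes.
final class MetadataPartitionDropSketch {

    // Hypothetical stand-in for the kinds of metadata partitions (indexes).
    enum PartitionType { FILES, COLUMN_STATS, BLOOM_FILTERS }

    // Maps each metadata partition to the names of its file slices.
    private final Map<PartitionType, List<String>> fileSlices =
            new EnumMap<>(PartitionType.class);

    void addFileSlice(PartitionType partition, String sliceName) {
        fileSlices.computeIfAbsent(partition, p -> new ArrayList<>()).add(sliceName);
    }

    // Hypothetical analogue of a DELETE_PARTITION operation:
    // removes every file slice that belongs to the given partitions.
    void deletePartitions(List<PartitionType> partitions) {
        for (PartitionType partition : partitions) {
            fileSlices.remove(partition);
        }
    }

    // Returns the file slices of a partition, or an empty list if it was dropped or never created.
    List<String> fileSlicesOf(PartitionType partition) {
        return fileSlices.getOrDefault(partition, Collections.emptyList());
    }

    public static void main(String[] args) {
        MetadataPartitionDropSketch table = new MetadataPartitionDropSketch();

        // Create multiple metadata partitions.
        table.addFileSlice(PartitionType.FILES, "files-0000");
        table.addFileSlice(PartitionType.COLUMN_STATS, "col-stats-0000");
        table.addFileSlice(PartitionType.COLUMN_STATS, "col-stats-0001");

        // Drop one of them.
        table.deletePartitions(Arrays.asList(PartitionType.COLUMN_STATS));

        // The dropped partition should have no file slices; the other is untouched.
        System.out.println("column_stats slices: " + table.fileSlicesOf(PartitionType.COLUMN_STATS));
        System.out.println("files slices: " + table.fileSlicesOf(PartitionType.FILES));
    }
}
```

The point of the sketch is only the assertion pattern: after the drop, listing file slices for the dropped partition returns nothing, while the remaining partitions are unchanged.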

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@codope codope force-pushed the metadata-delete-partition branch from f13c723 to 25b886d Compare March 29, 2022 13:19
@XuQianJin-Stars
Contributor

LGTM, In addition, I want to wait for #4489 to merge in and then merge this?

@codope
Member Author

codope commented Mar 30, 2022

LGTM, In addition, I want to wait for #4489 to merge in and then merge this?

Got it. I'll wait for that to land first.

@codope codope added the priority:critical production down; pipelines stalled; Need help asap. label Mar 30, 2022
@nsivabalan nsivabalan added priority:blocker and removed priority:critical production down; pipelines stalled; Need help asap. labels Mar 30, 2022
Contributor

@nsivabalan nsivabalan left a comment

LGTM

@codope codope force-pushed the metadata-delete-partition branch from 0912c87 to dfe7b9c Compare March 31, 2022 08:31
@codope codope force-pushed the metadata-delete-partition branch from 338c69a to 6c89a72 Compare March 31, 2022 20:34
@hudi-bot

hudi-bot commented Apr 1, 2022

CI report:

Bot commands: @hudi-bot supports the following commands:
  • @hudi-bot run azure: re-run the last Azure build

@nsivabalan nsivabalan merged commit a048e94 into apache:master Apr 1, 2022
try {
  // Because the partition of BaseTableMetadata has been deleted,
  // all partition information can only be obtained from FileSystemBackedTableMetadata.
  FileSystemBackedTableMetadata fsBackedTableMetadata = new FileSystemBackedTableMetadata(context,
Contributor

@codope: do we really need to go with FileSystemBackedTableMetadata here? This is causing performance issues in some cases when getPartitionPathsForFullCleaning is used.
#6373

nsivabalan added a commit that referenced this pull request Apr 21, 2023
#8384)

- Looks like when we fallback to full partition cleaning in clean planner, we do FS based listing even though metadata is enabled. It was added in #5169 mainly due to how delete_partition was designed back then. Later delete_partition logic evolved and now we should be good to make this metadata based if applicable.
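
To make the trade-off in that commit message concrete, here is a rough sketch of choosing the listing source for full partition cleaning. It assumes nothing about the actual clean planner code: `FullCleanListingSketch`, `partitionPathsForFullCleaning`, and both suppliers are hypothetical names, and the partition paths are made-up sample values.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Toy illustration only. This is not the Hudi clean planner.
final class FullCleanListingSketch {

    // Hypothetical chooser: use the metadata-backed listing when the metadata
    // table is enabled, and fall back to a file system listing otherwise.
    static List<String> partitionPathsForFullCleaning(
            boolean metadataEnabled,
            Supplier<List<String>> metadataListing,
            Supplier<List<String>> fileSystemListing) {
        return metadataEnabled ? metadataListing.get() : fileSystemListing.get();
    }

    public static void main(String[] args) {
        // Counts how often the (potentially slow) file system listing is invoked.
        int[] fileSystemCalls = {0};

        Supplier<List<String>> fromMetadata =
                () -> Arrays.asList("2022/03/29", "2022/03/30");
        Supplier<List<String>> fromFileSystem = () -> {
            fileSystemCalls[0]++;
            return Arrays.asList("2022/03/29", "2022/03/30");
        };

        List<String> paths = partitionPathsForFullCleaning(true, fromMetadata, fromFileSystem);

        System.out.println("partitions: " + paths);
        System.out.println("file system listings performed: " + fileSystemCalls[0]);
    }
}
```

Passing the listings as suppliers keeps the expensive file system scan lazy, so it only runs when the metadata-based path is not applicable.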
yihua pushed a commit to yihua/hudi that referenced this pull request May 15, 2023
apache#8384)

- Looks like when we fallback to full partition cleaning in clean planner, we do FS based listing even though metadata is enabled. It was added in apache#5169 mainly due to how delete_partition was designed back then. Later delete_partition logic evolved and now we should be good to make this metadata based if applicable.