Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HUDI-3404] Automatically adjust write configs based on metadata table and write concurrency mode #4975

Conversation

yihua
Copy link
Contributor

@yihua yihua commented Mar 8, 2022

What is the purpose of the pull request

This PR adds logic to automatically adjust write configs based on metadata table enablement and write concurrency mode, to prevent possible missing configs that are required for enabling metadata table and multi-writer. Write configs in scope are:

hoodie.metadata.enable (read-only in the new logic)
hoodie.write.concurrency.mode (can be auto updated in the new logic)
hoodie.write.lock.provider (can be auto updated in the new logic)
hoodie.cleaner.policy.failed.writes (can be auto updated in the new logic)

The following scenarios are checked:

Metadata Table Async Table Services Lock Provider Set by User Action
Enabled Enabled No Set hoodie.write.concurrency.mode=optimistic_concurrency_control, hoodie.write.lock.provider=org.apache.hudi.client.transaction.lock.InProcessLockProvider and hoodie.cleaner.policy.failed.writes=LAZY
Enabled Enabled Yes Set hoodie.cleaner.policy.failed.writes=LAZY if user has set hoodie.write.concurrency.mode=optimistic_concurrency_control
Enabled Disabled Yes/No Set hoodie.cleaner.policy.failed.writes=LAZY if user has set hoodie.write.concurrency.mode=optimistic_concurrency_control
Disabled Enabled/Disabled Yes/No Set hoodie.cleaner.policy.failed.writes=LAZY if user has set hoodie.write.concurrency.mode=optimistic_concurrency_control

Note that the logic also depends on Hudi table type, since by default, MOR table has async compaction enabled, while COW table has all inline table services.

Brief change log

  • Adds auto config adjustment to in HoodieWriteConfig.
  • Adds unit tests for different user-defined config scenarios in TestHoodieWriteConfig to make sure the logic is correct.

Verify this pull request

This change adds tests in TestHoodieWriteConfig to verify the logic.

Committer checklist

  • Has a corresponding JIRA in PR title & commit

  • Commit message is descriptive of the change

  • CI is green

  • Necessary doc changes done or have another open PR

  • For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

@nsivabalan nsivabalan self-assigned this Mar 8, 2022
Copy link
Contributor

@nsivabalan nsivabalan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if I am not wrong, we discussed that w/ deltatstreamer, its not easy to distinguish if async table services are enabled since no explicit configs are set. So, don't we need some fix in deltastreamer class towards this ?

@nsivabalan
Copy link
Contributor

based on our discussion, lets add table type check and check for compaction only for MOR.
bcoz, incase of Deltastreamer, no explicit compaction configs are set, but still we need to detect if async table services are enabled or not.

@yihua
Copy link
Contributor Author

yihua commented Mar 17, 2022

based on our discussion, lets add table type check and check for compaction only for MOR. bcoz, incase of Deltastreamer, no explicit compaction configs are set, but still we need to detect if async table services are enabled or not.

I added the logic based on table type. By default COW table should not have any async table service enabled, while MOR table have async compaction enabled. So the auto config adjustment for metadata table is different for different types of tables.

@yihua yihua force-pushed the HUDI-3404-auto-adjust-metadata-table-related-configs branch from 0c1aad3 to 1bc5f11 Compare March 17, 2022 00:14
@hudi-bot
Copy link

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@nsivabalan
Copy link
Contributor

Good job on the patch Ethan!

@nsivabalan nsivabalan merged commit 95e6e53 into apache:master Mar 17, 2022
vingov pushed a commit to vingov/hudi that referenced this pull request Apr 3, 2022
stayrascal pushed a commit to stayrascal/hudi that referenced this pull request Apr 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants