[Feature Request] Delta Sync for metadata sync to HMS/Glue #1478

Closed
agrawalreetika opened this issue Nov 10, 2022 · 14 comments

Comments

@agrawalreetika

agrawalreetika commented Nov 10, 2022

Hudi has Hudi Sync, which syncs table metadata from the transaction logs to HMS/Glue.
Is there something similar for Delta tables?

@agrawalreetika agrawalreetika added the enhancement New feature or request label Nov 10, 2022
@tdas
Contributor

tdas commented Nov 10, 2022

This will be a good feature to add. However, from my past experience, HMS interactions are often flaky and buggy. Do you know how well the Hudi sync works?

@agrawalreetika
Author

@tdas Thanks for your response. I have tried Hudi Sync with Glue and haven't seen any issues so far.

@agrawalreetika
Author

Hi @tdas,
Could you please help me understand what kind of flakiness and bugs you faced earlier with HMS sync?
Is there any work done already on the sync of table metadata from transaction logs to HMS/Glue, which I could follow?

@dennyglee
Contributor

Out of curiosity, would Glue Crawler reading Delta tables work in this scenario, or would you need to go beyond that? Shameless plug of a recent session by @moomindami and myself on this topic btw https://www.youtube.com/watch?v=GrqjZoVokNQ

@agrawalreetika
Author

Sorry for the delay. Thanks, @dennyglee, for sharing the video link.
I checked the Glue crawler, which reads the metadata from the transaction logs and updates it in Glue.
But it creates symlink tables, and that does not appear to be configurable when setting up the crawler.
I also could not find any property in the metadata that identifies the table as a Delta table.

Please correct me if I am missing something here.

@dennyglee
Contributor

Oh, sorry, I had jumped too quickly ;-). Could you try AWS Glue 4.0 with Delta Lake 2.1?

@agrawalreetika
Author

@dennyglee I am using Glue Crawler to update and maintain metadata for the Delta table in the Glue catalog.
As per the given document, it looks like that is for Glue jobs that read/write data in Delta tables?

@agrawalreetika
Author

@tdas @dennyglee What should the next steps be to get this feature? I didn't find any option for metadata sync to Glue/HMS.

@dennyglee
Contributor

@agrawalreetika Thanks for your patience - some quick questions:

  • Does the Glue catalog contain the information from the Glue crawler? Or is this lacking necessary metadata?
  • Would a post-commit web hook to load Glue, HMS, etc. be a potential solution?

@agrawalreetika
Author

Hi @dennyglee, Thanks for your response.

  • Yes, the Glue catalog stores the metadata that Glue Crawler reads from the Delta table path, but the crawler creates a symlink table. There is no option in Glue Crawler to avoid that.
  • I am not sure about a post-commit web hook, but a configurable table type (metadata) option in Glue Crawler would help. That is specific to Glue, though. I was looking for something similar to the Hudi Sync tool, which could be used to sync metadata from the transaction logs to HMS/Glue.
    Please let me know if you have any other questions.
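
To make the request concrete, here is a rough sketch of the kind of step such a sync tool might perform: read the latest `metaData` action from the Delta transaction log and turn its schema into catalog-style column definitions. This is an illustration only, not an existing Delta or Glue API. The function names `latest_metadata` and `to_catalog_columns` are hypothetical, the `Name`/`Type` dictionary shape is assumed to resemble what a Glue or HMS client would accept, and only primitive Delta types are handled.

```python
import json

# Delta primitive type names mapped to Hive/Glue-style type names.
# Assumption: primitives only; decimal(p,s) is passed through unchanged.
DELTA_TO_HIVE = {
    "string": "string", "long": "bigint", "integer": "int",
    "short": "smallint", "byte": "tinyint", "float": "float",
    "double": "double", "boolean": "boolean", "binary": "binary",
    "date": "date", "timestamp": "timestamp",
}


def latest_metadata(commit_files):
    """Return the most recent metaData action.

    commit_files: contents of _delta_log commit files (newline-delimited
    JSON, one action per line), ordered oldest to newest.
    """
    latest = None
    for content in commit_files:
        for line in content.splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "metaData" in action:
                latest = action["metaData"]
    return latest


def to_catalog_columns(metadata):
    """Split the Delta schema into (data columns, partition columns)."""
    schema = json.loads(metadata["schemaString"])
    partition_names = set(metadata.get("partitionColumns", []))
    columns, partitions = [], []
    for field in schema["fields"]:
        delta_type = field["type"]
        if not isinstance(delta_type, str):
            # Nested struct/array/map types are out of scope for this sketch.
            raise ValueError(f"unsupported nested type for {field['name']}")
        entry = {
            "Name": field["name"],
            "Type": DELTA_TO_HIVE.get(delta_type, delta_type),
        }
        if field["name"] in partition_names:
            partitions.append(entry)
        else:
            columns.append(entry)
    return columns, partitions
```

A real sync would still need to list and read the log files (including checkpoints), call the catalog client, and record somewhere in the table properties that the table is a Delta table.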

@agrawalreetika
Author

Hi @dennyglee, just checking in: do you need any other details from my side?

@dhruvarya-db
Collaborator

Hi all, I have started working on this issue.

@fuyun2024

I am looking forward to the completion of this feature.

@vkorukanti
Collaborator

This is resolved in #2409
