[BUG] Metadata wiped during compaction #1898

Open
1 of 3 tasks
boonware opened this issue Jul 11, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@boonware

boonware commented Jul 11, 2023

Bug

Describe the problem

My application writes to a table and also stores metadata in the table. After I ran the compaction operation on the table, the application tried to read the most recent metadata from the table history, but the metadata is now empty.

Steps to reproduce

  1. Write a DataFrame to a table and store metadata:
import json
metadata_dict = {'a': 'b'}
df.write.format('delta').mode('append').option('userMetadata', json.dumps(metadata_dict)).save(table_url)
  2. Run compaction on the same table:
from delta.tables import DeltaTable
DeltaTable.forPath(spark, table_url).optimize().executeCompaction()
  3. Attempt to read metadata from most recent table history:
metadata = DeltaTable.forPath(spark, table_url).history(1).first()['userMetadata']
assert metadata is None

Observed results

The userMetadata of the most recent table history entry is now None.

Expected results

The userMetadata of the most recent table history entry is not None, and I can access the metadata stored when my application wrote a DataFrame to the table. I assume the compaction operation also writes to the table and that these writes do not include my metadata. How can I access or preserve my metadata after a compaction operation?

Further details

N/A

Environment information

  • Delta Lake version: 2.1.0
  • Spark version: 3.3.2
  • Scala version: 2.12

Willingness to contribute

The Delta Lake Community encourages bug fix contributions. Would you or another member of your organization be willing to contribute a fix for this bug to the Delta Lake code base?

  • Yes. I can contribute a fix for this bug independently.
  • Yes. I would be willing to contribute a fix for this bug with guidance from the Delta Lake community.
  • No. I cannot contribute a bug fix at this time.
boonware added the bug label Jul 11, 2023
@felipepessoto
Contributor

felipepessoto commented Jul 13, 2023

Isn't userMetadata per commit? In that case the commit from step 2 won't have any.
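
If userMetadata is per commit, the value from step 1 should still be attached to the earlier write commit, just not to the newest one. A minimal sketch of reading it back by walking the history, assuming the rows are ordered newest first and the metadata was written as a JSON string; `latest_user_metadata` is a hypothetical helper, and the `history` list below is a stand-in for collected history rows:

```python
import json
from typing import Any, Iterable, Optional


def latest_user_metadata(history_rows: Iterable[Any]) -> Optional[dict]:
    """Return the parsed userMetadata of the newest commit that has one.

    Assumes rows are ordered newest first and that userMetadata,
    when present, is a JSON string.
    """
    for row in history_rows:
        raw = row["userMetadata"]
        if raw:  # skip commits (such as compaction) that carry no metadata
            return json.loads(raw)
    return None


# Stand-in for collected history rows: the OPTIMIZE commit has no userMetadata.
history = [
    {"version": 1, "operation": "OPTIMIZE", "userMetadata": None},
    {"version": 0, "operation": "WRITE", "userMetadata": '{"a": "b"}'},
]
print(latest_user_metadata(history))
```

Against a real table the rows would come from something like `DeltaTable.forPath(spark, table_url).history().select('version', 'userMetadata').collect()`, since Spark rows can be indexed by column name in the same way.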

I think you need to consider another solution, like table properties.
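
A rough sketch of the table-properties route, assuming the metadata fits in string key/value pairs. `set_properties_sql` and the `myapp.` key prefix are hypothetical names for illustration; the helper only builds the statement, which would then be run with `spark.sql(stmt)`:

```python
from typing import Mapping


def set_properties_sql(table_path: str, props: Mapping[str, str]) -> str:
    """Build an ALTER TABLE statement that stores props as table properties."""

    def quote(text: str) -> str:
        # Escape single quotes, assuming Spark's default backslash escaping.
        return "'" + text.replace("'", "\\'") + "'"

    pairs = ", ".join(f"{quote(k)} = {quote(v)}" for k, v in props.items())
    return f"ALTER TABLE delta.`{table_path}` SET TBLPROPERTIES ({pairs})"


# Example path and property; both are placeholders.
stmt = set_properties_sql("/tmp/events", {"myapp.a": "b"})
print(stmt)
```

The properties can be read back with `SHOW TBLPROPERTIES` on the same `delta.` path. Because they are stored with the table's metadata rather than with an individual commit, a later compaction commit should not clear them, though that is worth verifying on your Delta Lake version.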

@scottsand-db
Collaborator

+1 ^
