Use createOrOverwrite to create transaction log files on s3 #20945

Merged
1 commit merged into trinodb:master on Mar 6, 2024

findinpath (Contributor) commented Mar 5, 2024

Description

Follow-up to e178494 to ensure that Delta Lake transaction log files are written atomically on S3, so that a partially written (half-baked) file never ends up in storage, which would lead to table corruption.
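To make the "half-baked" failure mode concrete, here is a toy model. It assumes a hypothetical object store (`ToyObjectStore`, not a Trino or AWS class) in which a streaming writer publishes whatever it has buffered when `close()` is called, while a single-shot put publishes a complete byte array in one step. It illustrates the failure mode described above; it is not a description of Trino's actual S3 output stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory object store used only for illustration.
final class ToyObjectStore {
    private final Map<String, byte[]> objects = new HashMap<>();

    // Streaming write: whatever was written before close() becomes the object.
    OutputStream openForWrite(String key) {
        return new ByteArrayOutputStream() {
            @Override
            public void close() {
                objects.put(key, toByteArray());
            }
        };
    }

    // Single-shot write: the complete array is published in one step.
    void put(String key, byte[] data) {
        objects.put(key, data.clone());
    }

    byte[] get(String key) {
        return objects.get(key);
    }

    public static void main(String[] args) {
        ToyObjectStore store = new ToyObjectStore();
        byte[] entry = "{\"commitInfo\":{}}\n{\"add\":{}}\n".getBytes();

        // A failure after the first chunk still lets try-with-resources call
        // close(), which publishes a truncated log entry.
        try (OutputStream out = store.openForWrite("00000000000000000001.json")) {
            out.write(entry, 0, 10);
            throw new IOException("simulated failure mid-write");
        } catch (IOException e) {
            // ignored for the demo
        }
        System.out.println(store.get("00000000000000000001.json").length); // prints 10

        // Handing over the whole entry at once leaves no partial state.
        store.put("00000000000000000002.json", entry);
        System.out.println(Arrays.equals(store.get("00000000000000000002.json"), entry)); // prints true
    }
}
```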

Additional context and related issues

#20913

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.

cla-bot added the cla-signed label Mar 5, 2024
findinpath self-assigned this Mar 5, 2024
findinpath added the delta-lake (Delta Lake connector) label and removed the cla-signed label Mar 5, 2024
findinpath requested review from electrum, findepi and ebyhr on Mar 5, 2024 21:20
```diff
-        outputStream.write(entryContents);
-    }
+    // write transaction log entry atomically by keeping in mind that S3 does not support creating files exclusively
+    fileSystem.newOutputFile(newLogEntryPath).createOrOverwrite(entryContents);
```
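The change passes the complete `entryContents` array to `createOrOverwrite` in a single call instead of streaming it. For comparison, a local filesystem does not make a plain write atomic; the usual way to get the same all-or-nothing create-or-overwrite there is to write to a temporary file and atomically rename it over the target. The sketch below shows that local-filesystem analog. `AtomicLocalWriter` is a hypothetical helper, not part of Trino's file system API, and Trino's S3 implementation is not shown here.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: readers see either the old content or the complete
// new content of `target`, never a partially written file.
final class AtomicLocalWriter {
    private AtomicLocalWriter() {}

    static void createOrOverwrite(Path target, byte[] data) throws IOException {
        // The temp file must live on the same filesystem for the rename to be atomic.
        Path dir = target.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, ".tmp-", ".json");
        try {
            Files.write(tmp, data);
            // With ATOMIC_MOVE, other copy options are ignored; on POSIX
            // filesystems the rename replaces an existing target, while other
            // platforms may throw an IOException instead.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            Files.deleteIfExists(tmp); // no-op after a successful move
        }
    }
}
```

This sketch does not fsync the temporary file before the rename, so it is atomic with respect to concurrent readers but not necessarily durable across a machine crash.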
Member

This no longer creates an empty file, right?

Contributor Author

No. The create-or-overwrite operation (S3 does not support exclusive file creation) should be atomic.
The actual concern here was to avoid a half-baked file, i.e. one that does not contain all the bytes of the content.
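The thread contrasts two write modes: exclusive creation, which fails if the file already exists, and create-or-overwrite, where the last writer wins. Since S3 lacks the first mode, the transaction log write can only be made atomic, not exclusive. On a local filesystem `java.nio` exposes both modes, which makes the difference easy to see. `WriteModes` is a hypothetical illustration and does not reflect how Trino implements either mode.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration of the two write modes on a local filesystem.
final class WriteModes {
    private WriteModes() {}

    // Exclusive creation: fails if the file already exists, so two writers
    // racing for the same log version cannot both succeed.
    static boolean createExclusive(Path file, byte[] data) throws IOException {
        try {
            Files.write(file, data, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
            return true;
        } catch (FileAlreadyExistsException e) {
            return false;
        }
    }

    // Create-or-overwrite: always succeeds and the last writer wins. This plain
    // version truncates then writes, so a concurrent reader can observe a
    // partially written file.
    static void createOrOverwrite(Path file, byte[] data) throws IOException {
        Files.write(file, data); // defaults: CREATE, TRUNCATE_EXISTING, WRITE
    }
}
```

A second `createExclusive` call for the same path returns `false`, while `createOrOverwrite` silently replaces the existing content.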

ebyhr (Member) commented Mar 6, 2024

/test-with-secrets sha=e3fcd2514570cc0cc7a3ad3f1ab4b8ac216a9fc7

github-actions bot commented Mar 6, 2024

The CI workflow run with tests that require additional secrets has been started: https://github.com/trinodb/trino/actions/runs/8164693715

electrum merged commit 4ebe042 into trinodb:master Mar 6, 2024
24 checks passed
github-actions bot added this to the 440 milestone Mar 6, 2024
Labels
delta-lake Delta Lake connector
5 participants