Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow disabling client segment acknowledgment #24125

Merged
merged 1 commit into from
Nov 13, 2024

Conversation

wendigo
Copy link
Contributor

@wendigo wendigo commented Nov 13, 2024

This allows for storage-wide expiration policies to be implemented
that doesn't require explicit API call to remove an expired segment.

Description

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Nov 13, 2024
@wendigo wendigo requested a review from losipiuk November 13, 2024 17:04
@martint
Copy link
Member

martint commented Nov 13, 2024

What's the motivation for this? Does "files will be deleted by the pruning process" mean they are no longer deleted as soon as the client doesn't need them anymore? If so, this can cause the spool to run out of space for certain workloads if the pruning process is not tuned carefully.

@wendigo wendigo force-pushed the serafin/make-ack-uri-optional branch from c90c3a8 to bae01aa Compare November 13, 2024 17:49
@wendigo wendigo changed the title Disable explicit ack for direct storage retrieval Allow disabling client segment acknowledgment Nov 13, 2024
Copy link
Member

@losipiuk losipiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with configuration looks good to me

@losipiuk
Copy link
Member

What's the motivation for this? Does "files will be deleted by the pruning process" mean they are no longer deleted as soon as the client doesn't need them anymore? If so, this can cause the spool to run out of space for certain workloads if the pruning process is not tuned carefully.

IIUC from offline conversation the motivation is to support deployment cases when explicit deletion of data is not supported. Could be due to technical implications (filesystem implementation etc); or due to process built around Trino, if we want to be sure that data returned by query is available for multiple third parties for prolonged time.

@losipiuk
Copy link
Member

What's the motivation for this? Does "files will be deleted by the pruning process" mean they are no longer deleted as soon as the client doesn't need them anymore? If so, this can cause the spool to run out of space for certain workloads if the pruning process is not tuned carefully.

IIUC from offline conversation the motivation is to support deployment cases when explicit deletion of data is not supported. Could be due to technical implications (filesystem implementation etc); or due to process built around Trino, if we want to be sure that data returned by query is available for multiple third parties for prolonged time.

@wendigo maybe put motivation in commit message/ PR desc to make it more obvious for others/future self.

@martint
Copy link
Member

martint commented Nov 13, 2024

I don’t think we should be adding yet another configuration knob. I’d like to understand more why this is needed, what problem it’s solving, and think about whether there may be a better solution.

@electrum
Copy link
Member

electrum commented Nov 13, 2024

Is this related to using spooling for result set caching?

@wendigo
Copy link
Contributor Author

wendigo commented Nov 13, 2024

@electrum yes, exactly

@wendigo
Copy link
Contributor Author

wendigo commented Nov 13, 2024

@martint @electrum @losipiuk I've just checked that having a bucket retention policy makes basically deletes free, while explicit delete makes it incur an additional API call cost.

This allows for storage-wide expiration policies to be implemented
that doesn't require explicit API call to remove an expired segment.
@wendigo wendigo force-pushed the serafin/make-ack-uri-optional branch from bae01aa to 1072b38 Compare November 13, 2024 18:32
@wendigo wendigo merged commit 2125ef9 into master Nov 13, 2024
102 of 103 checks passed
@wendigo wendigo deleted the serafin/make-ack-uri-optional branch November 13, 2024 19:16
@github-actions github-actions bot added this to the 465 milestone Nov 13, 2024
@martint
Copy link
Member

martint commented Nov 14, 2024

This allows for storage-wide expiration policies to be implemented that doesn't require explicit API call to remove an expired segment.

Doing this at the protocol level doesn't seem the right place. If the underlying spooling implementation doesn't support on-demand deletion, or the implementation wants to have another policy, this should be done within the spool implementation. It should be transparent to the protocol, which should always indicate when it's done using a segment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants