Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revert "Add RetryMode to ConnectorTableFunction.analyze" #20777

Merged
merged 1 commit into from
Mar 1, 2024

Conversation

ebyhr
Copy link
Member

@ebyhr ebyhr commented Feb 20, 2024

Description

Revert "Add RetryMode to ConnectorTableFunction.analyze"

@cla-bot cla-bot bot added the cla-signed label Feb 20, 2024
@ebyhr ebyhr requested review from martint and findepi February 20, 2024 23:17
@findepi
Copy link
Member

findepi commented Feb 21, 2024

This PR removes possibility to have a table function with side effects. We should have a replacement APIs for such case.

@martint
Copy link
Member

martint commented Feb 21, 2024

Table functions are not currently meant to have side effects. For that, we need to add a lifecycle/commit protocol.

@findepi
Copy link
Member

findepi commented Feb 22, 2024

Table functions are not currently meant to have side effects.

Are we talking about table functions that exist in Trino, or about table functions that are possible to implement in 3rd party plugins?

AFAIK it's possible to implement table functions with side effects, but not everything is possible to implement

we need to add a lifecycle/commit protocol.

❤️ that would be awesome to have and would make spectrum of possible functions wider!

@martint
Copy link
Member

martint commented Feb 22, 2024

AFAIK it's possible to implement table functions with side effects, but not everything is possible to implement

It is possible, but it might not be safe to use them in all settings. So from the point of view of Trino, that usage is "unsupported". If someone wanted to disable such functions, they can do it via access control.

@findepi
Copy link
Member

findepi commented Feb 23, 2024

It is possible, but it might not be safe to use them in all settings.

Oh!... what are such setting?

@martint martint merged commit c578342 into master Mar 1, 2024
100 checks passed
@martint martint deleted the ebi/revert-spi branch March 1, 2024 16:18
@github-actions github-actions bot added this to the 440 milestone Mar 1, 2024
@findepi
Copy link
Member

findepi commented Mar 5, 2024

It is possible, but it might not be safe to use them in all settings.

Oh!... what are such setting?

It might have been discussed offline, but it would be good to clarify here as well.

@martint
Copy link
Member

martint commented Mar 5, 2024

It might have been discussed offline, but it would be good to clarify here as well.

Whenever a function has side effects, you need to consider:

  • What happens if the query fails mid-execution. Do you have out-of-band mechanisms to clean up those side effects?
  • If you're running with retries (either within Trino or via external mechanisms such as a client automatically retrying the query), the side effects need to be idempotent, or you may see duplicated effects.

If any of these apply to you and you have a function that may have side effects, you should consider whether you're ok with the potential consequences of those side effects (they may be benign), or whether you want to disallow those functions for the users/clusters/use cases that might be problematic.

@findepi
Copy link
Member

findepi commented Mar 5, 2024

Most of the same challenges apply to inserts into Hive tables or CTAS statements (and perhaps others - atomicity is currently not guaranteed by Trino given that some connectors operate on multiple systems such as storage and metastore)
External retries are not safe if the client does not know & understand what was the failure mode and whether some form of a cleanup is necessary.
Internal retries is something Trino does on its own (given they are enabled), and knowing whether something supports internal retries is important. That's why RetryMode was added to insert/ctas/others SPI methods. That's also why it should be in functions SPI methods.

@martint
Copy link
Member

martint commented Mar 5, 2024

That's why RetryMode was added to insert/ctas/others SPI methods. That's also why it should be in functions SPI methods.

Those are not the same. From the point of view of trino, Insert/ctas/etc are expected to have side effects. Functions are not. Writing functions with side effects is “at your own risk”.

That’s why earlier I said that to support side effects we need a commit protocol, just like insert/ctas/etc. have. Until that happens, it doesn’t make sense to have functions know about concepts related to side effects.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

3 participants