-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support skip archive for Glue in Iceberg #14336
Conversation
@electrum Can you add |
For what it's worth, I think we should just get rid of glue archiving altogether. Either that, or at some point we should make it default skip archiving. Otherwise, every Trino Iceberg cluster will blow up after so many thousands of transactions in Glue (which shouldn't be the default behavior out of the box). |
9f653a8
to
affa7c8
Compare
To support removing old archives, we need two more API calls (getTableVersions & batchDeleteTableVersion{Async}) & two more IAM permissions (GetTableVersions & BatchDeleteTableVersion). I lean toward enabling the flag by default in the future. |
|
||
public class IcebergGlueCatalogConfig | ||
{ | ||
private boolean skipArchive; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why false
by default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For backward compatibility. I can enable by default if there's no objection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's unlikely that most people care about the Glue table versions for Iceberg tables, but it's also not unreasonable that someone uses that.
i think we should have it disabled (false) for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, disabled by default makes sense.
Do you plan any follow-ups on hive/delta as well? |
48e5dfa
to
616e5b7
Compare
AWSCredentialsProvider credentialsProvider) | ||
{ | ||
this.fileSystemFactory = requireNonNull(fileSystemFactory, "fileSystemFactory is null"); | ||
this.stats = requireNonNull(stats, "stats is null"); | ||
requireNonNull(glueConfig, "glueConfig is null"); | ||
requireNonNull(credentialsProvider, "credentialsProvider is null"); | ||
this.glueClient = createAsyncGlueClient(glueConfig, credentialsProvider, Optional.empty(), stats.newRequestMetricsCollector()); | ||
this.glueClient = createAsyncGlueClient(glueConfig, credentialsProvider, Optional.of(new SkipArchiveRequestHandler(icebergGlueConfig.isSkipArchive())), stats.newRequestMetricsCollector()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like we want to unify this with plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/glue/TrinoGlueCatalogFactory.java by providing some Glue client provider interface, especially that createAsyncGlueClient
has this parameter Optional
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a preparatory commit to unify them. Let me know if it's different from your intention.
616e5b7
to
81d22c7
Compare
public class GlueClientProvider | ||
implements Provider<AWSGlueAsync> | ||
{ | ||
private final AWSGlueAsync glueClient; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think the Provider.get
should do the instantiation so that the provided instance can be bound with various scopes.
ie. this works correctly only because the Module uses SINGLETON.
|
||
public class IcebergGlueCatalogConfig | ||
{ | ||
private boolean skipArchive; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's unlikely that most people care about the Glue table versions for Iceberg tables, but it's also not unreasonable that someone uses that.
i think we should have it disabled (false) for now
if (request instanceof UpdateTableRequest updateTableRequest) { | ||
return updateTableRequest.withSkipArchive(skipArchive); | ||
} | ||
return request; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glue Catalog uses only a couple of different requests.
Can we enumerate all these, to ensure this class gets updated if any new request variant is added?
for example, UpdateTableRequestV2 or something?
@ebyhr sorry for the delay. Done now. |
152f88c
to
bbb3182
Compare
bbb3182
to
fbcc74b
Compare
@ebyhr does this need to be included in docs as a new property, or is this just propagation of a property that already exists elsewhere? |
This one needs to be documented. (follow-up PR) |
Description
TestIcebergGlueCatalogSkipArchive.testSkipArchive
fails until updating IAM policy.Fixes #13413
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text: