Support skip archive for Glue in Iceberg #14336

Merged Oct 27, 2022 (2 commits)

Conversation

ebyhr (Member) commented Sep 28, 2022

Description

TestIcebergGlueCatalogSkipArchive.testSkipArchive fails until the IAM policy is updated.
Fixes #13413

Release notes

( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Iceberg
* Add support for skipping archive when committing a table in Glue when
  the `iceberg.glue.skip-archive` configuration property is set to true. ({issue}`13413`)

ebyhr (Member, Author) commented Sep 29, 2022

@electrum Can you add the glue:GetTableVersions and glue:BatchDeleteTableVersion actions in IAM for the presto-ci account?

erichwang (Contributor) commented Oct 12, 2022

For what it's worth, I think we should just get rid of Glue archiving altogether. Either that, or at some point we should make skipping the archive the default. Otherwise, every Trino Iceberg cluster will blow up after some thousands of transactions in Glue (which shouldn't be the default behavior out of the box).

ebyhr force-pushed the ebi/iceberg-glue-skip-archive branch 2 times, most recently from 9f653a8 to affa7c8, October 12, 2022 04:54

ebyhr (Member, Author) commented Oct 12, 2022

To support removing old archives, we need two more API calls (getTableVersions & batchDeleteTableVersion{Async}) & two more IAM permissions (GetTableVersions & BatchDeleteTableVersion). I lean toward enabling the flag by default in the future.


public class IcebergGlueCatalogConfig
{
private boolean skipArchive;
Contributor

Why false by default?

Member Author

For backward compatibility. I can enable by default if there's no objection.

Member

It's unlikely that most people care about the Glue table versions for Iceberg tables, but it's also not unreasonable that someone uses them.

I think we should have it disabled (false) for now.

Member

Agreed, disabled by default makes sense.
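
To make the "disabled by default" outcome of this thread concrete, here is a minimal sketch of such a config holder. It is a plain-Java stand-in rather than the PR's actual class: the name IcebergGlueCatalogConfigSketch and the chained setter are assumptions for illustration, and the real class binds the property through the @Config annotation instead.

```java
// Sketch only: a plain-Java stand-in for the config class discussed above.
// The real class is bound to "iceberg.glue.skip-archive" via @Config; that
// annotation is left out here so the example compiles with the JDK alone.
class IcebergGlueCatalogConfigSketch
{
    // Java initializes boolean fields to false, so archiving stays enabled
    // unless the property is set explicitly (the backward-compatible choice).
    private boolean skipArchive;

    public boolean isSkipArchive()
    {
        return skipArchive;
    }

    // Returning this to allow chaining is an assumption about style.
    public IcebergGlueCatalogConfigSketch setSkipArchive(boolean skipArchive)
    {
        this.skipArchive = skipArchive;
        return this;
    }

    public static void main(String[] args)
    {
        IcebergGlueCatalogConfigSketch config = new IcebergGlueCatalogConfigSketch();
        System.out.println("default skipArchive: " + config.isSkipArchive());
        System.out.println("after opt-in: " + config.setSkipArchive(true).isSkipArchive());
    }
}
```

Flipping the default later would only require initializing the field to true, which is why the choice here is mostly about compatibility rather than implementation effort.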

findinpath (Contributor) commented

Do you plan any follow-ups on hive/delta as well?

ebyhr requested review from findepi and electrum, October 13, 2022 00:03
ebyhr force-pushed the ebi/iceberg-glue-skip-archive branch 2 times, most recently from 48e5dfa to 616e5b7, October 13, 2022 12:33
            AWSCredentialsProvider credentialsProvider)
    {
        this.fileSystemFactory = requireNonNull(fileSystemFactory, "fileSystemFactory is null");
        this.stats = requireNonNull(stats, "stats is null");
        requireNonNull(glueConfig, "glueConfig is null");
        requireNonNull(credentialsProvider, "credentialsProvider is null");
-       this.glueClient = createAsyncGlueClient(glueConfig, credentialsProvider, Optional.empty(), stats.newRequestMetricsCollector());
+       this.glueClient = createAsyncGlueClient(glueConfig, credentialsProvider, Optional.of(new SkipArchiveRequestHandler(icebergGlueConfig.isSkipArchive())), stats.newRequestMetricsCollector());
Member

Looks like we want to unify this with plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/glue/TrinoGlueCatalogFactory.java by providing some Glue client provider interface, especially since createAsyncGlueClient takes this Optional parameter.

Member Author

Added a preparatory commit to unify them. Let me know if it's different from your intention.

ebyhr force-pushed the ebi/iceberg-glue-skip-archive branch from 616e5b7 to 81d22c7, October 14, 2022 03:32
public class GlueClientProvider
implements Provider<AWSGlueAsync>
{
private final AWSGlueAsync glueClient;
Member

I think Provider.get should do the instantiation, so that the provided instance can be bound with various scopes.
I.e., this works correctly only because the Module uses SINGLETON.
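
One way to read this suggestion, as a minimal sketch: the provider stores only its inputs and builds the client inside get(), leaving any caching to the binding scope. Everything below is a stand-in so it compiles without Guice or the AWS SDK: SimpleProvider mimics the Provider interface, FakeGlueClient takes the place of AWSGlueAsync, and LazyGlueClientProvider is a hypothetical name, not the class from this PR.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in for the injection framework's Provider interface (assumption).
interface SimpleProvider<T>
{
    T get();
}

// Hypothetical placeholder for AWSGlueAsync; it counts constructions so the
// difference between eager and lazy instantiation is observable.
class FakeGlueClient
{
    static final AtomicInteger CREATED = new AtomicInteger();
    final boolean skipArchive;

    FakeGlueClient(boolean skipArchive)
    {
        this.skipArchive = skipArchive;
        CREATED.incrementAndGet();
    }
}

// The provider keeps only what it needs to build a client later.
class LazyGlueClientProvider
        implements SimpleProvider<FakeGlueClient>
{
    private final boolean skipArchive;

    LazyGlueClientProvider(boolean skipArchive)
    {
        this.skipArchive = skipArchive;
    }

    @Override
    public FakeGlueClient get()
    {
        // Built on every call: a singleton scope would invoke this once and
        // cache the result, while an unscoped binding gets a fresh client.
        return new FakeGlueClient(skipArchive);
    }
}
```

With this shape, correctness no longer depends on the module choosing a singleton binding; the scope decides how many clients exist.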


if (request instanceof UpdateTableRequest updateTableRequest) {
return updateTableRequest.withSkipArchive(skipArchive);
}
return request;
Member

Glue Catalog uses only a couple of different requests.
Can we enumerate all of them, to ensure this class gets updated if any new request variant is added?
For example, UpdateTableRequestV2 or something?
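
A rough sketch of what enumerating the requests could look like: the handler lists every request type it expects and rejects anything else, so a new variant fails loudly instead of silently bypassing the flag. All types below are hypothetical stubs rather than AWS SDK classes, and the set of "known" requests is illustrative only; it is not the list the Glue catalog actually issues.

```java
// Hypothetical stand-ins for SDK request types; only the shape matters.
abstract class GlueRequestStub {}

class UpdateTableRequestStub
        extends GlueRequestStub
{
    Boolean skipArchive;

    UpdateTableRequestStub withSkipArchive(boolean skipArchive)
    {
        this.skipArchive = skipArchive;
        return this;
    }
}

class GetTableRequestStub extends GlueRequestStub {}

class CreateTableRequestStub extends GlueRequestStub {}

// Represents a request variant added after the handler was written.
class UnknownRequestStub extends GlueRequestStub {}

class StrictSkipArchiveHandler
{
    private final boolean skipArchive;

    StrictSkipArchiveHandler(boolean skipArchive)
    {
        this.skipArchive = skipArchive;
    }

    GlueRequestStub beforeExecution(GlueRequestStub request)
    {
        if (request instanceof UpdateTableRequestStub updateTableRequest) {
            return updateTableRequest.withSkipArchive(skipArchive);
        }
        // Requests known not to need the flag pass through unchanged.
        if (request instanceof GetTableRequestStub || request instanceof CreateTableRequestStub) {
            return request;
        }
        // Anything not enumerated above is rejected, forcing a code update.
        throw new IllegalArgumentException("Unexpected request type: " + request.getClass().getSimpleName());
    }
}
```

The trade-off is strictness: failing on unknown requests catches a future UpdateTableRequestV2-style variant, but it also means every newly used request type must be added to the list.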

electrum (Member) commented

@ebyhr sorry for the delay. Done now.

ebyhr force-pushed the ebi/iceberg-glue-skip-archive branch 2 times, most recently from 152f88c to bbb3182, October 26, 2022 05:45
ebyhr force-pushed the ebi/iceberg-glue-skip-archive branch from bbb3182 to fbcc74b, October 26, 2022 11:31
ebyhr merged commit d0eadf7 into master, Oct 27, 2022
ebyhr deleted the ebi/iceberg-glue-skip-archive branch, October 27, 2022 05:15
ebyhr mentioned this pull request, Oct 27, 2022
github-actions bot added this to the 402 milestone, Oct 27, 2022
colebow (Member) commented Nov 1, 2022

@ebyhr does this need to be included in docs as a new property, or is this just propagation of a property that already exists elsewhere?

findinpath (Contributor) commented

@Config("iceberg.glue.skip-archive")
@ConfigDescription("Skip archiving an old table version when creating a new version in a commit")

This one needs to be documented. (follow-up PR)
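
For the follow-up documentation, a catalog configuration that opts in might look roughly like the fragment below. This is a sketch under assumptions: the file location (for example etc/catalog/iceberg.properties) is not taken from this PR, and only iceberg.glue.skip-archive is the property introduced here.

```properties
# Assumed catalog file, e.g. etc/catalog/iceberg.properties
connector.name=iceberg
iceberg.catalog.type=glue
# Defaults to false; set to true to skip archiving the old table version on commit
iceberg.glue.skip-archive=true
```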

Development

Successfully merging this pull request may close these issues.

Skip Glue archive in Iceberg table commits