Support Glue catalog in Iceberg connector #10151

Closed
wants to merge 1 commit into from

Conversation

jackye1995
Member

Add support for reading Glue catalog data and creating Glue tables, based on the draft in #9646.

Also reorganized the files in the following ways:

  1. moved catalog related classes to the catalog module
  2. changed TrinoCatalogFactory to an interface to work with dependency injection
  3. split some logic in AbstractMetastoreTableOperations to AbstractIcebergTableOperations to share with Glue implementation
  4. added catalog config iceberg.default-schema-location as discussed in Support Iceberg default warehouse location config #9614
  5. added test profile test-iceberg-glue so that the Glue test does not run by default, because it requires AWS setup. I have run the test to make sure all 280 tests pass.

@jackye1995
Member Author

@findepi @losipiuk @electrum

* on ways to set your AWS credentials which will be needed to run this test.
*/
public class TestIcebergGlueCatalogConnectorTest
extends TestIcebergParquetConnectorTest
Member

This could probably use BaseConnectorSmokeTest.

Also, a test class should not extend from a test class.
For code sharing, an explicit abstract Base.. class should be used

Member Author

changed to use BaseIcebergConnectorTest

Member Author

sorry did not read the first line, changed to smoke test instead

{
return createIcebergQueryRunner(
Map.of(),
Map.of("iceberg.file-format", "parquet",
Member

Suggested change
- Map.of("iceberg.file-format", "parquet",
+ Map.of(
+ "iceberg.file-format", "parquet",

public void testView()
{
assertThatThrownBy(super::testView)
.hasStackTraceContaining("createView is not supported by Trino Glue catalog");
Member

Trino Glue catalog -> Iceberg Glue catalog

Member Author

updated

Comment on lines 101 to 129
@Override
public void testShowCreateSchema()
Member

why override? document

Member Author

because Glue has no database user concept and thus does not support AUTHORIZATION USER

Member Author

the test in smoke test works, removed

jackye1995 force-pushed the glue-catalog branch 2 times, most recently from b9feee2 to 3ac1f4d (December 9, 2021 07:34)
jackye1995 requested a review from findepi (December 9, 2021 07:34)
jackye1995 force-pushed the glue-catalog branch 2 times, most recently from 523fede to 4ef57e9 (December 9, 2021 18:50)
@jackye1995
Member Author

@findepi Regarding running the Glue test: I checked, and Hive does run its Glue test suite using:

      - name: Run Hive Glue Tests
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESSKEY }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRETKEY }}
          AWS_REGION: us-east-2
        run: |
          if [ "${AWS_ACCESS_KEY_ID}" != "" ]; then
            $MAVEN test ${MAVEN_TEST} -pl :trino-hive -P test-hive-glue
          fi

However, hive-tests is the only job that does this; Iceberg is in test-other-modules. It is probably cleaner to create a new suite dedicated to Iceberg tests. Please let me know if that is the right way to update CI, or if another approach is preferred. Thanks.

jackye1995 requested a review from losipiuk (December 10, 2021 18:42)
</profile>

<profile>
<id>test-iceberg-glue</id>
Member

This needs to be run on CI.

Maybe instead of a profile with single test, let's have a profile with all tests (no exclusions).

- ":trino-iceberg,:trino-druid"

would become

- ":trino-iceberg -P test-all,:trino-druid"

also, we should move trino-druid to a different group, maybe with kudu (need to check running times)

@@ -36,6 +36,8 @@
ICEBERG_CURSOR_ERROR(9, EXTERNAL),
ICEBERG_WRITE_VALIDATION_FAILED(10, INTERNAL_ERROR),
ICEBERG_INVALID_SNAPSHOT_ID(11, USER_ERROR),
ICEBERG_CATALOG_ERROR(12, EXTERNAL),
ICEBERG_COMMIT_ERROR(13, EXTERNAL)
Member

Extract commit introducing this error code.
It should be used in io.trino.plugin.iceberg.catalog.hms.AbstractMetastoreTableOperations#commitNewTable and io.trino.plugin.iceberg.catalog.hms.HiveMetastoreTableOperations#commitToExistingTable

return defaultSchemaLocation;
}

@Config("iceberg.default-schema-location")
Member

This is applicable to Glue only, so should be in a config class that's bound only when Glue catalog is used.
it should start with iceberg.glue.

@@ -37,6 +37,7 @@
import io.trino.plugin.hive.gcs.HiveGcsModule;
import io.trino.plugin.hive.metastore.HiveMetastore;
import io.trino.plugin.hive.s3.HiveS3Module;
import io.trino.plugin.iceberg.catalog.IcebergCatalogModule;
Member

Extract commit that introduces io.trino.plugin.iceberg.catalog package.

import static org.apache.iceberg.TableMetadataParser.getFileExtension;
import static org.apache.iceberg.TableProperties.METADATA_COMPRESSION;
import static org.apache.iceberg.TableProperties.METADATA_COMPRESSION_DEFAULT;
import static org.apache.iceberg.TableProperties.WRITE_METADATA_LOCATION;

@NotThreadSafe
- public abstract class AbstractMetastoreTableOperations
+ public abstract class AbstractIcebergTableOperations
Member

Extract a commit which splits AbstractMetastoreTableOperations into AbstractMetastoreTableOperations and AbstractIcebergTableOperations

throw new TrinoException(ICEBERG_COMMIT_ERROR, format("Cannot commit %s due to unexpected exception", getSchemaTableName()), e);
}
finally {
cleanupMetadataLocation(!succeeded, newMetadataLocation);
Member

this can swallow exception in flight.
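
To illustrate the concern: when a `finally` block throws, the exception already propagating out of the `try` or `catch` is discarded. Below is a minimal sketch of the problem and one possible fix. `CommitSketch`, `commitLosingCause`, and `commitKeepingCause` are hypothetical names, and the `commit` and `cleanup` arguments stand in for the commit logic and `cleanupMetadataLocation`; this is not the code in the PR.

```java
// Hypothetical sketch; "commit" and "cleanup" stand in for the real commit logic
// and for cleanupMetadataLocation(...) in the PR.
final class CommitSketch
{
    private CommitSketch() {}

    // Problematic shape: if cleanup throws, its exception replaces the commit failure.
    static void commitLosingCause(Runnable commit, Runnable cleanup)
    {
        try {
            commit.run();
        }
        finally {
            cleanup.run();
        }
    }

    // One possible fix: keep the in-flight exception and attach the cleanup failure to it.
    // In this sketch cleanup runs only when the commit failed.
    static void commitKeepingCause(Runnable commit, Runnable cleanup)
    {
        Throwable inFlight = null;
        try {
            commit.run();
        }
        catch (RuntimeException | Error e) {
            inFlight = e;
            throw e;
        }
        finally {
            if (inFlight != null) {
                try {
                    cleanup.run();
                }
                catch (RuntimeException cleanupFailure) {
                    // The original failure still propagates; the cleanup failure is recorded on it.
                    inFlight.addSuppressed(cleanupFailure);
                }
            }
        }
    }
}
```

With the second shape, a caller sees the commit failure, and the cleanup failure is available through `getSuppressed()`.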

throw new TrinoException(ICEBERG_COMMIT_ERROR, format("Cannot commit %s because of concurrent update", getSchemaTableName()), e);
}
finally {
cleanupMetadataLocation(!succeeded, newMetadataLocation);
Member

this can swallow exception in flight.

try {
io().deleteFile(metadataLocation);
}
catch (RuntimeException ex) {
Member

ex -> e

return createIcebergQueryRunner(
ImmutableMap.of(),
ImmutableMap.of(
"iceberg.file-format", "orc",
Member

remove

Comment on lines +47 to +63
protected boolean hasBehavior(TestingConnectorBehavior connectorBehavior)
{
switch (connectorBehavior) {
case SUPPORTS_RENAME_SCHEMA:
case SUPPORTS_COMMENT_ON_COLUMN:
case SUPPORTS_TOPN_PUSHDOWN:
case SUPPORTS_CREATE_VIEW:
case SUPPORTS_CREATE_MATERIALIZED_VIEW:
case SUPPORTS_RENAME_MATERIALIZED_VIEW:
case SUPPORTS_RENAME_MATERIALIZED_VIEW_ACROSS_SCHEMAS:
return false;

case SUPPORTS_DELETE:
return true;
default:
return super.hasBehavior(connectorBehavior);
}
Member

This class and TestIcebergConnectorSmokeTest should have a common base (BaseIcebergConnectorSmokeTest) capturing Iceberg behavior.
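
A rough sketch of what such a common base could look like, using simplified stand-ins rather than the real Trino test classes. All type names ending in `Sketch` are hypothetical, the framework defaults are invented for illustration, and which flags are Iceberg-wide versus Glue-specific is an assumption (only the missing view and schema-rename support for Glue is stated in this PR).

```java
// Hypothetical stand-in for TestingConnectorBehavior, reduced to a few values.
enum BehaviorSketch
{
    SUPPORTS_CREATE_VIEW,
    SUPPORTS_RENAME_SCHEMA,
    SUPPORTS_TOPN_PUSHDOWN,
    SUPPORTS_DELETE
}

// Hypothetical stand-in for the framework base class; the defaults here are made up.
abstract class ConnectorSmokeTestSketch
{
    protected boolean hasBehavior(BehaviorSketch behavior)
    {
        return behavior != BehaviorSketch.SUPPORTS_DELETE;
    }
}

// Common base: behavior assumed to hold for Iceberg regardless of the catalog.
abstract class BaseIcebergSmokeTestSketch
        extends ConnectorSmokeTestSketch
{
    @Override
    protected boolean hasBehavior(BehaviorSketch behavior)
    {
        switch (behavior) {
            case SUPPORTS_TOPN_PUSHDOWN:
                return false;
            case SUPPORTS_DELETE:
                return true;
            default:
                return super.hasBehavior(behavior);
        }
    }
}

// Glue-specific test: only overrides what differs for the Glue catalog.
final class GlueSmokeTestSketch
        extends BaseIcebergSmokeTestSketch
{
    @Override
    protected boolean hasBehavior(BehaviorSketch behavior)
    {
        switch (behavior) {
            case SUPPORTS_CREATE_VIEW:
            case SUPPORTS_RENAME_SCHEMA:
                return false;
            default:
                return super.hasBehavior(behavior);
        }
    }
}

// Hive-metastore-backed test: inherits the Iceberg-wide behavior unchanged.
final class HiveMetastoreSmokeTestSketch
        extends BaseIcebergSmokeTestSketch
{
}
```

Neither concrete test extends the other; both share behavior only through the abstract base, in line with the earlier comment that a test class should not extend a test class.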

@findepi
Member

findepi commented Dec 16, 2021

cc @phd3

import static org.apache.iceberg.BaseMetastoreTableOperations.ICEBERG_TABLE_TYPE_VALUE;
import static org.apache.iceberg.BaseMetastoreTableOperations.TABLE_TYPE_PROP;

public class GlueTableOperations
Member

GlueIcebergTableOperations, seems more natural from the base class AbstractIcebergTableOperations

@Override
protected String getRefreshedLocation()
{
return stats.getGetTable().call(() -> {
Member

Just wrap the Glue API call in the lambda
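
A sketch of what this could mean in practice, assuming the statistics object exposes a `call` method that takes a callable. `ApiCallStatsSketch`, `TableParametersClientSketch`, and `RefreshedLocationSketch` are hypothetical stand-ins, not the Trino or AWS classes, and reading the `metadata_location` table parameter is likewise an assumption about what the method extracts.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the per-API statistics object returned by stats.getGetTable().
final class ApiCallStatsSketch
{
    private final AtomicLong calls = new AtomicLong();

    <T> T call(Callable<T> apiCall)
    {
        calls.incrementAndGet();
        try {
            return apiCall.call();
        }
        catch (RuntimeException e) {
            throw e;
        }
        catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    long getCalls()
    {
        return calls.get();
    }
}

// Hypothetical stand-in for the Glue client: returns the parameters of a table.
interface TableParametersClientSketch
{
    Map<String, String> getTableParameters(String tableName);
}

final class RefreshedLocationSketch
{
    private RefreshedLocationSketch() {}

    static String getRefreshedLocation(ApiCallStatsSketch stats, TableParametersClientSketch client, String tableName)
    {
        // Only the remote call sits inside the lambda, so the statistics cover the API call alone.
        Map<String, String> parameters = stats.call(() -> client.getTableParameters(tableName));

        // Extraction and validation happen outside the lambda.
        String metadataLocation = parameters.get("metadata_location");
        if (metadataLocation == null) {
            throw new IllegalStateException("Table is missing metadata_location: " + tableName);
        }
        return metadataLocation;
    }
}
```

Keeping validation outside the lambda means a malformed table is not counted as a failed API call.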

@Inject
public GlueTableOperationsProvider(FileIoProvider fileIoProvider)
{
this.fileIoProvider = fileIoProvider;
Member

requireNonNull
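
For reference, a minimal sketch of the requested constructor shape, following the "x is null" message style used elsewhere in this PR. `GlueTableOperationsProviderSketch` and `FileIoProviderSketch` are hypothetical stand-ins, and the `@Inject` annotation is omitted so the example stays self-contained.

```java
import static java.util.Objects.requireNonNull;

// Hypothetical stand-in for FileIoProvider.
interface FileIoProviderSketch
{
}

final class GlueTableOperationsProviderSketch
{
    private final FileIoProviderSketch fileIoProvider;

    GlueTableOperationsProviderSketch(FileIoProviderSketch fileIoProvider)
    {
        // Fail fast at construction time instead of with a later, less obvious NullPointerException.
        this.fileIoProvider = requireNonNull(fileIoProvider, "fileIoProvider is null");
    }

    FileIoProviderSketch getFileIoProvider()
    {
        return fileIoProvider;
    }
}
```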

private final String catalogId;
private final GlueMetastoreStats stats;

private final Map<SchemaTableName, TableMetadata> tableMetadataCache = new ConcurrentHashMap<>();
Member

How are entries in this cache invalidated?

this.tableOperationsProvider = requireNonNull(tableOperationsProvider, "tableOperationsProvider is null");
this.glueClient = requireNonNull(glueClient, "glueClient is null");
this.stats = requireNonNull(stats, "stats is null");
this.catalogId = catalogId; // null is a valid catalogId, meaning the current account
Member

Add a @Nullable annotation

@Override
public void renameNamespace(ConnectorSession session, String source, String target)
{
throw new TrinoException(NOT_SUPPORTED, "renameNamespace is not supported by Iceberg Glue catalog");
Member

Error message nit:

Suggested change
- throw new TrinoException(NOT_SUPPORTED, "renameNamespace is not supported by Iceberg Glue catalog");
+ throw new TrinoException(NOT_SUPPORTED, "renameNamespace is not supported for Iceberg Glue catalogs");

@findepi
Member

findepi commented Feb 7, 2022

Superseded by #10845

findepi closed this on Feb 7, 2022
3 participants