-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support views in Delta lake Connector #11763
Conversation
@mdesmet the PR seems to cover AC changes that are unrelated to views. |
Some view tests within BaseConnectorTest failed because of existing AC setup, I looked at how it's done in iceberg and based myself on that but haven't really tested all AC use cases and the implications of this change. I will setup a separate PR. Also for clarity. in the original bug report Spark views were mentioned. I have not investigated that for now. I think If we want to support that we might need a translation layer similar like what Linkedin did for the HIve views (Coral). My focus has only been to support creation of views as in Iceberg/Hive. |
@@ -112,7 +112,7 @@ public void testNoColumnStats() | |||
assertQuery("SELECT c_str FROM no_column_stats WHERE c_int = 42", "VALUES 'foo'"); | |||
} | |||
|
|||
@Test | |||
@Test(enabled = false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is related to supporting all access modes. See #11782
I have added delta.security = system
and that causes the metadata to be retrieved in MetadataManager
(before the access control was set to SYSTEM),
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please reenable the test.
then let's think how to make it pass. maybe we don't need delta.security = system
here.
or maybe we need to merge the other PR first.
IDK yet, but let's have the test remind us (ie be failing until we can fix it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is related to the other PR. I will remove that commit here, but as I said before it seems that more tests are failing because of the AC setup in the BaseConnectorTest.
plugin/trino-delta-lake/src/main/java/io/trino/plugin/deltalake/DeltaLakeSecurityModule.java
Outdated
Show resolved
Hide resolved
@@ -118,7 +157,9 @@ public HiveMetastoreBackedDeltaLakeMetastore( | |||
{ | |||
Optional<Table> candidate = delegate.getTable(databaseName, tableName); | |||
candidate.ifPresent(table -> { | |||
if (!TABLE_PROVIDER_VALUE.equalsIgnoreCase(table.getParameters().get(TABLE_PROVIDER_PROPERTY))) { | |||
if (((table.getTableType().equals(EXTERNAL_TABLE.name()) || table.getTableType().equals(MANAGED_TABLE.name())) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one check per line pls (for readability sake)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be better to split these checks into two static util methods so that this line reads !isDeltaDataTable(table) && !isDeltaView(table)
} | ||
|
||
private static boolean isView(String tableType, Map<String, String> tableParameters) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove empty line
Delta, Hive and Iceberg can use the same catalog (eg Glue or HMS). |
Understood. Let's separate and iterate from there
This is totally different story. I don't know what's the level of support for Spark SQL in Coral. So far we were reading only HiveQL-based views with Coral, and that was really challenging to get right-ish. We can invest into SparkSQL-based views for Delta Lake, but that's certainly a project on its own. Let's have Trino Views supported first, just as we did for Iceberg. |
02a785a
to
2737614
Compare
So basically the issue is with following check in
Then in the That's why i had implemented the Access Control modes first in order to be able to leverage the existing BaseConnector tests as the system AC doesn't support DEFINER views. |
getView(session, name).ifPresent(view -> views.put(name, view)); | ||
} | ||
catch (TrinoException e) { | ||
if (e.getErrorCode().equals(HIVE_VIEW_TRANSLATION_ERROR.toErrorCode())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be simpler to have a Set with these 3 error codes(you can put comments in the place where set is created) and here only one if, something like:
if(errorCoded.contains(e.getErrorCode()) { //do nothing } else { throw e}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HIVE_VIEW_TRANSLATION_ERROR
should never be thrown here anyway, since we're not translating views.
2737614
to
1701125
Compare
2708f08
to
284cbe7
Compare
955b6bf
to
d8c4da0
Compare
|
||
public class HiveMetastoreBackedDeltaLakeMetastore | ||
implements DeltaLakeMetastore | ||
{ | ||
public static final String TABLE_PROVIDER_PROPERTY = "spark.sql.sources.provider"; | ||
public static final String TABLE_PROVIDER_VALUE = "DELTA"; | ||
|
||
// Be compatible with views defined by the Hive connector, which can be useful under certain conditions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
, which can be useful under certain conditions.
-> this part of the comment does add a bit of mistery to the statement, but has no real gain in understanding for the reader.
private List<String> listDatabases(Optional<String> database) | ||
{ | ||
if (database.isPresent()) { | ||
// TODO: should this decision logic go into DeltaLakeMetadata or here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a question for the review?
@@ -188,6 +224,106 @@ public void renameTable(ConnectorSession session, SchemaTableName from, SchemaTa | |||
delegate.renameTable(from.getSchemaName(), from.getTableName(), to.getSchemaName(), to.getTableName()); | |||
} | |||
|
|||
@Override | |||
public void createView(ConnectorSession session, SchemaTableName schemaViewName, ConnectorViewDefinition definition, boolean replace) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logic from this method seems to be 1 to 1 copied from io.trino.plugin.iceberg.catalog.hms.TrinoHiveCatalog#createView
from trino-iceberg
.
This applies also for the other methods of the class newly introduced in this PR.
Can you think of a way to avoid such kind of code duplication?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code in Delta and Iceberg is indeed similar but different from the Hive connector as the Hive connector acts upon a SemiTransactionalhiveMetastore
with slightly different method calls. In Iceberg and Delta we don't have the SemiTransactionalhiveMetastore
.
To share the code we would need to create a helper class in the Hive connector (the easier part) with all views related operations. This helper class would need access to the HiveMetastore
implementation (which will relay to either actual Hive Metastore or GlueHiveMetastore
(Glue API calls) due to Guice wiring). This works well for Delta. However on Iceberg the implementation is different. It has two factories to create either a TrinoCatalog
for Hive metastore, which has access to the HiveMetastore
implementation, either to The Glue catalog which directly calls the Glue API.
We could fix the latter by also using Hive's GlueHiveMetastore
in the Icebergs
TrinoGlueCatalogin order for the helper class to be used in both cases. There are however also differences between the
TrinoGlueCatalogand
GlueHiveMetastore` that will need to be investigated and properly tested.
Code reuse is possible but at definitely at a much higher cost also it would probably be better handled in a separate PR. The question arises what we do first: provide the views feature and do the internal refactoring after or first refactor and delay the views feature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd go for trying out (as discussed offline) for grouping together the code which would be otherwise repetitive after landing your PR. This seems rather cheap to do and shouldn't add an increased complexity in the code of Delta and Iceberg connectors.
Regarding TrinoGlueCatalog
I'd suggest to leave it as it is (at least for now).
The main purpose of my comment was DRY (Don't repeat yourself) and not a more complex refactoring (which in any case would not be in the scope of this PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have followed your advice and added a ViewsMetastoreHelper
class, potentially it could also be used for Hive but then we would have to use callbacks. I didn't do that for now.
@@ -170,6 +170,9 @@ protected QueryRunner createQueryRunner() | |||
protected boolean hasBehavior(TestingConnectorBehavior connectorBehavior) | |||
{ | |||
switch (connectorBehavior) { | |||
case SUPPORTS_CREATE_VIEW: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a follow-up to this PR, could you please create an issue (with good-first-issue
label) to add support for comments on Delta views ?
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftHiveMetastore.java
Outdated
Show resolved
Hide resolved
@@ -170,6 +170,9 @@ protected QueryRunner createQueryRunner() | |||
protected boolean hasBehavior(TestingConnectorBehavior connectorBehavior) | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR is missing a test where Trino views for Delta gets created on a Glue backed metastore.
Consider either such a test either in trino-delta-lake
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test added
c99bf35
to
a016c9c
Compare
a016c9c
to
e00f037
Compare
@@ -76,7 +76,6 @@ | |||
{ | |||
public static final String TABLE_PROVIDER_PROPERTY = "spark.sql.sources.provider"; | |||
public static final String TABLE_PROVIDER_VALUE = "DELTA"; | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unrelated change - please restore.
String view = "test_glue_view_" + randomTableSuffix(); | ||
try { | ||
assertUpdate(format("CREATE VIEW %s AS SELECT 1 AS val ", view), 1); | ||
assertQuery("SELECT val FROM " + view, "VALUES (1)"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: The paranthesis in the expression VALUES (1)
are not necessary.
public class TestDeltaLakeViewsGlueMetastore | ||
extends AbstractTestQueryFramework | ||
{ | ||
protected static final String SCHEMA = "test_delta_lake_glue_views_" + randomTableSuffix(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why protected
? I think private
will do just fine.
return Optional.of(definition); | ||
} | ||
|
||
public static boolean isView(String tableType, Map<String, String> tableParameters) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method can stay private
} | ||
} | ||
|
||
public List<SchemaTableName> listViews(ConnectorSession session, Optional<String> database) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ConnectorSession session
parameter seems unused.
.map(table -> new SchemaTableName(schema, table)); | ||
} | ||
|
||
public Optional<ConnectorViewDefinition> getView(ConnectorSession session, SchemaTableName viewName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ConnectorSession session
parameter seems to be unused.
Make sure that also the other methods calling this method drop it as well : dropView
, getViews
.
view.getOwner())); | ||
} | ||
|
||
private Map<String, String> createViewProperties(ConnectorSession session) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use createViewProperties(ConnectorSession session, String trinoVersion, String connectorName)
call instead (to avoid code duplication) ?
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftHiveMetastore.java
Outdated
Show resolved
Hide resolved
} | ||
} | ||
|
||
@AfterClass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AfterClass(alwaysRun = true)
protected static final String SCHEMA = "test_delta_lake_glue_views_" + randomTableSuffix(); | ||
protected static final String CATALOG_NAME = "test_delta_lake_glue_views"; | ||
|
||
private File schemaLocation; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can be converted to local variable.
@ebyhr : The only way to test that all three together is through a product test. It seems we currently don't have any configuration that has both Hive, Iceberg and Delta. Is that what we want or do you see another way to test it? |
b8fc01a
to
5f8597e
Compare
I think you can use BaseSharedMetastoreTest from Side note: @mdesmet the product test enviroment |
d335361
to
e27141c
Compare
...elta-lake/src/test/java/io/trino/plugin/deltalake/BaseDeltaLakeSharedMetastoreViewsTest.java
Show resolved
Hide resolved
...elta-lake/src/test/java/io/trino/plugin/deltalake/BaseDeltaLakeSharedMetastoreViewsTest.java
Outdated
Show resolved
Hide resolved
...elta-lake/src/test/java/io/trino/plugin/deltalake/BaseDeltaLakeSharedMetastoreViewsTest.java
Outdated
Show resolved
Hide resolved
...elta-lake/src/test/java/io/trino/plugin/deltalake/BaseDeltaLakeSharedMetastoreViewsTest.java
Outdated
Show resolved
Hide resolved
...elta-lake/src/test/java/io/trino/plugin/deltalake/BaseDeltaLakeSharedMetastoreViewsTest.java
Outdated
Show resolved
Hide resolved
...elta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeSharedFileMetastoreViews.java
Outdated
Show resolved
Hide resolved
...elta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeSharedGlueMetastoreViews.java
Outdated
Show resolved
Hide resolved
...-lake/src/test/java/io/trino/plugin/deltalake/metastore/glue/TestDeltaLakeGlueMetastore.java
Show resolved
Hide resolved
.../src/test/java/io/trino/plugin/deltalake/metastore/glue/TestDeltaLakeViewsGlueMetastore.java
Outdated
Show resolved
Hide resolved
.../src/test/java/io/trino/plugin/deltalake/metastore/glue/TestDeltaLakeViewsGlueMetastore.java
Outdated
Show resolved
Hide resolved
e27141c
to
2f14321
Compare
8792f4c
to
0cfc576
Compare
@ebyhr : PTAL |
/test-with-secrets sha=0cfc5761b4c02ec5137f9dfb751137f637bf42d0 |
...elta-lake/src/test/java/io/trino/plugin/deltalake/BaseDeltaLakeSharedMetastoreViewsTest.java
Show resolved
Hide resolved
...elta-lake/src/test/java/io/trino/plugin/deltalake/BaseDeltaLakeSharedMetastoreViewsTest.java
Outdated
Show resolved
Hide resolved
...elta-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeSharedGlueMetastoreViews.java
Outdated
Show resolved
Hide resolved
.../src/test/java/io/trino/plugin/deltalake/metastore/glue/TestDeltaLakeViewsGlueMetastore.java
Show resolved
Hide resolved
.../src/test/java/io/trino/plugin/deltalake/metastore/glue/TestDeltaLakeViewsGlueMetastore.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/TrinoViewHiveMetastore.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/TrinoViewUtil.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/glue/GlueHiveMetastore.java
Outdated
Show resolved
Hide resolved
0cfc576
to
cd69957
Compare
/test-with-secrets sha=cd69957b50ee03ac83f57a86fa77017c47d65c4f https://github.com/trinodb/trino/actions/runs/3615689927 |
...elta-lake/src/test/java/io/trino/plugin/deltalake/BaseDeltaLakeSharedMetastoreViewsTest.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/glue/TrinoGlueCatalog.java
Show resolved
Hide resolved
plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/glue/TrinoGlueCatalog.java
Outdated
Show resolved
Hide resolved
cd69957
to
94226a3
Compare
/test-with-secrets sha=94226a3573dde2ac87ba7cadb5cf607174f4bf8f https://github.com/trinodb/trino/actions/runs/3644481861 |
|
||
if (!isPrestoView(tableParameters)) { | ||
// Hive views are not compatible | ||
throw new HiveViewNotSupportedException(viewName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the situations where isView
above return true (the method is defined as isHiveOrPrestoView(tableType) && PRESTO_VIEW_COMMENT.equals(tableParameters.get(TABLE_COMMENT))
) but isPrestoView
will return false?
intuitively this line is not reachable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(i am changing this in #18570)
Description
Support views in Delta lake connector
Improvement
Delta lake connector
Related issues, pull requests, and links
Release notes
(x) Release notes entries required with the following suggested text: