-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Flush transaction log cache in Delta flush_metadata_cache
procedure
#16466
Conversation
flush_metadata_cache
procedure
b5015ef
to
aa7008d
Compare
...duct-tests/src/main/java/io/trino/tests/product/deltalake/TestDeltaLakeActiveFilesCache.java
Outdated
Show resolved
Hide resolved
...duct-tests/src/main/java/io/trino/tests/product/deltalake/TestDeltaLakeActiveFilesCache.java
Outdated
Show resolved
Hide resolved
...src/main/java/io/trino/plugin/deltalake/metastore/HiveMetastoreBackedDeltaLakeMetastore.java
Show resolved
Hide resolved
...in/trino-hive/src/main/java/io/trino/plugin/hive/metastore/DecoratedHiveMetastoreModule.java
Show resolved
Hide resolved
@@ -56,7 +68,9 @@ protected void setup(Binder binder) | |||
newExporter(binder).export(HiveMetastoreFactory.class) | |||
.as(generator -> generator.generatedNameOf(CachingHiveMetastore.class)); | |||
|
|||
newSetBinder(binder, Procedure.class).addBinding().toProvider(FlushHiveMetastoreCacheProcedure.class).in(Scopes.SINGLETON); | |||
if (installFlushMetadataCacheProcedure) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want this procedure installed for iceberg
?
tbh, I didn't know that it existed in iceberg already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd rather have the method always available in the connector (register it on HiveProcedureModule
) and throw an exception (as your implementation does for delta) in case that the caching metastore is not used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added another commit to avoid installing the procedure in Iceberg.
I'd rather have the method always available in the connector
What's the rationale or motivation of this? My opinion is basically the opposite (no need to install a procedure when it's unusable).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the rationale or motivation of this?
I was thinking more from a user perspective where a user may be confused that the procedure is not available (e.g. : while using AWS Glue metastore) for the hive/delta connector even though is advertised to be available in the documentation.
...ino-delta-lake/src/main/java/io/trino/plugin/deltalake/procedure/FlushMetadataProcedure.java
Outdated
Show resolved
Hide resolved
aa7008d
to
8e50366
Compare
0e7d6ac
to
266aec9
Compare
...ino-delta-lake/src/main/java/io/trino/plugin/deltalake/procedure/FlushMetadataProcedure.java
Outdated
Show resolved
Hide resolved
...ino-delta-lake/src/main/java/io/trino/plugin/deltalake/procedure/FlushMetadataProcedure.java
Outdated
Show resolved
Hide resolved
266aec9
to
5a09dfb
Compare
...elta-lake/src/main/java/io/trino/plugin/deltalake/procedure/FlushMetadataCacheProcedure.java
Outdated
Show resolved
Hide resolved
...elta-lake/src/main/java/io/trino/plugin/deltalake/procedure/FlushMetadataCacheProcedure.java
Outdated
Show resolved
Hide resolved
...a-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeFlushMetadataCacheProcedure.java
Outdated
Show resolved
Hide resolved
...a-lake/src/test/java/io/trino/plugin/deltalake/TestDeltaLakeFlushMetadataCacheProcedure.java
Outdated
Show resolved
Hide resolved
5a09dfb
to
e08137e
Compare
...in/trino-hive/src/main/java/io/trino/plugin/hive/metastore/DecoratedHiveMetastoreModule.java
Outdated
Show resolved
Hide resolved
@@ -42,6 +42,18 @@ | |||
public class DecoratedHiveMetastoreModule | |||
extends AbstractConfigurationAwareModule | |||
{ | |||
private final boolean installFlushMetadataCacheProcedure; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the flush procedure should be installed whenever caching is supported
what about changing the parameter name and logic.
for example, we could skip binding CachingHiveMetastoreConfig
for iceberg (and maybe some others)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the flush procedure should be installed whenever caching is supported
This makes it hard to exclude Hive's flush_metadata_cache
from Delta Lake connector. I initially tried to exclude the procedure from DeltaLakeConnector#getProcedures
, but Procedure
class doesn't contain catalog or connector information (MethodHandle doesn't work nicely to compare objects).
TransactionLogAccess transactionLogAccess) | ||
{ | ||
this.metastoreFactory = requireNonNull(metastoreFactory, "metastoreFactory is null"); | ||
this.cachingHiveMetastore = requireNonNull(cachingHiveMetastore, "cachingHiveMetastore is null"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it make sense to try to delegate to Hive's flush procedure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to delegate to the connector at first, but the code was a little redundant. Let me merge as-is. I will take another look later.
import static io.trino.tests.product.utils.QueryExecutors.onTrino; | ||
import static java.lang.String.format; | ||
|
||
public class TestDeltaLakeActiveFilesCache |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need a product test?
maybe file: -based tests are enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FileHiveMetastore
has the additional cache with listTablesCache
#13115. It requires a refactoring to disable the cache (the duration is hard-coded now) or wait until the internal cache is expired.
Additionally, replace toUnmodifiableList with toImmutableList.
7892030
to
118e67d
Compare
Rebased on master to resolve conflicts. |
c6ae7f8
to
5bb3280
Compare
The procedure was unusable because Iceberg connector always disables caching metastore.
Co-Authored-By: Marius Grama <findinpath@gmail.com>
5bb3280
to
3f88633
Compare
Description
Relates to #13737
Release notes
(x) Release notes are required, with the following suggested text: