Use metastore locking around read-modify-write operations for transaction commit #9584
Conversation
{
    this.fileIo = requireNonNull(fileIo, "fileIo is null");
    this.metastore = requireNonNull(metastore, "metastore is null");
    this.session = requireNonNull(session, "session is null");
    this.thriftMetastore = requireNonNull(thriftMetastore, "thriftMetastore is null");
This breaks abstractions, since the class was supposed to depend on HiveMetastore only.

This is intentional though, because Thrift HMS and Glue will use different locking protocols, and so we will have different implementations of table operations. This will be made simpler if we flatten the configuration, i.e. remove hive.metastore as a config from Iceberg and configure everything via iceberg.catalog.type (#9577).
In this case, I wonder if we can just name this class ThriftHiveTableOperations, where we can directly guarantee the existence of the Thrift metastore. For the file metastore we don't need the stub, and we can have FileHiveTableOperations that inherits the same base class for all the shared functionality.
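A rough sketch of the shape this suggestion implies, assuming a shared abstract base class; everything except the two class names from the comment is illustrative, not actual Trino code:

```java
// Illustrative sketch only, not the actual Trino classes.
public abstract class AbstractHiveTableOperations
        implements IcebergTableOperations
{
    protected final HiveMetastore metastore;

    protected AbstractHiveTableOperations(HiveMetastore metastore)
    {
        this.metastore = metastore;
    }

    // Shared functionality (refresh, metadata-location bookkeeping, ...) lives here.
}

// The Thrift metastore is guaranteed to exist here, so the commit path
// can use Thrift-specific locking directly.
public class ThriftHiveTableOperations
        extends AbstractHiveTableOperations
{
    private final ThriftMetastore thriftMetastore;

    public ThriftHiveTableOperations(HiveMetastore metastore, ThriftMetastore thriftMetastore)
    {
        super(metastore);
        this.thriftMetastore = thriftMetastore;
    }
}

// The file-metastore variant needs no Thrift stub at all.
public class FileHiveTableOperations
        extends AbstractHiveTableOperations
{
    public FileHiveTableOperations(HiveMetastore metastore)
    {
        super(metastore);
    }
}
```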
That's a good point. I knew this refactor was imminent, but I didn't want to do it yet (I understood you may be working on this already). Anyway, since you asked, I did that.
LGTM
Force-pushed from 42b3544 to 2efe784
CI #9300
Force-pushed from 2efe784 to c3e5174
Resolved review thread (outdated): plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/HiveTableOperations.java
Resolved review thread (outdated): plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV2.java
        session.getQueryId(),
        database,
        tableName));
Table currentTable = fromMetastoreApiTable(thriftMetastore.getTable(identity, database, tableName)
I did not notice originally that we are passing thriftMetastore for the sake of obtaining a "fresh" table here. Maybe a cleaner approach would be to extend the HiveMetastore interface with getTableUncached. Then for all but CachingHiveMetastore the method would just return getTable, and for CachingHiveMetastore we would go straight to the backend. Then you would not need to pass both HiveMetastore and ThriftMetastore to HiveTableOperations.
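A minimal sketch of that idea, with simplified signatures (the real HiveMetastore methods take more context than shown here):

```java
import java.util.Optional;

// Simplified sketch of the suggested getTableUncached; real signatures differ.
public interface HiveMetastore
{
    Optional<Table> getTable(HiveIdentity identity, String databaseName, String tableName);

    // For all non-caching implementations, "uncached" is the same as getTable.
    default Optional<Table> getTableUncached(HiveIdentity identity, String databaseName, String tableName)
    {
        return getTable(identity, databaseName, tableName);
    }
}

public class CachingHiveMetastore
        implements HiveMetastore
{
    private final HiveMetastore delegate;

    public CachingHiveMetastore(HiveMetastore delegate)
    {
        this.delegate = delegate;
    }

    @Override
    public Optional<Table> getTable(HiveIdentity identity, String databaseName, String tableName)
    {
        // Cache lookup elided in this sketch.
        return delegate.getTable(identity, databaseName, tableName);
    }

    @Override
    public Optional<Table> getTableUncached(HiveIdentity identity, String databaseName, String tableName)
    {
        // Bypass the cache and go straight to the backing metastore.
        return delegate.getTableUncached(identity, databaseName, tableName);
    }
}
```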
> Then you would not need to pass both HiveMetastore and ThriftMetastore to HiveTableOperations

For that you would also need to expose acquireTableExclusiveLock via the HiveMetastore interface. But it is not bad, I think.
> Then you would not need to pass both HiveMetastore and ThriftMetastore to HiveTableOperations

To alleviate the need for this, I would also need to expose table lock/unlock functions in all HiveMetastore layers, which seems to work against the abstraction we are trying to defend.
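For concreteness, this is roughly what the interface would have to grow (method names here are illustrative, not the actual Trino API):

```java
// Illustrative only: what HiveMetastore would need if locking were exposed
// through it. Every decorating layer (caching, recording, authorization)
// would then have to pass these calls through as well.
public interface HiveMetastore
{
    // ... existing methods ...

    long acquireTableExclusiveLock(HiveIdentity identity, String queryId, String databaseName, String tableName);

    void releaseTableLock(HiveIdentity identity, long lockId);
}
```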
Resolved review thread: plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftHiveMetastore.java
Resolved review thread (outdated): plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/HiveTableOperations.java
Resolved review threads (outdated, two threads): testing/trino-product-tests/src/main/java/io/trino/tests/product/iceberg/TestIcebergInsert.java
Force-pushed from b5bf534 to 748f4ac
Resolved review thread (outdated): plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftMetastore.java
Force-pushed from 748f4ac to 4ad29cf
Force-pushed from 4ad29cf to 730769b
This replaces `HiveTableOperations` with `HiveMetastoreTableOperations` and `FileMetastoreTableOperations`, suitable for the `HIVE_METASTORE` and `TESTING_FILE_METASTORE` Iceberg catalogs respectively, along with the necessary interfaces.
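A hedged sketch of what the file-metastore commit path can reduce to after the split; `writeNewMetadata`, `version`, and the surrounding fields are assumed helpers/state, not the verbatim code:

```java
// Sketch only: in the file-metastore variant no explicit lock is taken;
// replaceTable on the single-node testing file metastore is assumed to be
// the atomic compare-and-swap point.
@Override
protected void commitToExistingTable(TableMetadata base, TableMetadata metadata)
{
    String newMetadataLocation = writeNewMetadata(metadata, version + 1); // assumed helper
    Table table = Table.builder(getTable())
            .setParameter(METADATA_LOCATION_PROP, newMetadataLocation)
            .build();
    metastore.replaceTable(identity, database, tableName, table, privileges);
}
```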
Force-pushed from 730769b to fdb18e7
overall looks good to me!
@@ -224,44 +224,7 @@ protected void commitNewTable(TableMetadata metadata)
        metastore.createTable(identity, table, privileges);
    }

    protected void commitToExistingTable(TableMetadata base, TableMetadata metadata)
nit: would be cool to have a separate commit which just inlines this into the subclasses, and then do the modifications in a following commit.
It looks like FileMetastoreTableOperations is just the inlined version, and in the Hive version locking is added. But it is hard to see exactly.
Extracted for visibility. The Hive version looks like a bigger change because the nesting changed.
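For readers following along, a hedged sketch of the locked read-modify-write in the Hive-metastore version (take the lock, re-read fresh state, verify the base, swap the metadata pointer); helper names such as `writeNewMetadata` and `getSchemaTableName` are assumptions, not the exact code:

```java
@Override
protected void commitToExistingTable(TableMetadata base, TableMetadata metadata)
{
    // Hold an exclusive metastore lock around the whole read-modify-write.
    long lockId = thriftMetastore.acquireTableExclusiveLock(identity, session.getQueryId(), database, tableName);
    try {
        // Re-read the table under the lock to get a fresh metadata location,
        // bypassing any caching layers.
        Table currentTable = fromMetastoreApiTable(thriftMetastore.getTable(identity, database, tableName)
                .orElseThrow(() -> new TableNotFoundException(getSchemaTableName())));

        String currentMetadataLocation = currentTable.getParameters().get(METADATA_LOCATION_PROP);
        if (!currentMetadataLocation.equals(base.metadataFileLocation())) {
            // A concurrent transaction committed first; fail instead of
            // making its committed data unreachable.
            throw new CommitFailedException("Base metadata location '%s' is not the current location '%s'",
                    base.metadataFileLocation(), currentMetadataLocation);
        }

        String newMetadataLocation = writeNewMetadata(metadata, version + 1); // assumed helper
        Table table = Table.builder(currentTable)
                .setParameter(METADATA_LOCATION_PROP, newMetadataLocation)
                .build();
        metastore.replaceTable(identity, database, tableName, table, privileges);
    }
    finally {
        thriftMetastore.releaseTableLock(identity, lockId);
    }
}
```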
LGTM
{
    IcebergTableOperations createTableOperations(
            HiveMetastore hiveMetastore,
            HdfsContext hdfsContext,
hdfsContext and queryId are both part of the session, so I think we can remove them from the interface. HiveMetastore can be part of the injected dependencies, so we can probably also remove that.
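A sketch of the slimmed-down factory this would allow, assuming the metastore is injected into the provider itself (all names here are illustrative):

```java
import java.util.Optional;

// Illustrative only: the session already carries the query id and can derive
// the HdfsContext, and HiveMetastore can be injected into the provider, so
// the per-call parameters shrink to the table identity and creation hints.
public interface IcebergTableOperationsProvider
{
    IcebergTableOperations createTableOperations(
            ConnectorSession session,
            String database,
            String table,
            Optional<String> owner,
            Optional<String> location);
}
```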
Sounds like potential follow-up candidates, right?
Actually, I misunderstood you, sorry. -> #9661
In the following commit, the implementation will diverge.
Force-pushed from 37e33bc to 85b8c6f
Previously, no locking was applied when writing Iceberg data, so in the case of concurrent writes (from the same cluster, from multiple Trino clusters, or from different applications), successfully committed data could be made unreachable by a concurrent transaction's commit. This behavior is illustrated with a test added here. Before the fix, the writes would always succeed, but part of the written data would not be visible in the final table state.
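A hedged sketch of the kind of test described here: several threads insert concurrently into one Iceberg table, and afterwards every insert that reported success must still be reachable. The helpers (`onTrino`, `assertThat`, `row`) follow Trino's product-test conventions; the table name and structure are illustrative, not the exact TestIcebergInsert code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
// Assumed static imports, per product-test conventions:
// io.trino.tempto.assertions.QueryAssert.assertThat, QueryAssert.Row.row
// io.trino.tests.product.utils.QueryExecutors.onTrino

@Test
public void testInsertConcurrently()
        throws Exception
{
    onTrino().executeQuery("CREATE TABLE test_concurrent_insert (a bigint)");
    int threads = 4;
    ExecutorService executor = Executors.newFixedThreadPool(threads);
    try {
        List<Future<Boolean>> futures = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            long value = i;
            futures.add(executor.submit(() -> {
                try {
                    onTrino().executeQuery("INSERT INTO test_concurrent_insert VALUES " + value);
                    return true;
                }
                catch (QueryExecutionException e) {
                    // A concurrent commit may legitimately fail ...
                    return false;
                }
            }));
        }
        long successfulInserts = 0;
        for (Future<Boolean> future : futures) {
            if (future.get()) {
                successfulInserts++;
            }
        }
        // ... but every row whose INSERT reported success must be visible.
        assertThat(onTrino().executeQuery("SELECT count(*) FROM test_concurrent_insert"))
                .containsOnly(row(successfulInserts));
    }
    finally {
        executor.shutdownNow();
        onTrino().executeQuery("DROP TABLE test_concurrent_insert");
    }
}
```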
Force-pushed from 85b8c6f to 005c74c
Added an additional test.
CI #9658
Fixes #9583