Use metastore locking around read-modify-write operations for transaction commit #9584

Merged
8 commits merged into trinodb:master from findepi/iceberg-hms-lock on Oct 15, 2021

Conversation

findepi (Member) commented Oct 11, 2021

Fixes #9583

{
this.fileIo = requireNonNull(fileIo, "fileIo is null");
this.metastore = requireNonNull(metastore, "metastore is null");
this.session = requireNonNull(session, "session is null");
this.thriftMetastore = requireNonNull(thriftMetastore, "thriftMetastore is null");
findepi (Member Author) commented:

This breaks abstractions, since the class was supposed to depend on HiveMetastore only.
This is intentional though, because Thrift HMS and Glue will use different locking protocols, and so we will have different implementations of table operations. This will be made simpler if we flatten the configuration, i.e. remove hive.metastore as a config option from Iceberg and configure everything via iceberg.catalog.type (#9577)

Reviewer (Member) commented:

In this case, I wonder if we can just name this class ThriftHiveTableOperations, where we can directly guarantee the existence of a Thrift metastore. For the file metastore we don't need the stub, and we can have a FileHiveTableOperations that inherits from the same base class for all the shared functionality.
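
A minimal Java sketch of the split being proposed in this comment, using the class names suggested here; all types and method bodies are illustrative placeholders rather than the actual Trino code (the classes eventually added by this PR are named HiveMetastoreTableOperations and FileMetastoreTableOperations):

```java
// Illustrative sketch only: placeholder types standing in for the real Trino/Iceberg ones.
abstract class AbstractIcebergTableOperations
{
    // stand-in for org.apache.iceberg.TableMetadata
    record TableMetadata(String metadataFileLocation) {}

    // shared functionality (refresh, metadata file writing, ...) would live here

    protected abstract void commitToExistingTable(TableMetadata base, TableMetadata updated);
}

class ThriftHiveTableOperations
        extends AbstractIcebergTableOperations
{
    @Override
    protected void commitToExistingTable(TableMetadata base, TableMetadata updated)
    {
        // lock the table in the Thrift metastore, re-read it, verify the base
        // metadata location, point the table at the new metadata file, unlock
    }
}

class FileHiveTableOperations
        extends AbstractIcebergTableOperations
{
    @Override
    protected void commitToExistingTable(TableMetadata base, TableMetadata updated)
    {
        // the file-based testing metastore replaces the table directly, without an external lock
    }
}
```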

findepi (Member Author) commented:

That's a good point. I knew this refactor was imminent, but I didn't want to do it yet (I understood you may be working on this already). Anyway, since you asked, I did it.

losipiuk (Member) left a comment:

LGTM

findepi force-pushed the findepi/iceberg-hms-lock branch 2 times, most recently from 42b3544 to 2efe784 on October 11, 2021 11:23
findepi (Member Author) commented Oct 11, 2021

CI #9300

findepi force-pushed the findepi/iceberg-hms-lock branch from 2efe784 to c3e5174 on October 11, 2021 13:32
session.getQueryId(),
database,
tableName));
Table currentTable = fromMetastoreApiTable(thriftMetastore.getTable(identity, database, tableName)
Reviewer (Member) commented:

I did not notice originally that we are passing thriftMetastore for the sake of obtaining a "fresh" table here.
Maybe a cleaner approach would be to extend the HiveMetastore interface with a getTableUncached method. Then for all implementations but CachingHiveMetastore the method would just delegate to getTable, and for CachingHiveMetastore it would go straight to the backend.
Then you would not need to pass both HiveMetastore and ThriftMetastore to HiveTableOperations.
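
A rough sketch of the getTableUncached idea floated here, assuming simplified placeholder types; the real HiveMetastore interface has far more methods and different signatures:

```java
import java.util.Optional;

// Illustrative only: a trimmed-down stand-in for the real HiveMetastore interface.
interface HiveMetastore
{
    record Table(String databaseName, String tableName) {}

    Optional<Table> getTable(String databaseName, String tableName);

    // Non-caching implementations keep this default, which simply delegates to getTable
    default Optional<Table> getTableUncached(String databaseName, String tableName)
    {
        return getTable(databaseName, tableName);
    }
}

class CachingHiveMetastore
        implements HiveMetastore
{
    private final HiveMetastore delegate;

    CachingHiveMetastore(HiveMetastore delegate)
    {
        this.delegate = delegate;
    }

    @Override
    public Optional<Table> getTable(String databaseName, String tableName)
    {
        // a real implementation would consult its cache first; this sketch just delegates
        return delegate.getTable(databaseName, tableName);
    }

    @Override
    public Optional<Table> getTableUncached(String databaseName, String tableName)
    {
        // bypass the cache so the caller sees the freshest metastore state
        return delegate.getTable(databaseName, tableName);
    }
}
```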

Reviewer (Member) commented:

> Then you would not need to pass both HiveMetastore and ThriftMetastore to HiveTableOperations

For that you would also need to expose acquireTableExclusiveLock via the HiveMetastore interface. But that is not bad, I think.

findepi (Member Author) commented:

> Then you would not need to pass both HiveMetastore and ThriftMetastore to HiveTableOperations

To alleviate the need for this, I would also need to expose the table lock/unlock functions in all HiveMetastore layers, which seems to work against the abstraction we are trying to defend.

findepi force-pushed the findepi/iceberg-hms-lock branch from b5bf534 to 748f4ac on October 12, 2021 13:06
findepi force-pushed the findepi/iceberg-hms-lock branch from 748f4ac to 4ad29cf on October 13, 2021 20:59
findepi force-pushed the findepi/iceberg-hms-lock branch from 4ad29cf to 730769b on October 13, 2021 21:04
This replaces `HiveTableOperations` with `HiveMetastoreTableOperations`
and `FileMetastoreTableOperations` suitable for `HIVE_METASTORE` and
`TESTING_FILE_METASTORE` Iceberg catalogs respectively, along with
necessary interfaces.
findepi force-pushed the findepi/iceberg-hms-lock branch from 730769b to fdb18e7 on October 13, 2021 21:12
jackye1995 (Member) left a comment:

overall looks good to me!

@@ -224,44 +224,7 @@ protected void commitNewTable(TableMetadata metadata)
metastore.createTable(identity, table, privileges);
}

protected void commitToExistingTable(TableMetadata base, TableMetadata metadata)
Reviewer (Member) commented:

nit: it would be cool to have a separate commit which just inlines this into the subclasses, and then do the modifications in a following commit.

Reviewer (Member) commented:

It looks like FileMetastoreTableOperations is just the inlined version, and in the Hive version locking is added. But it is hard to see exactly.

findepi (Member Author) commented:

Extracted for visibility. The Hive version looks like a bigger change because the nesting changed.

losipiuk (Member) left a comment:

LGTM

{
IcebergTableOperations createTableOperations(
HiveMetastore hiveMetastore,
HdfsContext hdfsContext,
Reviewer (Member) commented:

hdfsContext and queryId are both part of the session, so I think we can remove them from the interface.
HiveMetastore can be an injected dependency, so we can probably also remove that.
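
A sketch of the slimmed-down factory suggested here (later followed up in #9661); every type other than java.util.Optional is a placeholder for the corresponding Trino/Iceberg abstraction, and the parameter list is illustrative:

```java
import java.util.Optional;

// Illustrative only: placeholder types standing in for the Trino/Iceberg ones.
interface ConnectorSession {}
interface IcebergTableOperations {}

interface IcebergTableOperationsProvider
{
    // HdfsContext and queryId can be derived from the session, and the HiveMetastore
    // can be injected into the provider implementation, so the factory method only
    // needs the per-table arguments.
    IcebergTableOperations createTableOperations(
            ConnectorSession session,
            String database,
            String table,
            Optional<String> owner,
            Optional<String> location);
}
```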

findepi (Member Author) commented:

Sounds like potential follow-up candidates, right?

findepi (Member Author) commented:

Actually, I misunderstood you, sorry. -> #9661

In the following commit, the implementation will diverge.
findepi force-pushed the findepi/iceberg-hms-lock branch from 37e33bc to 85b8c6f on October 15, 2021 15:12
Previously, no locking was applied when writing Iceberg data, so in the case of concurrent writes (from the same cluster, from multiple Trino clusters, or from different applications) successfully committed data could be made unreachable by a concurrent transaction's commit.

This behavior is illustrated with a test added here. Before the fix, the writes would always succeed, but part of the written data would not be visible in the final table state.
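
For illustration, a self-contained sketch of the read-modify-write-under-lock protocol this commit introduces: take an exclusive metastore lock, re-read the table, check that its metadata location still matches the base we started from, then swap in the new metadata. The interfaces and method names below are simplified placeholders, not Trino's actual APIs:

```java
final class LockingCommitSketch
{
    interface ThriftMetastore
    {
        long acquireTableExclusiveLock(String queryId, String database, String table);

        void releaseTableLock(long lockId);
    }

    interface HiveMetastore
    {
        String getCurrentMetadataLocation(String database, String table); // uncached read

        void setMetadataLocation(String database, String table, String newLocation);
    }

    static void commitToExistingTable(
            ThriftMetastore thriftMetastore,
            HiveMetastore metastore,
            String queryId,
            String database,
            String table,
            String baseMetadataLocation,
            String newMetadataLocation)
    {
        long lockId = thriftMetastore.acquireTableExclusiveLock(queryId, database, table);
        try {
            // Read-modify-write under the lock: a concurrent committer cannot change
            // the metadata pointer between our check and our update.
            String current = metastore.getCurrentMetadataLocation(database, table);
            if (!current.equals(baseMetadataLocation)) {
                throw new IllegalStateException(
                        "Cannot commit: metadata location changed from " + baseMetadataLocation + " to " + current);
            }
            metastore.setMetadataLocation(database, table, newMetadataLocation);
        }
        finally {
            thriftMetastore.releaseTableLock(lockId);
        }
    }

    private LockingCommitSketch() {}
}
```

With this shape, a concurrent committer either waits for the lock or fails the metadata-location check, instead of silently overwriting the other transaction's pointer.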
findepi force-pushed the findepi/iceberg-hms-lock branch from 85b8c6f to 005c74c on October 15, 2021 15:37
findepi (Member Author) commented Oct 15, 2021

Added additional test in TestIcebergSparkCompatibility.

findepi (Member Author) commented Oct 15, 2021

CI #9658

findepi merged commit bff0931 into trinodb:master on Oct 15, 2021
findepi mentioned this pull request on Oct 15, 2021
findepi deleted the findepi/iceberg-hms-lock branch on October 15, 2021 19:43
github-actions bot added this to the 364 milestone on Oct 15, 2021
Development

Successfully merging this pull request may close these issues.

Data loss due to lack of commit orchestration in Iceberg (#9583)
3 participants