Fail Iceberg queries against v2 tables with row level deletes #8450
Conversation
Force-pushed 1db410a to 125bcd6
Force-pushed 125bcd6 to 73f19ae
@@ -247,7 +247,14 @@ public IcebergTableHandle getTableHandle(ConnectorSession session, SchemaTableNa
            throw new UnknownTableTypeException(tableName);
        }

        org.apache.iceberg.Table table = getIcebergTable(session, hiveTable.get().getSchemaTableName());
        TableMetadata metadata = tableMetadataCache.computeIfAbsent(
Why not change getIcebergTable itself if the goal is to prevent working with v2 tables?
Good point. I'm not sure why I had it set in my head that this needed to go here. Thanks
@@ -87,6 +89,11 @@ private ConnectorSplit toIcebergSplit(FileScanTask task)
        // The predicate here is used by readers for predicate push down at reader level,
        // so when we do not use residual expression, we are just wasting CPU cycles
        // on reader side evaluating a condition that we know will always be true.
        if (!task.deletes().isEmpty()) {
IMO this should be the only change in this PR.
Disallowing v2 tables even if there are no row-level deletes applied sounds a bit extreme.
Yeah - I agree. I would very much prefer to only fail tables with delete markers.
I guess this is more tricky to write a test for, though. Does Spark already support row-level deletes? If so, we can use the PT env we already have, I think.
I guess my thought is that with the spec not being final there's a chance other breaking changes will get added that this change won't catch. I don't know if there's still active development going on with the spec though.
It makes sense to let other v2 tables go through. We can probably add Spark compatibility tests against the two spec versions to discover breaking issues, if any, once v2 is final.
On a quick look at the Iceberg code, I think we should be able to leverage the table.newRowDelta API for testing. This indicates that adding extensions should allow performing row-level deletes with Spark 3, but I haven't verified it myself. cc @electrum if you've more context around the status of row-delete writes through other engines.
I recommend failing at planning time if any deletes are found.
Right now, the library requires you to manually update a table to v2, but 0.11.0 will successfully write deltas to such a table. The only engine that has support is Flink, though. So to find v2 tables right now, someone would have to manually update a table to accept deltas and then use either Flink or encode deltas programmatically and commit them to the table.
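For context, a rough sketch of what "encode deltas programmatically and commit them" could look like through the table.newRowDelta API mentioned above. This is not code from this PR; the path, size, and record count are illustrative assumptions, and the delete file itself is assumed to be already written out:

import org.apache.iceberg.DeleteFile;
import org.apache.iceberg.FileFormat;
import org.apache.iceberg.FileMetadata;
import org.apache.iceberg.Table;

class RowDeltaSketch
{
    // Commits a position-delete file to a v2 table; afterwards, scans of the
    // table surface the file via FileScanTask#deletes().
    static void commitPositionDeletes(Table table)
    {
        DeleteFile deleteFile = FileMetadata.deleteFileBuilder(table.spec())
                .ofPositionDeletes()
                .withPath("/tmp/deletes-00000.parquet") // assumed path to an already-written delete file
                .withFileSizeInBytes(1024) // assumed size
                .withRecordCount(1) // assumed number of delete records
                .withFormat(FileFormat.PARQUET)
                .build();
        table.newRowDelta()
                .addDeletes(deleteFile)
                .commit();
    }
}

Committing deletes directly like this is one way a test could produce scan tasks with a non-empty deletes() list without depending on another engine.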
Force-pushed 87982c3 to a8c9c57
Force-pushed a8c9c57 to 914bd9a
Thanks for the input, updated the changes to just fail for table scans that include deletes. I did have to change the …
@@ -71,8 +80,21 @@ public ConnectorSplitSource getSplits(

        // TODO Use residual. Right now there is no way to propagate residual to Trino but at least we can
        // propagate it at split level so the parquet pushdown can leverage it.
        IcebergSplitSource splitSource = new IcebergSplitSource(tableScan.planTasks());
        ImmutableList.Builder<FileScanTask> fileScanTasks = ImmutableList.builder();
This change materializes splits eagerly during planning, while previously (IIUC) they were listed lazily as execution went along.
I would prefer not to change that, and instead fail only when we encounter a file that contains deletes as we go.
That may be later in the query, which is a somewhat worse experience for the user trying to read a v2 table with deletes, but I would argue that we should not optimize for nicer handling of unsupported tables if that hinders the typical case.
I agree with this. Iterating through the result of planTasks
will perform planning separately just to validate there are no delete files, which is almost certainly something you don't want to do.
Gotcha, thanks. I didn't have a good sense of how much work was being done lazily in the CloseableIterable<CombinedScanTask>
. I'll switch it back.
                .map(CombinedScanTask::files)
                .flatMap(Collection::stream)
                .iterator();
        this.fileScanIterator = requireNonNull(fileScanTasks, "fileScanTasks is null").iterator();
@phd3, it looks odd to me that the code here was previously getting all of the file scan tasks, which basically defeats the purpose of using the combined task iterable. Can you give me some background on what's going on here?
That said, I think that this is where the check would go in the older code:
this.fileScanIterator = Streams.stream(combinedScanIterable)
        .map(CombinedScanTask::files)
        .flatMap(Collection::stream)
        .map(file -> {
            if (!file.deletes().isEmpty()) {
                throw new TrinoException(...);
            }
            return file;
        })
        .iterator();
Do you mean we should be assigning the whole CombinedScanTask
as one split instead of undoing the balancing work by getting individual files from it?
In the current implementation, while some FileScanTasks
can be much smaller, I think the usage of CombinedScanIterable
helps provide an upper bound on split size, as opposed to the alternative planFiles
. @electrum may know more if there was a previous discussion around this.
(If this is indeed what you're referring to:) the number of pending splits is also considered while assigning new splits to Trino's tasks. So I think the imbalance caused by ignoring the CombinedScanTask grouping would be diluted, since tasks finishing small splits would become eligible for being assigned new ones faster.
Initially the Iceberg connector's data reader heavily reused Hive connector code, which is likely another reason we kept the same model of scanning one part of one file per Trino split. (Please ignore this if I'm way off base here from your point. :))
Yes, I think you understood what I was asking. Using the combined tasks means that files will be both split and combined. It seems a little weird to undo combining small tasks into larger ones. You're right that this does provide an upper bound on the split sizes, but I don't know why you would want to remove combining splits across files.
Maybe we can move this over to an issue? It would require changing the split format, so I'd rather do it in a separate PR.
Currently, one FileScanTask corresponds to one Trino split, and every Trino task keeps getting assigned more splits based on available capacity. IIUC, assigning a CombinedScanTask to a split means that Trino's Iceberg reader would need to scan multiple files for every split, which is a bit different from how things are modeled currently.
Okay, so it sounds like Trino handles split combining on its own dynamically. Is that right? If so, then this makes sense. You could also avoid some work by not combining, but it probably doesn't matter much.
Yes. I'd guess that using combined tasks wouldn't provide a lot of improvement over the current approach.
> You could also avoid some work by not combining
I didn't see any TableScan method that avoids combining but still divides huge files, though. I guess we'd need to do that in Trino after using planFiles.
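To make the planFiles/planTasks distinction in this thread concrete, here is a hedged sketch of the two Iceberg planning APIs (the class and method names come from the Iceberg library; the printing is only for illustration):

import java.io.IOException;
import org.apache.iceberg.CombinedScanTask;
import org.apache.iceberg.FileScanTask;
import org.apache.iceberg.Table;
import org.apache.iceberg.io.CloseableIterable;

class ScanPlanningSketch
{
    static void plan(Table table) throws IOException
    {
        // planFiles(): one FileScanTask per data file; small files are not
        // combined and, as noted above, huge files are not divided either.
        try (CloseableIterable<FileScanTask> files = table.newScan().planFiles()) {
            files.forEach(task -> System.out.println(task.file().path()));
        }

        // planTasks(): file tasks are split to a target size and bin-packed
        // into CombinedScanTasks, which is what bounds the per-task work.
        try (CloseableIterable<CombinedScanTask> tasks = table.newScan().planTasks()) {
            tasks.forEach(task -> System.out.println(task.files().size() + " file slices"));
        }
    }
}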
Force-pushed 914bd9a to 10f8a31
        HdfsConfiguration configuration = new HiveHdfsConfiguration(new HdfsConfigurationInitializer(config), ImmutableSet.of());
        hdfsEnvironment = new HdfsEnvironment(configuration, config, new NoHdfsAuthentication());

        metastore = new FileHiveMetastore(
It'd be better to create the directory path here and use it to initialize both the queryRunner and everything else in the tests. Also, there's a createTestingFileHiveMetastore helper to cut down on the boilerplate.
I ran into one problem with this: the NodeVersion used in createTestingFileHiveMetastore doesn't match the one used in TestingTrinoServer, so queries would fail with an incompatible version error in FileHiveMetastore. I changed the versions to match in the first commit as a fix.
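For reference, a minimal sketch of the setup being suggested here, creating the directory once and sharing it between the query runner and the metastore. The directory layout is an illustrative assumption, and createTestingFileHiveMetastore is the helper named above:

import java.io.File;
import io.trino.plugin.hive.metastore.HiveMetastore;
import io.trino.testing.DistributedQueryRunner;
import static io.trino.plugin.hive.metastore.file.FileHiveMetastore.createTestingFileHiveMetastore;

class MetastoreSetupSketch
{
    static HiveMetastore createMetastore(DistributedQueryRunner queryRunner)
    {
        // Derive the catalog directory from the query runner's data dir so the
        // test and the runner agree on one path (layout assumed here).
        File catalogDir = queryRunner.getCoordinator().getBaseDataDir().resolve("iceberg_data").toFile();
        return createTestingFileHiveMetastore(catalogDir);
    }
}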
Thanks @alexjo2144, just added some comments w.r.t. the test.
If FileHiveMetastore#createTestingFileHiveMetastore is used along with TestingTrinoServer the version check would fail.
Force-pushed 10f8a31 to d3566c8
Comments addressed
Force-pushed d3566c8 to ef2504f
LGTM % some comments.
This will cause non-deterministic success/failure for LIMIT queries, but I think that's still better than the alternative of eager task loading.
@@ -59,6 +61,9 @@ public IcebergSplitSource(CloseableIterable<CombinedScanTask> combinedScanIterab
        Iterator<FileScanTask> iterator = limit(fileScanIterator, maxSize);
        while (iterator.hasNext()) {
            FileScanTask task = iterator.next();
            if (!task.deletes().isEmpty()) {
                throw new TrinoException(NOT_SUPPORTED, "Iceberg tables with delete files are not supported");
Missed this earlier: adding the SchemaTableName to the error message here would be good.
The v2 specification is not final but some writers are already adding support for it. For now, ensure that any tables with the new row level delete format cannot be queried.
Force-pushed ef2504f to 2295b40
Updated, thanks for the review
    {
        String tableName = "test_v2_table_read" + randomTableSuffix();
        assertUpdate("CREATE TABLE " + tableName + " AS SELECT * FROM tpch.tiny.nation", 25);
        updateTableToV2(tableName);
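For readers following along, a plausible sketch of what a helper like updateTableToV2 might do, using Iceberg's metadata-upgrade API. How the BaseTable is loaded from the catalog is elided and assumed here:

import org.apache.iceberg.BaseTable;
import org.apache.iceberg.TableMetadata;
import org.apache.iceberg.TableOperations;

class UpgradeSketch
{
    static void upgradeToV2(BaseTable table)
    {
        TableOperations operations = table.operations();
        TableMetadata current = operations.current();
        // upgradeToFormatVersion(2) returns new metadata marked as format v2;
        // committing it makes the table eligible for row-level deletes.
        operations.commit(current, current.upgradeToFormatVersion(2));
    }
}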
As a follow-up, I would love to see a product test with Spark Iceberg for this too.
We can also add the version info in one of our system tables to make it easier to verify that Spark indeed writes v2.
Merged, thanks @alexjo2144!
The v2 specification is not final but some writers are already adding support for it. For now, ensure that any tables with the new format cannot be queried.
#7226