-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not use metadata deletes during Iceberg writes #18533
Conversation
78528c3
to
d38f2b7
Compare
4cd45f7
to
b45178b
Compare
some tests have failed |
@@ -154,8 +154,8 @@ public void testAnalyzeWithSchemaEvolution() | |||
VALUES | |||
('nationkey', null, 25, 0, null, '0', '24'), | |||
('regionkey', null, 5, 0, null, '0', '4'), | |||
('name', 1314.0, 25, 0, null, null, null), | |||
('info', 4417.0, null, 0, null, null, null), | |||
('name', 1907.9999999999998, 25, 0, null, null, null), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That does not look like reliably repeatable result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's also different when I run the test locally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it mean the test doesn't pass locally, or is flaky?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW double results are asserted with rounding. you should probably have 1908.0 instead of 1907.9999999999998
(locally the test passed for me with both values; i would use the shorter one)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it mean the test doesn't pass locally, or is flaky?
Consistently fails. We chatted about this test on Slack the other day
c6df7b5
to
4829f2b
Compare
@findepi last build was green, just squashed |
@@ -154,8 +154,8 @@ public void testAnalyzeWithSchemaEvolution() | |||
VALUES | |||
('nationkey', null, 25, 0, null, '0', '24'), | |||
('regionkey', null, 5, 0, null, '0', '4'), | |||
('name', 1314.0, 25, 0, null, null, null), | |||
('info', 4417.0, null, 0, null, null, null), | |||
('name', 1907.9999999999998, 25, 0, null, null, null), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does it mean the test doesn't pass locally, or is flaky?
@@ -154,8 +154,8 @@ public void testAnalyzeWithSchemaEvolution() | |||
VALUES | |||
('nationkey', null, 25, 0, null, '0', '24'), | |||
('regionkey', null, 5, 0, null, '0', '4'), | |||
('name', 1314.0, 25, 0, null, null, null), | |||
('info', 4417.0, null, 0, null, null, null), | |||
('name', 1907.9999999999998, 25, 0, null, null, null), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW double results are asserted with rounding. you should probably have 1908.0 instead of 1907.9999999999998
(locally the test passed for me with both values; i would use the shorter one)
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergStatistics.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV2.java
Outdated
Show resolved
Hide resolved
plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/TestIcebergV2.java
Outdated
Show resolved
Hide resolved
@@ -124,28 +128,38 @@ public void testDeleteRowsConcurrently() | |||
int threads = 4; | |||
CyclicBarrier barrier = new CyclicBarrier(threads); | |||
ExecutorService executor = newFixedThreadPool(threads); | |||
List<String> rows = ImmutableList.of("(1, 0, 0, 0)", "(0, 1, 0, 0)", "(0, 0, 1, 0)", "(0, 0, 0, 1)"); | |||
|
|||
String[] expectedErrors = new String[]{"Failed to commit Iceberg update to table:", "Failed to replace table due to concurrent updates:"}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
verifyConcurrentUpdateFailurePermissible
allows "Failed to commit Iceberg update to table:"
only. is the latter possible as well?
assertUpdate("INSERT INTO " + tableName + " VALUES (1, 0, 0, 0)", 1); | ||
assertUpdate("INSERT INTO " + tableName + " VALUES (0, 1, 0, 0)", 1); | ||
assertUpdate("INSERT INTO " + tableName + " VALUES (0, 0, 1, 0)", 1); | ||
assertUpdate("INSERT INTO " + tableName + " VALUES (0, 0, 0, 1)", 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test created 5 files, now it creates ~1 file with 4 rows.
If this doesn't matter for the test purpose, let's do this change as a separate commit.
if this matters and should always be so (~1 file), let's do this as a separate (prep) commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It used to matter but doesn't any more. One line per file guaranteed that we always used metadata deletes in this test but now we no longer use those.
@@ -1991,6 +1992,7 @@ public void testPartitionPredicatePushdownWithHistoricalPartitionSpecs() | |||
|
|||
// The old partition scheme is no longer used so pushdown using the hour transform is allowed | |||
assertUpdate("DELETE FROM " + tableName + " WHERE year(d) = 1969", 3); | |||
assertUpdate("ALTER TABLE " + tableName + " EXECUTE optimize"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why change this test?
(separate commit?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test relied on the behavior that files were fully removed from the manifest during the delete operation, but now they are not until optimize is run.
@@ -1378,6 +1378,7 @@ public void testUpdateWithSortOrder() | |||
"test_sorted_update", | |||
"WITH (sorted_by = ARRAY['comment']) AS TABLE tpch.tiny.lineitem WITH NO DATA")) { | |||
assertUpdate( | |||
withSmallRowGroups, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why change this test?
(separate commit?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The data created by the insert used to not be checked for if it was sorted because those files were removed from the manifest. Now they are checked for sorting, along with the files added by the update operations.
transaction = null; | ||
} | ||
|
||
private static boolean fileIsFullyDeleted(List<CommitTaskData> positionDeletes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😢 i will miss you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it will rise from the ashes again...
4829f2b
to
437c1a5
Compare
The DeleteFiles operation does not validate conflicts like RowDelta does. In the event a file is removed during a write operation this can result in a update operation being reflected as an insert. As follow up this optimization could be put back using the OverwriteFiles API.
437c1a5
to
71040d4
Compare
Description
This is an alternative to #18412 but with more minimal changes.
Additional context and related issues
Fixes: #18393
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text: