Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase query timeout in testDeleteRowsConcurrently #20328

Conversation

findepi
Copy link
Member

@findepi findepi commented Jan 10, 2024

Maybe fixes #13995

@cla-bot cla-bot bot added the cla-signed label Jan 10, 2024
@findepi findepi added test no-release-notes This pull request does not require release notes entry and removed cla-signed labels Jan 10, 2024
@@ -156,7 +157,9 @@ public void testDeleteRowsConcurrently()
.collect(toImmutableList());

Stream<Optional<String>> expectedRows = Streams.mapWithIndex(futures.stream(), (future, index) -> {
Copy link
Contributor

@findinpath findinpath Jan 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's rather reduce the concurrency

nThreads = 3

TestTable table = new TestTable(
                getQueryRunner()::execute,
                "test_concurrent_delete",
                "(col0 INTEGER, col1 INTEGER, col2 INTEGER)")) {

The way this test is conceived, it can get into a metastore replacement operation failure spiral and there's nothing that Trino (at the moment) can do about it.

See https://github.com/apache/iceberg/blob/f8d21116b1990ff0f7d7960a0f41f3a807756141/core/src/main/java/org/apache/iceberg/BaseTransaction.java#L340-L353

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is "metastore replacement operation failure spiral"?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially trying to update repeatedly the metastore entry at the same time repeatedly

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how long can it reasonably take for 4 queries? and how long in extreme case? what would be safe test timeout?
how does the number look like with 3 queries?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#13995 states

Error:  io.trino.plugin.iceberg.TestIcebergGcsConnectorSmokeTest.testDeleteRowsConcurrently  Time elapsed: 120.014 s  <<< FAILURE!

so - in some peculiar cases it seems to take more than 2 minutes. i don't see any value in investing over 2 minutes to test this functionality.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in that issue there are two failure modes
overall test timeout after 120s. last known occurrence April 2023 #13995 (comment)
timeout when waiuting for individual query to complete (increaeased here) "NoSuchElementException: No value present" -- last known occurence Aug 2023 #13995 (comment) and today

this is this second problem that i am trying to change

i don't feel strongly though, if you have a better idea how to help this test fail less frequently

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. let's fix the second problem.

if the timeout will happen again we can reduce the concurrency.

@github-actions github-actions bot added the iceberg Iceberg connector label Jan 10, 2024
@cla-bot cla-bot bot added the cla-signed label Jan 11, 2024
@findepi
Copy link
Member Author

findepi commented Jan 11, 2024

CI #20326, #20349

@github-actions github-actions bot added tests:hive hive Hive connector labels Jan 11, 2024
@findepi findepi force-pushed the findepi/increase-query-timeout-in-testdeleterowsconcurrently-f9506a branch from 60664a2 to dbe9067 Compare January 11, 2024 19:33
@findepi
Copy link
Member Author

findepi commented Jan 12, 2024

CI #20326

@findepi findepi merged commit 035806c into master Jan 12, 2024
63 of 64 checks passed
@findepi findepi deleted the findepi/increase-query-timeout-in-testdeleterowsconcurrently-f9506a branch January 12, 2024 08:28
@github-actions github-actions bot added this to the 437 milestone Jan 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed hive Hive connector iceberg Iceberg connector no-release-notes This pull request does not require release notes entry test
Development

Successfully merging this pull request may close these issues.

Flaky TestIcebergGcsConnectorSmokeTest.testDeleteRowsConcurrently
3 participants