Increase query timeout in testDeleteRowsConcurrently #20328
@@ -156,7 +157,9 @@ public void testDeleteRowsConcurrently()
        .collect(toImmutableList());

Stream<Optional<String>> expectedRows = Streams.mapWithIndex(futures.stream(), (future, index) -> {
Let's rather reduce the concurrency
nThreads = 3
TestTable table = new TestTable(
getQueryRunner()::execute,
"test_concurrent_delete",
"(col0 INTEGER, col1 INTEGER, col2 INTEGER)")) {
The way this test is conceived, it can get into a metastore replacement operation failure spiral and there's nothing that Trino (at the moment) can do about it.
what is "metastore replacement operation failure spiral"?
Potentially, concurrent queries repeatedly trying to update the metastore entry at the same time
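To make the concern concrete, here is a minimal sketch of why fewer concurrent writers means fewer retries. It assumes a simplified lockstep model of optimistic commits: every pending writer reads the same pointer value, then all of them attempt a compare-and-set, and exactly one wins per round. The class `CommitRetrySketch`, the method `failedAttempts`, and the `metadataPointer` variable are hypothetical and are not Trino or Iceberg code.

```java
import java.util.concurrent.atomic.AtomicLong;

public class CommitRetrySketch {
    // Simplified lockstep model (an assumption, not Trino or Iceberg code):
    // in every round all pending writers have read the same pointer value and
    // then attempt a compare-and-set. Exactly one succeeds, the rest retry.
    public static int failedAttempts(int writers) {
        AtomicLong metadataPointer = new AtomicLong(0);
        int pending = writers;
        int failed = 0;
        while (pending > 0) {
            long seen = metadataPointer.get();
            for (int i = 0; i < pending; i++) {
                // Only the first attempt in a round still sees the expected value.
                if (!metadataPointer.compareAndSet(seen, seen + 1)) {
                    failed++;
                }
            }
            pending--; // one writer committed in this round
        }
        return failed;
    }

    public static void main(String[] args) {
        System.out.println("4 writers: " + failedAttempts(4) + " failed attempts");
        System.out.println("3 writers: " + failedAttempts(3) + " failed attempts");
    }
}
```

Under this worst-case model the number of wasted attempts grows as n(n-1)/2, so going from 4 to 3 writers halves it (6 versus 3). Real commit timing is not lockstep, so this is only an upper-bound intuition.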
how long can it reasonably take for 4 queries? and how long in the extreme case? what would be a safe test timeout?
what do the numbers look like with 3 queries?
#13995 states
Error: io.trino.plugin.iceberg.TestIcebergGcsConnectorSmokeTest.testDeleteRowsConcurrently Time elapsed: 120.014 s <<< FAILURE!
so - in some peculiar cases it seems to take more than 2 minutes. i don't see any value in investing over 2 minutes to test this functionality.
in that issue there are two failure modes:
1. overall test timeout after 120s. last known occurrence April 2023 #13995 (comment)
2. timeout when waiting for an individual query to complete (increased here), "NoSuchElementException: No value present" -- last known occurrence Aug 2023 #13995 (comment) and today
it is this second problem that i am trying to address
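For readers unfamiliar with the second failure mode, a minimal sketch of how a per-query wait that times out can surface as "NoSuchElementException: No value present". It assumes the test turns each future into an `Optional` that is empty on timeout and then unwraps it. The helper name `tryGet` and the class `QueryTimeoutSketch` are hypothetical stand-ins, not the code in this PR.

```java
import java.util.NoSuchElementException;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class QueryTimeoutSketch {
    // Hypothetical helper: wait up to the timeout and return an empty
    // Optional if the future has not completed by then.
    public static <T> Optional<T> tryGet(Future<T> future, long timeout, TimeUnit unit) {
        try {
            return Optional.ofNullable(future.get(timeout, unit));
        } catch (TimeoutException e) {
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        }
    }

    public static void main(String[] args) {
        // Stands in for a query that is still retrying its commit.
        Future<Boolean> slowQuery = new CompletableFuture<>();
        Optional<Boolean> result = tryGet(slowQuery, 50, TimeUnit.MILLISECONDS);
        try {
            result.get(); // unwrapping an empty Optional throws
        } catch (NoSuchElementException e) {
            System.out.println("NoSuchElementException: " + e.getMessage());
        }
    }
}
```

In this shape, raising the per-query timeout only makes the empty `Optional` less likely. It does not bound the overall test duration, which is the first failure mode.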
i don't feel strongly though, if you have a better idea how to help this test fail less frequently
ok. let's fix the second problem.
if the timeout happens again we can reduce the concurrency.
60664a2 to dbe9067
CI #20326
Maybe fixes #13995