Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intermittent failure of FileTests #3584

Closed
jamesnetherton opened this issue Mar 2, 2022 · 23 comments
Closed

Intermittent failure of FileTests #3584

jamesnetherton opened this issue Mar 2, 2022 · 23 comments
Assignees
Labels
Milestone

Comments

@jamesnetherton
Copy link
Contributor

Regular offender for CI failures:

2022-03-01T12:54:16.4609416Z org.awaitility.core.ConditionTimeoutException: org.apache.camel.quarkus.component.file.it.FileTest expected "Hello1" but was "" within 10 seconds.
2022-03-01T12:54:16.4610283Z 	at org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:164)
2022-03-01T12:54:16.4611019Z 	at org.awaitility.core.AbstractHamcrestCondition.await(AbstractHamcrestCondition.java:86)
2022-03-01T12:54:16.4612118Z 	at org.awaitility.core.ConditionFactory.until(ConditionFactory.java:939)
2022-03-01T12:54:16.4612630Z 	at org.awaitility.core.ConditionFactory.until(ConditionFactory.java:645)
2022-03-01T12:54:16.4613190Z 	at org.apache.camel.quarkus.component.file.it.FileTest.idempotent(FileTest.java:137)
2022-03-01T12:54:16.4613758Z 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
2022-03-01T12:54:16.4614376Z 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
@aldettinger aldettinger self-assigned this Mar 7, 2022
@aldettinger
Copy link
Contributor

The same kind of issue happens in FileTest.filter:

org.awaitility.core.ConditionTimeoutException: org.apache.camel.quarkus.component.file.it.FileTest expected "Hello2" but was "" within 10 seconds.
	at org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:164)
	at org.awaitility.core.AbstractHamcrestCondition.await(AbstractHamcrestCondition.java:86)
	at org.awaitility.core.ConditionFactory.until(ConditionFactory.java:939)
	at org.awaitility.core.ConditionFactory.until(ConditionFactory.java:645)
	at org.apache.camel.quarkus.component.file.it.FileTest.filter(FileTest.java:170)

After investigations, the race condition could be made more probable with manual changes below:

         String result = mockEndpoint.getExchanges().stream().map(e -> e.getIn().getBody(String.class))
                 .collect(Collectors.joining(SEPARATOR));
+        Log.info("Calling getFromMock/" + mockId + " result = " + result.length());
 
-        mockEndpoint.reset();
+        Thread.sleep(600);
+        //mockEndpoint.reset();
 
         return result;
     }
diff --git a/integration-tests/file/src/main/java/org/apache/camel/quarkus/component/file/it/FileRoutes.java b/integration-tests/file/src/main/java/org/apache/camel/quarkus/component/file/it/FileRoutes.java
index 891339c..b03ab4a 100644
--- a/integration-tests/file/src/main/java/org/apache/camel/quarkus/component/file/it/FileRoutes.java
+++ b/integration-tests/file/src/main/java/org/apache/camel/quarkus/component/file/it/FileRoutes.java
@@ -67,7 +67,7 @@ public class FileRoutes extends RouteBuilder {
                 .convertBodyTo(String.class).to("mock:idempotent");
 
         bindToRegistry("myFilter", new MyFileFilter<>());
-        from(("file://target/filter?initialDelay=0&delay=10&filter=#myFilter"))
+        from(("file://target/filter?initialDelay=2500&delay=10&filter=#myFilter"))

At the end of the day, it could happen that result has length 0, right after the mock receive an exchange, but then mockEndpoint.reset() is called, thus clearing the received exchange from the mock. As such, the race condition triggers a missed exchange that break the testing logic.

aldettinger added a commit to aldettinger/camel-quarkus that referenced this issue Mar 7, 2022
@aldettinger aldettinger added this to the 2.8.0 milestone Mar 7, 2022
aldettinger added a commit that referenced this issue Mar 7, 2022
jamesnetherton pushed a commit to jamesnetherton/camel-quarkus that referenced this issue Mar 8, 2022
jamesnetherton pushed a commit that referenced this issue Mar 9, 2022
@aldettinger aldettinger reopened this Mar 22, 2022
@aldettinger aldettinger modified the milestones: 2.8.0-M1, 2.8.0 Mar 22, 2022
@aldettinger
Copy link
Contributor

It seems to still be there after 2.8.0-M1.

At least, we have seen it once in a Windows build:

org.awaitility.core.ConditionTimeoutException: org.apache.camel.quarkus.component.file.it.FileTest expected "Hello2" but was "" within 10 seconds.
	at org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:164)
	at org.awaitility.core.AbstractHamcrestCondition.await(AbstractHamcrestCondition.java:86)
	at org.awaitility.core.ConditionFactory.until(ConditionFactory.java:939)
	at org.awaitility.core.ConditionFactory.until(ConditionFactory.java:645)
	at org.apache.camel.quarkus.component.file.it.FileTest.filter(FileTest.java:157)

Let records few occurences to see how frequent it is and whether it still happen on Linux or not.

@aldettinger aldettinger changed the title Intermittent failure of FileTest.idempotent Intermittent failure of FileTests Mar 23, 2022
@aldettinger aldettinger changed the title Intermittent failure of FileTests Intermittent failure of FileTests Mar 23, 2022
@aldettinger aldettinger changed the title Intermittent failure of FileTests Intermittent failure of FileTests Mar 23, 2022
@jamesnetherton jamesnetherton modified the milestones: 2.8.0, 2.9.0 Apr 4, 2022
@aldettinger
Copy link
Contributor

aldettinger commented Apr 6, 2022

We had another occurence of this in https://github.com/apache/camel-quarkus/runs/5790386191?check_suite_focus=true :

2022-04-01T15:54:17.3802934Z org.awaitility.core.ConditionTimeoutException: org.apache.camel.quarkus.component.file.it.FileTest expected "Hello2" but was "" within 10 seconds.
2022-04-01T15:54:17.3803866Z 	at org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:167)
2022-04-01T15:54:17.3804696Z 	at org.awaitility.core.AbstractHamcrestCondition.await(AbstractHamcrestCondition.java:86)
2022-04-01T15:54:17.3805481Z 	at org.awaitility.core.ConditionFactory.until(ConditionFactory.java:985)
2022-04-01T15:54:17.3806537Z 	at org.awaitility.core.ConditionFactory.until(ConditionFactory.java:691)
2022-04-01T15:54:17.3807215Z 	at org.apache.camel.quarkus.component.file.it.FileTest.filter(FileTest.java:157)

This time, it was not Linux platform with JDK version 17. Chances are that this is an infrequent flaky test on all platforms, jdk versions.

@aldettinger
Copy link
Contributor

Caught another occurence in integration-tests-alternative-jdk, this time in FileTest.idempotent:

2022-05-05T17:20:42.1752186Z org.awaitility.core.ConditionTimeoutException: org.apache.camel.quarkus.component.file.it.FileTest expected "Hello1" but was "" within 10 seconds.
2022-05-05T17:20:42.1752848Z 	at org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:167)
2022-05-05T17:20:42.1760375Z 	at org.awaitility.core.AbstractHamcrestCondition.await(AbstractHamcrestCondition.java:86)
2022-05-05T17:20:42.1761109Z 	at org.awaitility.core.ConditionFactory.until(ConditionFactory.java:985)
2022-05-05T17:20:42.1769491Z 	at org.awaitility.core.ConditionFactory.until(ConditionFactory.java:691)
2022-05-05T17:20:42.1770136Z 	at org.apache.camel.quarkus.component.file.it.FileTest.idempotent(FileTest.java:118)

@aldettinger
Copy link
Contributor

At this stage, we've experienced this flakiness once on alternative-platform, few times on alternative-jdks and never on regular itests. And as of now, the failing test is either idempotent or filter.

aldettinger added a commit to aldettinger/camel-quarkus that referenced this issue May 13, 2022
aldettinger added a commit to aldettinger/camel-quarkus that referenced this issue May 13, 2022
@aldettinger
Copy link
Contributor

This PR introduces extra logs to attempt catching more information.

The flaky tests caught, idempotent and filter, are both using getFromMock. However, only idempotent is using resetMock, so we may be facing a new issue. On next occurence, it would be of special interest to check what files are really consumed by the filter and idempotent routes.

@aldettinger
Copy link
Contributor

The filter and idempotent tests have been rewritten from scratch in a new test harness:
c3ee20a

Let's give some time to check whether they are still flaky. If not, maybe we could rewrite all tests.

essobedo pushed a commit that referenced this issue Jul 11, 2023
@aldettinger
Copy link
Contributor

On July 20th, the rewritten file tests are not flaky for more than a month.
The next step would be to rewrite all tests with the new approach.

aldettinger added a commit to aldettinger/camel-quarkus that referenced this issue Aug 17, 2023
aldettinger added a commit to aldettinger/camel-quarkus that referenced this issue Aug 17, 2023
@aldettinger
Copy link
Contributor

2 monthes without flakiness. Time to migrate more tests to the new test harness.

aldettinger added a commit to aldettinger/camel-quarkus that referenced this issue Sep 8, 2023
aldettinger added a commit to aldettinger/camel-quarkus that referenced this issue Sep 22, 2023
aldettinger added a commit to aldettinger/camel-quarkus that referenced this issue Nov 2, 2023
aldettinger added a commit to aldettinger/camel-quarkus that referenced this issue Nov 3, 2023
aldettinger added a commit to aldettinger/camel-quarkus that referenced this issue Nov 3, 2023
aldettinger added a commit to aldettinger/camel-quarkus that referenced this issue Nov 7, 2023
aldettinger added a commit to aldettinger/camel-quarkus that referenced this issue Dec 21, 2023
aldettinger added a commit to aldettinger/camel-quarkus that referenced this issue Dec 21, 2023
aldettinger added a commit to aldettinger/camel-quarkus that referenced this issue Feb 2, 2024
aldettinger added a commit to aldettinger/camel-quarkus that referenced this issue Feb 16, 2024
aldettinger added a commit to aldettinger/camel-quarkus that referenced this issue Mar 5, 2024
@aldettinger
Copy link
Contributor

Every tests have been migrated to the new harness and no more flakiness on ci. Let's see how it behaves on quarkus platform ci on next release.

@aldettinger
Copy link
Contributor

The new harness is also fixing issues on the quarkus-platform as shown in this pr.

@aldettinger aldettinger added this to the 3.10.0 milestone Mar 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants