org.elasticsearch.repositories.gcs.GoogleCloudStorageBlobStoreRepositoryTests testSnapshotWithLargeSegmentFiles #51446
Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)
This test was still very GC-heavy, in Java 8 runs in particular, which seems to slow down request processing to the point of timeouts in some runs. This PR completely removes the large number of O(MB) `byte[]` allocations that were happening in the mock http handler, which cuts the allocation rate by about a factor of 5 in my local testing for the GC-heavy `testSnapshotWithLargeSegmentFiles` run. Closes #51446 Closes #50754
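For reference, a minimal sketch of the kind of change described above: streaming the uploaded body through a small reusable buffer instead of materializing each multi-megabyte request into a fresh `byte[]`. The class name, buffer size, and response header below are illustrative assumptions, not the actual Elasticsearch test fixture code.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;

import java.io.IOException;
import java.io.InputStream;

// Illustrative low-allocation upload handler: the request body is consumed through a
// single small buffer rather than copied into an O(MB) byte[] per request.
class LowAllocationUploadHandler implements HttpHandler {

    @Override
    public void handle(HttpExchange exchange) throws IOException {
        long received = 0;
        byte[] buffer = new byte[8192]; // reused for the whole request
        try (InputStream body = exchange.getRequestBody()) {
            int read;
            while ((read = body.read(buffer)) != -1) {
                received += read; // count (or hash) the bytes without retaining them
            }
        }
        // hypothetical header, just to show the byte count being used without buffering
        exchange.getResponseHeaders().add("x-received-bytes", Long.toString(received));
        exchange.sendResponseHeaders(200, -1); // -1: no response body
        exchange.close();
    }
}
```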
New failure on 7.6: https://gradle-enterprise.elastic.co/s/prwylyxj4k5ja
This looks like it is in fact this JDK bug: https://bugs.openjdk.java.net/browse/JDK-8180754. That's why we're only seeing the failure on JDK-8. Looking into a workaround ...
There is an open JDK bug that causes an assertion in the JDK's http server to trip if we don't drain the request body before sending response headers; see https://bugs.openjdk.java.net/browse/JDK-8180754. This works around the issue by always draining the request at the beginning of the handler. Fixes #51446
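A hedged sketch of the workaround described above: drain whatever remains of the request body before any call to `sendResponseHeaders`, so the assertion discussed in JDK-8180754 cannot trip. The helper class and method names are illustrative, not the actual names used in the fix.

```java
import com.sun.net.httpserver.HttpExchange;

import java.io.IOException;
import java.io.InputStream;

final class DrainingUtil {

    private DrainingUtil() {}

    // Read and discard any remaining request body bytes.
    static void drainInputStream(InputStream body) throws IOException {
        byte[] scratch = new byte[8192];
        while (body.read(scratch) != -1) {
            // discard
        }
    }

    // Call this at the very start of a handler, before sendResponseHeaders is invoked,
    // to avoid tripping the JDK http server assertion described in JDK-8180754.
    static void drainRequest(HttpExchange exchange) throws IOException {
        drainInputStream(exchange.getRequestBody());
    }
}
```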
@original-brownbear This happened again on 7.x: https://gradle-enterprise.elastic.co/s/w45psdcg3vtji and https://gradle-enterprise.elastic.co/s/jauynygl5zsxw. Would you mind taking another look?
We were not correctly respecting the download range, which led to the GCS SDK client closing the connection at times. Also, we randomly choose to fail requests in these tests but force the retry count to `0`, which could lead to issues when a chunk of a ranged download fails and the retrying input stream doesn't retry the chunk (the GCS SDK strangely does not retry this case on a `5xx`). Closes #51446
We were not correctly respecting the download range, which led to the GCS SDK client closing the connection at times. Also fixes another instance of failing to drain the request fully before sending the response headers. Closes #51446
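To illustrate the range handling described above, here is a minimal sketch of honoring the `Range` header on downloads so the client receives exactly the requested slice (status `206`) instead of the whole blob. The class, regex, and method names are assumptions for illustration, not the repository's actual mock-service code.

```java
import com.sun.net.httpserver.HttpExchange;

import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RangedDownloads {

    // matches e.g. "bytes=0-1048575"
    private static final Pattern RANGE = Pattern.compile("bytes=(\\d+)-(\\d+)");

    static void sendBlob(HttpExchange exchange, byte[] blob) throws IOException {
        String rangeHeader = exchange.getRequestHeaders().getFirst("Range");
        byte[] response = blob;
        int status = 200;
        if (rangeHeader != null) {
            Matcher matcher = RANGE.matcher(rangeHeader);
            if (matcher.matches()) {
                int start = Integer.parseInt(matcher.group(1));
                // the Range end is inclusive; clamp it to the blob length
                int end = Math.min(Integer.parseInt(matcher.group(2)), blob.length - 1);
                response = Arrays.copyOfRange(blob, start, end + 1);
                status = 206; // Partial Content
            }
        }
        exchange.sendResponseHeaders(status, response.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(response);
        }
    }
}
```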
Tests in GoogleCloudStorageBlobStoreRepositoryTests are known to be flaky on JDK 8 (#51446, #52430) and we suspect a JDK bug (https://bugs.openjdk.java.net/browse/JDK-8180754) that triggers an assertion in the server-side logic that emulates the Google Cloud Storage service. Sadly we were not able to reproduce the failures, even when using the same OS (Debian 9, Ubuntu 16.04) and JDK (Oracle Corporation 1.8.0_241 [Java HotSpot(TM) 64-Bit Server VM 25.241-b07]) as almost all the test failures on CI. While we spent some time fixing code (#51933, #52431) to circumvent the JDK bug, the tests are still flaky on JDK-8. This commit mutes these tests for JDK-8 only. Closes #52906
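A minimal sketch of what muting on JDK-8 only could look like, using plain JUnit assumptions; the actual commit may rely on Elasticsearch's own test infrastructure instead, so the class and method below are purely illustrative.

```java
import org.junit.Assume;
import org.junit.Before;

public class GoogleCloudStorageMuteOnJdk8Example {

    @Before
    public void skipOnJdk8() {
        // "java.specification.version" is "1.8" on JDK 8 and "9", "10", ... on later JDKs.
        String specVersion = System.getProperty("java.specification.version");
        Assume.assumeFalse("Muted on JDK-8, see JDK-8180754", "1.8".equals(specVersion));
    }
}
```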
Reproduce with (does not repro locally):
Error
Build scan: https://gradle-enterprise.elastic.co/s/5agjwd5uxd4g6
90 day history
Suspected related issues (via comments from prior failures)
Note - it seems this happens (exclusively?) on 7.5/7.6/7.x and sometimes, but not always, with a SocketTimeout too. For example (different build scan than above): https://gradle-enterprise.elastic.co/s/4z2vxrxrohjmq/console-log?anchor=7206