
org.elasticsearch.repositories.gcs.GoogleCloudStorageBlobStoreRepositoryTests testSnapshotWithLargeSegmentFiles #51446

Closed
jakelandis opened this issue Jan 24, 2020 · 4 comments · Fixed by #51593, #51933 or #52804
Labels
:Distributed Coordination/Snapshot/Restore Anything directly related to the `_snapshot/*` APIs >test-failure Triaged test failures from CI

Comments


jakelandis commented Jan 24, 2020

Reproduce with (does not repro locally):

./gradlew ':plugins:repository-gcs:test' --tests "org.elasticsearch.repositories.gcs.GoogleCloudStorageBlobStoreRepositoryTests.testSnapshotWithLargeSegmentFiles" -Dtests.seed=79146B9E4DC8DB20 -Dtests.security.manager=true -Dtests.locale=ro -Dtests.timezone=America/La_Paz -Dcompiler.java=13 -Druntime.java=8	

Error

 java.lang.AssertionError: Only index blobs should remain in repository but found [indices/U3Ra9xA9QYOIf2JnrS9qiw/0/__U-yvyHGGSC-HG3-rYgESAQ, indices/U3Ra9xA9QYOIf2JnrS9qiw/0/__2W1fsDR7RHqVaHUbR2ljDg, indices/U3Ra9xA9QYOIf2JnrS9qiw/0/__PYCKH3IuR8OvsesbZ53Q4A, indices/U3Ra9xA9QYOIf2JnrS9qiw/0/__ChKaNwG-QDeOLEFMBmV4hg, indices/U3Ra9xA9QYOIf2JnrS9qiw/0/__4BrkUvaDSoaRxda8zaYzmw, indices/U3Ra9xA9QYOIf2JnrS9qiw/0/__8TyJLropTGCbwEdpGgkCgg, snap-6BvhIzAVS8Wk0yEzY1yHHw.dat, indices/U3Ra9xA9QYOIf2JnrS9qiw/0/__TYjSMqKWSAyimBORtvz2hQ, indices/U3Ra9xA9QYOIf2JnrS9qiw/0/__nr1hKmAYS8mwQoYGfONLxA, indices/U3Ra9xA9QYOIf2JnrS9qiw/0/__AIvhEKrQTT6rQ95Q_a6Mcw, indices/U3Ra9xA9QYOIf2JnrS9qiw/0/snap-6BvhIzAVS8Wk0yEzY1yHHw.dat, indices/U3Ra9xA9QYOIf2JnrS9qiw/0/__OYx13Q6MT329W_IM0oSGsg, indices/U3Ra9xA9QYOIf2JnrS9qiw/0/__lc7ep5qgSfGUi5XOU2GgVA, indices/U3Ra9xA9QYOIf2JnrS9qiw/0/__FnKJqi01RL6xd-BqebZ9qw, indices/U3Ra9xA9QYOIf2JnrS9qiw/0/__iv8fkpEiRkGaiSfcDb6cTg, indices/U3Ra9xA9QYOIf2JnrS9qiw/meta-6BvhIzAVS8Wk0yEzY1yHHw.dat, indices/U3Ra9xA9QYOIf2JnrS9qiw/0/__45KqhxnDQJyJbNxg5bOdBw, indices/U3Ra9xA9QYOIf2JnrS9qiw/0/__zLdkon40Q5aMTM5YjMsXWQ, indices/U3Ra9xA9QYOIf2JnrS9qiw/0/__UAbVvyV5QHq0RsZGVRdphg, indices/U3Ra9xA9QYOIf2JnrS9qiw/0/__jXvlBf1lTn6Ct1d2xyEeHg, indices/U3Ra9xA9QYOIf2JnrS9qiw/0/__ED_68lu_Qci0brGcdBWWqA, indices/U3Ra9xA9QYOIf2JnrS9qiw/0/__0j2RqpNKRKqy6RNbv4iniA, indices/U3Ra9xA9QYOIf2JnrS9qiw/0/__SqeeF3m4TkqcEJrcZfx0XA, indices/U3Ra9xA9QYOIf2JnrS9qiw/0/__6j7bvik8RDqSNmZIHaHdlw, indices/U3Ra9xA9QYOIf2JnrS9qiw/0/__gj5mlfNKQmC_sqRVWkQh4A, meta-6BvhIzAVS8Wk0yEzY1yHHw.dat, indices/U3Ra9xA9QYOIf2JnrS9qiw/0/__YCpLvzOYR6m3ZMbSCjvCuw]	
 
    Expected: a collection with size <0>	
         but: collection size was <27>	
        at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:18)	
        at org.junit.Assert.assertThat(Assert.java:956)	
        at org.elasticsearch.repositories.blobstore.ESMockAPIBasedRepositoryIntegTestCase.tearDownHttpServer(ESMockAPIBasedRepositoryIntegTestCase.java:112)

Build scan : https://gradle-enterprise.elastic.co/s/5agjwd5uxd4g6

90-day history: [CI failure history chart omitted]

Suspected related issues (via comments from prior failures)

Note: it seems this happens (exclusively?) on 7.5/7.6/7.x and sometimes, but not always, comes with a SocketTimeout as well. For example (a different build scan than the one above): https://gradle-enterprise.elastic.co/s/4z2vxrxrohjmq/console-log?anchor=7206

jakelandis added the :Distributed Coordination/Snapshot/Restore and >test-failure labels on Jan 24, 2020
@elasticmachine

Pinging @elastic/es-distributed (:Distributed/Snapshot/Restore)

original-brownbear self-assigned this on Jan 24, 2020
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Jan 29, 2020
This test was still very GC-heavy in Java 8 runs in particular, which seems to slow down request processing to the point of timeouts in some runs.
This PR completely removes the large number of O(MB) `byte[]` allocations that were happening in the mock HTTP handler, which cuts the allocation rate by about a factor of 5 in my local testing for the GC-heavy `testSnapshotWithLargeSegmentFiles` run.

Closes elastic#51446
original-brownbear added a commit that referenced this issue Jan 29, 2020
This test was still very GC-heavy in Java 8 runs in particular, which seems to slow down request processing to the point of timeouts in some runs.
This PR completely removes the large number of O(MB) `byte[]` allocations that were happening in the mock HTTP handler, which cuts the allocation rate by about a factor of 5 in my local testing for the GC-heavy `testSnapshotWithLargeSegmentFiles` run.

Closes #51446
Closes #50754
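For illustration only, a minimal sketch (not the actual Elasticsearch change; the class name is hypothetical) of how a mock handler can avoid per-request O(MB) `byte[]` allocations: drain the request body through a small reusable buffer instead of reading the whole upload into memory.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpHandler;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical mock upload handler that avoids allocating a new byte[] for
 * every multi-MB request body. Instead of InputStream#readAllBytes, it drains
 * the body through a small fixed buffer, so the allocation rate stays flat
 * regardless of segment file size.
 */
public class DrainingUploadHandler implements HttpHandler {

    private static final int BUFFER_SIZE = 8 * 1024;

    @Override
    public void handle(HttpExchange exchange) throws IOException {
        long total = 0;
        byte[] buffer = new byte[BUFFER_SIZE]; // small, short-lived allocation
        try (InputStream body = exchange.getRequestBody()) {
            int read;
            while ((read = body.read(buffer)) != -1) {
                total += read; // count bytes; a real handler would also hash or store them
            }
        }
        // Respond only after the body has been fully consumed.
        exchange.sendResponseHeaders(200, -1);
        exchange.close();
    }
}
```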

ywelsch commented Feb 5, 2020

New failure on 7.6: https://gradle-enterprise.elastic.co/s/prwylyxj4k5ja

ywelsch reopened this on Feb 5, 2020
@original-brownbear

This looks like it's in fact this JDK bug: https://bugs.openjdk.java.net/browse/JDK-8180754, which is why we're only seeing the failure on JDK 8. Looking into a workaround ...

original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Feb 5, 2020
There is an open JDK bug that is causing an assertion in the JDK's
http server to trip if we don't drain the request body before sending response headers.
See https://bugs.openjdk.java.net/browse/JDK-8180754
Working around this issue here by always draining the request at the beginning of the handler.

Fixes elastic#51446
original-brownbear added a commit that referenced this issue Feb 5, 2020
There is an open JDK bug that is causing an assertion in the JDK's
http server to trip if we don't drain the request body before sending response headers.
See https://bugs.openjdk.java.net/browse/JDK-8180754
Working around this issue here by always draining the request at the beginning of the handler.

Fixes #51446
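A minimal sketch of the workaround described above, assuming a `com.sun.net.httpserver`-based test fixture (the helper name is hypothetical): drain whatever remains of the request body before calling `sendResponseHeaders`, so the assertion inside the JDK's HTTP server is not tripped.

```java
import com.sun.net.httpserver.HttpExchange;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper illustrating the JDK-8180754 workaround. */
final class ExchangeUtils {

    private ExchangeUtils() {}

    /** Reads and discards any remaining request body bytes. */
    static void drainRequestBody(HttpExchange exchange) throws IOException {
        try (InputStream body = exchange.getRequestBody()) {
            byte[] buffer = new byte[1024];
            while (body.read(buffer) != -1) {
                // discard
            }
        }
    }
}

// Usage at the top of every handler, before any response is written:
//
//   public void handle(HttpExchange exchange) throws IOException {
//       ExchangeUtils.drainRequestBody(exchange);
//       exchange.sendResponseHeaders(404, -1);
//       exchange.close();
//   }
```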

dnhatn commented Feb 25, 2020

@original-brownbear This happened again on 7.x: https://gradle-enterprise.elastic.co/s/w45psdcg3vtji and https://gradle-enterprise.elastic.co/s/jauynygl5zsxw. Would you mind taking another look?

dnhatn reopened this on Feb 25, 2020
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Feb 26, 2020
We were not correctly respecting the download range, which led to the GCS SDK client closing the connection at times.
Also, we randomly choose to fail requests in these tests but force the retry count to `0`, which could lead to issues when a chunk of a ranged download fails and the retrying input stream doesn't retry the chunk (the GCS SDK strangely does not retry this case on a `5xx`).

Closes elastic#51446
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this issue Feb 26, 2020
We were not correctly respecting the download range, which led to the GCS SDK client closing the connection at times.

Closes elastic#51446
original-brownbear added a commit that referenced this issue Feb 26, 2020
We were not correctly respecting the download range, which led to the GCS SDK client closing the connection at times.
Also fixes another instance of failing to drain the request fully before sending the response headers.

Closes #51446
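To illustrate what "respecting the download range" involves (a sketch only, not the actual test code; names are hypothetical): the mock download handler has to parse the `Range: bytes=start-end` request header and answer with just that slice and a `206`, rather than always returning the full blob.

```java
import com.sun.net.httpserver.HttpExchange;
import java.io.IOException;
import java.io.OutputStream;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical ranged-download handling for a mocked blob store. */
final class RangedDownloads {

    private static final Pattern RANGE = Pattern.compile("bytes=(\\d+)-(\\d+)");

    static void sendBlob(HttpExchange exchange, byte[] blob) throws IOException {
        String rangeHeader = exchange.getRequestHeaders().getFirst("Range");
        int start = 0;
        int end = blob.length - 1;
        int status = 200;
        if (rangeHeader != null) {
            Matcher m = RANGE.matcher(rangeHeader);
            if (m.matches()) {
                // Honour the requested range instead of always returning the full blob.
                start = Integer.parseInt(m.group(1));
                end = Math.min(Integer.parseInt(m.group(2)), blob.length - 1);
                status = 206;
            }
        }
        int length = end - start + 1;
        exchange.sendResponseHeaders(status, length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(blob, start, length);
        }
    }
}
```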
tlrx added a commit that referenced this issue Mar 5, 2020
Tests in GoogleCloudStorageBlobStoreRepositoryTests are known to be flaky on JDK 8 (#51446, #52430) and we suspect a JDK bug (https://bugs.openjdk.java.net/browse/JDK-8180754) that triggers some assertion in the server-side logic that emulates the Google Cloud Storage service.

Sadly we were not able to reproduce the failures, even when using the same OS (Debian 9, Ubuntu 16.04) and JDK (Oracle Corporation 1.8.0_241 [Java HotSpot(TM) 64-Bit Server VM 25.241-b07]) as almost all of the test failures on CI. While we spent some time fixing code (#51933, #52431) to circumvent the JDK bug, the tests are still flaky on JDK 8. This commit mutes these tests for JDK 8 only.

Closes #52906
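A sketch of what muting tests "for JDK 8 only" can look like with a JUnit 4 assumption; the exact mechanism used in the Elasticsearch test suite may differ, and the class below is hypothetical.

```java
import org.junit.Assume;
import org.junit.Before;

/** Hypothetical base class that skips its tests when running on a 1.8 runtime. */
public abstract class Jdk8MutedTestCase {

    @Before
    public void skipOnJdk8() {
        // JDK-8180754 makes the mock GCS HTTP fixture flaky on JDK 8,
        // so the tests are skipped on that runtime only.
        String version = System.getProperty("java.specification.version");
        Assume.assumeFalse(
            "muted on JDK 8, see https://bugs.openjdk.java.net/browse/JDK-8180754",
            "1.8".equals(version));
    }
}
```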