Add ExtremeReadsTest #3070

Merged
merged 5 commits into from
Oct 13, 2017

Conversation

jean-philippe-martin
Contributor

This was a very useful debug tool when working on issue #2685. It sends many parallel reads for a long time. This makes sure that the combination of the cloud provider's throttling and our own retry parameters allows us to eventually read everything to completion and not fail with disconnection errors.

This test is disabled by default because it takes too long to run every time. But if there's any doubt about retries, we can dust it off and run it.

@codecov-io

codecov-io commented Jun 8, 2017

Codecov Report

Merging #3070 into master will decrease coverage by 0.095%.
The diff coverage is n/a.

@@               Coverage Diff               @@
##              master     #3070       +/-   ##
===============================================
- Coverage     79.721%   79.626%   -0.095%     
- Complexity     18162     18804      +642     
===============================================
  Files           1223      1233       +10     
  Lines          66649     69240     +2591     
  Branches       10409     11069      +660     
===============================================
+ Hits           53133     55133     +2000     
- Misses          9309      9788      +479     
- Partials        4207      4319      +112
| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| ...institute/hellbender/utils/read/SamComparison.java | 57.403% <0%> (-22.961%) | 86% <0%> (ø) |
| ...bender/tools/walkers/annotator/OxoGReadCounts.java | 86.408% <0%> (-9.244%) | 62% <0%> (+43%) |
| ...e/hellbender/engine/spark/SparkContextFactory.java | 71.233% <0%> (-2.74%) | 11% <0%> (ø) |
| ...bender/tools/walkers/annotator/FragmentLength.java | 100% <0%> (ø) | 12% <0%> (+5%) ⬆️ |
| ...der/tools/walkers/mutect/M2ArgumentCollection.java | 100% <0%> (ø) | 1% <0%> (ø) ⬇️ |
| ...otypecaller/HaplotypeCallerArgumentCollection.java | 100% <0%> (ø) | 4% <0%> (+2%) ⬆️ |
| ...ecaller/AssemblyBasedCallerArgumentCollection.java | 100% <0%> (ø) | 1% <0%> (ø) ⬇️ |
| ...ellbender/tools/walkers/annotator/BaseQuality.java | 100% <0%> (ø) | 15% <0%> (+7%) ⬆️ |
| ...bender/tools/walkers/annotator/MappingQuality.java | 100% <0%> (ø) | 11% <0%> (+5%) ⬆️ |
| ...nder/cmdline/PicardCommandLineProgramExecutor.java | 57.143% <0%> (ø) | 2% <0%> (?) |

... and 30 more

@droazen droazen self-assigned this Jun 9, 2017
@jean-philippe-martin
Contributor Author

With this new version I'm able to make it fail again: I opened a million channels to read the same file (across 1k threads) and got the error below. Yes, I know a million parallel reads on a single file is more than a normal user would issue.

shaded.cloud_nio.com.google.api.client.http.HttpRequest execute
WARNING: exception thrown while executing request
javax.net.ssl.SSLHandshakeException: Remote host closed connection during handshake
	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:992)
	at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375)
	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1403)
	at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1387)
	at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:559)
	at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:185)
	at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1316)
	at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1291)
	at sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream(HttpsURLConnectionImpl.java:250)
	at shaded.cloud_nio.com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:77)
	at shaded.cloud_nio.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:972)
	at shaded.cloud_nio.com.google.auth.oauth2.ServiceAccountCredentials.refreshAccessToken(ServiceAccountCredentials.java:365)
	at shaded.cloud_nio.com.google.auth.oauth2.OAuth2Credentials.refresh(OAuth2Credentials.java:149)
	at shaded.cloud_nio.com.google.auth.oauth2.OAuth2Credentials.getRequestMetadata(OAuth2Credentials.java:135)
	at shaded.cloud_nio.com.google.auth.http.HttpCredentialsAdapter.initialize(HttpCredentialsAdapter.java:96)
	at com.google.cloud.http.HttpTransportOptions$1.initialize(HttpTransportOptions.java:156)
	at shaded.cloud_nio.com.google.api.client.http.HttpRequestFactory.buildRequest(HttpRequestFactory.java:93)
	at shaded.cloud_nio.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.buildHttpRequest(AbstractGoogleClientRequest.java:300)
	at shaded.cloud_nio.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
	at shaded.cloud_nio.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
	at shaded.cloud_nio.com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeMedia(AbstractGoogleClientRequest.java:380)
	at shaded.cloud_nio.com.google.api.services.storage.Storage$Objects$Get.executeMedia(Storage.java:5130)
	at com.google.cloud.storage.spi.v1.HttpStorageRpc.read(HttpStorageRpc.java:494)
	at com.google.cloud.storage.BlobReadChannel$1.call(BlobReadChannel.java:127)
	at com.google.cloud.storage.BlobReadChannel$1.call(BlobReadChannel.java:124)
	at shaded.cloud_nio.com.google.api.gax.retrying.DirectRetryingExecutor.submit(DirectRetryingExecutor.java:93)
	at com.google.cloud.RetryHelper.runWithRetries(RetryHelper.java:49)
	at com.google.cloud.storage.BlobReadChannel.read(BlobReadChannel.java:124)
	at com.google.cloud.storage.contrib.nio.CloudStorageReadChannel.read(CloudStorageReadChannel.java:113)
	at org.broadinstitute.hellbender.utils.nio.SeekableByteChannelPrefetcher$WorkUnit.call(SeekableByteChannelPrefetcher.java:131)
	at org.broadinstitute.hellbender.utils.nio.SeekableByteChannelPrefetcher$WorkUnit.call(SeekableByteChannelPrefetcher.java:104)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException: SSL peer shut down incorrectly
	at sun.security.ssl.InputRecord.read(InputRecord.java:505)
	at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
	... 34 more
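
For readers who want to see the shape of such a stress run without cloud access, here is a minimal sketch. It is not the code from this PR: a local temporary file stands in for the GCS object, the thread and channel counts are tiny, each thread reads its channels one after another, and the names `ParallelChannelReadsSketch`, `readToEnd`, and `runParallelReads` are hypothetical. It only illustrates the idea of opening many channels on one file across threads and counting reads that fail or stop short of the end.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class ParallelChannelReadsSketch {

    /** Reads one channel to EOF; returns true if the final position equals the file size. */
    static boolean readToEnd(Path file, long size) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(4096);
        try (SeekableByteChannel chan = Files.newByteChannel(file)) {
            while (chan.read(buf) >= 0) {
                buf.clear(); // discard the data; only the read itself matters here
            }
            return chan.position() == size;
        }
    }

    /** Opens threads * channelsPerThread channels in total; returns the number of failed reads. */
    static int runParallelReads(Path file, int threads, int channelsPerThread) throws Exception {
        long size = Files.size(file);
        AtomicInteger errors = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int c = 0; c < channelsPerThread; c++) {
                    try {
                        if (!readToEnd(file, size)) {
                            errors.incrementAndGet();
                        }
                    } catch (IOException e) {
                        errors.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        if (!pool.awaitTermination(5, TimeUnit.MINUTES)) {
            throw new IllegalStateException("reads did not finish in time");
        }
        return errors.get();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: a zero-filled temp file replaces the large BAM used by the real test.
        Path tmp = Files.createTempFile("extreme-reads", ".bin");
        try {
            Files.write(tmp, new byte[100_000]);
            System.out.println("errors: " + runParallelReads(tmp, 8, 10));
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Against a remote store, the interesting failures come from throttling and connection limits, which a local file cannot reproduce; the sketch only shows the bookkeeping.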

Contributor

@droazen droazen left a comment

Two comments to address, back to @jean-philippe-martin

     **/
    @Test(groups={"bucket"}, enabled=false)
    public void manyParallelReads() throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
Contributor

Can you switch to using an ExecutorService for the thread pool here? Instantiating Threads directly is not recommended practice. Could you also ensure that the threads are marked as daemon threads?

Contributor Author

Done. Though of course we explicitly want to use all threads at the same time, going against the main use case of thread pools.
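
The exchange above can be sketched as follows. This is an illustration under assumptions, not the merged code: the class `DaemonPoolSketch` and its methods are hypothetical names. The pool is sized to the number of tasks so that every task gets its own thread, which is the "all threads at the same time" behavior the author describes, and a custom `ThreadFactory` marks each worker as a daemon.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class DaemonPoolSketch {

    /** Thread factory that marks every worker as a daemon, so a stuck task cannot keep the JVM alive. */
    static ThreadFactory daemonFactory() {
        return runnable -> {
            Thread t = new Thread(runnable);
            t.setDaemon(true);
            return t;
        };
    }

    /** Runs one task per thread; returns how many tasks observed all workers running at once. */
    static int runAllAtOnce(int threadCount) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(threadCount, daemonFactory());
        CountDownLatch allStarted = new CountDownLatch(threadCount);
        AtomicInteger completed = new AtomicInteger();
        for (int i = 0; i < threadCount; i++) {
            executor.execute(() -> {
                allStarted.countDown();
                try {
                    // Wait until every worker is live, so the load is truly simultaneous.
                    if (allStarted.await(30, TimeUnit.SECONDS)) {
                        completed.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        executor.shutdown();
        executor.awaitTermination(1, TimeUnit.MINUTES);
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("tasks that ran concurrently: " + runAllAtOnce(16));
    }
}
```

The latch is only there to demonstrate that the tasks overlap; a stress test would replace the body of the task with the actual reads.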

 * Stress test for reading lots of data from the cloud using a very small prefetch buffer.
 * Do not run this too often.
 */
public final class ExtremeReadsTest extends BaseTest implements Runnable {
Contributor

Can you implement a static inner Runnable class, instead of making the test class itself Runnable?

Contributor Author

Done.
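
A rough sketch of the requested structure, with hypothetical names (`StaticRunnerSketch`, `runAll`) and a placeholder task body: the outer class no longer implements `Runnable`, and the work lives in a static nested class that carries no implicit reference to the enclosing instance.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class StaticRunnerSketch {

    /** Static nested class: no implicit reference to the enclosing instance. */
    static final class Runner implements Runnable {
        private final CountDownLatch done;

        Runner(CountDownLatch done) {
            this.done = done;
        }

        @Override
        public void run() {
            // A real test would read from the cloud here; this placeholder only signals completion.
            done.countDown();
        }
    }

    /** Submits one Runner per task and reports whether all of them finished in time. */
    static boolean runAll(int tasks) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        CountDownLatch done = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            executor.execute(new Runner(done));
        }
        executor.shutdown();
        return done.await(30, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("all runners finished: " + runAll(20));
    }
}
```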

@jean-philippe-martin
Contributor Author

@droazen all addressed!

@droazen
Contributor

droazen commented Sep 18, 2017

@jean-philippe-martin Given that we've finally resolved our intermittent GCS failures at scale, do you think it's still valuable to keep this around, or should this PR be closed?

@jean-philippe-martin
Contributor Author

I think we should merge this PR. Our fixes are a sort of workaround for the fact that GCS doesn't warm up as quickly as we'd like it to. It's good to have the code to test that again if needed.

@droazen
Contributor

droazen commented Sep 25, 2017

@jean-philippe-martin Could you please rebase this onto the latest master? This branch needs a git-lfs-related change in order to pass tests. Thanks!

This is necessary to reach a higher number of parallel reads.
Switch from Thread to Executor.
Switch from the test class implementing Runnable to using a different class instead.
@jean-philippe-martin
Contributor Author

No problem @droazen, rebased.

Contributor

@droazen droazen left a comment

Some minor remaining TODOs, then we can merge this.

static String fname = GCS_GATK_TEST_RESOURCES + "large/CEUTrio.HiSeq.WGS.b37.NA12878.20.21.bam";

static int THREAD_COUNT = 1000;
static int CHANNELS_PER_THREAD = 1000;
Contributor

Make these constants final

Contributor Author

OK!


static int THREAD_COUNT = 1000;
static int CHANNELS_PER_THREAD = 1000;
static volatile int errors = 0;
Contributor

Make errors an instance variable rather than a mutable static variable.

Contributor Author

OK!

Contributor Author

Actually on second thought, let's not. If it's not static then Runner cannot access it unless we make Runner non-static or manually pass it a reference to the ExtremeReadsTest instance.
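
One way to reconcile the two positions, offered as a sketch and not as what the PR does, is the second option the author mentions in a narrower form: pass the static `Runner` a reference to a shared counter instead of the whole test instance. The sketch below uses `AtomicInteger` because incrementing a `volatile int` with `++` is a read-modify-write and is not atomic across threads. All names here (`ErrorCounterSketch`, `countErrors`, `failuresToSimulate`) are hypothetical, and the failures are simulated.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class ErrorCounterSketch {

    /** Static nested runner that reports failures through a counter handed to it. */
    static final class Runner implements Runnable {
        private final AtomicInteger errors;
        private final int failuresToSimulate;

        Runner(AtomicInteger errors, int failuresToSimulate) {
            this.errors = errors;
            this.failuresToSimulate = failuresToSimulate;
        }

        @Override
        public void run() {
            // Placeholder for real reads: pretend each iteration hit an error.
            for (int i = 0; i < failuresToSimulate; i++) {
                errors.incrementAndGet(); // atomic, unlike ++ on a volatile int
            }
        }
    }

    /** Runs one Runner per thread against a per-call counter and returns the total error count. */
    static int countErrors(int threads, int failuresPerRunner) throws InterruptedException {
        AtomicInteger errors = new AtomicInteger(); // local to this run, not a mutable static
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < threads; i++) {
            executor.execute(new Runner(errors, failuresPerRunner));
        }
        executor.shutdown();
        if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("runners did not finish in time");
        }
        return errors.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("errors counted: " + countErrors(8, 1000));
    }
}
```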

// EOF
long position = chan.position();
if (size != position) {
System.out.println("Done at wrong position! " + position + " != " + size);
Contributor

Replace all System.out.println() calls with logger calls.

Contributor Author

Done, but now I don't see the output anymore in the IntelliJ GUI.
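
As an illustration of the requested change, here is a small sketch of the end-of-file check routed through a logger. It uses `java.util.logging` only so the example has no dependencies; that choice is an assumption, and the project's own logger may be a different library. The class and method names are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class LoggingSketch {

    private static final Logger logger = Logger.getLogger(LoggingSketch.class.getName());

    /** Returns true when the channel stopped exactly at the end of the file; logs a warning otherwise. */
    static boolean checkFinalPosition(long position, long size) {
        if (size != position) {
            // Same message as the original println, now emitted through the logger.
            logger.log(Level.WARNING, "Done at wrong position! {0} != {1}",
                    new Object[] {position, size});
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(checkFinalPosition(100L, 100L));
        System.out.println(checkFinalPosition(42L, 100L));
    }
}
```

Whether the messages show up in an IDE console depends on how the logging framework in use is configured.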

errors = 0;
final Runner runner = new Runner();
for (int i=0; i<THREAD_COUNT; i++) {
    executor.execute(runner);
Contributor

Pass in a separate Runner instance on each iteration.

Contributor Author

OK, but why? Runner is stateless.

@jean-philippe-martin
Contributor Author

@droazen I have a new version up with your changes.

@droazen droazen merged commit ffb47f9 into master Oct 13, 2017
@droazen droazen deleted the jp_extreme_reads_test branch October 13, 2017 20:12
@jean-philippe-martin
Contributor Author

Thank you @droazen!
