-
Notifications
You must be signed in to change notification settings - Fork 593
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ExtremeReadsTest #3070
Add ExtremeReadsTest #3070
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3070 +/- ##
===============================================
- Coverage 79.721% 79.626% -0.095%
- Complexity 18162 18804 +642
===============================================
Files 1223 1233 +10
Lines 66649 69240 +2591
Branches 10409 11069 +660
===============================================
+ Hits 53133 55133 +2000
- Misses 9309 9788 +479
- Partials 4207 4319 +112
|
With this new version I'm able to make it fail again; I opened a million channels to read the same file (across 1k threads) and got the error below. Yes I know a million parallel reads on a single file is more than a normal user would issue.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two comments to address, back to @jean-philippe-martin
**/ | ||
@Test(groups={"bucket"}, enabled=false) | ||
public void manyParallelReads() throws InterruptedException { | ||
List<Thread> threads = new ArrayList<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you switch to using an ExecutorService
for the thread pool here? Instantiating Threads
directly is not recommended practice. Could you also ensure that the threads are marked as daemon threads?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Though of course we explicitly want to use all threads at the same time, going against the main use case of thread pools.
* Stress test for reading lots of data from the cloud using a very small prefetch buffer. | ||
* Do not run this too often. | ||
*/ | ||
public final class ExtremeReadsTest extends BaseTest implements Runnable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you implement a static inner Runnable
class, instead of making the test class itself Runnable
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@droazen all addressed! |
@jean-philippe-martin Given that we've finally resolved our intermittent GCS failures at scale, do you think it's still valuable to keep this around, or should this PR be closed? |
I think we should merge this PR. Our fixes are a sort of workaround the fact that GCS doesn't warm up as quickly as we'd like it to. It's good to have the code to test that again if needed. |
@jean-philippe-martin Could you please rebase this onto the latest master? This branch needs a git-lfs-related change in order to pass tests. Thanks! |
This is necessary to reach a higher number of parallel reads.
Switch from Thread to Executor, Switch from the test class implementing Runnable to using a different class instead.
e3d7be4
to
e9c2f82
Compare
No problem @droazen , rebased. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor remaining TODOs, then we can merge this.
static String fname = GCS_GATK_TEST_RESOURCES + "large/CEUTrio.HiSeq.WGS.b37.NA12878.20.21.bam"; | ||
|
||
static int THREAD_COUNT = 1000; | ||
static int CHANNELS_PER_THREAD = 1000; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make these constants final
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK!
|
||
static int THREAD_COUNT = 1000; | ||
static int CHANNELS_PER_THREAD = 1000; | ||
static volatile int errors = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make errors
an instance variable rather than a mutable static variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually on second thought, let's not. If it's not static then Runner cannot access it unless we make Runner non-static or manually pass it a reference to the ExtremeReadsTest instance.
// EOF | ||
long position = chan.position(); | ||
if (size != position) { | ||
System.out.println("Done at wrong position! " + position + " != " + size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace all System.out.println()
calls with logger calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, but now I don't see the output anymore in the IntelliJ GUI.
errors = 0; | ||
final Runner runner = new Runner(); | ||
for (int i=0; i<THREAD_COUNT; i++) { | ||
executor.execute(runner); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pass in a separate Runner
instance on each iteration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, but why? Runner is stateless.
@droazen I have a new version up with your changes. |
Thank you @droazen! |
This was a very useful debug tool when working on issue #2685. It sends many parallel reads for a long time. This makes sure that the combination of the cloud provider's throttling and our own retry parameters allows us to eventually read everything to completion and not fail with disconnection errors.
This test is disabled by default, because it takes too long to be run every time. But if there's a doubt about retries we can dust it off and run it.