-
Notifications
You must be signed in to change notification settings - Fork 309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ADAM-916] New strategy for writing header. #917
[ADAM-916] New strategy for writing header. #917
Conversation
Test FAILed. Build result: FAILUREGitHub pull request #917 of commit 2c38bbd automatically merged.Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb > git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > git --version # timeout=10 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ > git rev-parse origin/pr/917/merge^{commit} # timeout=10 > git branch -a --contains 8ff91b477e29e28d277a812a87673b43fbbff9a1 # timeout=10 > git rev-parse remotes/origin/pr/917/merge^{commit} # timeout=10Checking out Revision 8ff91b477e29e28d277a812a87673b43fbbff9a1 (origin/pr/917/merge) > git config core.sparsecheckout # timeout=10 > git checkout -f 8ff91b477e29e28d277a812a87673b43fbbff9a1First time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.11,1.4.1,centosTriggering ADAM-prb ? 2.6.0,2.10,1.4.1,centosTouchstone configurations resulted in FAILURE, so aborting...Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'Test FAILed. |
2c38bbd
to
4d1c406
Compare
This is good to go now. Ping for review and merge. |
Test FAILed. Build result: FAILURE[...truncated 24 lines...]Triggering ADAM-prb ? 2.3.0,2.11,1.2.1,centosTriggering ADAM-prb ? 2.3.0,2.10,1.2.1,centosTriggering ADAM-prb ? 1.0.4,2.11,1.2.1,centosTriggering ADAM-prb ? 2.6.0,2.11,1.5.2,centosTriggering ADAM-prb ? 2.3.0,2.10,1.3.1,centosTriggering ADAM-prb ? 2.3.0,2.10,1.4.1,centosTriggering ADAM-prb ? 2.3.0,2.10,1.5.2,centosTriggering ADAM-prb ? 2.3.0,2.11,1.3.1,centosTriggering ADAM-prb ? 2.3.0,2.11,1.4.1,centosADAM-prb ? 2.3.0,2.11,1.5.2,centos completed with result SUCCESSADAM-prb ? 1.0.4,2.10,1.2.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.10,1.3.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.11,1.3.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.10,1.5.2,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.11,1.2.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.10,1.2.1,centos completed with result SUCCESSADAM-prb ? 1.0.4,2.11,1.2.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.11,1.5.2,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.10,1.3.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.10,1.4.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.10,1.5.2,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.11,1.3.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.11,1.4.1,centos completed with result SUCCESSNotifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'Test FAILed. |
OK, it seems like on Spark 1.2 we got a merging order that is inconsistent with the ordering we're expecting. I was worried about that... Let me look into this more and see if there is a way we can ensure a consistent merging order. |
4d1c406
to
b8ec48e
Compare
Rebased and somewhat fixed. |
Test FAILed. Build result: FAILUREGitHub pull request #917 of commit b8ec48e automatically merged.Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb > git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > git --version # timeout=10 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ > git rev-parse origin/pr/917/merge^{commit} # timeout=10 > git branch -a --contains 0a51cd7242eb066b795ff1d414b695e3eff0c6b6 # timeout=10 > git rev-parse remotes/origin/pr/917/merge^{commit} # timeout=10Checking out Revision 0a51cd7242eb066b795ff1d414b695e3eff0c6b6 (origin/pr/917/merge) > git config core.sparsecheckout # timeout=10 > git checkout -f 0a51cd7242eb066b795ff1d414b695e3eff0c6b6First time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.11,1.4.1,centosTriggering ADAM-prb ? 2.6.0,2.10,1.4.1,centosTouchstone configurations resulted in FAILURE, so aborting...Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'Test FAILed. |
Single unit test failure
|
Jenkins, retest this please. |
Test FAILed. Build result: FAILUREGitHub pull request #917 of commit b8ec48e automatically merged.Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb > git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > git --version # timeout=10 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ > git rev-parse origin/pr/917/merge^{commit} # timeout=10 > git branch -a --contains 0a51cd7242eb066b795ff1d414b695e3eff0c6b6 # timeout=10 > git rev-parse remotes/origin/pr/917/merge^{commit} # timeout=10Checking out Revision 0a51cd7242eb066b795ff1d414b695e3eff0c6b6 (origin/pr/917/merge) > git config core.sparsecheckout # timeout=10 > git checkout -f 0a51cd7242eb066b795ff1d414b695e3eff0c6b6First time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.11,1.4.1,centosTriggering ADAM-prb ? 2.6.0,2.10,1.4.1,centosTouchstone configurations resulted in FAILURE, so aborting...Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'Test FAILed. |
Jenkins, test this please. |
Test FAILed. Build result: FAILUREGitHub pull request #917 of commit b8ec48e automatically merged.Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb > git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > git --version # timeout=10 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ > git rev-parse origin/pr/917/merge^{commit} # timeout=10 > git branch -a --contains 0a51cd7242eb066b795ff1d414b695e3eff0c6b6 # timeout=10 > git rev-parse remotes/origin/pr/917/merge^{commit} # timeout=10Checking out Revision 0a51cd7242eb066b795ff1d414b695e3eff0c6b6 (origin/pr/917/merge) > git config core.sparsecheckout # timeout=10 > git checkout -f 0a51cd7242eb066b795ff1d414b695e3eff0c6b6First time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.11,1.4.1,centosTriggering ADAM-prb ? 2.6.0,2.10,1.4.1,centosTouchstone configurations resulted in FAILURE, so aborting...Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'Test FAILed. |
Jenkins, test this please. |
Test FAILed. Build result: FAILUREGitHub pull request #917 of commit b8ec48e automatically merged.Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb > git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > git --version # timeout=10 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ > git rev-parse origin/pr/917/merge^{commit} # timeout=10 > git branch -a --contains 0a51cd7242eb066b795ff1d414b695e3eff0c6b6 # timeout=10 > git rev-parse remotes/origin/pr/917/merge^{commit} # timeout=10Checking out Revision 0a51cd7242eb066b795ff1d414b695e3eff0c6b6 (origin/pr/917/merge) > git config core.sparsecheckout # timeout=10 > git checkout -f 0a51cd7242eb066b795ff1d414b695e3eff0c6b6First time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.11,1.4.1,centosTriggering ADAM-prb ? 2.6.0,2.10,1.4.1,centosTouchstone configurations resulted in FAILURE, so aborting...Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'Test FAILed. |
b8ec48e
to
1c9927f
Compare
Pushed a fix; tests should pass on this run. From my side, this PR is good to go (assuming tests pass on @AmplabJenkins). |
Test FAILed. Build result: FAILURE[...truncated 32 lines...]Triggering ADAM-prb ? 2.3.0,2.10,1.3.1,centosTriggering ADAM-prb ? 2.3.0,2.11,1.4.1,centosTriggering ADAM-prb ? 2.3.0,2.10,1.4.1,centosTriggering ADAM-prb ? 2.3.0,2.10,1.5.2,centosTriggering ADAM-prb ? 2.3.0,2.11,1.3.1,centosADAM-prb ? 2.3.0,2.11,1.5.2,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.11,1.6.0,centos completed with result SUCCESSADAM-prb ? 1.0.4,2.10,1.2.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.10,1.3.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.11,1.3.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.10,1.5.2,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.11,1.2.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.10,1.2.1,centos completed with result SUCCESSADAM-prb ? 1.0.4,2.11,1.2.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.11,1.6.0,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.10,1.6.0,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.11,1.5.2,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.10,1.6.0,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.10,1.3.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.11,1.4.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.10,1.4.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.10,1.5.2,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.11,1.3.1,centos completed with result SUCCESSNotifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'Test FAILed. |
1c9927f
to
47255c4
Compare
Good news: the |
Test FAILed. Build result: FAILURE[...truncated 32 lines...]Triggering ADAM-prb ? 2.3.0,2.10,1.3.1,centosTriggering ADAM-prb ? 2.3.0,2.11,1.4.1,centosTriggering ADAM-prb ? 2.3.0,2.10,1.4.1,centosTriggering ADAM-prb ? 2.3.0,2.10,1.5.2,centosTriggering ADAM-prb ? 2.3.0,2.11,1.3.1,centosADAM-prb ? 2.3.0,2.11,1.5.2,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.11,1.6.0,centos completed with result FAILUREADAM-prb ? 1.0.4,2.10,1.2.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.10,1.3.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.11,1.3.1,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.10,1.5.2,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.11,1.2.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.10,1.2.1,centos completed with result SUCCESSADAM-prb ? 1.0.4,2.11,1.2.1,centos completed with result FAILUREADAM-prb ? 2.6.0,2.11,1.6.0,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.10,1.6.0,centos completed with result FAILUREADAM-prb ? 2.6.0,2.11,1.5.2,centos completed with result SUCCESSADAM-prb ? 2.6.0,2.10,1.6.0,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.10,1.3.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.11,1.4.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.10,1.4.1,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.10,1.5.2,centos completed with result SUCCESSADAM-prb ? 2.3.0,2.11,1.3.1,centos completed with result SUCCESSNotifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'Test FAILed. |
eb165a7
to
e9cedc1
Compare
Test PASSed. |
Test FAILed. Build result: FAILUREGitHub pull request #917 of commit e9cedc1 automatically merged.Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb > git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > git --version # timeout=10 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ > git rev-parse origin/pr/917/merge^{commit} # timeout=10 > git branch -a --contains 5dade159e8469006b153c4b151caacb23d6f6952 # timeout=10 > git rev-parse remotes/origin/pr/917/merge^{commit} # timeout=10Checking out Revision 5dade159e8469006b153c4b151caacb23d6f6952 (origin/pr/917/merge) > git config core.sparsecheckout # timeout=10 > git checkout -f 5dade159e8469006b153c4b151caacb23d6f6952First time build. Skipping changelog.Triggering ADAM-prb ? 2.6.0,2.11,1.4.1,centosTriggering ADAM-prb ? 2.6.0,2.10,1.4.1,centosTouchstone configurations resulted in FAILURE, so aborting...Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'Test FAILed. |
This all works, except for the |
e9cedc1
to
386f759
Compare
Test FAILed. Build result: FAILUREGitHub pull request #917 of commit 386f759 automatically merged.Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb > git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > git --version # timeout=10 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ > git rev-parse origin/pr/917/merge^{commit} # timeout=10 > git branch -a --contains 0c8180369152d9447e08c7c89edf1068950c3a00 # timeout=10 > git rev-parse remotes/origin/pr/917/merge^{commit} # timeout=10Checking out Revision 0c8180369152d9447e08c7c89edf1068950c3a00 (origin/pr/917/merge) > git config core.sparsecheckout # timeout=10 > git checkout -f 0c8180369152d9447e08c7c89edf1068950c3a00 > git rev-list 5dade159e8469006b153c4b151caacb23d6f6952 # timeout=10Triggering ADAM-prb ? 2.6.0,2.11,1.4.1,centosTriggering ADAM-prb ? 2.6.0,2.10,1.4.1,centosTouchstone configurations resulted in FAILURE, so aborting...Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'Test FAILed. |
Jenkins, retest this please. |
Also, even when we fall back on the single threaded copy code, it takes ~1sec per file to merge the files. E.g., if we have ~2000 partitions, then it takes about 2000 seconds to merge the whole file together. This is unfortunate, but is much better than the ~30hrs latency @beaunorgeot had been seeing when doing something like |
Test PASSed. |
Thanks for tackling this @fnothaft; I'm missing state on this, can you answer a few high-level questions:
|
OK, this succeeded on the cluster. Save time is slower than I'd like (~40 min to cat NA12878 together) but it worked. I'm running another test now that sorts then saves, then I'll pull the file out of HDFS and run samtools index locally. If that succeeds, then we are good to go. I'll update when that's done; I'm guessing it'll probably finish sometime late tonight. |
God, I still have no idea what's causing it. I agree—I thought our fix was high confidence, but the log files are pretty unambiguous that we're getting a partition that tries to write to disk but the header is not set.
Exactly. It turns out that this is how Hadoop-BAM's CLI uses the Hadoop-BAM output formats. This isn't obvious though... I got the idea for this just by reading the Hadoop-BAM CLI source code.
I only touched the headerless output formats, so they're still vulnerable to the transient header disappearance issue. I'll need to propagate the fix to all output formats later. We'll probably have to do the same thing for VCF as well. |
Anyways, this is good to merge now. |
@ryan-williams can you review? I'll take a look and run a few extra tests this morning. We would like to get this merged as soon as possible. |
|
||
// concat files together | ||
// we need to do this through reflection because the concat method is | ||
// NOT in hadoop 1.x |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after this is merged, we should create an issue to revisit it once we drop Hadoop 1.x support
Unfortunately I don't have much time to get in to the weeds on this one, but my main thought is: if we don't know why the previous fix didn't fix the issue, even though we all still believe it should have from everything we know about the code, then I don't like complicating matters in this way. Ultimately, I'll defer to y'all since it sounds like the goal here is more to unblock specific user(s) than to have a fix that we understand, but obviously that doesn't seem like a great state of affairs longer-term, especially if this introduces a slow-down to boot. Can we deterministically repro the crash? I agree that something is still wrong, @danvk (on #721) and I (undocumented?) have both seen the bug since the last "fix", IIRC. |
The perf is only bad when saving BAM as a single file, and "bad" is relative to the FileSystem 'concat' method. This is a correctness fix (block ordering + header attach) relative to the old 'FileUtil.copyMerge' approach. I agree that we need to harmonize the single file save/many file save approaches, but that's for later. |
Resolves bigdatagenomics#916. Makes several modifications that should eliminate the header attach issue when writing back to SAM/BAM: * Writes the SAM/BAM header as a single file. * Instead of trying to attach the SAM/BAM header to the output format via a singleton object, we pass the path to the SAM/BAM header file via the Hadoop configuration. * The output format reads the header from HDFS when creating the record writer. * At the end, once we've written the full RDD and the header file, we merge all via Hadoop's FsUtil.
386f759
to
7d4b409
Compare
Test FAILed. Build result: FAILUREGitHub pull request #917 of commit 7d4b409 automatically merged.Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'[EnvInject] - Loading node environment variables.Building remotely on amp-jenkins-worker-05 (centos spark-test) in workspace /home/jenkins/workspace/ADAM-prb > git rev-parse --is-inside-work-tree # timeout=10Fetching changes from the remote Git repository > git config remote.origin.url https://github.com/bigdatagenomics/adam.git # timeout=10Fetching upstream changes from https://github.com/bigdatagenomics/adam.git > git --version # timeout=10 > git fetch --tags --progress https://github.com/bigdatagenomics/adam.git +refs/pull/:refs/remotes/origin/pr/ > git rev-parse origin/pr/917/merge^{commit} # timeout=10 > git branch -a --contains b12af513d5cb95c54fe156912eef38bc2a37e23c # timeout=10 > git rev-parse remotes/origin/pr/917/merge^{commit} # timeout=10Checking out Revision b12af513d5cb95c54fe156912eef38bc2a37e23c (origin/pr/917/merge) > git config core.sparsecheckout # timeout=10 > git checkout -f b12af513d5cb95c54fe156912eef38bc2a37e23c > git rev-list 0c8180369152d9447e08c7c89edf1068950c3a00 # timeout=10Triggering ADAM-prb ? 2.6.0,2.11,1.4.1,centosTriggering ADAM-prb ? 2.6.0,2.10,1.4.1,centosTouchstone configurations resulted in FAILURE, so aborting...Notifying endpoint 'HTTP:https://webhooks.gitter.im/e/ac8bb6e9f53357bc8aa8'Test FAILed. |
This is good to go from my side. @heuermh were you able to complete your testing? Just to follow up more on @ryan-williams' comments from earlier (sorry, I was on the run so I kept my comments brief), this fix only impacts the single file save code, and does not lead to a performance decrease over the current single file save code. This fix does three things:
The manual merge code path is much slower than the As for the million dollar question: why does the header detach issue keep on happening, despite our best efforts to fix it? Frankly, I'm not entirely sure why it's failing, but the root cause is something along the lines of "when we set the header in the singleton object that accompanies the output format, it doesn't get consistently set across all of the JVMs where an instance of the actual output format will be created and executed." The new approach uses the Hadoop job configuration to pass the path to the header file. This conf is passed to the OutputFormat by Spark itself, so we can guarantee that the configuration seen by the OutputFormat will be consistent across the cluster. Since the header is written to the underlying file system (which again, if your underlying file system is not accessible across the whole cluster, then you're running some horrible frankencluster and I can't help you), we can guarantee that reading the header file from the path passed in the config will succeed. I think a TL;DR for why this failed is that the obvious way of attaching the header (I call it obvious because the method is named |
Jenkins, retest this please. |
Test PASSed. |
[ADAM-916] New strategy for writing header.
Tested ok for me, merged from the ski lift. Thank you, @fnothaft! |
Resolves #916. Not fully done yet; seems to work for BAM but not SAM. The SAM file header doesn't write correctly... Still debugging that. Needs more testing.