
Downstream tools [GATK] affected by disq #110

Closed
SHuang-Broad opened this issue Jul 3, 2019 · 6 comments · Fixed by #111

@SHuang-Broad

Hello Tom,

I am reporting a behavior here that I observed while running GATK 4 PrintReadsSpark in a vanilla manner.
I am not sure how to summarize the behavior, so I am simply copying the error message here:

A USER ERROR has occurred: Couldn't write file hdfs://shuang-disq-test-m:8020/data/m64020_190208_213731.subreads.ccs.aligned.merged.chr21.bam because writing failed with exception concat: source file /data/m64020_190208_213731.subreads.ccs.aligned.merged.chr21.bam.parts/part-r-00000 is invalid or empty or underConstruction
	at org.apache.hadoop.hdfs.server.namenode.FSDirConcatOp.verifySrcFiles(FSDirConcatOp.java:160)
	at org.apache.hadoop.hdfs.server.namenode.FSDirConcatOp.concat(FSDirConcatOp.java:68)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.concat(FSNamesystem.java:1935)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.concat(NameNodeRpcServer.java:1001)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.concat(ClientNamenodeProtocolServerSideTranslatorPB.java:583)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606)

NOTE: the BAM is a long-read BAM (PacBio sequences) aligned with minimap2.

If you need more, please let me know.

Thanks!


In the attached zip, there's the setup.dataproc.sh for creating the dataproc cluster, printReadsSpark.sh for extracting reads mapped to chr21, and the full stacktrace PrintReadsSpark_Error.log.

PrintReadsSpark_error.zip

@lbergelson
Contributor

@tomwhite I was meaning to ask you about this when we spoke just now, but I forgot... Any thoughts about what's going on here?

@tomwhite
Member

tomwhite commented Jul 8, 2019

@SHuang-Broad thanks for the error report. The error log says that the concat operation is failing. Do you know if you get the same error if you remove the interval filter (-L chr21)? I wonder if it's creating empty part files due to the filtering.
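The hypothesis above can be sketched without Spark or HDFS. In this toy model (all names are hypothetical, not Disq's actual code), each partition writes its own part file, an interval filter can leave a partition with no reads, and the concat step — like HDFS's FSDirConcatOp.verifySrcFiles in the stack trace — rejects any zero-length source:

```python
# Toy model (hypothetical, not Disq's actual code) of why interval
# filtering can break the final HDFS concat: each partition writes its
# own part file, a filter that leaves a partition empty produces an
# empty part, and concat rejects empty source files.

def write_parts(reads, n_partitions, keep):
    """Partition reads, apply the filter, return the byte size of each part file."""
    parts = []
    for i in range(n_partitions):
        chunk = [r for r in reads[i::n_partitions] if keep(r)]
        parts.append(sum(len(r) for r in chunk))  # bytes in part-r-{i:05d}
    return parts

def hdfs_concat(part_sizes):
    """Mimic FSDirConcatOp.verifySrcFiles: empty sources are invalid."""
    for i, size in enumerate(part_sizes):
        if size == 0:
            raise ValueError(
                f"concat: source file .../part-r-{i:05d} is invalid or empty")
    return sum(part_sizes)

reads = ["chr21:read1", "chr1:read2", "chr1:read3", "chrX:read4"]

# Unfiltered copy: every partition has data, concat succeeds.
hdfs_concat(write_parts(reads, 2, keep=lambda r: True))

# With a chr21-only filter, one partition ends up empty, so concat
# fails in the same way as the reported stack trace.
try:
    hdfs_concat(write_parts(reads, 2, keep=lambda r: r.startswith("chr21")))
except ValueError as e:
    print(e)
```

This matches the symptom that an unfiltered copy passes while any -L run fails: the filter, not the data, determines whether empty parts exist.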

@SHuang-Broad
Author

@tomwhite

I've done several experiments here:

  • rerunning the same "only chr21" request, but with the tool's default read filters disabled (same error as before)
  • requesting reads mapped to chrX (and later chr1 in a separate run), also with the default filters disabled (ditto)
  • copying the whole BAM with the tool's default read filters in place (passes)
  • requesting "chr21" reads from the just-copied BAM, which, unlike the original BAM, now has an accompanying ".sbi" index file (fails again with the same error)

(screenshot attached: Screen Shot 2019-07-08 at 1.31.04 PM)

So it seems like requesting -L is causing the trouble?


experiments_on_July_8th.zip

@tomwhite
Member

tomwhite commented Jul 9, 2019

Thanks @SHuang-Broad! I will try to reproduce and write a fix.

@heuermh
Contributor

heuermh commented Jul 10, 2019

@SHuang-Broad Thank you for reporting this issue! If the fix merged in #111 doesn't work for you, please reopen this issue or create a new one.
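One plausible shape for such a fix — though not necessarily what #111 actually implements — is to drop zero-length part files before handing the list to concat. A hypothetical sketch (prune_empty_parts is an invented helper, not Disq API):

```python
# Hypothetical sketch, not Disq's actual #111 change: before the final
# HDFS concat, remove zero-length part files, since verifySrcFiles
# rejects empty sources and an empty part contributes nothing to the
# merged BAM anyway.
import os
import tempfile

def prune_empty_parts(parts_dir):
    """Remove zero-length part files and return the surviving paths in order."""
    kept = []
    for name in sorted(os.listdir(parts_dir)):
        path = os.path.join(parts_dir, name)
        if os.path.getsize(path) == 0:
            os.remove(path)  # an empty part would make HDFS concat fail
        else:
            kept.append(path)
    return kept

# Demo: one part left empty by the interval filter, one with data.
with tempfile.TemporaryDirectory() as d:
    for name, data in [("part-r-00000", b""), ("part-r-00001", b"reads")]:
        with open(os.path.join(d, name), "wb") as f:
            f.write(data)
    survivors = prune_empty_parts(d)
    print([os.path.basename(p) for p in survivors])  # ['part-r-00001']
```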

@heuermh heuermh added this to the 0.4.0 milestone Jul 10, 2019
@SHuang-Broad
Author

@tomwhite @heuermh
Louis and I just tested the patch and it is working as expected.
Thanks!
