2.7.8a 2-pass produces corrupt SAM #1220
Comments
Hi @schnabelr, thanks for the detailed report! The truncation may happen if you run out of disk space during a run, though it's strange that it worked fine with the older version. Cheers
@alexdobin Well, this is interesting. The input for this sample is a total of 10 paired fastq files and 20 unpaired ones (after trimming with trimmomatic, one of the reads was removed), so a total of 30 STAR output SAM files are produced. I counted the number of lines in each of the SAM files and also ran So it appears that for PE input data it is producing a truncated file and for the SE input data it is warning about a missing tag. As I said previously, all of this data works with 2.5.2b. I just ran 100 transcriptomes last night, including this example, and they all finished just fine with 2.5.2b. Would it be of any value to repeat this with a STAR version between these two? If so, just let me know which one. It really doesn't take much effort. Alternatively, I can put the reference.fa, GTF and pair of fastq files somewhere for you to grab if that would help. Bob
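For anyone wanting to locate the truncation point without picard, here is a minimal Python sketch. It relies only on the SAM rule that every alignment record has at least 11 tab-separated mandatory fields; the function name `find_truncated_records` is made up for illustration and is not part of STAR, samtools or picard.

```python
def find_truncated_records(lines):
    """Return (line_number, field_count) for alignment lines that look cut off.

    A line is flagged if it has fewer than the 11 mandatory SAM fields
    or if it does not end in a newline (typical for a file cut mid-write).
    Header lines (starting with '@') are skipped.
    """
    problems = []
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("@"):
            continue
        n_fields = len(line.rstrip("\n").split("\t"))
        if n_fields < 11 or not line.endswith("\n"):
            problems.append((lineno, n_fields))
    return problems


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python check_sam.py Aligned.out.sam
    with open(sys.argv[1]) as handle:
        for lineno, n_fields in find_truncated_records(handle):
            print(f"line {lineno}: only {n_fields} fields or no trailing newline")
```

This only catches structural truncation; it says nothing about whether the field contents are valid.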
Hi Bob, thanks for investigating it. The NM tag absence is OK, since you did not specify it in --outSAMattributes. You can suppress picard's checking of this tag, but it looks like SE mapping is OK. You are not using any uncommon parameter combinations, so the problem is either dataset specific or system specific. Can you do a couple more checks with 2.7.8a?
Thanks!
Alex,
That worked. So to try and figure out if the issue was one (or more) of the flags, I ran them sequentially, adding one at a time and checking the sam output until something broke. That's how I found out it was For this run with
I then go back to the output without any flags (that is complete) and pull out the matching reads. The second line below corresponds to the last line of the corrupt file. For the 3rd line below, am I reading that CIGAR correctly as a 149,879 base gap?
Hopefully this may give you an idea of where something is going wrong.
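On the CIGAR question: in the SAM format an `N` operation is a skipped region on the reference (an intron for spliced RNA-seq alignments), so a CIGAR containing `149879N` does describe a 149,879-base gap. A small Python sketch for pulling those lengths out follows. The CIGAR string in it is a made-up example, not the one from the actual read, and the helper names are hypothetical.

```python
import re

# One CIGAR element: a length followed by one of the SAM operation codes.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_ops(cigar):
    """Split a CIGAR string into (length, op) pairs, rejecting malformed input."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # If re-joining the parsed pieces does not reproduce the input,
    # the string contained something the pattern did not consume.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def skipped_regions(cigar):
    """Lengths of all N (reference skip / intron) operations."""
    return [n for n, op in cigar_ops(cigar) if op == "N"]


def reference_span(cigar):
    """Number of reference bases covered: M, D, N, = and X consume the reference."""
    return sum(n for n, op in cigar_ops(cigar) if op in "MDN=X")


example = "60M149879N41M"  # hypothetical CIGAR, for illustration only
print(skipped_regions(example))  # [149879]
print(reference_span(example))   # 149980
```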
Hi Bob: thanks for the thorough investigation! Cheers
Hi Bob, the 2.7.9a release should have fixed this issue, please try it out. Cheers
Alex,
Hi Bob, great! Many thanks for helping to find the bug! Cheers
STAR finishes normally. However, samtools sort complains about the file(s).
And picard ValidateSamFile confirms issues.
This happens for every file that I've tried and is reproducible. I'll only use one fastq pair as an example of the error/issue.
If I look at the SAM file, the RNAME position contains data that doesn't correspond to a chr/contig name.
This is cow so the chromosomes are named 1..29,X,Y,MT and NKLS* for unplaced contigs. Note lines 1,3-5,7-8 and the last line.
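A check like the one described above can be scripted. The sketch below is only an illustration in Python: it flags alignment lines whose RNAME (the third SAM field) is not one of the names listed for this assembly (1..29, X, Y, MT, NKLS-prefixed contigs) or `*` for unmapped reads. The pattern and the function name `bad_rname_lines` are assumptions written for this example; for another genome the pattern would need to change.

```python
import re

# Allowed reference names for this cow assembly, as described in the report:
# 1..29, X, Y, MT, contigs starting with NKLS, plus '*' for unmapped reads.
VALID_RNAME = re.compile(r"[1-9]|1\d|2\d|X|Y|MT|NKLS\S+|\*")


def bad_rname_lines(lines):
    """Return 1-based line numbers of alignment records with an unexpected RNAME."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("@"):  # header lines carry no RNAME field
            continue
        fields = line.rstrip("\n").split("\t")
        # RNAME is the third mandatory field; a record too short to have one is also bad.
        if len(fields) < 3 or not VALID_RNAME.fullmatch(fields[2]):
            bad.append(lineno)
    return bad


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python check_rname.py Aligned.out.sam
    with open(sys.argv[1]) as handle:
        print(bad_rname_lines(handle))
```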
I run this via a Perl pipeline and was able to exclude issues with the input FASTQ and also the reference and GTF.
Everything works/completes normally using the exact same reference fasta and GTF to build the genome using 2.5.2b.
I also excluded compilation issues by testing with the binaries distributed with the source and get the same type of errors.
I also tested using the master (2.7.8a_2021-03-08) and saw the same behavior. Note, when I rebuild the genome for each of these different tests they are in their own directories.
This appears somewhat similar to report #1209. However, I'm only using 6 threads for the alignment so I don't think it's a thread number issue per se. Please let me know if there are any other tests that I could do or parameters to change that may help fix/diagnose this.
Attached are the various logs.
2.7.8a_Log.out is the genomeGenerate log.
2.5.2b_Log.out is the genomeGenerate log.
Example log files
HFD.93471.56804.R.AP.02.DUP.P.Log.out
2.7.8a_Log.out.txt
2.5.2b_Log.out.txt
HFD.93471.56804.R.AP.02.DUP.P.Log.out.txt
Steps to build the reference. The rest is in the logs above for the example.
Build 2.7.8a reference
Build 2.5.2b reference (This works fine)