BAM error #36

Closed
tjakobi opened this issue Apr 10, 2017 · 9 comments

@tjakobi
Contributor

tjakobi commented Apr 10, 2017

Problem occurs when -G and -B are specified:

Traceback (most recent call last):
  File "/home/sstrohbuecker/.local/bin/DCC", line 9, in <module>
    load_entry_point('DCC==0.4.4', 'console_scripts', 'DCC')()
  File "build/bdist.linux-x86_64/egg/DCC/main.py", line 408, in main
  File "build/bdist.linux-x86_64/egg/DCC/main.py", line 675, in checkBAMsorting
  File "pysam/libcalignmentfile.pyx", line 351, in pysam.libcalignmentfile.AlignmentFile.__cinit__ (pysam/libcalignmentfile.c:5200)
  File "pysam/libcalignmentfile.pyx", line 584, in pysam.libcalignmentfile.AlignmentFile._open (pysam/libcalignmentfile.c:7797)
ValueError: file has no sequences defined (mode='rb') - is it SAM/BAM format? Consider opening with check_sq=False
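The traceback ends in DCC's checkBAMsorting, which hands each file to pysam. Its source is not reproduced in this thread, so the following is only a hedged sketch of the kind of test a sort-order check can perform: reading the SO tag of the @HD header line, which the SAM specification defines (unknown, unsorted, queryname, coordinate; unknown when absent). header_sort_order and is_coordinate_sorted are hypothetical helpers that work on plain SAM header text (e.g. the output of samtools view -H) instead of pysam.

```python
def header_sort_order(header_text: str) -> str:
    """Return the SO value of the @HD line in a SAM header, or 'unknown'.

    Hypothetical helper for illustration; it does not use pysam.
    """
    for line in header_text.splitlines():
        if line.startswith("@HD"):
            # Header fields are tab-separated TAG:VALUE pairs after the record type.
            for field in line.split("\t")[1:]:
                tag, _, value = field.partition(":")
                if tag == "SO":
                    return value
            return "unknown"  # @HD present but without an SO tag
    return "unknown"  # no @HD line at all


def is_coordinate_sorted(header_text: str) -> bool:
    return header_sort_order(header_text) == "coordinate"


if __name__ == "__main__":
    example = "@HD\tVN:1.4\tSO:coordinate\n@SQ\tSN:chr1\tLN:1000\n"
    print(is_coordinate_sorted(example))  # True
```

Relying on the header tag is cheap but trusts whatever wrote the header; a stricter check would also scan read positions.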

@tjakobi tjakobi self-assigned this Apr 10, 2017
@tjakobi tjakobi added the bug label Apr 10, 2017
@tjakobi tjakobi added this to the DCC version 0.4.5 milestone Apr 10, 2017
@MaxHills

MaxHills commented Jun 19, 2019

I am using DCC version 0.4.7 and I have this exact issue. I have single-end stranded data.
My command line and error output:
python DCC/DCC/main.py dcc_fileLists/ADAR_samplesheet -B dcc_fileLists/ADAR_BAM_fileList -an mm10/mm10.all.gtf -T 14 -M -Nr 2 1 -G -A mm10/mm10.ucsc.fa -R mm10/mm10.allRepeats.gtf
DCC 0.4.7 started
32 CPU cores available, using 14
Traceback (most recent call last):
  File "DCC/DCC/main.py", line 826, in <module>
    main()
  File "DCC/DCC/main.py", line 427, in main
    unsortedBAMS = checkBAMsorting(bamfiles)
  File "DCC/DCC/main.py", line 706, in checkBAMsorting
    bamfile = pysam.AlignmentFile(file, "rb")
  File "pysam/libcalignmentfile.pyx", line 734, in pysam.libcalignmentfile.AlignmentFile.__cinit__
  File "pysam/libcalignmentfile.pyx", line 983, in pysam.libcalignmentfile.AlignmentFile._open
ValueError: file has no sequences defined (mode='rb') - is it SAM/BAM format? Consider opening with check_sq=False

My BAM file list:
/l/Yu/YuLab/Bioinformatics/projects/mhh_CIRCexplorer2/align/ADAR/ACTTGA/ACTTGA.Aligned.out.bam
/l/Yu/YuLab/Bioinformatics/projects/mhh_CIRCexplorer2/align/ADAR/CAGATC/CAGATC.Aligned.out.bam
/l/Yu/YuLab/Bioinformatics/projects/mhh_CIRCexplorer2/align/ADAR/CCGTCC/CCGTCC.Aligned.out.bam
/l/Yu/YuLab/Bioinformatics/projects/mhh_CIRCexplorer2/align/ADAR/GCCAAT/GCCAAT.Aligned.out.bam

My sample sheet:
/l/Yu/YuLab/Bioinformatics/projects/mhh_CIRCexplorer2/align/ADAR/ACTTGA/ACTTGA.Chimeric.out.junction
/l/Yu/YuLab/Bioinformatics/projects/mhh_CIRCexplorer2/align/ADAR/CAGATC/CAGATC.Chimeric.out.junction
/l/Yu/YuLab/Bioinformatics/projects/mhh_CIRCexplorer2/align/ADAR/CCGTCC/CCGTCC.Chimeric.out.junction
/l/Yu/YuLab/Bioinformatics/projects/mhh_CIRCexplorer2/align/ADAR/GCCAAT/GCCAAT.Chimeric.out.junction

The head of my first BAM file in my BAM list (using samtools view <file.bam> | head):
D00575:258:H35GGBCXY:1:1105:19038:86055 16 chr8 129234652 255 101M * 0 0 GATCCTGCACTCACCATGACCTCCTTCGTAGCTTGCTTGAACTTTCTTCACAGCACTTCCCCTTCTTGAAGGTATCTGATAGCCTGTTACTGAACTTGGAG HIIHIIIIIIIIIIIHHHHIIHIIIIIIIIIIIIIIIIIIIIIIIIIIHHIGIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHIIDDDDD NH:i:1 HI:i:1 AS:i:99 nM:i:0
D00575:258:H35GGBCXY:1:1105:19163:86071 16 chr8 83932886 255 77M463N24M * 0 0 GGTGTTCCCCCAAGAGTATCCCAGTGAGAACTCCATTCAGCTCTCCGCCAACACCATCAAGCAGAACAGCCGCAACGGTGTGGTGAAAGTTGTCTTCATTC IIIHGIIIHIIIIIHIIIIHGHHHIIIHHGIHIIIIIIIHGIIHIIIIHHHHEIIIIGIIIIGIHHHIIIIIHFIHIIIIIHIIIIIIIIIIIIHFDDDDD NH:i:1 HI:i:1 AS:i:101 nM:i:0
D00575:258:H35GGBCXY:1:1105:19027:86147 16 chr17 39845997 255 99M2S * 0 0 AACCCACCACCCTGTGCTCCGCGCCCGGTGCGGTCGACGTTCCGGCTCTCCCGATGCCGAGGGGTTCGGGATTTGTGCCGGGGACGGAGGGGAGAGCGGGT GFIHGDHHFIIIIIIIIIIIIIIIHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIDDDDD NH:i:1 HI:i:1 AS:i:95 nM:i:1
D00575:258:H35GGBCXY:1:1105:19141:86150 0 chr1 155278727 255 101M * 0 0 CCGCCACTCAGCTCACTACCAGAGAAAGAAGCTGACAATTCACAGGGCTCTGGATACACAGTACCACTGATTTTATTTGTACAAGAAATGACTGGTCACTG DDDDDIIIIIIIIIIIIIIIIIIHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHHIIIIH NH:i:1 HI:i:1 AS:i:99 nM:i:0

I see that this issue was closed as resolved, but I have been unable to get past this error and am unsure how to proceed. I looked at the fix and do not think it applies to my case. Any insight or help would be greatly appreciated.
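pysam raises this ValueError when the opened file's header defines no reference sequences, i.e. it contains no @SQ lines (the check_sq behaviour named in the message). The samtools view | head output above cannot show this, because samtools view prints the header only with -h (or -H for the header alone). A hedged sketch for inspecting header text, such as the output of samtools view -H; reference_names is a hypothetical helper, not part of DCC or pysam:

```python
from typing import List


def reference_names(header_text: str) -> List[str]:
    """Collect the SN values of all @SQ lines in SAM header text.

    An empty result corresponds to what pysam reports as
    "file has no sequences defined".
    """
    names = []
    for line in header_text.splitlines():
        if line.startswith("@SQ"):
            for field in line.split("\t")[1:]:
                tag, _, value = field.partition(":")
                if tag == "SN":
                    names.append(value)
    return names


if __name__ == "__main__":
    print(reference_names("@HD\tVN:1.6\n@SQ\tSN:chr1\tLN:100\n"))  # ['chr1']
    # A plain list of file paths has no @SQ lines at all.
    print(reference_names("/data/a.bam\n/data/b.bam\n"))  # []
```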

@tjakobi tjakobi reopened this Jun 23, 2019
@tjakobi
Contributor Author

tjakobi commented Jun 24, 2019

The issue may be related to the fact that you are using single-end data, as the old fix may only have worked for the paired-end mode I normally use. I will look into it.

@tjakobi
Contributor Author

tjakobi commented Jun 30, 2019

Hi @MaxHills,

are the .bai indices available for all of the BAM files? I am surprised that a ValueError is thrown at all, as that exception should be handled. Do you have the DCC log file? Do you see something like "BAM file XX has no index (XX.bai is missing)"? If not, what does running file XX.bam report?

Cheers,
Tobias
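The file utility typically identifies a BAM file as gzip-compressed data, because BAM is stored in BGZF blocks. As a hedged sketch of a similar byte-level check: per the SAM/BAM specification, a BGZF block starts with the gzip magic, deflate method and the FEXTRA flag, followed by an extra subfield tagged "BC" with length 2. looks_like_bgzf and file_looks_like_bgzf are hypothetical helpers; this is a heuristic, not a full validation.

```python
BGZF_PREFIX = b"\x1f\x8b\x08\x04"  # gzip ID1/ID2, CM=8 (deflate), FLG=FEXTRA


def looks_like_bgzf(head: bytes) -> bool:
    """Heuristic: do the first 18 bytes look like a BGZF block header?"""
    # Bytes 12-13 are the extra-subfield identifiers ('B', 'C' for BGZF);
    # bytes 14-15 are the subfield length, which BGZF fixes at 2.
    return (
        len(head) >= 18
        and head[:4] == BGZF_PREFIX
        and head[12:14] == b"BC"
        and head[14:16] == b"\x02\x00"
    )


def file_looks_like_bgzf(path: str) -> bool:
    with open(path, "rb") as handle:
        return looks_like_bgzf(handle.read(18))
```

A text file listing BAM paths fails the very first comparison, since it starts with a printable character rather than 0x1f.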

@MaxHills

MaxHills commented Jul 1, 2019

Hi @tjakobi,
Yes, the .bai files are available for all BAM files, in the same directory as the BAM files and the Chimeric.out.junction files. The DCC log file contains only two lines: the first says that DCC 0.4.7 started and the second records the command line that was given. The only error messages available are those shown in my original post above.

If I run samtools view XX.bam | head, the output looks like a normal sorted file in SAM format, as seen below:

samtools view Aligned.sortedByCoord.out.bam | head
D00575:258:H35GGBCXY:2:2215:9582:96683 256 chr1 3001362 1 70M31S * 0 0 CTGTCTTTTTCCCTGAGGTGGGTTTCCTGTAAGCAACAAAATGTTGGGTCCTGTTTGTGTAGCCAGTCTAGATCGGAAGAGCACACGTCTGAACTCCAGTC DDDDDHIIIGHHIIIIIIGHHIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIHHIIIIHIIIIIIIIIIIIIGIIIIHIIIIIIIII NH:i:4 HI:i:2 AS:i:68 nM:i:0
D00575:258:H35GGBCXY:2:1114:12175:20089 0 chr1 3006185 255 101M * 0 0 CACGCCTGCTCAAAATGCAGAGTTGTGAAGCCCAGTTACAACTGATATACCTATAACACAAATTCTACACCTAAACCTTGAGGACTATTGTGGAAGAAGGG DDDCDHHIIHIIIIGIHHIIGGHHIFFHHHHHHHHIIIIIIIGIIIIIIHIIHIIIIIIGIIHHHHEHGHEHIIIIGIEHGHHHGIIIGIIHI?FHFHHHH NH:i:1 HI:i:1 AS:i:99 nM:i:0
D00575:259:H52C5BCXY:1:1213:20285:64486 0 chr1 3006185 255 101M * 0 0 CACGCCTGCTCAAAATGCAGAGTTGTGAAGCCCAGTTACAACTGATATACCTATAACACAAATTCTACACCTAAACCTTGAGGACTATTGTGGAAGAAGGG DDDA@HHHHHHHHIIIHGIHHHGHHIHHGHFHHIHHIHIIIIGIGEHH@HGHIIIIIIIHIIIIIIIIGHIIIIEHHHHHIIEEH@GHHIHIIFHHHGHII NH:i:1 HI:i:1 AS:i:99 nM:i:0
D00575:259:H52C5BCXY:1:2109:5490:18367 0 chr1 3006185 255 101M * 0 0 CACGCCTGCTCAAAATGCAGAGTTGTGAAGCCCAGTTACAACTGATATACCTATAACACAAATTCTACACCTAAACCTTGAGGACTATTGTGGAAGAAGGG DDDDDIIHIICGHHIIIIIIIIIHIIIIIIIIHIIIHIIIIIIHIIIIIIIIIIIGHIIIIIIIIIIIIIIIIIIIIIHIHCHHHIIIHHIIIIGIIIIII NH:i:1 HI:i:1 AS:i:99 nM:i:0

I wish I could provide you with a key to understanding my issue, but I am also perplexed.

Best regards,
Max

@tjakobi
Contributor Author

tjakobi commented Jul 2, 2019

Dear @MaxHills,

would it be possible to upload one of the BAM files (the header plus the first few hundred reads are probably enough) for further debugging? I have a suspicion but would need to run more tests.

Cheers,
Tobias

@MaxHills

MaxHills commented Jul 2, 2019

Dear @tjakobi,
I have uploaded a sam file (with '.txt' extension, so GitHub will accept it) with the header lines and a few hundred reads.
dcc.test.txt

@tjakobi
Contributor Author

tjakobi commented Jul 2, 2019

Hi @MaxHills,

let's try something: instead of

-B dcc_fileLists/ADAR_BAM_fileList

use

-B @dcc_fileLists/ADAR_BAM_fileList

in the DCC call.

Cheers,
Tobias
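DCC's argument parser is not shown in this thread. Assuming -B is an argparse option created with nargs="+" on a parser built with fromfile_prefix_chars="@" (consistent with the @ workaround, but not confirmed here), the following sketch shows the difference: without @, the list file itself becomes the only -B entry and is later opened as if it were a BAM; with @, argparse replaces the token by one argument per line of the file. build_parser and compare_list_arguments are hypothetical names.

```python
import argparse
import os
import tempfile
from typing import List, Tuple


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for DCC's parser; only the -B option is modelled.
    parser = argparse.ArgumentParser(fromfile_prefix_chars="@")
    parser.add_argument("-B", dest="bam_files", nargs="+", default=[])
    return parser


def compare_list_arguments(bam_paths: List[str]) -> Tuple[List[str], List[str], str]:
    """Parse '-B list.txt' and '-B @list.txt' for a temporary file list."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
        handle.write("\n".join(bam_paths) + "\n")  # one BAM path per line
        list_path = handle.name
    try:
        parser = build_parser()
        # Without '@' the list file itself is treated as the only BAM file.
        plain = parser.parse_args(["-B", list_path]).bam_files
        # With '@' argparse expands the token into the file's lines.
        expanded = parser.parse_args(["-B", "@" + list_path]).bam_files
    finally:
        os.remove(list_path)
    return plain, expanded, list_path


if __name__ == "__main__":
    plain, expanded, _ = compare_list_arguments(["/data/a.bam", "/data/b.bam"])
    print(plain)     # a one-element list holding the temporary list file's path
    print(expanded)  # ['/data/a.bam', '/data/b.bam']
```

Note that argparse's default expansion treats each line as a single argument, so paths containing spaces survive intact.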

@MaxHills

MaxHills commented Jul 2, 2019

Okay, I am no longer receiving the BAM error. Thank you.

@tjakobi tjakobi closed this as completed in 85817e4 Jul 4, 2019
@tjakobi
Contributor Author

tjakobi commented Jul 4, 2019

Added a CLI check to make sure the BAM file argument is either a binary BAM file or a multi-line ASCII file list passed with the @ prefix.
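The code of commit 85817e4 is not reproduced here. As a hedged sketch of a comparable check: the SAM/BAM specification states that a decompressed BAM stream begins with the magic bytes BAM\1, and BGZF is valid gzip, so Python's zlib can inflate the start of the file and compare. starts_like_bam and check_bam_arguments are hypothetical names, and the suggested message merely mirrors the @ workaround from this thread.

```python
import zlib
from typing import List

BAM_MAGIC = b"BAM\x01"  # first four bytes of a decompressed BAM stream


def starts_like_bam(prefix: bytes) -> bool:
    """Return True if the gzip/BGZF-compressed prefix inflates to the BAM magic."""
    inflater = zlib.decompressobj(16 + zlib.MAX_WBITS)  # expect a gzip wrapper
    try:
        return inflater.decompress(prefix)[:4] == BAM_MAGIC
    except zlib.error:
        # Plain text, such as a list of file paths, fails the gzip header check.
        return False


def check_bam_arguments(paths: List[str]) -> List[str]:
    """Hypothetical CLI check: describe every -B entry that is not a BAM file."""
    problems = []
    for path in paths:
        try:
            with open(path, "rb") as handle:
                prefix = handle.read(65536)  # a BGZF block is at most 64 KiB
        except OSError as exc:
            problems.append(f"{path}: cannot be read ({exc})")
            continue
        if not starts_like_bam(prefix):
            problems.append(
                f"{path}: not a BAM file; if it lists BAM paths, pass it as -B @{path}"
            )
    return problems
```

Reading only the first block keeps the check fast even for multi-gigabyte BAM files.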
