Problem with CRAM parsing #280

Closed
ernfrid opened this issue Mar 2, 2017 · 9 comments
@ernfrid

ernfrid commented Mar 2, 2017

We first noticed this on version 0.6.4, but it appears to exist in 0.6.5 as well.

Running a command like: ./sambamba_v0.6.5 view -C -f bam file.cram -l 0

Eventually results in an error like:

*** glibc detected *** ./sambamba_v0.6.5: corrupted double-linked list: 0x00000000013f1430 ***

If instead we output to SAM, I get a slightly different error:

sambamba-view: Failure in cram_decode_slice

samtools is able to parse these CRAMs without issue so I don't believe that the issue is with the files themselves.

pjotrp self-assigned this Mar 4, 2017
pjotrp added the bug label Mar 4, 2017
pjotrp added this to the v1.0.0 milestone Mar 4, 2017
@pjotrp
Member

pjotrp commented Mar 4, 2017

Thanks @ernfrid. Can you run the recent sambamba with debug information using https://github.com/lomereiter/sambamba#troubleshooting and see if that fails? And do you have an example CRAM file we can use to replicate the issue?

@lomereiter
Contributor

We should absolutely upgrade htslib. Good news is that we apparently can use (almost) vanilla htslib now; the core issue was having bitfields in bam1_core_t, whose layout is unspecified, and it is finally fixed by samtools/htslib@6d927df
Other changes introduced in my fork have also either been merged into htslib or fixed since then. The only patch sambamba seems to need now is for the cram_to_bam function not to have static linkage.

@ernfrid
Author

ernfrid commented Mar 30, 2017

Sorry for the long delay. My CRAM file was human data and not open access, but I just downloaded the following CRAM from 1000 Genomes and observed the same error.

ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/data/NA12878/alignment/NA12878.alt_bwamem_GRCh38DH.20150718.CEU.low_coverage.cram

My first attempt at building sambamba with debug information turned on failed, so I don't have additional information at this time.

@lomereiter
Contributor

Thanks @ernfrid, I can reproduce the issue, will look into it on the weekend.

@hzpc-joostk

hzpc-joostk commented Oct 13, 2017

Hi everyone, any news here? For me sambamba v0.6.6 raises the same error with a CRAM v2.1 file produced by sambamba.

BAM to CRAM: sambamba view --format=cram --ref-filename=${reference} -o ${input}.cram ${input}

CRAM to SAM: sambamba view --cram-input ${input}.cram | head

@hzpc-joostk

Btw, sambamba dumped this on stderr in my terminal:

*** glibc detected *** bin/sambamba: malloc(): smallbin double linked list corrupted: 0x0000000002568220 ***
======= Backtrace: =========
/lib64/libc.so.6[0x388c075dee]
/lib64/libc.so.6[0x388c07a4b8]
/lib64/libc.so.6(__libc_malloc+0x5c)[0x388c07aaac]
bin/sambamba[0x54bc49]
bin/sambamba[0x5499e2]
bin/sambamba[0x513fff]
bin/sambamba[0x4f1dba]
bin/sambamba[0x4f21c8]
bin/sambamba[0x4f2516]
bin/sambamba[0x41714f]
bin/sambamba[0x416fb6]
bin/sambamba[0x4c7ad7]
bin/sambamba[0x4c78da]
bin/sambamba[0x4ef5f1]
bin/sambamba[0x4edc1d]
bin/sambamba[0x4d5b4a]
bin/sambamba[0x40552d]
bin/sambamba[0x58dd0f]
bin/sambamba[0x58dcd4]
bin/sambamba[0x58dc04]
/lib64/libc.so.6(__libc_start_main+0xfd)[0x388c01ed1d]
bin/sambamba[0x404869]
======= Memory map: ========
00400000-0064f000 r-xp 00000000 fd:04 358001                             /usr/share/cluster-apps/tools/bin/sambamba
0084e000-00879000 rw-p 0024e000 fd:04 358001                             /usr/share/cluster-apps/tools/bin/sambamba
02532000-03818000 rw-p 00000000 00:00 0                                  [heap]
388bc00000-388bc20000 r-xp 00000000 fd:00 141                            /lib64/ld-2.12.so
388be20000-388be21000 r--p 00020000 fd:00 141                            /lib64/ld-2.12.so
388be21000-388be22000 rw-p 00021000 fd:00 141                            /lib64/ld-2.12.so
388be22000-388be23000 rw-p 00000000 00:00 0 
388c000000-388c18a000 r-xp 00000000 fd:00 678                            /lib64/libc-2.12.so
388c18a000-388c38a000 ---p 0018a000 fd:00 678                            /lib64/libc-2.12.so
388c38a000-388c38e000 r--p 0018a000 fd:00 678                            /lib64/libc-2.12.so
388c38e000-388c390000 rw-p 0018e000 fd:00 678                            /lib64/libc-2.12.so
388c390000-388c394000 rw-p 00000000 00:00 0 
388c400000-388c483000 r-xp 00000000 fd:00 1289                           /lib64/libm-2.12.so
388c483000-388c682000 ---p 00083000 fd:00 1289                           /lib64/libm-2.12.so
388c682000-388c683000 r--p 00082000 fd:00 1289                           /lib64/libm-2.12.so
388c683000-388c684000 rw-p 00083000 fd:00 1289                           /lib64/libm-2.12.so
388c800000-388c817000 r-xp 00000000 fd:00 1731                           /lib64/libpthread-2.12.so
388c817000-388ca17000 ---p 00017000 fd:00 1731                           /lib64/libpthread-2.12.so
388ca17000-388ca18000 r--p 00017000 fd:00 1731                           /lib64/libpthread-2.12.so
388ca18000-388ca19000 rw-p 00018000 fd:00 1731                           /lib64/libpthread-2.12.so
388ca19000-388ca1d000 rw-p 00000000 00:00 0 
388d000000-388d007000 r-xp 00000000 fd:00 4056                           /lib64/librt-2.12.so
388d007000-388d206000 ---p 00007000 fd:00 4056                           /lib64/librt-2.12.so
388d206000-388d207000 r--p 00006000 fd:00 4056                           /lib64/librt-2.12.so
388d207000-388d208000 rw-p 00007000 fd:00 4056                           /lib64/librt-2.12.so
388e000000-388e016000 r-xp 00000000 fd:00 13347                          /lib64/libresolv-2.12.so
388e016000-388e216000 ---p 00016000 fd:00 13347                          /lib64/libresolv-2.12.so
388e216000-388e217000 r--p 00016000 fd:00 13347                          /lib64/libresolv-2.12.so
388e217000-388e218000 rw-p 00017000 fd:00 13347                          /lib64/libresolv-2.12.so
388e218000-388e21a000 rw-p 00000000 00:00 0 
3891000000-3891016000 r-xp 00000000 fd:00 13368                          /lib64/libgcc_s-4.4.7-20120601.so.1
3891016000-3891215000 ---p 00016000 fd:00 13368                          /lib64/libgcc_s-4.4.7-20120601.so.1
3891215000-3891216000 rw-p 00015000 fd:00 13368                          /lib64/libgcc_s-4.4.7-20120601.so.1
7fb4ec000000-7fb4ec021000 rw-p 00000000 00:00 0 
...
7fb53eaad000-7fb53ec25000 rw-p 00000000 00:00 0 
7fb53ec83000-7fb53ec88000 r-xp 00000000 fd:00 8301                       /lib64/libnss_dns-2.12.so
7fb53ec88000-7fb53ee87000 ---p 00005000 fd:00 8301                       /lib64/libnss_dns-2.12.so
7fb53ee87000-7fb53ee88000 r--p 00004000 fd:00 8301                       /lib64/libnss_dns-2.12.so
7fb53ee88000-7fb53ee89000 rw-p 00005000 fd:00 8301                       /lib64/libnss_dns-2.12.so
7fb53ee89000-7fb53ee96000 r-xp 00000000 fd:00 8316                       /lib64/libnss_files-2.12.so
7fb53ee96000-7fb53f095000 ---p 0000d000 fd:00 8316                       /lib64/libnss_files-2.12.so
7fb53f095000-7fb53f096000 r--p 0000c000 fd:00 8316                       /lib64/libnss_files-2.12.so
7fb53f096000-7fb53f097000 rw-p 0000d000 fd:00 8316                       /lib64/libnss_files-2.12.so
7fb53f097000-7fb53f1a9000 rw-p 00000000 00:00 0 
...
7ffc4e20c000-7ffc4e221000 rw-p 00000000 00:00 0                          [stack]
7ffc4e363000-7ffc4e364000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

Of any help?

Input file is "CRAM version 2.1 compressed sequence data" according to htsfile (htslib) 1.3, again this CRAM file was produced by sambamba.

The same error is raised when reading CRAM 3.0 format produced by samtools view.

@hzpc-joostk

The sambamba v0.6.7 view command doesn't raise an error when reading CRAM files created by HTSlib v1.6 (samtools view), but outputs verbose debug messages on both stdout and stderr. See also #328.
Moreover, many other commands still raise an error:
Error reading BGZF block starting from offset 0: wrong BGZF magic

$ module load htslib/1.6 sambamba/0.6.7

$ htsfile --version
htsfile (htslib) 1.6
Copyright (C) 2017 Genome Research Ltd.

$ sambamba --version
sambamba 0.6.7

This version was built with:
    LDC 1.1.1
    using DMD v2.071.2
    using LLVM 3.8.1
    bootstrapped with LDC - the LLVM D compiler (0.17.4)

$ file sample.cram
sample.cram: data

$ htsfile sample.cram
sample.cram:	CRAM version 3.0 compressed sequence data

$ samtools view -h sample.cram | tee >(file - >&2) | htsfile -
-:	SAM version 1.3 sequence text
/dev/stdin: ASCII text, with very long lines

$ samtools view -u sample.cram | tee >(file - >&2) | htsfile -
-:	BAM version 1 compressed sequence data
/dev/stdin: gzip compressed data, extra field

$ sambamba view sample.cram | file -
sambamba-view: Error reading BGZF block starting from offset 0: wrong BGZF magic
/dev/stdin: no read permission

$ sambamba view -C sample.cram | file -
Init cram_fd* #1
Init cram_fd* #2
Init _Anonymous_25* #1
cram_read_slice (1/1)
Init cram_slice* #1
Init _Anonymous_25* #2
...
cram_read_slice (1/1)
Init cram_slice* #35
Init _Anonymous_5* #2
/dev/stdin: ASCII text, with very long lines

$ sambamba view -C sample.cram 2>/dev/null | file -
/dev/stdin: ASCII text, with very long lines

$ sambamba markdup sample.cram /dev/null
sambamba-markdup: Error reading BGZF block starting from offset 0: wrong BGZF magic

$ sambamba depth base sample.cram
REF	POS	COV	A	C	G	T	DEL	REFSKIP	SAMPLE
sambamba-depth: Error reading BGZF block starting from offset 0: wrong BGZF magic

$ samtools view -u sample.cram | sambamba depth base /dev/stdin
REF	POS	COV	A	C	G	T	DEL	REFSKIP	SAMPLE
sambamba-depth: All files must be indexed

All in all pretty useless... :-( Too bad, because I really like sambamba's enhanced features! Is there any way I can help resolve this?
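
The "wrong BGZF magic" errors above are consistent with the CRAM file being read as if it were BAM, although nothing in this thread confirms the cause. As a rough sketch, a file's leading bytes can be checked before deciding how to call sambamba: CRAM files begin with the ASCII bytes "CRAM", while BGZF blocks (the container BAM uses) begin with the gzip header 1f 8b 08 04. The helper names detect_format and view_flag are hypothetical and are not part of sambamba or samtools.

```shell
#!/bin/sh
# Hypothetical helpers (not part of sambamba or samtools).

# Classify a file by its first four bytes.
detect_format() {
    # Dump the first four bytes as lowercase hex with no separators.
    magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    case "$magic" in
        4352414d) echo cram ;;     # ASCII "CRAM"
        1f8b0804) echo bgzf ;;     # gzip header with the extra-field flag, as BGZF uses
        *)        echo unknown ;;  # e.g. SAM text or an empty file
    esac
}

# Print the CRAM input flag used with "sambamba view" in this thread,
# or nothing for any other input.
view_flag() {
    case "$(detect_format "$1")" in
        cram) echo "-C" ;;
        *)    echo "" ;;
    esac
}
```

With these defined, something like `sambamba view $(view_flag sample.cram) sample.cram` would add -C only for CRAM input. This only covers view, the one command shown above being run with a CRAM input flag.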

@pjotrp
Member

pjotrp commented Feb 1, 2018

We need to look into this, that is why the issue is open ;). The first step would be to upgrade htslib and the build tools to latest. I'll ping you here when I get round to it. I hope early spring.

@pjotrp
Member

pjotrp commented Nov 28, 2019

CRAM support will be dropped.

pjotrp closed this as completed Nov 28, 2019