Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pbmm2 M5 field in BAM header missing #58

Closed
ctxchris opened this issue Dec 6, 2018 · 1 comment
Closed

pbmm2 M5 field in BAM header missing #58

ctxchris opened this issue Dec 6, 2018 · 1 comment

Comments

@ctxchris
Copy link

ctxchris commented Dec 6, 2018

Operating system
Ubuntu 18.04.1 LTS

Package name
pbmm2 0.11.0

Conda environment

blas                      1.1                    openblas    conda-forge
bzip2                     1.0.6                h470a237_2    conda-forge
ca-certificates           2018.11.29           ha4d7672_0    conda-forge
certifi                   2018.11.29            py27_1000    conda-forge
curl                      7.62.0               h74213dd_0    conda-forge
decorator                 4.3.0                      py_0    conda-forge
future                    0.17.0                py27_1000    conda-forge
htslib                    1.9                  hc238db4_4    bioconda
krb5                      1.16.2               hbb41f41_0    conda-forge
libcurl                   7.62.0               hbdb9355_0    conda-forge
libdeflate                1.0                  h470a237_0    bioconda
libedit                   3.1.20170329         haf1bffa_1    conda-forge
libffi                    3.2.1                hfc679d8_5    conda-forge
libgcc                    7.2.0                h69d50b8_2    conda-forge
libgcc-ng                 7.2.0                hdf63c60_3    conda-forge
libgfortran               3.0.0                         1    conda-forge
libssh2                   1.8.0                h5b517e9_3    conda-forge
libstdcxx-ng              7.2.0                hdf63c60_3    conda-forge
ncurses                   6.1                  hfc679d8_1    conda-forge
networkx                  2.2                        py_1    conda-forge
numpy                     1.15.1          py27_blas_openblashd3ea46f_0    conda-forge
openblas                  0.2.20                        8    conda-forge
openssl                   1.0.2p               h470a237_1    conda-forge
pb-falcon                 0.2.1            py27hf50d5a6_0    bioconda
pbbam                     0.19.0               h6678c95_1    bioconda
pbcopper                  0.4.2                h02d93b8_0    bioconda
pbmm2                     0.11.0               ha87ae23_0    bioconda
pip                       18.1                  py27_1000    conda-forge
python                    2.7.15               h33da82c_6    conda-forge
python-edlib              1.2.3.post1      py27h470a237_0    bioconda
python-intervaltree       2.1.0                      py_0    bioconda
python-msgpack            0.5.6            py27h470a237_0    bioconda
python-sortedcontainers   2.1.0                      py_0    bioconda
readline                  7.0                  haf1bffa_1    conda-forge
setuptools                40.6.2                   py27_0    conda-forge
sqlite                    3.26.0               hb1c47c0_0    conda-forge
tk                        8.6.9                ha92aebf_0    conda-forge
wheel                     0.32.3                   py27_0    conda-forge
xz                        5.2.4                h470a237_1    conda-forge
zlib                      1.2.11               h470a237_3    conda-forge

Describe the bug

I aligned *subreads.bam files with pbmm2 to a reference and tried to call variants with arrow:

pbmm2 index ref.fasta ref.mmi
pbmm2 align --preset SUBREAD ref.mmi reds.subreads.bam | samtools sort -o ref.aligned.bam
arrow ref.align.bam -r ref.fasta -o ref.arrow.fastq

The alignment file ref.aligned.bam is missing the M5 tag:

samtools view -H ref.aligned.bam

@HD	VN:1.5	SO:coordinate	pb:3.0.3
@SQ	SN:Chr1	LN:115533412
@SQ	SN:Chr2	LN:105765756
@SQ	SN:Chr3	LN:101328534

Error message
From the arrow command

[INFO] Using pbcommand v1.1.1
[INFO] completed setting up logger with <functools.partial object at 0x7f85ed444f18>
[INFO] log opts {'file_name': 'arrow.log', 'level': 10}
[INFO] ConsensusCore version: 1.0.2
[INFO] ConsensusCore2 version: 3.1.0
[INFO] Starting.
[DEBUG] Updating metatypes...
[DEBUG] Done updating metatypes
[DEBUG] Checking that the files exist...
[DEBUG] Done checking that the files exist
[DEBUG] Making elementtree...
[DEBUG] Done making ElementTree...
[DEBUG] Converting ElementTree to string...
[DEBUG] Done converting ElementTree to string
[DEBUG] Opening ReadSet resources
[ERROR] 'M5'
Traceback (most recent call last):
File "/home1/chris/anaconda2/envs/denovo_asm/lib/python2.7/site-packages/pbcommand/cli/core.py", line 138, in _pacbio_main_runner
return_code = exe_main_func(*args, **kwargs)
File "/home1/chris/anaconda2/envs/denovo_asm/lib/python2.7/site-packages/GenomicConsensus/main.py", line 340, in args_runner
return tr.main()
File "/home1/chris/anaconda2/envs/denovo_asm/lib/python2.7/site-packages/GenomicConsensus/main.py", line 258, in main
with AlignmentSet(options.inputFilename) as peekFile:
File "/home1/chris/anaconda2/envs/denovo_asm/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 2699, in init
super(AlignmentSet, self).init(*files, **kwargs)
File "/home1/chris/anaconda2/envs/denovo_asm/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 1963, in init
super(ReadSet, self).init(*files, **kwargs)
File "/home1/chris/anaconda2/envs/denovo_asm/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 450, in init
self.updateCounts()
File "/home1/chris/anaconda2/envs/denovo_asm/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 2517, in updateCounts
self.assertIndexed()
File "/home1/chris/anaconda2/envs/denovo_asm/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 2347, in assertIndexed
self._assertIndexed((IndexedBamReader, CmpH5Reader))
File "/home1/chris/anaconda2/envs/denovo_asm/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 1920, in _assertIndexed
self._openFiles()
File "/home1/chris/anaconda2/envs/denovo_asm/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 2044, in _openFiles
resource = IndexedBamReader(location)
File "/home1/chris/anaconda2/envs/denovo_asm/lib/python2.7/site-packages/pbcore/io/align/BamIO.py", line 373, in init
super(IndexedBamReader, self).init(fname, referenceFastaFname)
File "/home1/chris/anaconda2/envs/denovo_asm/lib/python2.7/site-packages/pbcore/io/align/BamIO.py", line 181, in init
self._loadReferenceInfo()
File "/home1/chris/anaconda2/envs/denovo_asm/lib/python2.7/site-packages/pbcore/io/align/BamIO.py", line 49, in _loadReferenceInfo
refMD5s = [r["M5"] for r in refRecords]
KeyError: 'M5'

Expected behavior

BAM output compatible with arrow

@pb-dseifert
Copy link
Contributor

Your version of pbcore is outdated. Update your whole set or packages using conda update --all and don't forget to update your channel priorities before: https://bioconda.github.io/#set-up-channels

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants