Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

samfile.fetch() does not retrieve reads beyond ~2^29 for large chromosomes #996

Closed
pinbo opened this issue Feb 25, 2021 · 7 comments
Closed

Comments

@pinbo
Copy link

pinbo commented Feb 25, 2021

Hi There,

I work on wheat (has very large chromosomes). I found pysam AlignmentFile fetch could not retrieve reads beyond ~2^29 for large chromosomes, but it works if I give it a chromosome name. Here is the code I tested with pysam 0.16.0.1.

samfile = pysam.AlignmentFile("chr5B.bam", "rb") # my bam file is csi indexed
outfile = pysam.AlignmentFile("test.bam", "wb", template=samfile)
for read in samfile.fetch(): # output bam is truncated for each chromosome at position around 2^29
    n = outfile.write(read)
outfile.close()
samfile.close()

# It produces correct bam files if I give a chromosome name
for read in samfile.fetch('chr5B'):
    n = outfile.write(read)
@IanSudbery
Copy link

Came here to say the same thing.

See CGATOxford/UMI-tools#458

@jmarshall
Copy link
Member

Thanks for the report. Left over after #732, there is one remaining instance of 1<<29 in the pysam code and it is the cause of this bug. I'll check in the following fix soon:

--- a/pysam/libcalignmentfile.pyx
+++ b/pysam/libcalignmentfile.pyx
@@ -2219,7 +2222,7 @@ cdef class IteratorRowAllRefs(IteratorRow):
         self.rowiter = IteratorRowRegion(self.samfile,
                                          self.tid,
                                          0,
-                                         1<<29)
+                                         MAX_POS)
         # set htsfile and header of the rowiter
         # to the values in this iterator to reflect multiple_iterators
         self.rowiter.htsfile = self.htsfile

@pinbo
Copy link
Author

pinbo commented Feb 25, 2021

Hi @jmarshall,
Thank you so much for your quick response. I rebuild the pysam with this fix and it works perfectly now.
Another 2 files also have 1<<29 and you may have a look too.

#grep -r "1<<29" *
bcftools/csq.c:        assert( aux->nseq < 1<<29 );  // see gf_gene_t.iseq and ftr_t.iseq
bcftools/csq.c.pysam.c:        assert( aux->nseq < 1<<29 );  // see gf_gene_t.iseq and ftr_t.iseq
pysam/libcalignmentfile.c: *                                          1<<29)
pysam/libcalignmentfile.pyx:                                         1<<29)

@jmarshall
Copy link
Member

Thanks for confirming.

The instance in csq.c relates to something else, so is a bit of a coincidence and doesn't limit any chromosome length.

@pinbo
Copy link
Author

pinbo commented Feb 28, 2021

Thank you @jmarshall for the quick fix.

Can I reinstall pysam with pip install to get the fixed version or do I still need to compile it manually for now?
Thanks.

@jmarshall
Copy link
Member

I assume pip will install only releases, not a commit from a Git repository branch. So you will need to compile from Git yourself until the next pysam release.

@pinbo
Copy link
Author

pinbo commented Feb 28, 2021

I see. Thank you @jmarshall!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants