Issue with FASTA reference for arrow #5

Closed
rorycraig337 opened this issue Sep 7, 2018 · 2 comments


rorycraig337 commented Sep 7, 2018

Hi,

I'm running iterations of Arrow and Pilon, and I'm running into some sort of problem with my FASTA or .fai reference from the output of Pilon (which I have modified to restore some large INDELs corrected by Pilon). I've done simple checks to validate the FASTA file and all looks ok. Any help would be much appreciated!

Command is:

arrow -j16 c_incerta_canu_v2.pbalign_v2.bam -r ../pilon_1/c_incerta.canu_v2.arrow_pilon_v1.fa -o c_incerta.canu_v2.arrow_v2.fa -o c_incerta.canu_v2.arrow_v2.fq -o c_incerta.canu_v2.arrow_v2.variants.gff

Error message is:

  File "/home/craigror/anaconda2/lib/python2.7/site-packages/pbcommand/cli/core.py", line 138, in _pacbio_main_runner
    return_code = exe_main_func(*args, **kwargs)
  File "/home/craigror/anaconda2/lib/python2.7/site-packages/GenomicConsensus/main.py", line 340, in args_runner
    return tr.main()
  File "/home/craigror/anaconda2/lib/python2.7/site-packages/GenomicConsensus/main.py", line 267, in main
    self._loadReference(peekFile)
  File "/home/craigror/anaconda2/lib/python2.7/site-packages/GenomicConsensus/main.py", line 125, in _loadReference
    reference.loadFromFile(options.referenceFilename, alnFile)
  File "/home/craigror/anaconda2/lib/python2.7/site-packages/GenomicConsensus/reference.py", line 97, in loadFromFile
    f = ReferenceSet(filename_)
  File "/home/craigror/anaconda2/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 4336, in __init__
    super(ReferenceSet, self).__init__(*files, **kwargs)
  File "/home/craigror/anaconda2/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 3887, in __init__
    super(ContigSet, self).__init__(*files, **kwargs)
  File "/home/craigror/anaconda2/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 450, in __init__
    self.updateCounts()
  File "/home/craigror/anaconda2/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 4092, in updateCounts
    if not self.isIndexed:
  File "/home/craigror/anaconda2/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 4241, in isIndexed
    lambda x: isinstance(x, IndexedFastaReader))
  File "/home/craigror/anaconda2/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 1832, in _pollResources
    return [func(resource) for resource in self.resourceReaders()]
  File "/home/craigror/anaconda2/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 4200, in resourceReaders
    self._openFiles()
  File "/home/craigror/anaconda2/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 4155, in _openFiles
    resource = self._openFile(urlparse(extRes.resourceId).path)
  File "/home/craigror/anaconda2/lib/python2.7/site-packages/pbcore/io/dataset/DataSetIO.py", line 4170, in _openFile
    resource = IndexedFastaReader(location)
  File "/home/craigror/anaconda2/lib/python2.7/site-packages/pbcore/io/FastaIO.py", line 410, in __init__
    self.fai = loadFastaIndex(self.faiFilename, self.view)
  File "/home/craigror/anaconda2/lib/python2.7/site-packages/pbcore/io/FastaIO.py", line 267, in loadFastaIndex
    assert (header_[0] == ">" and header_[-1] == "\n")
AssertionError
@armintoepfer armintoepfer added the valid bug Something isn't working label Sep 7, 2018
armintoepfer (Member) commented

Could it be that your reference fasta file is not valid? The error message

assert (header_[0] == ">" and header_[-1] == "\n")

indicates that there is a line starting with > that is not followed by another line with sequence data.
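
A quick way to look for that kind of problem is to scan the FASTA for blank lines and for header lines that have no sequence after them. The following is only a minimal sketch, not part of Arrow or pbcore; the function name `find_fasta_problems` and the problem labels are made up for illustration, and it assumes a plain-text (uncompressed) FASTA.

```python
import io


def find_fasta_problems(handle):
    """Return (line_number, message) tuples for common FASTA layout problems.

    Hypothetical helper for illustration only; it is not a pbcore function.
    """
    problems = []
    pending_header = None  # line number of a header still waiting for sequence
    seen_header = False
    for lineno, raw in enumerate(handle, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            # Blank lines shift the offsets that a .fai index relies on.
            problems.append((lineno, "blank line"))
            continue
        if line.startswith(">"):
            if pending_header is not None:
                problems.append((pending_header, "header with no sequence"))
            pending_header = lineno
            seen_header = True
        else:
            if not seen_header:
                problems.append((lineno, "sequence before first header"))
            pending_header = None
    if pending_header is not None:
        problems.append((pending_header, "header with no sequence"))
    return problems


# Small in-memory example: one stray blank line and one empty record.
example = io.StringIO(">chr1\nACGT\n\n>chr2\n>chr3\nGG\n")
for lineno, message in find_fasta_problems(example):
    print(lineno, message)
```

For a real file, the same function can be called on an open file object, e.g. `with open(path) as fh: find_fasta_problems(fh)`.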

rorycraig337 (Author) commented

Thanks for clarifying the error message. I had a bug in the script that manipulates the Pilon output, which was occasionally duplicating a newline; I should have caught that. Arrow is now working with my file, thanks again.
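
For anyone hitting the same assertion, a duplicated newline like the one described above can be cleaned up by copying the FASTA while skipping empty lines. This is a rough sketch under the assumption that blank lines are the only damage; `drop_blank_lines` is a hypothetical name, not the script used in this thread. Any existing .fai index should be regenerated afterwards, since the byte offsets it stores will no longer match.

```python
import io


def drop_blank_lines(src, dst):
    """Copy FASTA text from src to dst, skipping blank lines.

    Returns the number of blank lines that were dropped.
    Hypothetical helper for illustration only.
    """
    removed = 0
    for raw in src:
        if raw.strip():
            # Normalise the line ending so every kept line ends in "\n".
            dst.write(raw.rstrip("\r\n") + "\n")
        else:
            removed += 1
    return removed


# In-memory example with two stray blank lines inside one record.
src = io.StringIO(">contig1\nACGT\n\n\nTTGA\n")
dst = io.StringIO()
removed = drop_blank_lines(src, dst)
print(removed)
print(dst.getvalue(), end="")
```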
