Newer python versions and Bio Alphabet #112

NathanSiemers · 2021-05-23T14:13:26Z

Hello, I'm trying to get tracer running on a more modern version of Python (3.8.10). Bio.Alphabet has since been removed from Biopython, and the recommendation is that calls to it (e.g. IUPAC) can be removed from most code without a problem.

Is it feasible to do this? Any known successes or issues with later versions of Python?

Thank you.

  File "/usr/local/lib/python3.8/site-packages/tracer-0.5-py3.8.egg/tracerlib/tracer_func.py", line 29, in <module>
    from Bio.Alphabet import IUPAC
  File "/usr/local/lib/python3.8/site-packages/Bio/Alphabet/__init__.py", line 20, in <module>
    raise ImportError(
ImportError: Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the molecule_type as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information.

NathanSiemers · 2021-05-24T12:44:47Z

I tested removing the import calls in __init__.py and one other file, and tracer loaded correctly, but I haven't done a test run yet.

mstubb · 2021-05-24T12:51:56Z

Hi Nathan,

Thanks for this! I'll be happy to accept a PR that updates this if you'd like to submit one.

All the best,

Mike

NathanSiemers · 2021-05-31T04:51:30Z

I've spent several days working on a pull request. I removed the Bio.Alphabet dependencies and changed how the Seq objects are created so they no longer depend on Bio.Alphabet.IUPAC. I have also been editing the Dockerfile to update packages to current versions and to run the tests. I can send you what I have so far, but there's an error in 'tracer test': there still seems to be an obscure call to Bio.Alphabet somewhere in the pickle dump/load that I'm finding difficult to trace. Partly because I'm not a Python hacker, I can't resolve this one. Some help from the group would be appreciated.

(Fragment of the 'tracer test' output below. I can't find a remaining reference to Bio.Alphabet anywhere in the code base.)

##Running Kallisto##
##Making Kallisto indices##

[build] loading fasta file /tracer/test_data/results/cell1/expression_quantification/kallisto_index/cell1_transcriptome.fa
[build] k-mer length: 31
[build] warning: clipped off poly-A tail (longer than 10)
from 654 target sequences
[build] warning: replaced 3 non-ACGUT characters in the input sequence
with pseudorandom nucleotides
[build] counting k-mers ... done.
[build] building target de Bruijn graph ... done
[build] creating equivalence classes ... done
[build] target de Bruijn graph has 781463 contigs and contains 113560426 k-mers

##Quantifying with Kallisto##

[quant] fragment length distribution will be estimated from the data
[index] k-mer length: 31
[index] number of targets: 131,104
[index] number of k-mers: 113,560,426
[index] number of equivalence classes: 460,618
[quant] running in paired-end mode
[quant] will process pair 1: /tracer/test_data/cell1_1.fastq
/tracer/test_data/cell1_2.fastq
[quant] finding pseudoalignments for the reads ... done
[quant] processed 1,135 reads, 1,042 reads pseudoaligned
[quant] estimated average fragment length: 106.333
[ em] quantifying the abundances ... done
[ em] the Expectation-Maximization algorithm ran for 52 rounds

##Filtering by read count##
Traceback (most recent call last):
  File "/usr/local/bin/tracer", line 11, in <module>
    load_entry_point('tracer==0.5', 'console_scripts', 'tracer')()
  File "/usr/local/lib/python3.7/dist-packages/tracer-0.5-py3.7.egg/tracerlib/launcher.py", line 43, in launch
    Task().run()
  File "/usr/local/lib/python3.7/dist-packages/tracer-0.5-py3.7.egg/tracerlib/tasks.py", line 1230, in run
    loci=['A', 'B'], species='Mmus').run()
  File "/usr/local/lib/python3.7/dist-packages/tracer-0.5-py3.7.egg/tracerlib/tasks.py", line 766, in run
    cl = pickle.load(pkl)
  File "/usr/local/lib/python3.7/dist-packages/Bio/Alphabet/__init__.py", line 21, in <module>
    "Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the molecule_type as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information."
ImportError: Bio.Alphabet has been removed from Biopython. In many cases, the alphabet can simply be ignored and removed from scripts. In a few cases, you may need to specify the molecule_type as an annotation on a SeqRecord for your script to work correctly. Please see https://biopython.org/wiki/Alphabet for more information.

NathanSiemers · 2021-05-31T23:24:56Z

I think the error is hard to trace because Bio.Alphabet objects are embedded in the pickled (.pkl) test reference files, in directories like this:

https://github.com/Teichlab/tracer/tree/master/test_data/results/cell2/unfiltered_TCR_seqs

If that's true, then the error is due to the modern environment (a Biopython without Bio.Alphabet) not being able to load the old pickled reference test results.

N

(some text strings from the pkl file below)

S'alphabet'p154g0(cBio.AlphabetHasStopCodonp155g2Ntp156Rp157(dp158S'stop_symbol'p159S'*'p160sg154g0(cBio.Alphabet.IUPACExtendedIUPACProteinp161g2Ntp162Rp163sS'letters'
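Because pickle records each class as a module/name reference rather than as code (the cBio.Alphabet and HasStopCodon strings above are such references), unpickling has to import Bio.Alphabet and fails. The raw opcode stream can still be scanned without loading anything, which shows exactly which files carry the old references. A minimal sketch using the standard library's pickletools, assuming the reference files end in .pkl; find_module_refs and main are hypothetical helpers, not part of tracer:

```python
from __future__ import annotations

import pickletools
import sys
from pathlib import Path


def find_module_refs(data: bytes, prefix: str = "Bio.Alphabet") -> set[str]:
    """Return 'module.name' class references in a pickle whose module starts with prefix.

    Only walks the opcode stream with pickletools.genops; nothing is imported
    or unpickled, so this works even when the referenced module no longer exists.
    """
    refs: set[str] = set()
    recent_strings: list[str] = []  # string args seen so far, used to resolve STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        module = name = None
        if opcode.name in ("GLOBAL", "INST"):
            # Protocols 0-3: the arg is "module name" as one space-separated string.
            module, _, name = arg.partition(" ")
        elif opcode.name == "STACK_GLOBAL" and len(recent_strings) >= 2:
            # Protocol 4+: module and name were pushed as the two preceding strings.
            # (Strings re-fetched from the memo with BINGET are not tracked here.)
            module, name = recent_strings[-2], recent_strings[-1]
        elif isinstance(arg, str):
            recent_strings.append(arg)
        if module is not None and module.startswith(prefix):
            refs.add(f"{module}.{name}")
    return refs


def main(root: str) -> None:
    # Report every .pkl file under root that still references Bio.Alphabet.
    for path in sorted(Path(root).rglob("*.pkl")):
        refs = find_module_refs(path.read_bytes())
        if refs:
            print(f"{path}: {', '.join(sorted(refs))}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Run it as, for example, python find_alphabet_refs.py test_data/results (the script name is hypothetical); any file it lists would fail to load under a Biopython without Bio.Alphabet.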

mstubb · 2021-06-01T12:48:50Z

Thanks Nathan.

Yes, I think you're right that the error comes from the test trying to load the old pickled files that were created with a previous version.

I think that a solution here would be to use an environment with the old BioPython to load those pickled files and then write them out as some kind of parseable text file (not as a pickle).

The pickles are representations of a Cell object (class Cell(object), tracer/tracerlib/core.py, line 10 in 84f53e5) and its Recombinant objects (class Recombinant(object), tracer/tracerlib/core.py, line 298 in 84f53e5).

These classes aren't very complex so you could write out a text file containing their instance variables.

You could then switch to an environment with the new version of BioPython, recreate the objects using the values in your text file and then repickle them. Those should then be compatible and the test should pass.
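The export/re-import round trip described above could look roughly like this. It is a sketch under assumptions: that Cell and Recombinant keep their state in ordinary instance attributes, and that Seq objects can be written out as plain strings. to_plain and from_plain are hypothetical helpers, not part of tracer or Biopython.

```python
from __future__ import annotations

import json
from typing import Any


def to_plain(obj: Any, stringify: tuple = ()) -> Any:
    """Recursively convert an object graph into JSON-compatible data.

    Objects with a __dict__ become {"__class__": <class name>, **attributes};
    instances of the types in `stringify` (e.g. an old Bio.Seq.Seq) become str(obj).
    Tuples and sets become lists and dict keys become strings. There is no
    cycle detection, so a self-referencing graph would hit RecursionError.
    """
    if stringify and isinstance(obj, stringify):
        return str(obj)
    if obj is None or isinstance(obj, (bool, int, float, str)):
        return obj
    if isinstance(obj, dict):
        return {str(k): to_plain(v, stringify) for k, v in obj.items()}
    if isinstance(obj, (list, tuple, set, frozenset)):
        return [to_plain(v, stringify) for v in obj]
    if hasattr(obj, "__dict__"):
        plain = {"__class__": type(obj).__name__}
        plain.update({k: to_plain(v, stringify) for k, v in vars(obj).items()})
        return plain
    return str(obj)  # fallback for anything else without a __dict__


def from_plain(data: Any, classes: dict) -> Any:
    """Rebuild objects from to_plain() output using a class-name -> class mapping."""
    if isinstance(data, list):
        return [from_plain(v, classes) for v in data]
    if isinstance(data, dict):
        if "__class__" in data:
            cls = classes[data["__class__"]]
            obj = cls.__new__(cls)  # bypass __init__; attributes are restored directly
            obj.__dict__.update(
                {k: from_plain(v, classes) for k, v in data.items() if k != "__class__"}
            )
            return obj
        return {k: from_plain(v, classes) for k, v in data.items()}
    return data
```

In the old environment one would pickle.load each file and json.dump(to_plain(cell, stringify=(Seq,)), fh). In the new one, after from tracerlib.core import Cell, Recombinant, call from_plain(json.load(fh), {"Cell": Cell, "Recombinant": Recombinant}), re-wrap sequence strings with Bio.Seq.Seq(...) wherever tracer expects Seq objects, and pickle.dump the result. Tuples and sets come back as lists, so those attributes may need converting as well.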

Cheers,

Mike
