GitHub - brinkmanlab/BioPython-Convert: Tool to interconvert between various bioinformatics formats that BioPython supports

BioPython-Convert

Interconvert various file formats supported by BioPython.

Supports querying records with JMESPath.

Installation

pip install biopython-convert

or:

conda install biopython-convert

or:

git clone https://github.com/brinkmanlab/BioPython-Convert.git
cd BioPython-Convert
./setup.py install

Use

biopython.convert [-s] [-v] [-i] [-q JMESPath] input_file input_type output_file output_type
    -s Split records into separate files
    -q JMESPath to select records. Must return list of SeqIO records or mappings. Root is list of input SeqIO records.
    -i Print out details of records during conversion
    -v Print version and exit

Supported formats: abi, abi-trim, ace, cif-atom, cif-seqres, clustal, embl, fasta, fasta-2line, fastq-sanger, fastq, fastq-solexa, fastq-illumina, genbank, gb, ig, imgt, nexus, pdb-seqres, pdb-atom, phd, phylip, pir, seqxml, sff, sff-trim, stockholm, swiss, tab, qual, uniprot-xml, gff3, txt, json, yaml
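
When the tool is driven from a script, the synopsis above can be assembled as an argument list. The following is only a minimal sketch: `build_command` and the file names are hypothetical, and only the flags listed in the usage text are used.

```python
def build_command(input_file, input_type, output_file, output_type,
                  query=None, split=False, info=False):
    """Assemble an argument list for biopython.convert (hypothetical helper)."""
    cmd = ["biopython.convert"]
    if split:
        cmd.append("-s")       # split records into separate files
    if info:
        cmd.append("-i")       # print details of records during conversion
    if query is not None:
        cmd += ["-q", query]   # JMESPath expression used to select records
    # Positional arguments follow the order given in the synopsis.
    cmd += [input_file, input_type, output_file, output_type]
    return cmd


# Example: GenBank to FASTA, keeping only the first record (file names are made up).
command = build_command("genome.gb", "genbank", "genome.fasta", "fasta", query="[0]")
print(command)

# To actually run it (requires biopython-convert to be installed):
# import subprocess; subprocess.run(command, check=True)
```

Passing the query as its own list element avoids shell quoting problems with brackets and backticks in JMESPath expressions.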

JMESPath

The root node for a query is a list of SeqRecord objects. The query can return either a list containing a subset of these records, or a mapping whose keys are the constructor parameters of a SeqRecord object.

If the output format is txt, json, or yaml, the object returned by the JMESPath query is simply dumped in that format.

A web-based tool is available for experimenting with queries in real time on your own data. Convert your dataset to JSON and load it into the JMESPath playground to begin composing your query. The playground can load JSON files directly, so there is no need to copy and paste the data.

The split() and let() functions are available in addition to the JMESPath standard functions.

extract(Seq, SeqFeature) is also available, giving access to the SeqFeature.extract() function from within a query.

Examples:

Append a new record:

[@, [{'seq': 'AAAA', 'name': 'my_new_record'}]] | []
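
The query above wraps the existing records and the new record in an outer list, then flattens it by one level with `[]`. A rough plain-Python equivalent, using dicts as stand-ins for SeqRecord objects (the record contents and the `flatten_one_level` helper are assumptions for illustration):

```python
def flatten_one_level(items):
    """Mimic the JMESPath `[]` flatten: expand list elements one level, drop nulls."""
    result = []
    for item in items:
        if isinstance(item, list):
            result.extend(item)
        elif item is not None:
            result.append(item)
    return result


# Stand-ins for the input records (the `@` in the query).
records = [{"name": "rec1", "seq": "ACGT"}, {"name": "rec2", "seq": "GGCC"}]
new_record = {"seq": "AAAA", "name": "my_new_record"}

# [@, [new_record]] | []
combined = flatten_one_level([records, [new_record]])
print([r["name"] for r in combined])  # ['rec1', 'rec2', 'my_new_record']
```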

Filter out any plasmids:

[?!(features[?type=='source'].qualifiers.plasmid)]
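
The filter keeps a record only when the inner expression yields an empty list, that is, when no `source` feature carries a `plasmid` qualifier. A plain-Python sketch of the same logic, assuming records are represented as nested dicts that mirror the paths used in the query (the sample data is invented):

```python
def has_plasmid_source(record):
    """True if any 'source' feature of the record has a 'plasmid' qualifier."""
    return any(
        feature.get("type") == "source"
        and feature.get("qualifiers", {}).get("plasmid") is not None
        for feature in record.get("features", [])
    )


records = [
    {"id": "chromosome",
     "features": [{"type": "source", "qualifiers": {"organism": ["Example organism"]}}]},
    {"id": "pExample",
     "features": [{"type": "source", "qualifiers": {"plasmid": ["pExample"]}}]},
]

# Equivalent in spirit to [?!(features[?type=='source'].qualifiers.plasmid)]
kept = [r for r in records if not has_plasmid_source(r)]
print([r["id"] for r in kept])  # ['chromosome']
```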

Keep only the first record:

[0]

Output taxonomy of each record (txt output):

[*].annotations.taxonomy

Output a JSON object containing the id and molecule type of each record:

[*].{id: id, type: annotations.molecule_type}
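
The projection above builds one small mapping per record. A plain-Python sketch of the resulting structure, assuming dict stand-ins for the records (the ids and annotation values are invented); a missing `molecule_type` becomes `null` in the JSON output, as it would in JMESPath:

```python
import json

records = [
    {"id": "REC_0001", "annotations": {"molecule_type": "DNA"}},
    {"id": "REC_0002", "annotations": {}},  # no molecule_type annotation
]

# [*].{id: id, type: annotations.molecule_type}
summary = [
    {"id": r.get("id"), "type": r.get("annotations", {}).get("molecule_type")}
    for r in records
]
print(json.dumps(summary, indent=2))
```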

Convert dataset to PTT format using text output:

[0].[join(' - 1..', [description, to_string(length(seq))]), join(' ', [to_string(length(features[?type=='CDS' && qualifiers.translation])), 'proteins']), join(`"\t"`, ['Location', 'Strand', 'Length', 'PID', 'Gene', 'Synonym', 'Code', 'COG', 'Product']), (features[?type=='CDS' && qualifiers.translation].[join('..', [to_string(sum([location.start, `1`])), to_string(location.end)]), [location.strand][?@==`1`] && '+' || '-', length(qualifiers.translation[0]), (qualifiers.db_xref[?starts_with(@, 'GI')].split(':', @)[1])[0] || '-', qualifiers.gene[0] || '-', qualifiers.locus_tag[0] || '-', '-', '-', qualifiers.product[0] ] | [*].join(`"\t"`, [*].to_string(@)) )] | []
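
The per-feature part of the PTT query is dense, so here is a plain-Python sketch of how one CDS row is assembled. It assumes the feature is a nested dict mirroring the query paths, with a 0-based `location.start` (hence the `+ 1`), and that `product` is present; `ptt_row` and the sample values are hypothetical.

```python
def ptt_row(feature):
    """Format one CDS feature as a tab-separated PTT row (illustrative sketch)."""
    loc = feature["location"]
    q = feature["qualifiers"]
    # qualifiers.db_xref[?starts_with(@, 'GI')].split(':', @)[1]
    gi = [x.split(":")[1] for x in q.get("db_xref", []) if x.startswith("GI")]
    fields = [
        f"{loc['start'] + 1}..{loc['end']}",   # Location
        "+" if loc["strand"] == 1 else "-",    # Strand
        len(q["translation"][0]),              # Length of the translation
        gi[0] if gi else "-",                  # PID
        q.get("gene", ["-"])[0],               # Gene
        q.get("locus_tag", ["-"])[0],          # Synonym
        "-",                                   # Code
        "-",                                   # COG
        q["product"][0],                       # Product
    ]
    return "\t".join(str(f) for f in fields)


cds = {
    "type": "CDS",
    "location": {"start": 99, "end": 400, "strand": 1},
    "qualifiers": {
        "translation": ["MKT"],
        "db_xref": ["GI:12345"],
        "gene": ["abcA"],
        "locus_tag": ["XX_0001"],
        "product": ["hypothetical protein"],
    },
}
print(ptt_row(cds))
```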

Convert dataset to faa format using fasta output:

[0].let({org: (annotations.organism || annotations.source)}, &(features[?type=='CDS' && qualifiers.translation].{id:
join('|', [
        (qualifiers.db_xref[?starts_with(@, 'GI')].['gi', split(':', @)[1]]),
        (qualifiers.protein_id[*].['ref', @]),
        (qualifiers.locus_tag[*].['locus', @]),
        join('', [':', [location][?strand==`-1`] && 'c' || '', to_string(sum([location.start, `1`])), '..', to_string(location.end)])
][][]),
seq: qualifiers.translation[0],
description: (org && join('', [qualifiers.product[0], ' [', org, ']']) || qualifiers.product[0])}))
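
For readers unfamiliar with the nested flattening in the query above, the following plain-Python sketch shows roughly how each CDS becomes a mapping with `id`, `seq`, and `description` keys. It assumes dict stand-ins that mirror the query paths; `faa_entry` and the sample values are hypothetical.

```python
def faa_entry(feature, organism=None):
    """Build the id/seq/description mapping for one CDS feature (illustrative sketch)."""
    loc = feature["location"]
    q = feature["qualifiers"]
    parts = []
    for xref in q.get("db_xref", []):
        if xref.startswith("GI"):
            parts += ["gi", xref.split(":")[1]]
    for protein_id in q.get("protein_id", []):
        parts += ["ref", protein_id]
    for tag in q.get("locus_tag", []):
        parts += ["locus", tag]
    # ':' + optional 'c' for the complement strand + 1-based start '..' end
    strand_flag = "c" if loc["strand"] == -1 else ""
    parts.append(f":{strand_flag}{loc['start'] + 1}..{loc['end']}")
    product = q["product"][0]
    # Append the organism in brackets only when one is known.
    description = f"{product} [{organism}]" if organism else product
    return {"id": "|".join(parts), "seq": q["translation"][0], "description": description}


cds = {
    "type": "CDS",
    "location": {"start": 9, "end": 60, "strand": -1},
    "qualifiers": {
        "translation": ["MKTAYIAK"],
        "protein_id": ["XP_000001.1"],
        "locus_tag": ["XX_0002"],
        "product": ["hypothetical protein"],
    },
}
entry = faa_entry(cds, organism="Example organism")
print(entry["id"])
print(entry["description"])
```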

See CONTRIBUTING.rst for information on contributing to this repo.
