A Python package for automatically accessing the inverted repeats of thousands of plastid genomes stored on NCBI Nucleotide
To get the most recent stable version of airpg, run:
pip install airpg
Alternatively, to get the latest development version of airpg, run:
pip install git+https://github.com/michaelgruenstaeudl/airpg.git
To install airpg from source, clone the repository via git, change into the cloned directory in a terminal, and run:
sudo pip install .
Tutorial 1: Very short survey (runtime ca. 5 min.; for the impatient)
Survey of all plastid genomes of flowering plants submitted to NCBI Nucleotide within the past 10 days.
Tutorial 2: Short survey (runtime ca. 15 min.; for testing)
Survey of all plastid genomes of flowering plants submitted to NCBI Nucleotide within the current month.
Tutorial 3: Medium survey (runtime ca. 5 hours)
Survey of all plastid genomes of flowering plants submitted to NCBI Nucleotide in 2019 only. Note: The results of this survey are available on Zenodo via DOI 10.5281/zenodo.4335906
Tutorial 4: Full survey (runtime ca. 19 hours; with explanations)
Survey of all plastid genomes of flowering plants submitted to NCBI Nucleotide from January 2000 until, and including, December 2020. Note: The results of this survey are available on Zenodo via DOI 10.5281/zenodo.4335906
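# Sort the output of script1 numerically by the year, month, and day of the date in column 7
sort -t$'\t' -k7.1,7.4 -k7.6,7.7 -k7.9,7.10 -n output_script1.tsv > output_script1.sorted.tsv
# Save column 2 of the sorted output as an index that defines the new row order
awk '{print $2}' output_script1.sorted.tsv > output_script1.sorted.index
# Reorder the output of script2 and script3 to follow that index (rows are matched on their first column)
awk 'NR==FNR{o[FNR]=$1; next} {t[$1]=$0} END{for(x=1; x<=FNR; x++){y=o[x]; print t[y]}}' output_script1.sorted.index output_script2.tsv > output_script2.sorted.tsv
awk 'NR==FNR{o[FNR]=$1; next} {t[$1]=$0} END{for(x=1; x<=FNR; x++){y=o[x]; print t[y]}}' output_script1.sorted.index output_script3.tsv > output_script3.sorted.tsv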
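For readers who prefer to stay in Python, the shell steps above (sort the output of script1 by the date in column 7, take column 2 as an index, reorder the other outputs to match) could be sketched roughly as follows. This is a minimal sketch and not part of airpg: the function names are hypothetical, and it assumes that the files are tab-separated, that column 7 holds a date laid out as YYYY-MM-DD, and that column 2 of script1's output matches column 1 of the other outputs. Unlike the awk commands, it skips index entries that have no matching row instead of printing an empty line, and rows with equal dates keep their input order.

```python
import os

def date_key(row, col=6):
    """Return (year, month, day) from column 7; non-numeric parts count as 0."""
    field = row[col] if len(row) > col else ""
    parts = (field[0:4], field[5:7], field[8:10])
    return tuple(int(p) if p.isdigit() else 0 for p in parts)

def read_tsv(path):
    """Read a tab-separated file into a list of rows, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n").split("\t") for line in handle if line.strip()]

def write_tsv(path, rows):
    """Write rows back out as tab-separated lines."""
    with open(path, "w", encoding="utf-8") as handle:
        for row in rows:
            handle.write("\t".join(row) + "\n")

def reorder_like(index, rows, key_col=0):
    """Arrange rows in the order given by index; keys without a row are skipped."""
    lookup = {row[key_col]: row for row in rows if len(row) > key_col}
    return [lookup[key] for key in index if key in lookup]

def sorted_name(path):
    """Hypothetical naming helper: output_script1.tsv -> output_script1.sorted.tsv"""
    return os.path.splitext(path)[0] + ".sorted.tsv"

def sort_outputs(script1="output_script1.tsv",
                 others=("output_script2.tsv", "output_script3.tsv")):
    """Sort script1's output by date and reorder the other outputs to match."""
    rows1 = sorted(read_tsv(script1), key=date_key)
    write_tsv(sorted_name(script1), rows1)
    index = [row[1] for row in rows1 if len(row) > 1]  # column 2 of script1
    for path in others:
        write_tsv(sorted_name(path), reorder_like(index, read_tsv(path)))
```

Calling `sort_outputs()` in the directory that holds the three output files would write the corresponding `.sorted.tsv` files next to them.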
How to count the number of angiosperm families represented by the plastid genomes archived on GenBank
# Using the sorted output of script1 as input
awk -F'\t' '{print $11}' output_script1.sorted.tsv | tr ";" "\n" | grep "aceae" | grep -v "incertae sedis" | sort -u | wc -l
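The same count could be sketched in Python, which may be handy if the result is needed inside a larger analysis. This is a minimal sketch and not part of airpg: the function name is hypothetical, and it assumes, as the command above does, that column 11 holds a semicolon-separated taxonomy string and that family names contain "aceae". One small difference: names are stripped of surrounding whitespace before they are compared.

```python
def count_families(rows, taxonomy_col=10):
    """Count unique taxon names containing 'aceae' in the taxonomy column.

    rows: list of rows, each a list of tab-separated fields.
    taxonomy_col: zero-based index of the taxonomy column (column 11).
    """
    families = set()
    for row in rows:
        if len(row) <= taxonomy_col:
            continue  # row too short to hold a taxonomy field
        for name in row[taxonomy_col].split(";"):
            name = name.strip()
            # Keep family-like names, drop 'incertae sedis' placeholders
            if "aceae" in name and "incertae sedis" not in name:
                families.add(name)
    return len(families)

def count_families_in_file(path="output_script1.sorted.tsv"):
    """Read a tab-separated file and count the families it contains."""
    with open(path, encoding="utf-8") as handle:
        rows = [line.rstrip("\n").split("\t") for line in handle if line.strip()]
    return count_families(rows)
```

Calling `count_families_in_file()` in the directory that holds the sorted output of script1 would return the number of distinct family names found.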
See CHANGELOG.md for a list of recent changes to the software.