2. Downloading and processing NCBI data

shenjean edited this page Feb 10, 2021 · 8 revisions

NCBI BLAST databases FTP site: https://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/

  • Download, verify, and unzip the NCBI nt sequence file:
cd ..
wget https://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nt.gz
wget https://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nt.gz.md5
md5sum -c nt.gz.md5 >md5sum.log
gunzip nt.gz
  • Get accession numbers:
grep "^>" nt | cut -d ' ' -f1 | tr -d ">" >nt.accession
  • Get gene names:
grep "^>" nt | cut -d ' ' -f2- >nt.genenames
  • Make a tab-separated list of accession numbers and gene names:
paste -d "\t" nt.accession nt.genenames >nt.list
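
  • Optional: build the same list in a single pass. The steps above read the uncompressed nt file twice, which can be slow for a file of this size. The sketch below is one possible alternative, not part of the original workflow: make_fasta_list is a hypothetical helper name, and it assumes each header has the form ">accession description" with a single space after the accession number. Unlike cut, it leaves the second column empty when a header has no description.

```shell
# Sketch: produce "accession<TAB>description" for every FASTA header in one pass.
# make_fasta_list is a hypothetical helper, not an NCBI or BLAST tool.
make_fasta_list() {
    # $1: path to an uncompressed FASTA file
    awk '/^>/ {
        header = substr($0, 2)          # drop the leading ">"
        space = index(header, " ")      # first space ends the accession number
        if (space == 0) {
            print header "\t"           # header has no description
        } else {
            print substr(header, 1, space - 1) "\t" substr(header, space + 1)
        }
    }' "$1"
}

# Usage (assumes nt has already been downloaded and unzipped):
# make_fasta_list nt >nt.list
```

Only lines that start with ">" are treated as headers, so a ">" appearing elsewhere in a line is ignored.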