2. Downloading and processing NCBI data
shenjean edited this page Feb 10, 2021 · 8 revisions
NCBI BLAST databases FTP site: https://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/
- Download and unzip NCBI sequence file:
cd ..
wget https://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nt.gz
wget https://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/nt.gz.md5
md5sum -c nt.gz.md5 >md5sum.log
gunzip nt.gz
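The commands above write the checksum result to md5sum.log but decompress regardless of the outcome. A minimal sketch of one way to decompress only when verification succeeds is given here. The helper name `verify_and_gunzip` and the small `demo` file are hypothetical and used only for illustration. The sketch assumes GNU `md5sum` and that the archive sits in the current directory, since `md5sum -c` resolves the file name listed inside the .md5 file.

```shell
# Hypothetical helper: decompress an archive only if its checksum verifies.
verify_and_gunzip() {
    # $1: .gz archive, $2: matching .md5 file (in md5sum output format)
    if md5sum -c "$2" >md5sum.log 2>&1; then
        gunzip "$1"
    else
        echo "Checksum check failed for $1; see md5sum.log" >&2
        return 1
    fi
}

# Small self-made example in a scratch directory (not real NCBI data).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '>ACC0001.1 demo record\nACGT\n' >demo
gzip demo                        # creates demo.gz and removes demo
md5sum demo.gz >demo.gz.md5      # stands in for the downloaded .md5 file
verify_and_gunzip demo.gz demo.gz.md5
```

For the real download, the equivalent call would be `verify_and_gunzip nt.gz nt.gz.md5`, run in the directory that holds both files.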
- Get accession numbers:
grep ">" nt | cut -d ' ' -f1 | tr -d ">" >nt.accession
- Get gene names:
grep ">" nt | cut -d ' ' -f2- >nt.genenames
- Make a tab-separated list of accession numbers and descriptions (the species and gene names from each header):
paste -d "\t" nt.accession nt.genenames >nt.list
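The steps above scan the nt file twice, once per `grep`, which can be slow on a file of this size. As an alternative sketch, not the method used on this page, a single `awk` pass can write the same two columns directly. The function name `fasta_to_list` and the sample records are hypothetical. One difference in behavior: a header with no space after the accession gets an empty second column here, whereas `cut -d ' ' -f2-` would repeat the whole header line.

```shell
# Hypothetical helper: print "accession<TAB>description" for each FASTA header.
fasta_to_list() {
    # $1: FASTA file
    awk '/^>/ {
        header = substr($0, 2)              # drop the leading ">"
        i = index(header, " ")              # first space ends the accession
        if (i == 0) {
            print header "\t"               # header without a description
        } else {
            print substr(header, 1, i - 1) "\t" substr(header, i + 1)
        }
    }' "$1"
}

# Tiny made-up FASTA file for a quick check (not real NCBI records).
tmpdir=$(mktemp -d)
printf '>ACC0001.1 Example organism gene A\nACGT\n>ACC0002.1 Another organism gene B\nTTGA\n' >"$tmpdir/sample.fa"
fasta_to_list "$tmpdir/sample.fa" >"$tmpdir/sample.list"
```

On the real data this would be run as `fasta_to_list nt >nt.list`.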