GISAID EpiCoV Downloader

This is an updated version of the GISAID downloader for retrieving all EpiCoV sequences and the table. The script uses Selenium to access the GISAID website through a Firefox webdriver.

WARNING: By using this software, you agree to GISAID's Terms of Use and reaffirm your understanding of these terms.

Installation

We provide a package file (environment.yml) to create a new environment (gisaid) using conda:

$ git clone https://github.com/poeli/EpiCoV_downloader.git
$ cd EpiCoV_downloader/
$ conda env create -f environment.yml
$ conda activate gisaid

⚠️ Download geckodriver and put it in the same directory as this script.
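
Before starting a run, it can help to confirm that geckodriver really sits next to the script. The following is a minimal sketch of such a check, not part of the downloader itself; the helper name `find_geckodriver` is hypothetical.

```python
import os
from pathlib import Path


def find_geckodriver(script_dir):
    """Return the path to geckodriver inside script_dir, or None if absent.

    Hypothetical helper for illustration; the downloader does not ship it.
    """
    # On Windows the binary is named geckodriver.exe; elsewhere, geckodriver.
    name = "geckodriver.exe" if os.name == "nt" else "geckodriver"
    candidate = Path(script_dir) / name
    # The file must exist and be executable before it can be launched.
    if candidate.is_file() and os.access(candidate, os.X_OK):
        return candidate
    return None


# Check the current working directory, assuming the script is run from there.
driver = find_geckodriver(Path.cwd())
print(driver if driver else "geckodriver not found in this directory")
```

If the check prints the "not found" message on Linux or macOS, the file may be present but missing its executable bit.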

Example Run Command:

python gisaid_EpiCoV_downloader.py -u '<username>' -p '<password>' -o 'downloads/' -ht 'Human' -ss '2023-04-26' -voc 'omicron' -cg -hc -nnd
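
The same command can be assembled from Python, which keeps the credentials out of the shell history. This is only a sketch under assumptions: the function names and the environment variables `GISAID_USER` and `GISAID_PASS` are hypothetical, and only flags that appear in the usage text are used.

```python
import os
import subprocess
import sys


def build_command(username, password, outdir="downloads/", host="Human",
                  substart=None, variant=None, complete=False,
                  high_coverage=False):
    """Assemble the argument list for gisaid_EpiCoV_downloader.py."""
    cmd = [sys.executable, "gisaid_EpiCoV_downloader.py",
           "-u", username, "-p", password, "-o", outdir, "-ht", host]
    if substart:
        cmd += ["-ss", substart]   # submissions start date, YYYY-MM-DD
    if variant:
        cmd += ["-voc", variant]   # e.g. 'omicron'
    if complete:
        cmd.append("-cg")          # complete genome only
    if high_coverage:
        cmd.append("-hc")          # high coverage only
    return cmd


def run_download():
    """Run the downloader with credentials taken from the environment.

    GISAID_USER and GISAID_PASS are hypothetical variable names.
    """
    cmd = build_command(os.environ["GISAID_USER"], os.environ["GISAID_PASS"],
                        substart="2023-04-26", variant="omicron",
                        complete=True, high_coverage=True)
    # check=True raises CalledProcessError if the script exits non-zero.
    subprocess.run(cmd, check=True)
```

Calling `run_download()` from the repository directory would launch the script. The password is still passed as a command-line argument, so it remains visible in the process list while the script runs.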

Example Logs:

2023-04-26 18:27 [INFO] GISAID EpiCoV Utility v21.05.10
2023-04-26 18:27 [INFO] Opening browser...
2023-04-26 18:27 [INFO] Opening website GISAID...
2023-04-26 18:27 [INFO] GISAID Initiative
2023-04-26 18:27 [INFO] Loggining into GISAID...
2023-04-26 18:27 [INFO] Navigating to EpiCoV...
2023-04-26 18:27 [INFO] Searching in EpiCoV...
2023-04-26 18:28 [INFO] Total: 15,470,138 viruses...
2023-04-26 18:28 [INFO] Setting host...
2023-04-26 18:28 [INFO] Total: 15,455,310 viruses...
2023-04-26 18:28 [INFO] Setting submissions start date...
2023-04-26 18:28 [INFO] Total: 2,179 viruses...
2023-04-26 18:28 [INFO] Setting variant...
2023-04-26 18:28 [INFO] Selected VOC Omicron GRA (B.1.1.529+BA.*) first detected in Botswana/Hong Kong/South Africa...
2023-04-26 18:28 [INFO] Total: 2,143 viruses...
2023-04-26 18:28 [INFO] Complete genome only...
2023-04-26 18:28 [INFO] Total: 2,049 viruses...
2023-04-26 18:28 [INFO] High coverage only...
2023-04-26 18:28 [INFO] Total: 87 viruses...
2023-04-26 18:28 [INFO] Downloading sequences for selected genomes...
2023-04-26 18:28 [INFO] Switching to data selection iframe...
2023-04-26 18:28 [INFO] Selecting FASTA files...
2023-04-26 18:29 [INFO] Clicking download button...
2023-04-26 18:29 [INFO] Switching back to default page...
2023-04-26 18:29 [INFO] Switching to agreement iframe...
2023-04-26 18:29 [INFO] Accepting terms of use...
2023-04-26 18:29 [INFO] Clicking download button again...
2023-04-26 18:29 [INFO] Switching back to default page...
2023-04-26 18:29 [INFO] Downloaded to gisaid_hcov-19_2023_04_26_15.fasta.
2023-04-26 18:29 [INFO] Completed.
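
The "Total: N viruses" lines show how each filter narrows the selection. As a rough sketch, assuming the log format shown above, the counts can be pulled out and paired with the step logged just before them; `filter_totals` is a hypothetical helper, not part of the downloader.

```python
import re

# Matches lines such as "... [INFO] Total: 15,470,138 viruses..."
TOTAL_RE = re.compile(r"\[INFO\] Total: ([\d,]+) viruses")
# Matches any other progress line ending in "...", capturing its message.
STEP_RE = re.compile(r"\[INFO\] (.+?)\.\.\.$")


def filter_totals(log_lines):
    """Pair each 'Total' count with the step message logged just before it."""
    totals = []
    last_step = None
    for line in log_lines:
        line = line.rstrip()
        total = TOTAL_RE.search(line)
        if total:
            # Strip thousands separators before converting to int.
            count = int(total.group(1).replace(",", ""))
            totals.append((last_step, count))
            continue
        step = STEP_RE.search(line)
        if step:
            last_step = step.group(1)
    return totals


sample = [
    "2023-04-26 18:28 [INFO] Setting host...",
    "2023-04-26 18:28 [INFO] Total: 15,455,310 viruses...",
    "2023-04-26 18:28 [INFO] High coverage only...",
    "2023-04-26 18:28 [INFO] Total: 87 viruses...",
]
for step, count in filter_totals(sample):
    print(f"{step}: {count}")
```
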

Usage

usage: gisaid_EpiCoV_downloader.py [-h] -u [STR] -p [STR] [-o [STR]]
                                   [-l [STR]] [-ht [STR]] [-cs [YYYY-MM-DD]]
                                   [-ce [YYYY-MM-DD]] [-ss [YYYY-MM-DD]]
                                   [-se [YYYY-MM-DD]] [-cg] [-hc] [-le]
                                   [-t [INT]] [-r [INT]] [-i [INT]] [-m]
                                   [--normal]

Download EpiCoV sequences from GISAID

optional arguments:
  -h, --help            show this help message and exit
  -u [STR], --username [STR]
                        GISAID username
  -p [STR], --password [STR]
                        GISAID password
  -o [STR], --outdir [STR]
                        Output directory
  -l [STR], --location [STR]
                        sample location
  -ht [STR], --host [STR]
                        Specify a host of the sample. Default is human.
  -cs [YYYY-MM-DD], --colstart [YYYY-MM-DD]
                        collection starts date
  -ce [YYYY-MM-DD], --colend [YYYY-MM-DD]
                        collection ends date
  -ss [YYYY-MM-DD], --substart [YYYY-MM-DD]
                        submissions start date
  -se [YYYY-MM-DD], --subend [YYYY-MM-DD]
                        submissions end date
  -voc [STR], --variant [STR]
                        Variant of concern. One of:
                        ['', 'alpha', 'beta', 'gamma', 'delta', 'omicron']
  -cg, --complete       complete genome only
  -hc, --highcoverage   high coverage only
  -le, --lowcoverageExcl
                        low coverage excluding
  -t [INT], --timeout [INT]
                        set action timeout seconds. Default is 90 secs.
  -r [INT], --retry [INT]
                        retry how many times when the action fails. Default is
                        5 times.
  -i [INT], --interval [INT]
                        time interval between retries in second(s). Default is
                        3 seconds.
  --normal              run firefox in normal mode.
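
The -r and -i options describe a retry loop with a pause between attempts. The sketch below shows one common way such a loop can be written; it is an assumption about the general pattern, not the script's actual code. Here `retries` is treated as the maximum number of attempts, and the `retry` helper and its `sleep` parameter are hypothetical. The -t action timeout is not modeled.

```python
import time


def retry(action, retries=5, interval=3, sleep=time.sleep):
    """Call action() until it succeeds or `retries` attempts have failed.

    Defaults mirror the documented ones (5 tries, 3 seconds apart).
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return action()
        except Exception as error:
            last_error = error
            # Pause before the next attempt, but not after the final one.
            if attempt < retries:
                sleep(interval)
    # Every attempt failed: re-raise the most recent error.
    raise last_error
```

Passing `sleep` as a parameter lets the loop be exercised without real delays, for example `retry(click_download, sleep=lambda seconds: None)`, where `click_download` stands for any callable that may fail.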
