
Collection of scripts to generate sample registration forms and the ena-bulk-webincli spreadsheet for submission of reads to ENA.


bge-barcoding/ena-read-upload

 
 


1_populate_tsv.py

Used to generate the ENA Tree of Life (ToL) sample submission checklist needed to create sample accession numbers. Takes the relevant fields from sample_metadata.csv and outputs them in ToL checklist format for manual upload to ENA.

usage: python 1_populate_tsv.py [/path/to/sample_metadata.csv] [/path/to/tol_ena_checklist.tsv]

  • sample_metadata.csv = Generated during sample processing from the BOLD container download.
  • tol_ena_checklist.tsv = Contains the following fields: 'taxid', 'scientific_name', 'sample_alias', 'sample_title', 'sample_description', 'organism part', 'lifestage', 'project name', 'identified_by', 'collected_by', 'collection date', 'geographic location (country and/or sea)', 'geographic location (latitude)', 'geographic location (longitude)', 'geographic location (region and locality)', 'habitat', 'sex', 'collecting institution', 'specimen_voucher'.

See 'BOLD_download-ENA_ToL_checklist_field_mapping.xlsx' for information on how fields from BOLD container downloads are used to populate required fields in ENA's Tree of Life sample registration checklist.
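The field-mapping step can be pictured with a minimal sketch like the one below. It is not the code in 1_populate_tsv.py: the source column names in FIELD_MAP ('process_id', 'species', 'country') and the function names are hypothetical placeholders, and the real mapping is the one documented in the .xlsx file. Only the checklist field names are taken from the list above.

```python
import csv
import io

# Checklist fields, in the order listed in this README.
CHECKLIST_FIELDS = [
    "taxid", "scientific_name", "sample_alias", "sample_title",
    "sample_description", "organism part", "lifestage", "project name",
    "identified_by", "collected_by", "collection date",
    "geographic location (country and/or sea)",
    "geographic location (latitude)", "geographic location (longitude)",
    "geographic location (region and locality)", "habitat", "sex",
    "collecting institution", "specimen_voucher",
]

# ASSUMPTION: these source column names are hypothetical placeholders,
# not the actual columns of sample_metadata.csv.
FIELD_MAP = {
    "process_id": "sample_alias",
    "species": "scientific_name",
    "country": "geographic location (country and/or sea)",
}


def to_checklist_row(metadata_row):
    """Return one checklist row; unmapped checklist fields stay empty."""
    row = {field: "" for field in CHECKLIST_FIELDS}
    for source_col, checklist_field in FIELD_MAP.items():
        row[checklist_field] = (metadata_row.get(source_col) or "").strip()
    return row


def convert(metadata_file, checklist_file):
    """Read CSV metadata from one file object, write a TSV to another."""
    reader = csv.DictReader(metadata_file)
    writer = csv.DictWriter(
        checklist_file,
        fieldnames=CHECKLIST_FIELDS,
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    for metadata_row in reader:
        writer.writerow(to_checklist_row(metadata_row))


if __name__ == "__main__":
    source = io.StringIO("process_id,species,country\nBGE_0001,Apis mellifera,Spain\n")
    target = io.StringIO()
    convert(source, target)
    print(target.getvalue())
```

Working on file objects rather than paths keeps the sketch easy to try with in-memory data; with real files, open() handles for the two paths given on the command line could be passed in instead.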

2_create_ena_submission_sheet.py

Used to generate the sample_submission_spreadsheet required by ena-bulk-webincli to produce the manifest file for bulk read upload to ENA. Takes the .csv file returned by ENA containing created sample accession numbers (and other metadata), and a path to a directory containing trimmed R1.fastq and R2.fastq files.

usage: python 2_create_ena_submission_sheet.py path/to/sample_accession_output.csv path/to/trimmed/read/dir path/to/output_dir/output.tsv

  • path/to/sample_accession_output.csv = Output by ENA and downloaded manually from Webin account. Contains sample accession numbers and 'title' fields (i.e. Process ID).
  • path/to/trimmed/read/dir = path to directory containing trimmed reads (.fastq) output by MGE or Skim2Mito (or another pipeline).
  • path/to/output_dir/output.tsv = sample_submission_spreadsheet
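As an illustration of how accession records might be joined to read files, here is a minimal sketch. It is not the logic of 2_create_ena_submission_sheet.py. It assumes (hypothetically) that read files are named '<ProcessID>_..._R1...' and '<ProcessID>_..._R2...', that the ENA output has 'accession' and 'title' columns, and it uses placeholder output column names; the columns actually required are defined by the ena-bulk-webincli spreadsheet template.

```python
def find_pair(process_id, filenames):
    """Return (R1, R2) file names for a Process ID, or None if incomplete.

    ASSUMPTION: file names start with '<process_id>_' and contain
    '_R1' or '_R2'. Real pipeline output may be named differently.
    """
    prefix = process_id + "_"
    r1 = [f for f in filenames if f.startswith(prefix) and "_R1" in f]
    r2 = [f for f in filenames if f.startswith(prefix) and "_R2" in f]
    if len(r1) == 1 and len(r2) == 1:
        return r1[0], r2[0]
    return None


def build_rows(accession_rows, filenames):
    """Pair each accession record with its reads; skip unmatched samples.

    ASSUMPTION: 'accession' and 'title' are hypothetical input column
    names, and the output keys are placeholders, not the real template.
    """
    rows = []
    for record in accession_rows:
        pair = find_pair(record["title"], filenames)
        if pair is None:
            continue  # no complete R1/R2 pair for this Process ID
        rows.append({
            "sample_accession": record["accession"],
            "uploaded file 1": pair[0],
            "uploaded file 2": pair[1],
        })
    return rows


if __name__ == "__main__":
    files = [
        "BGE_0001_trimmed_R1.fastq",
        "BGE_0001_trimmed_R2.fastq",
        "BGE_0002_trimmed_R1.fastq",  # R2 missing, so this sample is skipped
    ]
    accessions = [
        {"accession": "ERS0000001", "title": "BGE_0001"},
        {"accession": "ERS0000002", "title": "BGE_0002"},
    ]
    for row in build_rows(accessions, files):
        print(row)
```

Skipping samples without a complete pair, rather than writing a partial row, is a design choice of this sketch; logging the skipped Process IDs would make missing reads easier to spot.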

3_ena_bulk_webincli.sh

Primarily generates the manifest file for bulk upload of trimmed paired-end (PE) read data to ENA, using the output.tsv produced by 2_create_ena_submission_sheet.py. Requires ena-bulk-webincli to be installed in the conda environment.

usage: python path/to/bulk_webincli.py -u Webin-XXXXX -p XXXXX -g reads -s path/to/sample_submission_spreadsheet.tsv -m validate -pc 8

  • path/to/bulk_webincli.py = Path to run script supplied with ena-bulk-webincli.
  • -u = Webin account username
  • -p = Webin account password
  • -g = Genetic context (reads, sequence, genome, transcriptome, taxrefset)
  • -s = path to sample_submission_spreadsheet.tsv generated by 2_create_ena_submission_sheet.py (can be tab-separated .txt, .csv, .xlsx/.xls or .tsv)
  • -m = mode (submit/validate)
  • -pc = parallel cores (between 1 and 10)
  • -t = test mode for use of Webin test submission services
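The options above can be assembled programmatically, for example to run a validate pass before a submit pass. The sketch below only builds the argument list from the flags documented here; the function name build_command and its defaults are hypothetical and not part of ena-bulk-webincli or this repository.

```python
def build_command(script, username, password, spreadsheet,
                  mode="validate", cores=8, context="reads", test=False):
    """Build the bulk_webincli.py argument list from the documented flags.

    ASSUMPTION: build_command is a hypothetical helper; only the flags
    (-u, -p, -g, -s, -m, -pc, -t) come from the usage line above.
    """
    if mode not in ("submit", "validate"):
        raise ValueError("mode must be 'submit' or 'validate'")
    if not 1 <= cores <= 10:
        raise ValueError("cores must be between 1 and 10")
    command = [
        "python", script,
        "-u", username,
        "-p", password,
        "-g", context,
        "-s", spreadsheet,
        "-m", mode,
        "-pc", str(cores),
    ]
    if test:
        command.append("-t")  # use the Webin test submission service
    return command


if __name__ == "__main__":
    # Placeholder credentials and paths, as in the usage line above.
    cmd = build_command(
        "path/to/bulk_webincli.py", "Webin-XXXXX", "XXXXX",
        "path/to/sample_submission_spreadsheet.tsv", test=True,
    )
    print(" ".join(cmd))
```

The resulting list could be passed to subprocess.run(cmd, check=True). Passing a list rather than a shell string avoids quoting problems with paths that contain spaces.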
