GitHub - bge-barcoding/ena-read-upload: Collection of scripts to generate sample registration forms and the ena-bulk-webincli spreadsheet for submission of reads to ENA.

1_populate_tsv.py

Generates the ENA Tree of Life (ToL) sample submission checklist used to create sample accession numbers. Takes the relevant fields from sample_metadata.csv and outputs them in ToL checklist format for manual upload to ENA.

usage: python 1_populate_tsv.py [/path/to/sample_metadata.csv] [/path/to/tol_ena_checklist.tsv]

sample_metadata.csv = Generated during sample processing from the BOLD container download.
tol_ena_checklist.tsv = Contains following fields: 'taxid', 'scientific_name', 'sample_alias', 'sample_title', 'sample_description', 'organism part', 'lifestage', 'project name', 'identified_by', 'collected_by', 'collection date', 'geographic location (country and/or sea)', 'geographic location (latitude)', 'geographic location (longitude)', 'geographic location (region and locality)', 'habitat', 'sex', 'collecting institution', 'specimen_voucher'.

See 'BOLD_download-ENA_ToL_checklist_field_mapping.xlsx' for information on how fields from BOLD container downloads are used to populate required fields in ENA's Tree of Life sample registration checklist.
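
The field mapping described above can be illustrated with a minimal sketch. This is not the code in 1_populate_tsv.py: the function names, the `mapping` argument and the source column names ("Process ID", "Species") are hypothetical, and the real mapping is the one documented in the .xlsx file. The sketch also writes only a plain header row, whereas the checklist template downloaded from ENA may carry extra header lines.

```python
import csv

# Checklist columns, in the order listed for tol_ena_checklist.tsv above.
CHECKLIST_FIELDS = [
    "taxid", "scientific_name", "sample_alias", "sample_title",
    "sample_description", "organism part", "lifestage", "project name",
    "identified_by", "collected_by", "collection date",
    "geographic location (country and/or sea)",
    "geographic location (latitude)", "geographic location (longitude)",
    "geographic location (region and locality)", "habitat", "sex",
    "collecting institution", "specimen_voucher",
]


def populate_rows(metadata_rows, mapping):
    """Build one checklist row per metadata row.

    `mapping` is a hypothetical dict of checklist field -> metadata column.
    Fields without a mapping, or missing from the metadata, are left empty.
    """
    rows = []
    for row in metadata_rows:
        out = {}
        for field in CHECKLIST_FIELDS:
            source_column = mapping.get(field)
            out[field] = row.get(source_column, "") if source_column else ""
        rows.append(out)
    return rows


def write_checklist(rows, path):
    """Write the rows as a tab-separated file with a single header row."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=CHECKLIST_FIELDS, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)


def read_metadata(path):
    """Read sample_metadata.csv into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

Typical use would be `write_checklist(populate_rows(read_metadata("sample_metadata.csv"), mapping), "tol_ena_checklist.tsv")`, with `mapping` filled in from the field-mapping spreadsheet.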

2_create_ena_submission_sheet.py

Used to generate the sample_submission_spreadsheet required by ena-bulk-webincli to produce the manifest file for bulk read upload to ENA. Takes the .csv file returned by ENA containing created sample accession numbers (and other metadata), and a path to a directory containing trimmed R1.fastq and R2.fastq files.

usage: python 2_create_ena_submission_sheet.py path/to/sample_accession_output.csv path/to/trimmed/read/dir path/to/output_dir/output.tsv

path/to/sample_accession_output.csv = Output by ENA and downloaded manually from Webin account. Contains sample accession numbers and 'title' fields (i.e. Process ID).
path/to/trimmed/read/dir = path to directory containing trimmed reads (.fastq) output by MGE or Skim2Mito (or another pipeline).
path/to/output_dir/output.tsv = Output path for the sample_submission_spreadsheet passed to ena-bulk-webincli.
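
As a rough illustration of how accession records might be matched to read files, the sketch below pairs R1 and R2 files by Process ID. It is not taken from 2_create_ena_submission_sheet.py. The function name `pair_reads` and the file-naming assumption (file names start with the Process ID and contain "R1" or "R2") are hypothetical, so adjust the glob patterns to whatever your trimming pipeline produces.

```python
from pathlib import Path


def pair_reads(process_ids, read_dir):
    """Return {process_id: (r1_path, r2_path)} for IDs with exactly one R1 and one R2.

    Assumes (hypothetically) names such as <ProcessID>_R1.fastq or
    <ProcessID>_trimmed_R2.fastq.gz. Because matching is by prefix, an ID
    that is a prefix of another ID can match several files and is skipped.
    """
    read_dir = Path(read_dir)
    pairs = {}
    for process_id in process_ids:
        r1 = sorted(read_dir.glob(f"{process_id}*R1*.fastq*"))
        r2 = sorted(read_dir.glob(f"{process_id}*R2*.fastq*"))
        # Skip incomplete or ambiguous matches rather than guessing.
        if len(r1) == 1 and len(r2) == 1:
            pairs[process_id] = (r1[0], r2[0])
    return pairs
```

IDs missing from the returned dict are the ones to check by hand before building the spreadsheet.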

3_ena_bulk_webincli.sh

Primarily generates the manifest file for bulk upload of trimmed paired-end (PE) read data to ENA, using the output.tsv produced by 2_create_ena_submission_sheet.py. Requires ena-bulk-webincli to be installed in a conda environment.

usage: python path/to/bulk_webincli.py -u Webin-XXXXX -p XXXXX -g reads -s path/to/sample_submission_spreadsheet.tsv -m validate -pc 8

path/to/bulk_webincli.py = Path to run script supplied with ena-bulk-webincli.
-u = Webin account username
-p = Webin account password
-g = Genetic context (reads, sequence, genome, transcriptome, taxrefset)
-s = path to sample_submission_spreadsheet.tsv generated by 2_create_ena_submission_sheet.py (can be tab-separated .txt, .csv, .xlsx/.xls or .tsv)
-m = mode (submit/validate)
-pc = parallel cores (between 1 and 10)
-t = test mode for use of Webin test submission services
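
If the call is driven from Python rather than the shell, the command line can be assembled as below. This is a sketch built only from the flags listed above. The helper name `build_webincli_command` and its defaults are hypothetical, and the range and mode checks simply mirror the values documented here.

```python
def build_webincli_command(script, user, password, spreadsheet,
                           context="reads", mode="validate", cores=8, test=False):
    """Assemble the bulk_webincli.py call as an argument list."""
    if mode not in ("submit", "validate"):
        raise ValueError("mode must be 'submit' or 'validate'")
    if not 1 <= cores <= 10:
        raise ValueError("cores must be between 1 and 10")
    command = [
        "python", str(script),
        "-u", user,
        "-p", password,
        "-g", context,
        "-s", str(spreadsheet),
        "-m", mode,
        "-pc", str(cores),
    ]
    if test:
        # -t targets the Webin test submission service.
        command.append("-t")
    return command
```

The returned list can be passed to `subprocess.run(command, check=True)`. Running with `mode="validate"` and `test=True` first is a cautious way to check the spreadsheet before a real submission.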
