SRA-Collector

$${\color{red}\textbf{📢 Deprecation Notice}}$$

The project has been discontinued: without sponsors, its cloud expenses are not affordable. An alternative in the form of a CLI is being considered. If you are interested in the project's continuity, get in touch by mail ✉️ at marta.arcones@gmail.com.

Collect NIH NCBI 🧬 metadata for several NCBI studies in one search 🔮. For every experiment related to every study returned by the GEO search, the system provides the following statistics in a CSV file:

NCBI Study ID: text field containing the study identifier fetched by the text query in NCBI database.
GSE: text field containing the GEO identifier for each NCBI study.
SRP: text field containing the SRA project identifier related to the GEO entity.
SRR: text field containing the SRA run identifier for each SRP entity.
Spots: integer number representing the sequencing depth of the SRR.
Bases: integer number representing the amount of sequenced bases in the SRR.
Organism: text field containing the species from which the SRR was obtained.
Layout: text field containing one of the two possible sequencing strategies, i.e., single or paired.
Phred Score From 30: float number field showing the percentage of reads above a Phred score of 30. In other words, this value shows the percentage of reads with an accuracy above 99.9 %.
Phred Score From 37: float number field showing the percentage of reads above a Phred score of 37. Accordingly, this value shows the percentage of reads with an accuracy above 99.98 %.
Read 0 Count: integer number field representing the count of reads in the main read direction.
Read 0 Average: float number field representing the average sequence length for the main read direction.
Read 0 Stdev: float number field representing the standard deviation of the sequence length in the main read direction.
Read 1 Count: integer number field representing the count of reads in the reverse read direction. This field will be filled only in paired layout samples as those require both reads.
Read 1 Average: float number field representing the average sequence length for the reverse read direction.
Read 1 Stdev: float number field representing the standard deviation of the sequence length in the reverse read direction.
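
The accuracy figures quoted for the two Phred fields follow from the standard Phred definition, where a quality score Q corresponds to an error probability of 10^(-Q/10). The sketch below only illustrates that arithmetic; the function name phred_to_accuracy is hypothetical and is not part of sra-collector.

```python
def phred_to_accuracy(q: float) -> float:
    """Return the base-call accuracy, as a percentage, for Phred score q.

    Hypothetical helper for illustration; not taken from sra-collector.
    """
    error_probability = 10 ** (-q / 10)  # standard Phred definition
    return (1 - error_probability) * 100


for score in (30, 37):
    print(f"Q{score}: {phred_to_accuracy(score):.2f} % accuracy")
# Q30: 99.90 % accuracy
# Q37: 99.98 % accuracy
```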

Use Case Example

Take the query hypercholesterolemia and rna seq, which can be run on the NIH NCBI website:

https://www.ncbi.nlm.nih.gov/gds/?term=hypercholesterolemia+and+rna+seq

At the time of writing, the query retrieves 25 studies. If the same query is run in the sra-collector system instead, it generates a CSV report like this one, containing the statistics for all the experiments related to all the studies found by NCBI.
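
As a minimal sketch of how the free-text query maps onto the NCBI web search URL shown above, the snippet below rebuilds that URL with the Python standard library. The function name geo_search_url is hypothetical; the base address is the one from the example, and nothing here reflects how sra-collector itself submits queries.

```python
from urllib.parse import urlencode

# Base address of the NCBI GEO DataSets web search, as used in the example above.
GEO_SEARCH_BASE = "https://www.ncbi.nlm.nih.gov/gds/"


def geo_search_url(query: str) -> str:
    """Build the web search URL for a free-text query (hypothetical helper)."""
    # urlencode encodes spaces as '+', matching the URL in the example.
    return f"{GEO_SEARCH_BASE}?{urlencode({'term': query})}"


print(geo_search_url("hypercholesterolemia and rna seq"))
# https://www.ncbi.nlm.nih.gov/gds/?term=hypercholesterolemia+and+rna+seq
```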
