-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: Discussion about input parameters #5
Conversation
Think using those as yamls is the move. Then we can do a "on-changes" CI to launch tower jobs. Ooo better, yet, we can cat the yamls together if they've changed and launch one tower job PR commit instead of multiple. I was hating on the |
I like where this is going |
assets/genomes/GRCh38.yml
Outdated
gtf: | ||
type: string | ||
default: s3://ngi-igenomes/igenomes/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Annotation/Genes/genes.gtf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One problem I have with the current genomes is the static nature. There isn't one version of GRCh38, there is a 1-to-many relationship between the genome build and annotation data (here, a GTF file). Can we include that somehow? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See Sarek with it's various GRCh37/38 builds: https://github.com/nf-core/sarek/blob/6aeac929c924ba382baa42a0fe969b4e0e753ca9/conf/igenomes.config#L14
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's fine, these are just some initial examples.
I don't think the bottleneck isn't going to be:
- Storage space
- Compute
It's going to be:
- Collaboration overhead
- Getting things right and keeping it simple (The Sarek example is perfect)
- Egress fees
assets/genomes/GRCh38.yml
Outdated
- ensembl | ||
- ucsc | ||
- ncbi | ||
- gencode | ||
- refseq | ||
- encode | ||
- custom |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These need to at least be maps or something to capture versions, different builds etc. For example, if we add MANE (the most appropriate transcriptome build for most people):
- ensembl | |
- ucsc | |
- ncbi | |
- gencode | |
- refseq | |
- encode | |
- custom | |
- ensembl | |
- ucsc | |
- ncbi | |
- gencode | |
- refseq | |
- encode | |
- MANE | |
- refseq | |
- ensembl | |
- custom |
assets/genomes/GRCh38.yml
Outdated
properties: | ||
# Required | ||
reference_id: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
properties: | |
# Required | |
reference_id: | |
properties: | |
version: 0 # version of this document! | |
# Required | |
reference_id: |
assets/genomes/GRCh38.yml
Outdated
# OR Manually submitted | ||
# Each optional, build what we can based on what is provided | ||
fasta: | ||
type: string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surely you mean?
type: string | |
type: file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't look at me, @ewels 😆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm gonna go ahead with this one and merge it, and we can follow up with those in another PR.
I feel like there's a better/prettier way to do the yaml, but I can't seem to get it to work with nf-validation.
Result of discussion at the nf-core core team retreat.