Merge pull request #5 from nf-core/retreat-brainstorming
WIP: Discussion about input parameters
edmundmiller authored Mar 12, 2024
2 parents 186265e + 7a64f75 commit 3c523cd
Showing 7 changed files with 138 additions and 2 deletions.
49 changes: 49 additions & 0 deletions .github/workflows/build_reference.yml
@@ -0,0 +1,49 @@
name: Build reference genomes that changed
on:
  push:
    branches:
      - main
    paths:
      - 'assets/genomes/*.yml'

jobs:
  run-tower:
    name: Run AWS full tests
    if: github.repository == 'nf-core/nascent'
    runs-on: ubuntu-latest
    steps:
      - name: Find changed genomes
        id: changed-genome-files
        uses: tj-actions/changed-files@v42
        with:
          files: |
            assets/genomes/*.yml
      - name: Concatenate all the yamls together
        if: steps.changed-genome-files.outputs.any_changed == 'true'
        env:
          CHANGED_FILES: ${{ steps.changed-genome-files.outputs.all_changed_files }}
        run: cat ${CHANGED_FILES} > samplesheet.yml
      # - name: Upload samplesheet.yml to s3 or Tower Datasets
      #   run: TODO
      - name: Launch workflow via tower
        uses: seqeralabs/action-tower-launch@v2
        with:
          workspace_id: ${{ secrets.TOWER_WORKSPACE_ID }}
          access_token: ${{ secrets.TOWER_ACCESS_TOKEN }}
          compute_env: ${{ secrets.TOWER_COMPUTE_ENV }}
          revision: ${{ github.sha }}
          workdir: s3://${{ secrets.AWS_S3_SCRATCH_BUCKET }}/work
          parameters: |
            {
              "input": "samplesheet.yml",
              "hook_url": "${{ secrets.MEGATESTS_ALERTS_SLACK_HOOK_URL }}",
              "outdir": "s3://${{ secrets.AWS_S3_BUCKET }}/nascent/results-${{ github.sha }}"
            }
          profiles: cloud

      - uses: actions/upload-artifact@v4
        with:
          name: Tower debug log file
          path: |
            tower_action_*.log
            tower_action_*.json
7 changes: 7 additions & 0 deletions assets/genomes/GRCh38.yml
@@ -0,0 +1,7 @@
# FIXME Someone check this
- genome: GRCh38.p14
  fasta: https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_45/GRCh38.primary_assembly.genome.fa.gz
  gtf: https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_45/gencode.v45.chr_patch_hapl_scaff.annotation.gtf.gz
  mito_name: MT
  site: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000001405.40
  reference_version: GCF_000001405.40
6 changes: 6 additions & 0 deletions assets/genomes/GRCm39.yml
@@ -0,0 +1,6 @@
- genome: GRCm39
  fasta: https://hgdownload.soe.ucsc.edu/goldenPath/mm39/bigZips/mm39.fa.gz
  gtf: https://hgdownload.soe.ucsc.edu/goldenPath/mm39/bigZips/genes/mm39.ncbiRefSeq.gtf.gz
  mito_name: MT
  site: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000001635.27/
  reference_version: GCF_000001635.27
53 changes: 53 additions & 0 deletions assets/genomes/R64-1-1.yml
@@ -0,0 +1,53 @@
- genome: R64-1-1
  fasta: s3://ngi-igenomes/igenomes/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Sequence/WholeGenomeFasta/genome.fa
  gtf: s3://ngi-igenomes/igenomes/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Annotation/Genes/genes.gtf
  bed12: s3://ngi-igenomes/igenomes/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Annotation/Genes/genes.bed
  mito_name: MT
  macs_gsize: 1.2e7
  readme: s3://ngi-igenomes/igenomes/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Annotation/README.txt
# TODO
# Required
# reference_id:
#   type: string
#   default: R64-1-1
# reference_version:
#   type: string
#   default: '111'
# created_at:
#   type: string
#   format: date
#   default: 2024-02-07

# # Source specific
# source_type:
#   type: string
#   enum:
#     - ensembl
#     - ucsc
#     - ncbi
#     - gencode
#     - refseq
#     - encode
#     - custom

# # OR Manually submitted
# # Each optional, build what we can based on what is provided
# fasta:
#   type: string
#   default:
# gtf:
#   type: string
#   default: s3://ngi-igenomes/igenomes/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Annotation/Genes/genes.gtf
# bed12:
#   type: string
#   default: s3://ngi-igenomes/igenomes/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Annotation/Genes/genes.bed
# mito_name:
#   type: string
#   default: MT
# macs_gsize:
#   type: string
#   default: 1.2e7

# # Markdown block?
# description:
#   type: string
3 changes: 2 additions & 1 deletion assets/schema_input.json
@@ -20,6 +20,7 @@
             },
             "gtf": {
                 "type": "string",
+                "pattern": "^\\S+\\.gtf(\\.gz)?$",
                 "errorMessage": "TODO"
             },
             "bed12": {
@@ -37,7 +38,7 @@
             "macs_gsize": {
                 "type": "string",
                 "errorMessage": "TODO"
-            },
+            }
         },
         "required": ["genome", "fasta", "gtf"]
     }
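The `pattern` added to the `gtf` property can be sanity-checked outside the pipeline. The following is a minimal sketch in Python, assuming Python's `re` module is a close enough stand-in for the regex dialect used by the schema validator; the helper name `is_valid_gtf_path` is hypothetical and not part of the repository.

```python
import re

# Pattern copied from the "gtf" property in assets/schema_input.json.
# In the JSON file the backslashes are doubled ("\\S"); the regex itself is \S.
GTF_PATTERN = re.compile(r"^\S+\.gtf(\.gz)?$")


def is_valid_gtf_path(value):
    """Hypothetical helper: True if value has no whitespace and ends in .gtf or .gtf.gz."""
    return GTF_PATTERN.search(value) is not None


if __name__ == "__main__":
    candidates = [
        "s3://ngi-igenomes/igenomes/Saccharomyces_cerevisiae/Ensembl/R64-1-1/Annotation/Genes/genes.gtf",
        "https://hgdownload.soe.ucsc.edu/goldenPath/mm39/bigZips/genes/mm39.ncbiRefSeq.gtf.gz",
        "genes.gff",
        "my genes.gtf",
    ]
    for candidate in candidates:
        print(candidate, is_valid_gtf_path(candidate))
```

Under this pattern, both plain and gzipped GTF paths from the genome files in this commit are accepted, while other extensions and values containing whitespace are rejected.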
18 changes: 18 additions & 0 deletions docs/retreat-brainstrorming.md
@@ -0,0 +1,18 @@
# Brainstorming

## Generate

- md5 checksums (validate downloads if possible)

## Track within the pipeline

- software_versions
- copy of command.sh (or just save Nextflow report?)
- Asset input paths
- Show skipped reference types if they already exist
- Allow appending to the readme (treat like changelog), in case new asset types are added

## Strategy

When adding a new asset, build it for all genomes, but only for the latest reference version of each.
Optionally backfill old releases on demand if specifically triggered.
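
The md5 idea under "Generate" could look roughly like the sketch below. It is an illustration only, written in Python with the standard library; the function names `md5sum` and `validate_download` are hypothetical, and it assumes the expected checksum is available as a hex string from the upstream source.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate_download(path, expected_md5):
    """Hypothetical helper: True if the file's MD5 matches the published checksum."""
    return md5sum(path) == expected_md5.strip().lower()
```

Reading in chunks matters here because genome FASTA files can be several gigabytes. The same `md5sum` value could be written next to each generated asset so that later downloads can be validated against it.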
4 changes: 3 additions & 1 deletion nextflow.config
@@ -10,7 +10,9 @@ params {
 }

 profiles {
-    test { includeConfig 'conf/test.config' }
+    test {
+        params.input = "${projectDir}/assets/genomes/R64-1-1.yml"
+    }
     cloud {
         params.input = "${projectDir}/assets/genomes.csv"
     }