Adding support for fastq.gz.spring-files as input #1534

Merged
merged 23 commits into from
Jun 19, 2024
Changes from 6 commits
Commits
b5b766d
Merge pull request #1484 from nf-core/dev
maxulysse May 7, 2024
85905b3
Adding support for spring-compressed fastq.gz as input
asp8200 May 21, 2024
5e9a3df
Improve error msg
asp8200 May 21, 2024
3967a8c
Adding test config
asp8200 May 22, 2024
37e2fa7
Aligning profile-name and config-name for test alignment_from_fastq_a…
asp8200 May 22, 2024
c239c37
Improving names of variables and module instances
asp8200 May 22, 2024
2944812
pleasing the linter
asp8200 May 22, 2024
1719e39
Merge branch 'nf-core:master' into spring_II
asp8200 May 27, 2024
b6421a7
Merge branch 'dev' into spring_II
asp8200 May 27, 2024
c11092f
setup test of alignment from bam,fastq and spring in one input-csv
asp8200 May 27, 2024
1863b61
Updating changelog
asp8200 May 27, 2024
85f03c1
Adding pytest alignment_from_everything
asp8200 May 29, 2024
e434c1c
fix typo
asp8200 May 29, 2024
8515a83
prettier
asp8200 May 29, 2024
1a649a7
Disabling default publishing of fastq.gz-files from SPRING_DECOMPRESS
asp8200 May 30, 2024
9504c26
Reduce code-duplication by introducing function addReadgroupToMeta
asp8200 Jun 10, 2024
0a60386
Merge branch 'dev' into spring_II
asp8200 Jun 10, 2024
2e16ba1
Adding some docs on fastq.gz.spring-files as input
asp8200 Jun 10, 2024
5be6cd5
prettier
asp8200 Jun 10, 2024
84abe8d
Update workflows/sarek/main.nf
asp8200 Jun 17, 2024
b1a4ac0
Very minor update of usage.md
asp8200 Jun 18, 2024
4a9b22c
variable names all lowercase
asp8200 Jun 18, 2024
67e3f02
Merge branch 'spring_II' of https://github.com/asp8200/sarek into spr…
asp8200 Jun 18, 2024
40 changes: 37 additions & 3 deletions assets/schema_input.json
@@ -47,17 +47,20 @@
"pattern": "^\\S+$",
"unique": ["patient", "sample"],
"anyOf": [
+ {
+ "dependentRequired": ["bam"]
+ },
{
"dependentRequired": ["fastq_1"]
},
{
- "dependentRequired": ["bam"]
+ "dependentRequired": ["spring_1"]
}
],
"meta": ["lane"]
},
"fastq_1": {
- "errorMessage": "FastQ file for reads 1 cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'",
+ "errorMessage": "Gzipped FastQ file for reads 1 cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'",
"anyOf": [
{
"type": "string",
@@ -72,7 +75,7 @@
"exists": true
},
"fastq_2": {
- "errorMessage": "FastQ file for reads 2 cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'",
+ "errorMessage": "Gzipped FastQ file for reads 2 cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'",
"dependentRequired": ["fastq_1"],
"anyOf": [
{
@@ -87,6 +90,37 @@
"format": "file-path",
"exists": true
},
+ "spring_1": {
+ "errorMessage": "Gzipped and spring-compressed FastQ file for reads 1 cannot contain spaces and must have extension '.fq.gz.spring' or '.fastq.gz.spring'",
+ "anyOf": [
+ {
+ "type": "string",
+ "pattern": "^\\S+\\.f(ast)?q\\.gz.spring$"
+ },
+ {
+ "type": "string",
+ "maxLength": 0
+ }
+ ],
+ "format": "file-path",
+ "exists": true
+ },
+ "spring_2": {
+ "errorMessage": "Gzipped and spring-compressed FastQ file for reads 2 cannot contain spaces and must have extension '.fq.gz.spring' or '.fastq.gz.spring'",
+ "dependentRequired": ["spring_1"],
+ "anyOf": [
+ {
+ "type": "string",
+ "pattern": "^\\S+\\.f(ast)?q\\.gz.spring$"
+ },
+ {
+ "type": "string",
+ "maxLength": 0
+ }
+ ],
+ "format": "file-path",
+ "exists": true
+ },
"table": {
"errorMessage": "Recalibration table cannot contain spaces and must have extension '.table'",
"anyOf": [
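The `spring_1` / `spring_2` filename pattern in the hunk above can be tried outside the pipeline. The following is a minimal Python sketch, not part of the PR: it assumes Python's `re` module treats this simple pattern the same way the schema validator's regex engine does, and all file names are made up for illustration. One thing it makes visible is that the `.` before `spring` is unescaped in the pattern as written, so it matches any single character rather than only a literal dot.

```python
import re

# Pattern copied from the diff after JSON unescaping (each "\\" becomes "\").
SPRING_PATTERN = re.compile(r"^\S+\.f(ast)?q\.gz.spring$")


def is_spring_name(name: str) -> bool:
    """Return True if `name` is accepted by the spring_1/spring_2 pattern."""
    return SPRING_PATTERN.search(name) is not None


# Hypothetical file names, for illustration only.
for name in [
    "sample_R1.fastq.gz.spring",   # accepted
    "sample_R1.fq.gz.spring",      # accepted (short extension)
    "sample_R1.fastq.gz",          # rejected: not a spring file
    "my sample.fastq.gz.spring",   # rejected: contains a space
    "sample_R1.fastq.gzXspring",   # accepted, because "." is unescaped
]:
    print(f"{name!r}: {is_spring_name(name)}")
```

If only a literal dot should be accepted, the usual fix would be `\\.gz\\.spring$` in the JSON; whether that matters in practice depends on how the files are named.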
4 changes: 4 additions & 0 deletions conf/modules/modules.config
@@ -59,6 +59,10 @@ process {
]
}

+ withName: 'NFCORE_SAREK:SAREK:SPRING_DECOMPRESS_.*' {
+ ext.prefix = { "${spring.simpleName}" }
+ }
+
withName: 'MOSDEPTH' {
ext.args = { !params.wes ? "-n --fast-mode --by 500" : ""}
ext.prefix = {
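The `ext.prefix` closure above relies on `simpleName`, which the Nextflow documentation describes as the file name without any extension (for example `file.tar.gz` gives `file`). As a rough illustration of what prefix a spring file would get, here is a small Python approximation; `simple_name` is a hypothetical helper written for this note, not a Nextflow API, and it only mimics the documented behaviour for ordinary file names.

```python
from pathlib import PurePosixPath


def simple_name(path: str) -> str:
    """Approximate Nextflow's `simpleName`: the file name up to the first dot."""
    return PurePosixPath(path).name.split(".", 1)[0]


# File names taken from the test samplesheet in this PR.
print(simple_name("spring/test_1.fastq.gz.spring"))      # prints test_1
print(simple_name("spring/test_R1_R2.fastq.gz.spring"))  # prints test_R1_R2
```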
15 changes: 15 additions & 0 deletions conf/test/alignment_from_fastq_and_spring.config
@@ -0,0 +1,15 @@
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Nextflow config file for running minimal tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Defines input files and everything required to run a fast and simple pipeline test.

Use as follows:
nextflow run nf-core/sarek -profile test,<extra_test_profile>,<docker/singularity> --outdir <OUTDIR>
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/

params {
input = "${projectDir}/tests/csv/3.0/fastq_and_spring.csv"
tools = null
}
5 changes: 5 additions & 0 deletions modules.json
@@ -449,6 +449,11 @@
"git_sha": "2f3db6f45147ebbb56b371536e31bdf622b5bfee",
"installed_by": ["modules", "vcf_annotate_snpeff"]
},
+ "spring/decompress": {
+ "branch": "master",
+ "git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
+ "installed_by": ["modules"]
+ },
"strelka/germline": {
"branch": "master",
"git_sha": "e8f2c77a6e4174ee0a48d073d4cc8ff06c44bb4c",
7 changes: 7 additions & 0 deletions modules/nf-core/spring/decompress/environment.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

40 changes: 40 additions & 0 deletions modules/nf-core/spring/decompress/main.nf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

42 changes: 42 additions & 0 deletions modules/nf-core/spring/decompress/meta.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 2 additions & 0 deletions nextflow.config
@@ -296,6 +296,8 @@ profiles {
}

// Extra test profiles for more complete CI
+ // TO-DO: Indentation!
+ alignment_from_fastq_and_spring { includeConfig 'conf/test/alignment_from_fastq_and_spring.config' }
alignment_to_fastq { includeConfig 'conf/test/alignment_to_fastq.config' }
annotation { includeConfig 'conf/test/annotation.config' }
markduplicates_bam { includeConfig 'conf/test/markduplicates_bam.config' }
66 changes: 22 additions & 44 deletions subworkflows/local/samplesheet_to_channel/main.nf
@@ -33,32 +33,43 @@ workflow SAMPLESHEET_TO_CHANNEL{

main:
ch_from_samplesheet.dump(tag:"ch_from_samplesheet")
- input_sample = ch_from_samplesheet.map{ meta, fastq_1, fastq_2, table, cram, crai, bam, bai, vcf, variantcaller ->
+ input_sample = ch_from_samplesheet.map{ meta, fastq_1, fastq_2, spring_1, spring_2, table, cram, crai, bam, bai, vcf, variantcaller ->
// generate patient_sample key to group lanes together
- [ meta.patient + meta.sample, [meta, fastq_1, fastq_2, table, cram, crai, bam, bai, vcf, variantcaller] ]
+ [ meta.patient + meta.sample, [meta, fastq_1, fastq_2, spring_1, spring_2, table, cram, crai, bam, bai, vcf, variantcaller] ]
}.tap{ ch_with_patient_sample } // save the channel
.groupTuple() //group by patient_sample to get all lanes
.map { patient_sample, ch_items ->
// get number of lanes per sample
[ patient_sample, ch_items.size() ]
}.combine(ch_with_patient_sample, by: 0) // for each entry add numLanes
.map { patient_sample, num_lanes, ch_items ->
- (meta, fastq_1, fastq_2, table, cram, crai, bam, bai, vcf, variantcaller) = ch_items
+ (meta, fastq_1, fastq_2, spring_1, spring_2, table, cram, crai, bam, bai, vcf, variantcaller) = ch_items
if (meta.lane && fastq_2) {
- meta = meta + [id: "${meta.sample}-${meta.lane}".toString()]
- def CN = seq_center ? "CN:${seq_center}\\t" : ''
-
- def flowcell = flowcellLaneFromFastq(fastq_1)
- // Don't use a random element for ID, it breaks resuming
- def read_group = "\"@RG\\tID:${flowcell}.${meta.sample}.${meta.lane}\\t${CN}PU:${meta.lane}\\tSM:${meta.patient}_${meta.sample}\\tLB:${meta.sample}\\tDS:${fasta}\\tPL:${seq_platform}\""
-
- meta = meta - meta.subMap('lane') + [num_lanes: num_lanes.toInteger(), read_group: read_group.toString(), data_type: 'fastq', size: 1]
+ meta = meta + [id: "${meta.sample}-${meta.lane}".toString(), data_type: "fastq_gz", num_lanes: num_lanes.toInteger(), size: 1]

if (step == 'mapping') return [ meta, [ fastq_1, fastq_2 ] ]
else {
error("Samplesheet contains fastq files but step is `$step`. Please check your samplesheet or adjust the step parameter.\nhttps://nf-co.re/sarek/usage#input-samplesheet-configurations")
}

+ // start from TWO spring-files - one with R1 and one with R2
+ } else if (meta.lane && spring_1 && spring_2) {
+ meta = meta + [id: "${meta.sample}-${meta.lane}".toString(), data_type: "two_fastq_gz_spring", num_lanes: num_lanes.toInteger(), size: 1]
+
+ if (step == 'mapping') return [ meta, [ spring_1, spring_2 ] ]
+ else {
+ error("Samplesheet contains spring files (in columns `spring_1` and `spring_2`) but step is `$step`. Please check your samplesheet or adjust the step parameter.\nhttps://nf-co.re/sarek/usage#input-samplesheet-configurations")
+ }
+
+ // start from ONE spring-file containing both R1 and R2
+ } else if (meta.lane && spring_1 && !spring_2) {
+ meta = meta + [id: "${meta.sample}-${meta.lane}".toString(), data_type: "one_fastq_gz_spring", num_lanes: num_lanes.toInteger(), size: 1]
+
+ if (step == 'mapping') return [ meta, [ spring_1 ] ]
+ else {
+ error("Samplesheet contains a spring file (in columns `spring_1`) but step is `$step`. Please check your samplesheet or adjust the step parameter.\nhttps://nf-co.re/sarek/usage#input-samplesheet-configurations")
+ }
+
// start from BAM
} else if (meta.lane && bam) {
if (step != 'mapping' && !bai) {
@@ -270,36 +281,3 @@ Joint germline variant calling also requires intervals in order to genotype the
emit:
input_sample
}
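The order of the FASTQ and spring branches in `SAMPLESHEET_TO_CHANNEL` matters: paired FASTQ is checked first, then two spring files, then a single spring file. The following Python sketch mirrors only those three branches as they appear in this diff. `classify_row` and the example rows are hypothetical, the BAM/CRAM branches are deliberately left out because they are not fully shown here, and empty cells are assumed to arrive as falsy values.

```python
def classify_row(row, step="mapping"):
    """Return (data_type, files) for FASTQ/spring rows, or None for other inputs."""
    lane = row.get("lane")
    fastq_1, fastq_2 = row.get("fastq_1"), row.get("fastq_2")
    spring_1, spring_2 = row.get("spring_1"), row.get("spring_2")

    if lane and fastq_2:
        data_type, files = "fastq_gz", [fastq_1, fastq_2]
    elif lane and spring_1 and spring_2:
        # Two spring files: one holding R1, one holding R2.
        data_type, files = "two_fastq_gz_spring", [spring_1, spring_2]
    elif lane and spring_1:
        # One spring file holding both R1 and R2.
        data_type, files = "one_fastq_gz_spring", [spring_1]
    else:
        return None  # BAM, CRAM, etc. are handled by branches not sketched here

    # All three branches in the diff are only valid when starting from mapping.
    if step != "mapping":
        raise ValueError(f"samplesheet contains {data_type} input but step is {step!r}")
    return data_type, files


# Hypothetical rows, loosely modelled on the test samplesheet of this PR.
rows = [
    {"lane": "test_L1", "fastq_1": "a_1.fastq.gz", "fastq_2": "a_2.fastq.gz"},
    {"lane": "test2_L1", "spring_1": "b_1.fastq.gz.spring", "spring_2": "b_2.fastq.gz.spring"},
    {"lane": "test3_L1", "spring_1": "c_R1_R2.fastq.gz.spring", "spring_2": ""},
]
for row in rows:
    print(classify_row(row))
```

Calling `classify_row(rows[1], step="variant_calling")` raises, which corresponds to the `error(...)` calls in the Groovy code.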

- /*
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- FUNCTIONS
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- */
- // Parse first line of a FASTQ file, return the flowcell id and lane number.
- def flowcellLaneFromFastq(path) {
- // expected format:
- // xx:yy:FLOWCELLID:LANE:... (seven fields)
- // or
- // FLOWCELLID:LANE:xx:... (five fields)
- def line
- path.withInputStream {
- InputStream gzipStream = new java.util.zip.GZIPInputStream(it)
- Reader decoder = new InputStreamReader(gzipStream, 'ASCII')
- BufferedReader buffered = new BufferedReader(decoder)
- line = buffered.readLine()
- }
- assert line.startsWith('@')
- line = line.substring(1)
- def fields = line.split(':')
- String fcid
-
- if (fields.size() >= 7) {
- // CASAVA 1.8+ format, from https://support.illumina.com/help/BaseSpace_OLH_009008/Content/Source/Informatics/BS/FileFormat_FASTQ-files_swBS.htm
- // "@<instrument>:<run number>:<flowcell ID>:<lane>:<tile>:<x-pos>:<y-pos>:<UMI> <read>:<is filtered>:<control number>:<index>"
- fcid = fields[2]
- } else if (fields.size() == 5) {
- fcid = fields[0]
- }
- return fcid
- }
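The helper deleted from this file in the hunk above reads the first header line of a gzipped FASTQ and extracts the flowcell ID from it. For readers who want to see the parsing rule in isolation, here is a rough Python port. It is a sketch for illustration only: the function names and the example headers are made up, and Python's `str.split` keeps trailing empty fields where Java's `String.split` drops them, so edge cases may differ from the Groovy original.

```python
import gzip


def flowcell_from_header(line):
    """Extract the flowcell ID from a FASTQ header line, or None if unrecognised."""
    if not line.startswith("@"):
        raise ValueError("FASTQ header must start with '@'")
    fields = line[1:].split(":")
    if len(fields) >= 7:
        # CASAVA 1.8+ style: instrument:run:FLOWCELL:lane:tile:x:y ...
        return fields[2]
    if len(fields) == 5:
        # Older style: FLOWCELL:lane:tile:x:y
        return fields[0]
    return None


def flowcell_from_fastq(path):
    """Read the first line of a gzipped FASTQ file and parse its flowcell ID."""
    with gzip.open(path, "rt", encoding="ascii") as handle:
        return flowcell_from_header(handle.readline().rstrip("\n"))


# Made-up header lines in the two supported layouts.
print(flowcell_from_header("@A00123:45:HXXXXXXXX:1:1101:1000:2000 1:N:0:ATCACG"))
print(flowcell_from_header("@FC123:1:1101:1000:2000/1"))
```

Because the file has to be opened and gunzipped to read the header, this approach only works directly on `.fastq.gz` input, not on spring-compressed files.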
4 changes: 4 additions & 0 deletions tests/csv/3.0/fastq_and_spring.csv
@@ -0,0 +1,4 @@
patient,sex,status,sample,lane,fastq_1,fastq_2,spring_1,spring_2
test,XX,0,test,test_L1,https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/illumina/fastq/test_1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/illumina/fastq/test_2.fastq.gz,,
test2,XX,0,test2,test2_L1,,,https://raw.githubusercontent.com/nf-core/test-datasets/sarek3/data/genomics/homo_sapiens/illumina/spring/test_1.fastq.gz.spring,https://raw.githubusercontent.com/nf-core/test-datasets/sarek3/data/genomics/homo_sapiens/illumina/spring/test_2.fastq.gz.spring
test3,XX,0,test3,test3_L1,,,https://raw.githubusercontent.com/nf-core/test-datasets/sarek3/data/genomics/homo_sapiens/illumina/spring/test_R1_R2.fastq.gz.spring,
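The samplesheet above exercises all three input layouts: paired FASTQ, two spring files, and one spring file. As a quick way to inspect such a sheet, the Python sketch below counts lanes per `patient + sample` key (the same key `SAMPLESHEET_TO_CHANNEL` groups on) and lists which read columns are filled in each row. The URLs are shortened to bare file names for readability, and `filled_inputs` is a hypothetical helper, not part of the pipeline.

```python
import csv
import io
from collections import Counter

# Same layout as tests/csv/3.0/fastq_and_spring.csv, with URLs shortened.
SAMPLESHEET = """\
patient,sex,status,sample,lane,fastq_1,fastq_2,spring_1,spring_2
test,XX,0,test,test_L1,test_1.fastq.gz,test_2.fastq.gz,,
test2,XX,0,test2,test2_L1,,,test_1.fastq.gz.spring,test_2.fastq.gz.spring
test3,XX,0,test3,test3_L1,,,test_R1_R2.fastq.gz.spring,
"""

rows = list(csv.DictReader(io.StringIO(SAMPLESHEET)))

# One row per lane, so counting rows per key gives the number of lanes.
lanes_per_sample = Counter(row["patient"] + row["sample"] for row in rows)


def filled_inputs(row):
    """Return the names of the read columns that are non-empty in this row."""
    return [c for c in ("fastq_1", "fastq_2", "spring_1", "spring_2") if row[c]]


for row in rows:
    key = row["patient"] + row["sample"]
    print(key, lanes_per_sample[key], filled_inputs(row))
```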