use a meta map #256
Thanks @maxulysse! Will have a proper look at this tomorrow at some point 👍
Love how readable everything is now 😍
fastqc -t 2 -q ${idSample}_${idRun}_R1.fastq.gz ${idSample}_${idRun}_R2.fastq.gz
[ ! -f ${prefix}_1.fastq.gz ] && ln -s ${reads[0]} ${prefix}_1.fastq.gz
[ ! -f ${prefix}_2.fastq.gz ] && ln -s ${reads[1]} ${prefix}_2.fastq.gz
fastqc --threads ${task.cpus} ${prefix}_1.fastq.gz ${prefix}_2.fastq.gz
Out of curiosity: what does [ ! -f ... ] do?
No idea, just copied it over from chipseq.
I figured it would be easier to update from the nf-core modules from there
Maybe @drpatelh knows then 😃
It's Bash notation for checking that a file doesn't exist 🙂
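To make the idiom concrete, here is a minimal sketch of how `[ ! -f TARGET ] && ln -s SOURCE TARGET` behaves when it runs twice. The temporary directory and file names are hypothetical and only stand in for the staged reads in the module.

```shell
workdir="$(mktemp -d)"
touch "${workdir}/input_R1.fastq.gz"   # stand-in for a staged read file
prefix="${workdir}/sample1"
target="${prefix}_1.fastq.gz"

# Target is missing, so the test succeeds and the symlink is created.
[ ! -f "${target}" ] && ln -s "${workdir}/input_R1.fastq.gz" "${target}"

# Target now exists, so the test fails and ln is skipped.
# The trailing `|| true` only keeps this demo's exit status at zero.
[ ! -f "${target}" ] && ln -s "${workdir}/input_R1.fastq.gz" "${target}" || true

ls -l "${target}"
```

The guard makes the command safe to rerun: without it, a second `ln -s` onto an existing name would fail with "File exists".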
${extra} \
-t ${task.cpus} \
${fasta} ${reads} | \
samtools sort --threads ${task.cpus} -m 2G - > ${meta.id}.bam
Why is the memory hardcoded here?
I did not pay enough attention there. I guess I just copied it over from the current sarek dev; I can set it back the way it was.
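For context on why the hardcoded value matters: `samtools sort -m` is a per-thread limit, so `-m 2G` with many threads can exceed what the task was given. One possible way to avoid the constant is to derive it from the task resources. This is only a sketch, not what sarek does; `total_mem_mb` and `threads` are hypothetical stand-ins for Nextflow's `task.memory` and `task.cpus`.

```shell
# Hypothetical stand-ins for the task resources (memory in MB, CPU count).
total_mem_mb=16384
threads=4

# -m is applied per sorting thread, so split the total across the threads.
mem_per_thread_mb=$(( total_mem_mb / threads ))

# Build the command as a string here instead of running samtools.
sort_cmd="samtools sort --threads ${threads} -m ${mem_per_thread_mb}M -"
echo "${sort_cmd}"
```

In a real module the aligner shares the same task, so some headroom would probably need to be left for it rather than giving the whole allocation to the sort.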
Mainly some reorganisation of the module structure and consistency with using module options and syntax for reusability and flexibility.
def bwamem2_mem_options = [:]

bwamem2_mem_options.args_bwamem2 = "-K 100000000 -M"
These options can be pre-defined in a map in conf/modules.config like here, included via nextflow.config like here, and then you can even append parameters like here in the main script if required.
Hopefully, this means the software parameters are easier to pass around the script and are more customisable by the developer/user. It also means you don't have to initialise maps all over the place for the module settings, because this is already done explicitly in modules.config.
We should stick to the same notation to access variables in the module files though, i.e. the 5 I have needed so far are listed here
publishDir "${params.outdir}/bwamem2_mem", mode: 'copy'
publishDir "${params.outdir}/bwamem2/${meta.sample}",
You should remove any customisation of output directories from this code, as this should be customisable from the opts map that comes into the module. The logic still needs a little work, but for now this is the generic code I am using here
script:
CN = params.sequencing_center ? "CN:${params.sequencing_center}\\t" : ""
readGroup = "@RG\\tID:${run}\\t${CN}PU:${run}\\tSM:${sample}\\tLB:${sample}\\tPL:${params.sequencer}"
readGroup = "@RG\\tID:${meta.run}\\t${CN}PU:${meta.run}\\tSM:${meta.sample}\\tLB:${meta.sample}\\tPL:ILLUMINA"
This should come in via the meta parameter, because not everyone will want to create the read group in this way: they won't have all of the same values in the map, e.g. see here
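As an illustration of the point, the read group can be assembled once, outside the module, and handed in as a single value, with optional fields such as CN left out when they are not available. The sketch below is one possible shape, not the module's actual code; `build_read_group` and its argument order are hypothetical.

```shell
# Build an @RG string; the CN field is only included when a centre is given.
# Arguments: run id, sample name, platform, optional sequencing centre.
build_read_group() {
    local id="$1" sample="$2" platform="$3" center="${4:-}"
    local cn=""
    [ -n "${center}" ] && cn="CN:${center}\\t"
    # The \t sequences stay literal (backslash + t), which is what the aligner expects.
    printf '%s' "@RG\\tID:${id}\\t${cn}PU:${id}\\tSM:${sample}\\tLB:${sample}\\tPL:${platform}"
}

build_read_group run1 sampleA ILLUMINA
echo
build_read_group run1 sampleA ILLUMINA centreX
echo
```

The module would then only need to pass the finished string to the aligner, without knowing which fields the pipeline chose to fill in.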
output:
tuple val(patient), val(sample), val(run), path("*.bam"), path("*.bai")
tuple val(meta), path("*.bam"), path("*.bai")

script:
CN = params.sequencing_center ? "CN:${params.sequencing_center}\\t" : ""
This parameter should come in via arguments created from the pipeline and not be hardcoded here. Not everyone will use this!
samtools sort --threads ${task.cpus} -m 2G - > ${sample}_${run}.bam
samtools index ${sample}_${run}.bam
bwa-mem2 mem \
${options.args_bwamem2} \
Suggested change:
- ${options.args_bwamem2} \
+ $opts.args \
${fasta} ${reads} | \
samtools sort --threads ${task.cpus} -m 2G - > ${meta.id}.bam

samtools index ${meta.id}.bam
We will need to allow for suffixes too, so that the BAM files can be named differently if required. This would apply to other modules too, e.g. here
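A minimal sketch of what an optional suffix could look like at the shell level, assuming the suffix arrives as a plain string that may be empty. `bam_name` is a hypothetical helper used only for illustration.

```shell
# Compose the output name from the sample id and an optional suffix.
bam_name() {
    local id="$1" suffix="${2:-}"
    printf '%s' "${id}${suffix}.bam"
}

default_name="$(bam_name sample1)"
sorted_name="$(bam_name sample1 .sorted)"
echo "${default_name} ${sorted_name}"
```

With an empty suffix the name falls back to the current `${meta.id}.bam` form, so existing behaviour would be unchanged by default.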
for
I am not too keen on using a mixture of
name of the tool is now
Ah, I see. Good point 👍 If we want to split on the
that's a good point
Using a meta map à la @drpatelh
nf-core/sarek pull request
Many thanks for contributing to nf-core/sarek!
Please fill in the appropriate checklist below (delete whatever is not relevant).
These are the most common things requested on pull requests (PRs).
PR checklist
nextflow run . -profile test,docker
nf-core lint .
docs is updated
CHANGELOG.md is updated
README.md is updated
Learn more about contributing: CONTRIBUTING.md