Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use a meta map #256

Merged
merged 10 commits into from
Jul 24, 2020
Merged

use a meta map #256

merged 10 commits into from
Jul 24, 2020

Conversation

maxulysse
Copy link
Member

Using a meta map à la @drpatelh

nf-core/sarek pull request

Many thanks for contributing to nf-core/sarek!

Please fill in the appropriate checklist below (delete whatever is not relevant).
These are the most common things requested on pull requests (PRs).

PR checklist

  • This comment contains a description of changes (with reason)
  • If you've fixed a bug or added code that should be tested, add tests!
  • If necessary, also make a PR on the nf-core/sarek branch on the nf-core/test-datasets repo
  • Ensure the test suite passes (nextflow run . -profile test,docker).
  • Make sure your code lints (nf-core lint .).
  • Documentation in docs is updated
  • CHANGELOG.md is updated
  • README.md is updated

Learn more about contributing: CONTRIBUTING.md

@drpatelh
Copy link
Member

Thanks @maxulysse! Will have a proper look at this tomorrow at some point 👍

Copy link
Contributor

@FriederikeHanssen FriederikeHanssen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love how readable everything is now 😍

fastqc -t 2 -q ${idSample}_${idRun}_R1.fastq.gz ${idSample}_${idRun}_R2.fastq.gz
[ ! -f ${prefix}_1.fastq.gz ] && ln -s ${reads[0]} ${prefix}_1.fastq.gz
[ ! -f ${prefix}_2.fastq.gz ] && ln -s ${reads[1]} ${prefix}_2.fastq.gz
fastqc --threads ${task.cpus} ${prefix}_1.fastq.gz ${prefix}_2.fastq.gz
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity: What does [! -f ...] do?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No idea, just copied it over from chipseq.
I figured it would be easier to update from the nf-core modules from there

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe @drpatelh then knows 😃

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its bash notation for checking if a file doesnt exist 🙂

${extra} \
-t ${task.cpus} \
${fasta} ${reads} | \
samtools sort --threads ${task.cpus} -m 2G - > ${meta.id}.bam
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the memory hardcoded here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did not paid attention enough there, I guess I just copied it over from the current sarek dev, I can set it back the way it was

main.nf Show resolved Hide resolved
@FriederikeHanssen FriederikeHanssen merged commit 538a296 into nf-core:dsl2 Jul 24, 2020
Copy link
Member

@drpatelh drpatelh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly some reorganisation of the module structure and consistency with using module options and syntax for reusability and flexibility.

main.nf Show resolved Hide resolved
main.nf Show resolved Hide resolved
main.nf Show resolved Hide resolved
Comment on lines +360 to +362
def bwamem2_mem_options = [:]

bwamem2_mem_options.args_bwamem2 = "-K 100000000 -M"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These options can be pre-defined in a map in conf/modules.config like here, included via nextflow.config like here and then you can even append parameters like here in the main script if required.

Hopefully, this means the software parameters are easier to pass around the script and are more customisable by the developed/user. Also, means you dont have to initialise maps all over the place for the module settings because this is already explicitly done in modules.config.

We should stick the same notation to access variable in the module files though i.e. the 5 I have needed so far are listed here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also means that every module will need to be passed an additional opts value which is a map containing the generic options for the module e.g. here because this will allow more flexibility to control things like output directory etc. like here


publishDir "${params.outdir}/bwamem2_mem", mode: 'copy'
publishDir "${params.outdir}/bwamem2/${meta.sample}",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should remove any customisation from this code in terms of output directories as this should be customisable from the opts map that comes into the module. The logic still needs a little work but for now this is the generic code I am using here


script:
CN = params.sequencing_center ? "CN:${params.sequencing_center}\\t" : ""
readGroup = "@RG\\tID:${run}\\t${CN}PU:${run}\\tSM:${sample}\\tLB:${sample}\\tPL:${params.sequencer}"
readGroup = "@RG\\tID:${meta.run}\\t${CN}PU:${meta.run}\\tSM:${meta.sample}\\tLB:${meta.sample}\\tPL:ILLUMINA"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should come in via the meta parameter because not everyone will want to create the read group in this way because they wont have all of the same values in map e.g. see here


output:
tuple val(patient), val(sample), val(run), path("*.bam"), path("*.bai")
tuple val(meta), path("*.bam"), path("*.bai")

script:
CN = params.sequencing_center ? "CN:${params.sequencing_center}\\t" : ""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
CN = params.sequencing_center ? "CN:${params.sequencing_center}\\t" : ""

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This parameter should come in via arguments created from the pipeline and not hardcoded here. Not everyone will use this!

samtools sort --threads ${task.cpus} -m 2G - > ${sample}_${run}.bam
samtools index ${sample}_${run}.bam
bwa-mem2 mem \
${options.args_bwamem2} \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
${options.args_bwamem2} \
$opts.args \

${fasta} ${reads} | \
samtools sort --threads ${task.cpus} -m 2G - > ${meta.id}.bam

samtools index ${meta.id}.bam
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will need to allow for suffixes too in order to allow for naming the bam files differently if required. This would apply to other modules too e.g. here

fastqc -t 2 -q ${idSample}_${idRun}_R1.fastq.gz ${idSample}_${idRun}_R2.fastq.gz
[ ! -f ${prefix}_1.fastq.gz ] && ln -s ${reads[0]} ${prefix}_1.fastq.gz
[ ! -f ${prefix}_2.fastq.gz ] && ln -s ${reads[1]} ${prefix}_2.fastq.gz
fastqc --threads ${task.cpus} ${prefix}_1.fastq.gz ${prefix}_2.fastq.gz
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its bash notation for checking if a file doesnt exist 🙂

@maxulysse
Copy link
Member Author

for BWA_MEM2_MEM could we use BWA-MEM2_MEM instead actually?
I think it makes more sense

@maxulysse maxulysse deleted the dsl2_meta branch July 24, 2020 08:50
@drpatelh
Copy link
Member

I am not too keen on using a mixture of _ and -...I think we should take this opportunity to unify the syntax as much as possible otherwise its the Wild West out there in terms of naming conventions! Also means it will easier to write downstream tools based on this structure if the syntax is the same!

@maxulysse
Copy link
Member Author

name of the tool is now BWA-MEM2, so I'm all for BWAMEM2 or BWA-MEM2, but not keen on BWA_MEM2, I think it's confusing with the suffix we add for _INDEX or _MEM afterwards

@drpatelh
Copy link
Member

Ah, I see. Good point 👍 If we want to split on the _ to be able to extract the sub-tool used too then it makes sense to maybe rethink this. Although, what do we do for example with trim_galore that is invoked as trim_galore - maybe that should be one word instead? 🤔

@maxulysse
Copy link
Member Author

that's a good point

@maxulysse maxulysse added the DSL2 label Jan 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants