Update to DSL2 Best Practices #379

Merged on Jun 11, 2021 (53 commits)

maxulysse
Member

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
    • If you've added a new tool - add to the software_versions process and a regex to scrape_software_versions.py
    • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
    • If necessary, also make a PR on the nf-core/sarek branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint .).
  • Ensure the test suite passes (nextflow run . -profile test,docker).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@github-actions

github-actions bot commented May 20, 2021

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 68e1cc4

+| ✅ 134 tests passed       |+
#| ❔  11 tests were ignored |#
!| ❗  82 tests had warnings |!

❗ Test warnings:

  • files_exist - File not found: environment.yml
  • files_exist - File not found: Dockerfile
  • nextflow_config - Config variable not found: process.container
  • params_used - Config variable not found in main.nf: params.input
  • params_used - Config variable not found in main.nf: params.step
  • params_used - Config variable not found in main.nf: params.genome
  • params_used - Config variable not found in main.nf: params.genomes_base
  • params_used - Config variable not found in main.nf: params.save_reference
  • params_used - Config variable not found in main.nf: params.help
  • params_used - Config variable not found in main.nf: params.no_intervals
  • params_used - Config variable not found in main.nf: params.nucleotides_per_second
  • params_used - Config variable not found in main.nf: params.sentieon
  • params_used - Config variable not found in main.nf: params.skip_qc
  • params_used - Config variable not found in main.nf: params.target_bed
  • params_used - Config variable not found in main.nf: params.tools
  • params_used - Config variable not found in main.nf: params.trim_fastq
  • params_used - Config variable not found in main.nf: params.clip_r1
  • params_used - Config variable not found in main.nf: params.clip_r2
  • params_used - Config variable not found in main.nf: params.three_prime_clip_r1
  • params_used - Config variable not found in main.nf: params.three_prime_clip_r2
  • params_used - Config variable not found in main.nf: params.trim_nextseq
  • params_used - Config variable not found in main.nf: params.save_trimmed
  • params_used - Config variable not found in main.nf: params.split_fastq
  • params_used - Config variable not found in main.nf: params.aligner
  • params_used - Config variable not found in main.nf: params.markdup_java_options
  • params_used - Config variable not found in main.nf: params.use_gatk_spark
  • params_used - Config variable not found in main.nf: params.save_bam_mapped
  • params_used - Config variable not found in main.nf: params.skip_markduplicates
  • params_used - Config variable not found in main.nf: params.ascat_ploidy
  • params_used - Config variable not found in main.nf: params.ascat_purity
  • params_used - Config variable not found in main.nf: params.cf_coeff
  • params_used - Config variable not found in main.nf: params.cf_contamination
  • params_used - Config variable not found in main.nf: params.cf_contamination_adjustment
  • params_used - Config variable not found in main.nf: params.cf_ploidy
  • params_used - Config variable not found in main.nf: params.cf_window
  • params_used - Config variable not found in main.nf: params.generate_gvcf
  • params_used - Config variable not found in main.nf: params.no_strelka_bp
  • params_used - Config variable not found in main.nf: params.pon
  • params_used - Config variable not found in main.nf: params.pon_index
  • params_used - Config variable not found in main.nf: params.ignore_soft_clipped_bases
  • params_used - Config variable not found in main.nf: params.umi
  • params_used - Config variable not found in main.nf: params.read_structure1
  • params_used - Config variable not found in main.nf: params.read_structure2
  • params_used - Config variable not found in main.nf: params.annotate_tools
  • params_used - Config variable not found in main.nf: params.annotation_cache
  • params_used - Config variable not found in main.nf: params.cadd_cache
  • params_used - Config variable not found in main.nf: params.cadd_indels
  • params_used - Config variable not found in main.nf: params.cadd_indels_tbi
  • params_used - Config variable not found in main.nf: params.cadd_wg_snvs
  • params_used - Config variable not found in main.nf: params.cadd_wg_snvs_tbi
  • params_used - Config variable not found in main.nf: params.genesplicer
  • params_used - Config variable not found in main.nf: params.snpeff_cache
  • params_used - Config variable not found in main.nf: params.config_profile_contact
  • params_used - Config variable not found in main.nf: params.config_profile_description
  • params_used - Config variable not found in main.nf: params.config_profile_url
  • params_used - Config variable not found in main.nf: params.outdir
  • params_used - Config variable not found in main.nf: params.publish_dir_mode
  • params_used - Config variable not found in main.nf: params.sequencing_center
  • params_used - Config variable not found in main.nf: params.multiqc_config
  • params_used - Config variable not found in main.nf: params.monochrome_logs
  • params_used - Config variable not found in main.nf: params.email
  • params_used - Config variable not found in main.nf: params.email_on_fail
  • params_used - Config variable not found in main.nf: params.plaintext_email
  • params_used - Config variable not found in main.nf: params.max_multiqc_email_size
  • params_used - Config variable not found in main.nf: params.hostnames
  • params_used - Config variable not found in main.nf: params.validate_params
  • params_used - Config variable not found in main.nf: params.tracedir
  • params_used - Config variable not found in main.nf: params.enable_conda
  • params_used - Config variable not found in main.nf: params.pull_docker_container
  • params_used - Config variable not found in main.nf: params.cpus
  • params_used - Config variable not found in main.nf: params.max_cpus
  • params_used - Config variable not found in main.nf: params.max_memory
  • params_used - Config variable not found in main.nf: params.max_time
  • actions_awsfulltest - .github/workflows/awsfulltest.yml should test full datasets, not -profile test
  • readme - README did not have a Nextflow minimum version badge.
  • pipeline_todos - TODO string in main.nf: It MUST be possible to pass additional parameters to the tool as a command-line string via the "$ioptions.args" variable
  • pipeline_todos - TODO string in main.nf: If the tool supports multi-threading then you MUST provide the appropriate parameter
  • pipeline_todos - TODO string in test_full.config: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. SRA)
  • pipeline_todos - TODO string in test_full.config: Give any required params for the test so that command line flags are not needed
  • pipeline_todos - TODO string in awsfulltest.yml: You can customise AWS full pipeline tests as required
  • pipeline_todos - TODO string in awstest.yml: You can customise CI pipeline run tests as required
  • schema_description - No description provided in schema for parameter: cpus

❔ Tests ignored:

  • files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.md
  • files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/feature_request.md
  • files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md
  • files_unchanged - File ignored due to lint config: assets/nf-core-sarek_logo.png
  • files_unchanged - File ignored due to lint config: docs/images/nf-core-sarek_logo.png
  • files_unchanged - File ignored due to lint config: lib/NfcoreSchema.groovy
  • files_unchanged - File ignored due to lint config: .gitignore or foo
  • files_unchanged - File does not exist: .github/workflows/push_dockerhub_dev.yml
  • files_unchanged - File does not exist: .github/workflows/push_dockerhub_release.yml
  • conda_env_yaml - No environment.yml file found - skipping conda_env_yaml test
  • conda_dockerfile - No environment.yml / Dockerfile file found - skipping conda_dockerfile test

✅ Tests passed:

Run details

  • nf-core/tools version 1.14
  • Run at 2021-06-08 08:37:54

@maxulysse
Member Author

Following the @drpatelh DSL2 Best Practices

@maxulysse
Member Author

OK, so now back on track, with some tiny added improvements:
With nf-core/test-datasets#262, I introduced samplesheet.csv files in sarek
With nf-core/test-datasets#263, I want to try out how minimal such a file can be within sarek, and how flexible we can be with our csv files.

I really love the samplesheet check process introduced by @drpatelh in the DSL2 pipeline, but I feel that we're still wasting a process for that, which is what I'm exploring here in Sarek:

sarek/workflows/sarek.nf

Lines 376 to 412 in 4fbbf80

def extract_csv(csv_file) {
    Channel.from(csv_file)
        .splitCsv(header: true)
        .map{ row ->
            def meta = [:]
            meta.patient = row.patient
            meta.sample = row.sample.toString()
            // If no gender specified, gender is not considered (only used for somatic CNV)
            if (row.gender == null) {
                meta.gender = "NA"
            } else meta.gender = row.gender.toString()
            // If no status specified, sample is considered normal
            if (row.status == null) {
                meta.status = 0
            } else meta.status = row.status.toInteger()
            if (row.lane == null) {
                // variant_calling
                meta.id = meta.sample
                def bam = file(row.bam, checkIfExists: true)
                def bai = file(row.bai, checkIfExists: true)
                return [meta, bam, bai]
            } else {
                // mapping with fastq
                meta.id = "${row.sample}-${row.lane}".toString()
                def read1 = file(row.fastq1, checkIfExists: true)
                def read2 = file(row.fastq2, checkIfExists: true)
                def CN = params.sequencing_center ? "CN:${params.sequencing_center}\\t" : ''
                def read_group = "\"@RG\\tID:${row.lane}\\t${CN}PU:${row.lane}\\tSM:${row.sample}\\tLB:${row.sample}\\tPL:ILLUMINA\""
                meta.read_group = read_group
                return [meta, [read1, read2]]
            }
        }
}
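
To make the read group string from the fastq branch easier to eyeball, here is a rough Python transliteration of just that line. It is only a sketch for illustration, not pipeline code: the function name build_read_group and the sample/lane/center values are made up, and the separators are kept as a literal backslash followed by "t", the same as in the Groovy string above.

```python
def build_read_group(sample, lane, sequencing_center=None):
    """Python transliteration of the @RG string assembled in extract_csv.

    Hypothetical helper for illustration only; the pipeline builds this
    string in Groovy inside the .map closure.
    """
    # Optional CN field, followed by a literal backslash-t separator
    cn = f"CN:{sequencing_center}\\t" if sequencing_center else ""
    # The surrounding double quotes are part of the value, as in the Groovy code
    return (
        f'"@RG\\tID:{lane}\\t{cn}PU:{lane}'
        f'\\tSM:{sample}\\tLB:{sample}\\tPL:ILLUMINA"'
    )


if __name__ == "__main__":
    # Made-up values, without and with a sequencing center
    print(build_read_group("sample1", "lane_1"))
    print(build_read_group("sample1", "lane_1", sequencing_center="CENTER"))
```

The CN field only shows up when a sequencing center is given, which mirrors the params.sequencing_center check in the snippet.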

The plan is to assess as much as we can from the header and infer the rest.
So it won't change much for our regular users (apart from switching to DSL2, having tons of images instead of one, and switching from a TSV to a CSV file with a header).
But it could simplify access for new users, and we don't need all the information all the time.
We only need gender when doing CNV calling, and we only need status for tumor/normal pairs.
So if it's not specified, why not assign 0 as the default status and "NA" as the default gender?
Can we remove anything else?
Do we need any more information?
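
As a quick way to play with the idea outside Nextflow, the defaulting rules could be sketched in Python roughly like this. Everything here is an assumption for illustration: the row_to_meta name, the sample values, and the file names in the sheet are made up, file existence checks are left out, and only absent columns are handled (matching the null checks in extract_csv), not empty cells.

```python
import csv
import io


def row_to_meta(row):
    """Apply the proposed defaults to one samplesheet row (a dict).

    Hypothetical helper; returns the meta map and which input type
    the row describes ("bam" or "fastq").
    """
    meta = {
        "patient": row.get("patient"),
        "sample": str(row.get("sample")),
        # Gender is only needed for somatic CNV, so fall back to "NA"
        "gender": str(row["gender"]) if row.get("gender") is not None else "NA",
        # Status is only needed for tumor/normal pairs; 0 means normal
        "status": int(row["status"]) if row.get("status") is not None else 0,
    }
    if row.get("lane") is None:
        # No lane column: the row points at BAM input for variant calling
        meta["id"] = meta["sample"]
        return meta, "bam"
    # Lane column present: the row points at FASTQ input for mapping
    meta["id"] = f"{row['sample']}-{row['lane']}"
    return meta, "fastq"


if __name__ == "__main__":
    # Made-up minimal sheet: no gender and no status columns at all
    sheet = (
        "patient,sample,lane,fastq1,fastq2\n"
        "patient1,sample1,lane_1,sample1_R1.fastq.gz,sample1_R2.fastq.gz\n"
    )
    for row in csv.DictReader(io.StringIO(sheet)):
        print(row_to_meta(row))
```

With a sheet like the one in the demo, the row ends up as a normal sample with gender "NA", and the presence of the lane column is what routes it to the mapping branch.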

@nf-core/core @nf-core/sarek what do you think?

@maxulysse
Member Author

maxulysse commented Jun 7, 2021

OK, now that I've fixed CI once again, I'll update all current modules.
And then extend CI.

@maxulysse maxulysse marked this pull request as ready for review June 8, 2021 08:12
@maxulysse maxulysse mentioned this pull request Jun 8, 2021
11 tasks
Contributor

@FriederikeHanssen FriederikeHanssen left a comment

looks good. This is the PR depending on something weird with the test data, right?

conf/test.config (resolved)
main.nf (resolved)
@maxulysse
Member Author

> looks good. This is the PR depending on something weird with the test data, right?

Just with CSV instead of TSV actually.
(but yeah now we can go weird with the CSV too).

@maxulysse maxulysse merged commit 39fd254 into nf-core:dsl2 Jun 11, 2021
@maxulysse maxulysse deleted the dsl2_modules_update branch June 11, 2021 13:36