add `sample_name` as possible column in samplesheet #31

sgsutcliffe · 2024-10-25T14:59:22Z

Modified the template for the input samplesheet.csv file to include a sample_name column in addition to sample, in line with the IRIDA-Next update, as seen in the speciesabundance and staramrnf pipelines, for example. This means that output files and sample names will use sample_name when a sample_name is provided. If gasclustering is being run locally, sample_name can be left blank.

Made a few changes:
- Special characters in sample_name will be replaced with "_"
- If no sample_name is supplied, the sample column will be used
- To avoid repeated values for sample_name, all sample_name values will be suffixed with sample
- Tests to check that the variety of different sample_names work with the
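
The naming rules listed above can be sketched roughly as follows. This is a Python illustration, not the pipeline's Nextflow code. The function name `resolve_names`, the `_` separator used when suffixing, and the choice to suffix only names that actually repeat are assumptions made for the example. The character class follows the README wording in this PR (non-alphanumeric characters other than `_`, `-` and `.`), read here as ASCII-only.

```python
import re
from collections import Counter

# Assumption: "special" means anything outside ASCII letters, digits, "_", "-" and "."
_SPECIAL = re.compile(r"[^A-Za-z0-9_.\-]")


def resolve_names(rows):
    """Return one output name per samplesheet row (hypothetical helper).

    Each row is a dict with a required 'sample' key and an optional
    'sample_name' key.
    """
    cleaned = []
    for row in rows:
        name = (row.get("sample_name") or "").strip()
        # Fall back to the sample column when sample_name is blank or missing.
        name = _SPECIAL.sub("_", name) if name else row["sample"]
        cleaned.append(name)

    counts = Counter(cleaned)
    # Assumption: only repeated names get the sample value appended.
    return [
        f"{name}_{row['sample']}" if counts[name] > 1 else name
        for name, row in zip(cleaned, rows)
    ]


rows = [
    {"sample": "S1", "sample_name": "isolate A#1"},
    {"sample": "S2", "sample_name": ""},
    {"sample": "S3", "sample_name": "dup"},
    {"sample": "S4", "sample_name": "dup"},
]
print(resolve_names(rows))  # ['isolate_A_1', 'S2', 'dup_S3', 'dup_S4']
```

Because `sample` is required to be unique, appending it is enough to make repeated `sample_name` values distinct.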

PR checklist

github-actions · 2024-10-25T15:10:12Z

`nf-core pipelines lint` overall result: Passed ✅ ⚠️

Posted for pipeline commit d877ee2

✅ 147 tests passed
❔ 28 tests were ignored
❗ 4 tests had warnings

❗ Test warnings:

files_exist - File not found: conf/igenomes_ignored.config
nextflow_config - nf-validation has been detected in the pipeline. Please migrate to nf-schema: https://nextflow-io.github.io/nf-schema/latest/migration_guide/
nextflow_config - Config manifest.version should end in dev: 0.3.0
schema_lint - Schema $id should be https://raw.githubusercontent.com/phac-nml/gasclustering/master/nextflow_schema.json. Found https://raw.githubusercontent.com/phac-nml/gasclustering/main/nextflow_schema.json

❔ Tests ignored:

files_exist - File is ignored: assets/nf-core-gasclustering_logo_light.png
files_exist - File is ignored: docs/images/nf-core-gasclustering_logo_light.png
files_exist - File is ignored: docs/images/nf-core-gasclustering_logo_dark.png
files_exist - File is ignored: .github/workflows/awstest.yml
files_exist - File is ignored: .github/workflows/awsfulltest.yml
files_exist - File is ignored: lib/Utils.groovy
files_exist - File is ignored: lib/WorkflowMain.groovy
files_exist - File is ignored: lib/NfcoreTemplate.groovy
files_exist - File is ignored: lib/WorkflowGasclustering.groovy
nextflow_config - Config variable ignored: manifest.name
nextflow_config - Config variable ignored: manifest.homePage
nextflow_config - Config variable ignored: params.max_cpus
files_unchanged - File ignored due to lint config: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md
files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md
files_unchanged - File ignored due to lint config: .github/workflows/branch.yml
files_unchanged - File ignored due to lint config: assets/email_template.html
files_unchanged - File ignored due to lint config: assets/email_template.txt
files_unchanged - File ignored due to lint config: assets/sendmail_template.txt
files_unchanged - File does not exist: assets/nf-core-gasclustering_logo_light.png
files_unchanged - File does not exist: docs/images/nf-core-gasclustering_logo_light.png
files_unchanged - File does not exist: docs/images/nf-core-gasclustering_logo_dark.png
files_unchanged - File ignored due to lint config: docs/README.md
files_unchanged - File ignored due to lint config: .gitignore or .prettierignore
actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/gasclustering/gasclustering/.github/workflows/awstest.yml
actions_awsfulltest - actions_awsfulltest
pipeline_name_conventions - pipeline_name_conventions

✅ Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: .editorconfig
files_exist - File found: .prettierignore
files_exist - File found: .prettierrc.yml
files_exist - File found: CHANGELOG.md
files_exist - File found: CITATIONS.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/CONTRIBUTING.md
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File found: .github/ISSUE_TEMPLATE/config.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/ci.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: conf/modules.config
files_exist - File found: conf/test.config
files_exist - File found: conf/test_full.config
files_exist - File found: docs/output.md
files_exist - File found: docs/README.md
files_exist - File found: docs/README.md
files_exist - File found: docs/usage.md
files_exist - File found: main.nf
files_exist - File found: assets/multiqc_config.yml
files_exist - File found: conf/base.config
files_exist - File found: conf/igenomes.config
files_exist - File found: modules.json
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: docs/images/nf-core-gasclustering_logo.png
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: pipeline_template.yml
files_exist - File not found check: Singularity
files_exist - File not found check: lib/nfcore_external_java_deps.jar
files_exist - File not found check: .travis.yml
nextflow_config - Found nf-validation plugin
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config variable (correctly) not found: params.enable_conda
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config dag.file ended with .html
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - nextflow.config contains configuration profile test
nextflow_config - Config default value correct: params.metadata_1_header= metadata_1
nextflow_config - Config default value correct: params.metadata_2_header= metadata_2
nextflow_config - Config default value correct: params.metadata_3_header= metadata_3
nextflow_config - Config default value correct: params.metadata_4_header= metadata_4
nextflow_config - Config default value correct: params.metadata_5_header= metadata_5
nextflow_config - Config default value correct: params.metadata_6_header= metadata_6
nextflow_config - Config default value correct: params.metadata_7_header= metadata_7
nextflow_config - Config default value correct: params.metadata_8_header= metadata_8
nextflow_config - Config default value correct: params.pd_outfmt= matrix
nextflow_config - Config default value correct: params.pd_distm= hamming
nextflow_config - Config default value correct: params.pd_missing_threshold= 1.0
nextflow_config - Config default value correct: params.pd_sample_quality_threshold= 1.0
nextflow_config - Config default value correct: params.pd_file_type= text
nextflow_config - Config default value correct: params.gm_thresholds= 10,5,0
nextflow_config - Config default value correct: params.gm_method= average
nextflow_config - Config default value correct: params.gm_delimiter= .
nextflow_config - Config default value correct: params.max_cpus= 4
nextflow_config - Config default value correct: params.max_memory= 2.GB
nextflow_config - Config default value correct: params.max_time= 1.h
nextflow_config - Config default value correct: params.publish_dir_mode= copy
nextflow_config - Config default value correct: params.validate_params= true
files_unchanged - .gitattributes matches the template
files_unchanged - .prettierrc.yml matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/feature_request.yml matches the template
files_unchanged - .github/workflows/linting_comment.yml matches the template
files_unchanged - .github/workflows/linting.yml matches the template
actions_ci - '.github/workflows/ci.yml' is triggered on expected events
actions_ci - '.github/workflows/ci.yml' checks minimum NF version
readme - README Zenodo placeholder was replaced with DOI.
pipeline_todos - No TODO strings found
plugin_includes - No wrong validation plugin imports have been found
template_strings - Did not find any Jinja template strings (0 files)
schema_lint - Schema lint passed
schema_lint - Input mimetype lint passed: 'text/csv'
schema_params - Schema matched params returned from nextflow config
system_exit - No System.exit calls found
actions_schema_validation - Workflow validation passed: linting.yml
actions_schema_validation - Workflow validation passed: branch.yml
actions_schema_validation - Workflow validation passed: ci.yml
actions_schema_validation - Workflow validation passed: linting_comment.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
multiqc_config - assets/multiqc_config.yml found and not ignored.
multiqc_config - assets/multiqc_config.yml contains report_section_order
multiqc_config - assets/multiqc_config.yml contains export_plots
multiqc_config - assets/multiqc_config.yml contains report_comment
multiqc_config - assets/multiqc_config.yml follows the ordering scheme of the minimally required plugins.
multiqc_config - assets/multiqc_config.yml contains 'export_plots: true'.
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'
base_config - conf/base.config found and not ignored.
base_config - CUSTOM_DUMPSOFTWAREVERSIONS found in conf/base.config and Nextflow scripts.
modules_config - conf/modules.config found and not ignored.
modules_config - INPUT_ASSURE found in conf/modules.config and Nextflow scripts.
modules_config - GAS_MCLUSTER found in conf/modules.config and Nextflow scripts.
modules_config - LOCIDEX_MERGE found in conf/modules.config and Nextflow scripts.
modules_config - PROFILE_DISTS found in conf/modules.config and Nextflow scripts.
modules_config - ARBOR_VIEW found in conf/modules.config and Nextflow scripts.
modules_config - CUSTOM_DUMPSOFTWAREVERSIONS found in conf/modules.config and Nextflow scripts.
nfcore_yml - Repository type in .nf-core.yml is valid: pipeline
nfcore_yml - nf-core version in .nf-core.yml is set to the latest version: 3.0.1

Run details

nf-core/tools version 3.0.1
Run at 2024-11-04 22:20:32

kylacochrane

Looks great Steven!

apetkau

This looks great. I tested in IRIDA Next and works perfectly. Thanks so much Steven 😄

emarinier · 2024-10-29T19:08:54Z

CHANGELOG.md

+
+- Added the ability to include a `sample_name` column in the input samplesheet.csv. Allows for compatibility with IRIDA-Next input configuration.
+
+  - `sample_name` special characters will be replaced with `"_"`


Might be useful to include what characters are special (non-alphanumeric?).

emarinier · 2024-10-29T19:10:40Z

README.md

+
+`sample_name`: An **optional** column, that overrides `sample` for outputs (filenames and sample names) and reference assembly identification.
+
+`sample_name`, allows more flexibility in naming output files or sample identification. Unlike `sample`, `sample_name` is not required to contain unique values. `Nextflow` requires unique sample names, and therefore in the instance of repeat `sample_names`, `sample` will be suffixed to any `sample_name`. Non-alphanumeric characters (excluding `_`,`-`,`.`) will be replaced with `"_"`.


I think the comma after sample_name might be unneeded.

emarinier · 2024-10-29T19:15:51Z

workflows/gasclustering.nf

+    ID_COLUMN = "sample_name"
+    ID_COLUMN2 = "sample"


I would recommend changing ID_COLUMN and ID_COLUMN2 to something more like SAMPLE_NAME_COLUMN and SAMPLE_COLUMN, or ID_SAMPLE_NAME and ID_SAMPLE, so that the difference between the two is clearer when reading the code.

Also, is ID_COLUMN used within the code? If it's not, it should probably be removed and maybe that changes the name to ID_HEADER or something?

sgsutcliffe · 2024-11-04

Good catch. ID_COLUMN is a vestige from some testing. Really, I should remove ID_COLUMN2 and replace it with ID_COLUMN = "sample".
It goes into the formatting of the metadata_headers channel:

metadata_headers = Channel.of( tuple( ID_COLUMN2, params.metadata_1_header, params.metadata_2_header, params.metadata_3_header, params.metadata_4_header, params.metadata_5_header, params.metadata_6_header, params.metadata_7_header, params.metadata_8_header) )

I think I will change it to SAMPLE_HEADER so it fits nicely in the channel. Done in d877ee2.
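
For readers who do not work in Nextflow, the shape of the header tuple discussed above can be sketched in Python. This is only an illustration of the data layout, not the pipeline's code. The `params` dict and the `metadata_headers` function are hypothetical stand-ins, and the default header values are the ones reported in the lint output (`metadata_1` through `metadata_8`).

```python
SAMPLE_HEADER = "sample"

# Stand-in for the pipeline parameters, using the defaults shown in the lint report.
params = {f"metadata_{i}_header": f"metadata_{i}" for i in range(1, 9)}


def metadata_headers(params, sample_header=SAMPLE_HEADER):
    """Build the header row: the sample column first, then the eight metadata headers."""
    return (sample_header, *(params[f"metadata_{i}_header"] for i in range(1, 9)))


print(metadata_headers(params))
```

Keeping the sample header as a single named constant means the first element of the tuple cannot drift from the column name used elsewhere.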

sgsutcliffe added 3 commits October 25, 2024 10:52

Modified workflow, modules and tests to add sample_name to pipeline

f1fe0c6

First step in fixing the linting issues

dae97e0

Final step in fixing linting

3e4bb87

sgsutcliffe added 2 commits October 25, 2024 11:14

Fix editorconfig-checker

7b25df7

Update documentation

207ed9e

sgsutcliffe requested review from apetkau, emarinier and kylacochrane October 25, 2024 18:49

kylacochrane approved these changes Oct 25, 2024

apetkau approved these changes Oct 28, 2024

emarinier requested changes Oct 29, 2024

sgsutcliffe added 2 commits November 4, 2024 16:52

Fix typos and wording

98b747a

Modified variable name

d877ee2
