Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add sample_name as possible column in samplesheet #31

Open
wants to merge 7 commits into
base: dev
Choose a base branch
from

Conversation

sgsutcliffe
Copy link

@sgsutcliffe sgsutcliffe commented Oct 25, 2024

Modified the template for input samplesheet.csv file to include the sample_name column in addition to sample in-line with changes to IRIDA-Next update as seen with the speciesabundance pipeline and staramrnf for example. What this means is that the output files and the sample name will be changed to sample_name if a sample_name is called. If gasclustering is being locally then the sample_name can be left blank.

Made a few changes:
- sample_name special characters will be replaced with "_"
- If no sample_name is supplied in the column sample will be used
- To avoid repeat values for sample_name all sample_name values will be suffixed with sample
- Tests to check that the variety of different sample_names work with the

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Add nf-test to test new feature
  • Usage Documentation in docs/usage.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).
  • Tested out in IRIDA-Next locally
  • Tested in Azure

Copy link

github-actions bot commented Oct 25, 2024

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit d877ee2

+| ✅ 147 tests passed       |+
#| ❔  28 tests were ignored |#
!| ❗   4 tests had warnings |!

❗ Test warnings:

❔ Tests ignored:

  • files_exist - File is ignored: assets/nf-core-gasclustering_logo_light.png
  • files_exist - File is ignored: docs/images/nf-core-gasclustering_logo_light.png
  • files_exist - File is ignored: docs/images/nf-core-gasclustering_logo_dark.png
  • files_exist - File is ignored: .github/workflows/awstest.yml
  • files_exist - File is ignored: .github/workflows/awsfulltest.yml
  • files_exist - File is ignored: lib/Utils.groovy
  • files_exist - File is ignored: lib/WorkflowMain.groovy
  • files_exist - File is ignored: lib/NfcoreTemplate.groovy
  • files_exist - File is ignored: lib/WorkflowGasclustering.groovy
  • nextflow_config - Config variable ignored: manifest.name
  • nextflow_config - Config variable ignored: manifest.homePage
  • nextflow_config - Config variable ignored: params.max_cpus
  • files_unchanged - File ignored due to lint config: LICENSE or LICENSE.md or LICENCE or LICENCE.md
  • files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md
  • files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
  • files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md
  • files_unchanged - File ignored due to lint config: .github/workflows/branch.yml
  • files_unchanged - File ignored due to lint config: assets/email_template.html
  • files_unchanged - File ignored due to lint config: assets/email_template.txt
  • files_unchanged - File ignored due to lint config: assets/sendmail_template.txt
  • files_unchanged - File does not exist: assets/nf-core-gasclustering_logo_light.png
  • files_unchanged - File does not exist: docs/images/nf-core-gasclustering_logo_light.png
  • files_unchanged - File does not exist: docs/images/nf-core-gasclustering_logo_dark.png
  • files_unchanged - File ignored due to lint config: docs/README.md
  • files_unchanged - File ignored due to lint config: .gitignore or .prettierignore
  • actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/gasclustering/gasclustering/.github/workflows/awstest.yml
  • actions_awsfulltest - actions_awsfulltest
  • pipeline_name_conventions - pipeline_name_conventions

✅ Tests passed:

Run details

  • nf-core/tools version 3.0.1
  • Run at 2024-11-04 22:20:32

Copy link
Contributor

@kylacochrane kylacochrane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great Steven!

Copy link
Member

@apetkau apetkau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. I tested in IRIDA Next and works perfectly. Thanks so much Steven 😄

CHANGELOG.md Outdated

- Added the ability to include a `sample_name` column in the input samplesheet.csv. Allows for compatibility with IRIDA-Next input configuration.

- `sample_name` special characters will be replaced with `"_"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be useful to include what characters are special (non-alphanumeric?).

README.md Outdated

`sample_name`: An **optional** column, that overrides `sample` for outputs (filenames and sample names) and reference assembly identification.

`sample_name`, allows more flexibility in naming output files or sample identification. Unlike `sample`, `sample_name` is not required to contain unique values. `Nextflow` requires unique sample names, and therefore in the instance of repeat `sample_names`, `sample` will be suffixed to any `sample_name`. Non-alphanumeric characters (excluding `_`,`-`,`.`) will be replaced with `"_"`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the comma after sample_name might be unneeded.

Comment on lines 70 to 71
ID_COLUMN = "sample_name"
ID_COLUMN2 = "sample"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would recommend changing ID_COLUMN and ID_COLUMN2 to something more like SAMPLE_NAME_COLUMN and SAMPLE_COLUMN, or ID_SAMPLE_NAME and ID_SAMPLE, so that's it's more clear the difference between the two when reading the code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, is ID_COLUMN used within the code? If it's not, it should probably be removed and maybe that changes the name to ID_HEADER or something?

Copy link
Author

@sgsutcliffe sgsutcliffe Nov 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch ID_COLUMN is a vestige from some testing. Well, really I should remove ID_COLUMN2, and replace with ID_COLUMN = "sample".
It goes into the formatting of the metadata_headers channel:

metadata_headers = Channel.of(
        tuple(
            ID_COLUMN2,
            params.metadata_1_header, params.metadata_2_header,
            params.metadata_3_header, params.metadata_4_header,
            params.metadata_5_header, params.metadata_6_header,
            params.metadata_7_header, params.metadata_8_header)
        )

I think I will change it to SAMPLE_HEADER so it fits nicely in the channel. Here d877ee2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants