Parameter --skip_fastp throws an error, parameter trim_fastq set to false not working as expected #263

Closed
jonoave opened this issue Aug 9, 2023 · 4 comments
Labels
bug Something isn't working

Comments

jonoave commented Aug 9, 2023

Description of the bug

As part of my testing to compare trimming parameters, I would like to use input data that has already been trimmed and turn off all trimming steps in the pipeline.

First I tried adding --skip_fastp to my run. This resulted in the following error:

ERROR ~ Error executing process > 'NFCORE_SMRNASEQ:SMRNASEQ:MIRTRACE:MIRTRACE_RUN (1)'

Caused by:
  Not a valid path value type: java.util.ArrayList ([/sfs/9/ws/qeajl01-smrnaseq_test/data/smrnaseq_trimmed_reads/QBCOS019AT_trimmed.fastq.gz])

However, if I remove the parameter --skip_fastp and add the following parameters instead:

--trim_fastq false \
--clip_r1 0 \
--three_prime_clip_r1 0 \
--fastp_min_length 15

The pipeline completed without any errors. However, the MultiQC report suggests that some adapter trimming is still being performed.

image

Could it be that the default sequence of --three_prime_adapter is being used for adapter trimming?

Thank you.

Command used and terminal output

nextflow run nf-core/smrnaseq \
		 -r 2.2.1 \
		 -profile cfc \
		 --input /sfs/9/ws/qeajl01-smrnaseq_test/data/samplesheet_1_trimmed_nfcore.csv \
		 --genome GRCh38 \
		 --skip_fastp \
		 --mirtrace_species hsa \
		 --hairpin /sfs/9/ws/qeajl01-smrnaseq_test/data/mirBase/hairpin.fa \
		 --mature /sfs/9/ws/qeajl01-smrnaseq_test/data/mirBase/mature.fa \
		 --mirna_gtf /sfs/9/ws/qeajl01-smrnaseq_test/data/mirBase/hsa.gff3 \
		 --outdir /sfs/9/ws/qeajl01-smrnaseq_test/results/out_smrnaseq_1_nfcore_skipTrim_skipfastp

Relevant files

No response

System information

Nextflow version: 23.04.2 build 5870
Hardware: HPC
Executor: slurm
Container engine: Singularity
OS: CentOS
Version of nf-core/smrnaseq: 2.2.1

@jonoave jonoave added the bug Something isn't working label Aug 9, 2023

musaqa commented Feb 8, 2024

I have the same problem, and I suspect it's because the files are gzipped and miRTrace doesn't try to unzip them. Try unzipping the files and then running the command.
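
One way to try the suggestion above without losing the compressed originals is sketched below. The function name `decompress_fastqs` and the directory argument are hypothetical, and the sketch assumes the reads end in `.fastq.gz`; whether decompression actually avoids the error was not confirmed in this thread.

```shell
# Decompress every *.fastq.gz in a directory, keeping the original .gz files.
# `gzip -dc` writes the decompressed data to stdout, so the input is untouched.
decompress_fastqs() {
    fq_dir="$1"
    for fq_gz in "$fq_dir"/*.fastq.gz; do
        [ -e "$fq_gz" ] || continue          # skip when the glob matches nothing
        gzip -dc "$fq_gz" > "${fq_gz%.gz}"   # reads.fastq.gz -> reads.fastq
    done
}
```

After running it, the samplesheet would also need to point at the decompressed `.fastq` files.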

@apeltzer apeltzer added this to smrnaseq Aug 8, 2024
@atrigila atrigila self-assigned this Aug 21, 2024
@atrigila
Contributor

I attempted to reproduce this error with the following command in the latest dev version:

nextflow run smrnaseq -profile docker --outdir issue_236_skip_fastp -resume --skip_fastp --input /workspace/smrnaseq/assets/samplesheet.csv --mirtrace_species hsa

The pipeline finished correctly and the fastp step was not executed.

However, according to the methods in the paper, miRTrace applies its own trimming logic, which includes handling specific adapter sequences and removing reads shorter than 18 nucleotides after adapter trimming. It does this even if the data has already been trimmed before being passed into the pipeline.

The "reads < 18 nt after adapter removal" metric in the MultiQC report is sourced from the mirtrace-results.json file generated by miRTrace, specifically from the statsQC array. This metric counts reads that were trimmed to a length of less than 18 nucleotides by miRTrace during its processing, which indicates that miRTrace is still performing trimming. This means that disabling external trimming steps (e.g., --skip_fastp, --trim_fastq false) affects the initial trimming phases in the pipeline, but the internal miRTrace trimming is independent of these settings.
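
As an independent sanity check on pre-trimmed input, the number of reads already shorter than 18 nt can be counted directly from the FASTQ file. The sketch below is not how miRTrace computes its metric (miRTrace counts reads after its own adapter removal); the function name `count_short_reads` is hypothetical, and it assumes standard four-line FASTQ records.

```shell
# Count reads shorter than a minimum length (default 18 nt) in a FASTQ file.
# Assumes four lines per record, with the sequence on the second line.
count_short_reads() {
    fastq="$1"
    min_len="${2:-18}"
    # Stream the file, decompressing first when the name ends in .gz.
    case "$fastq" in
        *.gz) gzip -dc "$fastq" ;;
        *)    cat "$fastq" ;;
    esac | awk -v min="$min_len" 'NR % 4 == 2 && length($0) < min { n++ } END { print n + 0 }'
}
```

For example, `count_short_reads sample_trimmed.fastq.gz` prints a single integer (the file name here is a placeholder). A count of zero alongside a non-zero MultiQC metric would be consistent with miRTrace shortening reads itself.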

If the pipeline profile is set to a specific protocol (e.g., illumina, qiaseq, cats, nextflex), the miRTrace module in this pipeline will adjust its processing steps to match the structure of the reads expected for that protocol. If no protocol is specified, miRTrace defaults to the illumina protocol.

If you want miRTrace to handle the trimming in a specific way, set the profile explicitly to one of those available in the pipeline.
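
A minimal sketch of what setting the profile explicitly could look like follows. It only prints the command rather than running it. The helper name `print_smrnaseq_cmd`, the `samplesheet.csv` input, and the `results` output directory are placeholders, and combining a container profile with a protocol profile in one `-profile` list is an assumption about the pipeline version in use.

```shell
# Print (do not run) a nextflow command that selects a protocol via -profile.
# Profile names are taken from the comment above; paths are placeholders.
print_smrnaseq_cmd() {
    engine="$1"      # e.g. docker or singularity
    protocol="$2"    # e.g. illumina, qiaseq, cats, nextflex
    printf 'nextflow run nf-core/smrnaseq -profile %s,%s --input %s --mirtrace_species hsa --outdir %s\n' \
        "$engine" "$protocol" "samplesheet.csv" "results"
}
```

Calling `print_smrnaseq_cmd docker qiaseq` prints a command with `-profile docker,qiaseq`, which can be reviewed and adjusted before it is run.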

However, if you do not want miRTrace's internal trimming, it is possible to disable it. Use the 'custom' protocol, which sets no protocol option and therefore defaults to no adapter.

In the older version (-r 2.2.1) of the MIRTRACE_RUN process, the adapter parameter was explicitly passed to miRTrace using the --adapter flag. If --skip_fastp was not set, then the adapter sequences were obtained from this process. This should be resolved now.

I am working on a test case that uses --skip_fastp.

@atrigila
Contributor

This error is linked to #367. The input channel for miRTrace requires an adapter sequence value that is absent when the custom protocol profile or --skip_fastp is applied.

The adapter sequence is not used in the miRTrace module, so it could be removed.

cc @nschcolnicov

@atrigila
Contributor

Closed via #383

@github-project-automation github-project-automation bot moved this from On Hold to Done in smrnaseq Aug 24, 2024
nschcolnicov added a commit that referenced this issue Oct 10, 2024
Projects
Status: Done
3 participants