rmdup: dup-num-file not created if no duplicated reads #436

fgvieira · 2024-01-23T10:17:11Z

Please check the items below before submitting an issue.
They help to improve the communication efficiency between us.
Thanks!

Prerequisites

Make sure you've installed the correct executable binary file.
For Mac users, please download
- seqkit_darwin_amd64.tar.gz for Macs with Intel CPUs.
- seqkit_darwin_arm64.tar.gz for Macs with M-series CPUs.
Make sure you are using the latest version by running seqkit version -u.
Read the usage and examples for the specific subcommand.

Describe your issue in detail

Please copy and paste the command you ran and the error information if reported.
It would be more helpful to provide as much information as you can:
- Are you running on a personal computer or a server?
- What's the operating system, and how much RAM (memory) is available?
- Show the types and sizes of input files with file xxx and ls -lh xxx.
- Show some lines of input files with head -n 5 xxx or zcat xxx.gz | head -n 5.
Provide a reproducible example.
- Has this problem happened many times?
- Or does it fail only with this input file and/or these commands/parameters?

I am running seqkit on a RedHat server:

seqkit rmdup --threads 10  --dup-num-file dup.tsv --ignore-case --by-seq  --out-file collapsed.fastq.gz collapsed.rmdup.fastq.gz

But seqkit rmdup does not create the --dup-num-file output (dup.tsv) when the input file contains no duplicated reads.

Input file is a FASTQ:

$ zcat collapsed.fastq.gz | head -n 8
@T0_RID60_S1_CM000682.2_ngsngs:13496936-13497014_length:79_mod0000 F2 R1 merged_79_0
TAAGGAAGCAGTGGAAAAAGAATAAATGCTGTAGATGAGGACAAGAAATTAGTTGAACTTTAATAAACTTCAAATGACT
+
CCCGGGGGG=GGGJJJGJGJJGJJJJJJGJCJJC=GJJJJJJGG1JGGGJJCGJJJG=JGGCGJCCJJGJJJGJGCCJG
@T2_RID60_S1_CM000666.2_ngsngs:130549431-130549518_length:88_mod0000 F3 R1 merged_88_0
TTTGCTCATATTTTGTGAAGTATTTTTATATCTGTATTCATGAATGATATTGCCATGCAATTGTCTTTTATTTTAATAATCTTGTCTT
+
CC8G=GGGGGGGGJJGJJJJJJJJJGCGGJJGJCJJJJJGJG8J1J=GJCJGJJJJJ(GGJGJGGJGGGGJJJJGJGGGCJJCJGGCJ


shenwei356 · 2024-01-23T10:51:55Z

> seqkit rmdup does not create the dup-num-file (dup.tsv) file if there are no duplicated reads in the input file.

Oh, yes, it's designed to behave this way: https://github.com/shenwei356/seqkit/blob/master/seqkit/cmd/rmdup.go#L181

fgvieira · 2024-01-23T17:22:58Z

I can see the logic of not creating the file when there are no duplicated reads, but when seqkit runs inside a Snakemake workflow, the workflow sometimes fails because the expected output file is never created.
An alternative would be to create an empty file if there are no duplicates.
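With a seqkit version that predates the change, a workflow can emulate this suggestion itself by creating an empty file after rmdup has finished. The following is a minimal Python sketch of that workaround; the helper name ensure_file is hypothetical and is assumed to be called right after the seqkit command completes.

```python
from pathlib import Path


def ensure_file(path: str) -> Path:
    """Hypothetical helper: create an empty file at `path` if it is missing.

    If seqkit already wrote the file, its contents are left intact
    (touch with exist_ok=True only updates the modification time).
    """
    p = Path(path)
    p.touch(exist_ok=True)  # same effect as the shell command `touch path`
    return p
```

Note that touching an existing file bumps its timestamp, which timestamp-based workflow managers may notice; checking p.exists() first avoids that if it matters.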

shenwei356 · 2024-01-23T19:38:47Z

Well, I can change the behaviour. But it should be easy to check whether a file exists in Snakemake with something like os.path.exists; if the file does not exist, just skip the downstream steps.
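The os.path.exists approach can be sketched as a small reader that treats a missing file the same as an empty one, so downstream code sees "no duplicate groups" in both cases. This is a sketch under assumptions: the function name load_dup_groups is hypothetical, and each line of the dup-num-file is assumed to be a count, a tab, and the comma-separated IDs of one duplicate group.

```python
import os


def load_dup_groups(path: str) -> list[tuple[int, list[str]]]:
    """Hypothetical reader for a seqkit rmdup dup-num-file.

    Returns (count, ids) per duplicate group, or an empty list when the
    file does not exist or contains no lines.
    """
    groups: list[tuple[int, list[str]]] = []
    if not os.path.exists(path):  # older seqkit: no duplicates, no file
        return groups
    with open(path) as fh:
        for line in fh:
            line = line.rstrip("\n")
            if not line:
                continue
            # Assumed layout: "<count>\t<id1>, <id2>, ..."
            count, ids = line.split("\t", 1)
            groups.append((int(count), [i.strip() for i in ids.split(",")]))
    return groups
```

A workflow step could then skip its work whenever load_dup_groups(...) returns an empty list; how that is wired into a specific Snakemake rule depends on the workflow and is not shown here.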

shenwei356 · 2024-01-24T10:06:02Z

Changed. Try this.

fgvieira · 2024-01-24T10:18:37Z

Nice! It seems to work!
Thanks!

shenwei356 added a commit that referenced this issue Jan 24, 2024

rmdup: always write the file for duplicates number and IDs. #436

e62aeb8
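The commit makes rmdup write the duplicates file unconditionally. The Python function below is a simplified, hypothetical model of that behaviour for --by-seq --ignore-case, not seqkit's Go implementation: it keeps the first record of each sequence, assumes the same "count, tab, comma-separated IDs" line layout, and opens the output file before checking for duplicates so that the file exists (empty) even when nothing is duplicated.

```python
from typing import Iterable


def rmdup_by_seq(records: Iterable[tuple[str, str]], dup_path: str,
                 ignore_case: bool = True) -> list[tuple[str, str]]:
    """Toy model of rmdup --by-seq: keep the first record per sequence.

    records are (id, sequence) pairs. One line per duplicated sequence
    is written to dup_path, and the file is always created.
    """
    kept: list[tuple[str, str]] = []
    groups: dict[str, list[str]] = {}  # normalised sequence -> IDs, in input order
    for rid, seq in records:
        key = seq.upper() if ignore_case else seq
        if key in groups:
            groups[key].append(rid)  # duplicate: record the ID, drop the read
        else:
            groups[key] = [rid]
            kept.append((rid, seq))
    # Opened unconditionally, so the file exists even with zero duplicates.
    with open(dup_path, "w") as out:
        for ids in groups.values():
            if len(ids) > 1:
                out.write(f"{len(ids)}\t{', '.join(ids)}\n")
    return kept
```

For example, rmdup_by_seq([("a", "ACGT"), ("b", "acgt"), ("c", "TTTT")], "dup.tsv") keeps a and c and writes a single line for the a/b group.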

fgvieira closed this as completed Jan 26, 2024

shenwei356 mentioned this issue Jan 31, 2024

Update SeqKit to v2.7.0 bioconda/bioconda-recipes#45527

Merged

Porkepix mentioned this issue Jan 31, 2024

seqkit 2.7.0 Homebrew/homebrew-core#161463

Merged

