Issue with sequence of length 1 and quality '+' #408

dehui333 · 2023-09-18T04:36:13Z

Prerequisites

make sure you're using the latest version by running seqkit version
read the usage

Describe your issue

describe the problem
provide a reproducible example

Problem:

seqkit sana flags a sequence of length 1 whose quality string is '+' as invalid, for reasons I don't understand. This does not happen with the other valid quality values I have tried, or when the sequence is longer than 1 bp.

Example:

echo -e '@seq\nA\n+\n+\n' | seqkit sana
[INFO] File: - Discarded line: Invalid line states! 1: @seq
[INFO] File: - Discarded line: Invalid line states! 2: A
[INFO] File: - Discarded line: Invalid line states! 3: +
[INFO] File: - Discarded line: Invalid line states! 4: +
[INFO] File: - Pass records: 0 Discarded lines: 4

echo -e '@seq\nA\n+\n?\n' | seqkit sana
[INFO] File: - Pass records: 1 Discarded lines: 0
@seq
A
+
?

echo -e '@seq\nAA\n+\n++\n' | seqkit sana
[INFO] File: - Pass records: 1 Discarded lines: 0
@seq
AA
+
++


shenwei356 · 2023-09-18T10:29:31Z

@botond-sipos might help. I tried but failed to understand the code logic.

botond-sipos · 2023-09-23T10:15:23Z

This is an unfortunate edge case. The parser does not rely on the 4-line structure of FASTQ files, so it needs a way to classify the input lines (see here).
In the case described in this thread, a line containing a single '+' is classified as a separator line, so the record ends up with two consecutive separator lines, which is invalid. Unfortunately, I cannot fix this, as there is little else to rely on when distinguishing separator lines from quality lines.
Please consider this a known bug.
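
To make the ambiguity concrete, here is a minimal sketch in Python of a per-line classifier that does not assume four lines per record. It is not SeqKit's code (SeqKit is written in Go, and its real parser is more involved). The function names classify, line_roles and has_adjacent_separators are hypothetical, and the rules are deliberately simplified to cover only the three inputs from this thread.

```python
# Illustrative sketch only: NOT SeqKit's implementation.
# Each line is classified on its own, without counting lines per record.

def classify(line: str) -> str:
    """Guess the role of one FASTQ line from its content alone (simplified)."""
    if line.startswith("@"):
        return "header"
    if line == "+":
        # A lone '+' is also a valid one-character quality string,
        # but content alone cannot tell the two apart.
        return "separator"
    if line.isalpha():
        return "sequence"
    if line and all(33 <= ord(c) <= 126 for c in line):
        return "quality"
    return "invalid"


def line_roles(text: str) -> list:
    """Classify every non-empty line of a FASTQ snippet."""
    return [classify(ln) for ln in text.splitlines() if ln]


def has_adjacent_separators(roles: list) -> bool:
    """True if two separator lines follow each other, which no valid record has."""
    return any(a == b == "separator" for a, b in zip(roles, roles[1:]))


if __name__ == "__main__":
    for snippet in ("@seq\nA\n+\n+\n", "@seq\nA\n+\n?\n", "@seq\nAA\n+\n++\n"):
        roles = line_roles(snippet)
        print(roles, "invalid" if has_adjacent_separators(roles) else "ok")
```

Under these simplified rules only the first snippet yields two adjacent separator lines, which matches the behaviour reported above. A parser that counted four lines per record would not be affected, but it would also lose the ability to handle files that do not follow the 4-line layout.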

dehui333 · 2023-09-25T22:41:44Z

Thanks for the explanation. This in itself is certainly not a big issue, but I suspect it could be related to the output of seqkit sana and seqkit seq -m being corrupted in some cases.

There were instances when the above operations produced output that seqkit stats rejected as invalid FASTX. For the same input, the output was fine when I used seqtk, but seqkit seq -m somehow corrupted it. I noticed that in all these cases the input FASTQ contained sequences of length 1.
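
For anyone who wants to check whether an input contains the pattern from this issue, the following Python sketch lists records with a sequence of length 1 and a quality line that is exactly '+'. It assumes strict 4-line FASTQ (no wrapped sequence or quality lines), and the function name lone_plus_records is hypothetical; it is a quick diagnostic, not a validator.

```python
import io
from typing import Iterator, TextIO, Tuple


def lone_plus_records(handle: TextIO) -> Iterator[Tuple[int, str]]:
    """Yield (record_number, header) for length-1 records whose quality is '+'.

    Assumption: strict 4-line FASTQ; blank lines between records are skipped.
    """
    record = []
    number = 0
    for raw in handle:
        line = raw.rstrip("\r\n")
        if not line and not record:
            continue  # ignore blank lines between records
        record.append(line)
        if len(record) == 4:
            number += 1
            header, seq, _sep, qual = record
            if len(seq) == 1 and qual == "+":
                yield number, header
            record = []


if __name__ == "__main__":
    # In-memory demo; replace with open("reads.fastq") for a real file.
    demo = io.StringIO("@seq\nA\n+\n+\n\n@r2\nAA\n+\n++\n")
    for number, header in lone_plus_records(demo):
        print(number, header)
```

If this reports nothing for a file that still comes out corrupted, the cause is more likely to lie elsewhere.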

Unfortunately, I don't have time at the moment to investigate this further, and I cannot rule out that it is caused by something else. Either way, I hope this information is useful if others ever observe something similar.

Edit: I realized that my fasta file gets corrupted when it is copied from one storage system to another, which makes it more likely that the issue described above is not caused by seqkit. You probably don't have to worry about it.

shenwei356 · 2024-03-19T19:58:42Z

It's fixed by @botond-sipos.

shenwei356 · 2024-04-07T09:15:55Z

Fixed in v2.8.1.

shenwei356 mentioned this issue on Dec 24, 2023: Seqkit sana fails on valid FASTQ #429 (Closed)

shenwei356 closed this as completed on Feb 23, 2024

shenwei356 mentioned this issue on Apr 7, 2024: Update SeqKit to v2.8.1 bioconda/bioconda-recipes#47030 (Merged)

BrewTestBot mentioned this issue on Apr 7, 2024: seqkit 2.8.1 Homebrew/homebrew-core#168248 (Merged)
