bug for handling /1 suffix in single-ended reads #580
Comments
Hmmm... That's strange, and I can't reproduce it with
The headers have the expected format in both cases:
Could you please try updating to
Sorry for not seeing your reply earlier. I still see the issue with both 1.1.3 and 1.1.4. My command is
You can grab the same input to test here: https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/genomics/homo_sapiens/illumina/fastq/test_rnaseq_1.fastq.gz
I think the header in the example you used might have been formatted differently from mine; there is no flowcell information in the header in my case.
Thanks for the example 😁 I get the same output. While I look into it, you can try using the
I think I'm right in saying there is no strict convention for denoting read 1/2 in field 1. Usually it's e.g. In your case, you have a paired-end read file with the read numbers as part of the read name. As you show above, this is handled appropriately with
@IanSudbery, we would just need to add the equivalent of UMI-tools/umi_tools/umi_methods.py, lines 119 to 132 in d98ebac.
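The following minimal Python sketch makes the two conventions concrete. It is not the umi_methods.py code referenced above. The helper name split_read_number and both regexes are assumptions made for illustration. It covers the two layouts commonly seen in practice:

- An older "/1" or "/2" suffix on the read name itself.
- The Illumina 1.8+ layout, where the read number starts the second whitespace-separated field (e.g. "1:N:0:ATCACG").

```python
import re

# Hypothetical patterns for illustration only (not taken from umi_tools).
# Older convention: "/1" or "/2" appended to the read name (field 1).
SUFFIX_RE = re.compile(r"^(?P<name>\S+?)/(?P<num>[12])$")
# Illumina 1.8+ convention: "<read>:<is filtered>:<control>:..." in field 2.
CASAVA_RE = re.compile(r"^(?P<num>[12]):[YN]:\d+:")


def split_read_number(header):
    """Return (read_name, read_number) for a header line without the '@'.

    read_number is None when neither convention is detected.
    """
    fields = header.split(None, 1)
    first = fields[0]
    rest = fields[1] if len(fields) > 1 else ""

    # Read number encoded as a suffix on the read name itself.
    m = SUFFIX_RE.match(first)
    if m:
        return m.group("name"), int(m.group("num"))

    # Read number encoded at the start of the second field.
    m = CASAVA_RE.match(rest)
    if m:
        return first, int(m.group("num"))

    return first, None
```

Only the first convention puts the read number inside the read name. Any code that appends a UMI to the name therefore has to deal with it explicitly.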
Not quite that simple. There isn't a single-end FASTQ parser equivalent to the joinedFastqParser, which is presumably where the problem is coming from. The structure here is that the joinedParser is called with two FASTQ iterators as parameters; in the single-end case there is no parser, just the raw iterator. I don't think there should be any problem with moving the suffix code to the iterator, except that I'll need to make sure it always gets called correctly.
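Here is a rough sketch of the idea of moving the suffix handling into the iterator. This is a hypothetical minimal reader: FastqRecord, read_fastq and strip_pair_suffix are illustrative names, not umi_tools internals. It strips the suffix as each record is read. As a result, a single-end run and a paired-end run (where two such iterators would be handed to a joined parser) get the same treatment.

```python
from typing import NamedTuple


class FastqRecord(NamedTuple):
    """Hypothetical minimal FASTQ record (not umi_tools' own class)."""
    header: str  # header line without the leading '@'
    seq: str
    quals: str


def strip_pair_suffix(header):
    """Drop a trailing '/1' or '/2' from the read name (text before the first space)."""
    name, sep, rest = header.partition(" ")
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name + sep + rest


def read_fastq(lines, ignore_suffix=False):
    """Yield FastqRecords from an iterable of FASTQ lines (four lines per record).

    Suffix stripping is applied here, inside the iterator, so the single-end
    path and each of the two iterators used for paired-end input behave the same.
    """
    lines = iter(lines)
    for header in lines:
        try:
            seq = next(lines)
            next(lines)  # the '+' separator line is ignored
            quals = next(lines)
        except StopIteration:
            raise ValueError("truncated FASTQ record") from None
        header = header.rstrip("\n")
        if not header.startswith("@"):
            raise ValueError("expected '@' at start of header: %r" % header)
        header = header[1:]
        if ignore_suffix:
            header = strip_pair_suffix(header)
        yield FastqRecord(header, seq.rstrip("\n"), quals.rstrip("\n"))
```

A paired-end caller could then zip two readers, e.g. zip(read_fastq(r1, True), read_fastq(r2, True)). The stripping lives in one place, whether or not a joined parser sits on top.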
Ah, yeah, I was scanning the code too lightly. Nice work on the PR 👍
I installed the latest updates, including this PR, and unfortunately it errored out.
Is there any update here? I've tested the master branch by installing from a clone into my conda environment (
So now both PE and SE are failing in the latest code. To use the same data as me, you can grab them using the URLs:
Hi Anna, it looks like I fixed this back in April and then never merged the fix :(. Sorry.
OK, great! I thought the fix was in master because #591 was already merged, and didn't realize any more updates had been planned. I tested the unmerged branch, and it looks like it resolves the issue I was experiencing, at least.
I have noticed inconsistent behavior in how umi_tools extract handles the /1 suffix depending on whether the input is paired-end or single-end. When the input is paired-end, the read ID is changed from @SRR5665260.1.36873006/1 to @SRR5665260.1.36873006_CAG. This basically removes the read number, since the reads are separated into two distinct files. If I take only the read 1 FASTQ file and run umi_tools extract, the same read becomes @SRR5665260.1.36873006/1_CAG, even though I am using --ignore-read-pair-suffixes in both cases. I am using v1.1.2.
This is causing issues downstream for me when I run alignment with STAR, because STAR discards everything after / in the read ID. I'm wondering whether this can be fixed on the umi_tools end.
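To illustrate the expected behaviour, here is a hypothetical helper. The name add_umi_to_read_id is illustrative, not the umi_tools API. It appends the UMI with an underscore, as in the read IDs above, and can first remove a trailing /1 or /2:

- With stripping, the single-end output matches the paired-end output reported above.
- Without stripping, you get the single-end output seen in v1.1.2.

This matters for STAR because its --readNameSeparator option defaults to /. Everything from /1 onward, including the appended UMI, is therefore trimmed from the read name.

```python
def add_umi_to_read_id(read_id, umi, ignore_suffix=True, sep="_"):
    """Append a UMI to a read ID (hypothetical helper, not umi_tools' API).

    read_id is the read name field of the header, e.g. "@SRR5665260.1.36873006/1".
    With ignore_suffix=True, a trailing /1 or /2 is removed first, so the
    single-end result matches the paired-end result.
    """
    if ignore_suffix and read_id.endswith(("/1", "/2")):
        read_id = read_id[:-2]  # drop the read-pair suffix
    return read_id + sep + umi


# Expected behaviour (suffix removed, matching the paired-end output):
fixed = add_umi_to_read_id("@SRR5665260.1.36873006/1", "CAG")
# Behaviour reported for single-end input (suffix kept before the UMI):
reported = add_umi_to_read_id("@SRR5665260.1.36873006/1", "CAG", ignore_suffix=False)
```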