Pyfastx.read.raw mangling reads #75
Comments
Thank you for reporting this issue. I will check and fix it.
I have downloaded SRR5382385 to test, but I could not reproduce the issue. Could you provide a simple script that reproduces it? Thanks.
Sure, no problem. I successfully replicated the issue with this script:
I'm using version 2.0.2 installed from pip on an M2 arm64 MacBook with a 697M gzipped fastq. Whether or not the fastq is compressed seems to have no impact: the mangling is identical either way. I can provide the specific fastq if desired, but I get the same issue with any large fastq.
Thanks! I do not have an M2 arm64 MacBook to test on, but I will try to test it. That may take a long time. You can try to use
I just fully reproduced the problem in a fresh environment on an x86_64 cluster like:
mamba env create -n pyfastx_test pip seqtk
wget {fastq_file}
seqtk size {fastq_file}
Returns
Returns
With the 647th read looking like (quality line for first read merged with read id line from next read):
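For anyone wanting to check their own output files for this kind of damage, here is a minimal sketch of a checker. It is not part of pyfastx; the function name `first_mangled_record` is hypothetical, and it assumes plain four-line FASTQ records (no wrapped sequence or quality lines). It reports the first record whose header, separator, or sequence/quality lengths look wrong, which is what a quality line merged with the next read id would trigger.

```python
import io


def first_mangled_record(handle):
    """Return (record_number, reason) for the first malformed record, or None.

    Hypothetical helper, assuming strict four-line FASTQ records.
    Only the first hit is returned: once one record is damaged, the
    four-line framing of everything after it can no longer be trusted.
    """
    record_number = 0
    while True:
        lines = [handle.readline() for _ in range(4)]
        if not lines[0]:
            return None  # clean end of file, nothing suspicious found
        record_number += 1
        header, seq, plus, qual = (line.rstrip("\n") for line in lines)
        if not header.startswith("@"):
            return record_number, "header does not start with '@'"
        if not plus.startswith("+"):
            return record_number, "separator does not start with '+'"
        if len(seq) != len(qual):
            return record_number, "sequence and quality lengths differ"


# Toy data: in `bad`, the quality line of r1 has swallowed the id line of r2.
good = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nAC\n+\nII\n")
bad = io.StringIO("@r1\nACGT\n+\nIIII@r2\nAC\n+\nII\n")
good_result = first_mangled_record(good)
bad_result = first_mangled_record(bad)
```

For a gzipped file, the same function should work on a handle opened with `gzip.open(path, "rt")`.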
I'm also interested in this fix
Fixed in v2.1.0
Amazing, thanks @lmdu!
When iterating through a gzipped fastq file and dumping reads to various output filehandles using read.raw from records stored in a dict, the reads dumped to files are sometimes mangled. It seems that the way the raw string is generated is flawed in some way, leading to weirdness like the below. This occurs predictably, and the mangling can be changed by removing a read from the start of the file.
I managed to get around this by not indexing the fastq files in question and reconstructing the record from the name, seq, qual tuple.
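A minimal sketch of that workaround, for illustration only: the helper names `format_fastq_record` and `write_records` are hypothetical, and the code assumes each record arrives as a (name, seq, qual) tuple. It rebuilds the four-line record text itself instead of relying on read.raw.

```python
import io


def format_fastq_record(name, seq, qual):
    """Rebuild a four-line FASTQ record from its parts (hypothetical helper)."""
    if len(seq) != len(qual):
        raise ValueError(f"sequence/quality length mismatch for read {name!r}")
    return f"@{name}\n{seq}\n+\n{qual}\n"


def write_records(records, out):
    """Write (name, seq, qual) tuples to an open text handle; return the count."""
    count = 0
    for name, seq, qual in records:
        out.write(format_fastq_record(name, seq, qual))
        count += 1
    return count


# Toy input standing in for records read from a real fastq file.
buffer = io.StringIO()
written = write_records([("r1", "ACGT", "IIII"), ("r2", "AC", "II")], buffer)
```

As far as I know from the pyfastx README, iterating over `pyfastx.Fastq(path, build_index=False)` yields tuples of this shape, so that iterator could be passed as `records`. Note that only the read name is carried through here, so any comment text from the original header line may be lost.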
An example of a mangled read record:
A script which can be used to reproduce this issue is available here. Happy to provide any more information if needed to trace the problem.