output fastq file(without gzip) when process a fastq.gz file #18
Comments
Sorry, I don't think I will implement this. It is not a common use case. If you have a program that does not read gzipped fastq, you can always create a dummy file with
Programs such as trimmomatic and bwa that I am using do support gzipped fastq, but reading gzipped fastq is slower than reading plain fastq, because the gunzip process is not multithreaded. A fastq.gz file saves disk space, but the NxTrim output is not the final result file, so we can delete it after it has been used. And it is not the final fastq result anyway, because NxTrim lacks a function to remove low-quality bases?
I am not convinced of the need to trim low quality bases. Aligners can split reads and modern assemblers use error correction as a pre-processing step. It is not clear that a trimming heuristic does a better job than these sophisticated algorithms. I get very nice assemblies directly from the nxtrim output. See here: https://github.com/sequencing/NxTrim/wiki/Bacterial-assembles-using-Nextera-Mate-pairs
As for performance, decompression is not a bottleneck for any serious bioinformatics task. When aligning with bwa, I see negligible differences in compute time for uncompressed versus gzipped fastq:
gzip will become a bottleneck with very fast I/O and very large files. And it's not always the case that you directly run bwa after trimming ... So I'd go for (optional) uncompressed output as well :-)
Yes, if you can afford to store uncompressed fastq on your SSD then this might save you some time. On the other hand, on my system, it is actually slightly slower to pull uncompressed fastq from a network disk (probably because it is I/O bound and you have to read more data). I am not convinced, but I do take pull requests ;)
At least for the MP fraction we could use
That is correct. I use
ok, makes sense (for direct alignment). Maybe you could separate both by writing to
Yes, for scaffolding you probably only want to use mp (and hence the
I would really like to see a realistic use case for unzipped fastq (with actual timings) before I consider implementing it. It would have to be at least twice as fast as using the gzipped input.
I am not sure if
Concerning the speed issue ... I played around a bit. Not really a fully blown benchmark but just enough to get an idea (if I am not completely wrong):
A little Perl script which simply opens the fastq files (.gz via
Reading the uncompressed file is done at a rate of roughly 260 MiB/sec (as seen in htop) and takes ~330 sec. The compressed file is read at a rate of about 37-44 MiB/sec and takes ~800 sec.
I also used another tool, just for checking the reading rates: https://github.com/ADAC-UoN/fqcounter
This tool reads both fastq files at about the same rate as the simple Perl script. This has been tested on a local filesystem (xfs) of my workstation (HP Z800).
edit: as I alter the fastq-headers after nxtrim I can perfectly live with
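For anyone who wants to repeat the read-only comparison described above, here is a rough Python sketch of the same idea. It is not the Perl script from this comment; the function names `read_throughput` and `speedup` and any file paths are hypothetical, and it only measures raw reading, not downstream processing.

```python
import gzip
import time


def read_throughput(path, chunk_size=1 << 20):
    """Read a file to the end and return (bytes_read, seconds).

    Paths ending in .gz are decompressed on the fly, so bytes_read
    counts decompressed bytes in that case.
    """
    opener = gzip.open if path.endswith(".gz") else open
    total = 0
    start = time.perf_counter()
    with opener(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


def speedup(plain_seconds, gz_seconds):
    """How many times faster the plain read was than the gzipped read."""
    return gz_seconds / plain_seconds
```

Calling `read_throughput` once on the plain file and once on the `.gz` file gives the two timings to compare. With the timings quoted in this comment (~330 sec versus ~800 sec), `speedup(330, 800)` comes out at roughly 2.4, which would clear the "at least twice as fast" bar for pure reading, though it says nothing about a full alignment run.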
Sorry, this isn't clear and the behaviour should be changed. If you run with
In your example, you are just reading the files, but my point is that if you have to process the data in some way (i.e. align it for scaffolding), the decompression won't be a bottleneck. I guess conceivably if you are piping it to another trimmer that is very fast, the compression will be a bottleneck.
This makes sense for a few different reasons. I will add this.
I just wanted to show that (de)compression in general may become a bottleneck with large data volumes and fast I/O. NFS mounts cannot deliver data that fast ... that's OK. Compression is slower but may be sped up with multithreading.
I have added
Can NxTrim output an uncompressed fastq file (without gzip) when the input is a fastq.gz file?
If the downstream process reads an uncompressed fastq file, it will run faster.
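Independently of what NxTrim itself writes, a gzipped output file can be decompressed once before it is handed to a downstream tool. The following is only a minimal sketch of that workaround in Python; `gunzip_fastq` and the file names are hypothetical and not part of NxTrim.

```python
import gzip
import shutil


def gunzip_fastq(gz_path, out_path, chunk_size=1 << 20):
    """Stream-decompress gz_path into out_path.

    The data is copied in chunks, so the whole file is never held in memory.
    """
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
```

For example, `gunzip_fastq("sample.mp.fastq.gz", "sample.mp.fastq")` would write a plain fastq next to the compressed one (both names are placeholders). This costs one extra single-threaded decompression pass and the disk space for the plain file, so it only pays off if the plain file is read more than once or by a very fast consumer.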