Commit

BUG Fix concatenation of multiple FastQ files
Adds a test too.
luispedro committed Jul 4, 2019
1 parent 43c3bcb commit bd2bed1
Showing 10 changed files with 41 additions and 24 deletions.
1 change: 1 addition & 0 deletions ChangeLog
@@ -1,6 +1,7 @@
Version 1.0.0+
* Reintroduce zstd compression (after fixes upstream)
* Fix CIGAR interpretation (#109) occurring when I is present
* Fix bug with external modules and multiple fastQ inputs

Version 1.0.0 2019-04-24 by luispedro
* Fix multiple features usage (#63)
35 changes: 16 additions & 19 deletions NGLess/Data/FastQ/Utils.hs
@@ -7,37 +7,34 @@ module Data.FastQ.Utils
import qualified Data.Conduit as C
import qualified Data.Conduit.Binary as CB
import Data.Conduit ((.|))
import System.IO (hClose)
import Control.Monad
import Control.Monad.Except
import Control.Monad (forM_)

import Data.List (isSuffixOf)

import NGLess.NGError

import FileManagement
import FileManagement (makeNGLTempFile)
import Utils.Conduit (linesC)
import Data.FastQ
import Data.FastQ (FastQFilePath(..), fqDecodeC, fqEncodeC)
import Data.Conduit.Algorithms.Async (asyncGzipTo, conduitPossiblyCompressedFile)

concatenateFQs :: [FastQFilePath] -> NGLessIO FastQFilePath
concatenateFQs [] = throwShouldNotOccur "Empty argument to concatenateFQs"
concatenateFQs [f] = return f
concatenateFQs (FastQFilePath enc fp:rest) = do
(fres, h) <- openNGLTempFile "concatenate" fp "fq.gz"
let catTo f enc'
| enc /= enc' =
conduitPossiblyCompressedFile f
.| linesC
.| fqDecodeC enc'
.| fqEncodeC enc
.| asyncGzipTo h
| ".gz" `isSuffixOf` f = CB.sourceFile f .| CB.sinkHandle h
| otherwise = conduitPossiblyCompressedFile f .| asyncGzipTo h
C.runConduitRes $ catTo fp enc
forM_ rest $ \(FastQFilePath enc' f') ->
C.runConduitRes (catTo f' enc')
liftIO $ hClose h
fres <- makeNGLTempFile fp "concatenate" "fq.gz" $ \hout -> do
let catTo f enc'
| enc /= enc' =
conduitPossiblyCompressedFile f
.| linesC
.| fqDecodeC enc'
.| fqEncodeC enc
.| asyncGzipTo hout
| ".gz" `isSuffixOf` f = CB.sourceFile f .| CB.sinkHandle hout
| otherwise = conduitPossiblyCompressedFile f .| asyncGzipTo hout
C.runConduitRes $ catTo fp enc
forM_ rest $ \(FastQFilePath enc' f') ->
C.runConduitRes (catTo f' enc')
return $ FastQFilePath enc fres
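
The hunk above replaces an explicitly opened handle (closed by hand with `hClose`) with a callback that receives the output handle. As a rough illustration of that callback style only, here is a minimal sketch using nothing but `base`. `withTempOutput` is a hypothetical stand-in: it is not `makeNGLTempFile`, whose implementation is not shown in this diff, and it runs in plain `IO` rather than `NGLessIO`.

```haskell
import Control.Exception (bracket)
import System.IO (Handle, hClose, hPutStr, openTempFile)

-- Hypothetical helper (an assumption for illustration, not NGLess code):
-- create a temporary file in 'dir', hand its handle to 'act', and close
-- the handle afterwards, even if 'act' throws. Returns the file's path.
withTempOutput :: FilePath -> String -> (Handle -> IO ()) -> IO FilePath
withTempOutput dir template act =
    bracket (openTempFile dir template)        -- acquire: (path, handle)
            (\(_, h) -> hClose h)              -- release: always close
            (\(fp, h) -> act h >> return fp)   -- use: run the callback

main :: IO ()
main = do
    -- Append several chunks to one output, loosely analogous to
    -- writing several inputs into a single concatenated file.
    fp <- withTempOutput "." "concatenate.txt" $ \h ->
        mapM_ (hPutStr h) ["first\n", "second\n", "third\n"]
    -- The handle is closed by now, so the file can be read back.
    readFile fp >>= putStr
```

With this shape the caller cannot forget to close the handle or return the path before writing has finished. Note that the sketch leaves its temporary file in the current directory; cleanup is omitted to keep it `base`-only.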


2 changes: 1 addition & 1 deletion NGLess/ExternalModules.hs
@@ -27,7 +27,7 @@ import Control.Monad
import System.Process
import System.Environment (getEnvironment, getExecutablePath)
import System.Directory (getDirectoryContents, doesFileExist, doesDirectoryExist)
import System.Exit
import System.Exit (ExitCode(..))
import System.IO
import System.FilePath
import Data.Maybe
5 changes: 4 additions & 1 deletion NGLess/FileManagement.hs
@@ -93,7 +93,10 @@ checkFilenameLength base ext = if len > 240
-- directory and deleting the file when necessary)
--
-- These files will be auto-removed when ngless exits
openNGLTempFile' :: FilePath -> String -> String -> NGLessIO (ReleaseKey, (FilePath, Handle))
openNGLTempFile' :: FilePath -- ^ basename
-> String -- ^ prefix
-> String -- ^ extension
-> NGLessIO (ReleaseKey, (FilePath, Handle))
openNGLTempFile' base prefix ext = do
tdir <- nConfTemporaryDirectory <$> nglConfiguration
liftIO $ createDirectoryIfMissing True tdir
13 changes: 13 additions & 0 deletions docs/sources/whatsnew.rst
@@ -2,6 +2,19 @@
What's New (History)
====================

Post Version 1.0
----------------

User-visible improvements
~~~~~~~~~~~~~~~~~~~~~~~~~
- ZSTD compression is available for output and intermediate files use it for
reduced temporary space usage (and possibly faster processing).

Bugfixes
~~~~~~~~

- Fix bug with external modules and multiple fastQ inputs.

Version 1.0
-----------

5 changes: 3 additions & 2 deletions tests/exampleExternalModule/example-cmd.ngl
@@ -1,7 +1,8 @@
ngless '0.0'
ngless '1.0'
import "example-cmd" version "0.0"
import "mocat" version "0.6"

input = fastq('sample.fq')
input = load_mocat_sample('sample')

testing(input)

1 change: 0 additions & 1 deletion tests/exampleExternalModule/sample.fq

This file was deleted.

1 change: 1 addition & 0 deletions tests/exampleExternalModule/sample/sample1.fq
1 change: 1 addition & 0 deletions tests/exampleExternalModule/sample/sample2.fq
1 change: 1 addition & 0 deletions tests/exampleExternalModule/sample/sample3.fq
