Fixed concatenation of many VCF files #309
Merged
PR checklist
Description
Fixes a race hazard that occurs during VCF file concatenation. The issue is subtle, so let me elaborate. These are the two lines causing the problem:
The hazard comes from the fact that sometimes head exits before ls and sometimes it is the other way around. When you have a small number of *.vcf files, ls usually prints everything to stdout quickly and exits. Then head runs, reads the content from stdin (piped from ls), and exits too. No error is reported.

Things work differently when you have many *.vcf files. In this case it takes some time for ls to print everything to stdout. head runs earlier, reads the input, prints the first line, and exits. head is optimized to exit as soon as its work is complete, so that when you run head on a 100 GB file it reads only the first line and not the whole file. Unlike most other applications, it does not wait for its stdin to be closed before exiting. As soon as head gets one line from stdin and prints it to stdout, it exits, which closes its stdin. However, ls is still trying to write to its stdout, which is piped to the now-closed stdin of head. The write fails, ls is terminated by SIGPIPE, and the shell reports exit code 141 (128 + 13, the signal number of SIGPIPE), i.e. a broken pipe.
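The effect described above can be reproduced without thousands of files. The following is a minimal sketch, not the script touched by this PR: it swaps ls for `yes`, which writes forever and so is guaranteed to still be writing when head exits. The variable names and the temporary status file are illustrative assumptions.

```shell
# Sketch: observe the exit status of the writing side of a pipe
# after the reading side (head) has already exited.
# `yes` stands in for an `ls` that has a lot of output left to write.

status_file=$(mktemp)

# The brace group runs in a subshell; only `yes` writes to the pipe.
# Its exit status is stored in a file, because the pipeline's own
# status is that of the last command (head) unless pipefail is set.
{ yes; echo "$?" > "$status_file"; } | head -n 1 > /dev/null

producer_status=$(cat "$status_file")
rm -f "$status_file"

# On a typical Linux setup this is 141 (128 + 13, SIGPIPE).
# If SIGPIPE is ignored in the environment, the writer sees a
# write error instead and exits with a different non-zero status.
echo "producer exit status: $producer_status"
```

head itself exits with 0 here; only the producer fails. A script that treats any failing pipeline member as fatal will therefore abort even though the output it wanted was produced correctly.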
This hazard clearly depends on the number of *.vcf files and probably also on implementation details of your OS, ls, and head. At my company we had a huge sample that produced many VCF files, and concatenation failed consistently. When you run
a broken pipe is expected and is not a problem. The script should not exit in this situation, as it currently does.
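One way to sidestep the broken pipe entirely is to pick the first file with a shell glob instead of piping ls into head. This is only a sketch of that idea and not necessarily the change made in this PR; the helper name `first_vcf` and the demo directory are hypothetical.

```shell
# Hypothetical helper: print the first *.vcf file in a directory.
# No pipe is involved, so no process can receive SIGPIPE.
first_vcf() {
    for f in "$1"/*.vcf; do
        [ -e "$f" ] || continue   # the glob matched nothing
        printf '%s\n' "$f"
        return 0                  # stop after the first match
    done
    return 1                      # no *.vcf file found
}

# Demo in a throwaway directory (assumed file names).
demo_dir=$(mktemp -d)
touch "$demo_dir/b.vcf" "$demo_dir/a.vcf" "$demo_dir/c.vcf"

first=$(first_vcf "$demo_dir")
echo "first file: $first"

rm -rf "$demo_dir"
```

Glob results are sorted by the shell, so the order matches what a plain ls would list in the same locale. If the pipeline form is kept instead, the alternative is to treat exit code 141 from ls as success rather than as a failure.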