Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding ability to provide bed file for sambamba slice #307

Merged
merged 7 commits into from
Aug 31, 2017
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions .test_suite.sh
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,12 @@ testSliceMultipleRegions() {
assertEquals 0 $?
}

testSliceMultipleRegionsBed() {
./build/sambamba slice -L test/chr2_test_region.bed test/issue_204.bam |\
./build/sambamba view -c /dev/stdin > /dev/null
assertEquals 0 $?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps check at least the number of reads? (see test below, the test above is admittedly not a good example)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also add another BED with overlapping/repeated entries.

}

testView() {
assertEquals "1806" `./build/sambamba view -c ex1_header.sorted.bam chr2`
assertEquals "1464" `./build/sambamba view -c ex1_header.sorted.bam chr1`
Expand Down
23 changes: 20 additions & 3 deletions sambamba/slice.d
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,7 @@ import std.parallelism;
import std.conv;
import std.stdio;
import std.exception;
import sambamba.utils.common.bed;

import sambamba.utils.common.overwrite;

Expand Down Expand Up @@ -167,6 +168,7 @@ void fetchRegions(BamReader bam, Region[] regions, ref Stream stream)
// write R1
foreach (read; r1.data)
writer.writeRecord(read);
// Flush the BamWriter before our copyAsIs call.
writer.flush();

copyAsIs(bam, stream, s2_start_offset, s2_end_offset);
Expand Down Expand Up @@ -271,6 +273,8 @@ void printUsage()
stderr.writeln();
stderr.writeln("OPTIONS: -o, --output-filename=OUTPUT_FILENAME");
stderr.writeln(" output BAM filename");
stderr.writeln(" -L, --regions=FILENAME");
stderr.writeln(" output only reads overlapping one of regions from the BED file");
}

version(standalone) {
Expand All @@ -283,17 +287,24 @@ int slice_main(string[] args) {

// at least two arguments must be presented
string output_filename = null;
string bed_filename = null;

try {
getopt(args,
std.getopt.config.caseSensitive,
"output-filename|o", &output_filename);
"output-filename|o", &output_filename,
"regions|L", &bed_filename);

if (args.length < 3) {
if ( (bed_filename.length == 0 && args.length < 3) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be more readable to first check if args.length < 2, then put args[2..$] into a variable and then check if it's empty/non-empty.

(args.length < 2) ) {
printUsage();
return 0;
}

if (bed_filename.length > 0 && args.length > 2) {
throw new Exception("specifying both region and BED filename is disallowed");
}

protectFromOverwrite(args[1], output_filename);

import std.parallelism;
Expand Down Expand Up @@ -321,7 +332,13 @@ int slice_main(string[] args) {
if (args[2] == "*") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With new changes, if BED file is provided, this causes access outside array boundaries.

fetchUnmapped(bam, stream);
} else {
auto regions = map!parseRegion(args[2 .. $]).array;
Region[] regions;
if (bed_filename !is null) {
auto bam_regions = parseBed(bed_filename, bam);
regions = cast(Region[])bam_regions.map!(r => Region(bam.reference(r.ref_id).name, r.start, r.end)).array;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cast() seems superfluous

} else {
regions = cast(Region[])map!parseRegion(args[2 .. $]).array;
}
fetchRegions(bam, regions, stream);
}

Expand Down
3 changes: 3 additions & 0 deletions test/chr2_test_region.bed
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
2 166860000 166870000
2 166875000 166890000
3 100000000 200000000