-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding ability to provide bed file for sambamba slice #307
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I put a few suggestions, but the only crucial bit is about array boundaries.
sambamba/slice.d
Outdated
@@ -321,7 +332,13 @@ int slice_main(string[] args) { | |||
if (args[2] == "*") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With new changes, if BED file is provided, this causes access outside array boundaries.
sambamba/slice.d
Outdated
|
||
if (args.length < 3) { | ||
if ( (bed_filename.length == 0 && args.length < 3) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be more readable to first check if args.length < 2
, then put args[2..$]
into a variable and then check if it's empty/non-empty.
sambamba/slice.d
Outdated
Region[] regions; | ||
if (bed_filename !is null) { | ||
auto bam_regions = parseBed(bed_filename, bam); | ||
regions = cast(Region[])bam_regions.map!(r => Region(bam.reference(r.ref_id).name, r.start, r.end)).array; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cast()
seems superfluous
.test_suite.sh
Outdated
testSliceMultipleRegionsBed() { | ||
./build/sambamba slice -L test/chr2_test_region.bed test/issue_204.bam |\ | ||
./build/sambamba view -c /dev/stdin > /dev/null | ||
assertEquals 0 $? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps check at least the number of reads? (see test below, the test above is admittedly not a good example)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could also add another BED with overlapping/repeated entries.
@godotgildor thank you for contributing to sambamba! Wish we got more of those. @lomereiter I think we can fix these ourselves. |
* Making the arg checks more readable * Avoiding accessing out of bounds array element if bedfile provided * Updating test to include * Actual check that # lines matches expected * Bed file has overlapping region
Hi, sorry it took me a while to get to the requested revisions. I think I have addressed the above concerns in the latest commit. Let me know what you think. |
Hi, no hurry, you're not the only busy person here. |
Hi Artem - I'm wondering if you make a quick release containing this change? I understand that making a proper release requires lots of testing and other associated work, but probably you could through some kind of |
Released. |
Much appreciated! |
I have added the ability to provide a bed file to specify regions for the sambamba slice operation. Users may specify either a bed file or the regions on the command line, but not both.
I have also added a test to the .test_suite.sh script to verify that it will slice and not return an error.
One note: the bed parser will combine regions, so if a bed file has two identical entries the sliced bam will only contain the given slice one time. This is unlike taking a region from the command line where no region combining is attempted. So if two identical regions are specified on the command line then the sliced bam would contain the given slice two times.