Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented watershed algorithm for finding local minima in 1D data based on topological persistence. #3515

Merged
merged 2 commits into from
Sep 13, 2017

Conversation

samuelklee
Copy link
Contributor

This bit of code will be used for the new segmentation method, but could be more generally useful as well. I put it in a utils package within the copynumber package for now, but we can put it somewhere else if more appropriate.

@samuelklee samuelklee force-pushed the sl_persistence_optimizer branch 2 times, most recently from b25c8ea to 2688531 Compare August 25, 2017 19:21
@samuelklee
Copy link
Contributor Author

@asmirnov239 Please review!

@samuelklee samuelklee force-pushed the sl_persistence_optimizer branch 2 times, most recently from daf3b49 to fe8ba43 Compare August 25, 2017 19:28
@codecov-io
Copy link

codecov-io commented Aug 25, 2017

Codecov Report

Merging #3515 into master will increase coverage by 0.012%.
The diff coverage is 84.946%.

@@               Coverage Diff               @@
##              master     #3515       +/-   ##
===============================================
+ Coverage     79.905%   79.917%   +0.012%     
- Complexity     17918     17945       +27     
===============================================
  Files           1199      1200        +1     
  Lines          65102     65195       +93     
  Branches       10142     10160       +18     
===============================================
+ Hits           52020     52102       +82     
- Misses          9042      9049        +7     
- Partials        4040      4044        +4
Impacted Files Coverage Δ Complexity Δ
...umber/utils/optimization/PersistenceOptimizer.java 84.946% <84.946%> (ø) 27 <27> (?)
...oadinstitute/hellbender/utils/gcs/BucketUtils.java 78.571% <0%> (+0.649%) 39% <0%> (ø) ⬇️
...e/hellbender/engine/spark/SparkContextFactory.java 73.973% <0%> (+2.74%) 11% <0%> (ø) ⬇️

.mapToDouble(i -> (double) i / numPoints - 0.5)
.map(x -> (x - 0.75) * (x - 0.5) * (x - 0.25) * (x - 0.1) * (x + 0.25) * (x + 0.5) + 0.0001 * rng.nextGaussian())
.toArray();
System.out.println(Arrays.stream(dataPolynomial).boxed().map(Object::toString).collect(Collectors.joining(", ")));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops! Reminder to self to remove this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Collaborator

@asmirnov239 asmirnov239 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done with the review. Looks great, just a few minor comments!

/**
* Identifies the local minima of {@code data} based on topological persistence upon construction.
*/
public PersistenceOptimizer(final double[] data) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you leave a description here of the what the data parameter represents, i.e. that it's ordered values of a 1D function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


//find extrema pairs via a watershed algorithm
private static List<ExtremaPair> findExtremaPairs(final double[] data,
final List<Integer> sortedIndices) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that sortedIndices has to be stably sorted (which it was in the constructor) for the algorithm to work correctly (since if we have a local constant interval in the provided function, we will could get a lot of spurious components). Could you still mention this requirement here for the future devs, and also add an extra test for this test case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expanded the comment. I have a test case where the entire function is constant, but I'll add one like you describe.

extremaPairs.add(new ExtremaPair(data, components.get(rightColor).minIndex, index));
}
mergeComponents(components, colors, leftColor, rightColor);
colors[index] = colors[index - 1]; //local maxmimum takes color of component to left by convention
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maxmimum -> maximum

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

extremaPairs.add(new ExtremaPair(data, components.get(rightColor).minIndex, index));
}
mergeComponents(components, colors, leftColor, rightColor);
colors[index] = colors[index - 1]; //local maxmimum takes color of component to left by convention
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little confused here. Isn't the local maximum a part of a larger component at this point in the code, color of which is determined by which of the merged components has the lowest minimum? Also can't we just skip recoloring of the local maximum as we skip recoloring of non-boundary component values in the mergeComponents method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the maximum has been assigned a color yet, since we are traversing the points in ascending order, and both createComponent and extendComponent only assign colors to the point under consideration.

You are right in that assigning a color is not necessary just to find and order the local minima, but it's nice to have all points assigned a color at the end (and that's probably what would reasonably be expected from anyone looking to reuse this code in the future).

But perhaps it would be more consistent to also assign the color within mergeComponents; i.e., I'll change the signature of mergeComponents so that we call mergeComponents(components, colors, leftColor, rightColor, index) and will move the assignment of the color into the method accordingly.

colors[components.get(indexToMerge).leftIndex] = indexToKeep;
colors[components.get(indexToMerge).rightIndex] = indexToKeep;
if (components.get(indexToKeep).minIndex > components.get(indexToMerge).minIndex) {
components.get(indexToKeep).leftIndex = components.get(indexToMerge).leftIndex;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not removing the components.get(indexToMerge) component may introduce some space overhead, which could be linear in most degenerate examples, but I don't see a quick work-around..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically true, but space considerations are probably not an issue in most cases (and I think actually removing "merged" components also complicates the bookkeeping a bit).

.toArray();
final List<Integer> minimaIndicesExpectedIdentityNegative = Collections.singletonList(numPoints - 1);

final double[] dataFlat = new double[numPoints];
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add an extra test case, which has a flat part, but also has a single maxima and minima?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion---the result of that test is actually a little non-intuitive! Added some extra cases to boot.

@samuelklee samuelklee force-pushed the sl_persistence_optimizer branch 5 times, most recently from 2c5d6b5 to 5b039fd Compare September 13, 2017 04:17
@samuelklee
Copy link
Contributor Author

samuelklee commented Sep 13, 2017

Addressed review comments, expanded documentation, added some test cases, and did some minor renaming/cleanup. Thanks, back to you @asmirnov239!

@asmirnov239
Copy link
Collaborator

asmirnov239 commented Sep 13, 2017

The changes look good @samuelklee!

@samuelklee samuelklee merged commit d44babd into master Sep 13, 2017
@samuelklee samuelklee deleted the sl_persistence_optimizer branch September 13, 2017 20:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants