Increase efficiency of IoU scores with Linear Algebra #15

JacobGlennAyers · 2021-02-14T00:27:19Z

The current approach to finding the IoU scores uses an O(n^2) nested for-loop. It could be refactored to take advantage of NumPy's linear algebra capabilities and get closer to O(n). This is low priority for now since we only need a demo, but it will be crucial as we get closer to larger-scale tests.

JacobGlennAyers · 2021-03-24T05:15:56Z

Relabeled as medium priority due to the slow run-time when trying to garner statistics in the optimization process. It can take up to ten minutes to run the algorithm 100 times on six clips.

JacobGlennAyers · 2021-04-15T22:38:06Z

We would first want to add timing statements in the global_IoU_Statistics() function to identify where the largest bottlenecks are. I am pretty certain that generating the IoU matrices is the largest one. Linear algebra might be able to speed it up, though a simpler approach could be to use the "OFFSET" and "DURATION" column times directly instead of creating NumPy arrays for the task.
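As a rough illustration of the second idea, the sketch below computes pairwise IoU straight from start times and durations using NumPy broadcasting. The function name `interval_iou_matrix` and its argument names are hypothetical; only the "OFFSET" and "DURATION" columns come from the discussion above, and this is not the code that was eventually merged.

```python
import numpy as np

def interval_iou_matrix(manual_offsets, manual_durations, auto_offsets, auto_durations):
    """Pairwise IoU between manual (rows) and automated (columns) time intervals.

    Hypothetical helper: inputs are 1-D sequences of OFFSET and DURATION values.
    """
    # Column vector of manual intervals, row vector of automated intervals.
    m_start = np.asarray(manual_offsets, dtype=float)[:, None]
    m_end = m_start + np.asarray(manual_durations, dtype=float)[:, None]
    a_start = np.asarray(auto_offsets, dtype=float)[None, :]
    a_end = a_start + np.asarray(auto_durations, dtype=float)[None, :]

    # Overlap length of every pair; negative values mean no overlap, so clip at 0.
    intersection = np.clip(np.minimum(m_end, a_end) - np.maximum(m_start, a_start), 0.0, None)
    # |A ∪ B| = |A| + |B| - |A ∩ B|
    union = (m_end - m_start) + (a_end - a_start) - intersection

    # Leave IoU at 0 where the union is 0 (two zero-length annotations).
    return np.divide(intersection, union, out=np.zeros_like(intersection), where=union > 0)
```

The work is still proportional to the number of manual/automated pairs, but it runs inside NumPy rather than in a Python loop and never builds a per-sample array for each annotation.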

JacobGlennAyers · 2021-04-20T04:04:40Z

clip_IoU is currently the largest bottleneck
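A bottleneck like this can be located with a few timing statements. The sketch below is one minimal way to do it, assuming a hypothetical `timed` helper and a stand-in workload; the label name is illustrative and does not correspond to a specific line in the statistics code.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, totals):
    """Add the wall-clock time spent inside the `with` block to totals[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        totals[label] = totals.get(label, 0.0) + (time.perf_counter() - start)

totals = {}
with timed("clip_IoU", totals):
    # Stand-in for the real call being measured.
    sum(i * i for i in range(10_000))

# Print the accumulated time per label, slowest first.
for label, seconds in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{label}: {seconds:.4f}s")
```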

sprestrelski · 2022-07-12T22:29:47Z

In testing, I wrote a modified version of the original clip_IoU() method that skips calculating the IoU for automated annotations where the start time was greater than the manual start, and where the end time was less than the manual end. Its runtime is at most $O(n^2)$.

For the method rewritten with NumPy/linear algebra, I used matrix multiplication to calculate the intersection instead of a nested for-loop. This achieves the intended behavior of multiplying each row by each column. A nested for-loop was still used to calculate the union of each manual label with each automated label, but with NumPy operations (logical OR of the manual and automated annotations, then a sum over the result). Although the overall function runtime is still $O(n^2)$, it should be closer to the lower bound, as shown in the benchmarks below. Optimizing the union calculation could get the runtime below $O(n^2)$.
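A minimal sketch of the matrix-multiplication idea follows, assuming each annotation is stored as a row of 0/1 samples. It also shows one possible way to remove the union loop with the identity |A ∪ B| = |A| + |B| - |A ∩ B|. The name `mask_iou_matrix` is hypothetical and this is not the merged implementation.

```python
import numpy as np

def mask_iou_matrix(manual_masks, auto_masks):
    """IoU of every manual mask (rows) against every automated mask (columns).

    Hypothetical helper: both inputs are 2-D arrays of 0/1 values with one
    annotation per row and one time sample per column.
    """
    manual = np.asarray(manual_masks, dtype=np.int64)
    auto = np.asarray(auto_masks, dtype=np.int64)

    # Each row of `manual` times each column of `auto.T` counts shared samples.
    intersection = manual @ auto.T

    # Union without a loop: sizes of both masks minus the shared samples.
    union = manual.sum(axis=1)[:, None] + auto.sum(axis=1)[None, :] - intersection

    iou = np.zeros(intersection.shape, dtype=float)
    np.divide(intersection, union, out=iou, where=union > 0)
    return iou
```

The union computed this way equals the sum of the logical OR of the two masks, so the result should match the loop-based version.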

Benchmarks for Screaming Piha dataset (clip_IoU):
Metrics are included to show that all of the methods calculate IoU the same way.

Benchmarks for Screaming Piha dataset (automated_labeling_statistics):
Note: 6 of them generated a division by 0 error, leading to 0 for precision/recall/F1

Given that 25% of Mixed Bird (≈ 600 clips) produces 5000 annotations, this is still unreasonably slow for large amounts of data (93 × 254 comparisons is small compared to 5000 × 5000 comparisons).

The notebook can be found on the performance-iou branch.

Some ideas discussed with Nathan to further optimize this were:

- Use C instead
  - thrown out because of the NumPy refactor, unless you wanted to go the nuclear option
  - Pybind11: call C++ functions from Python directly
  - Working example: Radio Collar Tracker
- Switch from floating point to fixed point
- Sort and throw out non-relevant annotations (similar to the modified version)
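The "sort and throw out" idea could look roughly like the sketch below, which sorts the automated annotations by start time and uses `np.searchsorted` to discard everything that starts after a manual annotation ends. The function `candidate_pairs` and its arguments are hypothetical, and this only illustrates the pruning step, not a full IoU computation.

```python
import numpy as np

def candidate_pairs(manual_starts, manual_ends, auto_starts, auto_ends):
    """Return (manual_index, auto_index) pairs whose intervals actually overlap."""
    auto_starts = np.asarray(auto_starts, dtype=float)
    auto_ends = np.asarray(auto_ends, dtype=float)

    # Sort automated annotations by start time once.
    order = np.argsort(auto_starts, kind="stable")
    starts_sorted = auto_starts[order]
    ends_sorted = auto_ends[order]

    pairs = []
    for i, (m_start, m_end) in enumerate(zip(manual_starts, manual_ends)):
        # Everything from index `hi` onward starts at or after the manual end.
        hi = np.searchsorted(starts_sorted, m_end, side="left")
        # Of the remaining annotations, keep those that end after the manual start.
        keep = ends_sorted[:hi] > m_start
        pairs.extend((i, int(j)) for j in order[:hi][keep])
    return pairs
```

Only the returned pairs would then need an IoU calculation; every other pair has an intersection of 0.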

sprestrelski · 2022-07-13T00:48:56Z

Throwing out annotations with an intersection of 0 makes the function much faster, while giving equivalent statistics. This makes it actually feasible to run on large datasets 😃

Dataset	Method	skip	lin-alg	lin-alg + skip
Screaming Piha	clip_IoU	286.4s	171.8s	23.4s
Screaming Piha	automated_labeling_statistics()	49.9s	31.1s	5.6s
25% Mixed Bird	automated_labeling_statistics()	-	2221.1s	337.7s

Notebook and relevant files (e.g. modified statistics.py file) were moved to the passive-acoustic-biodiversity repo.

Description of algorithms used:
skip: modified original version, skipping calculations when end<min, start>max
lin-alg: matrix transposition for intersection, nested for loop for the union
lin-alg + skip: matrix transposition for intersection, nested for loop, and skipping calculations when intersection = 0 (upper bound = $O(n^2)$)
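The "lin-alg + skip" combination can be sketched as follows, assuming 0/1 masks with one annotation per row. The intersection comes from a single matrix product, and the union loop only visits pairs whose intersection is nonzero. The name `iou_skip_zero` is hypothetical and the code is an illustration rather than the merged implementation.

```python
import numpy as np

def iou_skip_zero(manual_masks, auto_masks):
    """IoU matrix where the union is only computed for overlapping pairs."""
    manual = np.asarray(manual_masks, dtype=bool)
    auto = np.asarray(auto_masks, dtype=bool)

    # Shared samples for every manual/automated pair in one matrix product.
    intersection = manual.astype(np.int64) @ auto.astype(np.int64).T

    iou = np.zeros(intersection.shape, dtype=float)
    # Pairs with intersection 0 keep an IoU of 0 and are never visited.
    for i, j in zip(*np.nonzero(intersection)):
        union = np.logical_or(manual[i], auto[j]).sum()
        iou[i, j] = intersection[i, j] / union
    return iou
```

The worst case is still $O(n^2)$ when every pair overlaps, but in practice most manual/automated pairs in a clip do not overlap, which is consistent with the speedup reported in the table above.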

Benchmarks for Screaming Piha dataset (clip_IoU):

Benchmarks for Screaming Piha dataset (automated_labeling_statistics):

Benchmarks for 25% Mixed Bird (automated_labeling_statistics):

Refactors clip_IoU method with linear algebra and numpy

JacobGlennAyers added Low Priority Non-Trivial labels Feb 14, 2021

JacobGlennAyers self-assigned this Mar 4, 2021

JacobGlennAyers added Medium Priority and removed Low Priority labels Mar 24, 2021

JacobGlennAyers added Challenging and removed Non-Trivial labels Apr 20, 2021

JacobGlennAyers assigned ryleigh-3-14159 Apr 20, 2021

JacobGlennAyers added this to the Improving Temporal Performance of Functions milestone Apr 20, 2021

JacobGlennAyers assigned JacobGlennAyers and unassigned JacobGlennAyers and ryleigh-3-14159 May 1, 2021

JacobGlennAyers assigned sprestrelski May 25, 2022

sprestrelski mentioned this issue Jul 13, 2022

Increase efficiency of IoU scores (#15) #129

Merged

sprestrelski closed this as completed in #129 Jul 18, 2022

sprestrelski added a commit that referenced this issue Jul 18, 2022

Increase efficiency of IoU scores (#15)

fa91052

Refactors clip_IoU method with linear algebra and numpy
