Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase efficiency of IoU scores with Linear Algebra #15

Closed
JacobGlennAyers opened this issue Feb 14, 2021 · 5 comments · Fixed by #129
Closed

Increase efficiency of IoU scores with Linear Algebra #15

JacobGlennAyers opened this issue Feb 14, 2021 · 5 comments · Fixed by #129

Comments

@JacobGlennAyers
Copy link
Contributor

The current approach to finding the IoU scores is an O(n^2) nested for-loop approach. It can be refactored taking advantage of numpy's linear algebra capabilities to be closer to O(n). Low priority for now since we merely need to demo, but will be crucial as we get closer to larger-scale tests.

@JacobGlennAyers
Copy link
Contributor Author

Relabeled as medium priority due to the slow run-time when trying to garner statistics in the optimization process. It can take up to ten minutes to run the algorithm 100 times on six clips.

@JacobGlennAyers
Copy link
Contributor Author

We would first want to add a bunch of timing statements in the global_IoU_Statistics() function to identify where the largest timing bottlenecks exist. I am pretty certain the generation of the IoU Matrices are the largest bottleneck. Linear algebra might be able to speed it up, though a simpler approach could be simply using the "OFFSET" and "DURATION" column times directly instead of creating numpy arrays for the task.

@JacobGlennAyers
Copy link
Contributor Author

clip_IoU is currently the largest bottleneck

@sprestrelski
Copy link
Member

In testing, I wrote a modified version of the original clip_IoU() method where I skipped calculating the IoU for automated annotations where the start time was greater than the manual start, and where the end time was less than the manual end. This runtime is at most $O(n^2)$.

For the NumPy/linear algebra rewritten method, I used matrix multiplication to calculate the intersection, instead of a nested for loop. This achieves the intended behavior of multiply each row by each column. A nested for-loop was still used to calculate the union of each manual label to each annotated label, but using numpy instead (logical ORR of the manual and automated annotation, then sum the list). Although the overall function runtime is still $O(n^2)$, it should be closer to the lower bound, as shown in the benchmarks below. Optimizing the union calculation could get the runtime below $O(n^2)$.

Benchmarks for Screaming Piha dataset (clip_IoU):
Metrics are shown to show that IoU is calculated the same way between all of the methods.
image

Benchmarks for Screaming Piha dataset (automated_labeling_statistics):
Note: 6 of them generated a division by 0 error, leading to 0 for precision/recall/F1
image

Given that 25% ≈ 600 clips of Mixed Bird produces 5000 annotations, it is still unreasonable to use for large amounts of data (93 * 254 comparisons is small compared to 5000 x 5000 comparisons)

The notebook can be found on the performance-iou branch.

Some ideas discussed with Nathan to further optimize this were:

  • Use C instead
  • Switch from floating point to fixed point
  • Sort and throw out non-relevant annotations (similar to modified version)

@sprestrelski
Copy link
Member

sprestrelski commented Jul 13, 2022

Throwing out annotations with an intersection of 0 makes the function much faster, while giving equivalent statistics. This makes it actually feasible to run on large datasets 😃

Dataset Method skip lin-alg lin-alg + skip
Screaming Piha clip_IoU 286.4s 171.8s 23.4s
Screaming Piha automated_labeling_statistics() 49.9s 31.1s 5.6s
25% Mixed Bird automated_labeling_statistics() - 2221.1s 337.7s

Notebook and relevant files (e.g. modified statistics.py file) were moved to the passive-acoustic-biodiversity repo.

Description of algorithms used:
skip: modified original version, skipping calculations when end<min, start>max
lin: matrix transposition for intersection, nested for loop
lin-skip: matrix transposition for intersection, nested for loop and skipping calculations when intersection = 0 (highest bound = $O(n^2)$

Benchmarks for Screaming Piha dataset (clip_IoU):
image
Benchmarks for Screaming Piha dataset (automated_labeling_statistics):
image

Benchmarks for 25% Mixed Bird (automated_labeling_statistics):
image

sprestrelski added a commit that referenced this issue Jul 18, 2022
Refactors clip_IoU method with linear algebra and numpy
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants