Increase efficiency of IoU scores with Linear Algebra #15
Relabeled as medium priority due to the slow run-time when trying to garner statistics in the optimization process. It can take up to ten minutes to run the algorithm 100 times on six clips.
We would first want to add a bunch of timing statements to the global_IoU_Statistics() function to identify where the largest timing bottlenecks exist. I am pretty certain the generation of the IoU matrices is the largest bottleneck. Linear algebra might be able to speed it up, though a simpler approach could be to use the "OFFSET" and "DURATION" column times directly instead of creating numpy arrays for the task; a sketch of both ideas follows.
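A minimal sketch of what that could look like, assuming each annotation row carries "OFFSET" and "DURATION" values in seconds. interval_iou is a hypothetical helper for illustration, not a function from the project's statistics.py:

```python
import time

def interval_iou(offset_a, duration_a, offset_b, duration_b):
    # IoU of two time intervals computed straight from their
    # OFFSET/DURATION values, with no intermediate numpy arrays.
    end_a = offset_a + duration_a
    end_b = offset_b + duration_b
    intersection = max(0.0, min(end_a, end_b) - max(offset_a, offset_b))
    union = duration_a + duration_b - intersection
    return intersection / union if union > 0.0 else 0.0

# The kind of timing statement proposed for global_IoU_Statistics():
start = time.perf_counter()
score = interval_iou(1.5, 2.0, 2.0, 3.0)
print(f"interval_iou = {score:.3f} ({time.perf_counter() - start:.6f}s)")
```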
clip_IoU is currently the largest bottleneck
In testing, I wrote a modified version of the original clip_IoU() method that skips calculating the IoU for automated annotations whose start time is greater than the manual start and whose end time is less than the manual end. This runtime is at most […].

For the NumPy/linear algebra rewrite, I used matrix multiplication to calculate the intersection instead of a nested for-loop; this achieves the intended behavior of multiplying each row by each column. A nested for-loop is still used to calculate the union of each manual label against each annotated label, but with numpy (logical OR of the manual and automated annotation, then sum the result), so the overall function runtime is still quadratic in the number of annotations. A sketch of this rewrite follows this comment.

Benchmarks for Screaming Piha dataset (clip_IoU): [figures in original comment]
Benchmarks for Screaming Piha dataset (automated_labeling_statistics): [figures in original comment]

Given that 25% (≈ 600 clips) of Mixed Bird produces 5000 annotations, it is still unreasonable to use for large amounts of data: 93 × 254 comparisons is small compared to 5000 × 5000 comparisons.

The notebook can be found on the performance-iou branch. Some ideas to further optimize this were discussed with Nathan.
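A minimal sketch of the rewrite described above, assuming the annotations have already been rasterized into binary presence masks over fixed time bins. The function name and mask arguments are illustrative, not the project's actual clip_IoU signature:

```python
import numpy as np

def clip_iou_matrix(manual_masks, automated_masks):
    # manual_masks:    (n_manual, n_bins) 0/1 array, one row per manual label
    # automated_masks: (n_auto,   n_bins) 0/1 array, one row per automated label

    # Matrix multiplication multiplies each row by each column, so entry
    # (i, j) counts the time bins where manual i and automated j overlap.
    intersection = manual_masks @ automated_masks.T  # shape (n_manual, n_auto)

    # The union is still a nested for-loop over every pair, but done with
    # numpy per pair: logical OR of the two masks, then sum the result.
    union = np.zeros(intersection.shape, dtype=float)
    for i in range(manual_masks.shape[0]):
        for j in range(automated_masks.shape[0]):
            union[i, j] = np.logical_or(manual_masks[i], automated_masks[j]).sum()

    return np.divide(intersection, union,
                     out=np.zeros_like(union), where=union > 0)

manual = np.array([[1, 1, 0, 0], [0, 0, 1, 1]])
automated = np.array([[0, 1, 1, 0]])
print(clip_iou_matrix(manual, automated))  # [[0.333...], [0.333...]]
```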
Throwing out annotations with an intersection of 0 makes the function much faster, while giving equivalent statistics. This makes it actually feasible to run on large datasets 😃
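A minimal sketch of that filtering, building on the mask-based sketch above (names are illustrative): pairs with zero intersection are skipped outright, since their IoU is exactly 0 either way, which is why the statistics come out equivalent.

```python
import numpy as np

def clip_iou_matrix_filtered(manual_masks, automated_masks):
    intersection = manual_masks @ automated_masks.T
    iou = np.zeros(intersection.shape, dtype=float)

    # Only pairs that actually overlap need the expensive union pass;
    # everything else keeps the IoU of 0 it would have had anyway.
    for i, j in zip(*np.nonzero(intersection)):
        union = np.logical_or(manual_masks[i], automated_masks[j]).sum()
        iou[i, j] = intersection[i, j] / union

    return iou
```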
The notebook and relevant files (e.g. the modified statistics.py file) were moved to the passive-acoustic-biodiversity repo.

Description of algorithms used: [see original comment]
Benchmarks for Screaming Piha dataset (clip_IoU): [figures in original comment]
Benchmarks for 25% Mixed Bird (automated_labeling_statistics): [figures in original comment]
Refactors clip_IoU method with linear algebra and numpy
The current approach to finding the IoU scores is an O(n^2) nested for-loop. It can be refactored to take advantage of numpy's linear algebra capabilities so the pairwise work runs in vectorized numpy operations instead of Python loops, bringing the Python-level cost closer to O(n); a sketch follows. Low priority for now since we merely need to demo, but this will be crucial as we get closer to larger-scale tests.
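One way such a refactor could look, as a hedged sketch using numpy broadcasting on start/end times. All names here are illustrative; the total number of pairwise comparisons is still n × m, but the nested Python loop disappears:

```python
import numpy as np

def pairwise_interval_iou(manual_starts, manual_ends, auto_starts, auto_ends):
    # Broadcasting an (n, 1) column against an (m,) row yields (n, m)
    # matrices, replacing the nested Python loop with vectorized numpy ops.
    overlap_start = np.maximum(manual_starts[:, None], auto_starts[None, :])
    overlap_end = np.minimum(manual_ends[:, None], auto_ends[None, :])
    intersection = np.clip(overlap_end - overlap_start, 0.0, None)

    manual_len = (manual_ends - manual_starts)[:, None]
    auto_len = (auto_ends - auto_starts)[None, :]
    union = manual_len + auto_len - intersection

    return np.divide(intersection, union,
                     out=np.zeros_like(union), where=union > 0)

iou = pairwise_interval_iou(np.array([0.0, 5.0]), np.array([2.0, 9.0]),
                            np.array([1.0]), np.array([3.0]))
print(iou)  # [[0.333...], [0.0]]
```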