Optimize MLLM ambiguity calculation #825

osma · 2024-12-20T12:55:28Z

This PR attempts to optimize the calculation of the ambiguity feature in MLLM. When a text produces an extremely large number of matches, the ambiguity calculation can take a long time, as reported by @RietdorfC in #822.

This PR changes the calculation so that it first groups together TokenSets with the same tokens; for example, all concepts with the same (or very similar) label can be considered together instead of calculating ambiguity for each of them separately. Because the ambiguity calculation is quadratic (O(N^2), where N is the number of matches found in a piece of text), reducing N by grouping TokenSets may reduce the number of necessary comparisons quite drastically.
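
To illustrate the idea (this is a simplified sketch, not the code in this PR): assume each match is a `(subject_id, tokens)` pair with unique subject IDs, and assume that the ambiguity of a match is the number of other matches whose tokens contain all of its tokens. Both the function names and this definition of ambiguity are assumptions made for the example; the actual MLLM code may define it differently.

```python
from collections import defaultdict


def ambiguity_naive(matches):
    """Compare every match against every other match: O(N^2) in the number of matches."""
    ambiguity = {}
    for i, (subj_a, toks_a) in enumerate(matches):
        count = 0
        for j, (_, toks_b) in enumerate(matches):
            # toks_a <= toks_b is True when toks_b contains all tokens of toks_a
            if i != j and toks_a <= toks_b:
                count += 1
        ambiguity[subj_a] = count
    return ambiguity


def ambiguity_grouped(matches):
    """Group matches with identical tokens first, then compare groups: O(G^2) in the number of groups."""
    groups = defaultdict(list)
    for subj, toks in matches:
        groups[toks].append(subj)

    ambiguity = {}
    for toks_a, subjs_a in groups.items():
        # matches within the same group all contain each other
        count = len(subjs_a) - 1
        for toks_b, subjs_b in groups.items():
            if toks_a != toks_b and toks_a <= toks_b:
                # one set comparison accounts for every member of the other group
                count += len(subjs_b)
        for subj in subjs_a:
            ambiguity[subj] = count
    return ambiguity


# Hypothetical matches: subjects 1 and 2 share the same tokens
matches = [
    (1, frozenset({"a"})),
    (2, frozenset({"a"})),
    (3, frozenset({"a", "b"})),
    (4, frozenset({"c"})),
]
print(ambiguity_naive(matches))
print(ambiguity_grouped(matches))
```

Under these assumptions both functions give the same result, but the grouped version performs set comparisons only between distinct token sets, so the saving grows with the number of matches that share identical tokens.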

I'm leaving this as a draft PR because this needs to be tested further. I'm not yet 100% sure that the calculation result matches the original.

Closes #822

sonarqubecloud · 2024-12-20T12:55:57Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

codecov · 2024-12-20T12:58:15Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.63%. Comparing base (8f13d7d) to head (5163ae3).

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #825   +/-   ##
=======================================
  Coverage   99.63%   99.63%           
=======================================
  Files          95       95           
  Lines        7171     7182   +11     
=======================================
+ Hits         7145     7156   +11     
  Misses         26       26


optimize MLLM ambiguity calculation

5163ae3

osma added the enhancement label Dec 20, 2024

osma added this to the 1.3 milestone Dec 20, 2024

osma self-assigned this Dec 20, 2024

osma mentioned this pull request Dec 20, 2024

Slow calculation of ambiguity feature in MLLM #822

Open
