tl;dr Using a sequence alignment algorithm can improve fuzzy matching quality, as fzf does. But this approach is much simpler, without sacrificing quality. See the following example:
What helm-M-x returns for the pattern `c m`:
0. c++-mode ; deserved champion
1. customize
2. eldoc-mode
4. company-mode ; this should be the runner-up, because `c` and `m` both match Head positions
counsel-M-x:
0. customize-group ; `m` matches a Tail position, so this should be given lower weight
1. chmod ; this is short so it comes early, but `m` is at a Tail position
4. c++-mode ; our champion shouldn't be here
5. css-mode
..........
I have read many fuzzy matching algorithms, including the one used by fzf. It is too complex and makes a mess of its heuristics.
The heuristics are gathered and coded in one place, `FuzzyMatcher::MatchScore`, instead of being scattered all over the code (which is unfortunately the case for many other fuzzy matching algorithms). I have also done something similar for rofi, which you can try by appending `-sort -matching fuzzy` to its command line.
I'll briefly explain the current heuristics. Similar rules can be added.
    int FuzzyMatcher::MatchScore(int i, int j, bool last) {
      int s = 0;
      // Bonus for case match
      if (pat[i] == text[j]) {
        s++;
        // Bonus for prefix match or case match when the pattern contains upper-case letters
        if ((pat_set & 1 << Upper) || i == j)
          s++;
      }
      // For initial positions of pattern words
      if (pat_role[i] == Head) {
        // Bonus if it is matched to an initial position of some text word
        if (text_role[j] == Head)
          s += 30;
        // Penalty for non-initial positions
        else if (text_role[j] == Tail)
          s -= 10;
      }
      if (text_role[j] == Tail && i && !last)
        s -= 30;
      // The first character of pattern is matched to a non-initial position in the text
      if (i == 0 && text_role[j] == Tail)
        s -= 40;
      return s;
    }
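The scoring function relies on `pat_role` and `text_role`, which mark each character as the start of a word (Head) or a later character of a word (Tail). How those roles are computed is not shown here, so the following is only a minimal sketch under an assumed rule: an alphanumeric character is a Head if it starts the string, follows a non-alphanumeric character, or is an upper-case letter right after a lower-case one. `ComputeRoles`, the `Other` label, and the enum layout are hypothetical and are not taken from the linked code.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical role labels: Head = first character of a word, Tail = any
// later word character, Other = separators such as '-' or '_'.
enum Role { Other, Tail, Head };

// Assumed word-boundary rule (an illustration, not the original code):
// an alphanumeric character is a Head if it starts the string, follows a
// non-alphanumeric character, or is upper-case right after a lower-case one.
std::vector<Role> ComputeRoles(const std::string& s) {
  std::vector<Role> roles(s.size(), Other);
  for (std::size_t k = 0; k < s.size(); k++) {
    unsigned char c = static_cast<unsigned char>(s[k]);
    if (!std::isalnum(c)) continue;  // separators keep the Other role
    unsigned char prev = k > 0 ? static_cast<unsigned char>(s[k - 1]) : 0;
    bool prev_is_word = k > 0 && std::isalnum(prev);
    bool camel_hump = k > 0 && std::isupper(c) && std::islower(prev);
    roles[k] = (!prev_is_word || camel_hump) ? Head : Tail;
  }
  return roles;
}
```

Under this rule, `company-mode` has Heads at the `c` and at the `m` of `mode`, while the `m` in `company` is a Tail, which is the distinction the bonuses and penalties in `MatchScore` act on.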
One caveat is that, to save space, I use a compressed two-row table `dp[2][n][2]` instead of the full `dp[m][n][2]`. This way, we lose the ability to reconstruct the optimal path (matching positions).
Given the pattern `cm`, the sequence alignment algorithm will prefer [c]ompany-[m]ode over [c]o[m]pany-mode, because the initial positions of words are given more weight. However, since the full table is not stored, the matched positions cannot be recovered for highlighting. This should only cause a minor visual glitch.
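To make the two-row table concrete, here is a minimal sketch of a sequence alignment score that keeps only `dp[2][n+1][2]`. It is not the linked implementation: `FuzzyScore`, `IsHead`, `SimpleMatchScore`, the weights, and the flat cost of 1 per skipped text character are all assumptions chosen to keep the example short. The last index records whether the previous text character was matched (1) or skipped (0).

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <string>
#include <vector>

// Sentinel for "no valid alignment"; far from INT_MIN so additions cannot overflow.
constexpr int kMinScore = -1000000;

// Assumed rule: a word starts at an alphanumeric character that is not
// preceded by another alphanumeric character.
static bool IsHead(const std::string& text, int j) {
  if (!std::isalnum(static_cast<unsigned char>(text[j]))) return false;
  return j == 0 || !std::isalnum(static_cast<unsigned char>(text[j - 1]));
}

// Simplified stand-in for MatchScore with assumed weights. `last` is true
// when text[j-1] was matched to pat[i-1], i.e. the match is consecutive.
static int SimpleMatchScore(const std::string& text, int i, int j, bool last) {
  bool head = IsHead(text, j);
  int s = 1;
  if (head) s += 30;                 // word-initial position
  else if (i > 0 && !last) s -= 30;  // non-initial position reached after a gap
  if (i == 0 && !head) s -= 40;      // pattern starts in the middle of a word
  return s;
}

// Best alignment score of `pat` against `text`. For inputs of ordinary
// length the result is below kMinScore / 2 when `pat` is not a subsequence.
int FuzzyScore(const std::string& pat, const std::string& text) {
  const int m = static_cast<int>(pat.size());
  const int n = static_cast<int>(text.size());
  const std::array<int, 2> invalid = {kMinScore, kMinScore};
  // dp[i & 1][j][k]: best score for pat[0..i) against text[0..j);
  // k == 1 if text[j-1] is matched to pat[i-1], k == 0 if it is skipped.
  std::vector<std::array<int, 2>> dp[2];
  dp[0].assign(n + 1, invalid);
  dp[1].assign(n + 1, invalid);
  dp[0][0][0] = 0;  // empty pattern against empty text
  for (int j = 0; j < n; j++) dp[0][j + 1][0] = dp[0][j][0] - 1;
  for (int i = 0; i < m; i++) {
    std::vector<std::array<int, 2>>& pre = dp[i & 1];
    std::vector<std::array<int, 2>>& cur = dp[(i + 1) & 1];
    cur[0] = invalid;  // a non-empty pattern cannot match empty text
    for (int j = 0; j < n; j++) {
      // Option 1: skip text[j] at a cost of 1.
      cur[j + 1][0] = std::max(cur[j][0], cur[j][1]) - 1;
      // Option 2: match pat[i] with text[j], ignoring case.
      if (std::tolower(static_cast<unsigned char>(pat[i])) ==
          std::tolower(static_cast<unsigned char>(text[j])))
        cur[j + 1][1] = std::max(pre[j][0] + SimpleMatchScore(text, i, j, false),
                                 pre[j][1] + SimpleMatchScore(text, i, j, true));
      else
        cur[j + 1][1] = kMinScore;
    }
  }
  return std::max(dp[m & 1][n][0], dp[m & 1][n][1]);
}
```

With these assumed weights, `FuzzyScore("cm", "c++-mode")` is 56, `FuzzyScore("cm", "company-mode")` is 52 (via the Head `m` of `mode`, not the Tail `m` of `company`), and `FuzzyScore("cm", "chmod")` is -1. Only the previous and current rows are alive at any time, so the function returns a score but cannot report which positions were matched.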
I'm not that good at elisp (so I cannot implement this myself), but I believe it is a huge improvement. I would appreciate it if someone could implement this.
ivy integration
Use a subsequence filter first, because applying this O(n*m) algorithm to every candidate would be slow, and the algorithm returns an awful score for non-matching candidates anyway.
Calculate the score for each remaining candidate and use it to sort the candidates.
This can only be used for `ivy--regex-fuzzy`, as other modes do not allow subsequence filtering. Also, arbitrary regular expressions cannot be used in patterns, because neither the subsequence filter nor this sequence alignment algorithm understands them. This is not a real limitation if the algorithm is smart enough to assign proper weights to consecutive matches, initial-character matches, and so on. Cases where a regex actually helps are very rare.
I have not read the ivy code carefully, but `ivy--flx-sort` and the other regular expression constructions seem unnecessary.
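The two integration steps (filter by subsequence, then score and sort) can be sketched as follows. This is written in C++ to stay consistent with the scoring code rather than in elisp, and it is not ivy's API: `IsSubsequence`, `FilterAndSort`, and the `score` callback are hypothetical names. The callback stands in for whatever alignment scorer is used.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// True if `pat` is a case-insensitive subsequence of `text`; O(|text|).
bool IsSubsequence(const std::string& pat, const std::string& text) {
  std::size_t i = 0;
  for (char ch : text) {
    if (i < pat.size() &&
        std::tolower(static_cast<unsigned char>(ch)) ==
            std::tolower(static_cast<unsigned char>(pat[i])))
      i++;
  }
  return i == pat.size();
}

// Hypothetical driver: drop non-matching candidates cheaply, then run the
// more expensive scorer only on the survivors and sort by score.
std::vector<std::string> FilterAndSort(
    const std::string& pat, const std::vector<std::string>& candidates,
    const std::function<int(const std::string&, const std::string&)>& score) {
  typedef std::pair<int, std::string> Scored;
  std::vector<Scored> scored;
  for (const std::string& c : candidates)
    if (IsSubsequence(pat, c))
      scored.push_back(Scored(score(pat, c), c));
  // Highest score first; stable_sort keeps the input order for ties.
  std::stable_sort(scored.begin(), scored.end(),
                   [](const Scored& a, const Scored& b) { return a.first > b.first; });
  std::vector<std::string> out;
  for (const Scored& s : scored) out.push_back(s.second);
  return out;
}
```

The filter costs O(n) per candidate, so the O(n*m) scorer never runs on candidates that cannot match, and the scorer does not need to handle the non-matching case gracefully.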
Sorry if this wasn't clear, but I believe flx already implements an O(nm) matching algorithm. The implementation is a bit non-traditional, but the asymptotic complexity should be fine (as far as I could tell when I wrote it). There's definitely room to do fuzzy matching more quickly by a constant factor, and I wouldn't be surprised if a large constant speedup could be achieved.
I adapted and simplified the fuzzy matching algorithm used in clangd and created https://github.com/MaskRay/cquery/blob/fuzzy/src/fuzzy_match.cc
It takes many factors into account while keeping the main loop conceptually simple.
If you speak Chinese, you may read my complaint at https://emacs-china.org/t/topic/5368