feat: support finding data with typos #153
I think I like this, but I worry about the impact on filtering. It may end up matching far too much, and I think it may be worth adding another ranking level for this. What do you think?
I think a ranking level is not really appropriate for controlling this functionality. As an example, for the query

All those results are not good matches for the query, but when ordering for similarity to

In my view, the MATCHES ranking is for results where no good matches were found and we are scraping the bottom of the barrel to see if something might be similar. Even the current approach for MATCHES ranking can find lots of weird matches, and the proposed changes do not drastically change behaviour. If spurious matches are not appropriate, increasing

But if this new matching should be controlled separately, I would do it like this:

```diff
 interface MatchSorterOptions<ItemType = unknown> {
   <..>
   sorter?: Sorter<ItemType>
+  fuzzy?: 'scattered' | 'partial'
 }
```

I think
I just had an idea to allow typo tolerance with ranking like this:

```diff
- MATCHES
+ TIGHT_PARTIAL
+ SCATTERED
```

Basically, typo-tolerant matching would rank higher than the current MATCHES matching. However, my proposed code change would have to be modified so that it allows skipping letters only if the matching letters are packed tightly. I would imagine that finding all the letters except one within a span just a few letters longer than the query would almost always be a higher-quality match than finding all the letters widely scattered.
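The TIGHT_PARTIAL idea above can be sketched roughly as follows. This is a hypothetical illustration, not code from match-sorter: `shortestSubsequenceWindow`, `isTightPartialMatch`, the `slack` default of 2 and the minimum query length of 3 are all assumptions. It drops one query letter at a time and accepts the item only if the remaining letters occur in order inside a window at most `slack` characters longer than the original query.

```typescript
// Hypothetical helper: length of the shortest substring of `text` that contains
// `pattern` as an in-order subsequence, or Infinity if there is none.
function shortestSubsequenceWindow(text: string, pattern: string): number {
  if (pattern.length === 0) return 0
  let best = Infinity
  for (let start = 0; start < text.length; start++) {
    if (text[start] !== pattern[0]) continue
    // From a fixed start, greedy matching yields the earliest possible end.
    let p = 1
    let i = start + 1
    while (p < pattern.length && i < text.length) {
      if (text[i] === pattern[p]) p++
      i++
    }
    if (p === pattern.length) best = Math.min(best, i - start)
  }
  return best
}

// Hypothetical TIGHT_PARTIAL test: skip exactly one query letter and require the
// rest to fit in a window at most `slack` characters longer than the query.
function isTightPartialMatch(text: string, query: string, slack = 2): boolean {
  const t = text.toLowerCase()
  const q = query.toLowerCase()
  // Assumed guard: skipping a letter of a very short query matches almost anything.
  if (q.length < 3) return false
  for (let skip = 0; skip < q.length; skip++) {
    const rest = q.slice(0, skip) + q.slice(skip + 1)
    if (shortestSubsequenceWindow(t, rest) <= q.length + slack) return true
  }
  return false
}

console.log(isTightPartialMatch("cacneled", "canceled")) // true: tight, one letter skipped
console.log(isTightPartialMatch("caxxxxxncel", "cancel")) // false: letters widely scattered
```

Note that "caxxxxxncel" contains every letter of "cancel" in order, so it would still qualify for a scattered (MATCHES-style) level, just not for the tighter one.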
I think you're right. This is fine. If people find it's problematic, then we can ship a breaking change with new levels.

🎉 This PR is included in version 6.4.0 🎉

The release is available on:

Your semantic-release bot 📦🚀

FYI, this change caused one of our automated tests to fail: https://github.com/ariakit/ariakit/pull/4218/files#diff-ffb64734a719f82219f5f82d3ee8c52e111c4a560940b6bb14b60ab6d3e74de0

This isn't a problem; I'll update the test. The UI works well, and I believe it's a good feature. I'm sharing this so you might consider making it optional to avoid any friction until the next major.

@diegohaz, I think I'll just do a major version bump and deprecate 6.4.0.

#153 (comment)

BREAKING CHANGE: The last release was arguably a breaking change so we're going to trigger this major version and deprecate the 6.4.0.

I've deprecated 6.4.0 and released 7.0.0: https://github.com/kentcdodds/match-sorter/releases/tag/v7.0.0
This PR contains the following updates:

| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
| [match-sorter](https://github.com/kentcdodds/match-sorter) | [`6.3.4` -> `7.0.0`](https://renovatebot.com/diffs/npm/match-sorter/6.3.4/7.0.0) | [![age](https://developer.mend.io/api/mc/badges/age/npm/match-sorter/7.0.0?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![adoption](https://developer.mend.io/api/mc/badges/adoption/npm/match-sorter/7.0.0?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![passing](https://developer.mend.io/api/mc/badges/compatibility/npm/match-sorter/6.3.4/7.0.0?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![confidence](https://developer.mend.io/api/mc/badges/confidence/npm/match-sorter/6.3.4/7.0.0?slim=true)](https://docs.renovatebot.com/merge-confidence/) |

---

### Release Notes

<details>
<summary>kentcdodds/match-sorter (match-sorter)</summary>

### [`v7.0.0`](https://github.com/kentcdodds/match-sorter/releases/tag/v7.0.0)

[Compare Source](https://github.com/kentcdodds/match-sorter/compare/v6.4.0...v7.0.0)

This has the same contents as v6.4.0:

##### Features

- support finding data with typos ([#153](https://github.com/kentcdodds/match-sorter/issues/153)) ([8fc0645](https://github.com/kentcdodds/match-sorter/commit/8fc0645af4b2dfdbb53dd9c1c088ab52cd997f5f))

##### Bug Fixes

- **release:** manually release a major version ([d9b7dab](https://github.com/kentcdodds/match-sorter/commit/d9b7dab7d10f65db0dbc7df5788ee6e81eb26377)), closes [/github.com/kentcdodds/match-sorter/pull/153#issuecomment-2417996730](https://github.com//github.com/kentcdodds/match-sorter/pull/153/issues/issuecomment-2417996730)

##### BREAKING CHANGES

- **release:** The last release was arguably a breaking change so we're going to trigger this major version and deprecate the 6.4.0.

### [`v6.4.0`](https://github.com/kentcdodds/match-sorter/releases/tag/v6.4.0)

[Compare Source](https://github.com/kentcdodds/match-sorter/compare/v6.3.4...v6.4.0)

##### Features

- support finding data with typos ([#153](https://github.com/kentcdodds/match-sorter/issues/153)) ([8fc0645](https://github.com/kentcdodds/match-sorter/commit/8fc0645af4b2dfdbb53dd9c1c088ab52cd997f5f))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Enabled.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

This PR was generated by [Mend Renovate](https://mend.io/renovate/). View the [repository job log](https://developer.mend.io/github/ariakit/ariakit).

---------

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: Haz <hazdiego@gmail.com>
This PR contains the following updates:

| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
| [match-sorter](https://github.com/kentcdodds/match-sorter) | [`^6.3.1` -> `^7.0.0`](https://renovatebot.com/diffs/npm/match-sorter/6.3.4/7.0.0) | [![age](https://developer.mend.io/api/mc/badges/age/npm/match-sorter/7.0.0?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![adoption](https://developer.mend.io/api/mc/badges/adoption/npm/match-sorter/7.0.0?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![passing](https://developer.mend.io/api/mc/badges/compatibility/npm/match-sorter/6.3.4/7.0.0?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![confidence](https://developer.mend.io/api/mc/badges/confidence/npm/match-sorter/6.3.4/7.0.0?slim=true)](https://docs.renovatebot.com/merge-confidence/) |

---

### Release Notes

<details>
<summary>kentcdodds/match-sorter (match-sorter)</summary>

### [`v7.0.0`](https://github.com/kentcdodds/match-sorter/releases/tag/v7.0.0)

[Compare Source](https://github.com/kentcdodds/match-sorter/compare/v6.4.0...v7.0.0)

This has the same contents as v6.4.0:

##### Features

- support finding data with typos ([#153](https://github.com/kentcdodds/match-sorter/issues/153)) ([8fc0645](https://github.com/kentcdodds/match-sorter/commit/8fc0645af4b2dfdbb53dd9c1c088ab52cd997f5f))

##### Bug Fixes

- **release:** manually release a major version ([d9b7dab](https://github.com/kentcdodds/match-sorter/commit/d9b7dab7d10f65db0dbc7df5788ee6e81eb26377)), closes [/github.com/kentcdodds/match-sorter/pull/153#issuecomment-2417996730](https://github.com//github.com/kentcdodds/match-sorter/pull/153/issues/issuecomment-2417996730)

##### BREAKING CHANGES

- **release:** The last release was arguably a breaking change so we're going to trigger this major version and deprecate the 6.4.0.

### [`v6.4.0`](https://github.com/kentcdodds/match-sorter/releases/tag/v6.4.0)

[Compare Source](https://github.com/kentcdodds/match-sorter/compare/v6.3.4...v6.4.0)

**DEPRECATED**: This was arguably a breaking change so we've deprecated this version and released a major version.

##### Features

- support finding data with typos ([#153](https://github.com/kentcdodds/match-sorter/issues/153)) ([8fc0645](https://github.com/kentcdodds/match-sorter/commit/8fc0645af4b2dfdbb53dd9c1c088ab52cd997f5f))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "before 4am on Monday" in timezone Asia/Kolkata, Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR is behind base branch, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

This PR was generated by [Mend Renovate](https://mend.io/renovate/). View the [repository job log](https://developer.mend.io/github/timelessco/recollect).
This new typo matching definitely seems too broad. Before, my search setup was returning relevant results, but the typo inclusion now brings up many irrelevant results. Search:

Results Before Typo Inclusion:

Results After Typo Inclusion:

Most of those extra results are not really logical. While it is great that a search for

Are there any ways to control the typo matching in the new version?
Yeah, that's really, really bad. I'm open to anyone who wants to make a PR to simply revert that change, and we'll release another major version bump to remove this functionality. I never should have merged that.

I think I have an idea for improving typo tolerance: skipping letters could be allowed only if all the other letters from the query are found and those letters are clustered in a short substring. I can make such a PR. This would make

@overengineered If typo tolerance is going to be added, it's crucial that it's optional/configurable. If the detection is complicated and potentially heavy, it should be written in a way that lets it be tree-shaken out when typo tolerance isn't enabled. It's probably best to spec things out further before getting deep into any implementation.
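One way to make typo tolerance opt-in, in the spirit of this comment, is to ship it as a separate matcher function that callers pass in explicitly; a bundler can then drop it when nothing imports it. This is a hypothetical sketch: `Matcher`, `inOrderMatcher`, `skipOneMatcher`, `filterItems` and the `matcher` option are illustrative names, not match-sorter's API, and it returns plain filtered lists rather than match-sorter's rankings.

```typescript
// Hypothetical names throughout; this is not match-sorter's API.
type Matcher = (text: string, query: string) => boolean

// Default: every query character must appear in order (gaps allowed).
function inOrderMatcher(text: string, query: string): boolean {
  let pos = 0
  for (let i = 0; i < query.length; i++) {
    const found = text.indexOf(query[i], pos)
    if (found === -1) return false
    pos = found + 1
  }
  return true
}

// Opt-in typo tolerance: also accept a match with one query letter skipped.
// In a real package this would be its own named export, so a bundler can
// tree-shake it away when no caller imports it.
function skipOneMatcher(text: string, query: string): boolean {
  if (inOrderMatcher(text, query)) return true
  // Assumed guard: dropping a letter from very short queries matches too much.
  if (query.length < 3) return false
  for (let i = 0; i < query.length; i++) {
    if (inOrderMatcher(text, query.slice(0, i) + query.slice(i + 1))) return true
  }
  return false
}

interface FilterOptions {
  // Hypothetical option: strict matching unless a matcher is passed in.
  matcher?: Matcher
}

function filterItems(items: string[], query: string, options: FilterOptions = {}): string[] {
  const matches = options.matcher ?? inOrderMatcher
  const q = query.toLowerCase()
  return items.filter((item) => matches(item.toLowerCase(), q))
}

console.log(filterItems(["cancelled", "cacneled"], "canceled")) // → ["cancelled"]
console.log(filterItems(["cancelled", "cacneled"], "canceled", { matcher: skipOneMatcher })) // → ["cancelled", "cacneled"]
```

Because the default path never references `skipOneMatcher`, users who do not opt in pay nothing for it once it lives in its own export.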
This reverts commit 8fc0645.
I agree that typo tolerance should be optional. The complexity would be ~30 additional lines in a 560-line file, which is not worth the effort of supporting tree shaking IMO.

PR for the revert: #155. It should meet all the contribution guidelines.
What: Increases tolerance for typos in data and/or query text
Why: match-sorter currently finds data with some typos, but not others. E.g. "canceled" would find "cancelled", but not "cacneled".
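The asymmetry in this example comes from in-order character matching: "canceled" is a subsequence of "cancelled" (one extra letter is just a gap), but in "cacneled" the second "c" comes before the "n", so the letters cannot be matched in order. A minimal sketch, assuming the MATCHES-level test boils down to this kind of subsequence check; `isInOrderMatch` is a hypothetical name, not match-sorter's code.

```typescript
// Hypothetical helper approximating a MATCHES-style test: every query
// character must appear in the text in the same order, gaps allowed.
function isInOrderMatch(text: string, query: string): boolean {
  const t = text.toLowerCase()
  const q = query.toLowerCase()
  let pos = 0
  for (let i = 0; i < q.length; i++) {
    const found = t.indexOf(q[i], pos)
    if (found === -1) return false
    pos = found + 1
  }
  return true
}

console.log(isInOrderMatch("cancelled", "canceled")) // true
console.log(isInOrderMatch("cacneled", "canceled")) // false: swapped letters break the order
console.log(isInOrderMatch("United States of America", "ua")) // true
```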
How: the `getClosenessRanking` calculator allows skipping one of the characters from the query. This allows finding data that contains the most common typos: swapped letters, missing letters and diverging spellings (localisation/localization). Matches with missing letters get a reduced ranking.

Checklist:
Since this change only affects searches where no good match (like a full-word match or an exact match at the start of a word) is found, I think there's no need to add an option flag to enable/disable this feature. If searching `ua` and finding `United States of America` is good enough right now, then my changes that improve tolerance for typos should also be acceptable. But I can see that there could be some value in providing configuration for this new feature, and I am open to doing it.
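The skip-one approach described under "How" can be illustrated with a small scoring function. This is a hypothetical sketch, not the actual `getClosenessRanking` from this PR: `inOrderSpan`, `closenessScore` and the 0.5 `skipPenalty` are assumptions. The score is the query length divided by the span of a greedy in-order match (tighter matches score higher), and a match that needed one skipped query letter is scaled down, mirroring "matches with missing letters get a reduced ranking".

```typescript
// Hypothetical helper: span (last index - first index + 1) of the greedy
// earliest in-order match of `pattern` in `text`, or -1 if there is none.
// Greedy matching is not guaranteed to find the tightest possible span.
function inOrderSpan(text: string, pattern: string): number {
  let first = -1
  let pos = 0
  for (let i = 0; i < pattern.length; i++) {
    const found = text.indexOf(pattern[i], pos)
    if (found === -1) return -1
    if (first === -1) first = found
    pos = found + 1
  }
  return first === -1 ? 0 : pos - first
}

// Score in [0, 1]; 0 means no match. A match that needed one skipped query
// letter is multiplied by `skipPenalty` (an assumed value).
function closenessScore(text: string, query: string, skipPenalty = 0.5): number {
  const t = text.toLowerCase()
  const q = query.toLowerCase()
  if (q.length === 0) return 0
  const span = inOrderSpan(t, q)
  if (span > 0) return q.length / span
  let best = 0
  for (let skip = 0; skip < q.length; skip++) {
    const rest = q.slice(0, skip) + q.slice(skip + 1)
    const restSpan = inOrderSpan(t, rest)
    if (restSpan > 0) best = Math.max(best, (rest.length / restSpan) * skipPenalty)
  }
  return best
}

console.log(closenessScore("cancelled", "canceled")) // 8/9, about 0.889
console.log(closenessScore("cacneled", "canceled")) // 0.4375 (one letter skipped)
console.log(closenessScore("United States of America", "ua")) // 0.2 (widely scattered)
```

Under these assumptions the typo match "cacneled" still outranks a widely scattered match like `ua` in `United States of America`, which is the trade-off debated in the comments above.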