Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(component): use sift3 to detect email differences instead of sif4 or js-levenshtein #3

Merged
merged 9 commits into from
Oct 21, 2022

Conversation

ferreiro
Copy link
Contributor

@ferreiro ferreiro commented Oct 21, 2022

Description of change

With the default configuration that mailbox.js had for the domain threshold both js-levensthein and sift4, the fuzzy matching algo was not able to correctly show the right suggestions. This PR solves that by using Sift3, which out of the box with 2, 2, 2 for threshold is able to detect string.

🎉 Also removing an external dependency, making this package 0-dependencies :)

Problem:

Showing an example on why js-levensthein and sift4 created a regression on mailcheck.js

As explained above, replacing sift3 with sift4 will break the 90k users that are already using mailcheck.js, this is because those algorithms have a different output for computing the distance than sift3. In the video I explained how and why.

Screen Shot 2022-10-21 at 9 39 50 PM

Screen Shot 2022-10-21 at 9 07 18 PM

https://www.loom.com/share/aeb067878f8c43aabbbf904402a53689

Pull-Request Checklist

  • Code is up-to-date with the main branch
  • npm run lint passes with this change
  • npm run test passes with this change
  • This pull request links relevant issues as Fixes #0000
  • There are new or updated unit tests validating the change
  • Documentation has been updated to reflect this change
  • The new commits follow conventions outlined in the conventional commit spec

@ferreiro ferreiro changed the title fix(component): use sift3 since 4 and js-levenshtein are not accurate fix(component): use sift3 to detect email differences instead of sif4 or js-levenshtein Oct 21, 2022
@Siderite
Copy link

Sift4 has a direct functionality on the blog post announcing it: https://siderite.dev/blog/super-fast-and-accurate-string-distance.html/ where you can test the distance between two strings. It was impossible to reproduce the demo you made in the video here.

I would be glad to work with you on this, because I don't see any reason why yaho would be closer to zoho than yahoo, especially in an implementation of levenstein, which is not fuzzy at all :)

Also, even if you use Sift3, please update the URL to the blog in the comment of the source. It is now https://siderite.dev/blog/super-fast-and-accurate-string-distance-sift3.html/

Let me know if I can help in any way. Thanks!

@robsonsobral
Copy link

On the demo provided at the blog post, I couldn't get yaho closer to zoho than to yahoo at all.

yahoo zoho
Levenstein 1 2
Sift3 0.5 2
Sift4 1 2

@robsonsobral
Copy link

Which version do you use on ZooTools, @ferreiro ?

image

Ironic, right?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants