Partial word matching? #1248

Closed
NicTorgersen opened this issue Feb 16, 2021 · 9 comments

Comments

@NicTorgersen

NicTorgersen commented Feb 16, 2021

Describe the bug
I'm trying to get a document from Meilisearch by partially typing in a word.

To Reproduce
Steps to reproduce the behavior:

  1. Add index
  2. Add a document with some attribute, with the content of "Github" for example
  3. Search for "hub"
  4. No results

Expected behavior
Documents which contain the partial word should ideally be returned, right?

Additional context
I can't find anything about this in the documentation. I tried searching for "partial word matching" and similar terms, but so far nothing is said anywhere about this in relation to MeiliSearch.

@CaroFG
Collaborator

CaroFG commented Feb 17, 2021

Hi @NicTorgersen! MeiliSearch is designed for search-as-you-type use, so it is built on prefix search; it's one of our primary features. Following your example, you will get results for "git", but it is not designed to return results for "hub". It's not an issue, it's just the design :) Some users preprocess their dataset with https://nlp.h-its.org/bpemb/ to split words, if that helps. If you think MeiliSearch should have this feature, you can add it to the roadmap :)
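
To make the prefix behaviour concrete, here is a minimal Python sketch of token-level prefix matching. It is a toy model and not MeiliSearch's actual implementation: `tokenize` and `prefix_match` are hypothetical helper names, and the sketch ignores typo tolerance and ranking.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def prefix_match(query: str, text: str) -> bool:
    """Return True if any token of `text` starts with `query`."""
    q = query.lower()
    return any(token.startswith(q) for token in tokenize(text))


print(prefix_match("git", "Github"))  # True: "github" starts with "git"
print(prefix_match("hub", "Github"))  # False: "hub" is only a suffix
```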

@fheider

fheider commented Oct 8, 2021

Big fail not to have this feature!

@veneliniliev

Is there any solution for this case?

@curquiza
Member

curquiza commented Jan 10, 2022

hey @veneliniliev
No workaround at the moment, sorry

FYI @meilisearch/product-team, I ping you since it's an old issue. It does not mean we will do it of course, because as Caro said, MeiliSearch is currently designed to provide a search-as-you-type experience.

@Kerollmops
Member

Hey @fheider and @veneliniliev,

Providing such a feature would drastically slow down the engine: it would force the algorithm to search for every query word inside every entry of the word dictionary indexed by MeiliSearch, that is, the set of all the words MeiliSearch knows from your dataset.

It is much like the Aho-Corasick algorithm, which searches for patterns inside a text. Fortunately, in MeiliSearch we have already split the different tokens (words) of your dataset and put them in a compressed data structure on which we can execute certain kinds of algorithms.

To search for such patterns (a string anywhere inside a list of strings), the engine would have to be O(N), where N is the number of words in the whole list. That means the more distinct words there are, the longer it takes, linearly. This is where I'll let you read about Zipf's law, a law that describes how new words appear as the amount of text grows. In short: the more text, the fewer new words. But that only means the growth slows down; the number of distinct words keeps increasing. If we introduced such a feature, it would drastically slow down the engine, and everyone would be impacted.
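
The cost difference described above can be sketched in Python. This is only an illustration under assumptions: a plain sorted list stands in for the compressed word dictionary, and `prefix_lookup` and `substring_scan` are hypothetical names, not MeiliSearch functions. A prefix query can jump to its position with a binary search and read a contiguous range, while a substring query has to inspect every word.

```python
from bisect import bisect_left


def prefix_lookup(sorted_words: list[str], prefix: str) -> list[str]:
    """Binary-search to the first candidate, then read the contiguous matches."""
    matches = []
    start = bisect_left(sorted_words, prefix)
    for i in range(start, len(sorted_words)):
        if not sorted_words[i].startswith(prefix):
            break  # sorted order guarantees no later word can match
        matches.append(sorted_words[i])
    return matches


def substring_scan(sorted_words: list[str], fragment: str) -> list[str]:
    """Check every word: the work grows linearly with the dictionary size."""
    return [word for word in sorted_words if fragment in word]


words = sorted(["git", "github", "gitlab", "hub", "meilisearch", "search"])
print(prefix_lookup(words, "git"))   # ['git', 'github', 'gitlab']
print(prefix_lookup(words, "hub"))   # ['hub']
print(substring_scan(words, "hub"))  # ['github', 'hub']
```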

The solution to your problem would simply be to split the words at interesting points and put the pieces in a new document field that the engine can prefix-search to find your document.
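
A minimal Python sketch of that idea, assuming the "interesting points" are simply every character position. The helper names, the `name_suffixes` field, and the `min_length` cut-off are all hypothetical choices for illustration.

```python
def suffixes(word: str, min_length: int = 2) -> list[str]:
    """Return every suffix of `word` that has at least `min_length` characters."""
    word = word.strip().lower()
    return [word[i:] for i in range(len(word) - min_length + 1)]


def add_suffix_field(doc: dict, source: str, target: str) -> dict:
    """Return a copy of `doc` whose `target` field holds the suffixes of doc[source]."""
    parts: list[str] = []
    for word in str(doc[source]).split():
        parts.extend(suffixes(word))
    return {**doc, target: " ".join(parts)}


doc = add_suffix_field({"id": 1, "name": "Github"}, "name", "name_suffixes")
print(doc["name_suffixes"])  # github ithub thub hub ub

# "hub" is now the prefix of a token in the extra field, so a prefix search can hit it.
print(any(token.startswith("hub") for token in doc["name_suffixes"].split()))  # True
```

The trade-off is index size: a word of n characters produces up to n extra tokens, so this is best kept to short fields such as names or phone numbers.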

@veneliniliev

Yes, I did this for phone numbers... it works for now. Thanks.

@ghomem

ghomem commented Dec 18, 2024

yes, i make this for phone number... its work for now. tnx.

Can you share how you made it work for a substring of the phone number?

@veneliniliev

something like this:

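// Builds a space-separated list of every suffix of $word, so that a
// prefix-search engine can match any substring of the original value.
// Str is presumably Laravel's Illuminate\Support\Str helper.
protected function wordEmbedding(int|string|null $word): string
{
    // Cast first: passing null to trim() is deprecated since PHP 8.1.
    $word = trim((string) $word);

    // Compare with '' instead of empty(), which would also reject "0".
    if ($word === '') {
        return '';
    }

    $suffixes = [];
    for ($i = 0; $i < Str::length($word); $i++) {
        $suffixes[] = Str::substr($word, $i);
    }

    return implode(' ', $suffixes);
}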

@ghomem

ghomem commented Dec 18, 2024

Thanks a lot for sharing, @veneliniliev. I will investigate a similar solution. Cheers.
