FEATURE: Allow/improve partial search #3747

Closed
andresmrm opened this issue May 24, 2024 · 3 comments
Labels
docs (Issues related to Aleph’s documentation), Moderate (Issue that may require attention)

Comments

@andresmrm

Is your feature request related to a problem? Please describe.
Sometimes the search term appears attached to another word, with no space in between (for example nº123 instead of nº 123). In that case a query for just 123 finds nothing, and I have to search for nº123.

Describe the solution you'd like
I would like to search for 123 and find nº123.

Describe alternatives you've considered
Sometimes using ??123 helps, but not when the number of leading characters varies.

As discussed in Slack, I managed to run regex queries by querying ElasticSearch directly. But they were too slow (~3s each), and I needed to query a huge list of terms. So I ended up running regular queries for the most common patterns (~30ms each). For example, in my case the terms generally appear as 0123456789 or 012.345.678-9, so I queried both versions of each term (2x30ms=60ms << 3s). But I gave up on less common cases, like nº123.
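For illustration, the two-variant workaround described above could be sketched roughly as follows. This is a minimal Python sketch, not the code that was actually used: the function names `term_variants` and `build_query` are hypothetical, the 3-3-3-1 grouping is inferred from the example 012.345.678-9, and joining both variants with `OR` into one query string is an assumption (the approach described above ran one query per variant).

```python
def term_variants(digits: str) -> list[str]:
    """Return the plain and the punctuated form of a 10-digit term.

    Hypothetical helper: the 3-3-3-1 grouping mirrors the example
    012.345.678-9 from the discussion.
    """
    if len(digits) != 10 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("expected exactly 10 ASCII digits")
    punctuated = f"{digits[0:3]}.{digits[3:6]}.{digits[6:9]}-{digits[9]}"
    return [digits, punctuated]


def build_query(digits: str) -> str:
    """Join both variants as quoted phrases with OR (an assumption,
    not necessarily how the original queries were issued)."""
    return " OR ".join(f'"{variant}"' for variant in term_variants(digits))


if __name__ == "__main__":
    print(term_variants("0123456789"))
    print(build_query("0123456789"))
```

Each variant is an exact-term lookup, which is why this stays fast compared with a regex scan, at the price of missing any pattern that was not enumerated (such as nº123).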

It may be good to allow regex queries, even if they are slow, for cases where you only need to search for a few terms. And, if possible, make regex faster or offer another type of partial match.

@andresmrm added the feature-request (Requests for new features or enhancements of existing features) and triage (These issues need to be reviewed by the Aleph team) labels on May 24, 2024
@tillprochaska
Contributor

Just for context, you can use wildcard and regex queries in Aleph using the ElasticSearch query string syntax.
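As a rough illustration of what such queries can look like, here is a small Python sketch that builds a wildcard and a regex query string for a term that may be glued to a prefix. The helper names `suffix_wildcard` and `suffix_regex` are hypothetical. In the query string syntax, `?` stands for exactly one character and `*` for zero or more, and a regular expression is wrapped in forward slashes and has to match the whole indexed token. Whether nº123 is indexed as a single token depends on the analyzer, so treat the match behaviour as an assumption.

```python
import re


def suffix_wildcard(term: str) -> str:
    """Wildcard query for tokens ending in `term`.

    Assumes `term` contains only letters and digits, so nothing
    needs escaping in the query string.
    """
    if not term.isalnum():
        raise ValueError("this sketch only handles alphanumeric terms")
    return f"*{term}"


def suffix_regex(term: str) -> str:
    """Regex query (wrapped in /.../) for tokens ending in `term`.

    Every character that is not an ASCII letter or digit is
    backslash-escaped so it is treated literally.
    """
    escaped = re.sub(r"([^0-9A-Za-z])", r"\\\1", term)
    return f"/.*{escaped}/"


if __name__ == "__main__":
    print(suffix_wildcard("123"))  # *123
    print(suffix_regex("123"))     # /.*123/
```

Both forms start with a wildcard, which is the expensive case: the search has to scan the term dictionary instead of looking up a single term.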

As you already noticed, both wildcard and regex queries are computationally expensive at search time, which makes them slow. While there are options to speed up such queries, they require indexing content differently (e.g. using ngrams), which usually comes at a significantly higher cost for ingesting and storing the data. This makes it a difficult trade-off.
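To make the trade-off concrete, here is a plain-Python sketch of what ngram indexing does to a single token. It is an illustration only and not ElasticSearch's tokenizer; the function name `ngrams` and the gram sizes are assumptions chosen for the example.

```python
def ngrams(token: str, min_gram: int, max_gram: int) -> list[str]:
    """Return every substring of `token` whose length lies between
    min_gram and max_gram (inclusive)."""
    grams = []
    for size in range(min_gram, max_gram + 1):
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


if __name__ == "__main__":
    # With 3-grams, "123" becomes an indexed term of its own, so a
    # search for 123 is an ordinary term lookup instead of a scan.
    print(ngrams("nº123", 3, 3))
    # A wider size range multiplies the number of terms stored per token.
    print(len(ngrams("nº123", 2, 3)))
```

A five-character token that would otherwise be stored as one term turns into three terms with 3-grams and seven terms with 2- and 3-grams, which is where the extra ingest and storage cost comes from.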

@andresmrm
Author

andresmrm commented May 27, 2024 via email

@tillprochaska added the Moderate (Issue that may require attention) label and removed the triage (These issues need to be reviewed by the Aleph team) label on May 29, 2024
@tillprochaska added the docs (Issues related to Aleph’s documentation) label and removed the feature-request (Requests for new features or enhancements of existing features) label on Jul 19, 2024
@tillprochaska
Contributor

Hi @andresmrm, sorry for the late reply. Thanks for your suggestion. I have added a section to the docs that links to the full ES query syntax reference.
