Is your feature request related to a problem? Please describe.
Sometimes the searched term appears without a space separating it from another word (like nº123 instead of nº 123), so the query doesn't find anything if I just search for 123; I need to search for nº123.
Describe the solution you'd like
I would like to search for 123 and find nº123.
Describe alternatives you've considered
Sometimes using ??123 can help, but not if the number of chars varies.
As discussed in Slack, I've managed to make queries directly to ElasticSearch to use regex queries. But they were too slow (~3s each) and I needed to query a huge list of terms. So I ended up doing regular queries for the most common patterns (~30ms each). For example, in my case the terms generally appear like 0123456789 or 012.345.678-9, so I queried both versions of each term (2x30ms=60ms << 3s). But I gave up on less common cases, like nº123.
It may be good to allow regex queries, even if slow, for when you just need to search for a few terms. And, if possible, make regex faster or offer another type of partial match.
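To illustrate the workaround described above, here is a minimal sketch that derives both common spellings of a term. It assumes every term has exactly 10 digits grouped as 3.3.3-1, as in the example; the function names are hypothetical, and joining the quoted variants with OR into a single query string is an assumption rather than what was actually run (the original approach issued one query per variant).

```python
import re


def term_variants(term: str) -> list[str]:
    """Return the plain-digit and punctuated spellings of a 10-digit term."""
    digits = re.sub(r"\D", "", term)  # drop dots, dashes, spaces
    if len(digits) != 10:
        raise ValueError(f"expected 10 digits, got {len(digits)}")
    # Assumed layout, taken from the example 012.345.678-9
    dotted = f"{digits[0:3]}.{digits[3:6]}.{digits[6:9]}-{digits[9]}"
    return [digits, dotted]


def or_query(term: str) -> str:
    """Quote each variant and join them with OR (assumed query form)."""
    return " OR ".join(f'"{variant}"' for variant in term_variants(term))


print(or_query("0123456789"))
```

Either spelling can be passed in, since the input is reduced to its digits before the variants are rebuilt.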
Just for context, you can use wildcard and regex queries in Aleph using the ElasticSearch query string syntax.
As you already noticed, both wildcard and regex queries are computationally expensive at search time, which makes them slow. While there are options to speed up such queries, these require indexing contents differently (e.g. using ngrams), which usually comes at a significantly higher cost for ingesting and storing the data. This makes it a difficult trade-off.
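A rough sketch of why ngram indexing helps and why it costs more: each token is expanded into all of its character substrings within a length range, so a partial term becomes an exact lookup, but one token turns into several index entries. The function and the 3 to 5 length range are illustrative assumptions, not how Aleph or ElasticSearch is actually configured.

```python
def char_ngrams(token: str, min_n: int = 3, max_n: int = 5) -> set[str]:
    """Return every substring of token with length between min_n and max_n."""
    grams = set()
    for n in range(min_n, max_n + 1):
        for start in range(len(token) - n + 1):
            grams.add(token[start:start + n])
    return grams


grams = char_ngrams("nº123")
print("123" in grams)  # the partial term is now an indexed entry
print(len(grams))      # number of index entries produced by one token
```

Longer tokens and wider length ranges multiply the number of entries, which is where the extra ingest and storage cost comes from.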
Yes, I understand it's hard to make it faster... =/
I knew about the "abc?" query, but not the "abc*". Maybe it should be added to the docs?
https://docs.aleph.occrp.org/users/search/advanced/
Regex search "abc.*" doesn't seem to work for me from the Aleph search page. Only when accessing ES directly.
Edit: Oops, I see now why. It should be "/abc.*/". Sorry for the confusion.
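For reference, the query forms mentioned in this thread can be written out with a small sketch. The helper names are hypothetical; the only syntax assumed is what appears above: ? and * as wildcards, and a regex wrapped in forward slashes. Lucene regular expressions match the whole term, so a leading .* is needed to match a suffix. Whether any of these finds nº123 also depends on how the text was tokenized at index time.

```python
def regex_query(pattern: str) -> str:
    """Wrap a regex in forward slashes, as the query string syntax expects."""
    if "/" in pattern:
        # Patterns containing "/" would need escaping; not handled in this sketch.
        raise ValueError("pattern must not contain '/'")
    return f"/{pattern}/"


def suffix_wildcard(term: str) -> str:
    """Match any term ending in `term`, whatever precedes it."""
    return f"*{term}"


queries = [
    "??123",               # exactly two characters before 123
    suffix_wildcard("123"),  # any number of characters before 123
    regex_query(".*123"),    # same idea as a regex
    regex_query("abc.*"),    # the example from the comment above
]
for query in queries:
    print(query)
```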
Hi @andresmrm, sorry for the late reply. Thanks for your suggestion, I have added a section to the docs that links to the full ES query syntax reference.