Very slow completion suggester #22357
How big is one document? Considering that you use
The current index with 1.6 million docs uses 16 GB of disk space. A single document can have from 50 up to 500 fields. Currently the fields hold only short values, but eventually many fields may store the text content of scanned book pages. In the previous version of Elasticsearch, all objects were indexed as full-text fields. I tried using _source_include, but it made no difference.
You can try disabling stored fields completely.
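A sketch of what this suggestion looks like as a search body, assuming the Elasticsearch 5.x search API, where stored_fields accepts the special value "_none_" and _source cannot be requested alongside it. The helper names and the "title-suggest" suggestion name are hypothetical; only the field names come from the report.

```python
import json


def body_without_stored_fields(suggest_section):
    """Search body that skips loading stored fields (and therefore _source)."""
    # In Elasticsearch 5.x, "_none_" disables stored fields entirely;
    # _source cannot be activated at the same time.
    return {"stored_fields": "_none_", "suggest": suggest_section}


def body_with_source_filter(suggest_section, includes):
    """Search body that loads only the selected _source fields."""
    # Body equivalent of the _source_include URL parameter used in the report.
    return {"_source": {"includes": list(includes)}, "suggest": suggest_section}


# Hypothetical suggest section; the field name is taken from the report.
suggest_section = {
    "title-suggest": {
        "prefix": "war",
        "completion": {"field": "suggest_title"},
    }
}

print(json.dumps(body_without_stored_fields(suggest_section), indent=2))
print(json.dumps(body_with_source_filter(suggest_section, ["title", "id"]), indent=2))
```

The two bodies are mutually exclusive, which is the trade-off raised in the reply: disabling stored fields also drops the _source metadata the application needs.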
Disabling stored fields didn't help. In any case, I can't disable stored_fields, because doing so blocks _source_include, so I wouldn't get any of the document metadata (previously stored in the payload).
Using another index may be a solution, but it doesn't change the fact that this is a big regression compared to 2.x.
I cannot reopen this issue, @jimczi.
OK, so it's something else. Can you share your mapping and some example input for the completion field?
An example mapping is here: https://gist.github.com/sicarrots/e28eac006a2d37b05c462a4c75a271d9
Thanks, @sicarrots.
With 2.1 there were 4 shards; now there are 3. The number of nodes and the hardware are the same.
Interesting: the query spent most of its time finding all the paths that start with "war". This is only the first step of the suggestion query, and it should be much faster. What sort of contexts do you have in your suggestions? Can you provide the exact input for the suggestion field that you send to Elasticsearch?
@sicarrots you're using a category context but then querying the completion suggester without specifying a context. I've seen this cause very slow suggestions. I think that in this case the suggester has to run once for every context value (and I'm guessing you have a lot of them), which causes the slow performance.
/cc @mikemccand @areek
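The cost described here can be illustrated with a simplified toy model (not Lucene's actual FST code). The model assumes each suggestion is stored under a key made of its context value, a separator, and the input, kept in sorted order. A query with a context needs one range lookup at "context + separator + prefix", while a context-less query must repeat that lookup once per distinct context value. The separator character and all function names are hypothetical.

```python
import bisect

SEP = "\x1d"  # hypothetical separator between context value and input


def build_index(entries):
    """Sorted keys of the form context + SEP + input (toy stand-in for the FST)."""
    return sorted(f"{ctx}{SEP}{text}" for ctx, text in entries)


def range_lookup(keys, key_prefix):
    """One binary search plus a scan over the contiguous matching range."""
    i = bisect.bisect_left(keys, key_prefix)
    hits = []
    while i < len(keys) and keys[i].startswith(key_prefix):
        hits.append(keys[i].split(SEP, 1)[1])
        i += 1
    return hits


def suggest(keys, prefix, contexts=None):
    """Return (suggestions, number of range lookups performed)."""
    if contexts is None:
        # No context in the query: expand to every context value in the index.
        contexts = sorted({key.split(SEP, 1)[0] for key in keys})
    hits = []
    for ctx in contexts:
        hits.extend(range_lookup(keys, f"{ctx}{SEP}{prefix}"))
    return hits, len(contexts)


entries = [(f"ctx{i}", "war and peace") for i in range(1000)]
entries.append(("books", "warsaw"))
keys = build_index(entries)

print(suggest(keys, "war", contexts=["books"]))  # (['warsaw'], 1)
_, lookups = suggest(keys, "war")
print(lookups)  # 1001
```

In this model the work for a context-less query grows linearly with the number of distinct context values, which matches the guess that high-cardinality contexts are the cause.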
@sicarrots As @clintongormley pointed out, the slow completion performance is due to enabling contexts in your completion mapping. When you don't specify a context in the completion query, the query resolves to matching all contexts for the query prefix, which can slow the completion query down if your context values have high cardinality. Use context-enabled completion mappings only when you intend to query completions with contexts. The match-all-contexts behaviour when no context is specified in the completion query is a convenience and should not be relied on when fast suggestions are needed. If you do not intend to filter suggestions by context, do not enable contexts in the completion field mapping.
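For reference, a minimal sketch of the mapping and query shapes involved, assuming Elasticsearch 5.x syntax for category contexts. The context name "category" and the value "books" are hypothetical, since the reporter's actual contexts are not shown in this thread.

```python
import json


def completion_mapping(context_name=None):
    """Mapping fragment for suggest_title, optionally with a category context."""
    field = {"type": "completion"}
    if context_name is not None:
        field["contexts"] = [{"name": context_name, "type": "category"}]
    return {"properties": {"suggest_title": field}}


def completion_query(prefix, context_name=None, values=None, size=10):
    """Suggest body; pass context values whenever the field has contexts."""
    completion = {"field": "suggest_title", "size": size}
    if context_name is not None and values:
        # Explicit context values avoid the slow match-all-contexts path.
        completion["contexts"] = {context_name: list(values)}
    return {"suggest": {"title-suggest": {"prefix": prefix, "completion": completion}}}


print(json.dumps(completion_mapping("category"), indent=2))
print(json.dumps(completion_query("war", "category", ["books"]), indent=2))
```

Calling completion_query("war") without context values produces the context-less form that triggers the match-all behaviour described above.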
@clintongormley, @areek: this can be improved by indexing a non-printing character with each suggestion input and using the same non-printing character at query time when no context is provided. This increases index time and memory usage a bit. I tested it by indexing ~20K documents, and the memory utilization was as follows:
This reduces the iterations in FSTUtil.intersectPrefixPaths by ~50%, which leads to consistently better query performance. The required code changes are as follows:
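A rough illustration of the idea, as a toy model rather than the patch referred to above: sorted keys stand in for the FST, and both the separator and the "\x00" no-context marker are assumptions. Each input is indexed a second time under the reserved marker, so a context-less query becomes a single lookup at the cost of extra index entries.

```python
import bisect

SEP = "\x1d"     # hypothetical context/input separator
NO_CTX = "\x00"  # hypothetical non-printing "no context" marker


def build_index(entries):
    """Index each (context, input) pair plus one marker copy per distinct input."""
    keys = set()
    for ctx, text in entries:
        keys.add(f"{ctx}{SEP}{text}")
        keys.add(f"{NO_CTX}{SEP}{text}")  # extra entry: larger index
    return sorted(keys)


def lookup(keys, ctx, prefix):
    """Single range scan over keys starting with ctx + SEP + prefix."""
    key_prefix = f"{ctx}{SEP}{prefix}"
    i = bisect.bisect_left(keys, key_prefix)
    hits = []
    while i < len(keys) and keys[i].startswith(key_prefix):
        hits.append(keys[i].split(SEP, 1)[1])
        i += 1
    return hits


def suggest(keys, prefix, contexts=None):
    """Context-less queries use the marker: one lookup instead of one per context."""
    if contexts is None:
        return lookup(keys, NO_CTX, prefix)
    return [hit for ctx in contexts for hit in lookup(keys, ctx, prefix)]


entries = [(f"ctx{i}", "war and peace") for i in range(1000)]
entries.append(("books", "warsaw"))
keys = build_index(entries)

print(len(keys))  # 1003: 1001 context entries + 2 marker entries
print(suggest(keys, "war"))  # ['war and peace', 'warsaw']
print(suggest(keys, "war", contexts=["books"]))  # ['warsaw']
```

In this model the marker copies are deduplicated by the set, so context-less results contain each input once; a real implementation would need to decide how to handle duplicate inputs across documents.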
Any opinions on the suggestion above?
cc @elastic/es-search-aggs
I think we should simplify the experience with context-aware suggestions by refusing queries without contexts. If the use case requires querying without contexts, it's easy to build two suggesters, one with a context and one without. I opened #29222 to discuss this, so I hope you don't mind if I close this issue; we can continue the discussion there to find the best option. I am also not very happy with the current solution, since it makes users believe they can perform boolean operations on the contexts (which we can't), but that's another issue.
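A minimal sketch of the "two suggesters" approach, assuming Elasticsearch 5.x mapping and document syntax: the same title is fed into a context-enabled completion field and a plain one, and the client picks the field depending on whether context values are available. The field names suggest_title_ctx and suggest_title_all, the suggestion name "title", and the context name are hypothetical.

```python
def two_field_mapping(context_name):
    """Two completion fields: one with a category context, one without."""
    return {
        "properties": {
            "suggest_title_ctx": {
                "type": "completion",
                "contexts": [{"name": context_name, "type": "category"}],
            },
            "suggest_title_all": {"type": "completion"},
        }
    }


def index_doc(title, context_name, context_values):
    """Document source that feeds the same input to both completion fields."""
    return {
        "suggest_title_ctx": {
            "input": [title],
            "contexts": {context_name: list(context_values)},
        },
        "suggest_title_all": {"input": [title]},
    }


def suggest_body(prefix, context_name, context_values=None):
    """Route to the context field only when context values are supplied."""
    if context_values:
        completion = {
            "field": "suggest_title_ctx",
            "contexts": {context_name: list(context_values)},
        }
    else:
        # Plain field: no match-all-contexts expansion at query time.
        completion = {"field": "suggest_title_all"}
    return {"suggest": {"title": {"prefix": prefix, "completion": completion}}}
```

For example, suggest_body("war", "category") targets suggest_title_all, while suggest_body("war", "category", ["books"]) targets the context field. The cost is indexing each input twice, in exchange for never hitting the context-less path on the context field.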
@jimczi it would indeed be helpful to refuse queries with missing contexts when contexts are enabled.
@evgenyfadeev This has already been implemented: indexing or querying context-enabled completion fields without contexts is deprecated starting with v6.4, and the ability was removed in v7.0.
Elasticsearch version:
5.1.1
Plugins installed:
No plugins
JVM version:
java version "1.8.0_111"
Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)
OS version:
Debian 8
Description of the problem including expected versus actual behavior:
3 nodes: two data nodes and one node without data that handles communication with the application.
The index has 3 shards (because a third data node will be added in the near future), with one replica per shard.
The index currently has about 1.6 million docs; each document has ~100 fields with varied content.
Field "suggest_title" is mapped as:
Query
/prod-entity/entity/_search?_source_include=title,id
with body
returns the expected results, but only after a very long time and with heavy CPU usage.
An experiment with one primary shard and one replica holding the same data didn't help (query time was slightly lower, but still far above expected).
The same setup on Elasticsearch 2.1.x (old completions with payloads) ran as expected (query times < 20 ms), so this is a big regression.