
Very slow completion suggester #22357

Closed
sicarrots opened this issue Dec 27, 2016 · 19 comments
Labels
discuss :Search Relevance/Suggesters "Did you mean" and suggestions as you type Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch

Comments

@sicarrots

sicarrots commented Dec 27, 2016

Elasticsearch version:
5.1.1
Plugins installed:
No plugins
JVM version:
java version "1.8.0_111"
Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)
OS version:
Debian 8
Description of the problem including expected versus actual behavior:
Three nodes: two data nodes and one node without data for communication with the application.
The index has 3 shards (because a third data node will be added in the near future), with one replica per shard.
The index currently has about 1.6 million docs; each document has ~100 fields with various content.
Field "suggest_title" is mapped as:

"suggest_title": {
    "type" : "completion",
     "contexts": [{
         "name": "public",
         "type": "category",
         "path": "is_public"
    }]
}

Query /prod-entity/entity/_search?_source_include=title,id with body

{
	"suggest": {
		"title-completion" : {
	        "prefix" : "war",
	        "completion" : {
	            "field" : "suggest_title"
	        }
	    }	
	}
}

returns the expected results, but only after a very long time and with heavy CPU usage:

"took": 60727,
"timed_out": false,
"_shards": {
    "total": 3,
    "successful": 3,
    "failed": 0
}

An experiment with one primary shard and one replica on the same data brought no improvement (query time slightly lower, but still far above expectations).
The same setup on Elasticsearch 2.1.x (the old completion suggester with payloads) ran as expected (query times < 20 ms), so this is a big regression.

@jimczi
Contributor

jimczi commented Dec 28, 2016

Currently index has about 1.6 mln docs, each document has ~100 fields with various content.
Field "suggest_title" is mapped as:

How big is one document? Since you use _source_include, I suspect your _source may be large? One difference from 2.1 is that the completion suggester now returns the _source of the suggested document.

@clintongormley clintongormley added :Search Relevance/Suggesters "Did you mean" and suggestions as you type feedback_needed labels Dec 28, 2016
@sicarrots
Author

sicarrots commented Dec 28, 2016

The current index with 1.6 million docs uses 16 GB of disk space. A document can have from 50 up to 500 fields. Currently the fields hold only short values, but eventually many fields may store the text content of scanned book pages. In the previous version of Elasticsearch all objects were indexed with full-text fields. I tried using _source_include, but it made no difference.

@jimczi
Contributor

jimczi commented Dec 28, 2016

You can try disabling stored fields completely:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-stored-fields.html#_disable_stored_fields_entirely
... or you can keep a separate index for the suggestions.
I'll close the issue, but feel free to reopen if disabling the stored fields doesn't help.

@jimczi jimczi closed this as completed Dec 28, 2016
@sicarrots
Author

Disabling stored fields didn't help. In any case, it's not practical to disable stored_fields, because that blocks using _source_include, so I wouldn't get any metadata about the document (previously stored in the payload).

{
	"stored_fields": "_none_",
	"suggest": {
		"suggest-title" : {
	        "prefix" : "war",
	        "completion" : {
	            "field" : "suggest_title"
	        }
	    }	
	}
}

"took": 62791

Using a separate index may be a workaround, but it doesn't change the fact that this is a very big regression compared to 2.x.

@sicarrots
Author

I cannot reopen this issue @jimczi

@jimczi
Contributor

jimczi commented Dec 28, 2016

Ok so it's something else. Can you share your mapping and some example input for the completion field?

@jimczi jimczi reopened this Dec 28, 2016
@sicarrots
Author

Example mapping is here: https://gist.github.com/sicarrots/e28eac006a2d37b05c462a4c75a271d9
Example inputs for suggest_title are ['Plan Warszawy', 'Warszawa w słowach i obrazach', 'Wojna i pokój', 'Wars i Sawa']
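
For reference, a single indexing request combining one of those inputs with the is_public context path from the mapping might look like this (the document ID and the field values are illustrative assumptions, not taken from the actual data):

    PUT /prod-entity/entity/1
    {
        "title": "Plan Warszawy",
        "is_public": "true",
        "suggest_title": {
            "input": ["Plan Warszawy", "Warszawa"]
        }
    }

Because the context mapping uses "path": "is_public", the context value is read from the is_public field of the document at index time.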

@jimczi
Contributor

jimczi commented Dec 29, 2016

Thanks @sicarrots
From what you provided it is hard to tell what's wrong. I thought it was the stored fields, but it seems not. Can you capture the output of hot_threads during the suggest query?
https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-nodes-hot-threads.html
Is there any difference in your deployment between 5.1 and 2.1? Number of nodes, RAM, data? Can you also compare the size of your indices in 5.1 and 2.1?

@sicarrots
Author

With 2.1 there were 4 shards; currently there are 3. The number of nodes and the hardware are the same.
Currently objects are indexed without the text fields, so the physical size on disk is lower (on the 2.1 cluster with text fields it was ~1 TB on each node; currently it's ~20 GB). The number of objects is the same.
Here is the output of hot_threads: https://gist.github.com/sicarrots/4c1b70fa52c2fbcc289e58a1ba8edd83

@jimczi
Contributor

jimczi commented Dec 29, 2016

Interesting: the query spends most of its time trying to find all the paths that start with "war". This is only the first step of the suggestion query and it should be much faster. What sort of contexts do you have in your suggestions? Can you provide the exact input for the suggestion field that you send to Elasticsearch?

@clintongormley
Contributor

@sicarrots you're using a category context but then querying the completion suggester without specifying a context. I've seen this cause very slow suggestions. I think, in this case, the suggester has to be run once for every context value (and I'm guessing you have a lot of them), which causes the slow performance.

@clintongormley
Contributor

/cc @mikemccand @areek

@areek
Contributor

areek commented Jan 23, 2017

@sicarrots As @clintongormley pointed out, the slow completion performance is due to enabling contexts in your completion mapping. When you don't specify a context in the completion query, the query is resolved to match all contexts for the query prefix, which can slow the query down considerably if you have high-cardinality context values.

You should use context-enabled completion mappings only when you intend to query completions with contexts. The match-all-contexts behaviour when no context is specified in the completion query exists for convenience and should not be used when performant suggestions are desired. If you do not intend to filter suggestions by contexts, do not enable contexts in the completion field mapping.
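
A context-filtered query against the mapping from the original report might look like this (the context value "true" is an assumption about what the is_public field holds):

    {
        "suggest": {
            "title-completion": {
                "prefix": "war",
                "completion": {
                    "field": "suggest_title",
                    "contexts": {
                        "public": ["true"]
                    }
                }
            }
        }
    }

With an explicit context, the suggester only has to intersect the prefix with a single context path instead of every distinct context value.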

@nilabhsagar
Contributor

@clintongormley, @areek - This can be mitigated by indexing a non-printing character with each suggestion input and using the same non-printing character at query time when no context is provided. This increases index time and memory utilization slightly. I checked it by indexing ~20K documents; the memory utilization was as follows:
Captured completion size

  • Without non-printing character: 110596 bytes
  • With non-printing character: 110654 bytes

This reduces the iteration in FSTUtil.intersectPrefixPaths by ~50%, which leads to consistently improved query performance.

The required code changes are below.
ContextMappings.java

    protected Iterable<CharSequence> contexts() {
        Set<CharSequence> typedContexts = new HashSet<>();
        final CharsRefBuilder scratch = new CharsRefBuilder();
        scratch.grow(1);
        for (int typeId = 0; typeId < contextMappings.size(); typeId++) {
            scratch.setCharAt(0, (char) typeId);
            scratch.setLength(1);
            ContextMapping mapping = contextMappings.get(typeId);
            Set<CharSequence> contexts = new HashSet<>(mapping.parseContext(document));
            if (this.contexts.get(mapping.name()) != null) {
                contexts.addAll(this.contexts.get(mapping.name()));
            }
            for (CharSequence context : contexts) {
                scratch.append(context);
                typedContexts.add(scratch.toCharsRef());
                scratch.setLength(1);
            }
        }

        // Add a single non-printing character (ESC) for the empty-context query.
        // Note: new String(new char['\u001B']) would allocate a 27-element array
        // of NUL characters, so build the one-character string explicitly.
        typedContexts.add(String.valueOf('\u001B'));

        return typedContexts;
    }

    public ContextQuery toContextQuery(CompletionQuery query, Map<String, List<ContextMapping.InternalQueryContext>> queryContexts) {
        ContextQuery typedContextQuery = new ContextQuery(query);
        if (queryContexts.isEmpty() == false) {
            CharsRefBuilder scratch = new CharsRefBuilder();
            scratch.grow(1);
            for (int typeId = 0; typeId < contextMappings.size(); typeId++) {
                scratch.setCharAt(0, (char) typeId);
                scratch.setLength(1);
                ContextMapping mapping = contextMappings.get(typeId);
                List<ContextMapping.InternalQueryContext> internalQueryContext = queryContexts.get(mapping.name());
                if (internalQueryContext != null) {
                    for (ContextMapping.InternalQueryContext context : internalQueryContext) {
                        scratch.append(context.context);
                        typedContextQuery.addContext(scratch.toCharsRef(), context.boost, !context.isPrefix);
                        scratch.setLength(1);
                    }
                }
            }
        } else {
            // Query for the single non-printing character when no context is given.
            CharsRefBuilder scratch = new CharsRefBuilder();
            scratch.grow(1);
            scratch.append(String.valueOf('\u001B'));
            typedContextQuery.addContext(scratch.toCharsRef(), 1, false);
        }

        return typedContextQuery;
    }

@nilabhsagar
Contributor

Any opinion on the above suggestion?

@jimczi
Contributor

jimczi commented Mar 19, 2018

cc @elastic/es-search-aggs

@jimczi
Contributor

jimczi commented Mar 23, 2018

I think we should simplify the experience with context-aware suggestions by refusing queries without contexts. It's easy to build two suggesters, one with a context and one without, if the use case requires querying without contexts. I opened #29222 to discuss this, so I hope you don't mind if I close this issue. We can continue the discussion on the new issue to see what the best options are. I am also not very happy with the current solution, since it makes users believe that they can perform boolean operations on the contexts (which they can't), but that's another issue.
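
The two-suggester workaround described above could be sketched as a mapping with two completion fields, one plain and one context-enabled (the second field name is illustrative, reusing the context definition from the original report):

    "suggest_title": {
        "type": "completion"
    },
    "suggest_title_by_context": {
        "type": "completion",
        "contexts": [{
            "name": "public",
            "type": "category",
            "path": "is_public"
        }]
    }

Queries without a context go against suggest_title; context-filtered queries go against suggest_title_by_context, so neither path pays the match-all-contexts cost.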

@jimczi jimczi closed this as completed Mar 23, 2018
@evgenyfadeev

@jimczi it would indeed be helpful to refuse queries with missing contexts when contexts are enabled

@mayya-sharipova
Contributor

@evgenyfadeev This has already been implemented: it was deprecated starting with v6.4, and the ability to index or query context-enabled suggestions without a context was removed in v7.0.

@javanna javanna added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Jul 12, 2024