
Very slow completion suggester #22357

Closed
sicarrots opened this issue Dec 27, 2016 · 19 comments
Labels
discuss :Search Relevance/Suggesters "Did you mean" and suggestions as you type Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch

Comments

@sicarrots

sicarrots commented Dec 27, 2016

Elasticsearch version:
5.1.1
Plugins installed:
No plugins
JVM version:
java version "1.8.0_111"
Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)
OS version:
Debian 8
Description of the problem including expected versus actual behavior:
Three nodes: two data nodes and one node without data for communication with the application.
The index has 3 shards (because a third data node will be added in the near future), with one replica per shard.
The index currently has about 1.6 million docs; each document has ~100 fields with various content.
Field "suggest_title" is mapped as:

"suggest_title": {
    "type" : "completion",
     "contexts": [{
         "name": "public",
         "type": "category",
         "path": "is_public"
    }]
}

Query /prod-entity/entity/_search?_source_include=title,id with body

{
	"suggest": {
		"title-completion" : {
	        "prefix" : "war",
	        "completion" : {
	            "field" : "suggest_title"
	        }
	    }	
	}
}

returns the expected results, but only after a very long time and with heavy CPU usage:

"took": 60727,
"timed_out": false,
"_shards": {
    "total": 3,
    "successful": 3,
    "failed": 0
}

An experiment with one primary shard and one replica on the same data brought no improvement (query time slightly lower, but still far above expectations).
The same setup on Elasticsearch 2.1.x (the old completion suggester with payloads) ran as expected (query times < 20 ms), so this is a big regression.

@jimczi
Contributor

jimczi commented Dec 28, 2016

Currently index has about 1.6 mln docs, each document has ~100 fields with various content.
Field "suggest_title" is mapped as:

How big is one document? Since you use _source_include, I suspect your _source may be large? One difference from 2.1 is that the completion suggester now returns the _source of the suggested document.

@clintongormley clintongormley added :Search Relevance/Suggesters "Did you mean" and suggestions as you type feedback_needed labels Dec 28, 2016
@sicarrots
Author

sicarrots commented Dec 28, 2016

The current index with 1.6 million docs uses 16 GB of disk space. A document can have from 50 up to 500 fields. Currently the fields hold only short values, but eventually many fields may store the text content of scanned book pages. In the previous version of Elasticsearch all objects were indexed with full-text fields. I tried using _source_include, but it made no difference.

@jimczi
Contributor

jimczi commented Dec 28, 2016

You can try disabling stored fields completely:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-stored-fields.html#_disable_stored_fields_entirely
... or you can keep a separate index for the suggestions.
I'll close the issue, but feel free to reopen if disabling the stored fields doesn't help.

@jimczi jimczi closed this as completed Dec 28, 2016
@sicarrots
Author

Disabling stored fields didn't help. In any case, it's not practical to disable stored_fields, because that blocks using _source_include, so I wouldn't get any metadata about the document (previously stored in the payload).

{
	"stored_fields": "_none_",
	"suggest": {
		"suggest-title" : {
	        "prefix" : "war",
	        "completion" : {
	            "field" : "suggest_title"
	        }
	    }	
	}
}

"took": 62791

Using a separate index may be a workaround, but it doesn't change the fact that this is a very big regression compared to 2.x.

@sicarrots
Author

I cannot reopen this issue @jimczi

@jimczi
Contributor

jimczi commented Dec 28, 2016

Ok so it's something else. Can you share your mapping and some example input for the completion field?

@jimczi jimczi reopened this Dec 28, 2016
@sicarrots
Author

Example mapping is here: https://gist.github.com/sicarrots/e28eac006a2d37b05c462a4c75a271d9
Example inputs for suggest_title are ['Plan Warszawy', 'Warszawa w słowach i obrazach', 'Wojna i pokój', 'Wars i Sawa']
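
For reference, a single indexing request combining one of those inputs with the is_public context path from the mapping might look like this (the document ID and the field values are illustrative assumptions, not taken from the actual data):

    PUT /prod-entity/entity/1
    {
        "title": "Plan Warszawy",
        "is_public": "true",
        "suggest_title": {
            "input": ["Plan Warszawy", "Warszawa"]
        }
    }

Because the context mapping uses "path": "is_public", the context value is read from the is_public field of the document at index time.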

@jimczi
Contributor

jimczi commented Dec 29, 2016

Thanks @sicarrots
From what you provided it is hard to tell what's wrong. I thought it was the stored fields, but it seems not. Can you capture the output of hot_threads during the suggest query?
https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-nodes-hot-threads.html
Is there any difference in your deployment between 5.1 and 2.1? Number of nodes, RAM, data? Can you also compare the size of your indices in 5.1 and 2.1?

@sicarrots
Author

With 2.1 there were 4 shards; currently there are 3. The number of nodes and the hardware are the same.
Currently objects are indexed without the text fields, so the physical size on disk is lower (on the 2.1 cluster with text fields it was ~1 TB on each node; currently it's ~20 GB). The number of objects is the same.
Here is the output of hot_threads: https://gist.github.com/sicarrots/4c1b70fa52c2fbcc289e58a1ba8edd83

@jimczi
Contributor

jimczi commented Dec 29, 2016

Interesting: the query spends most of its time trying to find all the paths that start with "war". This is only the first step of the suggestion query and it should be much faster. What sort of contexts do you have in your suggestions? Can you provide the exact input for the suggestion field that you send to Elasticsearch?

@clintongormley
Contributor

@sicarrots you're using a category context but then querying the completion suggester without specifying a context. I've seen this cause very slow suggestions. I think, in this case, the suggester has to be run once for every context value (and I'm guessing you have a lot of them), which causes the slow performance.

@clintongormley
Contributor

/cc @mikemccand @areek

@areek
Contributor

areek commented Jan 23, 2017

@sicarrots As @clintongormley pointed out, the slow completion performance is due to enabling contexts in your completion mapping. When you don't specify a context in the completion query, the query is resolved to match all contexts for the query prefix, which can slow the query down considerably if you have high-cardinality context values.

You should use context-enabled completion mappings only when you intend to query completions with contexts. The match-all-contexts behaviour when no context is specified in the completion query exists for convenience and should not be used when performant suggestions are desired. If you do not intend to filter suggestions by contexts, do not enable contexts in the completion field mapping.
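
A context-filtered query against the mapping from the original report might look like this (the context value "true" is an assumption about what the is_public field holds):

    {
        "suggest": {
            "title-completion": {
                "prefix": "war",
                "completion": {
                    "field": "suggest_title",
                    "contexts": {
                        "public": ["true"]
                    }
                }
            }
        }
    }

With an explicit context, the suggester only has to intersect the prefix with a single context path instead of every distinct context value.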

@nilabhsagar
Contributor

@clintongormley, @areek - This can be mitigated by indexing a non-printing character with each suggestion input and using the same non-printing character at query time when no context is provided. This increases index time and memory utilization slightly. I checked it by indexing ~20K documents; the memory utilization was as follows:
Captured completion size

  • Without non-printing character: 110596 bytes
  • With non-printing character: 110654 bytes

This reduces the iteration in FSTUtil.intersectPrefixPaths by ~50%, which leads to consistently improved query performance.

The required code changes are below.
ContextMappings.java

    protected Iterable<CharSequence> contexts() {
        Set<CharSequence> typedContexts = new HashSet<>();
        final CharsRefBuilder scratch = new CharsRefBuilder();
        scratch.grow(1);
        for (int typeId = 0; typeId < contextMappings.size(); typeId++) {
            scratch.setCharAt(0, (char) typeId);
            scratch.setLength(1);
            ContextMapping mapping = contextMappings.get(typeId);
            Set<CharSequence> contexts = new HashSet<>(mapping.parseContext(document));
            if (this.contexts.get(mapping.name()) != null) {
                contexts.addAll(this.contexts.get(mapping.name()));
            }
            for (CharSequence context : contexts) {
                scratch.append(context);
                typedContexts.add(scratch.toCharsRef());
                scratch.setLength(1);
            }
        }

        // Add a single non-printing character (ESC) for the empty-context query.
        // Note: new String(new char['\u001B']) would allocate a 27-element array
        // of NUL characters, so build the one-character string explicitly.
        typedContexts.add(String.valueOf('\u001B'));

        return typedContexts;
    }

    public ContextQuery toContextQuery(CompletionQuery query, Map<String, List<ContextMapping.InternalQueryContext>> queryContexts) {
        ContextQuery typedContextQuery = new ContextQuery(query);
        if (queryContexts.isEmpty() == false) {
            CharsRefBuilder scratch = new CharsRefBuilder();
            scratch.grow(1);
            for (int typeId = 0; typeId < contextMappings.size(); typeId++) {
                scratch.setCharAt(0, (char) typeId);
                scratch.setLength(1);
                ContextMapping mapping = contextMappings.get(typeId);
                List<ContextMapping.InternalQueryContext> internalQueryContext = queryContexts.get(mapping.name());
                if (internalQueryContext != null) {
                    for (ContextMapping.InternalQueryContext context : internalQueryContext) {
                        scratch.append(context.context);
                        typedContextQuery.addContext(scratch.toCharsRef(), context.boost, !context.isPrefix);
                        scratch.setLength(1);
                    }
                }
            }
        } else {
            // Query for the single non-printing character when no context is given.
            CharsRefBuilder scratch = new CharsRefBuilder();
            scratch.grow(1);
            scratch.append(String.valueOf('\u001B'));
            typedContextQuery.addContext(scratch.toCharsRef(), 1, false);
        }

        return typedContextQuery;
    }

@nilabhsagar
Contributor

Any opinion on the above suggestion?

@jimczi
Contributor

jimczi commented Mar 19, 2018

cc @elastic/es-search-aggs

@jimczi
Contributor

jimczi commented Mar 23, 2018

I think we should simplify the experience with context-aware suggestions by refusing queries without contexts. It's easy to build two suggesters, one with a context and one without, if the use case requires querying without contexts. I opened #29222 to discuss this, so I hope you don't mind if I close this issue. We can continue the discussion on the new issue to see what the best options are. I am also not very happy with the current solution, since it makes users believe that they can perform boolean operations on the contexts (which they can't), but that's another issue.
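
The two-suggester workaround described above could be sketched as a mapping with two completion fields, one plain and one context-enabled (the second field name is illustrative, reusing the context definition from the original report):

    "suggest_title": {
        "type": "completion"
    },
    "suggest_title_by_context": {
        "type": "completion",
        "contexts": [{
            "name": "public",
            "type": "category",
            "path": "is_public"
        }]
    }

Queries without a context go against suggest_title; context-filtered queries go against suggest_title_by_context, so neither path pays the match-all-contexts cost.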

@jimczi jimczi closed this as completed Mar 23, 2018
@evgenyfadeev

@jimczi it would indeed be helpful to refuse queries with missing contexts when contexts are enabled

@mayya-sharipova
Contributor

@evgenyfadeev This has already been implemented: it was deprecated starting with v6.4, and the ability to index or query context-enabled suggestions without a context was removed in v7.0.

@javanna javanna added the Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch label Jul 12, 2024