Suggester: Phrase suggest option to limit suggestions to existing phrases #3482
Comments
Thanks for opening this. Do you feel like attaching a pull request? I'd be happy to help you sketch out the functionality here! |
So I had a similar problem that came from filters: I was getting phrases that exist in the index but were filtered out on subsequent searches with the same filter set. I've since mostly worked around the problem by splitting my index along the most common filter. So if it isn't that much more work to get it to include filters that'd be great. If OTOH, you do something like only return suggestions that match an n-gram then by all means just do that and ignore me. I'd still use it and it'd probably be faster in the end. |
Folks, I don't think we can filter the process of drawing candidates etc., since performance would suffer badly. What I can imagine is executing a match query with each suggestion that is returned to decide whether to drop it or not. This will certainly allow for filtering, but it will only be a helper to prune the result list. If this is OK for you guys, I think we can certainly do that! |
@nik9000 do you wanna take a look at this? I won't be able to in the near future. |
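For what it's worth, the pruning step described above could be sketched roughly like this in Python. It is only an illustration: `prune_suggestions`, `toy_has_match`, and the in-memory `DOCS` list are made-up names standing in for a real per-suggestion match query against the index.

```python
from typing import Callable, Iterable, List


def prune_suggestions(suggestions: Iterable[str],
                      has_match: Callable[[str], bool]) -> List[str]:
    """Keep only suggestions for which has_match returns True, preserving order."""
    return [s for s in suggestions if has_match(s)]


# Toy stand-in for the index (assumption): a phrase "matches" if it occurs
# verbatim in one of these documents. A real version would run a match query.
DOCS = ["xorr the god-jewel", "thongor against the gods"]


def toy_has_match(phrase: str) -> bool:
    return any(phrase.lower() in doc for doc in DOCS)


candidates = ["xorr the god-jewel", "xor the got-jewel", "thongor against the gods"]
kept = prune_suggestions(candidates, toy_has_match)
print(kept)
```

Note that this only prunes the list after the candidates are generated; it does not change how candidates are drawn or scored.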
I can take a look sometime in the next few days, yeah. We just deployed our elasticsearch software to a much larger group of users yesterday so I'm getting a bunch of high priority bugs that aren't (yet) this. What I'll do is file this issue as a medium priority one on my side and pick it up when I've cleared everything higher. So what'd be most useful for me would be to have an API like this:
curl -XPOST 'localhost:9200/_search' -d {
"suggest" : {
"text" : "Xor the Got-Jewel",
"simple_phrase" : {
"phrase" : {
"field" : "body",
"size" : 5,
"shard_size": 10,
"confidence": 2.0,
"filter_replace_string": "{}",
"filter": {
"bool" : {
"must" : {
"query": { "match_phrase" : { "body" : "{}", "slop": 3 } }
},
"must_not" : {
"range" : {
"age" : { "from" : 10, "to" : 20 }
}
},
"should" : [
{
"term" : { "tag" : "sometag" }
},
{
"term" : { "tag" : "sometagtag" }
}
]
}
}
}
}
}
}
I can see how this could be a performance problem, but caching should kick in and help with most of the filters. Also, if you set your confidence nice and high you might not have to do this too many times. And another thing, you'd have to crank up the |
Now that I'm digging into this, I like this API better:
```bash
curl -XPOST 'localhost:9200/_search' -d '{
  "suggest" : {
    "text" : "Xor the Got-Jewel",
    "simple_phrase" : {
      "phrase" : {
        "field" : "body",
        "size" : 5,
        "shard_size": 10,
        "confidence": 2.0,
        "filter": {
          "bool" : {
            "must_not" : {
              "range" : {
                "age" : { "from" : 10, "to" : 20 }
              }
            },
            "should" : [
              {
                "term" : { "tag" : "sometag" }
              },
              {
                "term" : { "tag" : "sometagtag" }
              }
            ]
          }
        }
      }
    }
  }
}'
```
Internally I'd build a bool filter containing a match_phrase against the field we generate suggestions from, plus the filter you pass in. This is less fiddly to code and allows some simple syntax shortcuts:
```bash
curl -XPOST 'localhost:9200/_search' -d '{
  "suggest" : {
    "text" : "Xor the Got-Jewel",
    "simple_phrase" : {
      "phrase" : {
        "field" : "body",
        "size" : 5,
        "shard_size": 10,
        "confidence": 2.0,
        "filter": "yes"
      }
    }
  }
}'
```
which would add the match_phrase and
```bash
curl -XPOST 'localhost:9200/twitter/_search?pretty=true' -d '
{
"query" : {
"term" : { "message" : "something" }
},
"filter" : {
"term" : { "tag" : "green" }
},
"suggest" : {
"text" : "Xor the Got-Jewel",
"simple_phrase" : {
"phrase" : {
"field" : "body",
"size" : 5,
"shard_size": 10,
"confidence": 2.0,
"filter": "query"
}
}
}
}'
```
which would go and get the filters from the top-level query. I'm still concerned about how slow this might be. |
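A rough Python sketch of how that internal bool filter might be assembled. This is an assumption about the shape, not Elasticsearch code: `build_suggestion_filter` is a hypothetical helper, `"yes"` is taken to mean "only add the match_phrase clause", and for `"filter": "query"` the caller is assumed to pass in the top-level filter as a dict.

```python
from typing import Any, Dict, Union


def build_suggestion_filter(field: str, suggestion: str,
                            user_filter: Union[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Combine a match_phrase on the suggest field with an optional user filter."""
    # Phrase clause wrapped as a query filter, mirroring the request shape
    # proposed earlier in this thread.
    phrase_clause = {"query": {"match_phrase": {field: suggestion}}}
    must = [phrase_clause]
    if isinstance(user_filter, dict):
        # A full filter was supplied: require it alongside the phrase clause.
        must.append(user_filter)
    elif user_filter != "yes":
        raise ValueError("filter must be a dict or the shortcut 'yes'")
    return {"bool": {"must": must}}


only_phrase = build_suggestion_filter("body", "xorr the god-jewel", "yes")
with_tag = build_suggestion_filter(
    "body", "xorr the god-jewel", {"term": {"tag": "green"}})
```

Each candidate suggestion would get its own filter built this way, which is where the performance worry comes from.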
So I finally have something for this that kinda works. It doesn't fully work and I'd like some guidance. I can't post the code right now because I'm traveling. Such is life. Anyway, I'd like guidance on two things:
|
This implementation has a bunch of problems that'll need to be worked out before it is a valid candidate for merging. I don't have time to rebase it right now but would still love feedback on the problems. The ones I remember:
1. It performs the filtering by blocking the suggesting thread.
2. Because there is no "exists" query type it uses a limit. I now know that isn't as efficient as just using a count, but it might be worth implementing an exists query type for it anyway.
3. It feels like there are a lot of plumbing changes required for this feature. My guess is that is because I'm going about it wrong. This correlates with #1 pretty well.
4. I have to wrap the filter through the map nodes and parse it during the reduce step. That feels silly.
Closes elastic#3482
moved over to |
Any news on this, or the similar #2842? |
I think now that we have templates we can implement this much more simply. I think we should revisit it soon, thanks for pinging @timbunce |
👍 |
This would be awesome to have. I ended up having to do it outside ES.
|
@areek something you could take a look at? |
@clintongormley already started looking into it! |
After looking into it, I have come up with the following API for the suggestion filter option:
curl -XPOST 'localhost:9200/_search' -d {
"suggest": {
"text": "Xor the Got-Jewel",
"simple_phrase": {
"phrase": {
"field": "body",
"size": 5,
"shard_size": 10,
"confidence": 2,
"filter": {
"match": {
"body": "{{suggestion}}"
}
}
}
}
}
}
The filter option above is just a query template with the magic variable "suggestion", which will be populated once phrase suggestions are made. |
…ching any documents for a given query
The newly added filter option will let the user provide a template query which will be executed for every phrase suggestion generated, to ensure that the suggestion matches at least one document for the query. The filter query is only executed on the local node for now. When the new filter option is used, the size of the suggestion is restricted to 20. Closes elastic#3482
@areek Just a note: |
@clintongormley Currently the filter param takes queries! The reason for the name filter is to indicate that the query is used to filter out suggestions after they are generated. |
It's confusing to take queries and call it filter, I think. Why not make it On Tue, Jul 8, 2014 at 10:26 AM, Areek Zillur notifications@github.com
|
Agree with @nik9000 that it's confusing. |
It does seem confusing. The updated API looks like the following:
curl -XPOST 'localhost:9200/_search' -d {
"suggest": {
"text": "Xor the Got-Jewel",
"simple_phrase": {
"phrase": {
"field": "body",
"size": 5,
"shard_size": 10,
"confidence": 2,
"filter": {
"template": {
"body": "{{suggestion}}"
},
"preference": "_only_local"
}
}
}
}
}
The change was mainly due to adding the |
@areek I assume that any query can be run there? it's not limited to just Either way, I'd be perfectly happy just renaming it to |
What about:
"collate" : {
"filter|query" : { ... },
"preference": "_only_local"
}
Then folks can pick whether we should parse it as a query or not. The template engine doesn't really care, it's just string replacement. We can document that it is passed through mustache. |
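To make the `filter|query` idea concrete, here is a small Python sketch of how a parser might decide which kind of body it was given. The function name `parse_collate` and the "exactly one of the two keys" rule are assumptions for illustration, not something settled in this thread.

```python
from typing import Any, Dict, Optional, Tuple


def parse_collate(collate: Dict[str, Any]) -> Tuple[str, Any, Optional[str]]:
    """Return (kind, body, preference), where kind is 'query' or 'filter'."""
    kinds = [key for key in ("query", "filter") if key in collate]
    if len(kinds) != 1:
        # Assumed rule: the user picks exactly one of the two keys.
        raise ValueError("collate needs exactly one of 'query' or 'filter'")
    kind = kinds[0]
    # preference is optional and simply passed along when present.
    return kind, collate[kind], collate.get("preference")


kind, body, preference = parse_collate(
    {"filter": {"term": {"tag": "green"}}, "preference": "_only_local"})
```

The body is returned untouched here, since template rendering would happen separately for each suggestion.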
@s1monw I like @clintongormley all your assumptions are correct, except that the |
@areek i'm thinking that users might want to use a |
@clintongormley I will end up doing that, thanks for suggesting. |
So the updated API looks like the following:
curl -XPOST 'localhost:9200/_search' -d {
"suggest": {
"text": "Xor the Got-Jewel",
"simple_phrase": {
"phrase": {
"field": "body",
"size": 5,
"shard_size": 10,
"confidence": 2,
"collate": {
"query": {
"{{field_name}}": "{{suggestion}}"
},
"preference": "_primary",
"params": {"field_name": "title"}
}
}
}
}
}
|
The newly added collate option will let the user provide a template query/filter which will be executed for every phrase suggestion generated, to ensure that the suggestion matches at least one document for the filter/query. The user can also add a routing preference `preference` to route the collate query/filter and additional `params` to inject into the collate template. Closes #3482
When using the phrase suggest API to provide "Did you mean ?" corrections, it would be nice to include only suggestions that would return results.
So a returned phrase must exist in at least one document in the index.
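As a sketch of how `params` and the `suggestion` variable could be injected into the collate template, here is a minimal Python version. It is not real Mustache and not the Elasticsearch implementation: `render_collate` is a hypothetical helper that only handles plain `{{name}}` placeholders, and letting `suggestion` override a same-named param is an assumption.

```python
import json
import re
from typing import Any, Mapping

# Matches simple {{name}} placeholders only (no sections, no partials).
_VAR = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_collate(template: Any, suggestion: str,
                   params: Mapping[str, str]) -> Any:
    """Fill {{name}} placeholders from params plus the current suggestion."""
    variables = dict(params)
    variables["suggestion"] = suggestion  # assumed to win over a same-named param

    def substitute(match: "re.Match[str]") -> str:
        value = variables[match.group(1)]  # KeyError for an unknown variable
        # JSON-escape the value and drop the surrounding quotes so that
        # quotes or backslashes in a suggestion cannot break the document.
        return json.dumps(str(value))[1:-1]

    return json.loads(_VAR.sub(substitute, json.dumps(template)))


rendered = render_collate(
    {"{{field_name}}": "{{suggestion}}"},
    "xorr the god-jewel",
    {"field_name": "title"},
)
print(rendered)
```

The rendered body would then be executed once per suggestion, and a suggestion is kept only if it matches at least one document.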