Search result changed since 1.24 (current 1.3.2) #7348

Closed
dominikmank opened this issue Aug 20, 2014 · 11 comments
Labels
>docs General docs changes

@dominikmank

Heya,

In ES 1.2.4 I do something like this:

curl -XPUT 'http://localhost:9200/twitter/user/kimchy' -d '{ "name_test" : "Shay Banon" }'
curl -XPUT 'http://localhost:9200/twitter/user/foo' -d '{ "name_test" : "" }'
curl -XPOST 'http://localhost:9200/_search' -d '{"query":{"filtered":{"filter":{"and":{"filters":[{"missing":{"field": "name_test"}}]}}}}}'

The same query, pretty-printed:

{
  "query": {
    "filtered": {
      "filter": {
        "and": {
          "filters": [
            {
              "missing": {
                "field": "name_test"
              }
            }
          ]
        }
      }
    }
  }
}

This one gives 1 hit.

In version 1.3.2 I do the same thing, but get 0 hits.

Am I missing something? (I read the changelog, but didn't find anything that could possibly cause this.)

I suspect this happens when the field name contains an underscore.
When I tested it with "name" instead, I also got one hit.

Thanks for watching.

Dominik

@jpountz
Contributor

jpountz commented Aug 20, 2014

@dominikmank The way that missing and exists are implemented changed in 1.3.0, see #5659

If I understand what is happening correctly, the difference shows up on analyzed fields that generate no tokens. In that case the old implementation assumed that the field didn't exist, while the new implementation assumes that it exists since a field value was provided. I'm wondering whether the new approach may be more correct.
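
To make the difference concrete, here is a toy Python model of the two behaviours described above. It is only a sketch, not Elasticsearch code: `analyze` is a crude stand-in for an analyzer, and `missing_old` / `missing_new` are hypothetical names invented for this illustration.

```python
import re

def analyze(text):
    """Crude stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def missing_old(doc, field):
    """Old behaviour as described above: missing if no tokens were produced."""
    value = doc.get(field)
    return value is None or not analyze(value)

def missing_new(doc, field):
    """New behaviour as described above: missing only if no value was provided."""
    return doc.get(field) is None

# The two documents from the original report.
docs = {
    "kimchy": {"name_test": "Shay Banon"},
    "foo": {"name_test": ""},
}

old_hits = [doc_id for doc_id, doc in docs.items() if missing_old(doc, "name_test")]
new_hits = [doc_id for doc_id, doc in docs.items() if missing_new(doc, "name_test")]
print(old_hits)  # the empty string yields no tokens, so "foo" counts as missing
print(new_hits)  # a value was provided for both documents, so nothing is missing
```

Under this model the empty-string document is the only one that changes sides, which matches the 1 hit versus 0 hits reported above.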

@clintongormley
Contributor

@jpountz it breaks bwc (backwards compatibility). I think people probably rely on the old behaviour.

@jpountz
Contributor

jpountz commented Aug 20, 2014

I guess it could be considered a bug fix as well, since the documentation says the filter is supposed to find fields that have no values, whereas "", "_", or any other value whose analyzed form contains no tokens is still a valid value?

If we want to maintain bwc, I guess we can revert the change to the missing and exists filters in 1.x (but they'll be slow again) and document this break for 2.0.

@dominikmank
Author

@clintongormley Yep, I'm relying on it, but if there's another way to do it, I would love to see the solution.

@jpountz So it's because the field is analyzed by default, yes?
Can you show me an alternative that works with your changes for my example?

We reverted back to 1.2.4 now, so it's not that urgent :-)

@jpountz
Contributor

jpountz commented Aug 20, 2014

So it's because the field is analyzed by default, yes?

Yes.

It should be possible to emulate the old behavior by doing a query like this one, just replace f with the field name that you want to check:

GET test/_search
{
  "query": {
    "filtered": {
      "filter": {
        "not": {
          "filter": {
            "range": {
              "f": {
              }
            }
          }
        }
      }
    }
  }
}

An exists check should be the same query without the not filter.
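
As a sketch of that last point, the two request bodies can be built side by side in Python. The helper names (`range_filter`, `emulated_missing_query`, `emulated_exists_query`) are made up for this illustration; the JSON they produce follows the query above, and the resulting body would be sent to `_search` the same way as in the curl examples.

```python
import json

def range_filter(field):
    """Range filter with no bounds on `field`, as in the query above."""
    return {"range": {field: {}}}

def emulated_missing_query(field):
    """Old-style 'missing': the unbounded range filter wrapped in a not filter."""
    return {"query": {"filtered": {"filter": {"not": {"filter": range_filter(field)}}}}}

def emulated_exists_query(field):
    """Old-style 'exists': the same range filter without the not wrapper."""
    return {"query": {"filtered": {"filter": range_filter(field)}}}

# Print the exists variant for the field from the original report.
print(json.dumps(emulated_exists_query("name_test"), indent=2))
```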

@dominikmank
Author

@jpountz Yeah, that solves it, thanks! ... but we're staying on 1.2.4 anyway for the time being :D So, is this the way to do it in the future? :'(

@jpountz
Contributor

jpountz commented Aug 20, 2014

@dominikmank I don't know yet. This way of computing documents with missing values is very costly, and this was the reason for the refactoring in #5659. I don't think we can get back to the old behavior with the new impl without analyzing twice, which I'd like to avoid. Let's see what @clintongormley thinks about it.

@jprante
Contributor

jprante commented Sep 1, 2014

I like the new behavior of #5659 because it is faster and I have high-cardinality fields all over the place. I suggest adding the missing documentation of the breaking change to one of the subsequent release notes. My 2¢

@jpountz jpountz added the discuss label Sep 3, 2014
@LesBarstow

While I like the idea of increased speed, this change does require (perhaps significant) extra client-side massaging when inserting variable data that might include empty strings.

The change is also not properly documented in the missing filter documentation or anywhere else that I've been able to find that references null value processing.
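
The kind of client-side massaging meant here could be as simple as the following sketch. It assumes that leaving a field out of the document entirely is enough for the missing filter to match it, as the discussion above implies; `drop_empty_strings` is a hypothetical helper, it only looks at top-level fields, and it does not catch values such as "_" that merely analyze to no tokens.

```python
def drop_empty_strings(doc):
    """Return a copy of `doc` without top-level fields whose value is an empty string."""
    return {key: value for key, value in doc.items() if value != ""}

# The two documents from the original report, cleaned before indexing.
cleaned_foo = drop_empty_strings({"name_test": ""})
cleaned_kimchy = drop_empty_strings({"name_test": "Shay Banon"})
print(cleaned_foo)
print(cleaned_kimchy)
```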

@clintongormley clintongormley added >docs General docs changes and removed >bug >regression discuss labels Oct 31, 2014
@clintongormley
Contributor

I have clarified the behaviour of the missing and exists filters in b9149f8

@salimane

salimane commented Mar 1, 2015

@jpountz The query you provided works on 1.4.4 for fields of type string, but does not work for empty values like { "user": [] }.
Basically I would love a sample query that works for:

{ "user": null }
{ "user": [] } 
{ "user": [null] } 
{ "foo":  "bar" } 
{ "user":  "" } 
{ "user":  "a and to" }  # only stopwords that would be cleaned up with an analyzer like stop

All the above should match the query:

{
  "filter": {
    "missing": { "field": "user" }
  }
}
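
Spelled out as a plain Python predicate, the wish looks roughly like this. It is only a statement of the desired semantics, not an Elasticsearch query; `has_no_usable_value` and the tiny `STOPWORDS` set are hypothetical and cover just the samples above.

```python
STOPWORDS = {"a", "and", "to"}  # hypothetical, just enough for the sample above

def has_no_usable_value(doc, field, stopwords=STOPWORDS):
    """True if `field` is absent, null, empty, or holds only stopwords."""
    value = doc.get(field)
    values = value if isinstance(value, list) else [value]
    for item in values:
        if item is None:
            continue
        # Whitespace split stands in for analysis; stopwords are discarded.
        tokens = [t for t in str(item).lower().split() if t not in stopwords]
        if tokens:
            return False
    return True

samples = [
    {"user": None},
    {"user": []},
    {"user": [None]},
    {"foo": "bar"},
    {"user": ""},
    {"user": "a and to"},
]
all_missing = all(has_no_usable_value(doc, "user") for doc in samples)
print(all_missing)
```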

Any ideas?
Thanks
