Search result changed since 1.2.4 (current 1.3.2) #7348

dominikmank · 2014-08-20T11:12:14Z

Heya,

In ES 1.2.4 I do something like this:

curl -XPUT 'http://localhost:9200/twitter/user/kimchy' -d '{ "name_test" : "Shay Banon" }'
curl -XPUT 'http://localhost:9200/twitter/user/foo' -d '{ "name_test" : "" }'
curl -XPOST 'http://localhost:9200/_search' -d '{"query":{"filtered":{"filter":{"and":{"filters":[{"missing":{"field": "name_test"}}]}}}}}'

The same filter, pretty-printed:

{
  "query": {
    "filtered": {
      "filter": {
        "and": {
          "filters": [
            {
              "missing": {
                "field": "name_test"
              }
            }
          ]
        }
      }
    }
  }
}

This one gives 1 hit.

In version 1.3.2 I do the same thing, but get 0 hits.

Am I missing something? (I read the changelog, but didn't find anything that could possibly cause this...)

My guess is that it happens when the field name contains an underscore.
When I tested it with "name" instead, I got one hit again.

Thanks for watching.

Dominik

jpountz · 2014-08-20T13:02:20Z

@dominikmank The way that missing and exists are implemented changed in 1.3.0, see #5659

If I understand what is happening correctly, the difference shows up on analyzed fields that generate no tokens. In that case the old implementation assumed that the field didn't exist, while the new implementation assumes that it exists since a field value was provided. I'm wondering whether the new approach may be more correct.
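To make the difference concrete, here is a small toy model of the two behaviours described above. It is only a sketch, not the actual Elasticsearch implementation: `toy_analyze`, `is_missing_old` and `is_missing_new` are hypothetical names, and the regex tokenizer is a crude stand-in for a real analyzer.

```python
import re

def toy_analyze(value):
    """Crude stand-in for an analyzer: lowercase, keep runs of letters/digits."""
    return re.findall(r"[a-z0-9]+", value.lower())

def is_missing_old(doc, field):
    """Model of the pre-1.3.0 behaviour: missing if no tokens were indexed."""
    value = doc.get(field)
    return value is None or len(toy_analyze(value)) == 0

def is_missing_new(doc, field):
    """Model of the 1.3.0+ behaviour: missing only if no value was provided."""
    return doc.get(field) is None

# The two documents from the original report.
docs = {
    "kimchy": {"name_test": "Shay Banon"},
    "foo": {"name_test": ""},
}

old_hits = [doc_id for doc_id, doc in docs.items() if is_missing_old(doc, "name_test")]
new_hits = [doc_id for doc_id, doc in docs.items() if is_missing_new(doc, "name_test")]

print(old_hits)  # ['foo'] -> 1 hit, as reported on 1.2.4
print(new_hits)  # []      -> 0 hits, as reported on 1.3.2
```

Under this model the empty string produces no tokens, so only the token-based check treats the `foo` document as missing.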

clintongormley · 2014-08-20T13:24:04Z

@jpountz It's breaking bwc. I think people probably rely on the old behaviour.

jpountz · 2014-08-20T13:33:27Z

I guess it could be considered a bug fix as well: the documentation says the filter is supposed to find fields that have no values, while "", "_", or any other value whose analyzed form contains no tokens is still a valid value.

If we want to maintain bwc, I guess we can revert the change to the missing and exists filters in 1.x (but they'll be slow again) and document this break for 2.0.

dominikmank · 2014-08-20T13:48:03Z

@clintongormley Yep, I'm relying on it, but if there's another way to do it, I would love to see the solution.

@jpountz So it's because the field is analyzed by default, yes?
Can you show me an alternative for my example that works with your changes?

We have reverted to 1.2.4 for now, so it's not that urgent :-)

jpountz · 2014-08-20T13:54:05Z

> So it's because the field is analyzed by default, yes?

Yes.

It should be possible to emulate the old behavior with a query like this one. Just replace f with the name of the field that you want to check:

GET test/_search
{
  "query": {
    "filtered": {
      "filter": {
        "not": {
          "filter": {
            "range": {
              "f": {
              }
            }
          }
        }
      }
    }
  }
}

exists should be the same without the not filter.
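For anyone building these requests from client code, the two variants could be generated along these lines. This is a rough sketch that only constructs the request bodies and does not talk to a cluster; the helper names `range_filter`, `exists_like` and `missing_like` are made up for illustration.

```python
import json

def range_filter(field):
    # Empty range on the field, exactly as in the query above.
    return {"range": {field: {}}}

def exists_like(field):
    # Emulated "exists": the bare range filter.
    return {"query": {"filtered": {"filter": range_filter(field)}}}

def missing_like(field):
    # Emulated "missing": the same range filter wrapped in a "not" filter.
    return {"query": {"filtered": {"filter": {"not": {"filter": range_filter(field)}}}}}

# Body for the field from the original report.
print(json.dumps(missing_like("name_test"), indent=2))
```

The printed body can be passed to `_search` in the same way as the queries earlier in the thread.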

dominikmank · 2014-08-20T13:59:51Z

@jpountz Yeah, that solves it, thanks! We'll stay on 1.2.4 for the time being anyway :D So is this the way to do it in the future? :'(

jpountz · 2014-08-20T14:27:37Z

@dominikmank I don't know yet. This way of computing documents with missing values is very costly, and that was the reason for the refactoring in #5659. I don't think we can get back to the old behavior with the new implementation without analyzing twice, which I'd like to avoid. Let's see what @clintongormley thinks about it.

jprante · 2014-09-01T11:50:42Z

I like the new behavior of #5659 because it is faster and I have high-cardinality fields all over the place. I suggest documenting the breaking change in the release notes of one of the upcoming versions. My 2¢

LesBarstow · 2014-09-09T23:48:38Z

While I like the idea of increased speed, this change does require (perhaps significant) extra client-side massaging when inserting variable data that might include empty strings.

The change is also not properly documented in the missing filter documentation or anywhere else that I've been able to find that references null value processing.
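As an illustration of the kind of client-side massaging meant here, empty strings could be turned into nulls before indexing. This is only a sketch: `empty_strings_to_null` is a hypothetical helper, and it rests on the assumption that an explicit null value is treated as missing by the filter. It does not cover values that are non-empty but analyze to no tokens (for example stopwords only).

```python
def empty_strings_to_null(doc):
    """Return a copy of doc with empty or whitespace-only strings replaced by None."""
    cleaned = {}
    for key, value in doc.items():
        if isinstance(value, str) and value.strip() == "":
            # Assumption: a null value is not indexed and so counts as missing.
            cleaned[key] = None
        elif isinstance(value, dict):
            # Recurse into nested objects.
            cleaned[key] = empty_strings_to_null(value)
        else:
            cleaned[key] = value
    return cleaned

print(empty_strings_to_null({"name_test": ""}))            # {'name_test': None}
print(empty_strings_to_null({"name_test": "Shay Banon"}))  # {'name_test': 'Shay Banon'}
```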

clintongormley · 2014-11-08T16:08:31Z

I have clarified the behaviour of the missing and exists filters in b9149f8

salimane · 2015-03-01T19:42:18Z

@jpountz The query you provided works on 1.4.4 for string fields, but it does not work for empty values such as { "user": [] }.
Basically, I would love a sample query that works for all of the following:

{ "user": null }
{ "user": [] } 
{ "user": [null] } 
{ "foo":  "bar" } 
{ "user":  "" } 
{ "user":  "a and to" }  # only stopwords that would be cleaned up with an analyzer like stop

All the above should match the query:

{
        "filter" : {
            "missing" : { "field" : "user" }
        }
}

Any ideas ?
Thanks

clintongormley assigned jpountz Aug 20, 2014

clintongormley added bug labels Aug 20, 2014

jpountz mentioned this issue Sep 1, 2014

Finding documents with empty string as value #7515

Closed

jpountz added the discuss label Sep 3, 2014

clintongormley added the >docs label and removed the >bug, >regression and discuss labels Oct 31, 2014

clintongormley assigned clintongormley and unassigned jpountz Oct 31, 2014

clintongormley closed this as completed Nov 8, 2014

jpountz mentioned this issue Nov 30, 2021

LUCENE-10263: Implement Weight.count() on NormsFieldExistsQuery apache/lucene#477

Merged
