Query planner generating wrong bounds for compound indexes #46

Closed
apkar opened this issue Jan 18, 2019 · 1 comment · Fixed by #78

Comments

@apkar
Contributor

apkar commented Jan 18, 2019

$ python test/correctness/document-correctness.py --doclayer-host localhost --doclayer-port 27018 forever doclayer mm --seed 9203356367461099619 --num-doc 300 --num-iter 1 --no-update --no-sort --no-numeric-fieldnames
Instance: 0459504932136
========================================================
ID : 35746 iteration : 1
========================================================
Query results didn't match!
Query: {'$and': [{u'E': None}, {'$and': [{u'C': None}, {u'A': {'$lte': 'c'}}]}]}
Projection: OrderedDict([(u'C', True), (u'D', True)])

  pymongo.collection   (0)
  mongo_model          (1): {u'_id': datetime.datetime(1970, 1, 22, 10, 7, 43)}

  RESULT SET DIFFERENCES (as 'sets' so order within the returned results is not considered)
    Only in mongo_model : {'_id': 1970-01-22 10:07:43}

python /Users/bmuppana/src/fdb-document-layer/test/correctness/document-correctness.py --mongo-host localhost --mongo-port 27018 --doclayer-host localhost --doclayer-port 27018 forever doclayer mm --seed 9203356367461099619 --num-doc 300 --num-iter 1 --no-update --no-sort --no-numeric-fieldnames

Found this against d2840e9. Consistently reproducible with the above seed.

@apkar apkar added bug Something isn't working good first issue Good for newcomers labels Jan 18, 2019
@apkar apkar changed the title Correctness issue with nested $and predicate Correctness issue with nested $and predicate Jan 18, 2019
@apkar apkar added this to the 1.7 milestone Jan 25, 2019
@apkar apkar removed the good first issue Good for newcomers label Feb 2, 2019
@apkar apkar self-assigned this Feb 4, 2019
@apkar apkar added the In progress Actively working on the issue label Feb 4, 2019
@apkar
Contributor Author

apkar commented Feb 6, 2019

A simpler case that reproduces the bug:

db.test.remove()
db.test.drop_indexes()
db.test.insert({'A': 'hello', 'B': 'world', 'C': 'HELLO'})
db.test.create_index([('A', pymongo.ASCENDING), ('B', pymongo.ASCENDING)])
db.test.create_index([('A', pymongo.ASCENDING), ('B', pymongo.ASCENDING), ('C', pymongo.ASCENDING)])

A simple query that is supposed to match this document returns nothing:

In [38]: for row in db.test.find({'A': 'hello', 'B': 'world', 'C': 'HELLO'}):
    ...:     print row
    ...:

In [39]: db.test.find({'A': 'hello', 'B': 'world', 'C': 'HELLO'}).explain()
Out[39]:
{
    "explanation":{
        "source_plan":{
            "projection":"{}",
            "source_plan":{
                "filter":"ANY(ExtPath((B\\x00) matching EQUALS(\"world\"))",
                "source_plan":{
                    "bounds":{
                        "begin":"(hello\\x00(HELLO\\x00",
                        "end":"(hello\\x00(HELLO\\x00"
                    },
                    "index name":"A_1_B_1_C_1",
                    "type":"index scan"
                },
                "type":"filter"
            },
            "type":"projection"
        },
        "type":"non-isolated"
    }
}

In [40]:

If we drop the indexes and try again, the same query falls back to a table scan:

In [40]: db.test.drop_indexes()

In [41]: for row in db.test.find({'A': 'hello', 'B': 'world', 'C': 'HELLO'}):
    ...:     print row
    ...:
{u'A': u'hello', u'C': u'HELLO', u'B': u'world', u'_id': ObjectId('5c5b62e2945b2617f81e60a6')}

In [42]: db.test.find({'A': 'hello', 'B': 'world', 'C': 'HELLO'}).explain()
Out[42]:
{
    "explanation":{
        "source_plan":{
            "projection":"{}",
            "source_plan":{
                "filter":"AND(ANY(ExtPath((A\\x00) matching EQUALS(\"hello\")), ANY(ExtPath((C\\x00) matching EQUALS(\"HELLO\")), ANY(ExtPath((B\\x00) matching EQUALS(\"world\")))",
                "source_plan":{
                    "type":"table scan"
                },
                "type":"filter"
            },
            "type":"projection"
        },
        "type":"non-isolated"
    }
}

In [43]:

That gives the correct result.

The bug is in the compound index matching scheme. The code handles a multi-term predicate term by term and tries to extend the index match so that the longest index prefix is used. The prefix handling is wrong: in the above example, after matching the term on A, we take prefix A:B instead of prefix A. The later term on C then extends the match to A_1_B_1_C_1 even though no value for B was ever added, so the scan bounds are missing the index key for B.
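
To make the intended behavior concrete, here is a minimal sketch of term-by-term prefix extension for equality terms. It is not the Document Layer planner code (which is C++); it is written in Python 3, and the names `plan_equality_scan`, `index_fields`, and `terms` are hypothetical and used only for illustration. The invariant it keeps is that the matched prefix always has exactly one value per leading index field, so the bounds can never skip a key the way the plan above skips B.

```python
def plan_equality_scan(index_fields, terms):
    """Sketch of longest-prefix matching for a compound index.

    index_fields: index key fields in order, e.g. ["A", "B", "C"].
    terms: (field, value) equality terms in the order the planner sees them.
    Returns (bounds, residual): the values for the matched index prefix,
    and the terms that still have to be applied as a filter.
    """
    pending = list(terms)
    prefix = []  # (field, value) pairs; always a leading run of index_fields
    progressed = True
    while progressed and len(prefix) < len(index_fields):
        progressed = False
        # Only the next unmatched index field may extend the prefix.
        next_field = index_fields[len(prefix)]
        for term in pending:
            if term[0] == next_field:
                prefix.append(term)
                pending.remove(term)  # safe: we break right after mutating
                progressed = True
                break
    bounds = tuple(value for _, value in prefix)
    return bounds, pending


if __name__ == "__main__":
    # Terms arrive in the order A, C, B, as in the table-scan filter above.
    terms = [("A", "hello"), ("C", "HELLO"), ("B", "world")]
    print(plan_equality_scan(["A", "B", "C"], terms))
```

With the three terms from the repro, this sketch matches the full prefix A:B:C and leaves nothing for the filter. If the term on B were absent, it would stop at prefix A and keep the term on C as a residual filter rather than folding it into the bounds.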

This part of the code is a bit of a mess. Because we store integer-only field names as integers (instead of strings), we store the index prefixes as byte streams (DataKey), which makes all of this code very hard to analyze. It could be cleaned up and simplified a lot.

@apkar apkar changed the title Correctness issue with nested $and predicate Query planner generating wrong bounds for compound indexes Feb 7, 2019
@apkar apkar closed this as completed in #78 Feb 14, 2019