You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
[ ORDER BY b ] <- [SELECT b, d FROM $q WHERE a IN (a1, a2, ... an) ] <- q: [ TypeFilter(T) ]
This is roughly equivalent to a SQL query:
SELECT b, d FROM T WHERE a IN (a1, a2, ..., an) ORDER BY b
Then suppose we have an index on (a, b, c, d). This can be used to execute this query. For example, this could be done by:
For each value in a's in-predicate, do a (covering) index scan of the (a, b, c, d) index limited to that value a
Then use an IN-union plan to merge those different index scans together
Project the b and d columns out from the original index scan
So, something like:
map (∪ [qun IN (a1, a2, ... an)] Covering(Index(abcd [EQUALS $qun])) ON [b, c, d, primary_key]) [_.b, _.d]
Importantly, the IN-union plan uses a comparison key function to properly merge the different child plans together in the right order. In this case, the order needs to be on something like [_.b, _.c, _.d, _.primary_key] (that is, the index ordering components after _.a) to ensure that each unique entry from the original scan is returned.
However, the plan we get is instead more like:
∪ [qun IN (a1, a2, ... an)] map(Covering(Index(abcd [EQUALS $qun])) [b, d]) ON [_.b]
That is, it:
For each value in the a's in-preidcate, does a covering index scan of the (a, b, c, d) index limited to that value a (so far so good)
Then uses a map plan to project b and d from each inner scan
And finally merges the different plans together using an IN-union plan with only b in the comparison key
This means that if two legs of the union have the same b values, then one can end up being marked as a dupe of the other, even if it has a different d value.
Note that if we project more columns out, this problem isn't hit. For example, if we include c, we get the original [_.b, _.c, _.d, _.primary_key] comparison key. Or if we include the comparison column a, we get the comparison key [_.b, _.a]. That latter comparison key function actually also works. This is because when we de-dupe using the comparison key, we only consider one child from each leg of the union, and we only remove it if there is another element that exactly matches the comparison key. With the comparison key [_.b, _.a], at no point are there ever children from different children of the union with the same comparison keys.
The text was updated successfully, but these errors were encountered:
…rom InUnion plans
This adds a test case for FoundationDB#2728. It showcases the way that the unprojected columns get dropped, and the final results of the query are missing data. There are comments in the test case about how a more correct solution would work.
alecgrieser
changed the title
Unprojected columns can be dropped from InUnion plans
Unprojected columns can be dropped from InUnion plan comparison keys
May 20, 2024
alecgrieser
added a commit
to alecgrieser/fdb-record-layer
that referenced
this issue
May 22, 2024
…rom InUnion plans
This adds a test case for FoundationDB#2728. It showcases the way that the unprojected columns get dropped, and the final results of the query are missing data. There are comments in the test case about how a more correct solution would work.
alecgrieser
added a commit
to alecgrieser/fdb-record-layer
that referenced
this issue
May 22, 2024
…rom InUnion plans
This adds a test case for FoundationDB#2728. It showcases the way that the unprojected columns get dropped, and the final results of the query are missing data. There are comments in the test case about how a more correct solution would work.
If we have a Cascades query that looks a bit
This is roughly equivalent to a SQL query:
Then suppose we have an index on (a, b, c, d). This can be used to execute this query. For example, this could be done by:
a
So, something like:
Importantly, the IN-union plan uses a comparison key function to properly merge the different child plans together in the right order. In this case, the order needs to be on something like
[_.b, _.c, _.d, _.primary_key]
(that is, the index ordering components after_.a
) to ensure that each unique entry from the original scan is returned.However, the plan we get is instead more like:
That is, it:
a
(so far so good)b
andd
from each inner scanb
in the comparison keyThis means that if two legs of the union have the same
b
values, then one can end up being marked as a dupe of the other, even if it has a differentd
value.Note that if we project more columns out, this problem isn't hit. For example, if we include
c
, we get the original[_.b, _.c, _.d, _.primary_key]
comparison key. Or if we include the comparison columna
, we get the comparison key[_.b, _.a]
. That latter comparison key function actually also works. This is because when we de-dupe using the comparison key, we only consider one child from each leg of the union, and we only remove it if there is another element that exactly matches the comparison key. With the comparison key[_.b, _.a]
, at no point are there ever children from different children of the union with the same comparison keys.The text was updated successfully, but these errors were encountered: