Unprojected columns can be dropped from InUnion plan comparison keys #2728

alecgrieser · 2024-05-17T13:46:55Z

If we have a Cascades query that looks a bit

[ ORDER BY b ] <- [SELECT b, d FROM $q WHERE a IN (a1, a2, ... an)  ] <- q: [ TypeFilter(T) ]

This is roughly equivalent to a SQL query:

SELECT b, d FROM T WHERE a IN (a1, a2, ..., an) ORDER BY b

Then suppose we have an index on (a, b, c, d). This can be used to execute this query. For example, this could be done by:

For each value in a's in-predicate, do a (covering) index scan of the (a, b, c, d) index limited to that value a
Then use an IN-union plan to merge those different index scans together
Project the b and d columns out from the original index scan

So, something like:

map (∪ [qun IN (a1, a2, ... an)] Covering(Index(abcd [EQUALS $qun])) ON [b, c, d, primary_key]) [_.b, _.d]

Importantly, the IN-union plan uses a comparison key function to properly merge the different child plans together in the right order. In this case, the order needs to be on something like [_.b, _.c, _.d, _.primary_key] (that is, the index ordering components after _.a) to ensure that each unique entry from the original scan is returned.

However, the plan we get is instead more like:

∪ [qun IN (a1, a2, ... an)] map(Covering(Index(abcd [EQUALS $qun])) [b, d]) ON [_.b]

That is, it:

For each value in the a's in-preidcate, does a covering index scan of the (a, b, c, d) index limited to that value a (so far so good)
Then uses a map plan to project b and d from each inner scan
And finally merges the different plans together using an IN-union plan with only b in the comparison key

This means that if two legs of the union have the same b values, then one can end up being marked as a dupe of the other, even if it has a different d value.

Note that if we project more columns out, this problem isn't hit. For example, if we include c, we get the original [_.b, _.c, _.d, _.primary_key] comparison key. Or if we include the comparison column a, we get the comparison key [_.b, _.a]. That latter comparison key function actually also works. This is because when we de-dupe using the comparison key, we only consider one child from each leg of the union, and we only remove it if there is another element that exactly matches the comparison key. With the comparison key [_.b, _.a], at no point are there ever children from different children of the union with the same comparison keys.

The text was updated successfully, but these errors were encountered:

…rom InUnion plans This adds a test case for FoundationDB#2728. It showcases the way that the unprojected columns get dropped, and the final results of the query are missing data. There are comments in the test case about how a more correct solution would work.

alecgrieser added bug Something isn't working planner Related to the query planner labels May 17, 2024

alecgrieser mentioned this issue May 17, 2024

Test case for #2728: Unprojected columns can be dropped from InUnion plans #2729

Open

alecgrieser changed the title ~~Unprojected columns can be dropped from InUnion plans~~ Unprojected columns can be dropped from InUnion plan comparison keys May 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unprojected columns can be dropped from InUnion plan comparison keys #2728

Unprojected columns can be dropped from InUnion plan comparison keys #2728

alecgrieser commented May 17, 2024

Unprojected columns can be dropped from InUnion plan comparison keys #2728

Unprojected columns can be dropped from InUnion plan comparison keys #2728

Comments

alecgrieser commented May 17, 2024