The distributed engine decides when to push down certain operations by checking whether the external labels are still present. For example, a binary operation can be pushed down if its vector matching includes all external labels. This works well, but it becomes a problem when you have multiple external labels and some of them are irrelevant to the partitioning: query authors must be aware of those irrelevant labels and must include them in their queries.
This PR addresses that by adding an option to restrict the check to the labels that are relevant for the partitioning.
Example: if you have `region` and `datacenter` external labels and every `datacenter` already has a distinct label value, then you could strip `region` when deciding whether something can be pushed down. For example, `A * on (datacenter) B` could then be pushed down, whereas by default the engine requires it to be written as `A * on (region, datacenter) B`, which is somewhat redundant.

An alternative that doesn't need new code would be to register the unnecessary labels as replica labels, but that feels wrong since they don't correspond to any replicas, and "region" is a terrible replica label.
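A minimal Go sketch of the decision rule described above, for illustration only. It assumes the check is a plain subset test: a binary operation with `on (...)` matching is pushed down only if every required label appears in the matching labels. The function `canPushDown` and the label slices are hypothetical; this is not the engine's actual code, and it ignores `ignoring (...)` matching and aggregations.

```go
package main

import "fmt"

// canPushDown is a hypothetical simplification of the pushdown check:
// it returns true only when every required label (all external labels by
// default, or only the partition labels with the new option) appears in
// the labels used for on(...) vector matching.
func canPushDown(onLabels, requiredLabels []string) bool {
	set := make(map[string]struct{}, len(onLabels))
	for _, l := range onLabels {
		set[l] = struct{}{}
	}
	for _, l := range requiredLabels {
		if _, ok := set[l]; !ok {
			return false
		}
	}
	return true
}

func main() {
	externalLabels := []string{"region", "datacenter"}
	partitionLabels := []string{"datacenter"} // assumed value of the new option
	on := []string{"datacenter"}              // A * on (datacenter) B

	// Checking against all external labels fails because "region" is missing.
	fmt.Println(canPushDown(on, externalLabels))
	// Checking against the partition labels only succeeds.
	fmt.Println(canPushDown(on, partitionLabels))
}
```

With only `datacenter` as the relevant label, `A * on (datacenter) B` passes the check without the author having to mention `region`.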
Changes

- `--query.partition-label` flag

Verification