56399: colexec: add support for disk-backed distinct r=yuzefovich a=yuzefovich

**colmem: fix appending column to the windowed batches**

We use `MaybeAppendColumn` to enforce that a vector of the desired type is at the correct position. However, with the introduction of the concept of capacity to batches, we now also need to make sure that the vector has the desired capacity. In the vast majority of cases we reallocate the whole batch because of the dynamic size, so it shouldn't be an issue; still, this commit fixes that theoretical problem.

Additionally, this commit fixes a problem with appending a vector to windowed batches: such batches are instantiated with 0 capacity, and previously, when appending a vector to one, we would always allocate a vector of 0 capacity.

Release note: None

**colexec: extract hash-based partitioner base from external hash joiner**

This commit extracts the logic of the Grace hash join algorithm into a hash-based partitioner that can be reused by the unordered distinct and the hash aggregator to add support for disk-spilling. The idea is that at planning time we can provide different in-memory "main" and disk-backed "fallback" strategies. If a partition fits under the memory limit, then the former is used; however, if recursive partitioning isn't successful in reducing the size, that partition is assigned to be handled by the latter. In the case of the external hash joiner, the in-memory hash joiner is the "main" operator, whereas the external sort and the merge joiner form the "fallback" operator.

Release note: None

**colexec: fix test harness in some cases when comparing sets**

This commit fixes our test harness when comparing expected and actual tuples as sets. We use a sort to speed things up, and previously in some cases we could get different orderings for essentially the same sets of tuples because the actual ones contain tree.Datums and the expected ones contain strings.
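
To illustrate the ordering mismatch described above, here is a minimal sketch, not the actual test harness code. It assumes a hypothetical `tuple` type and a hypothetical `canonical` helper that renders every value as a string before sorting, so that typed values and their string forms sort identically. Note that the integers 9 and 10 sort as 9 < 10, while the strings "10" and "9" sort as "10" < "9", which is the kind of divergence that can make equal sets look different.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// tuple is a hypothetical stand-in for a row of values; the real harness
// works with tree.Datums on the actual side and strings on the expected side.
type tuple []interface{}

// canonical renders a tuple as one string so that a typed value and its
// string form (e.g. 10 and "10") produce the same sort key.
func canonical(t tuple) string {
	parts := make([]string, len(t))
	for i, v := range t {
		parts[i] = fmt.Sprint(v)
	}
	return strings.Join(parts, "\x00")
}

// sortedKeys converts every tuple to its canonical key and sorts the keys.
func sortedKeys(tuples []tuple) []string {
	keys := make([]string, len(tuples))
	for i, t := range tuples {
		keys[i] = canonical(t)
	}
	sort.Strings(keys)
	return keys
}

// sameTuples reports whether both inputs contain the same tuples,
// ignoring order, by comparing their sorted canonical keys.
func sameTuples(expected, actual []tuple) bool {
	e, a := sortedKeys(expected), sortedKeys(actual)
	if len(e) != len(a) {
		return false
	}
	for i := range e {
		if e[i] != a[i] {
			return false
		}
	}
	return true
}

func main() {
	expected := []tuple{{"10", "a"}, {"9", "b"}} // strings
	actual := []tuple{{9, "b"}, {10, "a"}}       // typed values
	fmt.Println(sameTuples(expected, actual))
}
```

Sorting both sides under the same string representation is one way to make the comparison independent of the value types; the commit itself may resolve the mismatch differently.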
Release note: None

**colexec: implement disk-spilling for unordered distinct**

This commit introduces an external distinct operator that reuses the hash-based partitioner, which divides the input into separate partitions according to the "partitioning by hash" scheme. The "main" strategy for the new operator is the already existing in-memory unordered distinct, whereas the "fallback" strategy is the external sort followed by the ordered distinct. The benchmarks have shown that this approach is significantly faster in most cases than the external sort + ordered distinct approach (i.e. than making the fallback strategy the main one).

One important detail of the fallback strategy is that we need to make sure that we keep the very first tuple from the input among all tuples that are identical on the distinct columns. If we naively use a sort plus ordered distinct, we might break that. To work around it, we plan an ordinality operator and include the ordinality column as the last one in the ordering; we then project out that temporary column when feeding into the ordered distinct.

Another important detail is that distinct is expected to maintain the ordering of the output stream (to be the same as the ordering of the input stream) when the output ordering is specified. Previously, this was achieved for "free" in both the vectorized and the row-by-row engines; however, with the partitioning by hash approach we don't have that anymore. Therefore, a new field was added to DistinctSpec that specifies the desired output ordering, and the field is now used to optionally plan an external sort on top of the external distinct. The benchmarks have shown that the performance overhead of such a sort is relatively small.

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
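
The ordinality trick used by the fallback strategy can be sketched in plain Go as follows. This is an in-memory illustration only, not the vectorized operators themselves: the `row` and `withOrdinality` types and the `fallbackDistinct` function are hypothetical names, and a single `key` field stands in for the distinct columns.

```go
package main

import (
	"fmt"
	"sort"
)

// row is a hypothetical tuple: key stands in for the distinct columns,
// payload for the remaining columns.
type row struct {
	key     string
	payload string
}

// withOrdinality pairs a row with its position in the input stream,
// playing the role of the temporary ordinality column.
type withOrdinality struct {
	row
	ord int
}

// fallbackDistinct mimics "ordinality -> sort -> ordered distinct":
// it keeps, for every key, the tuple that appeared first in the input.
func fallbackDistinct(input []row) []row {
	// Step 1: attach the ordinality column.
	tagged := make([]withOrdinality, len(input))
	for i, r := range input {
		tagged[i] = withOrdinality{row: r, ord: i}
	}
	// Step 2: sort on the distinct columns with ordinality as the last
	// ordering column, so ties are broken by input position.
	sort.Slice(tagged, func(i, j int) bool {
		if tagged[i].key != tagged[j].key {
			return tagged[i].key < tagged[j].key
		}
		return tagged[i].ord < tagged[j].ord
	})
	// Step 3: ordered distinct. Emit the first row of each run of equal
	// keys, projecting out the ordinality column.
	var out []row
	for i, t := range tagged {
		if i == 0 || t.key != tagged[i-1].key {
			out = append(out, t.row)
		}
	}
	return out
}

func main() {
	input := []row{
		{"b", "first b"}, {"a", "first a"},
		{"b", "second b"}, {"a", "second a"},
	}
	for _, r := range fallbackDistinct(input) {
		fmt.Println(r.key, r.payload)
	}
}
```

Without the ordinality tiebreaker, a sort that is not stable could place "second b" ahead of "first b", and the ordered distinct would then keep the wrong tuple. Note also that the output here is ordered by key rather than by input position, which is why a separate sort on the desired output ordering may still be needed afterwards.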