60593: colexec: make external sorter respect memory limit better r=yuzefovich a=yuzefovich

**colexec: register memory used by dequeued batches from partitioned queue**

Previously, we forgot to perform the memory accounting of the batches that are dequeued from the partitions in the external sort (which could be substantial when we're merging multiple partitions at once and the tuples are wide) and in the hash-based partitioner. This is now fixed.

Additionally, this commit retains references to some internal operators in the external sort in order to reuse the memory under the dequeued batches (this will be beneficial if we perform repeated merging). It also fixes an issue with the repeated re-initialization of the disk-backed operators in the disk spiller after the latter has been reset (that problem led to redundant allocations and to not reusing the available memory).

A slight complication with the accounting was that we were using the same allocator for all usages. That would be quite wrong because in the merge phase we have two distinct memory usages with different lifecycles: the memory under the dequeued batches is kept (and reused later), whereas the memory under the output batch of the ordered synchronizer is released. We now handle these lifecycles correctly by using separate allocators.

Release note (bug fix): CockroachDB previously didn't account for some RAM used when disk-spilling operations (like sorts and hash joins) were using the temporary storage in the vectorized execution engine. This could result in OOM crashes, especially when the rows are large in size.

**colexec: make external sorter respect memory limit better**

This commit improves how the external sorter manages its available RAM. There are two main usages that overlap because we keep references to both at all times:

1. during the spilling/sorting phase, we use a single in-memory sorter;
2.
during the merging phase, we use the ordered synchronizer, which reads one batch from each of the partitions and also allocates an output batch.

Previously, we would give the whole memory limit to the in-memory sorter in 1., which resulted in the external sorter using at least 2x its memory limit. This is now fixed by giving only half to the in-memory sorter.

The handling of 2. was even worse: we didn't have any logic that would limit the number of active partitions based on their memory footprint. If the batches are large (say, 1GB in size), during the merge phase we could be using on the order of 16GB of RAM (the number 16 being determined by the number of file descriptors). Additionally, we would give the whole memory limit to the output batch too. This misbehavior is also now fixed by tracking the maximum size of a single batch in each active partition and computing the actual maximum number of partitions using those sizes.

Fixes: #60017.

Release note: None

60604: sql: remove QueryWithCols method from the internal executor r=yuzefovich a=yuzefovich

The previous commit removed this method from the interface, and this commit follows up by removing the method entirely. This is done in a similar fashion: by switching to `QueryRowExWithCols` (when at most one row is expected) and to the iterator API (avoiding the buffering of rows in all cases).

Addresses: #48595.

Release note: None

Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
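The partition-capping idea from the second commit can be sketched as follows. This is a hypothetical illustration, not CockroachDB's actual implementation: `maxPartitionsForBudget`, its parameters, and the exact policy are assumptions. It shows how tracking the maximum batch size per partition lets the merge phase admit only as many partitions as fit in a memory budget, additionally capped by the file-descriptor limit:

```go
package main

import "fmt"

// maxPartitionsForBudget returns how many partitions can be merged at once
// without their largest batches exceeding memBudget, capped by fdLimit.
// maxBatchSizes[i] is the maximum observed batch size in partition i.
func maxPartitionsForBudget(maxBatchSizes []int64, memBudget int64, fdLimit int) int {
	n := 0
	var used int64
	for _, size := range maxBatchSizes {
		if n >= fdLimit || used+size > memBudget {
			break
		}
		used += size
		n++
	}
	// A merge needs at least two inputs to make progress.
	if n < 2 {
		n = 2
	}
	return n
}

func main() {
	// Four partitions whose largest batches are 1GB each, a 2GB merge budget,
	// and room for 16 file descriptors: only 2 partitions fit in the budget.
	sizes := []int64{1 << 30, 1 << 30, 1 << 30, 1 << 30}
	fmt.Println(maxPartitionsForBudget(sizes, 2<<30, 16)) // prints 2
}
```

With no budget check (the old behavior), all 16 file descriptors would admit 16 partitions, i.e. ~16GB of dequeued batches for 1GB batches.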
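The separate-allocator fix from the first commit can also be sketched. This is a minimal toy model, not CockroachDB's `colmem.Allocator` API: the `budget`/`allocator` types and their methods are assumptions, made up to show why two accounting handles with distinct lifecycles matter when they share one limit:

```go
package main

import "fmt"

// budget is a shared memory limit that several allocators draw from.
type budget struct{ used, limit int64 }

// allocator tracks its own share of the budget so it can release
// exactly what it registered, independently of other allocators.
type allocator struct {
	b    *budget
	held int64
}

func (a *allocator) grow(n int64) error {
	if a.b.used+n > a.b.limit {
		return fmt.Errorf("memory budget exceeded")
	}
	a.b.used += n
	a.held += n
	return nil
}

func (a *allocator) releaseAll() {
	a.b.used -= a.held
	a.held = 0
}

func main() {
	b := &budget{limit: 100}
	dequeued := &allocator{b: b} // lives for the whole merge phase
	output := &allocator{b: b}   // released after each output batch

	_ = dequeued.grow(60) // batches dequeued from partitions are retained
	_ = output.grow(30)   // the synchronizer's output batch
	output.releaseAll()   // only the output allocator's bytes come back
	fmt.Println(b.used)   // prints 60: dequeued bytes remain accounted
}
```

With a single allocator, releasing the output batch would either free nothing or free the retained dequeued batches as well, so the two lifecycles cannot be accounted correctly.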