Support vectorized append and compare for multi group by #12996

Merged on Nov 6, 2024 (63 commits)
be6a67d
simple support vectorized append.
Rachelint Oct 16, 2024
2cdf05d
fix tests.
Rachelint Oct 16, 2024
04ea2d2
some logs.
Rachelint Oct 17, 2024
a83c2ea
add `append_n` in `MaybeNullBufferBuilder`.
Rachelint Oct 19, 2024
3df75ac
impl basic append_batch
Rachelint Oct 19, 2024
13c9489
fix equal to.
Rachelint Oct 19, 2024
5fd63e8
define `GroupIndexContext`.
Rachelint Oct 21, 2024
d4b5820
define the structs useful in vectorizing.
Rachelint Oct 21, 2024
04f35bb
re-define some structs for vectorized operations.
Rachelint Oct 22, 2024
d215937
impl some vectorized logics.
Rachelint Oct 22, 2024
2af6ff5
impl checking hashmap stage.
Rachelint Oct 22, 2024
473914a
fix compile.
Rachelint Oct 22, 2024
14f8881
tmp
Rachelint Oct 22, 2024
ebbeb5a
define and impl `vectorized_compare`.
Rachelint Oct 22, 2024
dad79c0
fix compile.
Rachelint Oct 22, 2024
1a7c2eb
impl `vectorized_equal_to`.
Rachelint Oct 22, 2024
d79b813
impl `vectorized_append`.
Rachelint Oct 23, 2024
6edc646
finish the basic vectorized ops logic.
Rachelint Oct 25, 2024
150248f
impl `take_n`.
Rachelint Oct 26, 2024
37d68e6
fix `renaming clear` and `groups fill`.
Rachelint Oct 27, 2024
ebd9db9
fix infinite loop due to rehashing.
Rachelint Oct 27, 2024
71c45ce
fix vectorized append.
Rachelint Oct 27, 2024
2f272f2
add counter.
Rachelint Oct 27, 2024
731723c
use extend rather than resize.
Rachelint Oct 27, 2024
a77f516
remove dbg!.
Rachelint Oct 27, 2024
1830c1a
remove reserve.
Rachelint Oct 27, 2024
b6f2d00
refactor the codes to make simpler and more performant.
Rachelint Oct 27, 2024
6375d93
clear `scalarized_indices` in `intern` to avoid some corner case.
Rachelint Oct 27, 2024
7979f74
fix `scalarized_equal_to`.
Rachelint Oct 27, 2024
86dcb11
fallback to total scalarized `GroupValuesColumn` in streaming aggrega…
Rachelint Oct 28, 2024
197656b
add unit test for `VectorizedGroupValuesColumn`.
Rachelint Oct 29, 2024
cc96beb
add unit test for emitting first n in `VectorizedGroupValuesColumn`.
Rachelint Oct 30, 2024
2c1ec19
sort out tests codes in for group columns and add vectorized tests fo…
Rachelint Oct 30, 2024
fa6343c
add vectorized test for byte builder.
Rachelint Oct 30, 2024
41ac655
add vectorized test for byte view builder.
Rachelint Oct 30, 2024
4f8924e
add test for the all nulls or not nulls branches in vectorized.
Rachelint Oct 30, 2024
c9b147a
Merge branch 'main' into vectorize-append-value
Rachelint Oct 30, 2024
236b0bc
fix clippy.
Rachelint Oct 30, 2024
15aaab1
fix fmt.
Rachelint Oct 30, 2024
a0aa7b7
fix compile in rust 1.79.
Rachelint Oct 30, 2024
c2088f7
improve comments.
Rachelint Oct 30, 2024
7acfef0
fix doc.
Rachelint Oct 30, 2024
7875d50
add more comments to explain the really complex vectorized intern pro…
Rachelint Oct 30, 2024
41f5f04
add comments to explain why we still need origin `GroupValuesColumn`.
Rachelint Oct 30, 2024
7efce58
remove some stale comments.
Rachelint Oct 30, 2024
5cbe3fa
fix clippy.
Rachelint Oct 30, 2024
8b23ff3
add comments for `vectorized_equal_to` and `vectorized_append`.
Rachelint Oct 30, 2024
df81f8f
fix clippy.
Rachelint Oct 30, 2024
81f99a8
use zip to simplify codes.
Rachelint Oct 30, 2024
b7a2443
use izip to simplify codes.
Rachelint Oct 31, 2024
4b45708
Update datafusion/physical-plan/src/aggregates/group_values/group_col…
Rachelint Oct 31, 2024
d1b879a
first_n attempt
jayzhan211 Oct 31, 2024
14841db
add test
jayzhan211 Nov 1, 2024
fd9a71a
Merge pull request #2 from jayzhan211/first-n
Rachelint Nov 1, 2024
8cd581d
improve hashtable modifying in emit first n test.
Rachelint Nov 1, 2024
75aa1dc
add `emit_group_index_list_buffer` to avoid allocating new `Vec` to s…
Rachelint Nov 1, 2024
406acb4
make comments in VectorizedGroupValuesColumn::intern simpler and clea…
Rachelint Nov 1, 2024
7a1ed90
define `VectorizedOperationBuffers` to hold buffers used in vectorize…
Rachelint Nov 2, 2024
e8c0aaa
Merge branch 'main' into vectorize-append-value
Rachelint Nov 2, 2024
2d982a1
unify `VectorizedGroupValuesColumn` and `GroupValuesColumn`.
Rachelint Nov 4, 2024
e4bd579
fix fmt.
Rachelint Nov 4, 2024
14fffb8
fix comments.
Rachelint Nov 4, 2024
d479cc2
fix clippy.
Rachelint Nov 4, 2024
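
The commits above introduce `vectorized_equal_to` and `vectorized_append` for group columns. As a rough illustration of the idea only, the sketch below contrasts a per-row comparison with a batched one. The `I64GroupColumn` type, its fields, and all signatures are hypothetical and simplified (non-null `i64` keys only); they are assumptions for illustration, not DataFusion's actual API.

```rust
/// Hypothetical, simplified group-column builder for non-null `i64` keys.
/// This is an illustrative assumption, not DataFusion's real implementation.
#[derive(Default)]
pub struct I64GroupColumn {
    values: Vec<i64>,
}

impl I64GroupColumn {
    /// Scalar path: compare one stored group value with one input row.
    pub fn equal_to(&self, group_idx: usize, input: &[i64], row: usize) -> bool {
        self.values[group_idx] == input[row]
    }

    /// Batch path: compare many (group, row) pairs in one call.
    /// Each `results[i]` is AND-ed with the comparison, so a multi-column key
    /// can call this once per column and keep only pairs equal on every column.
    pub fn vectorized_equal_to(
        &self,
        group_indices: &[usize],
        input: &[i64],
        rows: &[usize],
        results: &mut [bool],
    ) {
        for ((&g, &r), res) in group_indices.iter().zip(rows).zip(results.iter_mut()) {
            *res = *res && self.values[g] == input[r];
        }
    }

    /// Batch path: append all selected input rows as new group values.
    pub fn vectorized_append(&mut self, input: &[i64], rows: &[usize]) {
        self.values.extend(rows.iter().map(|&r| input[r]));
    }

    /// Stored group values, in group-index order.
    pub fn values(&self) -> &[i64] {
        &self.values
    }
}
```

Processing whole index lists per call keeps the inner loop tight and free of per-row dynamic dispatch, which is presumably the motivation for batching; the real implementation also has to handle nulls and other column types.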
2 changes: 1 addition & 1 deletion datafusion/common/src/utils/memory.rs
@@ -102,7 +102,7 @@ pub fn estimate_memory_size<T>(num_elements: usize, fixed_size: usize) -> Result

 #[cfg(test)]
 mod tests {
-    use std::collections::HashSet;
+    use std::{collections::HashSet, mem::size_of};

     use super::estimate_memory_size;
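
The hunk above adds an explicit `size_of` import. A likely reason, consistent with the "fix compile in rust 1.79" commit, is that `size_of` and `size_of_val` only joined the standard prelude in Rust 1.80, so older toolchains need the import. The helpers below are hypothetical and only show the import pattern that compiles on both sides of that change.

```rust
// Explicit import: required before Rust 1.80, harmless afterwards.
use std::mem::{size_of, size_of_val};

/// Hypothetical helper: bytes occupied by `num_elements` values of type `T`.
pub fn payload_bytes<T>(num_elements: usize) -> usize {
    num_elements * size_of::<T>()
}

/// Hypothetical helper: bytes occupied by the elements of a slice.
pub fn slice_bytes<T>(values: &[T]) -> usize {
    size_of_val(values)
}
```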

@@ -19,6 +19,7 @@
 //! user defined aggregate functions

 use std::hash::{DefaultHasher, Hash, Hasher};
+use std::mem::{size_of, size_of_val};
 use std::sync::{
     atomic::{AtomicBool, Ordering},
     Arc,